TestEngineering/Services/FxALoadTesting: Difference between revisions
StuartPhilp (talk | contribs) m (StuartPhilp moved page QA/Services/FxALoadTesting to TestEngineering/Services/FxALoadTesting) |
== Quick Verification Of Stage Deployments ==
* This is a quick sanity test of the environment before getting started on load tests.
 Install FxA-Auth-Server to a local host or an AWS instance (see below)
 $ cd fxa-auth-server
 Run the integration tests against the remote Stage server (load balancer)
 $ PUBLIC_URL=<FxA Stage> npm run test-remote
 Current example:
 $ PUBLIC_URL=https://api-accounts.stage.mozaws.net npm run test-remote
* NOTE: Make sure to install and test from the same branch that is deployed to Stage (i.e., do not use Master for running the tests against Stage or Production).
* Using TPS
** The TPS FxA/Sync automated tests can be used as well, but the following file will have to be edited to add Stage environment configuration parameters: https://github.com/mozilla/gecko-dev/blob/master/testing/tps/tps/testrunner.py
** See the following wiki page for more information: https://wiki.mozilla.org/User_Services/Sync/Run_TPS
** See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1006675
== Quick Verification Of Production Deployments ==
* This is a quick sanity test of the environment after each new deployment. There are other verifications that can be run as well.
 Install FxA-Auth-Server to a local host or an AWS instance (see below)
 $ cd fxa-auth-server
 Run the integration tests against the remote Production server (load balancer)
 $ PUBLIC_URL=<FxA Prod> npm run test-remote
 Current example:
 $ PUBLIC_URL=https://api.accounts.firefox.com npm run test-remote
* NOTE: Make sure to install and test from the same branch that is deployed to Production.
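The Stage and Production checks above differ only in the URL and in the branch that must be checked out first. A minimal sketch of a wrapper follows. The function names and the deployed_ref argument are hypothetical and not part of fxa-auth-server; the URLs are the "current example" values from this page.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, URLs are the examples on this page.

# Map an environment name to the PUBLIC_URL used for test-remote.
public_url_for() {
    case "$1" in
        stage) echo "https://api-accounts.stage.mozaws.net" ;;
        prod)  echo "https://api.accounts.firefox.com" ;;
        *)     echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

# verify_deployment ENV DEPLOYED_REF
# Run from inside the fxa-auth-server checkout. DEPLOYED_REF is the branch
# or tag currently deployed to ENV (an assumption: look it up before running).
verify_deployment() {
    env_name=$1
    deployed_ref=$2
    url=$(public_url_for "$env_name") || return 1
    git checkout "$deployed_ref" || return 1
    npm install || return 1
    PUBLIC_URL="$url" npm run test-remote
}
```

For example, verify_deployment stage <deployed branch> would run the integration tests against Stage from the matching branch.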
== Load Test Tool Client/Host ==
* It is always best to configure an AWS instance as the host for all load testing.
* All load tests can now run on the localhost (the AWS instance) or against the new Loads Cluster. See the following links for more information:
** https://wiki.mozilla.org/QA/Services/LoadsV1ClientTestHost
** https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1
== Installing FxA-Auth-Server and the Loads tool on Localhost or AWS ==
 Installation:
 $ git clone https://github.com/mozilla/fxa-auth-server.git
 $ cd ./fxa-auth-server
 Note: You may want to install a specific branch for testing vs defaulting to Master
 $ npm install
 $ npm test
 $ cd ./test/load
 $ make build
* Note: 'npm install' may now need to be run as root.
* Note: This will install a local copy of the Loads tool for use with FxA-Auth-Server.
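The installation steps above can be collected into one function that pins the branch at clone time. This is a sketch under assumptions: the run and install_loadtest helpers and the DRY_RUN switch are illustrative names, and using git clone --branch is one way to follow the "specific branch" note, not necessarily how the team does it.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints each step instead of running it.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it in the current shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_loadtest BRANCH [DIR]
# Clones the given branch, installs and tests the server, then builds the
# local Loads tool. When executed for real it leaves the shell in DIR/test/load.
install_loadtest() {
    branch=$1
    dir=${2:-fxa-auth-server}
    run git clone --branch "$branch" https://github.com/mozilla/fxa-auth-server.git "$dir" || return 1
    run cd "$dir" || return 1
    run npm install || return 1
    run npm test || return 1
    run cd ./test/load || return 1
    run make build
}
```

Calling install_loadtest master prints the six steps; setting DRY_RUN=0 first executes them.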
== Running the Loads tool against FxA Stage ==
* The basic load test can be run as follows
 $ make test SERVER_URL=https://api-accounts.stage.mozaws.net
* The full, default load test can be run as follows
 $ make bench SERVER_URL=https://api-accounts.stage.mozaws.net
 Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.
 The recommendation is to use 'make test' and 'make megabench' instead (see below)...
* Configuring the bench load test - config folder:
** The test.ini file (for make test) can be configured for the following:
*** Number of hits
*** Number of concurrent users
** The bench.ini file (for make bench) can be configured for the following:
*** Number of concurrent users
*** Duration of test
* For both tests, start with the defaults, then tweak the duration. Users and Agents are optional tweaks/changes. Also, we can configure the bench load test to run in detached mode with appropriate detach and observer settings.
* REF: https://github.com/mozilla/fxa-auth-server/tree/master/loadtest/config
== Running the Loads tool against FxA Development or Production ==
* This can be done if we are comparing Stage vs. some other environment and have access to the AWS logs in Dev or Production:
* Dev:
 $ make test SERVER_URL=https://accounts.dev.lcip.org
 $ make bench SERVER_URL=https://accounts.dev.lcip.org
* Prod:
 $ make test SERVER_URL=https://api.accounts.firefox.com
 $ make bench SERVER_URL=https://api.accounts.firefox.com
* The same optional configuration changes apply here.
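Because the same two make targets are run against three environments, it can help to build the command line from an environment name. The sketch below only prints the command. The helper names are hypothetical, and the Dev URL is the one used in the make test and make bench examples of this section.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, URLs are copied from this page.

# Map an environment name to the SERVER_URL value.
server_url_for() {
    case "$1" in
        stage) echo "https://api-accounts.stage.mozaws.net" ;;
        dev)   echo "https://accounts.dev.lcip.org" ;;
        prod)  echo "https://api.accounts.firefox.com" ;;
        *)     echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

# loadtest_cmd TARGET ENV
# Print the make invocation for TARGET (test or bench) without running it.
loadtest_cmd() {
    case "$1" in
        test|bench) ;;
        *) echo "unknown target: $1" >&2; return 1 ;;
    esac
    url=$(server_url_for "$2") || return 1
    echo "make $1 SERVER_URL=$url"
}
```

Printing rather than executing keeps the choice of environment visible before any load is sent; the printed line is meant to be run from the test/load directory.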
== Using the Loads V1 Services Cluster ==
* By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
* Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
* Testing against the Stage environment:
 $ make megabench SERVER_URL=https://api-accounts.stage.mozaws.net
* Testing against the Dev environment:
 $ make megabench SERVER_URL=https://api-accounts.dev.lcip.org
* Testing against the Prod environment:
 $ make megabench SERVER_URL=https://api.accounts.firefox.com
* Configuring the megabench load test - config folder:
** The megabench.ini file (for make megabench) can be configured for the following:
*** Number of concurrent users
*** Duration of test
*** Include file (leave as defined for now)
*** Python dependencies (leave as defined for now)
*** Broker to use for testing (leave as defined for now - this is the broker in the Loads Cluster)
*** Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
*** Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
*** Observer (this can be email or irc - the default is irc #services-dev channel)
* REF: https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1
* REF: https://github.com/mozilla/fxa-auth-server/tree/master/loadtest/config
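Since the agent count has a documented default (5) and a current ceiling (20), a small guard can catch an out-of-range value before megabench.ini is edited. This is a sketch with hypothetical names; it checks only the static limits quoted above and cannot know how many agents are free when other load tests are running.

```shell
#!/bin/sh
# Sketch only: limits are the values quoted on this page and may change.
DEFAULT_AGENTS=5
MAX_AGENTS=20

# check_agents [N]
# Print N (or the default) if it is a whole number between 1 and MAX_AGENTS,
# otherwise print an error to stderr and return non-zero.
check_agents() {
    n=${1:-$DEFAULT_AGENTS}
    case "$n" in
        ''|*[!0-9]*) echo "agents must be a positive integer: $n" >&2; return 1 ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt "$MAX_AGENTS" ]; then
        echo "agents must be between 1 and $MAX_AGENTS: $n" >&2
        return 1
    fi
    echo "$n"
}
```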
== Configuring The Load Tests ==
* Makefile
** The SERVER_URL constant can be changed.
* Config files
** For make test:
*** Number of hits
*** Number of concurrent users
** For make bench:
*** Number of concurrent users
*** Duration of test
** For make megabench:
*** Number of concurrent users
*** Duration of test
*** Include file (this is code dependent)
*** Python dependencies (this is code dependent)
*** Broker to use for testing (leave as defined for now - this is the broker in the Loads Cluster)
*** Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
*** Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
*** Observer (this can be email or irc - the default is irc #services-dev channel)
* Load Test code: loadtests.py
** The load test can be configured in the code - see the following lines:
** https://github.com/mozilla/fxa-auth-server/blob/master/test/load/loadtests.py#L17-L39
* General REFs:
** https://github.com/mozilla/fxa-auth-server/blob/master/test/load/loadtests.py
== Test Coverage and Stats ==
* Basic tweakable values for all load tests
** users = number of concurrent users/agent
** agents = number of agents to use from the cluster (the test errors out if they are not available)
** duration = in seconds
** hits = 1 or X number of rounds/hits/iterations
* Location: fxa-auth-server/loadtest/loadtests.py
* The following items are covered in the load test
** test_auth_server is the main entry point in the loadtests.py file
*** account creation
*** session creation
*** account deletion
*** session deletion
* Integration tests
** These are designed to cover the edge/error cases that are not applicable to the load test
** The tests can be run against a remote server
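Reading "users/agent" above as users per agent, the total number of simulated users in a cluster run is the product of the two settings. The sketch below shows that arithmetic; the function name is hypothetical and the per-agent reading is an assumption.

```shell
#!/bin/sh
# Sketch only: assumes "users" is a per-agent value, as the list above suggests.

# total_users USERS_PER_AGENT AGENTS
# Print the total number of concurrent simulated users across all agents.
total_users() {
    echo $(( $1 * $2 ))
}
```

With a hypothetical setting of 20 users and the default 5 agents, total_users 20 5 prints 100.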
== Analyzing the Results ==
* TBD
== Debugging the Issues ==
* There are several methods and tools for debugging the load test errors and other issues.
* 1. Important logs for FxA-Auth-Server (per server)
** /media/ephemeral0/fxa-auth-server/auth_err.log.*
** /media/ephemeral0/fxa-auth-server/auth_out.log
** /media/ephemeral0/heka/hekad_err.log
** /media/ephemeral0/heka/hekad_out.log
** /media/ephemeral0/nginx/logs/access.log
** /media/ephemeral0/nginx/logs/error.log
* Acceptable FxA-Auth-Server errors
 503s: especially of this type - /v1/certificate/sign - are usually a sign that we are overloading the hosts
 400s: we should never see these in the logs, especially if the "errno" value is 105.
 Check the fxa-auth-server/auth_err.log
 400s: "errno" values of 101, 102 are ok. These can be expected during a load test.
 ELB issues: we may see 503s and corresponding "err":"cannot enqueue work: maximum backlog exceeded (30)"
 messages if one or more of the hosts behind the ELB is receiving most of the load traffic.
 REF: https://github.com/mozilla/fxa-auth-server/issues/647
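To see quickly whether a run produced only the expected errno values (101, 102) or also 105 and backlog errors, the error log can be summarised with standard tools. The sketch below assumes the log contains JSON-like fragments such as "errno":105, which matches the quoted messages above but is not confirmed here; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes log lines carry fragments like "errno":105 or "errno": 105.

# count_errnos FILE
# Print "<count> <errno>" for each distinct errno, most frequent first.
count_errnos() {
    grep -o '"errno": *[0-9][0-9]*' "$1" \
        | tr -cd '0-9\n' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}

# count_backlog_errors FILE
# Print the number of lines reporting the "maximum backlog exceeded" condition.
count_backlog_errors() {
    grep -c 'maximum backlog exceeded' "$1"
}
```

A typical use would be count_errnos /media/ephemeral0/fxa-auth-server/auth_err.log on each host, then comparing the counts across hosts to spot uneven ELB distribution.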
== Monitoring FxA Stage ==
* Loads dashboard:
** http://ec2-54-212-44-143.us-west-2.compute.amazonaws.com/
** or http://loads.services.mozilla.com
* Cluster status
** Check directly from the Loads Cluster dashboard:
 Agents statuses
 Launch a health check on all agents
* and also on StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster
* For all other monitoring, see the following section:
** https://wiki.mozilla.org/QA/Services/FxATestEnvironments#Monitoring_the_Stage_Environment
== Performance Testing Information ==
* TBD
== Details on the Load Test tool ==
* The documentation can be found here:
** https://loads.readthedocs.org/en/latest
* The repositories are here:
** https://github.com/mozilla-services/loads
** https://github.com/mozilla-services/loads-aws
** https://github.com/mozilla-services/loads-web
* The Services cluster is here:
** http://loads.services.mozilla.com
== Known Bugs, Issues, and Tasks ==
* FxA
** https://github.com/mozilla/fxa-auth-server/issues
** https://github.com/mozilla/fxa-content-server/issues
** and several others
* Bugzilla
** No specific category
* Infrastructure
** https://github.com/mozilla-services/puppet-config/issues
** https://github.com/mozilla-services/svcops/issues
* Loads Tool and clusters
** https://github.com/mozilla-services/loads/issues
** https://github.com/mozilla-services/loads-web/issues
** https://github.com/mozilla-services/loads-aws/issues
== Capacity Planning Stage and Production ==
* QA is tasked with providing some capacity requirements and constraints based on repeated load testing of the FxA-Auth-Server Stage environment.
* The goal is to be able to work with OPs to develop a realistic plan for deploying and maintaining the production environment at a level expected for projected user traffic, etc.
* Brainstorming the QA role:
** QA needs to get some realistic numbers from the Product team. This could be as simple as traffic flow (number of users per day or per segments of the day - peaks and valleys) or more detailed:
*** Traffic flow - QPS/RPS
*** Average number of users per time segment
*** Average and peak latency
*** Error percentages and thresholds
*** etc
** QA gets help from OPs to learn how to measure those required numbers/values using StackDriver or other tools (or to get data from OPsView). If those numbers cannot be measured, then we need to either
*** get a different set of data points from the Product team
*** enhance the current tools to track and measure the required data
** QA does repeated, scheduled, well-defined load tests in Stage while actively monitoring the results, logs, data, etc.
** QA finds a stable configuration that - when scaled - would
*** match the needs of Product when we release
*** match the realistic capacity planning that OPs normally does
* Dependencies
** Realistic traffic/user numbers from the FxA Product team
** Timely training on monitoring tools from the OPs team
** Regular and realistic scaling/testing of deployments to Stage by QA given our current pre-release and post-release schedules
== References ==
* Repository: https://github.com/mozilla/fxa-auth-server
* The QA Test Environments:
** https://wiki.mozilla.org/QA/Services/FxATestEnvironments
** https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments
* Deploying the FxA Load Test environment for broker/agents usage:
** https://github.com/mozilla/fxa-deployment
* OPs pages for stats collection, logging, monitoring
** TBD
Latest revision as of 20:01, 26 August 2016