TestEngineering/Services/FxALoadTesting
From MozillaWiki
Contents
- 1 Quick Verification Of Stage Deployments
- 2 Quick Verification Of Production Deployments
- 3 Load Test Tool Client/Host
- 4 Installing FxA-Auth-Server and the Loads tool on Localhost or AWS
- 5 Running the Loads tool against FxA Stage
- 6 Running the Loads tool against FxA Development or Production
- 7 Using the Loads V1 Services Cluster
- 8 Configuring The Load Tests
- 9 Test Coverage and Stats
- 10 Analyzing the Results
- 11 Debugging the Issues
- 12 Monitoring FxA Stage
- 13 Performance Testing Information
- 14 Details on the Load Test tool
- 15 Known Bugs, Issues, and Tasks
- 16 Capacity Planning Stage and Production
- 17 References
Quick Verification Of Stage Deployments
- This is a quick sanity test of the environment before getting started on load tests.
- Install FxA-Auth-Server to a local host or an AWS instance (see below):
$ cd fxa-auth-server
- Run the integration tests against the remote Stage server (load balancer):
$ PUBLIC_URL=<FxA Stage> npm run test-remote
- Current example:
$ PUBLIC_URL=https://api-accounts.stage.mozaws.net npm run test-remote
- NOTE: Make sure to install and test from the same branch that is deployed to Stage (i.e., do not use master when running the tests against Stage or Production).
- Using TPS
- The TPS FxA/Sync automated tests can also be used, but the following file must first be edited to add the Stage environment configuration parameters: https://github.com/mozilla/gecko-dev/blob/master/testing/tps/tps/testrunner.py
- See the following wiki page for more information: https://wiki.mozilla.org/User_Services/Sync/Run_TPS
- See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1006675
Quick Verification Of Production Deployments
- This is a quick sanity test of the environment after each new deployment. There are other verifications that can be run as well.
- Install FxA-Auth-Server to a local host or an AWS instance (see below):
$ cd fxa-auth-server
- Run the integration tests against the remote Production server (load balancer):
$ PUBLIC_URL=<FxA Prod> npm run test-remote
- Current example:
$ PUBLIC_URL=https://api.accounts.firefox.com npm run test-remote
- NOTE: Make sure to install and test from the same branch that is deployed to Production.
Load Test Tool Client/Host
- It is always best to configure an AWS instance as the host for all load testing.
- All load tests can now run on the localhost (the AWS instance) or against the new Loads Cluster. See the following links for more information:
Installing FxA-Auth-Server and the Loads tool on Localhost or AWS
Installation:
$ git clone https://github.com/mozilla/fxa-auth-server.git
$ cd ./fxa-auth-server
Note: You may want to install a specific branch for testing instead of defaulting to master.
$ npm install
$ npm test
$ cd ./test/load
$ make build
- Note: 'npm install' may now need to be run as root.
- Note: This will install a local copy of the Loads tool for use with FxA-Auth-Server.
Running the Loads tool against FxA Stage
- The basic load test can be run as follows
$ make test SERVER_URL=https://api-accounts.stage.mozaws.net
- The full, default load test can be run as follows
$ make bench SERVER_URL=https://api-accounts.stage.mozaws.net
- Note: the current version of 'make bench' tends to use a lot of CPU and memory on the localhost. The recommendation is to use 'make test' and 'make megabench' instead (see below).
- Configuring the bench load test - config folder:
- The test.ini file (for make test) can be configured for the following:
- Number of hits
- Number of concurrent users
- The bench.ini file (for make bench) can be configured for the following:
- Number of concurrent users
- Duration of test
- For both tests, start with the defaults, then tweak the duration; users and agents are optional tweaks. The bench load test can also be configured to run in detached mode with appropriate loads detach and observer settings.
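The general shape of these files, with hypothetical values (the actual key names and defaults live in the repository's config folder, linked below):

```ini
# test.ini (make test) - hypothetical values
[loads]
hits = 10
users = 5

# bench.ini (make bench) - hypothetical values
[loads]
users = 20
duration = 300
```

Start from the checked-in defaults and adjust duration first.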
Running the Loads tool against FxA Development or Production
- This can be done if we are comparing Stage vs. some other environment and have access to the AWS logs in Dev or Production:
- Dev:
$ make test SERVER_URL=https://accounts.dev.lcip.org
$ make bench SERVER_URL=https://accounts.dev.lcip.org
- Prod:
$ make test SERVER_URL=https://api.accounts.firefox.com
$ make bench SERVER_URL=https://api.accounts.firefox.com
- The same optional configuration changes apply here.
Using the Loads V1 Services Cluster
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Testing against the Stage environment:
$ make megabench SERVER_URL=https://api-accounts.stage.mozaws.net
- Testing against the Dev environment:
$ make megabench SERVER_URL=https://api-accounts.dev.lcip.org
- Testing against the Prod environment:
$ make megabench SERVER_URL=https://api.accounts.firefox.com
- Configuring the megabench load test - config folder:
- The megabench.ini file (for make megabench) can be configured for the following:
- Number of concurrent users
- Duration of test
- Include file (leave as defined for now)
- Python dependencies (leave as defined for now)
- Broker to use for testing (leave as defined for now; this is the broker in the Loads Cluster)
- Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
- Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
- Observer (this can be email or irc - the default is irc #services-dev channel)
- REF: https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1
- REF: https://github.com/mozilla/fxa-auth-server/tree/master/loadtest/config
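A hypothetical sketch of the megabench settings described above (key names and values are illustrative; see the config folder linked above for the real file):

```ini
# megabench.ini (make megabench) - hypothetical values
[loads]
users = 20
duration = 600
agents = 5
detach = true
observer = irc
```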
Configuring The Load Tests
- Makefile
- The SERVER_URL constant can be changed.
- Config files
- For make test:
- Number of hits
- Number of concurrent users
- For make bench:
- Number of concurrent users
- Duration of test
- For make megabench:
- Number of concurrent users
- Duration of test
- Include file (this is code dependent)
- Python dependencies (this is code dependent)
- Broker to use for testing (leave as defined for now; this is the broker in the Loads Cluster)
- Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
- Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
- Observer (this can be email or irc - the default is irc #services-dev channel)
- Load Test code: loadtests.py
- The load test can be configured in the code - see the following lines:
- https://github.com/mozilla/fxa-auth-server/blob/master/test/load/loadtests.py#L17-L39
Test Coverage and Stats
- Basic tweakable values for all load tests
- users = number of concurrent users/agent
- agents = number of agents requested from the cluster (requesting more than are available errors out)
- duration = in seconds
- hits = 1 or X number of rounds/hits/iterations
- Location: fxa-auth-server/loadtest/loadtests.py
- The following items are covered in the load test
- test_auth_server is the main entry point in the loadtests.py file
- account creation
- session creation
- account deletion
- session deletion
- Integration tests
- These are designed to cover the edge/error cases that are not applicable to the load test
- The tests can be run against a remote server
Analyzing the Results
- TBD
Debugging the Issues
- There are several methods and tools for debugging the load test errors and other issues.
- 1. Important logs for FxA-Auth-Server (per server)
- /media/ephemeral0/fxa-auth-server/auth_err.log.*
- /media/ephemeral0/fxa-auth-server/auth_out.log
- /media/ephemeral0/heka/hekad_err.log
- /media/ephemeral0/heka/hekad_out.log
- /media/ephemeral0/nginx/logs/access.log
- /media/ephemeral0/nginx/logs/error.log
- Acceptable FxA-Auth-Server errors
- 503s: especially of this type - /v1/certificate/sign - are usually a sign that we are overloading the hosts.
- 400s: we should never see these in the logs, especially if the "errno" value is 105. Check the fxa-auth-server/auth_err.log.
- 400s: "errno" values of 101 and 102 are ok. These can be expected during a load test.
- ELB issues: we may see 503s and corresponding "err":"cannot enqueue work: maximum backlog exceeded (30)" messages if one or more of the hosts behind the ELB is receiving most of the load traffic. REF: https://github.com/mozilla/fxa-auth-server/issues/647
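When triaging a run, these error-rate rules can be checked mechanically. A minimal sketch, assuming JSON-style log lines with an "errno" field; the sample lines here are illustrative, not the real auth_err.log format:

```python
import re

# Hypothetical sample log lines; the real auth_err.log format may differ.
sample_log = """\
{"code":400,"errno":101,"message":"Account already exists"}
{"code":400,"errno":102,"message":"Unknown account"}
{"code":400,"errno":105,"message":"Invalid verification code"}
{"code":503,"message":"cannot enqueue work: maximum backlog exceeded (30)"}
"""

def triage(log_text):
    """Count acceptable 400s (errno 101/102), unacceptable 400s (errno 105),
    and ELB backlog 503s in a log dump."""
    acceptable = len(re.findall(r'"errno":10[12]', log_text))
    unacceptable = len(re.findall(r'"errno":105', log_text))
    backlog_503s = log_text.count("maximum backlog exceeded")
    return acceptable, unacceptable, backlog_503s
```

On the sample above, `triage(sample_log)` counts two acceptable 400s, one errno-105 400 that warrants investigation, and one backlog 503.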
Monitoring FxA Stage
- Loads dashboard:
- Cluster status
- Check directly from the Loads Cluster dashboard:
- Agents statuses
- Launch a health check on all agents
- and also on StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster
- For all other monitoring, see the following section:
Performance Testing Information
- TBD
Details on the Load Test tool
- The documentation can be found here:
- The repositories are here:
- The Services cluster is here:
Known Bugs, Issues, and Tasks
- FxA
- Bugzilla
- No specific category
- Infrastructure
- Loads Tool and clusters
Capacity Planning Stage and Production
- QA is tasked with providing some capacity requirements and constraints based on repeated load testing of the FxA-Auth-Server Stage environment.
- The goal is to be able to work with OPs to develop a realistic plan for deploying and maintaining the production environment at a level expected for projected user traffic, etc.
- Brainstorming the QA role:
- QA needs to get some realistic numbers from the Product team. This could be as simple as traffic flow (number of users per day or per segments of the day - peaks and valleys) or more detailed:
- Traffic flow - QPS/RPS
- Average number of users per time segment
- Average and peak latency
- Error percentages and thresholds
- etc
- QA gets help from OPs to learn how to measure those required numbers/values using StackDriver or other tools (or to get the data from OPsView). If those numbers cannot be measured, then we either need to:
- get a different set of data points from the Product team
- enhance the current tools to track and measure the required data
- QA does repeated, scheduled, well-defined load tests in Stage while actively monitoring the results, logs, data, etc.
- QA finds a stable configuration that - when scaled - would
- match the needs of Product when we release
- match the realistic capacity planning that OPs normally does
- Dependencies
- Realistic traffic/user numbers from the FxA Product team
- Timely training on monitoring tools from the OPs team
- Regular and realistic scaling/testing of deployments to Stage by QA given our current pre-release and post-release schedules
References
- Repository: https://github.com/mozilla/fxa-auth-server
- The QA Test Environments:
- Deploying the FxA Load Test environment for broker/agents usage:
- OPs pages for stats collection, logging, monitoring
- TBD