TestEngineering/Services/TokenServerAndSyncLoadTesting
Summary for Tokenserver, Verifier and Sync 1.5
- Latest Results
- Link to loads cluster: https://loads.services.mozilla.com/
- Note: this now requires login privileges and a password
- Snapshots from StackDriver - TBD
- Snapshots from Kibana - TBD
- Latest Deployments
- TokenServer Stage: https://bugzilla.mozilla.org/show_bug.cgi?id=1014496
- TokenServer Prod: https://bugzilla.mozilla.org/show_bug.cgi?id=1027899
- Sync Server Stage: https://bugzilla.mozilla.org/show_bug.cgi?id=1026346
- Sync Server Prod: https://bugzilla.mozilla.org/show_bug.cgi?id=1026346
- Verifier Stage Deploy: https://bugzilla.mozilla.org/show_bug.cgi?id=1026644
- Verifier Prod Deploy: https://bugzilla.mozilla.org/show_bug.cgi?id=1027392
- In Progress
- Build out of Kibana dashboards
- Ongoing testing of Tokenserver, Verifier, and Sync releases
- Bug review and issue debugging: there are many open issues to work on (see the long list near the bottom of this page)
- Bugs To Verify:
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=988095
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=1025767
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=1027444
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=1025735
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=735102
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=775395
- Planned
- Scaling for production traffic after release of Fx29
- Sync 1.5 migration work
- Operations readiness testing: See Bug 1006792
- Blockers
- none at this time
- Completed
- Pre-release load testing
- Previous load test results (short): http://loads.services.mozilla.com/
- Performance
- TBD
Quick Verification Of Stage Deployments
- This is a quick sanity test of the environment before getting started on load tests.
- TokenServer Stage environment:
Use the simple "make test" command from an install of tokenserver on the localhost or AWS instance.
$ cd loadtest
$ make test SERVER_URL=https://token.stage.mozaws.net
Alternate method:
Use the test tool from here: https://github.com/edmoz/fxa-sync-client
Install and check all collection types for a known account in Stage:
$ bin/sync-cli.js -e EMAIL -p PASSWORD --env stage -t COLLECTION
where -t is one of bookmarks, history, passwords, tabs, addons, prefs, forms
- Verifier Stage environment:
Use the simple "make test" command from an install of browserid-verifier on the localhost or AWS instance.
$ cd loadtest
$ make test SERVER_URL=https://verifier.stage.mozaws.net
- Sync Server Stage environment:
Install server-syncstorage to the local host or AWS instance (see below)
$ cd server-syncstorage
Quick test against the TokenServer:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server <Stage TokenServer>
Current example:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server https://token.stage.mozaws.net/1.0/sync/1.5
Quick tests against the Sync nodes:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py <Stage Sync Node>#<Node Secret>
Current example:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py https://sync-1-us-east-1.stage.mozaws.net#<Node Secret>
Get the Node Secret information from OPs
- Using TPS
- The TPS FxA/Sync automated tests can be used as well, but the following file will have to be edited to add Stage environment configuration parameters: https://github.com/mozilla/gecko-dev/blob/master/testing/tps/tps/testrunner.py
- See the following wiki page for more information: https://wiki.mozilla.org/User_Services/Sync/Run_TPS
- See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1006675
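The alternate TokenServer check earlier in this section runs sync-cli.js once per collection type. A minimal sketch that builds one command line per collection is shown here; the helper name sync_cli_commands is hypothetical, and the sketch only constructs argument lists from the flags quoted above rather than running anything.

```python
from typing import List, Optional

# Collection types listed above for the -t option.
COLLECTIONS = ["bookmarks", "history", "passwords", "tabs", "addons", "prefs", "forms"]


def sync_cli_commands(email: str, password: str, env: Optional[str] = "stage") -> List[List[str]]:
    """Return one sync-cli.js argument list per collection type (hypothetical helper)."""
    commands = []
    for collection in COLLECTIONS:
        argv = ["bin/sync-cli.js", "-e", email, "-p", password]
        if env is not None:
            # The Stage example above passes "--env stage"; pass env=None to omit the flag.
            argv += ["--env", env]
        argv += ["-t", collection]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in sync_cli_commands("EMAIL", "PASSWORD"):
        print(" ".join(argv))
```

Each returned list could be handed to subprocess.run from inside an fxa-sync-client checkout, which keeps the email and password out of a hand-typed shell loop.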
Quick Verification Of Production Deployments
- This is a quick sanity test of the environment after a new deployment.
- Tokenserver Production Environment
Use the test tool from here: https://github.com/edmoz/fxa-sync-client
Install and check all collection types for a known account in Production:
$ bin/sync-cli.js -e PROD-EMAIL -p PASSWORD -t COLLECTION
where -t is one of bookmarks, history, passwords, tabs, addons, prefs, forms
- Verifier Production Environment
Use the simple "make test" command from an install of browserid-verifier on the localhost or AWS instance.
$ cd loadtest
$ make test SERVER_URL=https://verifier.accounts.firefox.com
- Sync Server Production environment
TBD
Load Test Tool Client/Host
- It is always best to configure an AWS instance as the host for all load testing.
- All load tests can now run on the localhost (the AWS instance) or against the new Loads Cluster. See the following link for more information: https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#Loads_Services_Cluster_Environment
Creating a RHEL AWS instance
- Pick a Region then Create Instance > Launch Instance
- Follow the prompts to create a basic, RHEL-flavored instance
- Use one of the QA/Dev key pairs that have been set up for this:
- US East Key Pair: QA-Dev-Share (created by jbonacci) for general use
- US West Key Pair: QA-dev-share (created by RaFromBRC) for general use
- Once the instance is running, log in as "ec2-user"
- The following apps, tools, and libs will need to be installed for use with various Services applications:
- gcc, gcc-c++
- hg
- git
- python-devel
- automake, autoconf, and libtool (required for libzmq, for easy_install)
- pip
- virtualenv
- node/npm
- zeromq 3.X
- gmp, gmp-devel
- Also, general RHEL updates:
$ sudo yum -y update
and/or
$ sudo yum -y upgrade
- Now, the instance should be ready for installing and using the Loads tool.
Creating an Ubuntu AWS instance
- Pick a Region then Create Instance > Launch Instance
- Follow the prompts to create a basic, Ubuntu-flavored instance
- Use one of the QA/Dev key pairs that have been set up for this:
- US East Key Pair: QA-Dev-Share (created by jbonacci) for general use
- US West Key Pair: QA-dev-share (created by RaFromBRC) for general use
- Once the instance is running, log in as "ubuntu"
- The following apps, tools, and libs will need to be installed for use with various Services applications:
- gcc, g++
- mercurial
- git
- python-setuptools, python-virtualenv, and python-dev
- automake, autoconf, libtool
- m4
- node/npm
- libzmq and zeromq 3.X
- gmp-5.1.3 or newer
- Also, general Ubuntu updates:
$ sudo apt-get update
and/or
$ sudo apt-get upgrade
- Now, the instance should be ready for installing and using the Loads tool.
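Before installing the Loads tool, it can help to confirm that the command-line tools from the lists above are actually on the PATH. The following is a minimal sketch, not part of any Services repository: the executable names are assumptions derived from the package lists (for example, the node binary is sometimes installed as nodejs), and libraries such as gmp and zeromq are not covered because they do not provide a single executable to look for.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Executable names are assumptions based on the package lists above;
# the real binary names can differ between RHEL and Ubuntu.
REQUIRED_TOOLS = [
    "gcc", "g++", "hg", "git", "automake", "autoconf",
    "libtool", "pip", "virtualenv", "node", "npm",
]


def find_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of tools that were not found."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    missing = missing_tools(find_tools())
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```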
Installing BrowserID-Verifier and the Loads tool on the AWS instance
- Installation:
$ git clone https://github.com/mozilla/browserid-verifier
$ cd browserid-verifier
$ npm install
$ npm test
$ cd loadtest
$ make build
Note: This should hit Stage by default: SERVER_URL=https://verifier.stage.mozaws.net
- Note: This will install a local copy of the Loads tool for use with the verifier.
Running the load test against the Verifier in Stage
- Stage environment:
$ make test
or
$ make test SERVER_URL=https://verifier.stage.mozaws.net
$ make bench
or
$ make bench SERVER_URL=https://verifier.stage.mozaws.net
NOTE: The URL for the Stage environment will most likely change frequently.
NOTE: This also hits the Stage mockmyid server.
- Production environment:
$ make test SERVER_URL=https://verifier.accounts.firefox.com
$ make bench SERVER_URL=https://verifier.accounts.firefox.com
Using the Loads Services Cluster for the Verifier
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://verifier.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=TBD
- Production environment:
$ make megabench SERVER_URL=https://verifier.accounts.firefox.com
- REFs:
Installing TokenServer and the Loads tool on the AWS instance
- Installation:
$ git clone https://github.com/mozilla-services/tokenserver
$ cd tokenserver
$ make build
$ make test
Note: This is for local testing only
$ cd loadtest
$ make build
Note: This should hit Prod by default: SERVER_URL=https://token.services.mozilla.com
- Note: This will install a local copy of the Loads tool for use with TokenServer.
Running the load test against TokenServer in Stage
- Stage environment:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net
NOTE: The URL for the Stage environment will most likely change frequently.
NOTE: This also hits the Stage Verifier, which in turn hits the Stage mockmyid server.
- And while we are at it...
- Dev environment:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org
- Production environment:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com
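The TokenServer invocations above differ only in the make target and the SERVER_URL value. As a rough sketch, the mapping can be kept in one place so that a wrapper script cannot mistype a URL; the names TOKENSERVER_URLS and make_command are hypothetical, and the URLs are copied from this section (the Stage URL in particular may have changed).

```python
from typing import List

# TokenServer URLs as listed in this section; verify them before use.
TOKENSERVER_URLS = {
    "stage": "https://token.stage.mozaws.net",
    "dev": "https://token.dev.lcip.org",
    "prod": "https://token.services.mozilla.com",
}

# Make targets used in this section for local runs.
LOCAL_TARGETS = ("test", "bench")


def make_command(target: str, env: str) -> List[str]:
    """Build the argument list for a local load test run (hypothetical helper)."""
    if target not in LOCAL_TARGETS:
        raise ValueError("unknown make target: " + target)
    if env not in TOKENSERVER_URLS:
        raise ValueError("unknown environment: " + env)
    return ["make", target, "SERVER_URL=" + TOKENSERVER_URLS[env]]


if __name__ == "__main__":
    print(" ".join(make_command("test", "stage")))
```

The resulting list could be passed to subprocess.run with cwd set to the tokenserver loadtest directory.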
Using the Loads Services Cluster for TokenServer
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
- Production environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com
- REFs:
Installing Sync 1.5 and the Loads tool on the AWS instance
- Installation:
$ git clone https://github.com/mozilla-services/server-syncstorage/
$ cd server-syncstorage
$ make build
$ make test
$ cd loadtest
$ make build
- Note: This will install a local copy of the Loads tool for use with Sync 1.5.
Running the load test against Sync 1.5 in Stage
- Loads against specific Sync nodes in Stage
$ make test SERVER_URL=https://your.storagenode.here#SECRET
$ make bench SERVER_URL=https://your.storagenode.here#SECRET
Sync Stage nodes:
https://sync-1-us-east-1.stage.mozaws.net
https://sync-2-us-east-1.stage.mozaws.net
https://sync-3-us-east-1.stage.mozaws.net
NOTE: The Stage sync nodes are likely to change frequently, so verify the URLs.
See https://wiki.mozilla.org/QA/Services/FxATestEnvironments#Sync_1.5_Stage_Environment
NOTE: The OPs team has the SECRET string for Stage. Get it from them before you start testing.
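The node form of SERVER_URL above carries the secret as the URL fragment (the part after "#"). A minimal sketch of composing and splitting such a target follows; the function names and the SYNC_NODE_SECRET environment variable are hypothetical conveniences, not something the load test itself reads.

```python
import os
from typing import Tuple
from urllib.parse import urldefrag

# Stage nodes as listed above; they are likely to change, so verify them first.
STAGE_NODES = [
    "https://sync-1-us-east-1.stage.mozaws.net",
    "https://sync-2-us-east-1.stage.mozaws.net",
    "https://sync-3-us-east-1.stage.mozaws.net",
]


def node_target(node_url: str, secret: str) -> str:
    """Join a node URL and its secret into the URL#SECRET form."""
    if "#" in node_url:
        raise ValueError("node URL already contains a fragment")
    if not secret:
        raise ValueError("secret must not be empty")
    return node_url + "#" + secret


def split_target(target: str) -> Tuple[str, str]:
    """Split URL#SECRET back into (url, secret)."""
    url, secret = urldefrag(target)
    return url, secret


if __name__ == "__main__":
    # SYNC_NODE_SECRET is a hypothetical variable name; "SECRET" is a placeholder.
    secret = os.environ.get("SYNC_NODE_SECRET", "SECRET")
    for node in STAGE_NODES:
        print("make test SERVER_URL=" + node_target(node, secret))
```

When typing the command directly in a shell, quoting the whole SERVER_URL=... argument is a safe habit, since "#" can start a comment in some contexts.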
Using the Loads Services Cluster for Sync 1.5 in Stage
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://your.storagenode.here#SECRET
- REFs:
Running a combined load test against TokenServer and Sync 1.5 in Stage
- A combined loads test against TokenServer and Sync 1.5 in Stage
- This is done via the server-syncstorage directory that was cloned and built above
$ cd server-syncstorage
$ cd loadtest
$ make test SERVER_URL=https://your.tokenserver.here
$ make bench SERVER_URL=https://your.tokenserver.here
Examples for Stage:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net
See https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer_Stage_Environment
- And while we are at it...
Dev environment:
Examples:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org
Prod environment:
Examples:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com
See https://wiki.mozilla.org/QA/Services/FxATestEnvironments#FxA.2C_TokenServer.2C_and_Sync_Production_Environments
and https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer_and_Sync_1.5_Dev_Environments
Using the Loads Services Cluster for a combined load test in Stage
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
- Prod environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com
- REFs:
Configuring The Load Tests
- Makefile
- The SERVER_URL constant can be changed.
- Config files
- For make test (BrowserID-Verifier, TokenServer, Sync, Combined):
- Number of hits
- Number of concurrent users
- For make bench (BrowserID-Verifier, TokenServer, Sync, Combined):
- Number of concurrent users
- Duration of test
- For make megabench (using the Loads Cluster with BrowserID-Verifier, TokenServer, Sync, Combined):
- Number of concurrent users
- Duration of test
- Include file (this is code dependent)
- Python dependencies (this is code dependent)
- Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
- Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
- Observer (this can be email or irc - the default is irc #services-dev channel)
- SSH (the user account needed to SSH into the loads cluster - the default is ubuntu)
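To make the megabench settings above concrete, here is a hedged sketch that parses an INI-style config and checks the agent limit. The section name and key names (loads, users, duration, agents, include_file, detach, observer, ssh) are assumptions that mirror the list above; check the config files in the repository for the real names and values.

```python
import configparser
from typing import Any, Dict

MAX_AGENTS = 20  # "max is currently 20" per the list above

# Hypothetical megabench config; section and key names are assumptions.
EXAMPLE_CONFIG = """
[loads]
users = 20
duration = 1800
agents = 5
include_file = ./loadtest.py
detach = true
observer = irc
ssh = ubuntu
"""


def load_settings(text: str) -> Dict[str, Any]:
    """Parse the config text and validate the agent count."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["loads"]
    settings = {
        "users": section.getint("users"),
        "duration": section.getint("duration"),
        "agents": section.getint("agents", fallback=5),  # default is 5
        "detach": section.getboolean("detach", fallback=True),
        "observer": section.get("observer", fallback="irc"),
    }
    if not 1 <= settings["agents"] <= MAX_AGENTS:
        raise ValueError("agents must be between 1 and %d" % MAX_AGENTS)
    return settings


def total_users(settings: Dict[str, Any]) -> int:
    """Users are configured per agent, so the total is users * agents."""
    return settings["users"] * settings["agents"]


if __name__ == "__main__":
    print(total_users(load_settings(EXAMPLE_CONFIG)))
```

With 20 users per agent and 5 agents, the sketch reports 100 concurrent users in total, which is the number the target servers actually see.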
- Tokenserver load test code
- The Tokenserver load test can be configured - see the following lines:
- Basic Settings: https://github.com/mozilla-services/loop-server/blob/master/loadtests/loadtest.py
- MockMyID: https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L19-L36
- Percentages: https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L39-L51
- Verifier load test code
- The Verifier load test can be configured - see the following lines:
- Various settings: https://github.com/mozilla/browserid-verifier/blob/master/loadtest/loadtest.py#L13-L53
- Sync Server load test code
- The Sync Server load test can be configured - see the following lines:
- Setting MockMyID: https://github.com/mozilla-services/server-syncstorage/blob/master/loadtest/stress.py#L26-L45
- Setting test distributions: https://github.com/mozilla-services/server-syncstorage/blob/master/loadtest/stress.py#L48-L83
- REFs:
Test Coverage and Stats
- Basic tweakable values for all load tests
- users = number of concurrent users/agent
- agents = number of agents to use from the cluster; the run errors out if that many agents are not available
- duration = in seconds
- hits = 1 or X number of rounds/hits/iterations
- TokenServer
- File location: tokenserver/loadtest/loadtest.py
- Inside NodeAssignmentTest, test_realistic is the main load test; the others are for specific behaviors
- The test runs as follows:
95% ask for assertions on existing users (on a DB filled by test_single_token_exchange)
4% ask for an assertion on a new user
1% ask for a bad assertion
- A bug has been filed to get the following additional coverage for the load test:
- generation numbers in assertion
- client state string
- A bug has been filed to get some integration tests written:
- to cover the edge/error cases not in the load test
- to be pointed at a remote server
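The 95% / 4% / 1% split described for test_realistic is a weighted random choice per request. The sketch below illustrates that idea only; it is not the code in loadtest.py, and the scenario labels are hypothetical names for the three cases.

```python
import random
from collections import Counter

# Hypothetical labels for the three cases described above.
SCENARIOS = ["existing_user", "new_user", "bad_assertion"]
WEIGHTS = [95, 4, 1]  # percentages from the description above


def pick_scenario(rng: random.Random) -> str:
    """Pick one scenario according to the configured weights."""
    return rng.choices(SCENARIOS, weights=WEIGHTS, k=1)[0]


def simulate(n: int, seed: int = 0) -> Counter:
    """Tally the scenarios chosen over n simulated requests."""
    rng = random.Random(seed)
    return Counter(pick_scenario(rng) for _ in range(n))


if __name__ == "__main__":
    counts = simulate(10000)
    for name in SCENARIOS:
        print(name, counts[name])
```

Over a long run the tallies approach the configured percentages, which is why short runs can show an error rate somewhat above or below 1%.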
- Sync
- File location: server-syncstorage/loadtest/stress.py
- This is the Sync 2.0 load test that has been back-ported for Sync 1.5.
- The stress.py file is fully configurable for the following:
- client probability
- client distribution
- collections
- A bug has been filed to add support for load testing tabs
- The tabs collection uses memcache; we need to figure out a way to test it without overloading the server
- There are currently no constants to define how to select percentages per collection type
- Right now, we need to manually configure the collections list in stress.py:
- collections = ['bookmarks', 'forms', 'passwords', 'history', 'prefs']
- To weight a collection more heavily, add duplicate entries for it: the load test (per user/agent/hit/pass) picks randomly from the list for any given request.
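Because the collection is picked uniformly at random from the list, repeating an entry raises its share of requests. A short sketch of that arithmetic follows; the weighted list is a made-up example, and selection_share is a hypothetical helper rather than anything in stress.py.

```python
from collections import Counter
from typing import Dict, List


def selection_share(collections: List[str]) -> Dict[str, float]:
    """Fraction of requests each collection receives under a uniform random pick."""
    counts = Counter(collections)
    total = len(collections)
    return {name: count / total for name, count in counts.items()}


# The list as currently configured in stress.py, per the text above.
base = ['bookmarks', 'forms', 'passwords', 'history', 'prefs']

# Hypothetical weighting: three extra 'history' entries and one extra 'bookmarks'.
weighted = base + ['history'] * 3 + ['bookmarks']

if __name__ == "__main__":
    for name, share in sorted(selection_share(weighted).items()):
        print("%-10s %.1f%%" % (name, share * 100))
```

With the base list each collection gets an equal fifth of the requests; in the weighted example, history accounts for four of the nine entries.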
Analyzing the Results
- There are several methods and tools for analyzing the load test results.
- 1. Using the Loads Services Cluster dashboard
- All loads tests using this cluster generate a live report and a run report available on this site:
- You can quickly review the following here: Status, Configuration, Results, Custom Metrics, and Errors.
- Tokenserver Custom Metrics
- addFailure
- Verifier Custom Metrics
- addFailure
- Sync Custom Metrics
- addFailure
- NOTE: If you want more details on the dashboard, please file an issue here: https://github.com/mozilla-services/loads
Debugging the Issues
- There are several methods and tools for debugging the load test errors and other issues.
- 1. Important logs for TokenServer (per server)
- /media/ephemeral0/logs/
- /media/ephemeral0/nginx/logs/default.access.log
- /media/ephemeral0/nginx/logs/default.error.log
- /media/ephemeral0/nginx/logs/tokenserver.access.log
- /media/ephemeral0/nginx/logs/tokenserver.error.log
- /media/ephemeral0/logs/tokenserver/token.error.log
- /media/ephemeral0/logs/tokenserver/token.log.*
- /media/ephemeral0/logs/tokenserver/process_account_deletions.error.log
- /media/ephemeral0/logs/tokenserver/process_account_deletions.log
- /media/ephemeral0/logs/tokenserver/purge_old_records.log
- /media/ephemeral0/logs/tokenserver/purge_old_records.error.log
- /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
- /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
- /var/log/circus.log
- /var/log/hekad/tokenserver.stdout.log
- /var/log/hekad/tokenserver.stderr.log
- 2. Important logs for Verifier (per server)
- /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
- /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
- /media/ephemeral0/nginx/logs/fxa-browserid-verifier.access.log
- /media/ephemeral0/nginx/logs/fxa-browserid-verifier.access.log
- /media/ephemeral0/nginx/logs/default.access.log (not in use)
- /media/ephemeral0/nginx/logs/default.error.log (not in use)
- /media/ephemeral0/squid/access.log
- /var/log/circus.log
- /var/log/hekad/fxa-browserid_verifier.stderr.log
- /var/log/hekad/fxa-browserid_verifier.stdout.log
- 3. Important error logs for Sync (per Sync node)
- /media/ephemeral0/logs/
- /media/ephemeral0/nginx/access.log
- /media/ephemeral0/error.log
- /media/ephemeral0/sync/sync.err
- /media/ephemeral0/sync/sync.log
- Acceptable TokenServer errors:
1% - 2% failures of the following kinds:
token.log:
"name": "token.assertion.invalid_signature_error"
"name": "token.assertion.verify_failure"
nginx access.log:
401s
NOTE: Values can be tweaked here: https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L58-L60
The following errors may also be "acceptable" if TS Stage is larger than Verifier Stage:
/media/ephemeral0/logs/tokenserver/token.error.log
Verifier-related errors of these types:
"HttpConnectionPool is full, discarding connection: verifier.stage.mozaws.net"
"Resetting dropped connection: verifier.stage.mozaws.net"
"Starting new HTTPS connection (179): verifier.stage.mozaws.net"
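One way to compare a run against the 1% - 2% figure is to tally the two expected failure names in token.log. The sketch below assumes each log line is a JSON object with a "name" field, which is an inference from the snippets quoted above and may not match the real log format; the total request count is supplied by the caller (for example, from the nginx access log).

```python
import json
from collections import Counter
from typing import Iterable

# Failure names quoted above as acceptable during a load test.
EXPECTED_FAILURES = (
    "token.assertion.invalid_signature_error",
    "token.assertion.verify_failure",
)


def count_event_names(lines: Iterable[str]) -> Counter:
    """Count records by their "name" field, skipping lines that are not JSON objects."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:
            continue  # assumption: non-JSON lines can be ignored
        if isinstance(record, dict) and "name" in record:
            counts[record["name"]] += 1
    return counts


def expected_failure_rate(counts: Counter, total_requests: int) -> float:
    """Share of requests that ended in one of the expected failure events."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    failures = sum(counts[name] for name in EXPECTED_FAILURES)
    return failures / total_requests
```

A rate well above 0.02 would suggest failures beyond the deliberately bad assertions the load test sends.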
- Acceptable Verifier errors:
In the verifier and squid logs:
References to mozilla.org and login.mozilla.org, which are part of the "invalid domain" tests
In the verifier logs:
References to https://secret.mozilla.com, which is defined in the browserid-verifier load test; see https://github.com/mozilla/browserid-verifier/blob/master/loadtest/loadtest.py#L77 for example
Depending on the number of users and agents, there will be 503s in the nginx and app logs.
For a single-instance deploy to Stage, a good starting configuration is 20 users, 1 agent.
See also https://github.com/mozilla/browserid-verifier/issues/58
- Acceptable Sync node errors:
In the nginx access.log files:
We will see some percentage of 404s.
Right now we see 14% 404s (compared to the total count of 200s) with the config set up as follows:
users = 20
duration = 1800
agents = 5
Ideally, the overall percentage of 404s should drop the longer the load test runs.
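The 404-to-200 comparison above can be computed directly from an access log. This is a minimal sketch that assumes nginx's default "combined" log format, where the status code follows the quoted request line; a custom log_format on the Sync nodes would need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumes the "combined" format: ... "GET /path HTTP/1.1" 200 512 ...
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def count_statuses(lines: Iterable[str]) -> Counter:
    """Count HTTP status codes found in access log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def ratio_404_to_200(counts: Counter) -> Optional[float]:
    """Return the number of 404s divided by the number of 200s, or None if there are no 200s."""
    if counts["200"] == 0:
        return None
    return counts["404"] / counts["200"]


if __name__ == "__main__":
    import sys

    # Usage: python status_ratio.py < access.log
    ratio = ratio_404_to_200(count_statuses(sys.stdin))
    print("no 200s found" if ratio is None else "404/200 = %.1f%%" % (ratio * 100))
```

Running it at intervals during a long test gives a quick check on whether the 404 share is falling over time.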
Monitoring TS and Sync Stage
- Loads dashboard:
- Cluster status
- Check directly from the Loads Cluster dashboard:
Agents statuses
Launch a health check on all agents
- and also on StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster
- Monitoring TS/Verifier/Sync Stage:
- Stackdriver
- Stage TS + FxA + Sync 1.5 meta-dash: https://app.stackdriver.com/groups/4388/stage-services-tag-sync15
- Kibana
- https://kibana.shared.us-east-1.stage.mozaws.net/
- https://kibana.shared.us-east-1.stage.mozaws.net/#/dashboard/file/weblogs.json
- https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Sync%20Web%20Logs
- https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Token%20App%20Logs%20POC
- https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/file/sync_http_status.json
- https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Sync%20Nginx%20Errors
- https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Sync%20App%20Logs
- https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/file/sync_mysql_slow_queries.json
- Heka
- https://heka.shared.us-east-1.stage.mozaws.net/
- https://heka.shared.us-east-1.stage.mozaws.net/#health
- https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes
- https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-SlowQueries/outputs/Sync-1_5-SlowQueries.Statistics.cbuf
- https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-HTTPStatus/outputs/Sync-1_5-HTTPStatus.HTTPStatus.cbuf
- https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-ResponseTime/outputs/Sync-1_5-ResponseTime.storagehistory.cbuf
- https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-ResponseTime/outputs/Sync-1_5-ResponseTime.storagebookmarks.cbuf
- https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-ResponseTime/outputs/Sync-1_5-ResponseTime.storageforms.cbuf
Performance Testing Information
- TBD
Details on the Load Test tool
- The documentation can be found here:
- The repositories are here:
- The Services cluster is here:
Known Bugs, Issues, and Tasks
- Tokenserver, Verifier, and Sync
- Tokenserver:
- BrowserID-Verifier:
- Repo: https://github.com/mozilla/browserid-verifier/issues
- Bugzilla: no specific category
- Sync:
- Repo: https://github.com/mozilla-services/server-syncstorage/issues
- Bugzilla: http://mzl.la/VUrYQ5
- OPs and Infrastructure
- Loads Tool and Cluster
References
- Other URLs
- Repositories
- Documentation
- The QA Test Environments:
- Deploying the FxA Load Test environment for broker/agents usage:
- Sync 1.5 protocol, documentation, etc.
- https://github.com/mozilla-services/docs
- https://docs.services.mozilla.com/#how-to
- https://docs.services.mozilla.com/howtos/run-fxa.html
- https://docs.services.mozilla.com/token/apis.html
- https://docs.services.mozilla.com/storage/apis-1.5.html
- https://docs.services.mozilla.com/howtos/run-sync-1.5.html
- https://github.com/mozilla-services/syncserver
- OPs pages for stats collection, logging, monitoring
- TBD