TestEngineering/Services/TokenServerAndSyncLoadTesting
Summary for Tokenserver, Verifier and Sync 1.5
- Latest Results
- Link to loads cluster: https://loads.services.mozilla.com/
- Note: this now requires login privileges and a password
- Snapshots from StackDriver - TBD
- Snapshots from Kibana - TBD
- Latest Deployments
- TokenServer Stage: https://bugzilla.mozilla.org/show_bug.cgi?id=1014496
- TokenServer Prod: TBD
- Sync Server Stage: https://bugzilla.mozilla.org/show_bug.cgi?id=1026346
- Sync Server Prod: https://bugzilla.mozilla.org/show_bug.cgi?id=1026346
- Verifier Stage Deploy: https://bugzilla.mozilla.org/show_bug.cgi?id=1026644
- Verifier Prod Deploy: https://bugzilla.mozilla.org/show_bug.cgi?id=1027392
- In Progress
- Build out of Kibana dashboards
- Ongoing testing of Tokenserver, Verifier, and Sync releases
- Bug review and issue debugging: there are many open issues to work on (see the list of known bugs near the bottom of this page)
- Bugs To Verify:
- None at this time
- Planned
- Scaling for production traffic after release of Fx29
- Sync 1.5 migration work
- Operations readiness testing: See Bug 1006792
- Blockers
- none at this time
- Completed
- Pre-release load testing
- Previous load test results (short): http://loads.services.mozilla.com/
- Performance
- TBD
Quick Verification Of Stage Deployments
- This is a quick sanity test of the environment before getting started on load tests.
- TokenServer Stage environment: TBD
For now, just use the simple "make test" or "make bench" command from an install of tokenserver on the localhost or AWS instance.
- Verifier Stage environment: TBD
For now, just use the simple "make test" or "make bench" command from an install of browserid-verifier on the localhost or AWS instance.
- Sync Server Stage environment:
Install server-syncstorage on the localhost or AWS instance (see below)
$ cd server-syncstorage
Quick test against the TokenServer:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server <Stage TokenServer>
Current example:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server https://token.stage.mozaws.net/1.0/sync/1.5
Quick tests against the Sync nodes:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py <Stage Sync Node>#<Node Secret>
Current examples:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py https://sync-1-us-east-1.stage.mozaws.net#<Node Secret>
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py https://sync-2-us-east-1.stage.mozaws.net#<Node Secret>
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py https://sync-3-us-east-1.stage.mozaws.net#<Node Secret>
Get the Node Secret information from OPs.
- Using TPS
- The TPS FxA/Sync automated tests can be used as well, but the following file will have to be edited to add Stage environment configuration parameters: https://github.com/mozilla/gecko-dev/blob/master/testing/tps/tps/testrunner.py
- See the following wiki page for more information: https://wiki.mozilla.org/User_Services/Sync/Run_TPS
- See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1006675
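The quick Sync node tests above pass the node and its secret as a single `<Stage Sync Node>#<Node Secret>` argument, so the secret travels in the URL fragment. The following is a minimal sketch of how such a value can be split and rebuilt with the Python standard library. The helper names are hypothetical, and this is not necessarily how test_storage.py parses the argument internally.

```python
from urllib.parse import urldefrag


def split_node_url(server_url):
    """Split 'https://node#secret' into (node_url, secret).

    Hypothetical helper for illustration only.
    """
    node_url, secret = urldefrag(server_url)
    if not secret:
        raise ValueError("no '#<Node Secret>' part found in %r" % server_url)
    return node_url, secret


def join_node_url(node_url, secret):
    """Build the '<node>#<secret>' form expected on the command line."""
    return "%s#%s" % (node_url.rstrip("/"), secret)


if __name__ == "__main__":
    # 'not-a-real-secret' is a placeholder; get the real value from OPs.
    url = join_node_url("https://sync-1-us-east-1.stage.mozaws.net", "not-a-real-secret")
    print(split_node_url(url))
```

Splitting the value this way makes it easy to log the node URL without printing the secret next to it.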
Load Test Tool Client/Host
- It is always best to configure an AWS instance as the host for all load testing.
- All load tests can now run on the localhost (the AWS instance) or against the new Loads Cluster. See the following link for more information: https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#Loads_Services_Cluster_Environment
Creating a RHEL AWS instance
- Pick a Region then Create Instance > Launch Instance
- Follow the prompts to create a basic, RHEL-flavored instance
- Use one of the QA/Dev key pairs that have been set up for this:
- US East Key Pair: QA-Dev-Share (created by jbonacci) for general use
- US West Key Pair: QA-dev-share (created by RaFromBRC) for general use
- Once the instance is running, log in as "ec2-user"
- The following apps, tools, and libs will need to be installed for use with various Services applications:
- gcc, gcc-c++
- hg
- git
- python-devel
- automake, autoconf, and libtool (required for libzmq, for easy_install)
- pip
- virtualenv
- node/npm
- zeromq 3.X
- gmp, gmp-devel
- Also, apply general RHEL updates:
$ sudo yum -y update
and/or
$ sudo yum -y upgrade
- Now, the instance should be ready for installing and using the Loads tool.
Creating an Ubuntu AWS instance
- Pick a Region then Create Instance > Launch Instance
- Follow the prompts to create a basic, Ubuntu-flavored instance
- Use one of the QA/Dev key pairs that have been set up for this:
- US East Key Pair: QA-Dev-Share (created by jbonacci) for general use
- US West Key Pair: QA-dev-share (created by RaFromBRC) for general use
- Once the instance is running, log in as "ubuntu"
- The following apps, tools, and libs will need to be installed for use with various Services applications:
- gcc, g++
- mercurial
- git
- python-setuptools, python-virtualenv, and python-dev
- automake, autoconf, libtool
- m4
- node/npm
- libzmq and zeromq 3.X
- gmp-5.1.3 or newer
- Also, apply general Ubuntu updates:
$ sudo apt-get update
and/or
$ sudo apt-get upgrade
- Now, the instance should be ready for installing and using the Loads tool.
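Before moving on to the installs, it can help to confirm that the tools listed above are actually on the PATH of the new instance. The following is a minimal sketch. The executable names are assumptions (package names such as python-devel or gmp-devel do not map to a binary and are not checked), and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names for the packages listed above; adjust per distro.
REQUIRED_TOOLS = ["gcc", "g++", "git", "hg", "make", "node", "npm", "virtualenv"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All listed tools were found on the PATH.")
```

The `which` parameter exists only so the lookup can be replaced in a test; in normal use the default `shutil.which` is enough.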
Installing BrowserID-Verifier and the Loads tool on the AWS instance
- Installation:
$ git clone https://github.com/mozilla/browserid-verifier
$ cd browserid-verifier
$ npm install
$ npm test
$ cd loadtest
$ make build
Note: This should hit Stage by default: SERVER_URL=https://verifier.stage.mozaws.net
- Note: This will install a local copy of the Loads tool for use with the verifier.
Running the load test against the Verifier in Stage
- Stage environment:
$ make test
or
$ make test SERVER_URL=https://verifier.stage.mozaws.net
$ make bench
or
$ make bench SERVER_URL=https://verifier.stage.mozaws.net
NOTE: The URL for the Stage environment is likely to change frequently.
NOTE: This also hits the Stage mockmyid server.
- And while we are at it...
- Dev environment:
$ make test SERVER_URL=TBD
$ make bench SERVER_URL=TBD
- Production environment:
$ make test SERVER_URL=https://verifier.accounts.firefox.com
$ make bench SERVER_URL=https://verifier.accounts.firefox.com
Using the Loads Services Cluster for the Verifier
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://verifier.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=TBD
- Production environment:
$ make megabench SERVER_URL=https://verifier.accounts.firefox.com
- REFs:
Installing TokenServer and the Loads tool on the AWS instance
- Installation:
$ git clone https://github.com/mozilla-services/tokenserver
$ cd tokenserver
$ make build
$ make test
Note: This is for local testing only
$ cd loadtest
$ make build
Note: This should hit Prod by default: SERVER_URL=https://token.services.mozilla.com
- Note: This will install a local copy of the Loads tool for use with TokenServer.
Running the load test against TokenServer in Stage
- Stage environment:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net
NOTE: The URL for the Stage environment is likely to change frequently.
NOTE: This also hits the Stage Verifier, which in turn hits the Stage mockmyid server.
- And while we are at it...
- Dev environment:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org
- Production environment:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com
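Since the Makefile defaults to Prod and the same targets are run against several environments, a small wrapper can reduce the chance of pointing a bench run at the wrong server. The following is a minimal sketch: the URLs are the ones listed above, while `make_command` and its `allow_prod` guard are hypothetical and not part of the tokenserver repository.

```python
# Environment URLs as listed on this page; verify them before use,
# since the Stage URL in particular changes often.
TOKENSERVER_URLS = {
    "stage": "https://token.stage.mozaws.net",
    "dev": "https://token.dev.lcip.org",
    "prod": "https://token.services.mozilla.com",
}

ALLOWED_TARGETS = ("test", "bench", "megabench")


def make_command(target, env, allow_prod=False):
    """Build the argument list for 'make <target> SERVER_URL=<url>'."""
    if target not in ALLOWED_TARGETS:
        raise ValueError("unknown make target: %r" % target)
    if env not in TOKENSERVER_URLS:
        raise ValueError("unknown environment: %r" % env)
    if env == "prod" and not allow_prod:
        # Design choice of this sketch: Prod must be requested explicitly.
        raise ValueError("refusing to target prod without allow_prod=True")
    return ["make", target, "SERVER_URL=" + TOKENSERVER_URLS[env]]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(..., cwd="loadtest")
    # to actually execute it from the tokenserver checkout.
    print(" ".join(make_command("bench", "stage")))
```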
Using the Loads Services Cluster for TokenServer
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
- Production environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com
- REFs:
Installing Sync 1.5 and the Loads tool on the AWS instance
- Installation:
$ git clone https://github.com/mozilla-services/server-syncstorage/
$ cd server-syncstorage
$ make build
$ make test
$ cd loadtest
$ make build
- Note: This will install a local copy of the Loads tool for use with Sync 1.5.
Running the load test against Sync 1.5 in Stage
- Loads against specific Sync nodes in Stage
$ make test SERVER_URL=https://your.storagenode.here#SECRET
$ make bench SERVER_URL=https://your.storagenode.here#SECRET
Sync Stage nodes:
https://sync-1-us-east-1.stage.mozaws.net
https://sync-2-us-east-1.stage.mozaws.net
https://sync-3-us-east-1.stage.mozaws.net
NOTE: The Stage sync nodes are likely to change frequently, so verify the URLs. See https://wiki.mozilla.org/QA/Services/FxATestEnvironments#Sync_1.5_Stage_Environment
NOTE: The OPs team has the SECRET string for Stage. Get it from them before you start testing.
Using the Loads Services Cluster for Sync 1.5 in Stage
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://your.storagenode.here#SECRET
- REFs:
Running a combined load test against TokenServer and Sync 1.5 in Stage
- A combined load test against TokenServer and Sync 1.5 in Stage
- This is done via the server-syncstorage directory that was cloned and built above
$ cd server-syncstorage
$ cd loadtest
$ make test SERVER_URL=https://your.tokenserver.here
$ make bench SERVER_URL=https://your.tokenserver.here
Examples for Stage:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net
See https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer_Stage_Environment
- And while we are at it...
Dev environment:
Examples:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org
Prod environment:
Examples:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com
See https://wiki.mozilla.org/QA/Services/FxATestEnvironments#FxA.2C_TokenServer.2C_and_Sync_Production_Environments
and https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer_and_Sync_1.5_Dev_Environments
Using the Loads Services Cluster for a combined load test in Stage
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
- Prod environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com
- REFs:
Configuring The Load Tests
- The TokenServer, Sync, and Combined load tests work with config files that can be edited to change how the load tests are run.
- For make test (TokenServer, Sync, Combined):
- Number of hits
- Number of concurrent users
- For make bench (TokenServer, Sync, Combined):
- Number of concurrent users
- Duration of test
- For make megabench (using the Loads Cluster with TokenServer, Sync, Combined):
- Number of concurrent users
- Duration of test
- Include file (this is code dependent)
- Python dependencies (this is code dependent)
- Broker to use for testing (leave as defined for now; this is the broker in the Loads Cluster)
- Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
- Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
- Observer (this can be email or irc - the default is irc #services-dev channel)
- REF: https://github.com/mozilla-services/tokenserver/tree/master/loadtest/config
- and https://github.com/mozilla-services/server-syncstorage/tree/master/loadtest/config
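To make the effect of these settings concrete, the sketch below reads a small INI fragment and reports the total load it implies. The `[loads]` section name and the key names are assumptions for illustration (check the real files in the loadtest/config directories linked above), and `summarize` is a hypothetical helper. The 20-agent ceiling comes from the list above, and the sample values match the Sync configuration quoted later on this page.

```python
import configparser

# Illustrative values only; section and key names are assumed, not copied
# from the repositories.
SAMPLE_CONFIG = """
[loads]
users = 20
duration = 1800
agents = 5
"""

MAX_AGENTS = 20  # current cluster maximum, per the list above


def summarize(config_text):
    """Return the total concurrent users and run length implied by a config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["loads"]
    users = section.getint("users")
    duration = section.getint("duration")
    agents = section.getint("agents", fallback=1)
    if agents > MAX_AGENTS:
        raise ValueError("agents=%d exceeds the cluster maximum of %d" % (agents, MAX_AGENTS))
    return {
        "users_per_agent": users,
        "agents": agents,
        "total_users": users * agents,  # users is counted per agent
        "duration_minutes": duration / 60,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_CONFIG))
```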
Test Coverage and Stats
- Basic tweakable values for all load tests
- users = number of concurrent users/agent
- agents = number of agents requested from the cluster; the run errors out if that many agents are not available
- duration = in seconds
- hits = 1 or X number of rounds/hits/iterations
- TokenServer
- File location: tokenserver/loadtest/loadtest.py
- Inside NodeAssignmentTest, test_realistic is the main load test; the others are for specific behaviors
- The test runs as follows:
95% ask for assertions on existing users (on a DB filled by test_single_token_exchange)
4% ask for an assertion on a new user
1% ask for a bad assertion
- A bug has been filed to get the following additional coverage for the load test:
- generation numbers in assertion
- client state string
- A bug has been filed to get some integration tests written:
- to cover the edge/error cases not in the load test
- to be pointed at a remote server
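The 95% / 4% / 1% split described for test_realistic can be pictured as a weighted choice made once per hit. The following is a minimal sketch of that idea, not the code in tokenserver/loadtest/loadtest.py. The scenario names and both functions are hypothetical.

```python
import random

# (scenario, weight in percent), matching the split described above.
SCENARIO_WEIGHTS = [
    ("existing_user", 95),
    ("new_user", 4),
    ("bad_assertion", 1),
]


def pick_scenario(roll):
    """Map an integer roll in [0, 100) to a scenario name."""
    if not 0 <= roll < 100:
        raise ValueError("roll must be in [0, 100)")
    upper = 0
    for name, weight in SCENARIO_WEIGHTS:
        upper += weight
        if roll < upper:
            return name
    raise AssertionError("weights must sum to 100")


def simulate(hits, seed=0):
    """Count how often each scenario is chosen over a number of hits."""
    rng = random.Random(seed)
    counts = {name: 0 for name, _ in SCENARIO_WEIGHTS}
    for _ in range(hits):
        counts[pick_scenario(rng.randrange(100))] += 1
    return counts


if __name__ == "__main__":
    print(simulate(10000))
```

Over a long run the bad-assertion share should sit near 1%, which is what makes the 1% - 2% rate of 401s acceptable during TokenServer load tests.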
- Sync
- File location: server-syncstorage/loadtest/stress.py
- This is the Sync 2.0 load test that has been back-ported for Sync 1.5.
- The stress.py file is fully configurable for the following:
- client probability
- client distribution
- collections
- A bug has been filed to add support for load testing tabs
- The tabs collection uses memcache; we need to figure out a way to test it without overloading the server
- There are currently no constants to define how to select percentages per collection type
- Right now, we need to manually configure the collections list in stress.py:
- collections = ['bookmarks', 'forms', 'passwords', 'history', 'prefs']
- To weight a collection type more heavily, add more entries of that type to the list: the load test (per user/agent/hit/pass) picks randomly from the list for each request.
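Because the load test picks uniformly from the collections list, repeating an entry is the only way to weight it. The sketch below shows the share each collection would receive for a given list. The extra entries in `weighted_collections` are an example choice, and `selection_share` is a hypothetical helper rather than part of stress.py.

```python
import random
from collections import Counter

# The list as it currently appears in stress.py, per the text above.
base_collections = ['bookmarks', 'forms', 'passwords', 'history', 'prefs']

# Example weighting: history appears 3 times, bookmarks twice.
weighted_collections = base_collections + ['history', 'history', 'bookmarks']


def selection_share(names):
    """Return the probability of each name under a uniform random pick."""
    counts = Counter(names)
    total = len(names)
    return {name: counts[name] / total for name in counts}


if __name__ == "__main__":
    for name, share in sorted(selection_share(weighted_collections).items()):
        print("%-10s %.1f%%" % (name, 100 * share))
    # A single request would then choose its collection like this:
    print(random.choice(weighted_collections))
```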
Analyzing the Results
- There are several methods and tools for analyzing the load test results.
- 1. Using the Loads Services Cluster dashboard
- All load tests using this cluster generate a live report and a run report available on this site:
- You can quickly review the following here: Status, Configuration, Results, Custom Metrics, and Errors.
- If you want more details on the dashboard, please file an issue here: https://github.com/mozilla-services/loads
Debugging the Issues
- There are several methods and tools for debugging the load test errors and other issues.
- 1. Important logs for TokenServer (per server)
- /media/ephemeral0/logs/
- /media/ephemeral0/logs/nginx/access.log
- /media/ephemeral0/logs/nginx/error.log
- /media/ephemeral0/logs/tokenserver/token.error.log
- /media/ephemeral0/logs/tokenserver/token.log.*
- /media/ephemeral0/logs/tokenserver/process_account_deletions.error.log
- /media/ephemeral0/logs/tokenserver/process_account_deletions.log
- /media/ephemeral0/squid/access.log
- /var/log/hekad/tokenserver.stdout.log
- /var/log/hekad/tokenserver.stderr.log
- 2. Important logs for Verifier (per server)
- /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
- /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
- GONE: /media/ephemeral0/heka/hekad_err.log
- GONE: /media/ephemeral0/heka/hekad_out.log
- GONE: /media/ephemeral0/nginx/logs/access.log
- GONE: /media/ephemeral0/nginx/logs/error.log
- /media/ephemeral0/nginx/logs/fxa-browserid-verifier.access.log
- /media/ephemeral0/nginx/logs/fxa-browserid-verifier.access.log
- /media/ephemeral0/nginx/logs/squid/access.log
- /var/log/hekad/fxa-browserid_verifier.stderr.log
- /var/log/hekad/fxa-browserid_verifier.stdout.log
- 3. Important error logs for Sync (per Sync node)
- /media/ephemeral0/logs/
- /media/ephemeral0/nginx/access.log
- /media/ephemeral0/error.log
- /media/ephemeral0/sync/sync.err
- /media/ephemeral0/sync/sync.log
- Acceptable TokenServer errors:
1% - 2% failures, such as the following:
token.log:
"name": "token.assertion.invalid_signature_error"
"name": "token.assertion.verify_failure"
nginx access.log:
401s
NOTE: Values can be tweaked here: https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L58-L60
Also, the following errors may be "acceptable" if TS Stage is larger than Verifier Stage:
/media/ephemeral0/logs/tokenserver/token.error.log
Verifier-related errors of these types:
"HttpConnectionPool is full, discarding connection: verifier.stage.mozaws.net"
"Resetting dropped connection: verifier.stage.mozaws.net"
"Starting new HTTPS connection (179): verifier.stage.mozaws.net"
- Acceptable Verifier errors:
In the verifier and squid logs:
References to mozilla.org and login.mozilla.org, which are part of the "invalid domain" tests
In the verifier logs:
References to https://secret.mozilla.com, which are defined in the browserid-verifier load test
See https://github.com/mozilla/browserid-verifier/blob/master/loadtest/loadtest.py#L77 for example
- Acceptable Sync node errors:
In the nginx access.log files:
We will see some percentage of 404s.
Right now we see the following:
14% 404s (compared to the total count of 200s)
with the config set up as follows:
users = 20
duration = 1800
agents = 5
Ideally, the overall percentage of 404s should drop the longer the load test runs.
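The 404 figure above is the number of 404 responses expressed as a percentage of 200 responses. The following is a minimal sketch for computing it from access log lines. It assumes the nginx "combined" log format, where the status code follows the quoted request line; the sample lines are invented for illustration and are not real Stage traffic, and `ratio_404_to_200` is a hypothetical helper.

```python
import re
from collections import Counter

# In the assumed "combined" format the status code comes right after the
# closing quote of the request line: ... "GET /path HTTP/1.1" 200 512 ...
STATUS_RE = re.compile(r'"\s(\d{3})\s')

# Invented sample lines for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [10/Jun/2014:12:00:0%d +0000] "GET /1.5/1/storage/%s HTTP/1.1" %d 512 "-" "loads"'
    % (i, name, status)
    for i, (name, status) in enumerate([
        ("bookmarks", 200), ("forms", 200), ("history", 404),
        ("prefs", 200), ("passwords", 200), ("history", 401),
    ])
]


def ratio_404_to_200(lines):
    """Return 404s as a percentage of 200s, or None if there are no 200s."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    if counts["200"] == 0:
        return None
    return 100.0 * counts["404"] / counts["200"]


if __name__ == "__main__":
    print(ratio_404_to_200(SAMPLE_LINES))
```

To run it against a real node, pass an open file object for the access.log instead of `SAMPLE_LINES`, and adjust `STATUS_RE` if the node uses a custom log format.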
Monitoring TS and Sync Stage
- Loads dashboard:
- Cluster status
- Check directly from the Loads Cluster dashboard:
Agents statuses
Launch a health check on all agents
- and also on StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster
- Monitoring TS/Verifier/Sync Stage:
- Stackdriver
- Stage TS + FxA + Sync 1.5 meta-dash: https://app.stackdriver.com/groups/4388/stage-services-tag-sync15
- Kibana
- https://kibana.shared.us-east-1.stage.mozaws.net/
- https://kibana.shared.us-east-1.stage.mozaws.net/#/dashboard/file/weblogs.json
- https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Sync%20Web%20Logs
- https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Token%20App%20Logs%20POC
- Heka
- Stackdriver
Performance Testing Information
- TBD
Details on the Load Test tool
- The documentation can be found here:
- The repositories are here:
- The Services cluster is here:
Known Bugs, Issues, and Tasks
- Tokenserver, Verifier, and Sync
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=988095
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=982412
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=982415
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=985794
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=996870
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=956615
- Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=1001485
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=735102
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=776777
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=799727
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=982417
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=989117
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=992420
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=959034
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=996819
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=1007987
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=1016722
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=1008802
- Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=1025735
- OPs and Infrastructure
- OPs: https://bugzilla.mozilla.org/show_bug.cgi?id=981933
- OPs: https://bugzilla.mozilla.org/show_bug.cgi?id=982316
- OPs: https://bugzilla.mozilla.org/show_bug.cgi?id=982985
- OPs: https://bugzilla.mozilla.org/show_bug.cgi?id=993033
- OPs: https://bugzilla.mozilla.org/show_bug.cgi?id=996199
- OPs: https://bugzilla.mozilla.org/show_bug.cgi?id=998050
- OPs: https://bugzilla.mozilla.org/show_bug.cgi?id=1009389
- OPs: https://bugzilla.mozilla.org/show_bug.cgi?id=1006031
- https://github.com/mozilla-services/puppet-config/issues/287
- https://github.com/mozilla-services/puppet-config/issues/292
- https://github.com/mozilla-services/puppet-config/issues/443
- https://github.com/mozilla-services/puppet-config/pull/546
- https://github.com/mozilla-services/puppet-config/pull/613
- Loads Tool and Cluster
- https://github.com/mozilla-services/loads/issues/222
- https://github.com/mozilla-services/loads/issues/234
- https://github.com/mozilla-services/loads/issues/235
- https://github.com/mozilla-services/loads/issues/251
- https://github.com/mozilla-services/loads/issues/257
- https://github.com/mozilla-services/loads/issues/259
- https://github.com/mozilla-services/loads/issues/265
- https://github.com/mozilla-services/loads/issues/266
- https://github.com/mozilla-services/loads-web/issues/24
References
- Other URLs
- Repositories
- Documentation
- The QA Test Environments:
- Deploying the FxA Load Test environment for broker/agents usage:
- Sync 1.5 protocol, documentation, etc.
- https://github.com/mozilla-services/docs
- https://docs.services.mozilla.com/#how-to
- https://docs.services.mozilla.com/howtos/run-fxa.html
- https://docs.services.mozilla.com/token/apis.html
- https://docs.services.mozilla.com/storage/apis-1.5.html
- https://docs.services.mozilla.com/howtos/run-sync-1.5.html
- https://github.com/mozilla-services/syncserver
- OPs pages for stats collection, logging, monitoring
- TBD