TestEngineering/Services/TokenServerAndSyncLoadTesting: Difference between revisions

From MozillaWiki
(13 intermediate revisions by 3 users not shown)
Line 3: Line 3:
** A Tokenserver+Verifier stack: See the TokenServer sections below...
** A Tokenserver+Verifier stack: See the TokenServer sections below...


== Summary for Tokenserver, Verifier, and Sync 1.5 ==
* Latest Results
** Link to loads cluster: https://loads.services.mozilla.com/
*** Note: this now requires login privileges and a password
** Snapshots from StackDriver - TBD
** Snapshots from Kibana - TBD
* Latest Deployments
** TokenServer Stage: https://bugzilla.mozilla.org/show_bug.cgi?id=1014496
** TokenServer Prod: https://bugzilla.mozilla.org/show_bug.cgi?id=1027899
** Sync Server Stage: https://bugzilla.mozilla.org/show_bug.cgi?id=1026346
** Sync Server Prod: https://bugzilla.mozilla.org/show_bug.cgi?id=1026346
** Verifier Stage Deploy: https://bugzilla.mozilla.org/show_bug.cgi?id=1026644
** Verifier Prod Deploy: https://bugzilla.mozilla.org/show_bug.cgi?id=1027392
* In Progress
** Build out of Kibana dashboards
** Ongoing testing of Tokenserver, Verifier, and Sync releases
** Bug review and issue debug - there are a lot of issues to work on (see the long list near the bottom of the wiki)
* Bugs To Verify:
** Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=988095
** Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=1025767
** Tokenserver: https://bugzilla.mozilla.org/show_bug.cgi?id=1027444
** Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=1025735
** Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=735102
** Sync: https://bugzilla.mozilla.org/show_bug.cgi?id=775395
* Planned
** Scaling for production traffic after release of Fx29
** Sync 1.5 migration work
** Operations readiness testing: See Bug 1006792
* Blockers
** none at this time
* Completed
** Pre-release load testing
** Previous load test results (short): http://loads.services.mozilla.com/
* Performance
** TBD


== Quick Verification Of Stage Deployments ==
== Quick Verification Of Stage Deployments ==
Line 127: Line 91:
  $ git clone git://github.com/mozilla/browserid-verifier
  $ git clone git://github.com/mozilla/browserid-verifier
  $ cd browserid-verifier
  $ cd browserid-verifier
Note: You may want to install a specific branch for testing vs defaulting to Master
  $ npm install
  $ npm install
  $ npm test
  $ npm test
Line 172: Line 137:
  $ git clone https://github.com/mozilla-services/tokenserver
  $ git clone https://github.com/mozilla-services/tokenserver
  $ cd tokenserver
  $ cd tokenserver
Note: You may want to install a specific branch for testing vs defaulting to Master
  $ make build
  $ make build
  $ make test
  $ make test
Line 215: Line 181:
** https://github.com/mozilla-services/tokenserver/tree/master/loadtest
** https://github.com/mozilla-services/tokenserver/tree/master/loadtest


== Installing Sync 1.5 and the Loads tool on Localhost or AWS ==
== Installing Sync and load testing on Localhost or AWS ==
  Installation:
  Installation:
  $ git clone https://github.com/mozilla-services/server-syncstorage/
  $ git clone https://github.com/mozilla-services/syncstorage-loadtest/
  $ cd server-syncstorage
  $ cd syncstorage-loadtest
$ make build
  Note: You may want to install a specific branch for testing vs defaulting to Master
$ make test
$ pip install -r requirements.txt
$ cd loadtest
  $ make build
 
* Note: This will install a local copy of the Loads tool for use with Sync 1.5.


== Running the load test against Sync 1.5 in Stage ==
== Running the load test against Sync 1.5 in Stage ==
* Loads against specific Sync nodes in Stage
* Loads against specific Sync nodes in Stage
  $ make test SERVER_URL=https://your.storagenode.here#SECRET
  $ export SERVER_URL=https://your.storagenode.here#SECRET
$ make bench SERVER_URL=https://your.storagenode.here#SECRET
  Sync Stage nodes:
  Sync Stage nodes:
     https://sync-1-us-east-1.stage.mozaws.net
     https://sync-1-us-east-1.stage.mozaws.net
     https://sync-2-us-east-1.stage.mozaws.net
     https://sync-2-us-east-1.stage.mozaws.net
    https://sync-3-us-east-1.stage.mozaws.net
     ...etc...
     ...etc...
Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.   
The recommendation is to use 'make test' and 'make megabench' instead (see below)...
   
   
  NOTE: The OPs team has the SECRET string for Stage. Get it from them before you start testing.
  NOTE: The OPs team has the SECRET string for Stage. Get it from them before you start testing.
* Load testing with Molotov: https://molotov.readthedocs.io/en/stable/
$ bin/molotov [commands] loadtest.py


== Using the Loads V1 Services Cluster for Sync 1.5 in Stage ==
== Using the Loads V1 Services Cluster for Sync 1.5 in Stage ==
* By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
* loadtesting from server-syncstorage has been deprecated, please refer to mozilla-services/syncstorage-loadtest
* Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
* Stage environment:
$ make megabench SERVER_URL=https://your.storagenode.here#SECRET
 
* REFs:
** https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#Loads_Services_Cluster_Environment
** https://github.com/mozilla-services/server-syncstorage/tree/master/loadtest
** https://github.com/mozilla-services/server-syncstorage/tree/master/loadtest


Line 262: Line 216:
  $ make test SERVER_URL=https://token.stage.mozaws.net
  $ make test SERVER_URL=https://token.stage.mozaws.net
  $ make bench SERVER_URL=https://token.stage.mozaws.net
  $ make bench SERVER_URL=https://token.stage.mozaws.net
  See https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer_Stage_Environment
  See https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer.2BVerifier_Stage_Environment
   
   
  Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.     
  Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.     
Line 295: Line 249:


* REFs:
* REFs:
** https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#Loads_Services_Cluster_Environment
** https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1
** https://github.com/mozilla-services/server-syncstorage/tree/master/loadtest
** https://github.com/mozilla-services/server-syncstorage/tree/master/loadtest


Line 455: Line 409:
     File \"/data/tokenserver/local/lib/python2.6/site-packages/gunicorn/workers/async.py\"...
     File \"/data/tokenserver/local/lib/python2.6/site-packages/gunicorn/workers/async.py\"...
     ..."Connection pool is full, discarding connection: 127.0.0.1", "...
     ..."Connection pool is full, discarding connection: 127.0.0.1", "...
Also, any 499s are probably an artifact of the current (V1) load test.
  REF:
  REF:
  https://bugzilla.mozilla.org/show_bug.cgi?id=1040396
  https://bugzilla.mozilla.org/show_bug.cgi?id=1040396
Line 472: Line 427:
  result: 'failure',\n  reason: 'algorithms do not match'
  result: 'failure',\n  reason: 'algorithms do not match'
  result: 'failure',\n  reason: 'audience mismatch: scheme mismatch'
  result: 'failure',\n  reason: 'audience mismatch: scheme mismatch'
Also, any 499s in the nginx logs are probably an artifact of the current (V1) load test.


* Acceptable Sync node errors:
* Acceptable Sync node errors:
Line 484: Line 440:
  Usually, you will not see 304s, 400s, 412s, or 415s for a load test,
  Usually, you will not see 304s, 400s, 412s, or 415s for a load test,
  although they may show up in the logs after running the remote integration tests.
  although they may show up in the logs after running the remote integration tests.
Also, any 499s are probably an artifact of the current (V1) load test.
   
   
  In /var/log/hekad/sync_1_5.stderr.log
  In /var/log/hekad/sync_1_5.stderr.log
Line 505: Line 462:
* and also on StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster
* and also on StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster


* Monitoring TS/Verifier/Sync Stage:
* For all other monitoring, see the following section:
** Stackdriver
** https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#Monitoring_the_Stage_Environment
*** Stage TS + FxA + Sync 1.5 meta-dash: https://app.stackdriver.com/groups/4388/stage-services-tag-sync15
*** Per instance: https://app.stackdriver.com/instances/<INSTANCE>
** Kibana
*** https://kibana.shared.us-east-1.stage.mozaws.net/
*** https://kibana.shared.us-east-1.stage.mozaws.net/#/dashboard/file/weblogs.json
*** https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Sync%20Web%20Logs
*** https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Token%20App%20Logs%20POC
*** https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/file/sync_http_status.json
*** https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Sync%20Nginx%20Errors
*** https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/elasticsearch/Sync%20App%20Logs
*** https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/file/sync_mysql_slow_queries.json
 
** Heka
*** https://heka.shared.us-east-1.stage.mozaws.net/
*** https://heka.shared.us-east-1.stage.mozaws.net/#health
*** https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes
*** https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-SlowQueries/outputs/Sync-1_5-SlowQueries.Statistics.cbuf
*** https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-HTTPStatus/outputs/Sync-1_5-HTTPStatus.HTTPStatus.cbuf
*** https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-ResponseTime/outputs/Sync-1_5-ResponseTime.storagehistory.cbuf
*** https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-ResponseTime/outputs/Sync-1_5-ResponseTime.storagebookmarks.cbuf
*** https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/Sync-1_5-ResponseTime/outputs/Sync-1_5-ResponseTime.storageforms.cbuf
 
** REF: https://mana.mozilla.org/wiki/display/SVCOPS/Sync+1.5+Dash+Boards


== Performance Testing Information ==
== Performance Testing Information ==
Line 545: Line 479:


== Known Bugs, Issues, and Tasks ==
== Known Bugs, Issues, and Tasks ==
* Tokenserver, Verifier, and Sync
* Tokenserver:  
** Meta: https://bugzilla.mozilla.org/show_bug.cgi?id=907479
** Repo: https://github.com/mozilla-services/tokenserver/issues  
** Meta: https://bugzilla.mozilla.org/show_bug.cgi?id=1008066
** Bugzilla: http://mzl.la/1s4qZKn
** Meta: https://bugzilla.mozilla.org/show_bug.cgi?id=1014411
 
** Tokenserver:
*** Repo: https://github.com/mozilla-services/tokenserver/issues  
*** Bugzilla: http://mzl.la/1s4qZKn


** BrowserID-Verifier:
* BrowserID-Verifier:
** Repo: https://github.com/mozilla/browserid-verifier/issues
** Repo: https://github.com/mozilla/browserid-verifier/issues
** Bugzilla: no specific category
** Bugzilla: no specific category


** Sync:
* Sync:
** Repo: https://github.com/mozilla-services/server-syncstorage/issues
** Repo: https://github.com/mozilla-services/server-syncstorage/issues
** Bugzilla: http://mzl.la/VUrYQ5
** Bugzilla: http://mzl.la/VUrYQ5

Latest revision as of 14:32, 1 March 2019

  • NOTE: We currently have two Verifier stacks in Stage (and probably Production):
    • The standalone Browser_ID Verifier stack: See the Verifier sections below...
    • A Tokenserver+Verifier stack: See the TokenServer sections below...


Quick Verification Of Stage Deployments

  • This is a quick sanity test of the environment before getting started on load tests.
  • TokenServer+Verifier Stage environment:
From the browser: https://token.stage.mozaws.net
curl https://token.stage.mozaws.net
curl -I https://token.stage.mozaws.net

Use the simple "make test" command from an install of tokenserver on the localhost or AWS instance.
cd loadtest
make test SERVER_URL=https://token.stage.mozaws.net

Alternate method:
Use the test tool from here: https://github.com/edmoz/fxa-sync-client
Install and check all collection types for a known account in Stage:
bin/sync-cli.js -e EMAIL -p PASSWORD --env stage -t COLLECTION
    where -t is one of bookmarks,history,passwords,tabs,addons,prefs,forms
  • Verifier Stage environment:
In the browser: https://verifier.stage.mozaws.net/
curl https://verifier.stage.mozaws.net
curl -I https://verifier.stage.mozaws.net

Use the simple "make test" command from an install of browserid-verifier on the localhost or AWS instance.
cd loadtest
make test SERVER_URL=https://verifier.stage.mozaws.net
  • Sync Server Stage environment:
Install server-syncstorage to the local host or AWS instance (see below)
$ cd server-syncstorage
Quick test against the TokenServer
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server <Stage TokenServer>
Current example:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server \
    https://token.stage.mozaws.net/1.0/sync/1.5
Quick tests against the Sync nodes
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py <Stage Sync Node>#<Node Secret>
Current examples:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py \
    https://sync-1-us-east-1.stage.mozaws.net#<Node Secret>
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py \
    https://sync-2-us-east-1.stage.mozaws.net#<Node Secret>
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py \
    https://sync-3-us-east-1.stage.mozaws.net#<Node Secret>
Get the Node Secret information from OPs
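
The curl checks above for TokenServer and the Verifier can also be scripted. The following is a minimal sketch, not part of any of the repositories mentioned here: the function names fetch_status and check_endpoints are hypothetical, and only the two Stage URLs come from this page.

```python
import urllib.error
import urllib.request

# Stage endpoints named in this section.
STAGE_URLS = [
    "https://token.stage.mozaws.net",
    "https://verifier.stage.mozaws.net",
]


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request (similar to `curl -I`)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx reply still shows the server answered; report the code.
        return err.code


def check_endpoints(urls, fetch=fetch_status):
    """Map each URL to its status code, or to an error string if unreachable."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as err:  # URLError and socket timeouts derive from OSError
            results[url] = "unreachable: %s" % err
    return results
```

Calling check_endpoints(STAGE_URLS) returns one entry per endpoint. The fetch argument exists only so the network call can be swapped out, for example when trying the helper offline.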

Quick Verification Of Production Deployments

  • This is a quick sanity test of the environment after a new deployment.
  • Tokenserver+Verifier Production Environment
In the browser: https://token.services.mozilla.com
curl https://token.services.mozilla.com
curl -I https://token.services.mozilla.com
Then:
Use the test tool from here: https://github.com/edmoz/fxa-sync-client
Install and check all collection types for a known account in Production:
bin/sync-cli.js -e PROD-EMAIL -p PASSWORD -t COLLECTION
    where -t is one of bookmarks,history,passwords,tabs,addons,prefs,forms
  • Verifier Production Environment
In the browser: https://verifier.accounts.firefox.com
curl https://verifier.accounts.firefox.com
curl -I https://verifier.accounts.firefox.com
Then:
Use the simple "make test" command from an install of browserid-verifier on the localhost or AWS instance.
cd loadtest
make test SERVER_URL=https://verifier.accounts.firefox.com
  • Sync Server Production environment
Sign in with a known FxA account and sync data with a current Production account (sync node).
Create a new FxA account and set up sync.

Load Test Tool Client/Host

Installing BrowserID-Verifier and the Loads tool on Localhost or AWS

  • Installation:
$ git clone https://github.com/mozilla/browserid-verifier
$ cd browserid-verifier
Note: You may want to install a specific branch for testing vs defaulting to Master
$ npm install
$ npm test
$ cd loadtest
$ make build
     Note: This should hit Stage by default: SERVER_URL=https://verifier.stage.mozaws.net
  • Note: This will install a local copy of the Loads tool for use with the verifier.

Running the load test against the Verifier in Stage

  • Stage environment:
$ make test
or
$ make test SERVER_URL=https://verifier.stage.mozaws.net
$ make bench
or
$ make bench SERVER_URL=https://verifier.stage.mozaws.net	

Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.    
The recommendation is to use 'make test' and 'make megabench' instead (see below)...
Note: The Stage Verifier hits the Stage mockmyid server
  • Production environment:
$ make test SERVER_URL=https://verifier.accounts.firefox.com
$ make bench SERVER_URL=https://verifier.accounts.firefox.com

Using the Loads V1 Services Cluster for the Verifier

  • By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
  • Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
  • Stage environment:
$ make megabench SERVER_URL=https://verifier.stage.mozaws.net
  • Dev environment:
$ make megabench SERVER_URL=TBD
  • Production environment:
$ make megabench SERVER_URL=https://verifier.accounts.firefox.com

Installing TokenServer+Verifier and the Loads tool on Localhost or AWS

  • Installation:
$ git clone https://github.com/mozilla-services/tokenserver
$ cd tokenserver
Note: You may want to install a specific branch for testing vs defaulting to Master
$ make build
$ make test
    Note: This is for local testing only
$ cd loadtest
$ make build
    Note: This should hit Prod by default: SERVER_URL=https://token.services.mozilla.com
  • Note: This will install a local copy of the Loads tool for use with TokenServer+Verifier.

Running the load test against TokenServer+Verifier in Stage

  • Stage environment:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net		

Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.    
The recommendation is to use 'make test' and 'make megabench' instead (see below)...
Note: This also hits the Stage Verifier, which in turn hits the Stage mockmyid server
  • And while we are at it...
  • Dev environment:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org
  • Production environment:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com

Using the Loads V1 Services Cluster for TokenServer+Verifier

  • By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
  • Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
  • Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
  • Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
  • Production environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com

Installing Sync and load testing on Localhost or AWS

Installation:
$ git clone https://github.com/mozilla-services/syncstorage-loadtest/
$ cd syncstorage-loadtest
Note: You may want to install a specific branch for testing vs defaulting to Master
$ pip install -r requirements.txt

Running the load test against Sync 1.5 in Stage

  • Loads against specific Sync nodes in Stage
$ export SERVER_URL=https://your.storagenode.here#SECRET
Sync Stage nodes:
    https://sync-1-us-east-1.stage.mozaws.net
    https://sync-2-us-east-1.stage.mozaws.net
    ...etc...

NOTE: The OPs team has the SECRET string for Stage. Get it from them before you start testing.
$ bin/molotov [commands] loadtest.py
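
The SERVER_URL value above carries the node secret after the '#'. As a rough illustration of that format, the sketch below splits such a value into the node URL and the secret. The helper name split_server_url is hypothetical, and the load test's own parsing may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def split_server_url(server_url):
    """Split 'https://node#SECRET' into (node_url, secret).

    The secret is whatever follows '#'; it is None when no '#' part is given.
    """
    parts = urlsplit(server_url)
    secret = parts.fragment or None
    # Rebuild the URL without the fragment so it can be used as the endpoint.
    node_url = urlunsplit(parts._replace(fragment=""))
    return node_url, secret
```

For example, split_server_url("https://sync-1-us-east-1.stage.mozaws.net#SECRET") yields the node URL and the placeholder string "SECRET" separately.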

Using the Loads V1 Services Cluster for Sync 1.5 in Stage

Running a combined load test against TokenServer+Verifier and Sync 1.5 in Stage

  • A combined loads test against TokenServer and Sync 1.5 in Stage
  • This is done via the server-syncstorage directory that was cloned and built above
$ cd server-syncstorage
$ cd loadtest
$ make test SERVER_URL=https://your.tokenserver.here
$ make bench SERVER_URL=https://your.tokenserver.here

Examples for Stage:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net
See https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer.2BVerifier_Stage_Environment

Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.    
The recommendation is to use 'make test' and 'make megabench' instead (see below)...
Note: The Stage Tokenserver hits the Stage Verifier, which, in turn, hits the mockmyid server.
  • And while we are at it...
Dev environment:
Examples:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org

Prod environment:
Examples:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com

See https://wiki.mozilla.org/QA/Services/FxATestEnvironments#FxA.2C_TokenServer.2C_and_Sync_Production_Environments
and https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer_and_Sync_1.5_Dev_Environments

Using the Loads V1 Services Cluster for a combined load test in Stage

  • By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
  • Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
  • Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
  • Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
  • Prod environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com

Configuring The Load Tests

  • Makefile
    • The SERVER_URL constant can be changed.
  • Config files
    • For make test (BrowserID-Verifier, TokenServer, Sync, Combined):
      • Number of hits
      • Number of concurrent users
    • For make bench (BrowserID-Verifier, TokenServer, Sync, Combined):
      • Number of concurrent users
      • Duration of test
    • For make megabench (using the LoadsCluster with BrowserID-Verifier, TokenServer, Sync, Combined):
      • Number of concurrent users
      • Duration of test
      • Include file (this is code dependent)
      • Python dependencies (this is code dependent)
      • Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
      • Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
      • Observer (this can be email or irc - the default is irc #services-dev channel)
      • SSH (the user account needed to SSH into the loads cluster - the default is ubuntu)
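
To make the interaction between these settings concrete, here is a small sketch that sanity-checks a megabench-style configuration before a run. It assumes that "concurrent users" is counted per agent, so the total simulated load is users × agents. The LoadSettings class and its field names are hypothetical and do not reflect the actual config file keys.

```python
from dataclasses import dataclass

MAX_AGENTS = 20  # current cluster maximum quoted in the list above


@dataclass
class LoadSettings:
    users: int     # concurrent users per agent (assumption, see lead-in)
    agents: int    # agents requested from the cluster
    duration: int  # test duration in seconds

    def validate(self):
        """Raise ValueError for values the cluster could not honour."""
        if not 1 <= self.agents <= MAX_AGENTS:
            raise ValueError("agents must be between 1 and %d" % MAX_AGENTS)
        if self.users < 1 or self.duration < 1:
            raise ValueError("users and duration must be positive")

    @property
    def total_users(self):
        """Total concurrent users across all agents."""
        return self.users * self.agents
```

With users = 20 and agents = 5, total_users is 100. Whether 20 agents are actually available also depends on how many other load tests are running at the time.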

Test Coverage and Stats

  • Basic tweakable values for all load tests
    • users = number of concurrent users/agent
    • agents = number of agents to use from the cluster; the run errors out if that many are not available
    • duration = in seconds
    • hits = 1 or X number of rounds/hits/iterations
  • TokenServer
    • File location: tokenserver/loadtest/loadtest.py
    • Inside NodeAssignmentTest, test_realistic is the main load test; the others are for specific behaviors
    • The test runs as follows:
95% ask for assertions on existing users (on a DB filled by test_single_token_exchange)
4% ask for assertion on a new user
1% ask for a bad assertion
    • A bug has been filed to get the following additional coverage for the load test:
      • generation numbers in assertion
      • client state string
    • A bug has been filed to get some integration tests written:
      • to cover the edge/error cases not in the load test
      • to be pointed at a remote server
  • Sync
    • File location: server-syncstorage/loadtest/stress.py
    • This is the Sync 2.0 load test that has been back-ported for Sync 1.5.
    • The stress.py file is fully configurable for the following:
      • client probability
      • client distribution
      • collections
    • A bug has been filed to add support for load testing tabs
      • The tabs collection uses memcache; we need to figure out a way to test it without overloading the server
    • There are currently no constants to define how to select percentages per collection type
    • Right now, we need to manually configure the collections list in stress.py:
      • collections = ['bookmarks', 'forms', 'passwords', 'history', 'prefs']
      • Basically, you can add more entries of each type, since the load test (per user/agent/hit/pass) picks randomly from the list for any given request, so a collection listed more often is requested more often.
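
The effect of repeating entries in the collections list can be sketched as follows. This is an illustration only, assuming the load test makes a uniform random pick from the list as described above. The helper names and the extra 'history' entries are hypothetical and are not taken from stress.py.

```python
import random
from collections import Counter

# Repeating an entry makes it proportionally more likely to be picked:
# 'history' appears 3 times out of 7 entries here.
collections = ['bookmarks', 'forms', 'passwords', 'history', 'prefs',
               'history', 'history']


def pick_collection(rng=random):
    """Pick one collection uniformly at random from the (weighted) list."""
    return rng.choice(collections)


def expected_shares(names):
    """Return the fraction of requests each collection should receive."""
    counts = Counter(names)
    total = len(names)
    return {name: count / total for name, count in counts.items()}
```

Here expected_shares(collections) gives 'history' a share of 3/7 and each of the other four collections 1/7, which is the closest thing to per-collection percentages until dedicated constants exist.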

Analyzing the Results

  • There are several methods and tools for analyzing the load test results.
  • Tokenserver Custom Metrics
    • addFailure
  • Verifier Custom Metrics
    • addFailure
  • Sync Custom Metrics
    • addFailure

Debugging the Issues

  • There are several methods and tools for debugging the load test errors and other issues.
  • 1. Important logs for TokenServer (per server)
    • /media/ephemeral0/logs/
    • /media/ephemeral0/nginx/logs/default.access.log
    • /media/ephemeral0/nginx/logs/default.error.log
    • /media/ephemeral0/nginx/logs/tokenserver.access.log
    • /media/ephemeral0/nginx/logs/tokenserver.error.log
    • /media/ephemeral0/logs/tokenserver/token.error.log
    • /media/ephemeral0/logs/tokenserver/token.log.*
    • /media/ephemeral0/logs/tokenserver/process_account_deletions.error.log
    • /media/ephemeral0/logs/tokenserver/process_account_deletions.log
    • /media/ephemeral0/logs/tokenserver/purge_old_records.log
    • /media/ephemeral0/logs/tokenserver/purge_old_records.error.log
    • /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
    • /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
    • /var/log/circus.log
    • /var/log/hekad/tokenserver.stdout.log
    • /var/log/hekad/tokenserver.stderr.log
  • 2. Important logs for Verifier (per server)
    • /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
    • /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
    • /media/ephemeral0/nginx/logs/fxa-browserid-verifier.access.log
    • /media/ephemeral0/nginx/logs/default.access.log (not in use)
    • /media/ephemeral0/nginx/logs/default.error.log (not in use)
    • /media/ephemeral0/squid/access.log
    • /var/log/circus.log
    • /var/log/hekad/fxa-browserid_verifier.stderr.log
    • /var/log/hekad/fxa-browserid_verifier.stdout.log
  • 3. Important error logs for Sync (per Sync node)
    • /media/ephemeral0/logs/
    • /media/ephemeral0/nginx/access.log
    • /media/ephemeral0/error.log
    • /media/ephemeral0/sync/sync.err
    • /media/ephemeral0/sync/sync.log


  • Acceptable TokenServer errors:
1% - 2% failures of the following types:
token.log:
"name": "token.assertion.invalid_signature_error"
"name": "token.assertion.verify_failure"
nginx access.log:
401s
NOTE: Values can be tweaked here:
    https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L58-L60

The following types of errors are known:
/media/ephemeral0/logs/tokenserver/token.error.log
    Exception KeyError: KeyError(49564400,) in <module 'threading'...
/media/ephemeral0/logs/tokenserver/token.log
    ..."Starting new HTTP connection (9): 127.0.0.1", "hostname": ...
    {"error": "StopIteration()", "traceback": "Uncaught exception:\n  
    File \"/data/tokenserver/local/lib/python2.6/site-packages/gunicorn/workers/async.py\"...
    ..."Connection pool is full, discarding connection: 127.0.0.1", "...
Also, any 499s are probably an artifact of the current (V1) load test.
REF:
https://bugzilla.mozilla.org/show_bug.cgi?id=1040396
https://bugzilla.mozilla.org/show_bug.cgi?id=1040397

OLD: Also, it may be the case that the following errors are "acceptable" if TS Stage is larger than Verifier Stage:
/media/ephemeral0/logs/tokenserver/token.error.log
Verifier-related errors of these types:
"HttpConnectionPool is full, discarding connection: verifier.stage.mozaws.net"
"Resetting dropped connection: verifier.stage.mozaws.net"
"Starting new HTTPS connection (179): verifier.stage.mozaws.net"
  • Acceptable Verifier errors:
The verifier_out.log will show errors of the following types:
result: 'failure',\n  reason: 'untrusted issuer...'
result: 'failure',\n  reason: 'expired'
result: 'failure',\n  reason: 'algorithms do not match'
result: 'failure',\n  reason: 'audience mismatch: scheme mismatch'
Also, any 499s in the nginx logs are probably an artifact of the current (V1) load test.
  • Acceptable Sync node errors:
In the nginx access.log files:
We will see some percentage of 404s. Right now we see the following:
    14% 404s (compared to the total count of 200s)
    with the config set up as follows:
         users = 20
         duration = 1800
         agents = 5
Ideally, the overall percentage of 404s should drop the longer the load test runs.
Usually, you will not see 304s, 400s, 412s, or 415s for a load test,
although they may show up in the logs after running the remote integration tests.
Also, any 499s are probably an artifact of the current (V1) load test.
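
The 404 percentage quoted above (404s compared to the total count of 200s) can be computed from an access log with a short script. The sketch below assumes the nginx "combined" log format, where the status code follows the quoted request line. The Stage nginx configuration may use a different format, and the function names are hypothetical.

```python
import re
from collections import Counter

# In the "combined" format the status code follows the quoted request line,
# e.g. ... "GET /path HTTP/1.1" 404 0 "-" "agent"
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def count_statuses(lines):
    """Count HTTP status codes (as strings) over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def ratio_404_to_200(counts):
    """Return 404s divided by 200s, or None when there are no 200s."""
    if counts["200"] == 0:
        return None
    return counts["404"] / counts["200"]
```

Typical use is to open the access.log, pass the file object to count_statuses, and multiply the ratio by 100 to compare against the 14% figure. The same counter also shows whether any 304s, 400s, 412s, 415s, or 499s appeared.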

In /var/log/hekad/sync_1_5.stderr.log
You may see some Decoder 'Sync-1_5-SlowQuery-MySqlSlowQueryDecoder' error: Failed parsing
and a lot of BSO INSERTs

In /media/ephemeral0/logs/sync/sync.err
You should see expected skew and QueuePool messages and Deprecation warnings
Also, these are known:
Exception SystemExit
Exception KeyError
This is probably https://bugzilla.mozilla.org/show_bug.cgi?id=1040397

Monitoring TS and Sync Stage

Agents statuses
Launch a health check on all agents

Performance Testing Information

  • TBD

Details on the Load Test tool

Known Bugs, Issues, and Tasks

References