TestEngineering/Services/TokenServerAndSyncLoadTesting

From MozillaWiki
  • NOTE: We currently have two Verifier stacks in Stage (and probably Production):
    • The standalone Browser_ID Verifier stack: See the Verifier sections below...
    • A Tokenserver+Verifier stack: See the TokenServer sections below...


Quick Verification Of Stage Deployments

  • This is a quick sanity test of the environment before getting started on load tests.
  • TokenServer+Verifier Stage environment:
From the browser: https://token.stage.mozaws.net
curl https://token.stage.mozaws.net
curl -I https://token.stage.mozaws.net

Use the simple "make test" command from an install of tokenserver on the localhost or AWS instance.
cd loadtest
make test SERVER_URL=https://token.stage.mozaws.net

Alternate method:
Use the test tool from here: https://github.com/edmoz/fxa-sync-client
Install and check all collection types for a known account in Stage:
bin/sync-cli.js -e EMAIL -p PASSWORD --env stage -t COLLECTION
    where -t is one of bookmarks,history,passwords,tabs,addons,prefs,forms
  • Verifier Stage environment:
In the browser: https://verifier.stage.mozaws.net/
curl https://verifier.stage.mozaws.net
curl -I https://verifier.stage.mozaws.net

Use the simple "make test" command from an install of browserid-verifier on the localhost or AWS instance.
cd loadtest
make test SERVER_URL=https://verifier.stage.mozaws.net
  • Sync Server Stage environment:
Install server-syncstorage to the local host or AWS instance (see below)
$ cd server-syncstorage
Quick test against the TokenServer
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server <Stage TokenServer>
Current example:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server \
    https://token.stage.mozaws.net/1.0/sync/1.5
Quick tests against the Sync nodes
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py <Stage Sync Node>#<Node Secret>
Current examples:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py \
    https://sync-1-us-east-1.stage.mozaws.net#<Node Secret>
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py \
    https://sync-2-us-east-1.stage.mozaws.net#<Node Secret>
Get the Node Secret information from Ops.
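The curl checks above can also be scripted. The following is a minimal sketch, not part of any of the repositories mentioned here: the function names are hypothetical, and it only covers the `curl -I` style status check for the TokenServer and Verifier Stage URLs listed above.

```python
import urllib.error
import urllib.request

# Stage endpoints taken from the curl checks above.
STAGE_URLS = [
    "https://token.stage.mozaws.net",
    "https://verifier.stage.mozaws.net",
]


def head_status(url, timeout=10.0):
    """Return the HTTP status code of a HEAD request (similar to `curl -I`)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report their code instead.
        return err.code


def check_endpoints(urls, fetch=head_status):
    """Map each URL to the status code returned by the fetch function."""
    return {url: fetch(url) for url in urls}
```

Calling `check_endpoints(STAGE_URLS)` performs the real requests; passing a different `fetch` function lets the same loop be exercised without network access.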

Quick Verification Of Production Deployments

  • This is a quick sanity test of the environment after a new deployment.
  • Tokenserver+Verifier Production Environment
In the browser: https://token.services.mozilla.com
curl https://token.services.mozilla.com
curl -I https://token.services.mozilla.com
Then:
Use the test tool from here: https://github.com/edmoz/fxa-sync-client
Install and check all collection types for a known account in Production:
bin/sync-cli.js -e PROD-EMAIL -p PASSWORD -t COLLECTION
    where -t is one of bookmarks,history,passwords,tabs,addons,prefs,forms
  • Verifier Production Environment
In the browser: https://verifier.accounts.firefox.com
curl https://verifier.accounts.firefox.com
curl -I https://verifier.accounts.firefox.com
Then:
Use the simple "make test" command from an install of browserid-verifier on the localhost or AWS instance.
cd loadtest
make test SERVER_URL=https://verifier.accounts.firefox.com
  • Sync Server Production environment
Sign in with a known FxA account and sync data with a current Production account (sync node).
Create a new FxA account and set up sync.

Load Test Tool Client/Host

Installing BrowserID-Verifier and the Loads tool on Localhost or AWS

  • Installation:
$ git clone https://github.com/mozilla/browserid-verifier
$ cd browserid-verifier
Note: You may want to install a specific branch for testing vs defaulting to Master
$ npm install
$ npm test
$ cd loadtest
$ make build
     Note: This should hit Stage by default: SERVER_URL=https://verifier.stage.mozaws.net
  • Note: This will install a local copy of the Loads tool for use with the verifier.

Running the load test against the Verifier in Stage

  • Stage environment:
$ make test
or
$ make test SERVER_URL=https://verifier.stage.mozaws.net
$ make bench
or
$ make bench SERVER_URL=https://verifier.stage.mozaws.net	

Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.    
The recommendation is to use 'make test' and 'make megabench' instead (see below)...
Note: The Stage Verifier hits the Stage mockmyid server
  • Production environment:
$ make test SERVER_URL=https://verifier.accounts.firefox.com
$ make bench SERVER_URL=https://verifier.accounts.firefox.com

Using the Loads V1 Services Cluster for the Verifier

  • By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
  • Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
  • Stage environment:
$ make megabench SERVER_URL=https://verifier.stage.mozaws.net
  • Dev environment:
$ make megabench SERVER_URL=TBD
  • Production environment:
$ make megabench SERVER_URL=https://verifier.accounts.firefox.com

Installing TokenServer+Verifier and the Loads tool on Localhost or AWS

  • Installation:
$ git clone https://github.com/mozilla-services/tokenserver
$ cd tokenserver
Note: You may want to install a specific branch for testing vs defaulting to Master
$ make build
$ make test
    Note: This is for local testing only
$ cd loadtest
$ make build
    Note: This should hit Prod by default: SERVER_URL=https://token.services.mozilla.com
  • Note: This will install a local copy of the Loads tool for use with TokenServer+Verifier.

Running the load test against TokenServer+Verifier in Stage

  • Stage environment:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net		

Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.    
The recommendation is to use 'make test' and 'make megabench' instead (see below)...
Note: This also hits the Stage Verifier, which in turn hits the Stage mockmyid server
  • And while we are at it...
  • Dev environment:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org
  • Production environment:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com

Using the Loads V1 Services Cluster for TokenServer+Verifier

  • By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
  • Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
  • Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
  • Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
  • Production environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com

Installing Sync and load testing on Localhost or AWS

Installation:
$ git clone https://github.com/mozilla-services/syncstorage-loadtest/
$ cd syncstorage-loadtest
Note: You may want to install a specific branch for testing vs defaulting to Master
$ pip install -r requirements.txt

Running the load test against Sync 1.5 in Stage

  • Loads against specific Sync nodes in Stage
$ export SERVER_URL=https://your.storagenode.here#SECRET
Sync Stage nodes:
    https://sync-1-us-east-1.stage.mozaws.net
    https://sync-2-us-east-1.stage.mozaws.net
    ...etc...

NOTE: The Ops team has the SECRET string for Stage. Get it from them before you start testing.
$ bin/molotov [commands] loadtest.py
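The SERVER_URL value above packs the node URL and the node secret into one string, separated by `#`. As a small illustration of that format (this is a sketch with a hypothetical helper name, and not necessarily how loadtest.py parses the value), the two parts can be separated like this:

```python
from urllib.parse import urldefrag


def split_node_url(server_url):
    """Split 'https://node.example#SECRET' into (node_url, secret)."""
    node_url, secret = urldefrag(server_url)
    if not secret:
        raise ValueError("SERVER_URL has no #SECRET part: %r" % server_url)
    return node_url, secret
```

A value without the `#SECRET` part raises a ValueError, which is a quick way to catch a missing secret before starting a run.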

Using the Loads V1 Services Cluster for Sync 1.5 in Stage

Running a combined load test against TokenServer+Verifier and Sync 1.5 in Stage

  • A combined loads test against TokenServer and Sync 1.5 in Stage
  • This is done via the server-syncstorage directory that was cloned and built above
$ cd server-syncstorage
$ cd loadtest
$ make test SERVER_URL=https://your.tokenserver.here
$ make bench SERVER_URL=https://your.tokenserver.here

Examples for Stage:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net
See https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer.2BVerifier_Stage_Environment

Note: the current version of 'make bench' tends to use a lot of CPU and Memory on the localhost.    
The recommendation is to use 'make test' and 'make megabench' instead (see below)...
Note: The Stage Tokenserver hits the Stage Verifier, which, in turn, hits the mockmyid server.
  • And while we are at it...
Dev environment:
Examples:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org

Prod environment:
Examples:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com

See https://wiki.mozilla.org/QA/Services/FxATestEnvironments#FxA.2C_TokenServer.2C_and_Sync_Production_Environments
and https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer_and_Sync_1.5_Dev_Environments

Using the Loads V1 Services Cluster for a combined load test in Stage

  • By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
  • Changes were made to Makefile and the load test to use the cluster and some associated config files (for test, bench, megabench).
  • Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
  • Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
  • Prod environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com

Configuring The Load Tests

  • Makefile
    • The SERVER_URL constant can be changed.
  • Config files
    • For make test (BrowserID-Verifier, TokenServer, Sync, Combined):
      • Number of hits
      • Number of concurrent users
    • For make bench (BrowserID-Verifier, TokenServer, Sync, Combined):
      • Number of concurrent users
      • Duration of test
    • For make megabench (using the LoadsCluster with BrowserID-Verifier, TokenServer, Sync, Combined):
      • Number of concurrent users
      • Duration of test
      • Include file (this is code dependent)
      • Python dependencies (this is code dependent)
      • Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
      • Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
      • Observer (this can be email or irc - the default is irc #services-dev channel)
      • SSH (the user account needed to SSH into the loads cluster - the default is ubuntu)
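To make the megabench settings above concrete, here is a minimal sketch that reads them from an INI-style text and checks the agent limit. The section name and key names are assumptions made for illustration and may not match the real Loads config files; the defaults (5 agents, irc observer, ubuntu SSH user) and the maximum of 20 agents come from the list above.

```python
import configparser

# Hypothetical config text: the [loads] section and key names are assumptions.
EXAMPLE_CONFIG = """
[loads]
users = 20
duration = 1800
agents = 5
detach = true
observer = irc
ssh = ubuntu
"""

MAX_AGENTS = 20  # current maximum noted in the list above


def load_settings(text):
    """Parse the settings and reject an agent count outside 1..MAX_AGENTS."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["loads"]
    settings = {
        "users": section.getint("users"),
        "duration": section.getint("duration"),  # seconds
        "agents": section.getint("agents", fallback=5),
        "detach": section.getboolean("detach"),
        "observer": section.get("observer", fallback="irc"),
        "ssh": section.get("ssh", fallback="ubuntu"),
    }
    if not 1 <= settings["agents"] <= MAX_AGENTS:
        raise ValueError("agents must be between 1 and %d" % MAX_AGENTS)
    return settings
```

Checking the agent count before a run is cheaper than finding out from the cluster that the request cannot be satisfied.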

Test Coverage and Stats

  • Basic tweakable values for all load tests
    • users = number of concurrent users/agent
    • agents = number of agents to use from the cluster (the run errors out if that many agents are not available)
    • duration = in seconds
    • hits = 1 or X number of rounds/hits/iterations
  • TokenServer
    • File location: tokenserver/loadtest/loadtest.py
    • Inside NodeAssignmentTest, test_realistic is the main load test; the others are for specific behaviors
    • The test runs as follows:
95% ask for assertions on existing users (on a DB filled by test_single_token_exchange)
4% ask for an assertion on a new user
1% ask for a bad assertion
    • A bug has been filed to get the following additional coverage for the load test:
      • generation numbers in assertion
      • client state string
    • A bug has been filed to get some integration tests written:
      • to cover the edge/error cases not in the load test
      • to be pointed at a remote server
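The 95% / 4% / 1% split of the TokenServer test can be pictured as a weighted random choice per request. This is only a sketch of the idea: the scenario names are hypothetical and the real test_realistic in loadtest.py may implement the split differently.

```python
import random

# (scenario name, weight in percent) following the split described above.
# The names are illustrative labels, not identifiers from loadtest.py.
SCENARIOS = [
    ("existing_user", 95),
    ("new_user", 4),
    ("bad_assertion", 1),
]


def pick_scenario(rng=random):
    """Pick one scenario at random, honoring the percentage weights."""
    names = [name for name, _ in SCENARIOS]
    weights = [weight for _, weight in SCENARIOS]
    return rng.choices(names, weights=weights, k=1)[0]
```

Over a long run the observed mix approaches the configured percentages, which is why the 1% - 2% failure rate from bad assertions is expected.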
  • Sync
    • File location: server-syncstorage/loadtest/stress.py
    • This is the Sync 2.0 load test that has been back-ported for Sync 1.5.
    • The stress.py file is fully configurable for the following:
      • client probability
      • client distribution
      • collections
    • A bug has been filed to add support for load testing tabs
      • The tabs collection uses memcache; we need to figure out a way to test it without overloading the server
    • There are currently no constants to define how to select percentages per collection type
    • Right now, we need to manually configure the collections list in stress.py:
      • collections = ['bookmarks', 'forms', 'passwords', 'history', 'prefs']
      • Basically, you can add more entries of each type, since the load test (per user/agent/hit/pass) picks randomly from the list for any given request...
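Because the load test picks randomly from the collections list, repeating an entry raises its share of requests. The sketch below assumes a uniform random pick from the list, as described above, and uses a hypothetical helper to show the resulting shares; it is not code from stress.py.

```python
from collections import Counter

# The list as currently configured in stress.py (from the text above).
collections = ['bookmarks', 'forms', 'passwords', 'history', 'prefs']

# Example only: repeat 'history' so it appears 3 times out of 7 entries.
weighted_collections = collections + ['history', 'history']


def selection_share(entries):
    """Return the fraction of uniform random picks each entry would receive."""
    counts = Counter(entries)
    total = len(entries)
    return {name: count / total for name, count in counts.items()}
```

With the example list, history gets 3/7 of the requests and each other collection gets 1/7.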

Analyzing the Results

  • There are several methods and tools for analyzing the load test results.
  • Tokenserver Custom Metrics
    • addFailure
  • Verifier Custom Metrics
    • addFailure
  • Sync Custom Metrics
    • addFailure

Debugging the Issues

  • There are several methods and tools for debugging the load test errors and other issues.
  • 1. Important logs for TokenServer (per server)
    • /media/ephemeral0/logs/
    • /media/ephemeral0/nginx/logs/default.access.log
    • /media/ephemeral0/nginx/logs/default.error.log
    • /media/ephemeral0/nginx/logs/tokenserver.access.log
    • /media/ephemeral0/nginx/logs/tokenserver.error.log
    • /media/ephemeral0/logs/tokenserver/token.error.log
    • /media/ephemeral0/logs/tokenserver/token.log.*
    • /media/ephemeral0/logs/tokenserver/process_account_deletions.error.log
    • /media/ephemeral0/logs/tokenserver/process_account_deletions.log
    • /media/ephemeral0/logs/tokenserver/purge_old_records.log
    • /media/ephemeral0/logs/tokenserver/purge_old_records.error.log
    • /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
    • /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
    • /var/log/circus.log
    • /var/log/hekad/tokenserver.stdout.log
    • /var/log/hekad/tokenserver.stderr.log
  • 2. Important logs for Verifier (per server)
    • /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
    • /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
    • /media/ephemeral0/nginx/logs/fxa-browserid-verifier.access.log
    • /media/ephemeral0/nginx/logs/fxa-browserid-verifier.error.log
    • /media/ephemeral0/nginx/logs/default.access.log (not in use)
    • /media/ephemeral0/nginx/logs/default.error.log (not in use)
    • /media/ephemeral0/squid/access.log
    • /var/log/circus.log
    • /var/log/hekad/fxa-browserid_verifier.stderr.log
    • /var/log/hekad/fxa-browserid_verifier.stdout.log
  • 3. Important error logs for Sync (per Sync node)
    • /media/ephemeral0/logs/
    • /media/ephemeral0/nginx/access.log
    • /media/ephemeral0/error.log
    • /media/ephemeral0/sync/sync.err
    • /media/ephemeral0/sync/sync.log


  • Acceptable TokenServer errors:
1% - 2% failures, such as the following:
token.log:
"name": "token.assertion.invalid_signature_error"
"name": "token.assertion.verify_failure"
nginx access.log:
401s
NOTE: Values can be tweaked here:
    https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L58-L60
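To see how many of these expected failures a run produced, the "name" values in token.log can be tallied. This sketch assumes each log line is a single JSON object with a "name" field, which is inferred from the excerpts above and not confirmed; lines that do not parse are skipped. The function name is hypothetical.

```python
import json
from collections import Counter


def count_event_names(lines):
    """Count log records by their "name" field, skipping non-JSON lines."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # not a JSON line; ignore it
        if isinstance(record, dict) and "name" in record:
            counts[record["name"]] += 1
    return counts
```

Passing an open file object for token.log works, since the function only iterates over lines.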

The following types of errors are known:
/media/ephemeral0/logs/tokenserver/token.error.log
    Exception KeyError: KeyError(49564400,) in <module 'threading'...
/media/ephemeral0/logs/tokenserver/token.log
    ..."Starting new HTTP connection (9): 127.0.0.1", "hostname": ...
    {"error": "StopIteration()", "traceback": "Uncaught exception:\n  
    File \"/data/tokenserver/local/lib/python2.6/site-packages/gunicorn/workers/async.py\"...
    ..."Connection pool is full, discarding connection: 127.0.0.1", "...
Also, any 499s are probably an artifact of the current (V1) load test.
REF:
https://bugzilla.mozilla.org/show_bug.cgi?id=1040396
https://bugzilla.mozilla.org/show_bug.cgi?id=1040397

OLD: Also, it may be the case that the following errors are "acceptable" if TS Stage is larger than Verifier Stage:
/media/ephemeral0/logs/tokenserver/token.error.log
Verifier-related errors of these types:
"HttpConnectionPool is full, discarding connection: verifier.stage.mozaws.net"
"Resetting dropped connection: verifier.stage.mozaws.net"
"Starting new HTTPS connection (179): verifier.stage.mozaws.net"
  • Acceptable Verifier errors:
The verifier_out.log will show errors of the following types:
result: 'failure',\n  reason: 'untrusted issuer...'
result: 'failure',\n  reason: 'expired'
result: 'failure',\n  reason: 'algorithms do not match'
result: 'failure',\n  reason: 'audience mismatch: scheme mismatch'
Also, any 499s in the nginx logs are probably an artifact of the current (V1) load test.
  • Acceptable Sync node errors:
In the nginx access.log files:
We will see some percentage of 404s. Right now we see the following:
    14% 404s (compared to the total count of 200s)
    with the config set up as follows:
         users = 20
         duration = 1800
         agents = 5
Ideally, the overall percentage of 404s should drop the longer the load test runs.
Usually, you will not see 304s, 400s, 412s, or 415s for a load test,
although they may show up in the logs after running the remote integration tests.
Also, any 499s are probably an artifact of the current (V1) load test.
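The 404 percentage above is the count of 404s compared to the count of 200s. A minimal sketch of that calculation follows; it assumes the access log uses the common nginx "combined" layout, where the status code follows the quoted request line, and the helper names are hypothetical.

```python
import re
from collections import Counter

# In the combined format the status code follows the closing quote of the request.
STATUS_RE = re.compile(r'" (\d{3}) ')


def status_counts(lines):
    """Count HTTP status codes found in access log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def ratio_404_to_200(counts):
    """Return the number of 404s divided by the number of 200s, or None if no 200s."""
    if counts["200"] == 0:
        return None
    return counts["404"] / counts["200"]
```

The same counts also show whether 499s or unexpected codes such as 400s or 412s appeared during the run.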

In /var/log/hekad/sync_1_5.stderr.log
You may see some Decoder 'Sync-1_5-SlowQuery-MySqlSlowQueryDecoder' error: Failed parsing
and a lot of BSO INSERTs

In /media/ephemeral0/logs/sync/sync.err
You should see expected skew and QueuePool messages and Deprecation warnings
Also, these are known
Exception SystemExit
Exception KeyError
This is probably https://bugzilla.mozilla.org/show_bug.cgi?id=1040397

Monitoring TS and Sync Stage

Agents statuses
Launch a health check on all agents

Performance Testing Information

  • TBD

Details on the Load Test tool

Known Bugs, Issues, and Tasks

References