TestEngineering/Services/FxALoadTesting: Difference between revisions

Latest revision as of 20:01, 26 August 2016

Quick Verification Of Stage Deployments

This is a quick sanity test of the environment before getting started on load tests.

Install FxA-Auth-Server to a local host or an AWS instance (see below)
$ cd fxa-auth-server
Run the integration tests against the remote Stage server (load balancer)
$ PUBLIC_URL=<FxA Stage> npm run test-remote
Current example:
$ PUBLIC_URL=https://api-accounts.stage.mozaws.net npm run test-remote

NOTE: Make sure to install and test from the same branch that is deployed to Stage (i.e., do not use master when running the tests against Stage or Production).
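
The Stage check above can also be driven from a script, for example as part of a deployment job. The following is a minimal sketch, assuming a checkout of fxa-auth-server on the same branch that is deployed. The helper names `remote_test_command` and `run_remote_tests` are hypothetical; they only wrap the `PUBLIC_URL=<url> npm run test-remote` command shown above.

```python
import os
import subprocess


def remote_test_command(public_url, base_env=None):
    """Return (argv, env) for the remote integration tests.

    Hypothetical helper: it mirrors the documented shell command
    `PUBLIC_URL=<url> npm run test-remote` and nothing more.
    """
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env["PUBLIC_URL"] = public_url
    return ["npm", "run", "test-remote"], env


def run_remote_tests(public_url, checkout_dir="fxa-auth-server"):
    """Run the tests from the checkout directory and return npm's exit code."""
    argv, env = remote_test_command(public_url)
    return subprocess.call(argv, cwd=checkout_dir, env=env)
```

Calling `run_remote_tests("https://api-accounts.stage.mozaws.net")` would run the same check as the Stage example; a non-zero return value means the integration tests failed or npm could not be started.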

Using TPS
- The TPS FxA/Sync automated tests can be used as well, but the following file will have to be edited to add Stage environment configuration parameters: https://github.com/mozilla/gecko-dev/blob/master/testing/tps/tps/testrunner.py
- See the following wiki page for more information: https://wiki.mozilla.org/User_Services/Sync/Run_TPS
- See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1006675

Quick Verification Of Production Deployments

This is a quick sanity test of the environment after each new deployment. Other verifications can be run as well.

Install FxA-Auth-Server to a local host or an AWS instance (see below)
$ cd fxa-auth-server
Run the integration tests against the remote Production server (load balancer)
$ PUBLIC_URL=<FxA Prod> npm run test-remote
Current example:
$ PUBLIC_URL=https://api.accounts.firefox.com npm run test-remote

NOTE: Make sure to install and test from the same branch that is deployed to Production.

Load Test Tool Client/Host

It is always best to configure an AWS instance as the host for all load testing.
All load tests can now run on the localhost (the AWS instance) or against the new Loads Cluster. See the following links for more information:
- https://wiki.mozilla.org/QA/Services/LoadsV1ClientTestHost
- https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1

Installing FxA-Auth-Server and the Loads tool on Localhost or AWS

Installation:
$ git clone https://github.com/mozilla/fxa-auth-server.git
$ cd ./fxa-auth-server
Note: You may want to check out a specific branch for testing rather than defaulting to master.
$ npm install
$ npm test
$ cd ./test/load
$ make build

Note: 'npm install' may need to be run as root.
Note: This will install a local copy of the Loads tool for use with FxA-Auth-Server.

Running the Loads tool against FxA Stage

The basic load test can be run as follows:

$ make test SERVER_URL=https://api-accounts.stage.mozaws.net

The full, default load test can be run as follows:

$ make bench SERVER_URL=https://api-accounts.stage.mozaws.net

Note: The current version of 'make bench' tends to use a lot of CPU and memory on the localhost.
The recommendation is to use 'make test' and 'make megabench' instead (see below).

Configuring the bench load test - config folder:
- The test.ini file (for make test) can be configured for the following:
  - Number of hits
  - Number of concurrent users
- The bench.ini file (for make bench) can be configured for the following:
  - Number of concurrent users
  - Duration of test

For both tests, start with the defaults, then tweak the duration. Changing the number of users and agents is optional. The bench load test can also be configured to run in detached mode by adding the appropriate Loads detach and observer settings.

REF: https://github.com/mozilla/fxa-auth-server/tree/master/loadtest/config
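
As an illustration of the "start with the defaults, then tweak the duration" advice, the sketch below changes only the duration in an ini file and leaves every other value alone. The `[loads]` section name, the `users` and `duration` key names, and the sample values are assumptions made for this example; check them against the real files in the config folder before relying on them.

```python
import configparser
import io

# Hypothetical file contents; the real bench.ini may use other names and values.
SAMPLE_BENCH_INI = """\
[loads]
users = 20
duration = 300
"""


def tweak_duration(ini_text, duration_seconds, section="loads"):
    """Return ini_text with only the duration changed; other keys are kept."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    parser.set(section, "duration", str(duration_seconds))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def read_settings(ini_text, section="loads"):
    """Return the settings of one section as a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser[section])
```

Note that `configparser` does not preserve comments when it rewrites a file, so for a commented config file a manual edit may be the safer choice.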

Running the Loads tool against FxA Development or Production

This can be done if we are comparing Stage vs. some other environment and have access to the AWS logs in Dev or Production:

Dev:

$ make test SERVER_URL=https://accounts.dev.lcip.org
$ make bench SERVER_URL=https://accounts.dev.lcip.org

Prod:

$ make test SERVER_URL=https://api.accounts.firefox.com
$ make bench SERVER_URL=https://api.accounts.firefox.com

The same optional configuration changes apply here.

Using the Loads V1 Services Cluster

By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
Changes were made to the Makefile and the load test so that they use the cluster, along with some associated config files (for test, bench, megabench).

Testing against the Stage environment:

$ make megabench SERVER_URL=https://api-accounts.stage.mozaws.net

Testing against the Dev environment:

$ make megabench SERVER_URL=https://api-accounts.dev.lcip.org

Testing against the Prod environment:

$ make megabench SERVER_URL=https://api.accounts.firefox.com

Configuring the megabench load test - config folder:
- The megabench.ini file (for make megabench) can be configured for the following:
  - Number of concurrent users
  - Duration of test
  - Include file (leave as defined for now)
  - Python dependencies (leave as defined for now)
  - Broker to use for testing (leave as defined for now - this is the broker in the Loads Cluster)
  - Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
  - Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
  - Observer (this can be email or irc - the default is irc #services-dev channel)
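
Because the usable number of agents depends on what else is running on the cluster, it can help to sanity-check the agents value before starting a megabench run. The sketch below is only an illustration: the default of 5 and the maximum of 20 come from the list above, while the function name `check_agents` and the idea of subtracting agents held by other running tests are assumptions, not documented cluster behavior.

```python
DEFAULT_AGENTS = 5  # default stated in the list above
MAX_AGENTS = 20     # current maximum stated in the list above


def check_agents(requested=DEFAULT_AGENTS, in_use_by_other_tests=0):
    """Return the requested agent count if it fits, else raise ValueError.

    `in_use_by_other_tests` is a hypothetical input: the number of agents
    already taken by load tests running at the same time.
    """
    if requested < 1:
        raise ValueError("at least one agent is required")
    available = MAX_AGENTS - in_use_by_other_tests
    if requested > available:
        raise ValueError(
            "requested %d agents but only %d are available" % (requested, available)
        )
    return requested
```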

Configuring The Load Tests

Makefile
- The SERVER_URL constant can be changed.

Config files
- For make test:
  - Number of hits
  - Number of concurrent users

- For make bench:
  - Number of concurrent users
  - Duration of test

- For make megabench:
  - Number of concurrent users
  - Duration of test
  - Include file (this is code dependent)
  - Python dependencies (this is code dependent)
  - Broker to use for testing (leave as defined for now - this is the broker in the Loads Cluster)
  - Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
  - Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
  - Observer (this can be email or irc - the default is irc #services-dev channel)

Load Test code: loadtests.py
- The load test can be configured in the code - see the following lines:
- https://github.com/mozilla/fxa-auth-server/blob/master/test/load/loadtests.py#L17-L39

General REFs:
- https://github.com/mozilla/fxa-auth-server/blob/master/test/load/loadtests.py

Test Coverage and Stats

Basic tweakable values for all load tests
- users = number of concurrent users/agent
- agents = number of agents to use from the cluster (the test errors out if that many agents are not available)
- duration = in seconds
- hits = 1 or X number of rounds/hits/iterations
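
To see how these values combine, the sketch below computes the total number of concurrent users for a run. It reads "users/agent" as users per agent, which is an interpretation of the list above rather than something the Loads documentation is quoted on here; the function name is hypothetical.

```python
def total_concurrent_users(users_per_agent, agents=1):
    """Total concurrent users, assuming each agent runs `users_per_agent` users."""
    if users_per_agent < 1 or agents < 1:
        raise ValueError("users_per_agent and agents must both be at least 1")
    return users_per_agent * agents
```

Under that reading, 20 users on each of 5 agents would put 100 concurrent users on the target environment.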

Location: fxa-auth-server/loadtest/loadtests.py
The following items are covered in the load test
- test_auth_server is the main entry point in the loadtests.py file
  - account creation
  - session creation
  - account deletion
  - session deletion
Integration tests
- These are designed to cover the edge/error cases that are not applicable to the load test
- The tests can be run against a remote server

Analyzing the Results

TBD

Debugging the Issues

There are several methods and tools for debugging the load test errors and other issues.

1. Important logs for FxA-Auth-Server (per server)
- /media/ephemeral0/fxa-auth-server/auth_err.log.*
- /media/ephemeral0/fxa-auth-server/auth_out.log
- /media/ephemeral0/heka/hekad_err.log
- /media/ephemeral0/heka/hekad_out.log
- /media/ephemeral0/nginx/logs/access.log
- /media/ephemeral0/nginx/logs/error.log

Acceptable FxA-Auth-Server errors

503s: these, especially on /v1/certificate/sign, are usually a sign that we are overloading the hosts.

400s: apart from the exceptions listed next, we should never see these in the logs, especially if the "errno" value is 105.
    Check the fxa-auth-server/auth_err.log
400s: "errno" values of 101 and 102 are OK. These can be expected during a load test.

ELB issues: we may see 503s and corresponding "err":"cannot enqueue work: maximum backlog exceeded (30)" 
    messages if one or more of the hosts behind the ELB is receiving most of the load traffic.
REF: https://github.com/mozilla/fxa-auth-server/issues/647
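
The rules above can be turned into a small triage helper for a batch of log entries. This is a minimal sketch: it assumes each log line is a JSON object with a numeric `code` field and an optional `errno` field, which is a hypothetical format that may not match the real auth_err.log or nginx logs. The category names are also invented for the example.

```python
import json
from collections import Counter

EXPECTED_400_ERRNOS = {101, 102}  # acceptable during a load test (see above)


def classify(status, errno=None):
    """Map one response to a triage category based on the rules above."""
    if status == 503:
        return "overload"  # usually a sign that the hosts are overloaded
    if status == 400:
        return "expected" if errno in EXPECTED_400_ERRNOS else "investigate"
    if status < 400:
        return "ok"
    return "unclassified"  # the rules above do not cover other codes


def summarize(lines):
    """Count categories over JSON log lines; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue
        if not isinstance(entry, dict) or "code" not in entry:
            continue
        counts[classify(entry["code"], entry.get("errno"))] += 1
    return counts
```

A high "overload" count concentrated on one host would point toward the ELB imbalance described above rather than a problem in the server code.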

Monitoring FxA Stage

Loads dashboard:
- http://ec2-54-212-44-143.us-west-2.compute.amazonaws.com/
- or http://loads.services.mozilla.com
Cluster status
- Check directly from the Loads Cluster dashboard:
  - Agents statuses
  - Launch a health check on all agents
- and also on StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster

For all other monitoring, see the following section:
- https://wiki.mozilla.org/QA/Services/FxATestEnvironments#Monitoring_the_Stage_Environment

Performance Testing Information

TBD

Details on the Load Test tool

The documentation can be found here:
- https://loads.readthedocs.org/en/latest
The repositories are here:
The Services cluster is here:
- http://loads.services.mozilla.com

Known Bugs, Issues, and Tasks

FxA
- https://github.com/mozilla/fxa-auth-server/issues
- https://github.com/mozilla/fxa-content-server/issues
- and several others

Bugzilla
- No specific category

Infrastructure
- https://github.com/mozilla-services/puppet-config/issues
- https://github.com/mozilla-services/svcops/issues

Loads Tool and clusters

Capacity Planning Stage and Production

QA is tasked with providing some capacity requirements and constraints based on repeated load testing of the FxA-Auth-Server Stage environment.
The goal is to be able to work with OPs to develop a realistic plan for deploying and maintaining the production environment at a level expected for projected user traffic, etc.

Brainstorming the QA role:
- QA needs to get some realistic numbers from the Product team. This could be as simple as traffic flow (number of users per day or per segments of the day - peaks and valleys) or more detailed:
  - Traffic flow - QPS/RPS
  - Average number of users per time segment
  - Average and peak latency
  - Error percentages and thresholds
  - etc
- QA gets help from OPs to learn how to measure those required numbers/values using StackDriver or other tools (or to get data from OPsView). If those numbers cannot be measured, then we either need to
  - get a different set of data points from the Product team
  - enhance the current tools to track and measure the required data
- QA does repeated, scheduled, well-defined load tests in Stage while actively monitoring the results, logs, data, etc.
- QA finds a stable configuration that - when scaled - would
  - match the needs of Product when we release
  - match the realistic capacity planning that OPs normally does
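
The data points listed above (RPS, average and peak latency, error percentage) can be computed from raw request samples once they are available. The sketch below assumes a hypothetical input format, a list of `(latency_ms, status_code)` pairs collected over a known duration, and counts any 5xx response as an error; neither the format nor that error definition comes from StackDriver or the Loads tool.

```python
def summarize_run(samples, duration_seconds):
    """Compute basic capacity numbers from (latency_ms, status_code) pairs."""
    if not samples or duration_seconds <= 0:
        raise ValueError("need at least one sample and a positive duration")
    latencies = [latency for latency, _ in samples]
    # Assumption for this sketch: only 5xx responses count as errors.
    errors = sum(1 for _, status in samples if status >= 500)
    return {
        "rps": len(samples) / duration_seconds,
        "avg_latency_ms": sum(latencies) / len(latencies),
        "peak_latency_ms": max(latencies),
        "error_pct": 100.0 * errors / len(samples),
    }
```

Comparing these numbers across repeated Stage runs, against the thresholds agreed with the Product team, is one way to decide whether a configuration is stable.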

Dependencies
- Realistic traffic/user numbers from the FxA Product team
- Timely training on monitoring tools from the OPs team
- Regular and realistic scaling/testing of deployments to Stage by QA given our current pre-release and post-release schedules

References

Repository: https://github.com/mozilla/fxa-auth-server
The QA Test Environments:
- https://wiki.mozilla.org/QA/Services/FxATestEnvironments
- https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments
Deploying the FxA Load Test environment for broker/agents usage:
- https://github.com/mozilla/fxa-deployment
OPs pages for stats collection, logging, monitoring
- TBD
