TestEngineering/Services/LoadsToolsAndTesting1: Difference between revisions

From MozillaWiki
(Created page with "= Loads V1 and Vaurien = Two tools Loads (V1) and Vaurien * Most Stage deployment verification is partially handled through the use of the Loads tool for stress/load (and some...")
 
 
(17 intermediate revisions by one other user not shown)
Line 1: Line 1:
* NOTE 1: Source: https://etherpad.mozilla.org/Loads-Current-Status-Aug2014
* NOTE 2: This is specifically for Loads V1
* NOTE 3: For Loads V2 information, please see https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting2
= Loads V1 and Vaurien =
Two tools: Loads (V1) and Vaurien
Line 4: Line 8:
* Vaurien is a TCP proxy which will let you simulate chaos between your application and a backend server.
** One active(?) POC for Vaurien is with GeoLocation (ichnaea).
Usage Rules
 
== Loads Cluster Usage Rules ==
* Note: There are a number of open bugs and issues (see below) that require Loads use to be focused and specific per project:
** Do not overdo the load test: start with the default values in the config files.
Line 11: Line 16:
** Do not run a load test for more than 8 to 10 hours.
** There are more limitations/rules...
Repos
 
== Loads V1 Cluster Environment/Stack ==
* URLs
** http://loads.services.mozilla.com/
** or http://ec2-54-212-44-143.us-west-2.compute.amazonaws.com/
 
* Versions
Loads Cluster/Broker/Agents:
$ cd /home/ubuntu/loads/bin
$ ./loads-runner --version
 
* AWS in US West
** loads-master (broker and agent processes)
** loads-slave-1 (agent processes)
** loads-slave-2 (agent processes)
** NOTE: there is no stack or ELB for this cluster
 
* Files
** /home/ubuntu
*** loads
*** loads-aws
*** loads-web
 
* Processes
** Search for processes owned by ubuntu, loads, nginx, circus
 
* Logs
** /var/log/redis
** /var/log/nginx
 
* QA access
** You need special access to be able to SSH into these devices
** You need to make some changes to your .ssh/config file
 
* Links
** http://loads.readthedocs.org/en/latest/
** https://github.com/mozilla-services/loads
** https://github.com/mozilla-services/loads-aws
 
== Loads V1 Cluster Monitoring ==
* Loads Dashboard
** http://loads.services.mozilla.com
 
* Stackdriver
** https://app.stackdriver.com/groups/6664/stage-loads-cluster
 
* Cluster status
** Check directly from the Loads Cluster dashboard: http://loads.services.mozilla.com
*** Agent statuses
*** Launch a health check on all agents
 
== Loads V1 Cluster Maintenance ==
* If things should go wrong...
 
* Checking the cluster dashboard
* TBD
 
* Checking the stack
* TBD
 
* Restarting the Master/Broker
* TBD
 
* Restarting the Slaves/Agents
* TBD
 
== Repos ==
* https://github.com/mozilla-services/loads
* https://github.com/mozilla-services/loads-aws
Line 19: Line 90:
* https://github.com/mozilla-services/konfig
** https://pypi.python.org/pypi/konfig
Bugs
 
== Bugs ==
* META: https://github.com/mozilla-services/loads/issues/279
* https://github.com/mozilla-services/loads/issues
* https://github.com/mozilla-services/vaurien/issues
Documentation
 
== Documentation ==
* http://loads.readthedocs.org/en/latest/
* http://vaurien.readthedocs.org/en/latest/
Loads Cluster Dashboard
 
== Loads Cluster Dashboard ==
* http://loads.services.mozilla.com/
* or http://ec2-54-212-44-143.us-west-2.compute.amazonaws.com/
* Note: This is a login/password protected site. For now, please get an account via Tarek.
* Note: You need to make some changes to your .ssh/config file
Deployment and AWS Instances:
 
== Deployment and AWS Instances ==
* Master, two slaves in US West
* loads-master (broker and agent processes)
Line 38: Line 113:
* Note: there is no CF stack or ELB for this cluster
* Note: the load cluster state/health can be checked directly from the dashboard (see above)
Monitoring the cluster via Stackdriver
 
== Monitoring the cluster via Stackdriver ==
* StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster
Monitoring the Loads Cluster
 
* Via the dashboard: http://loads.services.mozilla.com/
== Monitoring the Loads Cluster via the Dashboard ==
* Dashboard: http://loads.services.mozilla.com/
* Check the loads cluster state/health directly from the dashboard:
** Agent statuses
** Launch a health check on all agents
Monitoring the Stage environment during Load Tests
 
== Monitoring the Stage environment during Load Tests ==
* We have various dashboards created by OPs that capture and display all sorts of data via the Heka/ES/Kibana pipeline
** Heka dashboard
** Kibana dashboard
** Stackdriver
Load Test Results
 
== Load Test Results ==
* Load test results are always listed in the dashboard.
* A cleanup of the dashboard is a high priority for V2: we want a better, more accurate representation of the test(s) run, with human-readable results that add meaning and context to the metrics provided by the various OPs dashboards.
Reporting (or lack of it)
 
== Reporting (or lack of it) ==
* There were plans to create some reporting in the style of what we had with the Funkload tool.
* There are bugs open about getting some reporting out of Loads.
* No action taken at this time, but a very good candidate for V2.
QA Wikis
 
== QA Wikis ==
* https://wiki.mozilla.org/QA/Services/FxATestEnvironments
* https://wiki.mozilla.org/QA/Services/FxALoadTesting
Line 64: Line 145:
* https://wiki.mozilla.org/QA/Services/LoopTestEnvironments
* https://wiki.mozilla.org/QA/Services/LoopServerLoadTesting
Current projects using Loads
 
== Current projects using Loads ==
* FxA-Auth-Server
* FxA-Scrypt-Helper (defunct)
Line 72: Line 154:
* MSISDN-Gateway
* GeoLocation: https://github.com/mozilla/ichnaea/tree/master/loadtest
New/Planned projects using Loads
 
== New/Planned projects using Loads ==
* SimplePush (probably)
* Tiles (maybe)
Other projects doing load testing
 
* Tiles: https://github.com/mozilla-services/tiles-loadtest
== Other projects doing load testing ==
(which I think uses Siege)
* Tiles: https://github.com/mozilla-services/tiles-loadtest (which I think uses Siege)
* SimplePush:
** https://github.com/bbangert/push-tester (which is all Haskell-y)
** https://github.com/oremj/simplepush-testpod (straight-up JS)
** https://github.com/edmoz/load-test (the Python equivalent, methinks)
Vaurien
 
== Vaurien ==
* Get Vaurien (or similar) working on FxA, TS, Verifier, Sync (as appropriate)
* See https://github.com/crankycoder/ichnaea for a working example
Line 92: Line 176:
** Verifier: https://github.com/mozilla/browserid-verifier/issues/50
** Sync: https://github.com/mozilla-services/server-syncstorage/issues/19
= Loads V2 =
* What is it?
* Changes for V2
* Overview/Slides: http://blog.ziade.org/slides/loadsv2/#/
* Initial Diagram: http://blog.ziade.org/loads.jpg
* Initial Look: https://etherpad.mozilla.org/Loadsv2
* Ben's design work: https://etherpad.mozilla.org/loadsv2-design
Comparison of Load Test Tools
* Siege: http://www.joedog.org/siege-home/
* And some others in comparison: http://www.appdynamics.com/blog/devops/load-testing-tools-explained-the-server-side/
* https://github.com/newsapps/beeswithmachineguns
* Some of these require large sums of $ in order to run adequate load tests (size/time)
* Straight HTTP vs. smart tests (that we are sending)
* Dumb testing vs. smart testing (what we are doing)
* Some of the off-the-shelf are quite limited - we need to be able to use a programming language to define very specific tests/requirements
* The Grinder, for example, is not really designed to be deployed on AWS.
* Tsunami is good at sending a lot of load on a web service, but it requires writing XML
Tasks
Tarek says:
So what I was thinking: I can lead Loads v2 development with the help of QA and Ops and Benjamin for SimplePush, and then slowly transition ownership to the QA and Ops team - because at the end of the day that's the two teams that should benefit the most from this tool.
New Repos
* https://github.com/loads
* https://github.com/loads/docs
* https://github.com/loads/loads-broker
* https://github.com/loads/loads-tester
* https://github.com/loads/old-loads-agent
* https://github.com/loads/old-loads-broker
* https://github.com/loads/old-loads-base
* https://github.com/loads/old-loads-web
* Note: naming is a bit strange right now because the architecture is in transition
New Documentation
* TBD: for now see https://github.com/loads/docs
September Brown Bag and Info Session
* https://etherpad.mozilla.org/loads-brownbag
= Brainstorming Loads and V2 =
* What we need going forward
* What we want going forward
Some issues (generalized - see the github issues for details):
1- Very long runs (>10 hours) are not really working. This is a design problem.
2- Spinning up new slaves to make big tests has not yet been automated. We have 2 slave boxes that run 10 agents each. This was enough for most of our needs though.
3- The dashboard is sparse. It'll tell you what's going on, but we don't have any real reporting features yet.
4- Running a test using a language other than Python is a bit of a pain (you need to do some zmq messaging).
Stealing from Tarek's slide deck:
* Day-long runs don't really work
* Crappy dashboard
* No direct link to Logs/CPU/Mem usage of stressed servers
* No automatic slaves deployment yet
* Python client only really supported
* High bar to implement clients in Haskell/Go
* Also, we have a lot of open bugs that need to get fixed. Some prevent better use of the tool for newer projects/services.
* Get Loads "fixed" for Mac 10.9 and XCode 5.1.1: https://bugzilla.mozilla.org/show_bug.cgi?id=1010567
* Figure out how to run loads from personal AWS instances
Some of this is already in progress...
Monitoring
What we currently have for Stage
What do we want/need?
Reporting
Loads dashboard
What about CPU/memory information (like from atop, tops)
Links to some snapshotted graphs
code version
red/yellow/green states
Deployment bug
Bugs opened
Bugs closed
Scaling
Quarterly results/trending
PM targets
expected targets
actual targets
Wiki design
One per service?
One per service per deployment?
Weekly reporting
What does the QE team want to see
Getting Load tests to work from an AWS instance as localhost
Getting the right data/metrics requirements from PMs
then extracting that information and displaying on the Loads dashboard and/or in the OPs-built dashboards

Latest revision as of 20:03, 26 August 2016

Loads V1 and Vaurien

Two tools: Loads (V1) and Vaurien

  • Most Stage deployment verification is partially handled through the use of the Loads tool for stress/load (and someday performance) testing.
  • Vaurien is a TCP proxy which will let you simulate chaos between your application and a backend server.
    • One active(?) POC for Vaurien is with GeoLocation (ichnaea).

Loads Cluster Usage Rules

  • Note: There are a number of open bugs and issues (see below) that require Loads use to be focused and specific per project:
    • Do not overdo the load test: start with the default values in the config files.
    • Do not run more than two tests in parallel.
    • Do not use more than 5 agents per load test unless you need to use more.
    • Do not run a load test for more than 8 to 10 hours.
    • There are more limitations/rules...
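
The usage rules above can be turned into a quick pre-flight check before a run is started. The following is only a sketch: the `PlannedRun` structure, the `check_run` helper, and the constant names are hypothetical and are not part of Loads; only the three numeric limits come from the rules listed above.

```python
from dataclasses import dataclass

# Limits copied from the usage rules above; the names are hypothetical.
MAX_PARALLEL_TESTS = 2
MAX_AGENTS_PER_TEST = 5
MAX_DURATION_HOURS = 10


@dataclass
class PlannedRun:
    """A planned load test (illustrative structure, not a Loads type)."""
    project: str
    agents: int
    duration_hours: float


def check_run(run, tests_already_running):
    """Return a list of warnings; an empty list means the run fits the rules."""
    warnings = []
    if tests_already_running >= MAX_PARALLEL_TESTS:
        warnings.append("two tests are already running in parallel")
    if run.agents > MAX_AGENTS_PER_TEST:
        warnings.append("more than 5 agents requested; make sure they are needed")
    if run.duration_hours > MAX_DURATION_HOURS:
        warnings.append("run is longer than 10 hours")
    return warnings


if __name__ == "__main__":
    planned = PlannedRun(project="example-service", agents=8, duration_hours=12)
    for warning in check_run(planned, tests_already_running=1):
        print("WARNING:", warning)
```

The agent limit is reported as a warning rather than an error because the rule itself allows more agents when a test needs them.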

Loads V1 Cluster Environment/Stack

  • Versions
Loads Cluster/Broker/Agents:
$ cd /home/ubuntu/loads/bin
$ ./loads-runner --version
  • AWS in US West
    • loads-master (broker and agent processes)
    • loads-slave-1 (agent processes)
    • loads-slave-2 (agent processes)
    • NOTE: there is no stack or ELB for this cluster
  • Files
    • /home/ubuntu
      • loads
      • loads-aws
      • loads-web
  • Processes
    • Search for processes owned by ubuntu, loads, nginx, circus
  • Logs
    • /var/log/redis
    • /var/log/nginx
  • QA access
    • You need special access to be able to SSH into these devices
    • You need to make some changes to your .ssh/config file
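
For the "Processes" item above, one way to find the relevant processes on a cluster host is to filter `ps` output. This is a minimal sketch under assumptions: it treats the four names from the list as either a process owner or part of a command name (the page does not say which), and the helper names `filter_processes` and `list_cluster_processes` are made up for illustration.

```python
import subprocess

# Names listed under "Processes" above. Assumption: each may appear either
# as the owning user or inside the command name.
CLUSTER_NAMES = ("ubuntu", "loads", "nginx", "circus")


def filter_processes(ps_output, names=CLUSTER_NAMES):
    """Return (user, pid, command) rows from `ps -eo user,pid,comm` output
    whose owner is in `names` or whose command contains one of them."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        user, pid, command = parts
        if user in names or any(name in command for name in names):
            rows.append((user, int(pid), command))
    return rows


def list_cluster_processes():
    """Run ps on the local host and return the filtered rows."""
    result = subprocess.run(
        ["ps", "-eo", "user,pid,comm"],
        capture_output=True, text=True, check=True,
    )
    return filter_processes(result.stdout)


if __name__ == "__main__":
    for user, pid, command in list_cluster_processes():
        print(user, pid, command)
```

Run it on the host itself after logging in over SSH; it only inspects local processes.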

Loads V1 Cluster Monitoring

Agents statuses
Launch a health check on all agents

Loads V1 Cluster Maintenance

  • If things should go wrong...
  • Checking the cluster dashboard
  • TBD
  • Checking the stack
  • TBD
  • Restarting the Master/Broker
  • TBD
  • Restarting the Slaves/Agents
  • TBD

Repos

Bugs

Documentation

Loads Cluster Dashboard

Deployment and AWS Instances

  • Master, two slaves in US West
  • loads-master (broker and agent processes)
  • loads-slave-1 (agent processes)
  • loads-slave-2 (agent processes)
  • Note: there is no CF stack or ELB for this cluster
  • Note: the load cluster state/health can be checked directly from the dashboard (see above)

Monitoring the cluster via Stackdriver

Monitoring the Loads Cluster via the Dashboard

Monitoring the Stage environment during Load Tests

  • We have various dashboards created by OPs that capture and display all sorts of data via the Heka/ES/Kibana pipeline
    • Heka dashboard
    • Kibana dashboard
    • Stackdriver

Load Test Results

  • Load test results are always listed in the dashboard.
  • A cleanup of the dashboard is a high priority for V2: we want a better, more accurate representation of the test(s) run, with human-readable results that add meaning and context to the metrics provided by the various OPs dashboards.

Reporting (or lack of it)

  • There were plans to create some reporting in the style of what we had with the Funkload tool.
  • There are bugs open about getting some reporting out of Loads.
  • No action taken at this time, but a very good candidate for V2.

QA Wikis

Current projects using Loads

New/Planned projects using Loads

  • SimplePush (probably)
  • Tiles (maybe)

Other projects doing load testing

Vaurien