TestEngineering/Services/LoadsToolsAndTesting2

From MozillaWiki
* NOTE 1: Original Source: https://etherpad.mozilla.org/Loads-Current-Status-Aug2014
* NOTE 2: This is specifically for Loads V2
* NOTE 3: For Loads V1 information, please see https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1
= Loads V1 and Vaurien =
This section covers two tools: Loads (V1) and Vaurien.
* Stage deployment verification is partially handled through the Loads tool for stress/load (and someday performance) testing.
* Vaurien is a TCP proxy which will let you simulate chaos between your application and a backend server.
** One active(?) POC for Vaurien is with GeoLocation (ichnaea).
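To make the "simulate chaos" idea concrete, Vaurien sits between a client and a backend and injects faulty behaviors (delays, errors, dropped connections). The invocation below is a sketch from memory of the Vaurien docs; flag names may differ across versions, so check http://vaurien.readthedocs.org/en/latest/ for the exact options:

```shell
# Hedged sketch: proxy local port 8000 to a backend on port 80,
# speaking HTTP, and apply a "delay" behavior to ~20% of requests.
vaurien --proxy localhost:8000 --backend localhost:80 \
        --protocol http --behavior 20:delay
```

Pointing your load test at the proxy instead of the backend then exercises how the application copes with a degraded network.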
== Usage Rules ==
* Note: There are a number of open bugs and issues (see below) that require Loads use to be focused and specific per project:
** Do not overdo the load test - start with the default values in the config files.
** Do not run more than two tests in parallel.
** Do not use more than 5 agents per load test unless more are genuinely required.
** Do not run a load test of more than 8-10 hours.
** There are more limitations/rules...
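As a pre-flight self-check, the rules above can be encoded in a small helper. This is an illustrative sketch, not part of the Loads tool itself:

```python
# Hypothetical pre-flight check encoding the usage rules above;
# the limits mirror the wiki's guidance, not anything enforced by Loads.
MAX_PARALLEL_TESTS = 2
MAX_AGENTS = 5
MAX_DURATION_HOURS = 10

def check_run(parallel_tests, agents, duration_hours):
    """Return a list of rule violations for a proposed load-test run."""
    problems = []
    if parallel_tests > MAX_PARALLEL_TESTS:
        problems.append("do not run more than two tests in parallel")
    if agents > MAX_AGENTS:
        problems.append("do not use more than 5 agents per load test")
    if duration_hours > MAX_DURATION_HOURS:
        problems.append("do not run a load test of more than 8-10 hours")
    return problems

print(check_run(parallel_tests=1, agents=5, duration_hours=8))    # → []
print(check_run(parallel_tests=3, agents=20, duration_hours=12))  # 3 violations
```

An empty list means the proposed run stays inside the guidelines; anything else should be trimmed before hitting the cluster.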
== Repos ==
* https://github.com/mozilla-services/loads
* https://github.com/mozilla-services/loads-aws
* https://github.com/mozilla-services/loads-web
* https://github.com/mozilla-services/loads.js
* https://github.com/mozilla-services/vaurien
* https://github.com/mozilla-services/konfig
** https://pypi.python.org/pypi/konfig
== Bugs ==
* META: https://github.com/mozilla-services/loads/issues/279
* https://github.com/mozilla-services/loads/issues
* https://github.com/mozilla-services/vaurien/issues
== Documentation ==
* http://loads.readthedocs.org/en/latest/
* http://vaurien.readthedocs.org/en/latest/
== Loads Cluster Dashboard ==
* http://loads.services.mozilla.com/
* or http://ec2-54-212-44-143.us-west-2.compute.amazonaws.com/
* Note: This is a login/password protected site. For now, please get an account via Tarek.
* Note: You need to make some changes to your .ssh/config file
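The shape of the `.ssh/config` change is roughly the following. Everything here except the hostname (which is listed above) is a placeholder - the actual `User`, key file, and any alias come with the account you get via Tarek:

```
# Hypothetical ~/.ssh/config entry for reaching the Loads cluster.
# "loads-master" alias, the user name, and the key path are examples only.
Host loads-master
    HostName ec2-54-212-44-143.us-west-2.compute.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/loads.pem
```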
== Deployment and AWS Instances ==
* Master, two slaves in US West
* loads-master (broker and agent processes)
* loads-slave-1 (agent processes)
* loads-slave-2 (agent processes)
* Note: there is no CF stack or ELB for this cluster
* Note: the load cluster state/health can be checked directly from the dashboard (see above)
== Monitoring the cluster via Stackdriver ==
* StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster
== Monitoring the Loads Cluster ==
* Via the dashboard: http://loads.services.mozilla.com/
* Check the loads cluster state/health directly from the dashboard:
** Agent statuses
** Launch a health check on all agents
== Monitoring the Stage environment during Load Tests ==
* We have various dashboards created by OPs that capture and display all sorts of data via the Heka/ES/Kibana pipeline
** Heka dashboard
** Kibana dashboard
** Stackdriver
== Load Test Results ==
* Load test results are always listed in the dashboard.
* A cleanup of the dashboard is a high priority for V2 - we want a much better and more accurate representation of the test runs, with the kind of human-readable results that add meaning/context to the metrics provided by the various OPs dashboards.
== Reporting (or lack of it) ==
* There were plans to create some reporting in the style of what we had with the Funkload tool.
* There are bugs open about getting some reporting out of Loads.
* No action taken at this time, but a very good candidate for V2.
== QA Wikis ==
* https://wiki.mozilla.org/QA/Services/FxATestEnvironments
* https://wiki.mozilla.org/QA/Services/FxALoadTesting
* https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments
* https://wiki.mozilla.org/QA/Services/TokenServerAndSyncLoadTesting
* https://wiki.mozilla.org/QA/Services/LoopTestEnvironments
* https://wiki.mozilla.org/QA/Services/LoopServerLoadTesting
== Current projects using Loads ==
* FxA-Auth-Server
* FxA-Scrypt-Helper (defunct)
* Tokenserver/Verifier (now combined)
* Sync 1.5
* Loop-Server
* MSISDN-Gateway
* GeoLocation: https://github.com/mozilla/ichnaea/tree/master/loadtest
== New/Planned projects using Loads ==
* SimplePush (probably)
* Tiles (maybe)
== Other projects doing load testing ==
* Tiles: https://github.com/mozilla-services/tiles-loadtest (which I think uses Siege)
* SimplePush:
** https://github.com/bbangert/push-tester (which is all Haskell-y)
** https://github.com/oremj/simplepush-testpod (straight-up JS)
** https://github.com/edmoz/load-test (the Python equivalent, methinks)
== Vaurien ==
* Get Vaurien (or similar) working on FxA, TS, Verifier, Sync (as appropriate)
* See https://github.com/crankycoder/ichnaea for a working example
* Open Issues for this:
** Ichnaea: https://github.com/mozilla/ichnaea/issues/148
** Ichnaea: https://github.com/mozilla/ichnaea/issues/169
** FxA: https://github.com/mozilla/fxa-auth-server/issues/558
** Tokenserver: https://github.com/mozilla-services/tokenserver/issues/44
** Verifier: https://github.com/mozilla/browserid-verifier/issues/50
** Sync: https://github.com/mozilla-services/server-syncstorage/issues/19


= Loads V2 =

Revision as of 18:23, 11 September 2014

== Comparison of Load Test Tools ==

== Tasks ==

* Tarek says:
** So what I was thinking: I can lead Loads v2 development with the help of QA, Ops, and Benjamin for SimplePush, and then slowly transition ownership to the QA and Ops teams - because at the end of the day, those are the two teams that should benefit the most from this tool.

== New Repos ==

== New Documentation ==

== Brown Bag and Info Session ==

== Brainstorming Loads and V2 ==

* What we need going forward
* What we want going forward
* Some issues (generalized - see the GitHub issues for details):
** Very long runs (>10 hours) are not really working. This is a design problem.
** Spinning up new slaves for big tests has not yet been automated. We have two slave boxes that run 10 agents each, which has been enough for most of our needs so far.
** The dashboard is sparse. It'll tell you what's going on, but we don't have any real reporting features yet.
** Running a test in a language other than Python is a bit of a pain (you need to do some zmq messaging).
* Stealing from Tarek's slide deck:
** Day-long runs don't really work
** Crappy dashboard
** No direct link to logs/CPU/memory usage of stressed servers
** No automatic slave deployment yet
** Only the Python client is really supported
** High bar to implement clients in Haskell/Go
* Figure out how to run Loads from personal AWS instances
* Monitoring
** What we currently have for Stage
** What do we want/need?
* Reporting
* Loads dashboard
** What about CPU/memory information (e.g. from atop, top)?
** Links to some snapshotted graphs
** Code version
** Red/yellow/green states
** Deployment bug
** Bugs opened
** Bugs closed
* Scaling the cluster dynamically (see V2)
* Quarterly results/trending
* Targets
** PM targets
** Expected targets
** Actual targets
* Wiki design
** One per service?
** One per service per deployment?
* Weekly reporting
** What does the QE team want to see?
* Getting the right data/metrics requirements from PMs, then extracting that information and displaying it on the Loads dashboard and/or in the OPs-built dashboards
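On the pain point above about non-Python clients: an external load generator has to speak the broker's zmq message protocol itself. As a rough illustration only - the field names below are hypothetical, not the actual Loads wire format, which lives in the loads repo - a runner in any language would serialize per-request results as JSON frames and push them to the broker's zmq socket:

```python
import json
import time

def make_hit_message(run_id, url, status, elapsed):
    """Build one hypothetical per-request result frame.

    The real Loads protocol is zmq-based and its exact schema is
    defined in the loads repo; these field names are illustrative.
    """
    return json.dumps({
        "data_type": "hit",      # kind of event being reported
        "run_id": run_id,        # identifies this load-test run
        "url": url,              # endpoint that was exercised
        "status": status,        # HTTP status code observed
        "elapsed": elapsed,      # request latency in seconds
        "started": time.time(),  # wall-clock timestamp
    })

# A non-Python client would push frames shaped like this over zmq:
frame = make_hit_message("run-42", "http://localhost:8000/", 200, 0.031)
decoded = json.loads(frame)
print(decoded["data_type"], decoded["status"])  # → hit 200
```

The "high bar" is that every such client must reimplement the framing, the socket handshake, and the aggregation semantics that the Python client gets for free - one motivation for revisiting this in V2.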