Loop/Test/Loadtesting
{{Draft}}
{{LastUpdated}}
= Overview =
== Summary ==
Loadtesting Loop is currently performed with the loadsv1 tool.
Loadsv2 is under development (stay tuned for more info...)


Loadtesting should be performed against
'''Run Test'''
There are 3 commands available for testing with loadsv1:
* [[#make megabench|<code>$ make megabench</code>]]
* [[#make test|<code>$ make test</code>]]
* <code>$ make bench</code> (use megabench instead)
 




Of the three tests available, <code>make megabench</code> will probably be the one you use most.
<code>make test</code> is largely for a quick smoke test and <code>make bench</code> doesn't do anything you can't do with the other two.
For information about test monitoring:
[[Loop/Test/Monitoring]]








= Loadtest - Tweaking =
Load tests can be configured in the code:
* Test server URL
** https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py#L15
* Error percentages
** https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py#L19-L22
 
 
 
 
= Loadtest Monitoring =  
== Monitoring Loop Stage ==
* Loads dashboard:
** http://loads.services.mozilla.com
* Cluster status
** Check directly from the Loads Cluster dashboard:
*** Agent statuses
*** Launch a health check on all agents
 
[[#toc]]
 
== Log Monitoring ==
'''Logs'''
* /media/ephemeral0/msisdn-gateway/msisdn-gateway_err.log
* /media/ephemeral0/msisdn-gateway/msisdn-gateway_out.log
* /media/ephemeral0/nginx/logs/default.access.log (not in use)
* /media/ephemeral0/nginx/logs/default.error.log (not in use)
* /media/ephemeral0/nginx/logs/msisdn-gateway.access.log
* /media/ephemeral0/nginx/logs/msisdn-gateway.error.log
* /var/log/circus.log
* /var/log/hekad/msisdn_gateway.stderr.log
* /var/log/hekad/msisdn_gateway.stdout.log
 
== HTTP Access Log - Parsing ==
 
'''unique REST activity with counts'''
<pre>
$ grep "HTTP/" /media/ephemeral0/nginx/logs/loop_server.access.log | awk '{print $6" "$3" "}' | sort | uniq -c
</pre>
 
'''unique REST activity with counts (from a .gz file)'''
<pre>
$ zgrep -a "HTTP/"  /media/ephemeral0/nginx/logs/loop_server.access.log-20150114.gz |  awk '{print $6" "$3" "}' | sort | uniq -c
</pre>
 
 
'''return only the interesting stuff'''
<pre>
$ grep -vE '(200|101|499|201|204) ' loop_server.access.log
</pre>
 


'''Tail multiple logs'''
<pre>
$ cd /media/ephemeral0/nginx/logs
$ tail -f loop_server.access.log loop_server.error.log  default.access.log default.error.log
</pre>


[[#toc]]
== Reference ==
* For all other monitoring, see the following section:
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#Monitoring_the_Stage_Environment
* OPs and Infra
** https://github.com/mozilla-services/puppet-config/issues
** https://github.com/mozilla-services/svcops/issues
* Loads Tool and Cluster
** http://loads.services.mozilla.com
** https://loads.readthedocs.org/en/latest
** https://github.com/mozilla-services/loads/issues
** https://github.com/mozilla-services/loads-web/issues
** https://github.com/mozilla-services/loads-aws/issues
[[#toc]]




= Reference =
== Load Test Tool Client/Host ==
* It is always best to configure an AWS instance as the host for all load testing.
* All load tests can now run on the localhost (the AWS instance) or against the new Loads Cluster. See the following links for more information:
** https://wiki.mozilla.org/QA/Services/LoadsV1ClientTestHost
** https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1
* Installation:
<pre>
$ git clone https://github.com/mozilla-services/loop-server.git
$ cd loop-server
</pre>
Note: You may want to check out a specific branch for testing instead of defaulting to master.
<pre>
$ npm install
$ ulimit -S -n 2048
$ npm test     # requires a running redis server (see OLD Notes)
$ cd loadtests
$ make build
$ make test
</pre>
[[#toc]]


== Monitoring ==
* [[Loop/Test/Monitoring]]


= OLD Notes =
Coverage report can be found here:
/loop-server/coverage/lcov-report/index.html
* The <code>npm test</code> step requires the redis server to be installed and running:
** Mac:
<pre>
$ brew install redis
$ redis-server /usr/local/etc/redis.conf
</pre>
** Ubuntu Linux:
<pre>
$ sudo apt-get install redis-server
$ sudo /usr/bin/redis-server /etc/redis/redis.conf
$ sudo tail -f /var/log/redis/redis-server.log
</pre>
** RHEL Linux: install redis from http://download.redis.io/releases, then run the following (or similar):
<pre>
$ /usr/local/bin/redis-server /home/ec2-user/redis-2.8.9/redis.conf
</pre>
* Note:
** This will install a local copy of the Loads tool for use with the Loop-Server.


== Ops / Infra ==
* https://github.com/mozilla-services/puppet-config/issues
* https://github.com/mozilla-services/svcops/issues


== Loads Tool and Cluster ==
* http://loads.services.mozilla.com
* https://loads.readthedocs.org/
* https://github.com/mozilla-services/loads/issues
* https://github.com/mozilla-services/loads-web/issues
* https://wiki.mozilla.org/QA/Services/LoadsV1ClientTestHost
* https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1




[[#toc]]
== Using the Loads V1 Services Cluster for the Loop-Server in Stage ==
* By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
* Changes were made to the Makefile and the load test, along with some associated config files (for test, bench, megabench), to use the cluster.
* Stage environment:
<pre>
$ make megabench SERVER_URL=https://loop.stage.mozaws.net
</pre>
* By default, the Loop-Server in Stage is configured to talk to our mock server:
** https://loop-delayed-response.stage.mozaws.net/
* To hit the partner test servers, the following configuration file will need to be updated by Ops:
** /data/loop-server/config/settings.json
* Ask Ops to toggle that configuration file and restart the Loop-Server in Stage.
* REFs:
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#Loop_Server_Stage_Environment
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#Loop_Server_Stage_Details
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#Loop_Server_Configuration
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#Loop_Mock_Server_Stage_Details
** https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1
** https://github.com/mozilla/browserid-verifier/tree/master/loadtest
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#MSISDN_Gateway_Server_Stage_Details
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#MSISDN_Mock_Server_Stage_Details
== Using the Loads V1 Services Cluster for the MSISDN-Gateway ==
* By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
* Changes were made to the Makefile and the load test, along with some associated config files (for test, bench, megabench), to use the cluster.
* Stage environment:
<pre>
$ make megabench SERVER_URL=https://msisdn.stage.mozaws.net
</pre>
* This environment also contains its own mock server: http://omxen.dev.mozaws.net
* The configuration file on the Stage server: /data/msisdn-gateway/config/production.json
* REFs
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#MSISDN_Gateway_Server_Stage_Details
** https://wiki.mozilla.org/QA/Services/LoopTestEnvironments#MSISDN_Mock_Server_Stage_Details
** https://wiki.mozilla.org/QA/Services/LoadsToolsAndTesting1
** https://github.com/mozilla/browserid-verifier/tree/master/loadtest
== Configuring The Load Tests ==
* Makefile
** The SERVER_URL constant can be changed.
* Config files
** For make test (Loop-Server and MSISDN-Gateway):
*** Number of hits
*** Number of concurrent users
** For make bench (Loop-Server and MSISDN-Gateway):
*** Number of concurrent users
*** Duration of test
** For make megabench (Loop-Server and MSISDN-Gateway):
*** Number of concurrent users
*** Duration of test
*** Include file (this is code dependent)
*** Python dependencies (this is code dependent)
*** Broker to use for testing (leave as defined for now; this is the broker in the Loads Cluster)
*** Agents to use for testing (default is 5, max is currently 20, but this depends on the number of concurrent load tests running)
*** Detach mode (leave as defined for now; it automatically detaches from the load test once it starts on the localhost)
*** Observer (email or IRC; the default is the IRC #services-dev channel)
* Loop-Server load test code
** The Loop-Server load test cannot currently be configured in the code.
* MSISDN-Gateway load test code
** The MSISDN-Gateway load test can be configured in the code - see the following lines:
** Test server URL: https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py#L15
** Error percentages: https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py#L19-L22
* General REFs:
** https://github.com/mozilla-services/loop-server/blob/master/loadtests/loadtest.py
** https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py
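Because <code>SERVER_URL</code> is a Makefile variable, it can be overridden on the command line, as in the Stage commands earlier on this page. The shell sketch below picks the Stage URL for a service and runs one of the Makefile targets against it. The service names <code>loop</code> and <code>msisdn</code> and both function names are hypothetical, not part of the load test tooling; the sketch assumes you source it and run it from the service's <code>loadtests</code> directory after <code>make build</code>.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of loop-server or msisdn-gateway).
# Source this file, then run from the service's loadtests/ directory.

# Map a short (made-up) service name to the Stage URL listed on this page.
stage_url() {
  case "$1" in
    loop)   echo "https://loop.stage.mozaws.net" ;;
    msisdn) echo "https://msisdn.stage.mozaws.net" ;;
    *)      echo "unknown service: $1 (expected loop or msisdn)" >&2; return 1 ;;
  esac
}

# Run a Makefile target (test, bench or megabench; default megabench)
# with SERVER_URL overridden on the make command line.
run_stage_loadtest() {
  local service=$1 target=${2:-megabench} url
  url=$(stage_url "$service") || return 1
  make "$target" SERVER_URL="$url"
}
```

For example, <code>run_stage_loadtest msisdn megabench</code> expands to <code>make megabench SERVER_URL=https://msisdn.stage.mozaws.net</code>.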
== Test Coverage and Stats ==
* Basic tweakable values for all load tests
** users = number of concurrent users/agent
** agents = number of agents to use from the cluster (the test errors out if more agents are requested than are available)
** duration = test length in seconds
** hits = number of rounds/hits/iterations (1 or more)
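These values can be used to sanity-check the size of a run before launching it. The sketch below assumes, as the list above says, that <code>users</code> is counted per agent, so the cluster-wide concurrency is <code>users * agents</code>. The helper name is hypothetical, and the 20-agent cap is the current limit quoted earlier on this page, which may change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate total concurrent users for a cluster run.
# ASSUMPTION: "users" is per agent, so total = users * agents.
MAX_AGENTS=20   # current cluster limit quoted on this page; may change

total_concurrent_users() {
  local users=$1 agents=$2
  if [ "$agents" -lt 1 ] || [ "$agents" -gt "$MAX_AGENTS" ]; then
    echo "error: agents must be between 1 and $MAX_AGENTS" >&2
    return 1
  fi
  echo $(( users * agents ))
}
```

With the default of 5 agents and 20 users per agent, <code>total_concurrent_users 20 5</code> prints <code>100</code>.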
== Analyzing the Results ==
* There are several methods and tools for analyzing the load test results.
* 1. Using the Loads Services Cluster dashboard
** All load tests using this cluster generate a live report and a run report, available on this site:
*** http://loads.services.mozilla.com
== Debugging the Issues ==
* There are several methods and tools for debugging the load test errors and other issues.
* 1. Important logs for Loop-Server (per server)
** /var/log/circus.log
** /var/log/loop_err.log
** /var/log/loop_out.log
** /var/log/hekad/loop.stdout.log
** /var/log/hekad/loop.stderr.log
** /var/log/nginx/access.log
** /var/log/nginx/error.log
* 2. Important logs for MSISDN-Gateway (per server)
** TBD
* Acceptable/Unacceptable Loop-Server errors:
** hekad loop.stderr.log: the following are acceptable:
*** <code>Decoder 'LoopServer-LoopServerDecoder' error: Failed parsing</code>
*** <code>Plugin 'AggregatorOutput' error: writing to heka.shared....</code>
** nginx logs:
*** The 200s (good stuff) and 101s (websockets) are acceptable.
*** The 499s are an artifact of the current load testing tool (V1). You should only see them at the end of the load test.
*** Right now, we are getting a lot of 404s and 307s. They all appear to be caused by bots. There is a bug open about this.
*** Any percentage of 405s, 502s, or 503s is not acceptable.
** Application logs:
*** The same 404s and 307s caused by bots appear here too. See the loop_server.out.log file.
*** /var/log/loop_err.log: the following is acceptable: <code>connect: res.on("header"): use on-headers module directly</code>
** In the Loads Cluster dashboard, watch for the following errors/failures:
*** <code>string indices must be integers</code>
*** <code>No JSON object could be decoded</code>
*** <code>'hawk-session-token'</code>
* Acceptable/Unacceptable MSISDN-Gateway errors:
** The updated load test does generate a certain percentage of errors:
*** https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py#L19-L22
** So, expect to see a predefined percentage of 204s and 400s, along with the usual 200s, in the nginx access logs.
** The msisdn-gateway app logs should be clean, with just msisdn and test data.
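The nginx rules above (no 405s, 502s, or 503s for the Loop-Server; a predefined share of 204s and 400s for the MSISDN-Gateway) can be checked against a saved access log with a small shell sketch like the one below. It assumes the status code is whitespace-separated field 9, as in nginx's default "combined" log format; the Loop access log may use a different format (the parsing commands above print fields 6 and 3), so set <code>STATUS_FIELD</code> to match. The function names are hypothetical, not part of the load test tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for summarizing nginx access logs after a load test.
# ASSUMPTION: the HTTP status is whitespace-separated field 9 (nginx
# "combined" format). Override STATUS_FIELD if your log format differs.
STATUS_FIELD="${STATUS_FIELD:-9}"

# Print "<status> <count> <percent>" lines, sorted by status code.
status_breakdown() {
  awk -v f="$STATUS_FIELD" '
    { count[$f]++; total++ }
    END {
      for (s in count)
        printf "%s %d %.1f%%\n", s, count[s], 100 * count[s] / total
    }' "$1" | sort
}

# Return non-zero if any 405, 502 or 503 responses appear in the log.
check_unacceptable() {
  local bad
  bad=$(awk -v f="$STATUS_FIELD" '
    $f == "405" || $f == "502" || $f == "503" { n++ }
    END { print n + 0 }' "$1")
  if [ "$bad" -gt 0 ]; then
    echo "FAIL: $bad responses with status 405/502/503 in $1"
    return 1
  fi
  echo "OK: no 405/502/503 responses in $1"
}
```

For the MSISDN-Gateway, compare the 204 and 400 percentages from <code>status_breakdown</code> with the error percentages configured in its loadtest.py.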