Revision as of 16:42, 4 December 2012
Talos
Talos is a Python performance testing framework that runs on Windows, Mac, and Linux; it is the versatile performance testing framework we use at Mozilla. It was created in 2007 to serve as a test runner for the performance tests Mozilla was already running, as well as to provide an extensible framework for new tests as they are created.
So, why Talos? Talos is the bronze automaton of Greek myth. Talos protected the island of Crete, throwing giant boulders at unwary seamen. He's also purported to have heated himself glowing hot and then embraced his enemies. Basically, he was awesome.
The Code
Talos lives in hg : http://hg.mozilla.org/build/talos
The pageloader extension lives in talos's repository: hg.mozilla.org/build/talos/talos/pageloader
The Test Machines
All Talos tests are run on a pool of 2.26 GHz Intel Core 2 Duo Mac minis with 2 GB of 1067 MHz DDR3 RAM.
The machines are imaged to comply with the Test Reference Platforms.
Order of Operations
For each test a new profile is installed in the browser (either an empty base profile or, in the case of the dirty tests, a profile with an existing places.sqlite). Profiles are not shared across test runs. To initialize the profile, the browser is opened and closed once before the test; this initial open/close is not included in the test results and is only for configuration purposes.
Regressions
To determine whether a point is "good" or "bad", we take 20-30 points of historical data and 5 points of future data. We compare these using a t-test. See https://wiki.mozilla.org/images/c/c0/Larres-thesis.pdf#page=74 . Regressions are mailed to the dev-tree-management mailing list. Regressions are calculated by the analyze_talos.py script, which uses a configuration file based on http://hg.mozilla.org/graphs/file/tip/server/analysis/analysis.cfg.template
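The comparison above can be sketched as a Welch's t-test on the two windows of data. This is a simplified illustration under our own assumptions (the function name and threshold interpretation are ours), not the exact analyze_talos.py implementation:

```python
from statistics import mean, stdev

def t_statistic(history, future):
    """Welch's t statistic comparing a small window of new points
    against 20-30 points of historical data (sketch only)."""
    v1 = stdev(history) ** 2
    v2 = stdev(future) ** 2
    se = (v1 / len(history) + v2 / len(future)) ** 0.5
    return (mean(future) - mean(history)) / se

# A large |t| suggests the new points differ from history,
# i.e. a possible regression (or improvement).
```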
Talos Tests
Talos tests per branch. Each cell gives the TBPL abbreviation followed by the internal test name in parentheses; N = _nochrome, P = _paint.
| TestName | Trunk | Aurora | Beta | Release | ESR |
|---|---|---|---|---|---|
| [tp5] | tp (tp5n) | tp (tp5n) | tp (tp5n) | tp (tp5row) | tp (tp5) |
| [tsvg] | s (tsvgr) | s (tsvgr) | s (tsvgr) | s (tsvg) | s (tsvg) |
| [tsvg_opacity] | s (tsvgr_opacity) | s (tsvgr_opacity) | s (tsvgr_opacity) | s (tsvg_opacity) | s (tsvg_opacity) |
| [tdhtml] | deactivated | deactivated | deactivated | deactivated | deactivated |
| [tdhtml_nochrome] | deactivated | deactivated | deactivated | deactivated | deactivated |
| [a11y] | o (a11yr P) | o (a11yr P) | o (a11yr P) | c (a11y.2 P) | c (a11y P) |
| [ts_paint] | o (ts_paint) | o (ts_paint) | o (ts_paint) | c (ts_paint) | c (ts_paint) |
| [tpaint] (aka twinopen/txul) | o (tpaint) | o (tpaint) | o (tpaint) | c (tpaint) | c (tpaint) |
| [dromaeo_css] | d | d | d | dr | dr |
| [dromaeo_dom] | d | d | d | dr | dr |
| [tsspider] | deactivated | deactivated | deactivated | c (tsspider.2 P) | c (tsspider P) |
| [tsspider_nochrome] | deactivated | deactivated | deactivated | n (tsspider.2 NP) | n (tsspider NP) |
| [xperf] | x (windows only) | | | | |
| [ts_places_generated_med] | p (P) | p (P) | p (P) | di | di |
| [ts_places_generated_max] | p (P) | p (P) | p (P) | di | di |
| [tscroll] | o (tscrollr) | o (tscrollr) | o (tscrollr) | c (tscroll.2) | c (tscroll) |
| [tresize] | c (tresize) | c (tresize) | c (tresize) | | |
| [sunspider 0.9.1] | d (sunspider) | d (sunspider) | d (sunspider) | | |
| [kraken] | d (kraken) | d (kraken) | d (kraken) | | |
| [v8 (version 7)] | d (v8_7) | d (v8_7) | d (v8_7) | | |
Talos Test Types
There are two different species of Talos tests:
- #Startup Tests : start up the browser and wait for either the load event or the paint event and exit, measuring the time
- #Page Load Tests : load a manifest of pages
Startup Tests
Startup tests launch Firefox and measure the time to the onload or paint events. Firefox is invoked with a URL to:
- http://hg.mozilla.org/build/talos/file/tip/talos/startup_test/startup_test.html for the onload event
- http://hg.mozilla.org/build/talos/file/tip/talos/startup_test/tspaint_test.html for the paint event
Page Load Tests
Many of the Talos tests use the page loader to load a manifest of pages. These tests load a specific page and measure the time it takes to load, scroll, or draw the page. In order to run a page load test, you need a manifest of pages to run. The manifest is simply a list of page URLs, one per line, e.g.:
http://www.mozilla.org
http://www.mozilla.com
Example: http://hg.mozilla.org/build/talos/file/tip/talos/page_load_test/svg/svg.manifest
Manifests may also specify that a test computes its own data by prepending a % in front of the line:
% http://www.mozilla.org
% http://www.mozilla.com
Example: http://hg.mozilla.org/build/talos/file/tip/talos/page_load_test/v8_7/v8.manifest
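A manifest in this format is simple to consume; as a rough sketch (parse_manifest is a hypothetical helper for illustration, not the actual pageloader parser):

```python
def parse_manifest(text):
    """Parse a pageloader-style manifest: one URL per line; a
    leading '%' marks a page that computes its own measurement.
    Sketch only -- not the real pageloader implementation."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        self_reporting = line.startswith("%")
        url = line.lstrip("% ")
        entries.append((url, self_reporting))
    return entries
```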
The manifest file you created should be referenced in your config file. For example, open sample.config and look for the lines referring to the test you want to run:
- name: tp4
  url: '-tp page_load_test/tp4.manifest -tpchrome -tpnoisy -tpformat tinderbox -tpcycles 10'
- -tp controls the location of your manifest
- -tpchrome tells Talos to run the browser with the normal browser UI active
- -tpnoisy means "generate lots of output"
- -tpformat controls the format of the results; the default is the format we send to displays like graphserver and tbpl.
- -tpcycles controls the number of times we run the entire test.
Running Tp the Automation Way
In our automation, we run under many more restrictions than normal users. One of those restrictions is that our automation machines are walled off from real-world networks. Because of this, and because we want to test pure page-loading and rendering time of Firefox, we serve the pages from localhost using Apache, thus eliminating all network latency and uncertainty. You've probably noticed this if you looked at talos/page_load_test/tp4.manifest.
To do this, we construct full downloads of sites in our manifest and they are placed on the automation slave at run time. Because we cannot at this time distribute our full page load test set, I'll walk through how these are set up and show you how to make your own. Note that our next version of the page load set will be distributable, so soon this won't be an issue.
In the meantime, here are the instructions:
- Use this script, or use the following wget command, to fetch a page and everything it links to in order to have a complete page for offline use:
$> wget -p -k -H -E -erobots=off --no-check-certificate -U "Mozilla/5.0 (firefox)" --restrict-file-names=windows --restrict-file-names=nocontrol $URL -o outputlog.txt
- Once you have a cache of pages, install Apache:
$> sudo apt-get install apache2
- Copy your page into the proper location for Apache to serve it. Note that I like to create a page_load_test directory to separate talos from anything else on the webserver. So with Apache defaults, that's something like:
$> mkdir /var/www/page_load_test; cp -R <dir> /var/www/page_load_test/.
- Now, add the local URL into your manifest:
http://localhost/page_load_test/<dir>
- Run the tp tests as above, pointing the config file at your manifest.
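If you would rather not install Apache, any static file server bound to localhost will do. Here is a sketch using Python's standard library (make_server, the directory path, and the port are our own illustration, not part of Talos):

```python
import functools
import http.server

def make_server(root, port=8000):
    """Serve a local pageset directory over HTTP; a stand-in for
    the Apache setup above (root path and port are examples)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root)
    return http.server.HTTPServer(("127.0.0.1", port), handler)

# make_server("/var/www/page_load_test").serve_forever()
```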
Paint Tests
Paint tests measure the time until both the MozAfterPaint and the onload events have fired, instead of just the onload event.
Currently we run _paint tests for these tests:
- ts_paint
- tpaint
- tp5n
- sunspider
- tdhtml
- a11y
- tscroll
NoChrome Tests
All tests run through the pageloader extension can be run with or without browser chrome. The tests load the same pages as described above in either case. The majority of tests are run with browser chrome enabled. On mobile (native Android builds) we have to run everything as nochrome, since we don't support additional XUL windows.
The ability to run tests without the browser chrome opens up the ability to further isolate performance regressions.
Currently we run tdhtml as nochrome.
Adding a new test
Everybody wants moar tests, but there is a lot that goes into adding a Talos test:
- file a bug to add appropriate rows to graphserver.
- file an additional IT bug to deploy the sql changes on the production and staging graph servers
- if this is adding new pages, ensure these sql changes include page definitions. NOTE: this one detail is usually forgotten since it is so rare
- file a bug to add tests to talos.
- create a talos.zip file and file a releng bug to upload it to the build network
- create a patch for buildbot to add a definition of this new test and turn it on for the current branch
- create a m-c (not inbound) patch to modify testing/talos/talos.json, get it reviewed and landed
- create bugs for each time we uplift from mozilla-central->aurora->beta->release->esr to turn on your test
- if this is an update to an existing test that could change the numbers, this needs to be treated as a new test and run side by side for a week to get a new baseline for the numbers.
- file a bug to get tbpl updated with a new letter to track this test
While that is a laundry list of items to do, if you are a developer of a component, just talk to the a*team (jhammel or jmaher) and they will handle the majority of the steps above. When adding a new test, we really need to understand what we are doing. Here are some questions you should know the answer to before adding a new test:
- What does this test measure?
- Does this test overlap with any existing test?
- What is the unit of measurement that we are recording?
- What would constitute a regression?
- What is the expected range in the results over time?
- Are there variables or conditions which would affect this test?
- browser configuration (prefs, environment variables)?
- OS, resources, time of day, etc... ?
- Independent of observation? Will this test produce the same number regardless of what was run before it?
- What considerations are there for how this test should be run and what tools are required?
Addon Testing
Talos currently runs a subset of tests against addons. For these tests the descriptions are the same as above (ts, tp) but the given addon is installed before test execution.
We attempt to have the correct preferences set for each individual addon to allow them to run correctly (skipping first-run pages, being enabled) and generate useful numbers. If preferences are missing please file a bug in [Testing/Talos].
All addon performance results are displayed on the [AddonTester waterfall] (including full log and any errors generated). The comparative results are found on the [Slow Performing Addons] mainpage.
- On Demand Addon Performance Testing Buildbot Setup
- On Demand Addon Performance Testing Trigger Script
Mobile testing
See the talos section of the main Android development page for details on this.
How are the numbers calculated?
To ensure that the base profile is correctly installed for every test, the browser is opened and closed once before test execution. This first cold open is excluded from the test result calculation.
For "cold" tests the caches are cleared after this initial open/close - in this way the browser is configured and ready but returned to a "cold" state.
All tests are run with newly installed profiles - profiles are not shared across test runs.
Pageload style tests (tp5, tdhtml, etc)
The overall test number is determined by first calculating the median page load time for each page in the set (excluding the max page load per individual page). The max median from that set is then excluded and the average is taken; that becomes the number reported to the tinderbox waterfall.
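That averaging scheme can be sketched in a few lines of Python (pageload_score is a hypothetical helper illustrating the rule above, not the actual graphserver code):

```python
from statistics import mean, median

def pageload_score(results):
    """results maps page name -> list of load times (ms), one per
    cycle. Sketch of the pageload averaging described above."""
    medians = []
    for times in results.values():
        trimmed = sorted(times)[:-1]   # drop the max load per page
        medians.append(median(trimmed))
    medians.remove(max(medians))       # drop the max median overall
    return mean(medians)
```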
Ts style tests (ts, twinopen, ts_cold, etc)
The overall test number is calculated by excluding the max opening time and taking an average of the remaining numbers.
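As a sketch (startup_score is a hypothetical name for illustration):

```python
def startup_score(times):
    """Exclude the max opening time and average the rest,
    per the Ts-style rule described above (sketch only)."""
    trimmed = sorted(times)[:-1]  # drop the slowest open
    return sum(trimmed) / len(trimmed)
```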
Where are the numbers stored?
The results of every talos test are reported to the Perfomatic graph server. When running locally, you can specify output to a file using the --results_url argument to PerfConfigurator, e.g.
PerfConfigurator --activeTests tsvg -e `which firefox` -o tsvg.yml --results_url file://${PWD}/tsvg.txt
Background Information
Naming convention
't' is pre-pended to the names to represent 'test'. Thus, ts = 'test startup', tp = 'test pageload', tdhtml = 'test dhtml'.
History of tp Tests
tp
The original tp test created by Mozilla to test browser page load time. Cycled through 40 pages. The pages were copied from the live web during November, 2000. Pages were cycled by loading them within the main browser window from a script that lived in content.
tp2/tp_js
The same tp test but loading the individual pages into a frame instead of the main browser window. Still used the old 40 page, year 2000 web page test set.
tp3
An update to both the page set and the method by which pages are cycled. The page set is now 393 pages from December, 2006. The pageloader is re-built as an extension that is pre-loaded into the browser chrome/components directories.
tp4
Updated web page test set to 100 pages from February 2009.
tp4m
This is a smaller pageset (21 pages) designed for mobile Firefox. This is a blend of regular and mobile friendly pages.
We landed on this on April 18th, 2011 in bug 648307. This runs for Android and Maemo mobile builds only.
tp5
Updated web page test set to 100 pages from April 8th, 2011. Effort was made for the pages to no longer be splash screens/login pages/home pages but to be pages that better reflect the actual content of the site in question.
Let's Run The Tests!
I have a patch for Talos, what tests do I run?
If you are making changes to talos obviously running the tests locally will be the first step. The next logical step is to run the tests on Try server (try: -b o -p all -u none -t all).
Testing Locally
Testing locally involves running some subset of the Talos tests on desktop and possibly mobile. Obviously not all permutations of tests and the ways of running them can be tried, so common sense should be used as to what is run. You may also want to run Talos' internal unittests: http://hg.mozilla.org/build/talos/file/tip/tests
You should tailor your choice of tests to pick those that cover what you've changed programmatically, but in general you should probably run at least one startup test and one pageloader test. A good baseline might be:
# refer to running locally for more details
talos -n -d -a ts:tsvg -e `which firefox` --develop --datazilla-url output.json --mozAfterPaint
Testing on Try Server
To speed up your development time and everybody else who uses Try server, there is no need to run all tests on all platforms unless this is a major change. Here are some guidelines to follow for testing patches to talos (or dependent modules):
- If your patch touches one file and is very minor, get minimal testing on it (1 mobile test, 1 windows test, local testing): try: -b o -p win32,android -u none -t svgr,remote-tsvg
- If your patch affects setup of the tests (config or profile) or launching of the process and is very minimal, get minimal testing on it (1 mobile test, 1 windows test, local testing): try: -b o -p win32,android -u none -t svgr,remote-tsvg
- If your patch changes a test (add or edit), test that test locally (remember a new test will need graph server changes and buildbot changes)
- If your patch changes mobile testing, test all mobile tests and 1 desktop test. There is no good way of doing this with a single try push; I recommend two try runs:
try: -b o -p win32 -u none -t svgr
try: -b o -p android -u none -t all
- If your patch changes results or output processing, run ts, tp5 on mobile, windows and locally: try: -b o -p win32,android -u none -t tpn,chromez,remote-ts,remote-tp4m_nochrome
- If your patch changes any import statement (or code referenced in the import), please test on windows, linux, and mobile as these all run different versions of python: try: -b o -p linux,win32,android -u none -t all
- If your patch changes a lot of stuff around, if you are not sure, or if you are going to deploy a new talos.zip, it is strongly recommended to run all talos tests: try: -b o -p all -u none -t all
If you are reviewing a talos patch, it is your responsibility to recommend the proper testing approach. If you think it needs more than these guidelines, call it out. If it needs less, call it out also.
Are my numbers ok?
The best way to answer this question is to push to try server and compare the reported numbers from the logs (use tbpl as a log parser) and compare that with the [graph server]. I recommend using tbpl to open the link to the graphs.
If you are planning on landing on mozilla-central, look at tests from mozilla-central. Be aware of PGO vs Non PGO and Chrome vs Non Chrome. TBPL makes this a moderately pain free process (i.e. about 30 minutes). This is one of the big problems we are solving with datazilla.
Using try server
If you have access to generate try builds you can also have performance tests run against a custom version of talos. The performance results will be generated on the same machines that generate the talos results for all check-ins on all branches. This involves a few steps:
- Run create_talos_zip.py from the root of your talos directory and upload the file somewhere that the build system can find it (e.g. http://people.mozilla.org/~wlachance/talos.zip)
- Check out a copy of Mozilla central
- Modify the file "testing/talos/talos.json" to point to the copy of talos you uploaded earlier
- Push this change to try server using the right syntax (you can use TryChooser to help with this: recommended is to test talos thoroughly, but standard unit tests can be skipped)
A bit more information can be found in this blog post from armenzg: http://armenzg.blogspot.com/2011/12/taloszip-talosjson-and-you.html
Running locally - Source Code
http://hg.mozilla.org/build/talos/archive/tip.tar.gz may be installed in the usual manner for a python package (easy_install, etc). However, `pip` on windows fails due to the pywin32 dependency. See bug 787496
For the majority of the tests, we include test files and tools out of the box. We need to do these things:
- clone talos:
hg clone http://hg.mozilla.org/build/talos
- run the install script:
cd talos
python INSTALL.py
(Ignore errors like "fatal error: 'yaml.h' file not found"; you can download and install PyYAML from http://pyyaml.org/wiki/PyYAML to get rid of them, but it is not necessary.) The install script:
- creates a Virtualenv in the same directory as 'INSTALL.py'
- installs the talos python package into the virtualenv, including its MozBase dependencies via 'python setup.py develop'
- (alternatively, you can perform these steps yourself)
- activate the virtualenv:
(on windows):
Scripts\activate.bat
(on osx/linux):
. bin/activate
- unpack a copy of firefox somewhere (for this example, we'll use `which firefox` as if firefox was on your path)
- setup a webserver if you're not using the '--develop' flag (WE STRONGLY RECOMMEND USING THE --develop FLAG)
- setup apache or similar webserver to have http://localhost -> the talos subdirectory of the talos checkout
- alternatively you can use the --develop flag to PerfConfigurator which configures to use a python webserver, mozhttpd, as shown below
- run tests:
talos -n -d --develop --executablePath pathtofirefox --activeTests ts --results_url ts.txt --datazilla-url ts.json --output ts_desktop.yml --mozAfterPaint
- --develop indicates to run in develop mode and to set up a webserver for you
- --executablePath tells Talos where the firefox installation we want to run is located
- we have pathtofirefox as an example, you can use '~/mozilla/objdir/dist/bin/firefox' or whatever the full path is to your firefox executable that will be running the tests.
- --activeTests is a list of tests we want to run separated by ':'. In this example, we are running the startup test, ts.
- --results_url indicates a HTTP URL to POST results to or a file to append to
- --output is the new config file we want to generate
You can use `talos --help` to get a complete list of options
If you're looking to run remote talos, instructions are at: https://wiki.mozilla.org/Mobile/Fennec/Android#talos
We do not include the tp5 or similar pagesets due to legal restrictions.
Talos will refuse to run if you have an open browser.
If you want to load an extension while running Talos, you want a few more command line arguments:
talos -n -d --executablePath=../firefox/firefox --sampleConfig=sample.config --activeTests=ts --extension=mozmill-1.5.2-sb+tb+fx+sm.xpi --addonID=mozmill@mozilla.com --output=my.config
- --extension is the file of the XPI we want to install in the profile
- --addonID is the ID of the addon to install
How Talos is Run in Production
- buildbot constructs commands to launch PerfConfigurator from a config file: http://hg.mozilla.org/build/buildbot-configs/raw-file/tip/mozilla-tests/config.py ; there are a number of suites, each of which may contain multiple tests
- a slave invokes run_tests.py on the generated YAML Talos configuration
- Talos runs the tests and measures results, which are uploaded to graphserver (after being suitably averaged per-page for Pageloader tests)
- (Talos also uploads raw results to datazilla)
- the graphserver performs averaging across the pageset (for Pageloader tests) or across cycles (for Startup tests) and returns a number via HTTP to Talos which is then printed to the log
- TBPL receives the name of the suite from buildbot. These are correlated to TBPL letters via http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/tip/js/Config.js . The computed results from graphserver are scraped from the log and displayed when a TBPL suite letter is clicked on in the lower right corner
Running locally - Standalone Talos (DEPRECATED)
Talos, the pageloader, and a distributable web page test for tp have been packaged together into Standalone Talos. Following the given directions, you will be able to run all the Talos tests on your local machine. The results you collect will not be directly comparable to production Talos runs on Mozilla machines, but by testing the browser with and without whatever changes you are interested in, you should be able to get an initial check on any performance regressions.
StandaloneTalos is deprecated: https://bugzilla.mozilla.org/show_bug.cgi?id=714659 and see https://wiki.mozilla.org/Buildbot/Talos#Running_locally_-_Source_Code for how to run modern Talos
Bugs
Talos bugs are filed under Testing/Talos, such as requests for new tests or repairs to the talos code itself.
Talos bugs that need to be staged and checked in are marked talos-checkin-needed in the whiteboard
Graph server bugs are filed under Webtools/Graph server.
Talos machine maintenance bugs are filed under mozilla.org/Release Engineering, such as bugs having to do with the hardware that talos is run on or requests to run extra talos tests against a given build.
A 2012Q1 effort is to get Talos on mozharness in production. See the tracking bug: https://bugzilla.mozilla.org/show_bug.cgi?id=713055
Other usage of Talos
- https://wiki.mozilla.org/Auto-tools/Projects/JetPerf
- compare-talos : https://bitbucket.org/mconnor/compare-talos deployed at http://perf.snarkfest.net/compare-talos/
Happy Talos testing!
