TestEngineering/Performance/Talos/Tests
== At a glance ==
* Tests are defined in [https://searchfox.org/mozilla-central/source/testing/talos/talos/test.py testing/talos/talos/test.py]
* Treeherder abbreviations are defined in [https://searchfox.org/mozilla-central/source/taskcluster/ci/test/talos.yml taskcluster/ci/test/talos.yml]
* Suites are defined for production in [https://searchfox.org/mozilla-central/source/testing/talos/talos.json testing/talos/talos.json]
== Test lifecycle ==
* Taskcluster schedules [https://searchfox.org/mozilla-central/source/taskcluster/ci/test/talos.yml talos jobs]
* Taskcluster runs a Talos job on a hardware machine when one is available; this is bootstrapped by [https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/talos.py mozharness]
** mozharness downloads the build and talos.zip (found in [http://hg.mozilla.org/mozilla-central/file/tip/testing/talos/talos.json talos.json]), and creates a virtualenv for running the test
** mozharness [[TestEngineering/Performance/Talos/Running#How_Talos_is_Run_in_Production|configures the test and runs it]]
** After the test is completed, the data is uploaded to [https://treeherder.mozilla.org/perf.html#/graphs Perfherder]
* Treeherder displays a green (all OK) status and has a link to [https://treeherder.mozilla.org/perf.html#/graphs Perfherder]
* 13 pushes later, [http://hg.mozilla.org/graphs/file/tip/server/analysis/analyze_talos.py analyze_talos.py] is run, comparing your push to the previous 12 pushes and the next 12 pushes to look for a [[TestEngineering/Performance/Talos/Data#Regressions|regression]]
** If a regression is found, it will be posted on [https://treeherder.mozilla.org/perf.html#/alerts Perfherder Alerts]
== Test types ==
There are two different species of Talos tests:
* [[#Startup Tests|Startup tests]]: Start up the browser, wait for either the load event or the paint event, and exit, measuring the time
* [[#Page load|Page load tests]]: Load a manifest of pages
In addition we have some variations on existing tests:
* [[#Heavy Tests|Heavy tests]]: Run tests with the heavy user profile instead of a blank one
* [[#Web extension|Web extension tests]]: Run tests with a web extension to see the performance impact extensions have
Some tests measure different things:
* [[#Paint Tests|Paint tests]]: These measure events from the browser like MozAfterPaint, etc.
* [[#ASAP Tests|ASAP tests]]: These tests go as fast as possible and typically measure how many frames we can render in a time window
* [[#Benchmarks]]: These are benchmarks that measure specific items and report a summarized score
=== Startup Tests ===
[https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test Startup tests] launch Firefox and measure the time to the onload or paint events. We run this in a series of cycles (20 by default) to generate a full set of data. The current startup tests are:
* [[#ts_paint|ts_paint]]
* [[#tpaint|tpaint]]
* [[#tresize|tresize]]
* [[#sessionrestore.2Fsessionrestore_no_auto_restore.2Fsessionrestore_many_windows|sessionrestore / sessionrestore_no_auto_restore / sessionrestore_many_windows]]
=== Page load ===
Many of the talos tests use the page loader to load a manifest of pages.
These tests load a specific page and measure the time it takes to load, scroll, or draw it. In order to run a page load test, you need a manifest of pages to run. The manifest is simply a list of page URLs, one per line.
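For illustration, a minimal manifest might look like this (the URLs below are hypothetical placeholders, not entries from a real pageset):
<pre>
http://localhost/page_load_test/example_one/index.html
http://localhost/page_load_test/example_two/index.html
http://localhost/page_load_test/example_three/index.html
</pre>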
=== Heavy Tests ===
All our testing is done with empty, blank profiles, which is not ideal for finding issues that affect end users. We recently undertook a task to create a profile that is updated daily so it stays modern and relevant. It browses a variety of web pages and has history and cache, giving us a more realistic scenario.
Currently we have issues with this on Windows (it takes too long to unpack the profile files), so we have turned it off there. Our goal is to run this on the basic pageload and startup tests.
=== Web extension ===
Firefox has switched to WebExtensions, which use different code paths and APIs than legacy add-ons. Historically we haven't tested with add-ons (other than our test add-ons) and so have been missing out on common slowdowns. In 2017 we started running some startup and basic pageload tests with a web extension in the profile ({{bug|1398974}}). We have updated the extension to be more real-world and will continue to do so.
=== Paint Tests ===
Paint tests measure the time to receive both the [https://developer.mozilla.org/en-US/docs/Web/Events/MozAfterPaint MozAfterPaint] and OnLoad events, instead of just the OnLoad event. Most tests now look for this unless they are an ASAP test or an internal benchmark.
=== ASAP Tests ===
We have a variety of tests which we now run in ASAP mode, where we render as fast as possible by disabling vsync and letting rendering iterate as fast as it can using requestAnimationFrame(). Some of the original tests have been replaced by their 'x' versions, which measure in this mode.
The ASAP tests are:
* [[#Basic compositor video|basic_compositor_video]]
* [[#displaylist_mutate|displaylist_mutate]]
* [[#glterrain|glterrain]]
* [[#rasterflood_svg|rasterflood_svg]]
* [[#rasterflood_gradient|rasterflood_gradient]]
* [[#tsvgx|tsvgx]]
* [[#tscrollx|tscrollx]]
* [[#tp5o_scroll|tp5o_scroll]]
* [[#tabswitch|tabswitch]]
* [[#tart|tart]]
=== Benchmarks ===
Many tests have internal benchmarks which we report as accurately as possible. These are the exceptions to the general rule of calculating the suite score as a geometric mean of the subtest values (which are median values of the raw data from the subtests).
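As a minimal sketch of that general rule (the subtest names and replicate values below are hypothetical; the real summarization lives in the Talos harness source):
<pre>
import math

def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical raw replicates for two subtests (times in ms).
subtests = {
    "subtest_a": [105.0, 101.0, 99.0, 102.0],
    "subtest_b": [40.0, 42.0, 41.0, 39.0],
}

# Reduce each subtest to the median of its replicates, then take the
# geometric mean of the medians to get the suite score.
suite_score = geometric_mean([median(reps) for reps in subtests.values()])
print(round(suite_score, 2))  # ~64.12
</pre>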
Tests which are imported benchmarks are:
* [[#ARES-6|ARES-6]]
* [[#Dromaeo Tests|Dromaeo]]
* [[#jetstream|JetStream]]
* [[#kraken|Kraken]]
* [[#motionmark|MotionMark]]
* [[#speedometer|speedometer]]
* [[#stylebench|stylebench]]
== Row major vs. column major ==
To get more stable numbers, tests are run multiple times. There are two ways that we do this: row major and column major. Row major means each test is run multiple times and then we move to the next test (and run it multiple times). Column major means that each test is run once one after the other and then the whole sequence of tests is run again.
More background information about these approaches can be found in Joel Maher's [https://elvis314.wordpress.com/2012/03/12/reducing-the-noise-in-talos/ Reducing the Noise in Talos] blog post.
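As an illustration of the difference, here is a minimal sketch of the two orderings (the test names and cycle count are hypothetical; the actual scheduling is handled by the Talos harness):
<pre>
tests = ["tp5o", "tsvgx", "tart"]  # hypothetical test sequence
cycles = 3

def run_once(test, cycle):
    print(f"cycle {cycle}: running {test}")

# Row major: run each test for all of its cycles before moving on.
for test in tests:
    for cycle in range(cycles):
        run_once(test, cycle)

# Column major: run the whole sequence once, then repeat the sequence.
for cycle in range(cycles):
    for test in tests:
        run_once(test, cycle)
</pre>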
== Page sets ==
We run our tests 100% offline, though we serve pages via a local webserver. Because of this, we need to store and make available the offline pages we use for testing.
=== tp5pages ===
Some tests make use of a set of 50 "real world" pages, known as the tp5n set. These pages are not part of the talos repository, but without them the tests which use them won't run.
* To add these pages to your local setup, download [https://github.com/rwood-moz/talos-pagesets/tp5n.zip tp5n.zip] and extract it such that '''tp5n''' ends up as '''testing/talos/talos/tests/tp5n'''.
* See also the [[#tp5|tp5 test]].
=== tp6 ===
The tp6 pageset archives are also stored in their raw HTML (non-mitmdump) format on [https://github.com/rwood-moz/talos-pagesets GitHub]. If you wish to debug with the pagesets outside of mitmproxy, clone that repo; you'll find them in the /talos-pagesets/tp6 folder.
== Test definitions ==
'''Please keep these in alphabetical order'''
=== a11y ===
* contact: :surkov
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/a11y a11y.manifest]
=== about-preferences ===
* contact: :jaws
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/about-preferences/about_preferences_basic.manifest about_preferences_basic.manifest]
=== ARES-6 ===
* contact: :jandem
* source: [https://searchfox.org/mozilla-central/source/third_party/webkit/PerformanceTests/ARES-6 ARES-6]
* unit: geometric mean / benchmark score
=== Basic compositor video ===
* contact: :davidb
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/video video]
=== perf-reftest ===
* contact: :bholley
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/perf-reftest perf-reftest]
=== perf-reftest-singletons ===
* contact: :bholley
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/perf-reftest-singletons perf-reftest-singletons]
=== cpstartup ===
* contact: :mconley
* measuring: Time from opening a new tab (which creates a new content process) to having that new content process be ready to load URLs.
=== DAMP === 
* contact: :ochameau
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/devtools damp]
=== displaylist_mutate ===
* contact: :mattwoodrow
* source: [https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/layout/benchmarks/displaylist_mutate.html displaylist_mutate.html]
This measures the amount of time it takes to render a page after changing its display list. The page has a large number of display list items (10,000), and mutates one every frame. The goal of the test is to make displaylist construction a bottleneck, rather than painting or other factors, and thus improvements or regressions to displaylist construction will be visible. The test runs in ASAP mode to maximize framerate, and the result is how quickly the test was able to mutate and re-paint 600 items, one during each frame.
=== Dromaeo Tests === 
Dromaeo is a suite of tests for JavaScript performance testing. See the [[Dromaeo|Dromaeo wiki]] for more information.
Each sub-suite is divided into tests, and each test is divided into sub-tests. Each sub-test takes some (in theory) fixed piece of work and measures how many times that piece of work can be performed in one second. The score for a test is then the geometric mean of the runs/second numbers for its sub-tests. The score for a sub-suite is the geometric mean of the scores for its tests.
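A minimal sketch of that rollup (the test names, sub-test names, and runs/second numbers below are hypothetical):
<pre>
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

# runs/second for each sub-test, grouped by test (hypothetical numbers)
subsuite = {
    "test_a": {"sub_1": 120.0, "sub_2": 95.0},
    "test_b": {"sub_1": 300.0, "sub_2": 280.0},
}

# A test's score is the geometric mean of its sub-tests' runs/second;
# the sub-suite's score is the geometric mean of its tests' scores.
test_scores = [geometric_mean(list(subs.values())) for subs in subsuite.values()]
subsuite_score = geometric_mean(test_scores)
print(test_scores, round(subsuite_score, 1))
</pre>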
==== Dromaeo CSS ====
* contact: :bz
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/dromaeo css.manifest]
==== Dromaeo DOM (Linux64 only) ====
* contact: :bz
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/dromaeo dom.manifest]
Each page in the manifest is part of the dromaeo dom benchmark. These are the specific areas that Dromaeo DOM covers:
===== DOM Attributes =====
Measures performance of getting and setting a DOM attribute, both via <code>getAttribute</code> and via a reflecting DOM property. Also throws in some expando getting/setting for good measure.
===== DOM Modification =====
Measures performance of various things that modify the DOM tree: creating element and text nodes and inserting them into the DOM.
===== DOM Query =====
Measures performance of various methods of looking for nodes in the DOM: <code>getElementById</code>, <code>getElementsByTagName</code>, and so forth.
===== DOM Traversal =====
Measures performance of various accessors (<code>childNodes</code>, <code>firstChild</code>, etc) that would be used when doing a walk over the DOM tree.
Please see [[#Dromaeo CSS|Dromaeo CSS]] for examples of data.
=== glterrain ===
* contact: :jgilbert
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/webgl/benchmarks/terrain glterrain]
=== glvideo ===
* contact: :jgilbert
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/webgl/benchmarks/video glvideo]
This test plays back a video file and asks WebGL to draw the video frames as WebGL textures for 100 ticks. It collects the mean tick time across the 100 ticks to measure how long a video texture upload to a WebGL texture (gl.texImage2D) takes. We run it 5 times and ignore the first run. Lower results are better.
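A minimal sketch of that summarization (the per-run mean tick times below are hypothetical; the actual measurement happens inside the test page):
<pre>
# Mean tick time in ms reported by each of the 5 runs (hypothetical values).
runs = [14.2, 9.1, 9.3, 8.9, 9.0]

# The first run is ignored as warm-up; the remaining runs are averaged.
stable_runs = runs[1:]
result = sum(stable_runs) / len(stable_runs)
print(round(result, 2))  # lower is better
</pre>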
=== jetstream ===
* contact: :jandem
* source: [https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/jetstream/jetstream.manifest jetstream.manifest] and jetstream.zip from tooltool
This is the [http://browserbench.org/JetStream/in-depth.html JetStream] JavaScript benchmark taken verbatim and slightly modified to fit into our pageloader extension and talos harness.
=== kraken ===
* contact: :sdetar
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/kraken kraken.manifest]
=== motionmark ===
* contact: :davidb
* source: [https://searchfox.org/mozilla-central/source/third_party/webkit/PerformanceTests/MotionMark source], [https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/motionmark manifests]
* suite: we take a geometric mean of all the subtests (9 for animometer, 11 for the HTML suite)
=== pdfpaint ===
* contact: :bdahl
* source:
=== rasterflood_svg ===
* contact: :rhunt
* source: [https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/gfx/benchmarks/rasterflood_svg.html rasterflood_svg.html]
Improvements (or regressions) to general painting performance or SVG are likely to affect this benchmark.
=== rasterflood_gradient ===
* contact: :rhunt
* source: [https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/gfx/benchmarks/rasterflood_gradient.html rasterflood_gradient.html]
The test runs for 10 seconds, and the resulting score is how many frames we were able to render during that time. Higher is better. Improvements (or regressions) to general painting performance or gradient rendering will affect this benchmark.
=== sessionrestore/sessionrestore_no_auto_restore/sessionrestore_many_windows ===
* contact: :mikedeboer, :mconley, :felipe
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test/sessionrestore talos/sessionrestore]
=== speedometer ===
* contact: :selena
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/speedometer/speedometer.manifest speedometer.manifest]
=== stylebench ===
* contact: :emilio
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/stylebench/stylebench.manifest stylebench.manifest]
* reporting: runs/minute score
=== startup_about_home_paint ===
* contact: :mconley
* source: [https://hg.mozilla.org/mozilla-central/file/tip/testing/talos/talos/startup_test/startup_about_home_paint/addon/ startup_about_home_paint addon]
=== tabpaint ===
* contact: :mconley
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/tabpaint tabpaint]
** suite: geometric_mean(subtests)
=== tart ===
* contact: :mconley
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/tart tart]
=== tp5 ===
* contact: :jmaher
* source: [[#page_sets|tp5n.zip]]
==== File IO ====
===== Possible regression causes =====
* '''nonmain_startup_fileio opt (with or without e10s) windows7-32''' – {{bug|1274018}} This test seems to consistently report a higher result for mozilla-central compared to Try even for an identical revision due to extension signing checks. In other words, if you are comparing Try and Mozilla-Central you may see a false-positive regression on perfherder. Graphs: [https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=%5Bmozilla-central,e5f5eaa174ef22fdd6b6e150e8c450aa827c2ff6,1,1%5D&series=%5Btry,e5f5eaa174ef22fdd6b6e150e8c450aa827c2ff6,1,1%5D non-e10s] [https://treeherder.mozilla.org/perf.html#/graphs?series=%5B%22mozilla-central%22,%222f3af3833d55ff371ecf01c41aeee1939ef3a782%22,1,1%5D&series=%5B%22try%22,%222f3af3833d55ff371ecf01c41aeee1939ef3a782%22,1,1%5D&timerange=604800 e10s]
==== Xres (X Resource Monitoring) ==== 
A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Linux only.
[https://linux.die.net/man/3/xres xres man page].
==== % CPU ==== 
CPU usage tracked during tp5 test runs. This metric is sampled every 20 seconds. This metric is collected on Windows only.
==== Responsiveness ====
* contact: :jimm, :overholt
=== tp5o_scroll ===
* contact: :kats
* source: [[#page_sets|tp5n.zip]]
==== Possible regression causes ====
Some examples of things that cause regressions in this test are:
* Increased displayport size (which causes a larger display list to be built)
* Slowdown in rasterization of content
=== tp6 ===
* contact: :rwood, :jmaher
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/quantum_pageload tp6]
==== Recording a tp6 test page ====
Test pages used for tp6 are mitmproxy recordings that are played back during the tp6 test (and ultimately loaded in Firefox via the local proxy). Each test page is a separate mitmproxy recording (*.mp) file, however all recordings for the tp6 suite are archived in a single zip file on tooltool. When tp6 is run, talos automatically downloads the mitmproxy recording archive for use during the test.
* Select "No proxy" and click the "OK" button
=== tpaint ===
* contact: :davidb
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test/tpaint.html tpaint-window.html]
==== Possible regression causes ====
* None listed yet. If you fix a regression for this test and have some tips to share, this is a good place for them.
=== twinopen (twinopen ext+twinopen:twinopen.html) ===
* contact: :bdahl, :jimm, :jmaher
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/twinopen twinopen]
==== Possible regression causes ====
* None listed yet. If you fix a regression for this test and have some tips to share, this is a good place for them.
=== tabswitch ===
* contact: :mconley
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/tabswitch tabswitch]
=== tresize ===
* contact: :jimm
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test/tresize/addon/content/tresize-test.html tresize-test.html]
==== Possible regression causes ====
* slowdown in the paint pipeline
* resizes also trigger a rendering flush so bugs in the flushing code can manifest as regressions
* introduction of more spurious MozAfterPaint events - see {{bug|1471961}}
=== ts_paint === 
* contact: :davidb
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test/tspaint_test.html tspaint_test.html]
==== Possible regression causes ====
* ts_paint (and/or maybe tpaint?) will regress if a new <panel> element is added to the browser window (e.g. browser.xul) and its frame gets created. Fix this by ensuring it is display:none by default.
=== tscrollx ===
* contact: :jrmuizel
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/scroll scroll.manifest]
* see also: [https://groups.google.com/d/topic/mozilla.dev.platform/RICw5SJhNMo/discussion Replacing tscroll,tsvg with tscrollx,tsvgx]
This test scrolls several pages, each representing a different known "hard" case to scroll, and measures the average frame interval (1/FPS) on each. The ASAP test (tscrollx) iterates in unlimited frame-rate mode, thus reflecting the maximum scroll throughput per page. To turn on ASAP mode, we set a few preferences:
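The exact preference set should be confirmed in [https://searchfox.org/mozilla-central/source/testing/talos/talos/test.py test.py]; as an approximation, the ASAP-mode preferences have historically looked like this:
<pre>
layout.frame_rate = 0
docshell.event_starvation_delay_hint = 1
dom.send_after_paint_to_content = false
</pre>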
=== tsvgr_opacity ===
* contact: :jwatt, :dholbert
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/svg_opacity/svg_opacity.manifest svg_opacity.manifest]
=== tsvg_static ===
* contact: :jwatt, :dholbert, :neerja
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/svg_static/ svg_static]
=== tsvgx ===
* contact: :jwatt, :dholbert
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/svgx svgx]
==== Possible regression causes ====
* Did you change the dimensions of the content area? Even a little? The tsvgx test seems to be sensitive to changes like this. See {{bug|1375479}}, for example. Usually, these sorts of "regressions" aren't real regressions - they just mean that we need to re-baseline our expectations from the test.
=== xperf ===
* contact: :aklotz, :jmaher
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/xtalos xperf instrumentation]
* events tracked: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/xperf.config#l11 'FileRead', 'FileWrite', 'FileFlush']
== Build metrics ==
These are not part of the Talos code, but like Talos they are benchmarks that record data using the graphserver and are analyzed by the same scripts for regressions.
=== Number of constructors (num_ctors) ===
This test runs at build time and measures the number of static initializers in the compiled code. Reducing this number is helpful for [https://blog.mozilla.org/tglek/2010/05/27/startup-backward-constructors/ startup optimizations].
* These are run for Linux 32-bit and 64-bit opt and PGO builds.
== Platform microbenchmark tests ==
=== IsASCII and IsUTF8 gtest microbenchmarks ===
* contact: :hsivonen
* source: [https://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestStrings.cpp TestStrings.cpp]
Tests whose names start with PerfIsUTF8 test the performance of the XPCOM string IsUTF8 function with ASCII inputs of different lengths.
==== Possible regression causes ====
* The --enable-rust-simd build flag accidentally getting turned off in automation.
* Changes to encoding_rs internals.
* LLVM optimizations regressing between updates to the copy of LLVM included in the Rust compiler.
=== Microbench ===
* contact: :bholley
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/gtest/mozilla/MozGTestBench.cpp MozGTestBench.cpp]
* summarization: Not a Talos test. This suite provides a way to add low-level platform performance regression tests for things that are not suited to be tested by Talos. See the [[TestEngineering/Performance/Talos/Sheriffing#Microbench_Policy|Microbench sheriffing policy]] for some notes on how to treat regressions.
=== PerfStrip Tests === 
* contact: :davidb
* source: [https://dxr.mozilla.org/mozilla-central/source/xpcom/tests/gtest/TestStrings.cpp TestStrings.cpp]
PerfStripCharsCRLF() calls StripChars("\r\n") on 5 different test cases, 20k times each.
=== Stylo gtest microbenchmarks === 
* contact: :bholley, :SimonSapin
* source: [https://dxr.mozilla.org/mozilla-central/source/layout/style/test/gtest layout/style/test/gtest]
* data: each test is run and measured 5 times
* summarization: take the [[TestEngineering/Performance/Talos/Data#median|median]] of the 5 data points; [https://dxr.mozilla.org/mozilla-central/source/testing/gtest/mozilla/MozGTestBench.cpp#43-46 source: MozGTestBench.cpp]
 
Servo_StyleSheet_FromUTF8Bytes_Bench parses a sample stylesheet 20 times with Stylo’s CSS parser that is written in Rust. It starts from an in-memory UTF-8 string, so that I/O or UTF-16-to-UTF-8 conversion is not measured.