Buildbot/Talos/Data

 
#REDIRECT [[TestEngineering/Performance/Talos/Data]]
= Talos Data =
Raw data is generated by Talos.  We apply some filters to summarize and reduce the data, then we post it to a server:


== Terminology ==
=== Job ===
When a build completes, we run a series of test (unittest and performance "talos") jobs.  Each job reserves a machine for itself, then runs a script which sets up, installs and executes the test, generates the results, and cleans up after itself.  In general we try to ensure that jobs complete in 30 minutes or less.


For Talos we have a series of jobs, and each job runs one or more test suites (with the exception of tp and xperf, a job runs 2-4 suites at a time). A suite would be something like 'ts_paint', 'Canvasmark', or 'tp5'.  Each suite runs its respective subtests and provides a summary number which is a meaningful aggregation of the individual subtest results.  When all the suites in a job have completed, the results are output (uploaded in some cases) and we are able to look for regressions, view data in a graph, and query for the summarized data.


=== Suite ===
A collection of subtests which run. Often these are referred to as 'tests'. Some examples are "tresize", "TART", "tp5", "ts_paint".
* in graph server this is the lowest level of granularity available in the UI
* in Perfherder suite-level results are called a 'summary' (e.g. "tp5o summary opt")


=== Subtest ===
A specific test (usually a webpage to load) which we collect data points from.  Typically we run many cycles of each subtest to build up a representative collection of data points to make sure the data is meaningful.
* in graph server, Talos uploads a single number for each subtest; the data points are summarized by Talos prior to uploading.
* in Perfherder the subtest data is preserved as raw data points as well as summarized by Talos.  We use the summarizations when showing a graph.


=== Data Points (aka Replicates) ===
Data points refer to the single numbers (replicates) we collect while executing a Talos test.  We collect a series of numbers (usually 20 or more) for each subtest; each of these 20+ numbers is called a data point.


We do filtering on the data points, mainly because the first few data points are not a representative sample of the remaining data points we collect.  The one exception would be [https://wiki.mozilla.org/Buildbot/Talos/Tests#Internal_Benchmarks internal benchmarks] (generally suites which measure something other than time).  For benchmarks, there is usually a special formula applied to the data points.
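As a quick illustration (the numbers and the summarization below are invented, not real Talos output), summarizing with and without the warmup points:

```python
# Hypothetical replicates for one subtest: the first two runs are
# slower warmup runs (cold caches, JIT compilation, etc.).
data_points = [410.0, 395.0, 260.0, 255.0, 262.0, 258.0, 261.0]

# Summarizing everything is skewed upward by the warmup runs;
# dropping them first gives a more representative value.
mean_all = sum(data_points) / len(data_points)
mean_filtered = sum(data_points[2:]) / len(data_points[2:])

print(round(mean_all, 1), round(mean_filtered, 1))  # → 300.1 259.2
```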


== Subtest Filters ==
We have a variety of [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/filter.py filters] defined for Talos.  I will explain what each filter is, and you can see the exact settings used for each filter by looking at the individual [https://wiki.mozilla.org/index.php?title=Buildbot/Talos/Tests tests].


=== ignore_first ===
This filter ignores the first 'X' data points, allowing us to ignore warmup runs.
* input: an array of subtest data points
* returns: an array of data points
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/filter.py#l127 filter.py]
* used in most tests with X=1, X=2, and X=5 (5 is the normal case)
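A minimal sketch of the behavior (illustrative only; see filter.py for the real implementation and signature):

```python
def ignore_first(data_points, x=5):
    """Drop the first x data points (the warmup runs) and keep the rest."""
    return data_points[x:]

# The first two points here are warmup runs and are discarded.
print(ignore_first([9.1, 8.7, 5.2, 5.1, 5.3, 5.0, 5.2], x=2))
```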


=== median ===
This filter takes in an array of data points and returns the median of the data points (a single value).
* input: an array of subtest data points
* returns: a single value
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/filter.py#l58 filter.py]
* used in most tests
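A sketch of the median computation (Python's statistics.median behaves the same way; it is written out here for clarity):

```python
def median(data_points):
    """Middle value of the sorted points; mean of the two middle
    values when the count is even."""
    s = sorted(data_points)
    n = len(s)
    if n % 2:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2.0

print(median([7.0, 5.0, 6.0, 8.0]))  # → 6.5
```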


=== mean ===
This filter takes in an array of data points and returns the mean value of the data points (a single value).
* input: an array of subtest data points
* returns: a single value
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/filter.py#l50 filter.py]
* used in kraken for subtests
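A sketch of the mean computation:

```python
def mean(data_points):
    """Arithmetic mean of the data points."""
    return sum(data_points) / len(data_points)

print(mean([4.0, 5.0, 6.0]))  # → 5.0
```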


=== dromaeo ===
This filter is specific to dromaeo and respects the structure of the data points: every 5 data points represent a different metric being measured.
* input: an array of dromaeo (DOM|CSS) subtest data points
* returns: a single number (geometric_mean of the metric summarization)
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/filter.py#l92 filter.py]
* used in dromaeo_dom and dromaeo_css to build a single value for the subtests
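Based only on the description above (every 5 consecutive data points belong to one metric, combined with a geometric mean), a sketch might look like the following; the per-group summarization (a mean) is an assumption, not necessarily what filter.py does:

```python
import math

def dromaeo(data_points, group_size=5):
    """Split the points into groups of 5 (one group per metric),
    summarize each group with a mean (assumed), then combine the
    group summaries with a geometric mean."""
    groups = [data_points[i:i + group_size]
              for i in range(0, len(data_points), group_size)]
    group_means = [sum(g) / len(g) for g in groups]
    return math.prod(group_means) ** (1.0 / len(group_means))

print(dromaeo([2.0] * 5 + [8.0] * 5))  # → 4.0
```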


=== v8_subtest ===
* input: an array of v8_7 subtest data points
* returns: a single value representing the benchmark-weighted score for the subtest
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/filter.py#l168 filter.py]
* used in v8_7 for the subtests


* inputs: array of subtest summarized data points (one point per subtest)
* returns: a single value representing the geometric mean of all the subtests
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/filter.py#l114 filter.py]
* used for most tests
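The suite summarization described by these bullets (a geometric mean of the per-subtest values) can be sketched as follows; the assumption that all inputs are positive is mine:

```python
import math

def geometric_mean(subtest_values):
    """Combine one summarized value per subtest into a single
    suite-level value. Assumes all values are positive."""
    return math.prod(subtest_values) ** (1.0 / len(subtest_values))

print(geometric_mean([2.0, 8.0]))  # → 4.0
```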


* inputs: array of v8 subtest summaries
* returns: a single v8 score
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/output.py#l102 output.py]
* used for v8 version 7 only


=== Canvasmark_metric ===
This is the metric used to calculate the Canvasmark score from the subtest summarized results.  Essentially it is a sum of the subtests. This is identical to the [https://wiki.mozilla.org/Buildbot/Talos/Data#js_metric js_metric].
* inputs: array of Canvasmark subtest results
* returns: a single Canvasmark score
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/output.py#l115 output.py]
* used for Canvasmark only
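Since the text describes the score as a sum of the summarized subtest results, a sketch (the values are invented):

```python
def canvasmark_metric(subtest_results):
    """Canvasmark suite score: the sum of the summarized subtest results."""
    return sum(subtest_results)

print(canvasmark_metric([120.0, 340.0, 95.0]))  # → 555.0
```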


=== js_metric ===
* inputs: array of Kraken subtest results
* returns: a single Kraken score
* source: [https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/output.py#l108 output.py]
* used for Kraken only


</pre>


In this case we would use 23.22 for the value inside of Perfherder (Perfherder rounds to two decimal places).  This is the value that will be used for calculating alerts, displaying points on the graph, and for data when comparing two revisions.
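As a quick illustration of the rounding (Perfherder's exact rounding behavior is its own implementation detail):

```python
value = 23.217  # suite summary value computed by Talos
print(round(value, 2))  # → 23.22, the value displayed and compared in Perfherder
```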


In all cases there should be a 'subtests' field as well that lists out each page loaded along with a set of values: