Buildbot/Talos/DataFormat: Difference between revisions

= Talos Data =
This is an old page; please visit [https://wiki.mozilla.org/Buildbot/Talos/Data Data] instead.
Raw data is generated by Talos.  We apply some filters to summarize and reduce the data, then we post it to a server:
* Graphserver
* Perfherder
 
== Terminology ==
=== Job ===
When a build is completed we run a series of test jobs (unittest and performance "talos").  Each job reserves a machine for itself, then runs the script which sets up, installs, executes the test, generates the results, and cleans up after itself.  In general we try to ensure the jobs complete in 30 minutes or less.
 
For Talos we have a series of jobs, and each job runs one or more tests (with the exception of tp and xperf, jobs run 2-4 suites at a time).  For the purposes of discussing data we will refer to each test as a suite.  A suite would be something like 'ts_paint', 'Canvasmark', or 'tp5'.  Each suite runs its respective subtests and summarizes itself.  When all the suites in a job have completed, the results are output (uploaded in some cases), and we can then look for regressions, view data in a graph, and query for the summarized data.
 
=== Suite ===
A collection of subtests which run.  The subtest results are summarized to the suite level.  Often these are referred to as 'tests'.  Some examples are "tresize", "TART", "tp5", "ts_paint".
* in graph server this is the lowest level of granularity available in the UI
* in Perfherder a suite is referenced as a 'Summary' (e.g. "tp5o summary opt")
 
=== Subtest ===
A specific test (usually a webpage to load) which we collect replicates (numbers) from.  Typically we run many cycles of each subtest to build up a representative collection of replicates to make sure the data is meaningful.
* in graph server Talos uploads a single number for each subtest; the replicates are summarized by Talos prior to uploading.
* in Perfherder the subtest data is preserved as raw replicates as well as summarized by Talos.  We use the summarizations when showing a graph.
 
=== Replicates ===
Replicates are the single numbers or data points we collect while executing a Talos test.  We collect a series of numbers (usually 20 or more) for each subtest, and each of these 20+ numbers is called a replicate.
 
We filter the replicates, mainly because the first few replicates are not a representative sample of the remaining replicates we collect.  The one exception is [https://wiki.mozilla.org/Buildbot/Talos/Tests#Internal_Benchmarks internal benchmarks] (generally suites which measure something other than time).  For benchmarks, there is usually a special formula applied to the replicates.
 
== Subtest Filters ==
We have a variety of [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/filter.py filters] defined for Talos.  I will explain what each filter is, and you can see the exact settings used for each filter by looking at the individual [https://wiki.mozilla.org/index.php?title=Buildbot/Talos/Tests tests].
 
=== ignore_first ===
This filter ignores the first 'X' replicates allowing us to ignore warmup runs.
* input: an array of subtest replicates
* returns: an array of replicates
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/filter.py#l127 filter.py]
* used in most tests with X=1,2,5 (5 is the normal case)
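
A minimal sketch of what this filter does, not the Talos source itself.  The function name mirrors the filter, but the signature and the guard for short inputs are assumptions made for illustration.  The sample replicates are the "Plasma - Maths- canvas shapes" values from the TALOSDATA example on this page.

```python
def ignore_first(replicates, count=1):
    """Drop the first `count` replicates (warm-up runs) and return the rest."""
    if count >= len(replicates):
        # Assumption for this sketch: never discard everything,
        # keep at least the last replicate.
        return replicates[-1:]
    return replicates[count:]


plasma = [545.0, 572.0, 598.0, 662.0, 588.0]
kept = ignore_first(plasma, 1)  # the warm-up value 545.0 is dropped
```

With X=1 the remaining minimum is 572.0, which agrees with the "min" reported for that subtest in the example summary.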
 
=== median ===
This filter takes in an array of replicates and returns the median of the replicates (a single value).
* input: an array of subtest replicates
* returns: a single value
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/filter.py#l58 filter.py]
* used in most tests
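
As an illustration, the median can be sketched as follows.  This is a plain textbook median written for this page, assuming the usual convention of averaging the two middle values for an even number of replicates.  The input is the Plasma subtest from the TALOSDATA example with its first replicate already ignored.

```python
def median(replicates):
    """Return the median of a non-empty list of replicates."""
    if not replicates:
        raise ValueError("median() needs at least one replicate")
    ordered = sorted(replicates)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # Odd count: the middle element is the median.
        return ordered[mid]
    # Even count: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2.0


plasma_after_ignore_first = [572.0, 598.0, 662.0, 588.0]
plasma_median = median(plasma_after_ignore_first)
```

The result, 593.0, matches the "median" and "filtered" values shown for that subtest in the example summary.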
 
=== mean ===
This filter takes in an array of replicates and returns the mean value of the replicates (a single value).
* input: an array of subtest replicates
* returns: a single value
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/filter.py#l50 filter.py]
* used in kraken for subtests
 
=== dromaeo ===
This filter is specific to dromaeo.  It treats the replicates in groups of 5, because every 5 replicates represent a different metric being measured.
* input: an array of dromaeo (DOM|CSS) subtest replicates
* returns: a single number (geometric_mean of the metric summarization)
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/filter.py#l92 filter.py]
* used in dromaeo_dom and dromaeo_css to build a single value for the subtests
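
The grouping described above can be sketched roughly as below.  This is an illustration only: the per-metric summarization is assumed here to be an arithmetic mean of each group of 5, and the geometric mean is the plain textbook form; check filter.py for the exact calculation.

```python
import math


def dromaeo_sketch(replicates, chunk_size=5):
    """Summarize dromaeo replicates: one value per group, then a geometric mean."""
    # Split the flat replicate list into consecutive groups of `chunk_size`.
    chunks = [replicates[i:i + chunk_size]
              for i in range(0, len(replicates), chunk_size)]
    # Assumption: each metric is summarized by its arithmetic mean.
    metric_means = [sum(chunk) / len(chunk) for chunk in chunks]
    # Plain geometric mean, computed through logarithms (values must be > 0).
    log_sum = sum(math.log(value) for value in metric_means)
    return math.exp(log_sum / len(metric_means))


# Two hypothetical metrics with 5 replicates each.
score = dromaeo_sketch([2.0] * 5 + [8.0] * 5)
```

For the two made-up metrics above the per-metric means are 2 and 8, so the score is their geometric mean, about 4.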
 
=== v8_subtest ===
* input: an array of v8_7 subtest replicates
* returns: a single value representing the benchmark weighted score for the subtest
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/filter.py#l168 filter.py]
* used in v8_7 for the subtests
 
NOTE: this deviates from the exact definition of v8 as we retain the Encrypt and Decrypt as subtests (instead of combining them into Crypto) as well as keeping Earley and Boyer (instead of combining them into EarleyBoyer).  There is a slight tweak in the final suite score, but it is <1% different.
 
== Suite Summarization Filters ==
Once we have a single number from each of the subtests, we need to generate a single number for the suite.  There are 4 specific calculations used.
 
=== geometric_mean ===
This is a standard geometric mean of the data:
* inputs: array of subtest summarized data points (one point per subtest)
* returns: a single value representing the geometric mean of all the subtests
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/filter.py#l114 filter.py]
* used for most tests
 
=== v8_metric ===
This is a custom metric which takes the geometric_mean of the subtests and multiplies it by 100.
* inputs: array of v8 subtest summaries
* returns: a single v8 score
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/output.py#l102 output.py]
* used for v8 version 7 only
 
=== Canvasmark_metric ===
This is the metric used to calculate the Canvasmark score from the subtest summarized results.  Essentially it is a sum of the subtests.
* inputs: array of Canvasmark subtest results
* returns: a single Canvasmark score
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/output.py#l115 output.py]
* used for Canvasmark only
 
=== js_metric ===
This is the metric used to calculate the Kraken score from the subtest summarized results.  Essentially it is a sum of the subtests.
* inputs: array of Kraken subtest results
* returns: a single Kraken score
* source: [http://hg.mozilla.org/build/talos/file/3625fcaa75ea/talos/output.py#l108 output.py]
* used for Kraken only
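
The four suite-level calculations above can be sketched together.  These are illustrative helpers based only on the descriptions on this page; the function names and the textbook geometric mean are assumptions, and the real code in filter.py and output.py may differ in detail (for example in how zero values are handled).

```python
import math


def geometric_mean(values):
    """Plain geometric mean of positive values, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def v8_metric_sketch(subtest_scores):
    """v8_metric as described above: geometric mean of the subtests times 100."""
    return 100 * geometric_mean(subtest_scores)


def sum_metric_sketch(subtest_values):
    """Canvasmark_metric and js_metric are both described as a sum of the subtests."""
    return float(sum(subtest_values))


# 'filtered' values of the eight Canvasmark subtests in the TALOSDATA example.
canvasmark_filtered = [593.0, 739.5, 962.0, 853.0, 751.5, 407.5, 1491.0, 406.5]
canvasmark_suite = sum_metric_sketch(canvasmark_filtered)
```

Summing the eight filtered values gives 6204.0, which is the "suite" value in the tcanvasmark summary of the TALOSDATA example.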
 
 
== Perfherder ==
Perfherder ingests data from talos by parsing the raw log, then it stores the data in a database while preparing it for regression detection and displaying on graphs.
 
=== Raw Data ===
In the log files, we look for "TALOSDATA: " text followed by a valid json blob.  An example TALOSDATA blob looks like:
<pre>
[{"talos_counters": {}, "results": {"tresize": [23.26174999999999, 22.99621666666672, 22.66563333333331, 23.99620000000002, 22.940849999999948, 22.26951666666664, 22.975350000000006, 24.96453333333337, 23.6878333333334, 23.21740000000001, 24.743699999999976, 23.507333333333282, 22.927800000000033, 22.292066666666653, 23.28364999999999, 23.361950000000004, 22.18191666666666, 22.996466666666684, 23.54029999999997, 22.873883333333342]}, "summary": {"suite": 23.21740000000001, "subtests": {"tresize": {"std": 0.7716690474213389, "min": 22.18191666666666, "max": 24.96453333333337, "median": 23.21740000000001, "filtered": 23.21740000000001, "mean": 23.254913333333334}}}, "test_machine": {"platform": "x86", "osversion": "Ubuntu 12.04", "os": "linux", "name": "talos-linux32-ix-040"}, "testrun": {"date": 1440091515, "suite": "tresize", "options": {"responsiveness": false, "cycles": 20, "tpmozafterpaint": true, "shutdown": false, "rss": false}}, "test_build": {"name": "Firefox", "version": "43.0a1", "id": "20150820095841", "branch": "Mozilla-Inbound-Non-PGO", "revision": "bb85ec539217b9d3a5e83c40538d8565d292e72b"}}, {"talos_counters": {}, "results": {"Plasma - Maths- canvas shapes": [545.0, 572.0, 598.0, 662.0, 588.0], "Asteroids - Shapes- shadows- blending": [748.0, 737.0, 720.0, 742.0, 743.0], "Asteroids - Bitmaps- shapes- text": [1031.0, 1011.0, 913.0, 1063.0, 888.0], "Arena5 - Vectors- shadows- bitmaps- text": [892.0, 738.0, 900.0, 920.0, 806.0], "Asteroids - Vectors": [675.0, 735.0, 659.0, 789.0, 768.0], "3D Rendering - Maths- polygons- image transforms": [306.0, 434.0, 388.0, 426.0, 389.0], "Pixel blur - Math- getImageData- putImageData": [1291.0, 1435.0, 1553.0, 1461.0, 1521.0], "Asteroids - Bitmaps": [435.0, 418.0, 410.0, 403.0, 380.0]}, "summary": {"suite": 6204.0, "subtests": {"Plasma - Maths- canvas shapes": {"std": 34.19064199455752, "min": 572.0, "max": 662.0, "median": 593.0, "filtered": 593.0, "mean": 605.0}, "Asteroids - Shapes- shadows- blending": {"std": 
9.233092656309694, "min": 720.0, "max": 743.0, "median": 739.5, "filtered": 739.5, "mean": 735.5}, "Asteroids - Bitmaps- shapes- text": {"std": 71.23333138355947, "min": 888.0, "max": 1063.0, "median": 962.0, "filtered": 962.0, "mean": 968.75}, "Arena5 - Vectors- shadows- bitmaps- text": {"std": 73.40980860893181, "min": 738.0, "max": 920.0, "median": 853.0, "filtered": 853.0, "mean": 841.0}, "Asteroids - Vectors": {"std": 49.37294299512639, "min": 659.0, "max": 789.0, "median": 751.5, "filtered": 751.5, "mean": 737.75}, "3D Rendering - Maths- polygons- image transforms": {"std": 20.94486810653149, "min": 388.0, "max": 434.0, "median": 407.5, "filtered": 407.5, "mean": 409.25}, "Pixel blur - Math- getImageData- putImageData": {"std": 46.82680856090878, "min": 1435.0, "max": 1553.0, "median": 1491.0, "filtered": 1491.0, "mean": 1492.5}, "Asteroids - Bitmaps": {"std": 14.16642156650719, "min": 380.0, "max": 418.0, "median": 406.5, "filtered": 406.5, "mean": 402.75}}}, "test_machine": {"platform": "x86", "osversion": "Ubuntu 12.04", "os": "linux", "name": "talos-linux32-ix-040"}, "testrun": {"date": 1440091515, "suite": "tcanvasmark", "options": {"responsiveness": false, "tpmozafterpaint": false, "tpchrome": true, "tppagecycles": 1, "tpcycles": 5, "tprender": false, "shutdown": false, "cycles": 1, "rss": false}}, "test_build": {"name": "Firefox", "version": "43.0a1", "id": "20150820095841", "branch": "Mozilla-Inbound-Non-PGO", "revision": "bb85ec539217b9d3a5e83c40538d8565d292e72b"}}]
</pre>
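
A rough sketch of how such lines could be pulled out of a log.  This is not Perfherder's parser; the function name and the sample log lines (including their timestamp prefix) are made up for illustration.  Only the "TALOSDATA: " marker and the JSON payload come from the description above.

```python
import json

TALOSDATA_PREFIX = "TALOSDATA: "


def extract_talos_data(log_lines):
    """Return the decoded JSON blob from every line containing the marker."""
    blobs = []
    for line in log_lines:
        start = line.find(TALOSDATA_PREFIX)
        if start == -1:
            continue  # not a Talos data line
        payload = line[start + len(TALOSDATA_PREFIX):]
        try:
            blobs.append(json.loads(payload))
        except ValueError:
            # The marker was present but the rest is not valid JSON; skip it.
            continue
    return blobs


# Hypothetical log excerpt; real logs carry much larger blobs.
sample_log = [
    "12:00:01 INFO - starting tresize",
    '12:00:09 INFO - TALOSDATA: [{"summary": {"suite": 23.2174}}]',
    "12:00:10 INFO - TALOSDATA: truncated",
]
blobs = extract_talos_data(sample_log)
suite_value = blobs[0][0]["summary"]["suite"]
```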
 
=== Filtering & Calculations ===
When the raw data comes in, we look for the summary tag in the json. 
<pre>
{"suite": 23.21740000000001, ... }
</pre>
 
In this case we use 23.217 as the value inside Perfherder.  This is the value used for calculating alerts, for displaying points on the graph, and as the data when comparing two revisions.
 
In all cases there should be a 'subtests' field as well that lists out each page loaded along with a set of values:
<pre>
"subtests": {"tresize": {"std": 0.7716690474213389, "min": 22.18191666666666, "max": 24.96453333333337, "median": 23.21740000000001, "filtered": 23.21740000000001, "mean": 23.254913333333334}
</pre>
 
These values are used in the subtest-specific view (not the suite summary).  When viewing a graph, you can switch between the different values for each data point to see the mean, median, etc.; these fields are where those values come from.  The default value is the 'filtered' value, which takes into account the filters (ignore first 'x' data points, median|mean, etc.) applied to the raw data, so that the summarization is calculated in one place.
 
Each suite has the ability to set custom filters and keeping this logic inside of talos ensures that it is always done in a single place, in source code, where developers can easily look and find it.
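
To make the fields concrete, here is a sketch that recomputes the tresize subtest summary from the raw replicates in the TALOSDATA example.  It assumes the filters are ignore_first with X=5 followed by median, and that the statistics are taken over the replicates that remain after ignore_first, with "std" as a population standard deviation.  Those assumptions are inferred from the numbers in the example, not taken from the Talos source.

```python
import statistics


def summarize_subtest(replicates, ignore=5):
    """Build the per-subtest summary fields from raw replicates."""
    # Assumed filter chain: drop warm-up runs, then take the median.
    kept = replicates[ignore:] if len(replicates) > ignore else replicates[-1:]
    return {
        "min": min(kept),
        "max": max(kept),
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "std": statistics.pstdev(kept),  # population standard deviation
        "filtered": statistics.median(kept),
    }


tresize = [23.26174999999999, 22.99621666666672, 22.66563333333331,
           23.99620000000002, 22.940849999999948, 22.26951666666664,
           22.975350000000006, 24.96453333333337, 23.6878333333334,
           23.21740000000001, 24.743699999999976, 23.507333333333282,
           22.927800000000033, 22.292066666666653, 23.28364999999999,
           23.361950000000004, 22.18191666666666, 22.996466666666684,
           23.54029999999997, 22.873883333333342]
summary = summarize_subtest(tresize)
```

Under these assumptions the median of the 15 remaining replicates is 23.21740000000001, the same as the "filtered" and "suite" values in the example.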
 
== Graph Server ==
Data is packaged as a file in an HTTP post object.
 
=== VALUES/AVERAGE ===
 
Two different types of data to be sent:
# A single value to be stored as the 'average' in the test_runs table
# A set of (interval, value) pairs to be stored in the test_run_values table, 'average' to be calculated by collector script
 
The first type is called 'AVERAGE' and the second is called 'VALUES'.
All data is formatted using comma separated notation.
 
date_run = seconds since epoch (linux time stamp)
page_name = is unique to pages when combined with the pageset_id from test table
 
* for sending interval, value pairs
  START
  VALUES
  machine_name,test_name,branch_name,ref_changeset,ref_build_id,date_run
  interval0,value0,page_name0
  interval1,value1,page_name1
  ...
  intervalEND,valueEND,page_id
  END
* for sending a single value
  START
  AVERAGE
  machine_name,test_name,branch_name,ref_changeset,ref_build_id,date_run
  value0
  END
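
The two layouts above could be assembled with helpers along these lines.  This is a sketch only: the function names are hypothetical, a newline is assumed as the line separator, and no escaping of commas inside field values is attempted.

```python
def _header(machine, test, branch, changeset, build_id, date_run):
    """Comma separated metadata line shared by both payload types."""
    fields = (machine, test, branch, changeset, build_id, date_run)
    return ",".join(str(field) for field in fields)


def format_values(machine, test, branch, changeset, build_id, date_run, rows):
    """Build a VALUES payload from (interval, value, page_name) tuples."""
    lines = ["START", "VALUES",
             _header(machine, test, branch, changeset, build_id, date_run)]
    for interval, value, page_name in rows:
        lines.append("%s,%s,%s" % (interval, value, page_name))
    lines.append("END")
    return "\n".join(lines)


def format_average(machine, test, branch, changeset, build_id, date_run, value):
    """Build an AVERAGE payload carrying a single value."""
    lines = ["START", "AVERAGE",
             _header(machine, test, branch, changeset, build_id, date_run),
             str(value), "END"]
    return "\n".join(lines)


payload = format_values("machine_1", "test_1", "branch_1", "changeset_1",
                        13, 1229477017,
                        [(1, 1.0, "page_01"), (2, 2.0, "page_02")])
```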
 
==== Examples ====
values input:
START
VALUES
machine_1, test_1, branch_1, changeset_1, 13, 1229477017
1,1.0,page_01
2,2.0,page_02
3,3.0,page_03
4,1.0,page_04
5,2.0,page_05
6,3.0,page_06
7,1.0,page_07
8,2.0,page_08
9,3.0,page_09
10,1.0,page_10
11,2.0,page_11
12,3.0,page_12
END
 
response:
Content-type: text/plain
RETURN\ttest_1\tgraph.html#type=series&tests=[{"test":45,"branch":3455,"machine":234,"testrun"=6667}]
RETURN\ttest_1\t2.00\tgraph.html#tests=[{"test":45,"branch":3455,"machine":234}]
 
average input:
START
AVERAGE
machine_1, test_1, branch_1, changeset_1, 13, 1229477017
2.0
END
 
response:
Content-type: text/plain
RETURN\ttest_1\t2.00\tgraph.html#tests=[{"test":45,"branch":3455,"machine":234}]
 
=== browser_output.txt ===
 
The data is harvested from browser_output.txt:
 
__start_tp_report
_x_x_mozilla_page_load,4070.909090909091,NaN,NaN
_x_x_mozilla_page_load_details,avgmedian|4070.909090909091|average|4070.73|minimum|NaN|maximum|NaN|stddev|NaN
|i|pagename|median|mean|min|max|runs|
|0;gearflowers.svg;162.5;163;162;226;226;165;163;162;162
|1;composite-scale.svg;77;77.25;77;115;115;77;77;78;77
|2;composite-scale-opacity.svg;31.5;31.75;30;62;62;31;34;30;32
|3;composite-scale-rotate.svg;31;31;29;60;60;29;32;33;30
|4;composite-scale-rotate-opacity.svg;31;31.5;29;36;35;31;31;29;36
|5;hixie-001.xml;15065;15063.75;15059;15086;15059;15065;15065;15066;15086
|6;hixie-002.xml;15064.5;15060.5;15047;15070;15070;15064;15066;15065;15047
|7;hixie-003.xml;5038;5038.75;5037;5054;5042;5037;5037;5054;5039
|8;hixie-004.xml;5081.5;5081.5;5079;5087;5087;5084;5079;5084;5079
|9;hixie-005.xml;6369.5;6367.5;6349;6405;6362;6405;6377;6382;6349
|10;hixie-006.xml;9270;9276;9239;9342;9239;9325;9278;9342;9262
|11;hixie-007.xml;3623.5;3619.25;3601;3653;3627;3629;3653;3601;3620
__end_tp_report
__start_cc_report
_x_x_mozilla_cycle_collect,1137
__end_cc_report
__startTimestamp1327556458940__endTimestamp
__startBeforeLaunchTimestamp1327556130230__endBeforeLaunchTimestamp
__startAfterTerminationTimestamp1327556459158__endAfterTerminationTimestamp
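
A sketch of reading the per-page rows out of the tp report section.  The field order is taken from the header row in the sample (i, pagename, median, mean, min, max, runs); the function name and the returned dictionary keys are made up for illustration, and the other sections of the file are ignored.

```python
def parse_tp_report(text):
    """Return one dict per page row found between the tp report markers."""
    pages = []
    in_report = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "__start_tp_report":
            in_report = True
        elif line == "__end_tp_report":
            in_report = False
        elif in_report and line.startswith("|") and ";" in line:
            # Page rows use ';' as separator; the header row uses '|' only.
            fields = line[1:].split(";")
            median, mean, low, high = (float(x) for x in fields[2:6])
            pages.append({
                "index": int(fields[0]),
                "pagename": fields[1],
                "median": median,
                "mean": mean,
                "min": low,
                "max": high,
                "runs": [float(x) for x in fields[6:]],
            })
    return pages


sample = """__start_tp_report
_x_x_mozilla_page_load,4070.909090909091,NaN,NaN
|i|pagename|median|mean|min|max|runs|
|0;gearflowers.svg;162.5;163;162;226;226;165;163;162;162
|1;composite-scale.svg;77;77.25;77;115;115;77;77;78;77
__end_tp_report
"""
pages = parse_tp_report(sample)
```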