Auto-tools/Projects/OrangeFactor

3,982 bytes removed, 13:54, 23 May 2017
Update architecture section for bug 1366774 and other cleanup
== About ==
The [https://brasstacks.mozilla.com/orangefactor/ OrangeFactor] web app is a tool for tracking and analysing intermittent test failures that occur during Firefox/Gecko continuous integration automation.

NB: OrangeFactor is considered near end of life. A replacement will likely use Treeherder's API as a backend instead.
== Contributing ==
For help with OrangeFactor, please ask in #ateam or contact :emorley, :gbrown or :jmaher.

File bugs under [https://bugzilla.mozilla.org/enter_bug.cgi?product=Tree+Management&component=OrangeFactor Tree Management::OrangeFactor]. See open bugs [https://bugzilla.mozilla.org/buglist.cgi?resolution=---&component=OrangeFactor&product=Tree+Management here].

The OrangeFactor web app can be run locally. See the instructions in [https://hg.mozilla.org/automation/orangefactor/file/default/README.txt README.txt].
== Architecture ==
The system has several moving parts:
* [https://brasstacks.mozilla.com/ brasstacks.mozilla.com]:
** SSL termination occurs on the box.
** Only listens on port 443, since HTTP-to-HTTPS redirection is performed by Zeus.
* [https://brasstacks.mozilla.com/orangefactor/ OrangeFactor UI]:
** Static UI that interacts with the OrangeFactor API.
** Served by [https://openresty.org/ OpenResty].
** [https://hg.mozilla.org/automation/orangefactor/file/default/html Source].
* [https://brasstacks.mozilla.com/orangefactor/api/ OrangeFactor REST API]:
** Python FastCGI app reverse proxied by OpenResty.
** Read-only API apart from job classification submissions from [https://github.com/mozilla/treeherder Treeherder], which use [https://github.com/kumar303/mohawk Hawk authentication].
** API responses are a combination of results from OrangeFactor's ES instance and the public hg.mozilla.org pushlog.
** ES queries are made using [https://github.com/aparo/pyes pyes], plus a helper library we've written on top of it, [https://hg.mozilla.org/automation/mozautoeslib/ mozautoeslib].
** [https://hg.mozilla.org/automation/orangefactor/file/default/server Source].
* OrangeFactor Elasticsearch instance:
** Index <code>bugs</code>: The intermittent test failure records submitted by Treeherder via OrangeFactor's REST API.
** Index <code>bzcache</code>: A cache of public <code>keyword:intermittent-failure</code> bugs, populated via a brasstacks cron job.
* OrangeFactor bzcache refresh task:
** brasstacks cron job, run every four hours, that populates the <code>bzcache</code> index on the OrangeFactor Elasticsearch instance.
** Fetches bug data using unauthenticated requests to Bugzilla's REST API.
** [https://github.com/jonallengriffin/bzcache/blob/master/bzcache/bz_cache_refresh.py Source].
* OrangeFactor mailer task:
** brasstacks cron job, run every week, that emails a summary of top failures to [https://groups.google.com/forum/#!forum/mozilla.dev.tree-alerts mozilla.dev.tree-alerts].
** Fetches data from the OrangeFactor REST API.
** [https://hg.mozilla.org/automation/orangefactor/file/default/woo_mailer.py Source].
* OrangeFactor bug commenter task:
** brasstacks cron job, run in both a daily and a weekly variant, that adds failure summary comments to bugs associated with the intermittent test failures ([https://bugzilla.mozilla.org/show_bug.cgi?id=1365219#c6 example]).
** Fetches data from the OrangeFactor REST API.
** Posts bug comments using the orangefactor@bots.tld account, which has no additional permissions beyond a standard user account.
** [https://hg.mozilla.org/automation/orangefactor/file/default/woo_commenter.py Source].
* [https://activedata.allizom.org/ ActiveData] mirror of OrangeFactor Elasticsearch data:
** Currently synced manually by :ekyle.
** Plans to automate this in the future ({{bug|1344253}}).
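The bzcache refresh task fetches bug data with unauthenticated requests to Bugzilla's REST API. A minimal Python sketch of such a request follows. It assumes the <code>/rest/bug</code> search endpoint with its <code>keywords</code>, <code>keywords_type</code> and <code>include_fields</code> parameters; the helper names and the chosen fields are hypothetical and are not taken from the bzcache source.

```python
import json
import urllib.parse
import urllib.request

# Public Bugzilla REST endpoint for bug searches.
BUGZILLA_REST = "https://bugzilla.mozilla.org/rest/bug"


def build_intermittent_bug_query(fields=("id", "summary", "status", "resolution")):
    """Return a search URL for bugs carrying the intermittent-failure keyword.

    Hypothetical helper: the selected fields are an illustrative choice.
    """
    params = {
        "keywords": "intermittent-failure",
        "keywords_type": "allwords",
        "include_fields": ",".join(fields),
    }
    return BUGZILLA_REST + "?" + urllib.parse.urlencode(params)


def fetch_intermittent_bugs(url, timeout=60):
    """Fetch the bug list without any credentials and return the "bugs" array."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["bugs"]
```

Calling <code>fetch_intermittent_bugs(build_intermittent_bug_query())</code> performs the actual request. Because no API key is sent, Bugzilla only returns bugs visible to anonymous users, in line with the public-only scope of the <code>bzcache</code> index.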
== Making Oranges Interesting ==
* identify overall trends in orange occurrences, already part of the [http://jmaher.couchone.com/orange_factor/_design/woo/orange.html legacy Orange Factor app]; this can help track the 'orangeness' of a product over time, and can help measure the helpfulness of orange-fixing activities
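The orange factor itself, defined for the old Orange Factor dashboard as the average number of oranges per push, is a simple ratio that can be tracked day by day. A minimal sketch follows, assuming per-day counts of oranges and pushes are already available; the input format, the helper names and the trailing moving average used to smooth the series are illustrative choices, not the app's own implementation.

```python
from collections import deque


def orange_factor(oranges, pushes):
    """Average number of oranges per push; None when there were no pushes."""
    if pushes == 0:
        return None
    return oranges / pushes


def daily_orange_factors(daily_counts):
    """Map each day's (oranges, pushes) pair to that day's orange factor."""
    return {day: orange_factor(o, p) for day, (o, p) in daily_counts.items()}


def moving_average(values, window=7):
    """Trailing moving average over `window` days, ignoring None entries."""
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        present = [v for v in recent if v is not None]
        averages.append(sum(present) / len(present) if present else None)
    return averages
```

For example, a day with 30 oranges over 10 pushes has an orange factor of 3.0; days without pushes yield <code>None</code> rather than a division error.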
== Attacking Oranges ==
https://developer.mozilla.org/en/QA/Fixing_intermittent_oranges
== History ==
These projects are deprecated and have been replaced by the new War on Orange/OrangeFactor application.

'''Topfails'''

[http://brasstacks.mozilla.com/topfails/ Topfails] was the first database-driven orange tracker developed by our team. It showed failures in terms of overall occurrences. It suffered from a buggy log parser and a UI with relatively few views.

source: http://hg.mozilla.org/automation/topfails/

'''Old Orange Factor'''

[http://jmaher.couchone.com/orange_factor/_design/woo/orange.html Orange Factor] was a newer dashboard by jmaher. It calculated the average number of oranges per push (the 'orange factor') and tracked that number over time. It served as a base for exploring the usefulness of other statistics.

source: http://github.com/jmaher/Orange-Factor

See also [http://people.mozilla.com/~mcote/war-on-orange/war-on-orange-paper-testistanbul.pdf this paper] and the accompanying [http://people.mozilla.com/~mcote/war-on-orange/war-on-orange-testistanbul-slides-presentation.pdf slides], prepared for the TestIstanbul 2012 conference.
== ActiveData ==
The OrangeFactor Elasticsearch metadata is replicated to [https://activedata.allizom.org/ ActiveData], and can be queried there using the "orange_factor" index:
: <code>{"from": "orange_factor"}</code>
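A sketch of sending that query from Python follows. It assumes that ActiveData accepts the JSON query as the body of a POST to a <code>/query</code> endpoint on the host linked above, and that a <code>limit</code> clause is supported; the limit is added only to keep the sample small, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed query endpoint on the ActiveData host.
ACTIVEDATA_QUERY_URL = "https://activedata.allizom.org/query"


def build_activedata_request(limit=10):
    """Build a POST request for a small sample of the orange_factor index."""
    query = {"from": "orange_factor", "limit": limit}
    return urllib.request.Request(
        ACTIVEDATA_QUERY_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

<code>run_query(build_activedata_request())</code> sends the request; building the request separately keeps the query easy to inspect before it is sent.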
 
== REST API ==
Returns the same data as the "bugs" property of the data returned by <code>bybug</code>.
 
=== testrun ===
 
Returns information on one or more test runs.
 
Parameters:
 
The request can be made in one of two forms. To get information about a single test run, pass:
 
* starttime: Unix timestamp of the run's start time.
* machine: Hostname of the test machine.
 
Or, to get information on several test runs, pass:
 
* runs: a comma-separated list of timestamps and machines, in the form <timestamp>|<machine>,<timestamp>|<machine>, e.g. <code>testrun?runs=1297070365|talos-r3-leopard-012,1295484366|talos-r3-xp-037</code>
 
Returns an object with properties in the form '<timestamp>|<machine>' (regardless of which parameter format was used). Each property has an array of matching test runs, with these properties:
 
* tree
* passed/failed/todo: the number of tests in each category; may be missing for test runs that never completed (due to crashes, etc.)
* elapsedtime: number of seconds it took for the testrun to complete
* suitename
* builder: the buildbot builder string
* machine: the machine name
* cmdline: the command line used to invoke the test
* buildtype: opt or debug
* platform: buildbot platform string
* date: date the testrun was run, YYYY-MM-DD
* buildid: the buildbot buildid used to run the test
* revision: the hg revision used to run the test
* testfailure_count: the number of failing tests in this testrun.
* testrunerrors: an array of errors that could not be pinned to a specific test; these are usually memory leaks or crashes that occur at the end of a testrun. These errors are not included in 'testfailure_count'
* testfailures: an array of testfailures which is testfailure_count in length; each member of this array has two keys:
** test: the name of the test that failed
** failures: a list of failures that occurred
* logurl: URL to the complete test log.
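The response described above can be flattened into one record per test run for further analysis. A minimal sketch follows; <code>summarize_testruns</code> is a hypothetical helper, and treating a run as incomplete when any of passed/failed/todo is missing is an inference from the note on those fields, not documented API behaviour.

```python
def summarize_testruns(response):
    """Flatten a testrun response into one summary dict per test run.

    `response` maps '<timestamp>|<machine>' keys to arrays of test runs.
    """
    summaries = []
    for key, runs in response.items():
        timestamp, machine = key.split("|", 1)
        for run in runs:
            summaries.append({
                "timestamp": int(timestamp),
                "machine": machine,
                "suitename": run.get("suitename"),
                # passed/failed/todo may be absent when the run never completed.
                "completed": all(k in run for k in ("passed", "failed", "todo")),
                "failing_tests": [f["test"] for f in run.get("testfailures", [])],
                "run_errors": len(run.get("testrunerrors", [])),
            })
    return summaries
```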
 
=== testfailures ===
 
Returns information on test failures.
 
This is different from information on oranges, in that (a) these failures may not have been starred and (b) there may be more than one failure per test.
 
Parameters:
 
* startday: Mandatory. In ISO format, e.g. 2011-05-27.
* endday: Mandatory. Also in ISO format.
* tree: Optional, defaults to all.
* type: Build type, opt or debug, defaults to all.
 
Returns an object with properties named by test run ID and containing an array of failures. Each failure is an object with the following properties:
 
* buildtype
* errors: an array of objects with 'status' and 'text' properties describing the error.
* testfailure_id
* buildid
* os
* testgroup_id
* tree
* machine
* platform
* test
* starttime
* date
* duration
* testgroup
* revision
* testsuite_id
* logurl
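Since a test can fail more than once per run, a natural next step is counting failures by test (or by test and platform) across all runs in the response. A minimal sketch using <code>collections.Counter</code>; <code>count_failures</code> and the sample data are hypothetical.

```python
from collections import Counter


def count_failures(response, key=("test",)):
    """Count failures across all test runs in a testfailures response.

    `response` maps test run IDs to arrays of failure objects; `key` names the
    failure properties to group by, e.g. ("test", "platform").
    """
    counts = Counter()
    for failures in response.values():
        for failure in failures:
            counts[tuple(failure[k] for k in key)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample shaped like the documented response.
    sample = {
        "101": [
            {"test": "test_a.html", "platform": "linux64"},
            {"test": "test_a.html", "platform": "linux64"},
        ],
        "102": [{"test": "test_a.html", "platform": "win32"}],
    }
    for group, n in count_failures(sample, key=("test", "platform")).most_common():
        print(group, n)
```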