Treeherder is the dashboard used by sheriffs and the rest of Mozilla to see build and test results. When you first open Treeherder, there will be a lot of information thrown at you all at once. This document should help explain what to do with it all.
There are a lot of terms used within Mozilla that aren't necessarily common elsewhere.
- Repo - Shorthand for 'repository', this is where developers push their commits to have them become part of an upcoming version of Firefox. An example of this is the 'mozilla-inbound' repository.
- Tree - Another term for a repository, in reference to the branched nature of development under version control. 'Treeherder' is based on this, as a tool to "manage the trees".
- Job - The smallest unit of work shown in Treeherder's main UI. This can be a build job or a test job. Each test job can run many tests within it, but they get reported as a group to Treeherder. Examples of these would be the Windows 8 x64 Debug build job, or a mochitest-browser-chrome-1 test job.
Each push to a repository will run a number of build jobs for various platforms (Linux, macOS, Windows, Android). Most platforms will make multiple builds in various configurations: OPT (optimized builds that strip out information helpful for debugging failures in favor of making things faster), DEBUG (unoptimized builds that don't strip out the debugging information), and PGO (the build compiler process runs through the code a few times, taking longer to complete, but speeding up some code paths to make the build run faster than OPT builds). Each of those builds is then used to run some number of test jobs to verify that the compiled code is running as intended. When either a build or test job runs into trouble, the job is reported as a failure to Treeherder and needs to be classified (also called 'starred' at Mozilla). Classifying a job adds a star next to the job symbol in Treeherder, and it is done by the current sheriff on duty (that's you!).
The Treeherder UI
At the top of the site's header section are some menus and buttons that can be useful. The 'Infra' menu holds links to a few other Mozilla tools/websites. The 'Repos' menu provides links to view other Mozilla source code repositories within Treeherder. The 'Filters' menu has tools to filter down the jobs shown within Treeherder on various criteria. The '?' menu has various help pages for Treeherder. The 'Login/Register' button lets you log in to Treeherder, allowing you to make changes to/with Treeherder's tools.
The next row in the site header includes a menu showing the currently selected repository, with information about the state of the repository (closed/open/approval-required). The 'Tiers' menu lets you select which tiers of jobs are displayed. The 'Excluded Jobs' button toggles whether jobs that have been intentionally hidden from view are shown. The '+' or '-' button toggles whether groups of non-failing jobs are collapsed or expanded. The various colored circle buttons toggle the visibility of jobs in various states (red toggles failing jobs, green toggles passing jobs, etc). The textbox at the end is a quick-filter box: you can use it to filter jobs by various properties without constructing a more exact filter from the 'Filters' menu above. In the middle of this row, Treeherder displays the count of jobs that are currently "unclassified". More on this below.
Below the site header is the main part of Treeherder. This shows the recent commits to the currently selected repository grouped by the pushes they were landed in.
The left side of each push shows the commit data for each commit in the push (eg, the commit hash, the commit author, and the commit message). This can be just a single commit or the push could contain up to hundreds of commits. Treeherder caps the view at 20 commits for performance reasons. Click the "... and more" links to view the entire push's contents on hg.mozilla.org.
The right side of each push shows the build and test results for all of the jobs running with this push's commits. This is where most sheriffing work happens, as it is where job failures are reported, and the job failures need to be classified (also referred to as 'starred' in various places).
If you select one of the job symbols for a push, a panel at the bottom of the page will open. The panel has a few sections with information about the job.
On the left side of the panel, there are a few buttons that can be clicked to open the log file for the job as well as submit a request for the job to be re-run. Below those buttons is a list of information about the job, like the name of the job, when the job was requested by the build system, when the build system actually started working on the job, and when the job finished being run.
On the right side of the panel are a number of tabs with more information about the job. Depending on the state of the job you selected, the tab that gets selected by default will vary. (A green/passing job will default to the 'Job Details' tab; a failing job will default to either the 'Failure Summary' or 'Failure Classification' tab.)
The 'Job Details' tab lists more detailed information from the job, including all of the files created and uploaded as part of the job running (eg, log files, screenshots of certain failures, etc).
The 'Failure Summary' and 'Failure Classification' tabs show information on a job's reported failures. They both list the same information, but are used slightly differently. 'Failure Summary' handles all failures in a job as a whole, whereas 'Failure Classification' can handle each individual failure within a job on its own and is part of the Autoclassify system. These views will list each failure line, as well as any bugs from Bugzilla that match the reported failure (or are at least similar to it in some way). If you're signed in, you'll also see extra controls for classifying the failures.
The 'Annotations' tab shows any classifications that have already been performed for the job. If you're signed in, you'll also see UI for removing classifications, in the case that something was classified incorrectly and you need to correct it.
The pinboard is an area where you can add one or more jobs and pair those with related bugs or comments/reasons, before you submit the classifications to the database. Typically you'd use this on a single job at a time. If multiple jobs are broken for the same reason, though, it can save time to pin them all at once and apply a single classification to them all with a single click. For example:
- a patch broke a test completely, and you need to classify the failed jobs against the push that broke/fixed it
- something in the build infrastructure is broken, causing lots of jobs to fail at once
<<pinboard image here>>
Treeherder displays build and test results in different colors, depending on the job status:
- lightgray = pending
- gray = running
- green = success
- orange = tests failed
- purple = infrastructure exception
- red = build error
- blue = build has been restarted
- pink = build was cancelled
See the 'Job Notation' section of the Treeherder User Guide for more information.
Treeherder has several useful keyboard shortcuts. These are especially helpful for sheriffs:
- u = Toggle showing only unclassified failures
- n = Highlight next unclassified failure
- p = Highlight previous unclassified failure
- spacebar = 'Pin' the currently selected job to Treeherder's Pinboard
A typical workflow would be to load Treeherder, hit 'u' to switch to the only-unclassified-failures mode (gets rid of a bunch of passing jobs that we typically have no interest in), then use either 'n' or 'p' to cycle through each failed job, classifying them as we go until all failures have been classified.
An overlay showing all of Treeherder's keyboard shortcuts can be shown by pressing the '?' key on your keyboard while Treeherder is open in your browser. (Note: Not the '?' menu in the site header, although the User Guide does list the shortcuts.)
What does classifying a failure do?
- Submits the failure and classification to Intermittent Failures View. This dashboard helps track trends in classified failures, so failures that happen frequently (or suddenly start failing frequently) can be prioritized.
- Stores the classification comment and any associated bugs in Treeherder's database, so everyone viewing that tree/push can see that the failure has been starred (i.e. annotated with a comment and the star icon added next to the job symbol), so they know it has been dealt with.
The list of jobs and classifications updates on its own periodically as pushes and job information are reported to Treeherder.
How are things starred/classified?
You first need to sign in to Treeherder to classify failures and have the classifications recorded. Click the "Login/Register" button in the header and follow the login process.
After you're signed in, open this page. It should open up a Treeherder page to a specific push, and will select a specific job. This will open the bottom panel, and one of the Failure tabs will be selected. This example will use the 'Failure Summary' tab, as it is easier to explain to a newcomer than the 'Failure Classification' tab. Click the 'Failure Summary' tab yourself if it isn't already selected. The tab should look something like <<IMAGE GOES HERE>>.
In this example, the job reports three lines of failures. We're only interested in the first line, as the other two lines don't contain any actionable information (<<But why?>>). The first failure consists of the line "1536 INFO TEST-UNEXPECTED-FAIL | toolkit/components/places/tests/browser/browser_bug248970.js | Check the total items count - Got 17, expected 16". From this, there are three pieces of information to consider.
- "TEST-UNEXPECTED-FAIL" is the kind of failure. There can be other failures like "TEST-UNEXPECTED-TIMEOUT" and "PROCESS-CRASH".
- "toolkit/components/places/tests/browser/browser_bug248970.js" is the test file that reported the failure.
- "Check the total items count - Got 17, expected 16" is the error that occurred.
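The three-part breakdown above can be sketched in code. This is only an illustration, not how Treeherder itself parses logs: it assumes the fields are separated by " | " as in the example line, and that the failure kind is the last token before the first separator. The names `FailureLine` and `parse_failure_line` are hypothetical.

```python
from typing import NamedTuple


class FailureLine(NamedTuple):
    kind: str     # e.g. TEST-UNEXPECTED-FAIL
    test: str     # path of the test file that reported the failure
    message: str  # the error that occurred


def parse_failure_line(line: str) -> FailureLine:
    """Split a failure line into kind, test file, and message.

    Assumes the " | " separator seen in the example above.
    """
    prefix, test, message = line.split(" | ", 2)
    # The prefix looks like "1536 INFO TEST-UNEXPECTED-FAIL";
    # the failure kind is its last whitespace-separated token.
    kind = prefix.split()[-1]
    return FailureLine(kind, test.strip(), message.strip())


example = (
    "1536 INFO TEST-UNEXPECTED-FAIL | "
    "toolkit/components/places/tests/browser/browser_bug248970.js | "
    "Check the total items count - Got 17, expected 16"
)
parsed = parse_failure_line(example)
print(parsed.kind)
print(parsed.test)
print(parsed.message)
```

Lines that do not follow this three-field layout would raise a `ValueError` here, so a real tool would need more forgiving handling.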
Below the failure line can be one or more bugs from Bugzilla that match up with part or all of the failure line, with the matching portions between the two highlighted in bold text. In this example, the entire bug summary (except for the added word "Intermittent") is in bold, meaning it matches up exactly with the failure line. Careful, this only matches each individual word from the failure line to each word from the bug summary, so the order of words doesn't matter for highlighting purposes. You'll need to check yourself whether the two actually match for real.
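To see why word-by-word highlighting can mislead, here is a minimal sketch of order-insensitive matching. It illustrates the general idea only and is not Treeherder's actual implementation; `words` and `highlighted_words` are hypothetical helpers, and the bug summary is made up to resemble the example above with the two numbers swapped.

```python
import re


def words(text: str) -> list[str]:
    """Split text into lowercase word-like tokens (paths stay whole)."""
    return re.findall(r"[\w./-]+", text.lower())


def highlighted_words(failure_line: str, bug_summary: str) -> list[str]:
    """Return the bug-summary words that also occur in the failure line.

    Each word is checked on its own, so word order is ignored.
    """
    failure_words = set(words(failure_line))
    return [w for w in words(bug_summary) if w in failure_words]


failure = (
    "toolkit/components/places/tests/browser/browser_bug248970.js | "
    "Check the total items count - Got 17, expected 16"
)
# A made-up summary describing a different failure: the numbers are swapped.
different_bug = (
    "Intermittent toolkit/components/places/tests/browser/browser_bug248970.js | "
    "Check the total items count - Got 16, expected 17"
)

matched = highlighted_words(failure, different_bug)
unmatched = [w for w in words(different_bug) if w not in matched]
print(unmatched)  # only "intermittent" is left unhighlighted
```

Every word except "Intermittent" would be highlighted even though the summary describes a different result, which is why the match still needs a manual check.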
Next to the bug summary is a button with a 'pin' icon. If you click it, the job is added to the pinboard, and the bug number is added to the 'Bugs' section of the pinboard. Once you're confident that the classification is correct, you can click the pinboard's "Save" button to submit the classification to Treeherder's database. If this job isn't already classified, the job symbol will get a star icon added next to it.
That's it! You've classified/starred a job! You can move on to a different failed job.
Starring many failures at once
Sometimes things go seriously wrong in CI, e.g.:
- developer lands bad code
- a broken dependency is introduced
- a system configuration change goes awry
These types of issues manifest as lots of concurrent failures as displayed in Treeherder. Luckily, Treeherder allows you to pin up to 500 jobs at one time, greatly speeding up the starring process. The process you use is similar for each situation, but the specifics depend on how pervasive the failure is.
- If the failure is pervasive, i.e. platform-independent, you can use Treeherder to show all unclassified failures. Pin all the failures for a single push, and then mark them as "fixed by commit." Repeat until it's done for that single push, then move on to the next push.
- If only a subset of platforms is affected, i.e. all failures are on one build target or in one test suite, you can filter using the search field at the top right of the Treeherder interface. You can also filter on the job name (or a substring of it), which is shown at the bottom left when you select a job.
- If the failure doesn't affect all test jobs of one test suite on the same platform and build target, you will need to manually go through the list and add them to the pinboard before marking them as "fixed by commit." This reduces the likelihood of mis-starring.
- If it's an older regression which was resolved many pushes ago, selecting all of these test failures and starring them as "fixed by commit" in bulk may be worth it, given the time needed to go through them manually. Other regressions should have been caught by the later pushes after the backout of the obvious test failure.
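The bulk-starring routine described above (filter, pin the failures for one push, classify, repeat for the next push) can be sketched as follows. This is an illustration only: the `Job` record, its fields, and `batches_to_pin` are hypothetical, and nothing here calls a Treeherder API. The limit of 500 pinned jobs is taken from the text above.

```python
from dataclasses import dataclass
from typing import Iterator

PINBOARD_LIMIT = 500  # maximum number of jobs pinned at one time


@dataclass(frozen=True)
class Job:
    push: str         # hypothetical push identifier
    name: str         # hypothetical job name (platform, build type, suite)
    failed: bool
    classified: bool


def batches_to_pin(
    jobs: list[Job], name_filter: str = ""
) -> Iterator[tuple[str, list[Job]]]:
    """Yield (push, jobs) batches of unclassified failures to pin together.

    Jobs are handled one push at a time, and each batch holds at most
    PINBOARD_LIMIT jobs. An optional substring narrows jobs by name.
    """
    by_push: dict[str, list[Job]] = {}
    for job in jobs:
        if job.failed and not job.classified and name_filter in job.name:
            by_push.setdefault(job.push, []).append(job)
    for push, push_jobs in by_push.items():
        for start in range(0, len(push_jobs), PINBOARD_LIMIT):
            yield push, push_jobs[start:start + PINBOARD_LIMIT]


jobs = (
    [Job("push-a", f"linux64 debug mochitest-{i}", True, False) for i in range(600)]
    + [Job("push-a", "linux64 opt build", False, False)]
    + [Job("push-b", "windows debug mochitest-1", True, True)]
    + [Job("push-b", "windows debug mochitest-2", True, False)]
)
sizes = [(push, len(batch)) for push, batch in batches_to_pin(jobs)]
print(sizes)
```

Passing a substring such as `"windows"` as `name_filter` mirrors the job-name filter case, where only a subset of platforms is affected.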
More Treeherder Resources
More information about the Treeherder project can be found here.