Sheriffing/Manifest Scheduling

From MozillaWiki

The "manifest scheduling" project is a major shift in Mozilla's CI system. It aims to reduce costs and improve regression detection by only running the exact tests that we need.

Disable Manifest Scheduling

If manifest scheduling is enabled and it is causing problems that make it too difficult to do your job, please reach out to :ahal or :marco and we can disable it for you. If neither of us is around, you can disable it yourself by removing this line:


Under the old system, the CI roughly performed these steps:

  1. Compute all tests
  2. Split them across a hardcoded number of tasks (i.e. total chunks)
  3. Figure out which tests should run
  4. Schedule the tasks that contain at least one of the tests we want to run
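To make the old flow concrete, here is a toy Python sketch of it. It is not the actual Firefox CI code: the round-robin split, the function names `chunk_fixed` and `schedule_old`, and the manifest names are hypothetical, chosen only to show how a fixed chunk count drags unrelated manifests along with the ones we want.

```python
from typing import Dict, List, Set


def chunk_fixed(manifests: List[str], total_chunks: int) -> List[List[str]]:
    """Step 2: split ALL manifests round-robin across a fixed number of chunks."""
    chunks: List[List[str]] = [[] for _ in range(total_chunks)]
    for i, manifest in enumerate(sorted(manifests)):
        chunks[i % total_chunks].append(manifest)
    return chunks


def schedule_old(
    all_manifests: List[str], relevant: Set[str], total_chunks: int
) -> Dict[int, List[str]]:
    """Steps 3-4: keep every chunk (numbered from 1) that holds a relevant manifest."""
    chunks = chunk_fixed(all_manifests, total_chunks)
    return {
        n: chunk
        for n, chunk in enumerate(chunks, start=1)
        if any(m in relevant for m in chunk)
    }
```

For example, `schedule_old(["a", "b", "c", "d", "e", "f"], {"b"}, 3)` schedules only chunk 2, but that chunk also runs manifest `e`, which nobody asked for.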

The major downside to the above system is that each task containing a test we care about also contains a whole lot of tests we don't care about. With "manifest scheduling" enabled, the steps now become:

  1. Compute the tests we care about
  2. Figure out how many chunks it would take to run them given a hardcoded time interval
  3. Split them across said chunks
  4. Schedule all chunks
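A comparable sketch of the new flow follows. It assumes each relevant manifest has an estimated runtime and reads the "hardcoded time interval" as a target runtime per chunk; the names `schedule_manifests` and `target_seconds`, and the greedy longest-first balancing, are illustrative assumptions rather than the real scheduler.

```python
import heapq
import math
from typing import Dict, List


def schedule_manifests(runtimes: Dict[str, float], target_seconds: float) -> List[List[str]]:
    """Split only the relevant manifests into as many chunks as the target runtime needs."""
    if not runtimes:
        return []
    total = sum(runtimes.values())
    # Step 2: enough chunks that each runs for roughly target_seconds,
    # but never more chunks than there are manifests.
    num_chunks = max(1, min(len(runtimes), math.ceil(total / target_seconds)))
    # Step 3: assign the longest manifests first, each to the currently lightest chunk.
    heap = [(0.0, i) for i in range(num_chunks)]  # (accumulated runtime, chunk index)
    heapq.heapify(heap)
    chunks: List[List[str]] = [[] for _ in range(num_chunks)]
    for manifest, runtime in sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        chunks[i].append(manifest)
        heapq.heappush(heap, (load + runtime, i))
    # Step 4: every non-empty chunk is scheduled.
    return [chunk for chunk in chunks if chunk]
```

With runtimes of 60, 50, 40 and 30 seconds for manifests `a` to `d` and a 100-second target, the sketch computes two chunks and balances them to 90 seconds each: `[["a", "d"], ["b", "c"]]`. Unlike the old flow, no chunk contains a manifest outside the relevant set.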

With this new method, we *only* schedule the exact set of manifests that we have deemed important. This should represent a huge improvement in CI efficiency.

Sheriffing Implications

The benefits of "manifest scheduling" are fairly clear, but there are several drawbacks as well, most of which are related to sheriffing.

Push Continuity

The main issue is that, under "manifest scheduling", the same mochitest-1 task can run a completely different set of tests on push A than it does on push B. In other words, it will no longer be possible to filter Treeherder by task label to identify test-level regressions (though label filtering should still work for many types of infrastructure-related issues). Instead, sheriffs will need to filter Treeherder by "test path". Read this blog post for details of the feature.

UI showing active filter for a test path

Push showing tasks that executed the same test path

Armen also added a new "Test Groups" section to the job panel. It lists the manifests that the test task ran, and each manifest links to the test path filter outlined above. For example, see the screenshot Treeherder-test-groups-prototype.png.
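The difference between filtering by task label and filtering by test path can be modeled in a few lines. This sketch assumes each task record carries the list of manifests it ran (what the "Test Groups" section shows) and that the filter is an exact match on the manifest path; the `Task` class, the `tasks_running` function and the manifest paths are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    push: str
    label: str
    test_groups: List[str] = field(default_factory=list)  # manifests this task ran


def tasks_running(tasks: List[Task], test_path: str) -> List[Task]:
    """Return every task, on any push, whose test groups include test_path."""
    return [t for t in tasks if test_path in t.test_groups]


tasks = [
    Task("A", "mochitest-1", ["dom/example/mochitest.ini"]),
    Task("B", "mochitest-1", ["layout/example/mochitest.ini"]),
    Task("B", "mochitest-3", ["dom/example/mochitest.ini"]),
]
```

Here `mochitest-1` ran `dom/example/mochitest.ini` on push A but not on push B, so comparing `mochitest-1` across pushes would compare unrelated tests, while `tasks_running(tasks, "dom/example/mochitest.ini")` finds `mochitest-1` on A and `mochitest-3` on B.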

Backfills

Another major issue is backfilling. If tasks run different sets of tests on different pushes, the backfill action breaks, because we need to make sure the exact same set of manifests is scheduled on each backfilled push.

Thanks to work by Armen, the backfill action can now automatically detect whether "manifest scheduling" was used for the task. If not, it performs the standard backfill you are used to. If so, it runs the same set of manifests from the originating push on all of the backfilled pushes. Because more than one backfill can run at a time, we need a way to identify which tasks were backfilled from where. To that end, the symbols of backfilled tasks have been changed to the form `<group>-bk(<symbol>-<rev>-bk)`. For example, if `M-fis(bc3)` was backfilled from revision `abcdef`, the symbol for the backfilled tasks would be `M-fis-bk(bc3-abcdef-bk)`. This tells sheriffs that the task was backfilled starting at revision `abcdef` and contains the same set of test manifests as that push.
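The symbol convention can be expressed as a pair of helper functions. This sketch is based only on the `<group>-bk(<symbol>-<rev>-bk)` pattern and the `M-fis(bc3)` example; the function names are hypothetical, and it assumes the original symbol is grouped and that revisions contain no hyphens (true for hex hashes).

```python
import re
from typing import Tuple

# <group>(<symbol>), e.g. "M-fis(bc3)"
_GROUPED = re.compile(r"^(?P<group>[^()]+)\((?P<symbol>[^()]+)\)$")
# <group>-bk(<symbol>-<rev>-bk), e.g. "M-fis-bk(bc3-abcdef-bk)"
_BACKFILLED = re.compile(
    r"^(?P<group>[^()]+)-bk\((?P<symbol>[^()]+)-(?P<rev>[^()-]+)-bk\)$"
)


def backfill_symbol(symbol: str, rev: str) -> str:
    """Build the symbol for tasks backfilled from revision rev."""
    m = _GROUPED.match(symbol)
    if m is None:
        raise ValueError(f"expected '<group>(<symbol>)', got {symbol!r}")
    return f"{m['group']}-bk({m['symbol']}-{rev}-bk)"


def parse_backfill_symbol(symbol: str) -> Tuple[str, str, str]:
    """Recover (group, original symbol, originating revision) from a backfill symbol."""
    m = _BACKFILLED.match(symbol)
    if m is None:
        raise ValueError(f"not a backfill symbol: {symbol!r}")
    return m["group"], m["symbol"], m["rev"]
```

`backfill_symbol("M-fis(bc3)", "abcdef")` returns `"M-fis-bk(bc3-abcdef-bk)"`, and parsing that string gives back `("M-fis", "bc3", "abcdef")`, which is the information a sheriff reads off the symbol: which task it came from and which push's manifests it reran.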

UI showing a failed task, two backfill requests and a few retriggered tasks

You can filter the view down to a task you're backfilling and all of its backfilled tasks by selecting the task and clicking "Filter jobs containing these keywords" (the text shown when you hover over the link). See the screenshot below for its location in the UI.

Link to filter tasks and backfilled tasks

Other than being aware of this change, there shouldn't be any differences in performing a backfill.

You can read this blog post for more details.

Add New Jobs

There is currently no way to specify test manifests when adding new jobs via Treeherder's "Add New Jobs" UI. This means that it can't be used to fill in tasks for the purpose of bisecting a regression. There are plans to add this feature in the future. If used, "Add New Jobs" would simply run the same manifests that ran

One risk here is that it will be more difficult (but still possible) to schedule a mozilla-central only task on autoland for the purposes of bisecting a regression caught on central. You'll now need to make sure that you use "Add New Jobs" on a backstop push as that will run all test manifests. Then you can bisect the task like normal.
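Once the task exists on a backstop push, bisecting means finding the first push on which it fails. Below is a minimal binary-search sketch, assuming a hypothetical `task_fails(push)` callable that schedules the task (with the full set of manifests) on a push and reports whether it failed. It also assumes the failure reproduces reliably; an intermittent failure would need retriggers before each answer can be trusted.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def first_bad_push(pushes: Sequence[P], task_fails: Callable[[P], bool]) -> P:
    """Binary search for the first push on which the task fails.

    Assumes pushes are in chronological order, pushes[0] is known good,
    pushes[-1] is known bad, and the failure is not intermittent.
    """
    if len(pushes) < 2:
        raise ValueError("need at least a known-good and a known-bad push")
    good, bad = 0, len(pushes) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if task_fails(pushes[mid]):  # hypothetical: run the task on this push
            bad = mid
        else:
            good = mid
    return pushes[bad]
```

With ten pushes and a regression introduced by push 6, `first_bad_push(list(range(10)), lambda p: p >= 6)` returns `6` after running the task on only three intermediate pushes (4, 6 and 5).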

In the future we plan to get rid of all mozilla-central only tasks and run them on an interval on autoland instead.

Retriggers

Retriggers should remain unaffected. However, there are plans to add an action that retriggers only the manifests that failed. This may or may not become the default.

Intermittent Risk

One risk of "manifest scheduling" is that, since tests no longer run in a deterministic order, we may see more bad interactions between tests that produce intermittent failures. Many new intermittent bugs may appear after "manifest scheduling" is enabled (though, by the same logic, these new intermittents should also be much less frequent).

To mitigate this risk, we have only enabled "manifest scheduling" for suites that have the "run-by-manifest" feature, meaning the harness restarts Firefox between manifests. In practice, this seems to prevent nearly all of these "bad interaction" failures. However, tests can still influence one another across restarts (e.g. if they write to disk outside of the profile).
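The following toy model shows why restarting per manifest helps and where it stops helping. It is not the real harness: `FakeBrowser`, `run_by_manifest` and the test names are hypothetical, in-memory state stands in for anything held inside the browser process or profile, and the `disk` set stands in for writes outside the profile.

```python
from typing import Dict, List, Set, Tuple


class FakeBrowser:
    """Stand-in for one browser process; its state dies with the instance."""

    def __init__(self) -> None:
        self.in_memory: Set[str] = set()


def run_by_manifest(
    manifests: Dict[str, List[str]], disk: Set[str]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Run each manifest in a fresh browser and record the leftover state each test sees."""
    observed: Dict[str, Tuple[List[str], List[str]]] = {}
    for tests in manifests.values():
        browser = FakeBrowser()  # the "restart": a new process for every manifest
        for test in tests:
            # Record state left behind by earlier tests before this test runs.
            observed[test] = (sorted(browser.in_memory), sorted(disk))
            browser.in_memory.add(test)  # cleared at the next restart
            disk.add(test)  # survives restarts, like a write outside the profile
    return observed
```

Running manifest `m1` (tests `t1`, `t2`) and then `m2` (test `t3`) shows that `t2` can see state from `t1` within the same manifest, while `t3` sees none of the in-memory state left by `m1` but still sees both of its disk writes.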

Keep in mind that a "manifest" is the smallest atomic unit of tests that we'll ever schedule. So tests will always run as part of their larger manifest.