Overview

There are many reasons we need to validate a new config:

upgrading an operating system to a new version
changing hardware the operating system runs on
upgrading the generic-worker or other infrastructure related components
changing the cloud providers we are using
building with different flags or configurations
installing tools on the test image in order to add support for a new test (e.g. fonts, xperf)
etc.

We run into this scenario at least once a quarter, and in conversation it is usually referred to as "greening up the tests." This page documents that process.

The scope here assumes you are testing a single branch and a single platform (ideally a single build type/config), and that you will need to run most, if not all, of the tests.

* NOTE: on Linux, Windows, and Android we run performance tests on physical devices, not virtual instances.

Greening up

The goal here is to find all repeated failures and get them fixed. Some of them may already occur on the previous config or on a similar config. A new config often results in many failures, such as:

tests failing because they hardcode config/platform options
timeout-related issues (either in the test or in the harness)
tools not working (for example minidumpstackwalk), or missing features (for example clipboard or crashreporter)

The work is a loop that looks roughly like this:

while not green:
- push to try
- for each failed job:
  - file a bug, needinfo someone to help get a resolution
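The loop above can be sketched in Python to make the termination condition explicit. This is only an illustration: `push_to_try`, `failed_jobs`, and `file_bug` are hypothetical callables you would supply yourself (for example, wrappers around your own try-push and bug-filing workflow); they are not real Taskcluster, Treeherder, or Bugzilla APIs, and `max_rounds` is an assumed safety limit.

```python
from typing import Callable, Iterable


def green_up(
    push_to_try: Callable[[], str],               # hypothetical: pushes to try, returns a push id
    failed_jobs: Callable[[str], Iterable[str]],  # hypothetical: names of failed jobs for a push
    file_bug: Callable[[str], None],              # hypothetical: files a bug and needinfos someone
    max_rounds: int = 10,                         # assumed cap so the loop cannot run forever
) -> int:
    """Push and triage until a push has no failed jobs; return the number of rounds used."""
    filed: set[str] = set()
    for round_no in range(1, max_rounds + 1):
        push_id = push_to_try()
        failures = list(failed_jobs(push_id))
        if not failures:
            return round_no  # green
        for job in failures:
            # A job that still fails already has a bug on file; only file for new failures.
            if job not in filed:
                file_bug(job)
                filed.add(job)
    raise RuntimeError(f"still not green after {max_rounds} rounds")
```

Tracking which failures already have bugs keeps repeated pushes from filing duplicates while you wait on fixes.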

Typically this process results in dozens of bugs, and many test failures often share the same root cause. It is a good idea to start with the top set of failures that look like common issues (installer, process/crash, reftest and fonts, canvas/webgl, etc.) and file bugs that include examples from several affected tests. Getting traction on those bugs will often fix many more failures.
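As a minimal sketch of picking the most common failures first, the function below groups failing tests by a failure signature and returns the largest groups with a few example tests to cite in each bug. It assumes you already have (test, signature) pairs from your try pushes; how you derive a signature (for example, the first error line of a log) is left to you, and `top_failure_groups` is a hypothetical helper, not an existing tool.

```python
from collections import Counter, defaultdict


def top_failure_groups(
    failures: list[tuple[str, str]], n: int = 5, examples: int = 3
) -> list[tuple[str, int, list[str]]]:
    """Group (test, signature) pairs by signature.

    Returns up to n groups as (signature, number of failing tests, example tests),
    most common first.
    """
    tests_by_sig: dict[str, list[str]] = defaultdict(list)
    for test, signature in failures:
        tests_by_sig[signature].append(test)
    counts = Counter({sig: len(tests) for sig, tests in tests_by_sig.items()})
    return [(sig, count, tests_by_sig[sig][:examples]) for sig, count in counts.most_common(n)]
```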

A word of caution: the browser is a changing environment. Every day we change features, fix things, and break things, and, most importantly, our tests change daily as well. The failures you found yesterday might not be the failures you see today. Typically I view this as a four-pass process:

1) get jobs to schedule, basic tools to work (1 week)
2) push, file bugs, resolve issues (3 weeks)
3) push, confirm fixed bugs, file new bugs (2 weeks)
4) push, confirm fixed bugs, file new bugs (2 weeks)
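Passes 3 and 4 amount to comparing the failing tests from the previous pass with the current one. A minimal sketch, assuming each pass is summarized as a set of failing test names (`compare_passes` is a hypothetical helper): because tests and intermittents change daily, a test that no longer fails is only a candidate for "fixed" and should be confirmed against its bug.

```python
def compare_passes(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Compare failing tests from two greening-up passes."""
    return {
        # Failed last pass but not this one: confirm the bug is really fixed
        # (the test may also have been intermittent, renamed, or removed).
        "fixed": previous - current,
        # Failing in both passes: keep tracking the existing bugs.
        "still_failing": previous & current,
        # Failing only in this pass: file new bugs.
        "new": current - previous,
    }
```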

Once you reach a more stable state, you are mostly waiting on a small number of bugs to be resolved, and you can often start running the jobs as tier-3 or tier-2 with the affected tests/jobs disabled until the bugs are fixed. Do not do this until at least the 3rd or 4th greening-up cycle.

Validate the config

The most typical approach is a before/after comparison: we want to know what our current configuration produces and compare that with the new configuration we want to deploy. In general there are 4 things to do:

push to try twice: once with the current/original config and once with the new config
for both try pushes, ensure that each job runs multiple times (I recommend 5 data points per job)
once both are complete, build a spreadsheet listing each job before/after and compare the average runtime and intermittent failure rate
if the change affects the builds or the hardware, also compare performance numbers before/after
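The spreadsheet step can be sketched as follows. This assumes you have already collected, for each job, a list of runs as (runtime in seconds, passed) tuples from both pushes; gathering that data is not shown, and `summarize` and `comparison_csv` are hypothetical helpers. Any run that did not pass counts toward the failure rate, and only jobs present in both pushes are compared.

```python
import csv
import io
from statistics import mean

Run = tuple[float, bool]  # (runtime in seconds, passed?) for one run of a job


def summarize(runs: list[Run]) -> tuple[float, float]:
    """Return (average runtime, failure rate) for one job's runs."""
    avg_runtime = mean(runtime for runtime, _ in runs)
    failure_rate = sum(1 for _, passed in runs if not passed) / len(runs)
    return avg_runtime, failure_rate


def comparison_csv(before: dict[str, list[Run]], after: dict[str, list[Run]]) -> str:
    """Build a before/after table as CSV text for jobs present in both pushes."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["job", "avg_runtime_before", "avg_runtime_after",
                     "failure_rate_before", "failure_rate_after"])
    for job in sorted(before.keys() & after.keys()):
        rt_before, fr_before = summarize(before[job])
        rt_after, fr_after = summarize(after[job])
        writer.writerow([job, f"{rt_before:.1f}", f"{rt_after:.1f}",
                         f"{fr_before:.0%}", f"{fr_after:.0%}"])
    return out.getvalue()
```

CSV output can be pasted or imported directly into a spreadsheet; jobs that appear in only one push are worth listing separately, since a job that stopped scheduling is itself a finding.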

TODO_INSERT_TOOL_HERE

List any tools you find or develop that help with this process.

Next Steps

Once completed, here is how to deploy

ReleaseEngineering/How To/Validate a New Config