ReleaseEngineering/How To/Validate a New Config
There are many reasons we need to validate a new config:
- upgrading an operating system to a new version
- changing hardware the operating system runs on
- upgrading the generic-worker or other infrastructure-related components
- changing the cloud providers we are using
- building with different flags or configurations
- installing tools on the test image in order to add support for a new test (e.g. fonts, xperf)
We run into this scenario at least once per quarter; in conversations it is usually referred to as "greening up the tests." This page documents that process.
The scope here assumes you are testing a single branch, a single platform (ideally a single build type/config), and need to run most, if not all, of the tests.
NOTE: for linux/windows/android we run performance tests on physical devices, not virtual instances
The goal here is to find all repeated failures and get them fixed. They might exist on the previous config or a similar config. Often we find that a new config will result in many failures:
- tests failing as they have hardcoded config/platform options
- timeout related issues (either test or harness)
- tools might not work (minidumpstackwalk), or features could be missing (for example clipboard or crashreporter)
This work is a loop; the basic shape is:
- while not green:
- push to try
- for each failed job:
- file a bug, needinfo someone to help get a resolution
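The loop above can be sketched in code. This is a hypothetical model only: push_to_try and file_bug stand in for the real manual steps (a try push and filing a Bugzilla bug with a needinfo); they are not real APIs.

```python
# Hypothetical sketch of the greening-up loop. push_to_try() and
# file_bug() are placeholders for the real manual steps, not real APIs.

def green_up(push_to_try, file_bug):
    """Repeat try pushes until a push comes back with no failed jobs.

    Returns the list of bugs filed along the way.
    """
    bugs = []
    while True:
        failed_jobs = push_to_try()  # failed jobs for this push
        if not failed_jobs:
            break  # green: every job passed
        for job in failed_jobs:
            # file a bug and needinfo someone who can help resolve it
            bugs.append(file_bug(job))
    return bugs


# Toy usage: each successive "push" has one fewer outstanding failure.
pending = [["reftest", "xpcshell"], ["xpcshell"], []]
bugs = green_up(lambda: pending.pop(0), lambda job: f"Bug for {job}")
print(bugs)  # → ['Bug for reftest', 'Bug for xpcshell', 'Bug for xpcshell']
```

In practice each iteration takes days (a full try run plus bug triage), so the "loop" is really a cadence of pushes, not a script.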
Typically this process results in dozens of bugs, and many test failures often share the same root cause. It is a good idea to first pick the top set of failures that appear to be common issues (installer, process/crash, reftest and fonts, canvas/webgl, etc.) and file bugs with examples of various tests. Getting traction on those bugs will often fix many more.
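One way to find the top common issues is to tally failures by a rough signature. A minimal sketch (the failure lines and the suite-plus-error signature are invented for illustration):

```python
from collections import Counter

# Made-up failure lines from a hypothetical try push.
failures = [
    "reftest | font-metrics.html | image comparison failed",
    "reftest | glyph-spacing.html | image comparison failed",
    "xpcshell | test_crashreporter.js | crash reporter not available",
    "mochitest | test_webgl_basic.html | WebGL context creation failed",
    "reftest | kerning.html | image comparison failed",
]

# Use the suite plus the error text as a rough root-cause signature.
def signature(line):
    suite, _test, error = line.split(" | ")
    return (suite, error)

top = Counter(signature(f) for f in failures).most_common()
print(top[0])  # → (('reftest', 'image comparison failed'), 3)
```

The most frequent signatures are good candidates for the first round of bugs.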
One word of caution: the browser is a changing environment every day. We change features, fix things, break things, and, most importantly, our tests change daily as well. The failures you found yesterday might not be the same failures you see today. Typically I view this as a four-pass process:
1) get jobs to schedule and basic tools to work (1 week)
2) push, file bugs, resolve issues (3 weeks)
3) push, confirm fixed bugs, file new bugs (2 weeks)
4) push, confirm fixed bugs, file new bugs (2 weeks)
Once you reach a more stable state, you are waiting on a small number of bugs to be resolved, and you can often start running the jobs as tier-3 or tier-2 with the failing tests/jobs disabled until the bugs are fixed. This is not a good idea until at least the third or fourth greening-up cycle.
Validate the config
The most typical approach here is a before/after comparison: we want to know what the current configuration produces and compare that to the new configuration we want to deploy. In general there are three things to do:
- push to try twice, once for the current/original config, second for the new config
- for both try pushes, ensure that we are running each job multiple times (I recommend 5 data points for each job)
- once complete, we want to build a spreadsheet of each job before/after and compare the average runtime and intermittent failure rate.
- if this is a change that affects the builds or the hardware, then we need to compare performance numbers before/after
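The spreadsheet comparison above can be sketched in a few lines. All job names and numbers below are invented for illustration; runtimes are in minutes:

```python
# Each job maps to (runtimes of the 5 retriggers, number of intermittent
# failures among them). All numbers here are invented for illustration.
before = {"mochitest-1": ([22, 23, 21, 24, 22], 1)}
after = {"mochitest-1": ([30, 29, 31, 30, 28], 2)}

def summarize(data):
    """Return {job: (average runtime, intermittent failure rate)}."""
    rows = {}
    for job, (runtimes, num_failures) in data.items():
        rows[job] = (sum(runtimes) / len(runtimes),   # average runtime
                     num_failures / len(runtimes))    # failure rate
    return rows

for job in before:
    b_avg, b_rate = summarize(before)[job]
    a_avg, a_rate = summarize(after)[job]
    print(f"{job}: runtime {b_avg:.1f} -> {a_avg:.1f} min, "
          f"failure rate {b_rate:.0%} -> {a_rate:.0%}")
```

A large regression in either column (runtime or failure rate) is a reason to hold the new config back and file a bug before deploying.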
List any tools you find or develop that are helpful in this process.
Once validation is complete, here is how to deploy: