New test environments
Revision as of 18:11, 28 August 2019
Overview
From time to time, the need arises to upgrade the underlying operating system of a platform. This need typically coincides with new major releases of the operating systems that form part of the CI infrastructure.
For instance, as of 2019-08-07, all Firefox builds for Linux are executed in Ubuntu 16.04.5 Docker containers. In other words, the version of the Linux distribution used for testing is at least 2 major releases behind the likely dominant version on the market, which is Ubuntu 18.04.
Upgrading the underlying operating system version has, in the past, been considered a large undertaking, often taking upwards of 6 months. This causes a chicken-and-egg problem: regular upgrades do not occur because of the perceived amount of work, which in turn causes the number of issues to multiply when the upgrade is finally tackled.
The aim of this document is to establish a standardized process that anyone in Mozilla engineering can use to perform operating system upgrades.
Task Overview
Broadly speaking, the following discrete phases are involved when adding new platforms.
- ensure availability of machines (if hardware)
  - responsibility: Release Engineering
  - some people (that I've worked with):
    - jwatkins
    - markco
    - rthijssen
    - dhouse
- add platform to tryserver
  - responsibility: CI-A
  - task checklist: platform checklist and worker checklist
- run test suites on tryserver
  - responsibility: CI-A
  - task checklist: test checklist and baseline checklist
everything below can be executed in parallel among several engineers
- begin greening process
  - responsibility: CI-A
- address issues with test case/platform
  - responsibility: developers
- create, review and land migration patches
  - responsibility: CI-A
Enable platform on taskgraph
The first step of any new test environment is to enable the test platform on Tryserver.
At the bare minimum, ensure the taskgraph is sound with each step. This can be verified using ./mach taskgraph full -v
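When several changes are made in sequence, the check can be wrapped in a small script and re-run after each step. The following is a minimal sketch and not part of the Mozilla tooling: the helper name `taskgraph_is_sound` is hypothetical, and it assumes the command is run from the root of a mozilla-central checkout and reports failure through a non-zero exit status.

```python
import subprocess
import sys

# Command taken from the text above; assumed to be run from the root of a
# mozilla-central checkout.
TASKGRAPH_CMD = ["./mach", "taskgraph", "full", "-v"]


def taskgraph_is_sound(cmd=TASKGRAPH_CMD, cwd=None):
    """Return True if the command exits with status 0 (hypothetical helper).

    Assumption: a taskgraph generation failure is reported through a
    non-zero exit status.
    """
    try:
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.DEVNULL,  # the generated graph itself is not needed
            stderr=subprocess.PIPE,
            text=True,
        )
    except OSError:
        # The command could not be started, e.g. when run outside a checkout.
        return False
    if result.returncode != 0:
        # Show the tail of the error output to help locate the broken step.
        print(result.stderr[-2000:], file=sys.stderr)
    return result.returncode == 0
```

Calling `taskgraph_is_sound()` after each edit to the configuration makes it easier to tell which change broke the graph.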
Enable build
This step may have already been performed by other teams (e.g. Releng). If so, skip to the next step.
First, the platform must have builds enabled before tests can be run.
Within the taskcluster/ci/build directory, edit the appropriate YAML file for the platform. For example, if adding a new Windows build type, edit taskcluster/ci/build/windows.yml.
Define all of the required attributes, using existing configurations as a template.
Example with windows10-aarch64
builds:
    win64-aarch64/opt:
        description: "AArch64 Win64 Opt"
        index:
            product: firefox
            job-name: win64-aarch64-opt
        attributes:
            enable-full-crashsymbols: true
        treeherder:
            platform: windows2012-aarch64/opt
            symbol: B
            tier: 1
        worker-type: b-win2012
        worker:
            max-run-time: 7200
            env:
                TOOLTOOL_MANIFEST: "browser/config/tooltool-manifests/win64/aarch64.manifest"
                PERFHERDER_EXTRA_OPTIONS: aarch64
        run:
            actions: [get-secrets, build]
            options: [append-env-variables-from-configs]
            script: mozharness/scripts/fx_desktop_build.py
            secrets: true
            config:
                - builds/releng_base_firefox.py
                - builds/taskcluster_base_windows.py
                - builds/taskcluster_base_win64.py
            extra-config:
                stage_platform: win64-aarch64
                mozconfig_platform: win64-aarch64
        fetches:
            toolchain:
                - win64-clang-cl
                - win64-rust
                - win64-rust-size
                - win64-cbindgen
                - win64-sccache
                - win64-nasm
                - win64-node
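Because new definitions are modelled on existing ones, one quick way to spot an omission is to compare the keys of the new entry against the entry used as a template. A minimal sketch of that comparison follows. The function `missing_keys` and the `new_build` entry are hypothetical, the dictionaries stand in for parsed YAML, and the template keys are taken from the example above rather than from any official list of required attributes.

```python
def missing_keys(template, candidate, prefix=""):
    """Return dotted key paths present in template but absent in candidate."""
    missing = []
    for key, value in template.items():
        path = f"{prefix}{key}"
        if key not in candidate:
            missing.append(path)
        elif isinstance(value, dict) and isinstance(candidate[key], dict):
            # Recurse into nested mappings such as "index" or "treeherder".
            missing.extend(missing_keys(value, candidate[key], prefix=path + "."))
    return missing


# Subset of the win64-aarch64/opt example above, as it would look once parsed.
template = {
    "description": "AArch64 Win64 Opt",
    "index": {"product": "firefox", "job-name": "win64-aarch64-opt"},
    "treeherder": {"platform": "windows2012-aarch64/opt", "symbol": "B", "tier": 1},
    "worker-type": "b-win2012",
    "worker": {"max-run-time": 7200},
}

# Hypothetical new entry that is still missing some attributes.
new_build = {
    "description": "Example new build",
    "index": {"product": "firefox"},
    "treeherder": {"platform": "example-platform/opt", "symbol": "B"},
    "worker-type": "b-win2012",
}

print(missing_keys(template, new_build))
# prints ['index.job-name', 'treeherder.tier', 'worker']
```

The comparison only looks at key names, so it flags omissions but does not judge whether the values are appropriate for the new platform.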
Example
Checklist
- does ./mach taskgraph full succeed?
- does the build successfully complete on Tryserver?
Worker configuration
This step may have already been performed by other teams (e.g. Releng), or may not be required at all (e.g. for an OS upgrade). If so, skip to the next step.
Once the build task has been successfully enabled, test workers must be defined.
There are several files that need to have the new platform added in order to satisfy the taskgraph algorithm:
- taskcluster/taskgraph/transforms/tests.py
- taskcluster/taskgraph/util/workertypes.py
Check and add the new platform details in the following categories:
- worker types
- tiers
- treeherder name translations
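A simple text search can confirm that neither file was overlooked. The sketch below is a hypothetical helper, not part of the taskgraph code. It only checks that the platform name occurs somewhere in each file, and says nothing about whether the entries themselves are correct.

```python
from pathlib import Path

# Files named in the text above, relative to the root of the checkout.
WORKER_CONFIG_FILES = [
    "taskcluster/taskgraph/transforms/tests.py",
    "taskcluster/taskgraph/util/workertypes.py",
]


def files_missing_platform(platform, root=".", files=WORKER_CONFIG_FILES):
    """Return the entries of `files` that never mention `platform`."""
    missing = []
    for relative in files:
        path = Path(root) / relative
        # A file that does not exist cannot mention the platform either.
        if not path.is_file() or platform not in path.read_text(encoding="utf-8"):
            missing.append(relative)
    return missing
```

For example, `files_missing_platform("macosx1014-64")` run from the root of a checkout returns the files that still need the new platform added. An empty list means the name appears in both.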
Example
Bug 1527469
Bug 1550826 - a more involved example
Checklist
- does ./mach taskgraph full succeed?
Test configuration
Once worker configuration is complete, test configuration must be added for the new platform.
Several files must be modified to support running tests against the new platform:
- taskcluster/ci/test/test-platforms.yml
- taskcluster/ci/test/test-sets.yml
The following is an example platform on which the full suite of desktop Firefox tests is run, so the list of defined tests is long. Depending on the nature of the platform, the list of tests will vary.
Example test-sets.yml:
macosx1014-64-tests:
    - cppunit
    - crashtest
    - firefox-ui-functional-local
    - firefox-ui-functional-remote
    - gtest
    - jittest
    - jsreftest
    - marionette
    - mochitest
    - mochitest-a11y
    - mochitest-browser-chrome
    - mochitest-chrome
    - mochitest-devtools-chrome
    - mochitest-devtools-webreplay
    - mochitest-gpu
    - mochitest-media
    - mochitest-remote
    - mochitest-webgl1-core
    - mochitest-webgl1-ext
    - mochitest-webgl2-core
    - reftest
    - telemetry-tests-client
    - test-verify
    - test-verify-gpu
    - test-verify-wpt
    - web-platform-tests
    - web-platform-tests-reftests
    - web-platform-tests-wdspec
    - xpcshell
Example test-platforms.yml:
macosx1014-64-shippable/opt:
    build-platform: macosx64-shippable/opt
    test-sets:
        - macosx1014-64-tests
        - macosx64-talos
        - desktop-screenshot-capture
        - awsy
        - raptor-chromium
        - raptor-firefox
        - raptor-profiling
        - marionette-media-tests
        - web-platform-tests-wdspec-headless
The exact list of tests will vary depending on the platform. It is not possible to run cppunit on Android platforms, for example.
Note that:
- the list of tests defined in test-sets.yml is used in test-platforms.yml
- the build-platform attribute in test-platforms.yml should match the build name chosen in the previous step
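The two relationships in the note can be checked mechanically before pushing. The following is a minimal sketch under stated assumptions: the function `check_test_platforms` is hypothetical, the dictionaries are abbreviated stand-ins for the parsed YAML files, and `build_names` is assumed to hold the build names defined under taskcluster/ci/build.

```python
def check_test_platforms(test_platforms, test_sets, build_names):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for platform, config in test_platforms.items():
        build = config.get("build-platform")
        if build not in build_names:
            problems.append(f"{platform}: unknown build-platform {build!r}")
        for name in config.get("test-sets", []):
            if name not in test_sets:
                problems.append(f"{platform}: undefined test set {name!r}")
    return problems


# Abbreviated stand-ins for the parsed YAML files shown above.
# "awsy" is deliberately left out of this mapping to show a reported problem.
test_sets = {
    "macosx1014-64-tests": ["cppunit", "crashtest", "xpcshell"],
    "macosx64-talos": [],  # contents omitted; only the name matters here
}
test_platforms = {
    "macosx1014-64-shippable/opt": {
        "build-platform": "macosx64-shippable/opt",
        "test-sets": ["macosx1014-64-tests", "macosx64-talos", "awsy"],
    },
}
build_names = {"macosx64-shippable/opt"}

for problem in check_test_platforms(test_platforms, test_sets, build_names):
    print(problem)
# prints: macosx1014-64-shippable/opt: undefined test set 'awsy'
```

A check like this does not replace ./mach taskgraph full, which remains the authoritative validation. It only narrows down naming mismatches more quickly.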
Example
Checklist
- does ./mach taskgraph full succeed with test sets and test platforms defined?
- do the tests show up in the try fuzzy selector?
Obtain baseline
Once build and tests are enabled on Tryserver, it is time to run a baseline push to take inventory of suites that pass and fail.
Using ./mach try fuzzy --no-artifact, push tests belonging to the new platform to the Tryserver. Do not use artifact builds for the baseline push as some test results (outside of the compiled tests such as cppunit) are affected by the artifact build.
Once the checklist is complete, create a diff on Phabricator and have it reviewed by:
- jmaher
- egao
and any other reviewers as necessary.
Example
Checklist
- does the build succeed?
- are the tests scheduled?
- does each test job run to completion?
- which tests pass?
- which tests fail?
- has the patch been reviewed and landed on mozilla-central?
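The inventory of passing and failing suites can be recorded in a simple structure for later comparison during greening. This is a minimal sketch: the function `summarize_baseline` is hypothetical, and the statuses in `baseline` are made up for illustration, not real results from a push.

```python
def summarize_baseline(results):
    """Split suite names into passing and failing lists, each sorted.

    Any status other than "pass" (for example a timeout) counts as failing.
    """
    passing = sorted(s for s, status in results.items() if status == "pass")
    failing = sorted(s for s, status in results.items() if status != "pass")
    return {"pass": passing, "fail": failing}


# Hypothetical outcome of a baseline push; the statuses are illustrative only.
baseline = {
    "xpcshell": "pass",
    "mochitest-media": "fail",
    "crashtest": "pass",
    "reftest": "fail",
}

summary = summarize_baseline(baseline)
print("passing:", ", ".join(summary["pass"]))
print("failing:", ", ".join(summary["fail"]))
```

Keeping the failing list alongside the baseline push gives a fixed starting point against which later pushes can be compared.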
Green up tests
Once the build and tests are running on Tryserver, it is time to begin greening the tests.