ReleaseEngineering/Buildduty manifesto

Intro

Buildduty is an operational support team dedicated to monitoring and maintaining the health of Firefox’s continuous integration (CI) infrastructure. Team members are contractors located in Romania who cover the EET (UTC+2) working day. The team's responsibilities include, but are not limited to, the items listed under "Things Buildduty can help with".

While the team’s responsibilities cover a wide variety of tasks, people sometimes contact them about (or file bugs in their queue for) issues that fall outside their area of expertise. The most common examples are listed under "Things Buildduty isn’t responsible for".

Things Buildduty can help with

Firefox CI infrastructure outage coordination and investigation

When the Firefox CI system fails, getting services online again is buildduty's top priority. They are the initial point of contact for outages, but will likely escalate to additional teams with subject matter experts for resolution.

Monitoring, investigating, and debugging issues with the Linux, Windows, and OS X Firefox CI infrastructure

Buildduty monitors the Firefox CI infrastructure using the Nagios GUI and the alerts posted in the #buildduty IRC channel. They routinely look for system issues, resolve them using our automation tooling, or work with datacenter staff to repair offline or degraded hardware. They also monitor email from AWS about infrastructure that is degraded or requires maintenance.

Monitoring Firefox CI backlog/pending counts

Buildduty is the first point of contact for monitoring the load on the Firefox CI system and determining the cause of any high backlog or pending job counts. If they are unable to determine the root cause and solve the issue, buildduty escalates to other teams who have subject matter experts.
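As a rough illustration of the first triage step described above, the following sketch flags platforms whose pending job count looks high. It is not buildduty's actual tooling: the threshold value, the platform names, and the function `platforms_over_threshold` are all hypothetical, since this page does not define what counts as a "high" backlog or where the counts come from.

```python
from typing import Dict, List

# Hypothetical threshold: the source does not say what counts as a "high" backlog.
PENDING_THRESHOLD = 500


def platforms_over_threshold(
    pending: Dict[str, int], threshold: int = PENDING_THRESHOLD
) -> List[str]:
    """Return the platforms whose pending count exceeds the threshold, worst first."""
    over = [name for name, count in pending.items() if count > threshold]
    # Sort by pending count, highest first, so the worst backlog is looked at first.
    return sorted(over, key=lambda name: pending[name], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = {"linux64": 120, "win10": 950, "osx": 610}
    print(platforms_over_threshold(sample))
```

A check like this only identifies where the backlog is. Determining the root cause, or escalating to another team, remains a manual step.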

Tree closing and opening

Closing and opening the trees (denying and allowing code check-ins to our Mercurial repos) are typically handled by the Mozilla Code Sheriffs, but buildduty can also help out with this if needed.

Loaning Firefox build/test instances to developers

Buildduty processes Bugzilla requests from developers for Firefox CI build or test loaners. To obtain a loaner, file a request in Bugzilla under Release Engineering::Buildduty and expect a response within one working day (UTC+2).
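Because the one-working-day response time is measured in the team's timezone, requesters elsewhere may want to check whether buildduty is likely to be online at a given moment. The sketch below does that under two assumptions not stated on this page: working hours of 09:00 to 17:00, Monday to Friday, and a fixed UTC+2 offset that ignores daylight saving time. The function name `in_buildduty_hours` is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# The source gives the team's timezone as EET (UTC+2). A fixed offset is used
# here for simplicity, so daylight saving time is ignored.
EET = timezone(timedelta(hours=2), "EET")

# Hypothetical working hours: the source does not state them.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def in_buildduty_hours(moment: datetime) -> bool:
    """Return True if a timezone-aware moment is on a weekday within the assumed hours."""
    local = moment.astimezone(EET)
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and WORK_START <= local.time() < WORK_END


if __name__ == "__main__":
    now_utc = datetime.now(timezone.utc)
    print("Buildduty likely online:", in_buildduty_hours(now_utc))
```

A request filed outside these hours would, under the same assumptions, start its one-working-day clock at the next weekday morning in EET.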

Uploading new packages or Python modules to our internal mirrors

Buildduty can help a developer who needs a new software package uploaded to tooltool or a Python package uploaded to our internal PyPI mirror. They can also grant other developers access to upload packages to tooltool, for a given subset of paths, to allow for future self-service.

Routine maintenance of the Firefox CI configuration

While most of the Taskcluster configuration is handled by the end-developer, some of our infrastructure still runs on Buildbot. Buildduty has the knowledge and capability to modify the Firefox buildbot-configs and perform general maintenance of the Buildbot systems. Maintenance includes tasks such as reassigning machines from one platform to another as capacity requirements demand, decommissioning machines, and updating keys and secrets.

Things Buildduty isn’t responsible for

Firefox release builds

Buildduty is not responsible for checking the state of, or fixing issues with, any of the Firefox channel builds (Nightly, DevEdition, Release, etc). Releases of any kind are handled by the ReleaseDuty team within Release Engineering.

Developer test failures

Buildduty is the first point of contact for any ongoing, systemic Firefox CI test failures that appear to be caused by infrastructure-related issues, but they do not triage or fix other types of test failures. Any bugs filed in the buildduty queue for these general types of developer test failures (e.g. Mochitest or reftest) will be moved during bug triage.