CIDuty: Difference between revisions

From MozillaWiki
(Add href for everyday document.)
(Updated Buildduty general docs, added more useful links)
__TOC__
= What is Buildduty? =
Buildduty is a team dedicated to helping developers with releng-related issues. We currently have two people based in Romania who are available during their regular work hours. This is similar to the sheriff role that rotates through the [[Sheriff|sheriffing team]]. To avoid confusion, the releng sheriff position is known as "'''buildduty'''."


= Manifesto =
The [[ReleaseEngineering/Buildduty_manifesto| buildduty manifesto]] describes the team responsibilities in a nutshell.


= What do they regularly check? =
= Buildduty priorities =
 
The [[ReleaseEngineering/Buildduty_actionable| buildduty actionable]] page enumerates the team's daily and weekly sanity checks.
= What is buildduty? =
Every month, one person from the Release Engineering (releng) team is dedicated to helping developers with releng-related issues. This person will be available during his or her regular work hours for the whole month. This is similar to the sheriff role that rotates through the [[Sheriff|sheriffing team]]. To avoid confusion, the releng sheriff position is known as "'''buildduty'''."
= Who is on buildduty? (schedule) =
The person on buildduty should have 'buildduty' appended to their IRC nick, and should be available in the #developers, #releng, and #buildduty IRC channels.
Mozilla Releng Buildduty Schedule ([https://www.google.com/calendar/embed?src=mozilla.com_30qa9d8c380jrqi454kjo34624%40group.calendar.google.com&ctz=America/Toronto Google Calendar]|[https://www.google.com/calendar/ical/mozilla.com_30qa9d8c380jrqi454kjo34624%40group.calendar.google.com/public/basic.ics iCal]|[https://www.google.com/calendar/feeds/mozilla.com_30qa9d8c380jrqi454kjo34624%40group.calendar.google.com/public/basic XML])
== Buildduty not around? ==
It happens, especially outside of standard North American working hours (0600-1800 PT). Please [https://bugzilla.mozilla.org/enter_bug.cgi?product=Release%20Engineering&component=Buildduty open a bug] under these circumstances.
= Buildduty priorities =
== How should I make myself available for duty? ==
* Add 'buildduty' to your IRC nick
* Be available in the following IRC channels (at least): [irc://irc.mozilla.org/#developers #developers], [irc://irc.mozilla.org/#releng #releng], and [irc://irc.mozilla.org/#buildduty #buildduty] (as well as #mozbuild of course)
** also useful to be in [irc://irc.mozilla.org/#mobile #mobile] and [irc://irc.mozilla.org/#ateam #ateam]
** if you are in the middle of an outage, or need IT help, it is useful to be in [irc://irc.mozilla.org/#moc #moc], [irc://irc.mozilla.org/#infra #infra], and [irc://irc.mozilla.org/#sysadmins #sysadmins].
== What should I take care of? ==
=== Outages ===
When things fail, getting systems and services stood back up again is buildduty's top priority. Note: this doesn't mean you need to do all the work yourself. For big outages, rope in whatever help you need: domain experts from releng, managers, netops, relops...whoever.
The [[ReleaseEngineering/Buildduty/Dealing With Outages|Dealing with Outages]] wiki has more instructions.
=== Developer Requests ===
Aside from dealing with service outages, responding promptly to developer requests is the most important part of buildduty.
Developer requests will come via Bugzilla in the form of new bugs, or may come as “pings” in IRC. Buildduty is expected to be the first point of contact for these requests, to avoid interrupting the rest of the team. IRC pings should be translated into bugs once the essence of the request is understood. Bugs should then be triaged into the proper component, and the appropriate engineers cc-ed. If the work is actionable by buildduty and won't interfere with the completion of other duties, then buildduty should fix the issue.
=== Daily ===
==== Buildduty Bug Triage ====
The [https://secure.pub.build.mozilla.org/builddata/reports/slave_health/buildduty_report.html Buildduty report] (generated hourly) should be your starting point for triage.
'''Note:''' Use the "View list in bugzilla" links in the buildduty report to navigate the bugs more easily.
At the top, it lists unassigned bugs for loan requests. You should try to keep this queue empty to make sure developers are unblocked. The wiki has [[ReleaseEngineering/How_To/Loan_a_Slave|instructions for how to loan a machine]].
After loans are taken care of, make sure that bugs in the "No dependencies" section get dependencies filed, e.g. diagnosis bug, decomm bug, etc. The specific next steps will depend on the issue: [https://wiki.mozilla.org/Category:Release_Engineering_How_To Release Engineering How-Tos].
Do the same for bugs in the "All dependencies resolved" section to make sure the next action is taken (re-image, decomm, return to production, etc). Again, the specific next steps will depend on the issue: [https://wiki.mozilla.org/Category:Release_Engineering_How_To Release Engineering How-Tos].
'''Note:''' systemic issues (e.g. test failures that require further investigation) should '''not''' stay in the buildduty bugzilla component. It ''may'' be OK for you to take the bug and work on it depending on how much time you have, but generally these types of bugs should be moved to a more-appropriate component (e.g. General Automation) once buildduty has triaged them.
==== Alerts ====
Aside from the buildduty report, there may also be [https://nagios.mozilla.org/releng-scl3/cgi-bin/status.cgi?host=all&servicestatustypes=28&hoststatustypes=15&serviceprops=270346&hostprops=270346 unacknowledged nagios alerts] or SNS alerts from [https://papertrailapp.com/dashboard papertrail] in the #buildduty IRC channel. Deal with them, filing bugs as needed: [https://wiki.mozilla.org/Category:Release_Engineering_How_To Release Engineering How-Tos].
==== Infrastructure performance ====
In addition to the individual slave bugs tackled in triage above, there may be systemic issues that need investigating. The [[ReleaseEngineering/Buildduty/Infrastructure_Performance|Infrastructure performance]] wiki has more details about how to do this, and links to the wiki page for [[ReleaseEngineering/How_To/Dealing_with_high_pending_counts|how to deal with high pending counts]].
=== Semi-Daily ===
* '''Reconfigs'''
** Run [[ReleaseEngineering/Buildduty/Reconfigs|reconfigs]] (every day or two days) for other release engineers. Reconfigs are automated now, so this is usually just a matter of merging the relevant repositories to production.
* '''Review logs in papertrail'''
** We have SNS alerts for most known issues in papertrail. The alerts will appear in the #buildduty IRC channel. Buildduty should log into [https://papertrailapp.com/dashboard papertrail] a few times a week (M/W/F) to check for new issues that are '''not''' covered by alerts. New alerts should be set up for these issues.
=== Weekly ===
*'''Review AWS instances that have 'Unknown State/Type' or have been 'stopped for a while' '''
** ''When'':
*** Once a week. Traditionally this has been done on Friday.
** ''How'':
*** check the AWS sanity logs in [https://papertrailapp.com/dashboard papertrail]
*** for each host under heading "Unknown State", "Unknown Type"
**** follow steps in dealing with [[ReleaseEngineering/How_To/Manage_AWS_slaves#Unknown_Type_Or_State_Instances|instances with unknown state or type]]
*** for each host under heading "Stopped For A While"
**** follow steps in dealing with [[ReleaseEngineering/How_To/Manage_AWS_slaves#Stopped_For_A_While_Instances|instances that have been stopped for a while]]


== Others ==
There is a long list of '''[[ReleaseEngineering/Buildduty/Other_Duties|other, less-frequent duties]]''' that buildduty can assist with.
= Documentation =
There's a [https://wiki.mozilla.org/ReleaseEngineering/Buildduty/How_To wiki page] that aggregates useful info related to the tasks Buildduty is taking care of (as of December 2017).


= Useful Links =
* [https://secure.pub.build.mozilla.org/builddata/reports/slave_health/ Slave Health]
* [https://tools.taskcluster.net/provisioners Provision Explorer]
* [https://secure.pub.build.mozilla.org/builddata/reports/slave_health/buildduty_report.html Buildduty Report]
* [https://secure.pub.build.mozilla.org/buildapi/ Build Dashboard Main Page]
** You can get JSON dumps for people to analyze by adding <code>&format=json</code>
* [[ReleaseEngineering/How_To|Public "How To" documents]]
* [https://mana.mozilla.org/wiki/dosearchsite.action?queryString=title%3A%22How%20To%22&where=RelEng Private "How To" documents]
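The <code>&format=json</code> tip for the Build Dashboard can be sketched in Python as follows. This is a minimal illustration, not part of any releng tooling: the helper name <code>with_json_format</code> and the report path in the usage example are hypothetical, and the only thing taken from this page is the <code>format=json</code> parameter itself. The helper sets the parameter whether or not the URL already has a query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_format(url: str) -> str:
    """Return `url` with `format=json` set in its query string.

    Hypothetical helper: it only rewrites the URL and makes no request.
    """
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier `format` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    query.append(("format", "json"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # The path and `numbuilds` parameter are assumptions for illustration only.
    print(with_json_format(
        "https://secure.pub.build.mozilla.org/buildapi/recent?numbuilds=10"
    ))
```

For a URL with no existing query string, the helper adds <code>?format=json</code> rather than <code>&format=json</code>, which is easy to get wrong when editing the URL by hand.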
= Standard Bugs =
* For IT bugs that are marked "infra only" yet still need to be readable by RelEng, it is not enough to add the release@ alias: people get updates but are not able to comment or read prior comments. Instead, cc the following:
** :bhearsum, :Callek, :catlee, :coop, :hwine, :jlund, :kmoir, :mrrrgn, :nthomas, :rail
** :Tomcat, :RyanVM, :KWierso


= Meeting Notes =

Revision as of 10:06, 13 December 2017

What is Buildduty?

Buildduty is a team dedicated to helping developers with releng-related issues. We currently have two people based in Romania who are available during their regular work hours. This is similar to the sheriff role that rotates through the sheriffing team. To avoid confusion, the releng sheriff position is known as "buildduty."

Manifesto

The buildduty manifesto describes the team responsibilities in a nutshell.

Buildduty priorities

The buildduty actionable page enumerates the team's daily and weekly sanity checks.

Others

There is a long list of other, less-frequent duties that buildduty can assist with.

Documentation

There's a wiki page that aggregates useful info related to the tasks Buildduty is taking care of (as of December 2017).

Useful Links

Meeting Notes