CIDuty/How To/Tree Closure

From MozillaWiki

How do I schedule a downtime?

Whenever Sheriffs, RelEng, IT, or WebDev want a tree-closing downtime because of an infra issue, they should contact the CiDuty person of the day in #ci or by email. Preferably, they should put the exact wording they'd like to see in the downtime notice into their bug, and then nominate the bug using the "needs-treeclosure?" flag.

Once we know that we want to close the tree(s), head to https://mozilla-releng.net/treestatus, log in, select the tree(s) to be closed, and add a note with the bug number. Then update the tree to the correct status (open/closed).
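The check below is a minimal sketch of the information CiDuty needs to have ready before filling in the treestatus form. It does not talk to treestatus at all. The class name, the function name, the field names, the example tree names, and the bug number are all hypothetical and only illustrate the rule that every status change carries a note with a bug number.

```python
from dataclasses import dataclass

# The two statuses named on this page.
VALID_STATUSES = {"open", "closed"}


@dataclass(frozen=True)
class TreeStatusChange:
    """One planned status change; the field names are illustrative only."""
    trees: tuple
    status: str
    note: str


def plan_status_change(trees, status, bug_number, summary):
    """Validate the inputs and build the note text that carries the bug number."""
    if not trees:
        raise ValueError("select at least one tree")
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    if int(bug_number) <= 0:
        raise ValueError("a real bug number is required")
    note = f"Bug {bug_number} - {summary}"
    return TreeStatusChange(tuple(trees), status, note)


# Example values only: the tree names and bug number are placeholders.
change = plan_status_change(
    ["mozilla-central", "autoland"], "closed", 1234567, "hg clone timeouts"
)
print(change.note)
```

Keeping the validation separate from the web form means the note text can be reused as-is in the channel announcements.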

When do we close the trees?

  • Infra-related issues:
    • Error code 500 (server load)
    • HG issues (cloning issues, timeouts)
    • HDD problems (space, I/O)
    • Pending tasks keep rising (an indicator of other problems, such as BBB not working properly)
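As an illustration of the "pending tasks keep rising" signal, here is a rough sketch that checks whether a series of pending-task counts grows at every sample. The function name, the sample values, and the `min_samples` threshold are assumptions made for this example. The actual decision to close a tree stays with CiDuty and the sheriffs.

```python
def pending_keeps_rising(samples, min_samples=4):
    """Return True when every sample is strictly larger than the previous one.

    `samples` is a list of pending-task counts, oldest first.
    `min_samples` is an arbitrary guard against judging from too few points.
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Hypothetical counts taken at regular intervals.
steady_growth = [100, 180, 260, 400]
with_recovery = [100, 180, 150, 400]

print(pending_keeps_rising(steady_growth))
print(pending_keeps_rising(with_recovery))
```

A strict "every sample increases" rule is deliberately simple. A queue that dips once and then climbs again would need a looser rule, such as comparing the first and last samples.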


When planning a downtime, CiDuty should consider:

  • the urgency of the work
  • what other work, if any, can be safely done in the same downtime

Preparing for the downtime

  • CiDuty will:
    • Create a bug explaining why the trees are being closed
    • Verify that the bug is assigned to the person who will actually be doing the work in the downtime
      • For security-sensitive work, be vague, but still include the bug number and a vague description (this reduces confusion about whether an item of work is in or out of a downtime).

Who do I notify, and when?

  1. Just after the trees are closed:
    1. Update the #ci channel topic by adding " TREES CLOSED - BUG: XXXX " right after the LDAP username.
    2. Send notifications to #ci, #sheriffs (if needed), and #developers that the trees are closed.
    3. If new updates come from #sheriffs, #developers, #releng, or anyone else, make sure #ci is kept up to date with the current status.
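The topic update and the notifications above can be sketched as two small string helpers. This is only an illustration: the helper names, the sample topic layout, the username, and the bug number are hypothetical, and nothing here connects to a chat server.

```python
def mark_topic_closed(topic, ldap_username, bug_number):
    """Insert the closure marker right after the LDAP username in the topic."""
    marker = f" TREES CLOSED - BUG: {bug_number} "
    index = topic.find(ldap_username)
    if index == -1:
        raise ValueError("LDAP username not found in topic")
    cut = index + len(ldap_username)
    return topic[:cut] + marker + topic[cut:]


def closure_notices(bug_number, include_sheriffs=True):
    """Map each channel to the announcement text; #sheriffs is optional."""
    channels = ["#ci", "#developers"]
    if include_sheriffs:
        channels.insert(1, "#sheriffs")
    text = f"TREES CLOSED - BUG: {bug_number}"
    return {channel: text for channel in channels}


# Hypothetical topic layout and username, for illustration only.
new_topic = mark_topic_closed("ciduty: jdoe | docs: wiki", "jdoe", 1234567)
print(new_topic)
for channel, text in closure_notices(1234567).items():
    print(channel, text)
```

Building all announcements from one bug number keeps the topic and the channel messages consistent with each other.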

After the downtime

CiDuty will:

  1. reopen the trees
  2. verify with the sheriffs that the trees are open and everything is OK
  3. update the bugs with status.
  4. send "TREE OPEN" announcements in #ci, #sheriffs, #developers

For any questions, or if you're not sure about a particular server, please check with |ciduty in #ci.

If possible, consolidate RelEng and IT downtimes that need tree closures, to avoid the disruption of two tree closures in quick succession. This is "nice to do", not a requirement; if doing two separate downtimes reduces risk, that's fine!