CIDuty/How To/Tree Closure
How do I schedule a downtime?
Whenever Sheriffs, RelEng, IT, or WebDev need a tree-closing downtime because of an infrastructure issue, they should contact the CiDuty person of the day in #ci or by email. Preferably, they should put the exact wording they would like in the downtime notice in the related bug, and then nominate the bug using the "needs-treeclosure?" flag.
Once we know that we want to close the tree(s), go to https://mozilla-releng.net/treestatus, log in, select the tree(s) to be closed, and add a note with the bug number. Then update the tree to the correct status (open/closed).
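The information needed for a status change can be gathered before opening the treestatus page. The following is a minimal sketch only: it does not call treestatus, and the names `TreeStatusChange` and `build_change`, the note format, and the example tree name and bug number are all hypothetical.

```python
from dataclasses import dataclass

# The two statuses named in this document.
VALID_STATUSES = {"open", "closed"}


@dataclass(frozen=True)
class TreeStatusChange:
    """Hypothetical record of what CiDuty enters on the treestatus page."""
    trees: tuple
    status: str
    note: str


def build_change(trees, status, bug_number, reason):
    """Validate the inputs and build the note that carries the bug number."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    if not trees:
        raise ValueError("select at least one tree")
    # Assumed note format; the only requirement in the text is the bug number.
    note = f"Bug {bug_number} - {reason}"
    return TreeStatusChange(tuple(trees), status, note)


# Example with a placeholder tree name and bug number.
change = build_change(["example-tree"], "closed", 1234567, "hg clone timeouts")
print(change.note)
```

Keeping the bug number in the note makes it easy for anyone looking at the tree status to find the explanation for the closure.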
When do we close the trees?
- Infra-related issues:
- Error code 500 (server load)
- HG issues (cloning issues, timeouts)
- HDD problems (space, I/O)
- Pending tasks keep rising (an indicator of other problems, such as BBB not working properly)
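As an illustration of the "pending tasks keep rising" indicator, here is a minimal sketch that checks whether a series of pending-task counts is steadily increasing. The function name, the sampling approach, and the `min_samples` default are assumptions for illustration, not an existing CiDuty tool or threshold.

```python
def pending_keeps_rising(samples, min_samples=4):
    """Return True if the last `min_samples` pending counts strictly increase.

    `samples` is a list of pending-task counts, oldest first. How the counts
    are collected is outside the scope of this sketch.
    """
    if len(samples) < min_samples:
        return False  # not enough data to call it a trend
    recent = samples[-min_samples:]
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))


# Example with made-up counts taken at regular intervals.
print(pending_keeps_rising([100, 120, 150, 200]))
print(pending_keeps_rising([100, 120, 110, 200]))
```

A rising count on its own is only a hint; the decision to close the trees still rests with CiDuty and the sheriffs.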
When planning a downtime, CiDuty should consider:
- the urgency of the work
- what other work, if any, can be safely done in the same downtime
Preparing for the downtime
- CiDuty will:
- create a bug explaining why the trees are being closed
- verify that the bug is assigned to the person who will actually be doing the work in the downtime
- for security-sensitive work, be vague, but still include the bug number and a vague description (this reduces confusion about whether an item of work is in or out of a downtime).
Who do I notify, and when?
- Just after the trees are closed:
- Update the #ci channel topic by adding " TREES CLOSED - BUG: XXXX " right after the LDAP username.
- Send notifications about the trees being closed to #ci, #sheriffs (if needed), and #developers.
- If new updates come from #sheriffs, #developers, #releng, or anyone else, make sure #ci is kept up to date with the current status.
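The topic update and the announcements can be prepared as plain strings before posting them. This is a minimal sketch: the function names, the example topic layout, the announcement wording, and the bug number are hypothetical, and nothing here talks to a chat server.

```python
CLOSED_MARKER = " TREES CLOSED - BUG: {bug} "


def topic_with_closure(topic, ldap_username, bug):
    """Insert the closure marker right after the LDAP username in the topic."""
    start = topic.find(ldap_username)
    if start == -1:
        raise ValueError("LDAP username not found in the channel topic")
    end = start + len(ldap_username)
    return topic[:end] + CLOSED_MARKER.format(bug=bug) + topic[end:]


def closure_announcements(bug, include_sheriffs=True):
    """Map each channel to the (assumed) announcement text to post there."""
    channels = ["#ci", "#developers"]
    if include_sheriffs:  # #sheriffs is notified only if needed
        channels.insert(1, "#sheriffs")
    message = f"TREES CLOSED - see bug {bug} for details"
    return {channel: message for channel in channels}


# Example with a made-up topic, username, and bug number.
print(topic_with_closure("ciduty: jdoe | other info", "jdoe", "1234567"))
for channel, text in closure_announcements("1234567").items():
    print(channel, text)
```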
After the downtime
CiDuty will:
- reopen the trees
- verify with the sheriffs that the trees are open and everything is OK
- update the bugs with the current status
- send "TREE OPEN" announcements in #ci, #sheriffs, and #developers
For any questions, or if you're not sure about a particular server, please check with |ciduty in #ci.
If possible, consolidate RelEng and IT downtimes that need tree closures to avoid the disruption of having two tree closures soon after each other. This is "nice to do", not a "requirement"; if doing two separate downtimes reduces risk, that's fine!