Services/NOC (proposal)

From MozillaWiki

SREs

  • Minimum 7-person team for 24/7 coverage.
    • 8-9 is better (due to built-in turnover, addressed below).
  • Should be "Tier 1+"
    • will have root
    • should have reasonable associate-level Linux/Juniper/something-useful skills
    • There must never be fewer than one SRE (or more senior temporary coverage, in emergencies) in the NOC.
  • Position is intended as a "gateway" position into Mozilla
    • 18-24mo minimum tour-of-duty
    • After 9-12mo, NOC staff are expected to work on projects for the external team they hope to move to
    • A rotating Swing shift will be offered to facilitate this work.

Duties

  • <5 min Ack of issues
    • Pages escalate to Secondary in 5 minutes
  • Monitoring of key IRC channels
    • Communicate large-scale issues to IRC.
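
The acknowledgement rule above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name `page_target` and its parameters are hypothetical, and it assumes that a new page goes to Primary first. The proposal only states that unacknowledged pages escalate to Secondary at 5 minutes; any further escalation is not specified, so the sketch stops there.

```python
from datetime import datetime, timedelta
from typing import Optional

# From the Duties list: pages escalate to Secondary after 5 minutes.
ESCALATE_AFTER = timedelta(minutes=5)


def page_target(raised_at: datetime, now: datetime,
                acked_at: Optional[datetime] = None) -> Optional[str]:
    """Return who should be paged at time `now`, or None if already acked.

    Assumption: Primary is the first recipient of every page.
    """
    if acked_at is not None and acked_at <= now:
        return None  # acknowledged, nobody left to page
    if now - raised_at < ESCALATE_AFTER:
        return "Primary"
    return "Secondary"
```

A monitoring hook could call this on each polling tick and re-send the page whenever the returned target changes.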

Personnel

  • Initial team will consist of externally-hired consultants
    • Half (Secondary coverage) *may* be remote, as long as they're in a single facility for hand-offs
  • As we hire FT SREs, we will eliminate on-site consultants (let their contracts expire)
    • We may decide to keep secondary coverage as a dedicated external consultancy if it works well for us

Scheduling

Scheduling Chart

Assignment

  • SREs will rotate as follows: B/Secondary (2 weeks) -> A/Primary (2 weeks) -> C/Tertiary (2 weeks) -> S/Swing (1 week)
  • SREs will stay on either the 1st or 2nd shift, unless on Swing, which has only one shift.
    • If the SRE prefers off-shift and the Swing tasks allow it, Swing shift timing is adjustable.
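
The rotation above adds up to a 7-week cycle (2 + 2 + 2 + 1), which lines up with the 7-person minimum. A rough Python sketch of how a weekly roster could be derived follows. It assumes each SRE starts the cycle one week after the previous one; that staggering, the SRE names, and the function names are illustrative and not taken from the proposal.

```python
from collections import Counter

# Rotation order and durations in weeks, as listed above.
ROTATION = [("B/Secondary", 2), ("A/Primary", 2), ("C/Tertiary", 2), ("S/Swing", 1)]

# One entry per week of the cycle: 7 slots in total.
CYCLE = [role for role, weeks in ROTATION for _ in range(weeks)]


def role_for(sre_index, week):
    """Role of the SRE at position `sre_index` in the given week (0-based).

    Assumption: SRE n enters the cycle n weeks after SRE 0.
    """
    return CYCLE[(week - sre_index) % len(CYCLE)]


def roster(team, week):
    """Map every SRE name in `team` to a role for the given week."""
    return {name: role_for(i, week) for i, name in enumerate(team)}


team = [f"sre{i}" for i in range(1, 8)]  # hypothetical names
for week in range(3):
    print(week, Counter(roster(team, week).values()))
```

Under that staggering, a 7-person team has two SREs each on Secondary, Primary and Tertiary and one on Swing in every week. The 1st/2nd shift split within each role is not modelled here.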

Facility

  • segregated physically
    • allows closed-door "war room" focus for events
    • the default, however, is door open, if the person on staff wants this (social).
    • dedicated attached conference room.
  • Multiple (6+) large-screen displays
    • metrics (2+)
    • monitoring (2)
      • internal (nagios)
      • external (watchmouse)
    • video conference (1)
    • satellite-fed news (1)
    • key twitter feeds (1?)
    • IRC (1?)
  • 2-3 "NOC" desks
    • Each with a desktop system with 2-4 vertically-oriented screens
    • Each with a very good speaker phone
  • 2 "hotel" desks (don't necessarily need a good view of the screens)
  • dual discrete ethernet feeds
    • dedicated ("real") VPNs to prod
    • require cellular modem for backup connectivity