Operations Updates

  • stage has had some challenges and opportunities
  • systemd was running stuff as the wrong user
    • race condition in our infra that we hadn't hit in the prior six months
    • crash mover happened to start sooner this time
    • monitoring failure: we only discovered it because we were looking closely at a change on stage and saw nothing was running
    • alerts were firing, but dev team was not seeing them
    • looking for a way to connect them to irc
  • stage admin node had not been running crontabber, but it was all green
    • divergence between our consul config and the code
    • crontabber has in the past taken its config from code
    • consul was overriding with a different set of jobs in config
    • now have monitoring that periodically checks that crontabber is running
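
A rough sketch of the kind of check that could catch the crontabber problem above: compare the job list defined in code against the one supplied by the consul override, and flag a crontabber that has not succeeded recently. Everything here is hypothetical and for illustration only (the function names, the job names, the one-hour threshold); it is not how crontabber or our monitoring is actually implemented.

```python
from datetime import datetime, timedelta


def diff_job_lists(code_jobs, override_jobs):
    """Compare the jobs defined in code with the jobs in the override config.

    Both arguments are iterables of job names (hypothetical names here).
    Returns a dict with the jobs the override dropped and the jobs it added.
    """
    code_set, override_set = set(code_jobs), set(override_jobs)
    return {
        "missing_in_override": sorted(code_set - override_set),
        "only_in_override": sorted(override_set - code_set),
    }


def is_stale(last_success, now, max_age=timedelta(hours=1)):
    """Return True if there is no recorded success or it is older than max_age.

    The one-hour default is an assumption, not a value from our setup.
    """
    return last_success is None or now - last_success > max_age


if __name__ == "__main__":
    # Hypothetical job names, purely illustrative.
    report = diff_job_lists(["job_a", "job_b", "job_c"], ["job_b", "job_d"])
    print(report)
    print(is_stale(None, datetime(2015, 10, 14, 12, 0)))
```

An empty diff plus a fresh last-success time is the "really green" case; either a non-empty diff or a stale timestamp would be worth an alert.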

Project Updates

  • (peterbe) Wanna monitor QA saucelabs tests?
    • do we want to tighten the loop?
    • there's a long delay with the QA tests being used by monitoring
    • tests failing on stage currently don't block a prod release
    • we had a bunch of flaky tests, so we removed the alerting in irc
    • now we only see them as bugs filed by QA after the fact, easy to miss
    • schedule something after this meeting
  • (peterbe) Graphics update
    • lars has a giant pull request that is blocked by an external dashboard run by the graphics team
    • peter wrote an alternate way for them to get the data on-demand instead of through cron + warehousing
    • PR is open for them, waiting on them to switch
  • (adrian) shipped links from the postgres-powered reports to the es-powered reports
    • email coming
    • looking for feedback, will mail stability.
  • (peterbe) featured versions maintenance needs to be done on both prod and stage
    • button on stage to match prod is up in a PR
    • baby step towards full automation

  • (lars) signature shortening change is coming
    • at 5PM, if JP is available, we will cut over to the new abbreviated signature method
    • if he is not, we will do it tomorrow instead
    • expect a few minutes of instability; impact should be insignificant

  • mbrandt - PTO 2015-10-15 - 2015-10-16