Meeting Info

Hot items

  • Still seeing intermittent sync failures (bug 1038678); ssh timeout tweaked on zeus, but mirror-pull could still stand to be more resilient
  • Additional hgweb nodes? (bug 1049519) Added two spares, but how many should we have?
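
One common way to make a pull step more tolerant of intermittent failures is to retry it with exponential backoff and jitter. The sketch below is illustrative only: it is not the actual mirror-pull script, and `retry_with_backoff` and all of its parameters are hypothetical names chosen for this example.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, max_delay=60.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Hypothetical helper: the delay doubles after each failure (capped at
    max_delay), plus up to 50% random jitter so that several mirrors do
    not all retry at the same instant.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

The `operation` argument would wrap whatever actually performs the pull (for example a function that shells out to `hg pull` and raises on a non-zero exit), and `retry_on` can be narrowed to the exceptions that indicate a transient failure.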

Last week

  • bkero
    • Deployed user repository fixes
    • Deployed serverlog extension on cluster, debugged
    • Built Python debugging packages, then installed them on hgweb1
    • Analyzed some traffic statistics and reported the findings informally over IRC
  • fubar
    • Added two build trees to DXR! Also, staging is working again, all cron and config bits are now in the build repo, and the build script has been refactored
    • Two new hgweb nodes provisioned (9 & 10); added new webhead docs
    • Configured local2 syslog logging for new pash_wrapper and gps' extensions
  • hwine
    • on call last week; only one late page; clarified how unimportant the current nagios alert is (it's a leading indicator with about an 80% false-positive rate)
    • started releng intern Mihai Tabara on looking at logs near the start of events to find the root cause of issues
    • installed pash_wrapper for ssh
  • laura
    • more headcount justifications
    • approval from bmoss for the extra blades
  • erik
    • Chased hgweb spins around. With bkero, got debug symbols installed on hgweb1 so we can pull Python tracebacks from running processes. What seems to be the case so far (n=2) is that the spins happen while the process is in apr_poll()→poll() in mod_wsgi, which is weird. I'd like a few more backtraces from a spinning webhead to be sure that wasn't a fluke. Also did a bunch of theorizing about possible mod_wsgi spin causes, directives we could frob, etc.
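
To judge whether the n=2 observation was a fluke, it may help to tally the innermost frame across more backtrace samples. This is a minimal sketch under the assumption that each backtrace has already been reduced to a list of frame names, innermost first; `tally_top_frames` is a hypothetical helper, not an existing tool.

```python
from collections import Counter


def tally_top_frames(backtraces):
    """Count how often each innermost frame appears across samples.

    Assumes each backtrace is a list of frame names, innermost first.
    Empty backtraces are skipped.
    """
    return Counter(bt[0] for bt in backtraces if bt)
```

Calling `tally_top_frames(samples).most_common()` lists the frames by frequency, so a spin that really sits in poll() should dominate the count as more samples are collected.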

Planned for this week

  • bkero
    • Update wsgi, deploy
    • Tune wsgi settings to see if it alleviates unavailability
    • Parse logs for patterns/errors/statistics
    • Push for hg update
  • fubar
    • ReviewBoard web heads/admin node
    • hg firefighting
    • more build repos in DXR
  • hwine
    • add more monitoring and correlation
  • laura
    • get help from srich for rb deployment
    • status board bug
    • other than that: what's the most helpful thing I can do?

Other business

PTOs, etc