DeveloperServices/TeamMeetings/2014-08-12
Latest revision as of 19:38, 12 August 2014
Meeting Info
Hot items
- Still seeing intermittent sync failures (bug 1038678); ssh timeout tweaked on zeus, but mirror-pull could still stand to be more resilient
- Additional hgweb nodes? (bug 1049519): added two spares, but how many should we have?
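One way to make mirror-pull more resilient to transient SSH failures is to retry with exponential backoff instead of relying on a single timeout. This is a minimal sketch, assuming the pull can be run as a command that exits non-zero (or hangs past a timeout) on failure. `retry_with_backoff`, `pull_repo`, and the retry counts are hypothetical; they are not taken from the actual mirror-pull script.

```python
import subprocess
import time
from typing import Callable, Sequence


def retry_with_backoff(fn: Callable[[], None], attempts: int = 4,
                       base_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fn until it succeeds; return the attempt number that succeeded.

    Re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            fn()
            return attempt
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def pull_repo(cmd: Sequence[str], timeout: float = 300.0) -> None:
    """Run one pull (hypothetical wrapper around whatever mirror-pull executes)."""
    # check=True raises CalledProcessError on a non-zero exit status;
    # timeout raises TimeoutExpired if the pull hangs (e.g. a stuck ssh session).
    subprocess.run(list(cmd), check=True, timeout=timeout)
```

For example, `retry_with_backoff(lambda: pull_repo(["hg", "pull", "-R", "/path/to/mirror"]))`, where the path is a placeholder. Passing `sleep` in as a parameter keeps the retry logic testable without real delays.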
Last week
- bkero
  - Deployed user repository fixes
  - Deployed serverlog extension on cluster, debugged
  - Built packages, including Python debugging packages, and installed them on hgweb1
  - Diagnosed some traffic statistics and reported them verbally (on IRC)
- fubar
  - Added two build trees to DXR! Also, staging is working again, all cron and config bits are now in the build repo (http://hg.mozilla.org/webtools/dxr), and the build script has been refactored
  - Two new hgweb nodes provisioned (9 & 10); added new webhead docs
  - Configured local2 syslog logging for the new pash_wrapper and gps's extensions
- hwine
  - On call last week: only one late page. Clarified how unimportant the current Nagios alert is (it's a leading indicator with about an 80% false-positive rate)
  - Started releng intern Mihai Tabara on examining logs near the start of each event to find the root cause of issues
  - Installed pash_wrapper for ssh
- laura
  - More headcount justifications
  - Got approval from bmoss for the extra blades
- erik
  - Chased hgweb spins. With bkero, got debug symbols installed on hgweb1 so we can pull Python tracebacks from running processes. So far (n=2), the spins appear to happen while in apr_poll()→poll() in mod_wsgi, which is odd. I'd like to get a few more backtraces from a spinning webhead to confirm that wasn't a fluke. Also did a fair amount of theorizing about causes of mod_wsgi spins, directives we could frob, etc.
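The tracebacks above came from gdb plus debug symbols. A lighter-weight complement (a different technique, not something the team has deployed) is Python's standard-library `faulthandler` module, which can dump every thread's Python stack when the process receives a signal. This is a minimal, Unix-only sketch, assuming it can be called once at startup from the WSGI application. The helper name and the choice of `SIGUSR1` are assumptions. It shows only Python frames, not C-level frames such as `poll()`, so it would not replace gdb backtraces. Whether mod_wsgi's own signal handling interferes has not been checked.

```python
import faulthandler
import signal
import sys
from typing import TextIO


def install_traceback_dumper(sig: int = signal.SIGUSR1,
                             out: TextIO = sys.stderr) -> None:
    """Dump all threads' Python stacks to `out` whenever `sig` arrives.

    Hypothetical helper. faulthandler's handler is implemented in C and
    writes directly from the signal handler, so it does not wait for the
    interpreter to regain control. `out` must have a real file descriptor.
    """
    # all_threads=True dumps every thread, not only the interrupted one.
    # chain=False: do not call any previously installed handler afterwards.
    faulthandler.register(sig, file=out, all_threads=True, chain=False)
```

On a spinning process this would be triggered with `kill -USR1 <pid>`. Under mod_wsgi, `sys.stderr` may not be backed by a real file descriptor, so passing an explicitly opened log file is probably safer (an untested assumption).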
Planned for this week
- bkero
  - Update wsgi and deploy
  - Tune wsgi settings to see whether that alleviates the unavailability
  - Parse logs for patterns/errors/statistics
  - Push for an hg update
- fubar
  - ReviewBoard web heads/admin node
  - hg firefighting
  - More build repos in DXR
- hwine
  - Add more monitoring and correlation
- laura
  - Get help from srich for the ReviewBoard (rb) deployment
  - Status board bug
  - Other than that: what's the most helpful thing I can do?
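For the planned "Parse logs for patterns/errors/statistics" item, a first pass could tally HTTP status codes and the paths that most often return 5xx errors. This is a minimal sketch, assuming the hgweb access logs use Apache's common/combined format; the actual log format isn't stated in these notes. `summarize` and the sample lines below are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Prefix of Apache common/combined format:
# host ident user [time] "METHOD path PROTO" status size ...
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines: Iterable[str], top: int = 5
              ) -> Tuple[Counter, List[Tuple[str, int]], int]:
    """Return (status counts, most frequent 5xx paths, unparsed line count)."""
    statuses: Counter = Counter()
    error_paths: Counter = Counter()
    unparsed = 0
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            unparsed += 1  # keep a count so format mismatches are visible
            continue
        status = int(m.group("status"))
        statuses[status] += 1
        if status >= 500:
            error_paths[m.group("path")] += 1
    return statuses, error_paths.most_common(top), unparsed
```

Since `summarize` accepts any iterable of lines, it can be fed an open file directly (`with open(path) as f: summarize(f)`) without loading the whole log into memory.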