ChrisCooper (talk | contribs) |
= Processing existing alerts =
== Backlog Age ==
* Affects: end-to-end time for developers. When we hit our warning threshold (currently 6hr), there have been builds waiting to *start* for that long.
* Runs on: nagios server, checking https://secure.pub.build.mozilla.org/builddata/buildjson/builds-pending.js
* Possible solutions:
** kill off unnecessary jobs
** make sure builds-pending.js isn't stale
** restart buildbot masters if they are slow
** for full options: [[ReleaseEngineering/How_To/Dealing_with_high_pending_counts|Dealing with high pending counts]]
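The Backlog Age check can be pictured roughly as follows. This is only a minimal sketch, not the actual nagios plugin: it assumes builds-pending.js parses to a structure shaped like <code>{"pending": {branch: {revision: [{"submitted_at": epoch_seconds, ...}]}}}</code>, and the function names <code>oldest_pending_age</code> and <code>backlog_status</code> are hypothetical. Only the 6hr warning threshold comes from the text above; the exit codes are the standard Nagios plugin values.

```python
import time

# 6hr warning threshold, as quoted in the Backlog Age section.
WARNING_AGE_SECONDS = 6 * 60 * 60

# Standard Nagios plugin exit codes (only OK and WARNING are used here).
NAGIOS_OK, NAGIOS_WARNING = 0, 1


def oldest_pending_age(data, now=None):
    """Return the age in seconds of the oldest pending request, or 0 if none.

    ASSUMPTION: `data` is the parsed JSON and looks like
    {"pending": {branch: {revision: [{"submitted_at": epoch_seconds}, ...]}}}.
    The real layout of builds-pending.js may differ.
    """
    now = time.time() if now is None else now
    submitted = [
        request["submitted_at"]
        for revisions in data.get("pending", {}).values()
        for requests in revisions.values()
        for request in requests
    ]
    if not submitted:
        return 0
    return max(0, now - min(submitted))


def backlog_status(data, now=None, warning=WARNING_AGE_SECONDS):
    """Map the oldest pending age onto a (exit_code, message) pair."""
    age = oldest_pending_age(data, now)
    hours = age / 3600.0
    if age >= warning:
        return NAGIOS_WARNING, "WARNING: oldest pending build is %.1fh old" % hours
    return NAGIOS_OK, "OK: oldest pending build is %.1fh old" % hours
```

Passing <code>now</code> explicitly makes the check easy to exercise against a saved copy of the file. A stale builds-pending.js would make every request look older than it is, which is why staleness is worth ruling out first.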
== builds-4hr ==
* Runs on: relengwebadm host, as a cronjob under the buildapi user
* Possible solutions: this script usually fails or runs slowly when there are problems with the buildbot status database: a lock, another long-running query, or simply load. Killing off the offending query and re-running the report-4hr script will fix this, but be aware that the report-4hr script can take a while to run, especially on a cold cache.
== Command Queue ==
* Affects: buildbot masters. These are jobs that become wedged (possibly failed) in the queue and need to be resubmitted or deleted.
* See [[ReleaseEngineering/Queue_directories]] for debugging instructions.