ReleaseEngineering/How To/Process nagios alerts: Difference between revisions

= Processing existing alerts =


== Backlog Age ==
* Affects: end-to-end time for developers. When this alert hits its warning threshold (currently 6hr), some builds have been waiting to *start* for at least that long.
* Runs on: nagios server, checking https://secure.pub.build.mozilla.org/builddata/buildjson/builds-pending.js
* Possible solutions:
** kill off unnecessary jobs
** make sure builds-pending.js isn't stale
** restart buildbot masters if they are slow
** for full options: [[ReleaseEngineering/How_To/Dealing_with_high_pending_counts|Dealing with high pending counts]]
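
The backlog-age idea above can be sketched in Python. This is a minimal illustration, not the actual nagios check. It assumes builds-pending.js parses to a JSON object of the form <code>{"pending": {branch: {revision: [{"submitted_at": epoch_seconds, ...}]}}}</code>; that layout, the function names, and the exit-code mapping are assumptions, and only the 6hr warning threshold comes from this page.

```python
import time

WARNING_SECONDS = 6 * 3600  # warning threshold quoted above (6hr)


def oldest_pending_age(data, now=None):
    """Return the age in seconds of the oldest pending request, or 0 if none.

    `data` is the parsed JSON document; its layout is an assumption:
    {"pending": {branch: {revision: [{"submitted_at": epoch_seconds}]}}}
    """
    now = time.time() if now is None else now
    submitted = [
        request["submitted_at"]
        for revisions in data.get("pending", {}).values()
        for requests in revisions.values()
        for request in requests
    ]
    if not submitted:
        return 0
    return max(0, now - min(submitted))


def backlog_status(data, now=None, warning=WARNING_SECONDS):
    """Map the backlog age to a nagios-style (exit_code, message) pair."""
    age = oldest_pending_age(data, now)
    hours = age / 3600.0
    if age >= warning:
        return 1, "WARNING: oldest pending build has waited %.1fh" % hours
    return 0, "OK: oldest pending build has waited %.1fh" % hours
```

To use it, load the file contents with <code>json.loads</code> and pass the result as <code>data</code>. If the file itself is stale, the computed age keeps growing even when the real queue is healthy, which is why staleness is listed among the things to check.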


== builds-4hr ==  
* Runs on: relengwebadm host, as a cronjob under the buildapi user
* Possible solutions: this script usually fails or runs slowly when the buildbot status database has problems: a lock, another long-running query, or simply load. Killing the offending query and re-running the report-4hr script will fix this, but be aware that the report-4hr script can take a while to run, especially on a cold cache.
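
As a rough aid for finding the offending query, the sketch below filters process-list rows for long-running statements. It assumes the status database is MySQL (not stated on this page) and that each row is a dict using the column names from <code>SHOW FULL PROCESSLIST</code> (<code>Id</code>, <code>Command</code>, <code>Time</code>, <code>Info</code>). The function name and the 600-second cutoff are hypothetical.

```python
def long_running_queries(processlist, min_seconds=600):
    """Return active queries running for at least min_seconds, longest first.

    `processlist` is a list of dicts keyed by SHOW FULL PROCESSLIST column
    names; the 600s default is an illustrative cutoff, not a documented one.
    """
    slow = [
        row for row in processlist
        # Skip idle connections (Command == "Sleep") and short queries.
        if row.get("Command") == "Query" and (row.get("Time") or 0) >= min_seconds
    ]
    return sorted(slow, key=lambda row: row["Time"], reverse=True)
```

Each returned row's <code>Id</code> is the value to pass to MySQL's <code>KILL</code> statement once the query has been confirmed as safe to terminate.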
== Command Queue ==
* Affects: buildbot masters. These are jobs that become wedged (possibly failed) in the queue and need to be resubmitted or deleted.
* See [[ReleaseEngineering/Queue_directories]] for debugging instructions.