ReleaseEngineering/How To/Process nagios alerts: Difference between revisions

= Processing existing alerts =


{| cellpadding="10" cellspacing="0" border="1"
! Alert !! Automated !! Host Types !! Further Notes
|-
| Command Queue ||  || Buildbot Masters || [[ReleaseEngineering/Queue_directories]]
|}

== Command Queue ==
* Affects: buildbot masters. These are jobs that have become wedged (and may have failed) in the queue and need to be resubmitted or deleted.
* See [[ReleaseEngineering/Queue_directories]] for debugging instructions.

== builds-4hr ==
* Affects: treeherder. This data is used to provide job history in treeherder.
* Runs on: the relengwebadm host, as a cron job under the buildapi user.
* Possible solutions: this script usually fails or runs slowly when there are problems with the buildbot status database: a lock, another long-running query, or simply load. Killing the offending query and re-running the report-4hr script will fix this, but be aware that the report-4hr script can take a while to run, especially on a cold cache.
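
As a rough illustration of how the "offending query" might be picked out, the sketch below filters process-list rows for queries that have been running longer than a threshold. It assumes the buildbot status database is MySQL-compatible and that the rows come from <code>SHOW FULL PROCESSLIST</code> (columns <code>Id</code>, <code>Command</code>, <code>Time</code>, <code>Info</code>); neither is stated on this page. The function name, the 300-second threshold, and the sample rows are hypothetical. It only prints candidate <code>KILL QUERY</code> statements for a human to review and does not connect to or modify any database.

```python
def find_long_running(process_rows, threshold_seconds=300):
    """Return rows for active queries running at least threshold_seconds, longest first.

    process_rows: list of dicts shaped like MySQL SHOW FULL PROCESSLIST output
    (an assumption about the status database, not confirmed by this page).
    """
    offenders = []
    for row in process_rows:
        # Idle connections report Command == "Sleep"; only "Query" rows are executing.
        if row.get("Command") != "Query":
            continue
        elapsed = row.get("Time") or 0  # Time is seconds in the current state
        if elapsed >= threshold_seconds:
            offenders.append(row)
    return sorted(offenders, key=lambda r: r["Time"], reverse=True)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        {"Id": 1, "Command": "Sleep", "Time": 9000, "Info": None},
        {"Id": 2, "Command": "Query", "Time": 1200, "Info": "SELECT (hypothetical report query)"},
        {"Id": 3, "Command": "Query", "Time": 4, "Info": "SELECT 1"},
    ]
    for row in find_long_running(sample):
        # Print a suggestion for review; nothing is killed automatically.
        print(f"KILL QUERY {row['Id']};  -- running {row['Time']}s: {row['Info']}")
```

Review each candidate before killing it, since a long-running query may be legitimate work rather than the cause of the alert.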