* load on upload/stage: this can affect the download of artifacts for builds and tests, leading to retries and high pending counts
If there are no alerts, it is worth asking in #netops and/or #systems whether IT is tracking any events that are not currently on our Nagios radar.
You can also check [https://mozilla.statuspage.io/history Mozilla Status] for any planned or unplanned events.
=== TaskCluster ===
==== If pending goes to CRITICAL ====
1. Make sure that workers are picking up tasks by checking the specific worker type in TaskCluster. Machines may go "lazy" after a job ends with an exception; those might need a [[CIDuty/How_To/Take_actions_to_RelEng_Hardware_from_TaskCluster_UI|reboot]].
2. Read the backscroll in #taskcluster and search Bugzilla under the Taskcluster component for anything that correlates with the pending alerts.
3. If no correlation can be found, let people know about the spike in #ci, in case it is unexpected.
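The "lazy worker" check in step 1 can be sketched as a small filter. This is a minimal Python sketch, assuming you have already collected, for each worker of the affected type, the state of its last task and when it was last active (for example from the worker view in TaskCluster). The WorkerStatus fields, the find_lazy_workers helper and the 30-minute idle threshold are hypothetical and not part of any Taskcluster tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class WorkerStatus:
    """Hypothetical, simplified view of one worker (not the Taskcluster API schema)."""
    worker_id: str
    last_task_state: str   # e.g. "completed", "failed" or "exception"
    last_active: datetime  # when the worker last resolved a task (timezone-aware)


def find_lazy_workers(
    workers: List[WorkerStatus],
    now: datetime,
    idle_threshold: timedelta = timedelta(minutes=30),  # assumed value, tune as needed
) -> List[str]:
    """Return the IDs of workers whose last task ended as an exception and that
    have been idle longer than idle_threshold, i.e. candidates for a reboot."""
    return [
        w.worker_id
        for w in workers
        if w.last_task_state == "exception" and now - w.last_active > idle_threshold
    ]


if __name__ == "__main__":
    now = datetime(2018, 6, 1, 12, 0, tzinfo=timezone.utc)
    workers = [
        WorkerStatus("worker-a", "completed", now - timedelta(minutes=5)),
        WorkerStatus("worker-b", "exception", now - timedelta(hours=2)),
        WorkerStatus("worker-c", "exception", now - timedelta(minutes=10)),
    ]
    print(find_lazy_workers(workers, now))  # ['worker-b']
```

Requiring both an exception and a long idle period avoids flagging workers that simply finished a failed task moments ago and are about to claim the next one.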
= Rebooting taskcluster workers =
Rebooting Taskcluster workers can be done manually or automatically, depending on the type of machine. For gecko-t-win10-64 and gecko-t-linux-talos, it can be done via iLO or with [https://github.com/Akhliskun/taskcluster-worker-checker Taskcluster Worker Checker]. For gecko-t-osx-1010, it can be done via SSH or via the [https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010 roller on Taskcluster].
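The per-type reboot options in this section can be summarized as a lookup table. This is a hypothetical Python helper: REBOOT_METHODS and reboot_options are illustrative names, not part of Taskcluster Worker Checker or any other tool, and the helper only encodes what this section lists; it does not perform a reboot.

```python
from typing import Dict, Tuple

# Reboot options per worker type, as listed in this section.
REBOOT_METHODS: Dict[str, Tuple[str, ...]] = {
    "gecko-t-win10-64": ("iLO", "Taskcluster Worker Checker"),
    "gecko-t-linux-talos": ("iLO", "Taskcluster Worker Checker"),
    "gecko-t-osx-1010": ("SSH", "roller on Taskcluster"),
}


def reboot_options(worker_type: str) -> Tuple[str, ...]:
    """Return the documented reboot options for a worker type."""
    try:
        return REBOOT_METHODS[worker_type]
    except KeyError:
        # Worker types not listed here are not covered by this page.
        raise ValueError(f"no reboot method documented for {worker_type!r}") from None


print(reboot_options("gecko-t-osx-1010"))  # ('SSH', 'roller on Taskcluster')
```

Raising an error for an unknown worker type, rather than guessing a method, keeps the helper from suggesting an action for hardware this page does not cover.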
= See also =
* [https://docs.google.com/spreadsheets/d/1pUFq6Z5M5a1ydbSzxNjQivFfryVoksdXa9xXTg9gtzc/edit#gid=0 CIDuty Escalation Path]
* [https://mana.mozilla.org/wiki/display/NAGIOS/Backlog+Age Backlog Age]