CIDuty/How To/High Pending Counts: Difference between revisions

Revision as of 09:48, 23 April 2019

557 bytes added, 23 April 2019

updated with new info and made some format changes

Apop

@@ Line 83: / Line 83: @@
 * load on upload/stage: this can affect the download of artifacts for builds and tests, leading to retries and high pending counts
-If there are no alerts, it is worth asking in #MOC and/or #infra to see if IT is tracking any events not currently on our nagios radar.
+If there are no alerts, it is worth asking in #netops and/or #systems to see if IT is tracking any events not currently on our nagios radar.
+You can also check [https://mozilla.statuspage.io/history Mozilla Status] to see whether any planned or unplanned action is in progress.
 === TaskCluster ===
@@ Line 100: / Line 101: @@
 ==== If pending goes to CRITICAL ====
-. Make sure that workers are picking up tasks by looking at the specific worker type in TaskCluster. Machines may go "lazy" after ending a job as an exception (those might need a reboot).
+. Make sure that workers are picking up tasks by looking at the specific worker type in TaskCluster. Machines may go "lazy" after ending a job as an exception (those might need a [[CIDuty/How_To/Take_actions_to_RelEng_Hardware_from_TaskCluster_UI|reboot]]).
 . Read backscroll in #taskcluster and search bugzilla under Taskcluster component to find correlation with pending alerts.
-. If no correlation can be found, let people know in #taskcluster about the spike in case it is not expected.
+. If no correlation can be found, let people know in #ci about the spike in case it is not expected.
 = Rebooting taskcluster workers =
-Rebooting taskcluster workers has to be done manually depending on the type of machine. For gecko-t-win10-64 and gecko-t-linux-talos it has to be done via ILO. For gecko-t-osx-1010 it can be done via SSH.
+Rebooting taskcluster workers can be done manually or automatically, depending on the type of machine. For gecko-t-win10-64 and gecko-t-linux-talos it can be done via iLO or using [https://github.com/Akhliskun/taskcluster-worker-checker Taskcluster Worker Checker]. For gecko-t-osx-1010 it can be done via SSH or via the [https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010 roller on Taskcluster].
 = See also =
-https://mana.mozilla.org/wiki/display/NAGIOS/Backlog+Age
+[https://docs.google.com/spreadsheets/d/1pUFq6Z5M5a1ydbSzxNjQivFfryVoksdXa9xXTg9gtzc/edit#gid=0 CIDuty Escalation Path]
+[https://mana.mozilla.org/wiki/display/NAGIOS/Backlog+Age Backlog Age]
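
The "If pending goes to CRITICAL" step about workers going "lazy" after a job ends as an exception could be sketched roughly as below. This is only an illustration and not part of the revision: the `Worker` record, the `find_lazy_workers` helper, the worker IDs and the 30-minute idle threshold are all hypothetical, and the worker data is assumed to have been collected by hand or by some other tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Worker:
    """Hypothetical snapshot of one worker, as gathered by CIDuty."""
    worker_id: str
    last_task_state: Optional[str]          # e.g. "completed", "failed", "exception"
    last_task_resolved: Optional[datetime]  # when the last task ended, if any
    running_task: bool = False


def find_lazy_workers(
    workers: List[Worker],
    pending_count: int,
    now: datetime,
    idle_threshold: timedelta = timedelta(minutes=30),  # assumed value
) -> List[str]:
    """Return IDs of workers that look "lazy" and may need a reboot.

    A worker is flagged when tasks are pending, it is not running anything,
    its last task ended as an exception, and it has been idle for at least
    idle_threshold.
    """
    if pending_count <= 0:
        # Nothing is waiting, so an idle worker is not suspicious.
        return []
    lazy = []
    for worker in workers:
        if worker.running_task:
            continue
        if worker.last_task_state != "exception":
            continue
        if worker.last_task_resolved is None:
            continue
        if now - worker.last_task_resolved >= idle_threshold:
            lazy.append(worker.worker_id)
    return lazy
```

The flagged IDs are only candidates: check each worker in TaskCluster before rebooting it through one of the methods listed under "Rebooting taskcluster workers".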