CIDuty:QuarantineInstances

From MozillaWiki
When and how to quarantine Taskcluster instances
1. Choose the worker types you wish to investigate. You can find them here.
2. Check which instances show exception (orange) or failed (red) as their task state, and investigate each of them separately.

Worker List.png
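
The check in step 2 can be sketched in a few lines of Python. This is only an illustration: the `workers` list, its `worker_id` and `recent_states` keys, and the function name are hypothetical and do not correspond to any Taskcluster API response; only the state names `exception` and `failed` come from the steps above.

```python
# States that step 2 asks us to look for (orange and red in the worker list).
PROBLEM_STATES = {"exception", "failed"}


def workers_to_investigate(workers):
    """Return the ids of workers with at least one problematic recent task.

    `workers` is a hypothetical list of dicts, each with a "worker_id" and a
    "recent_states" list; it is not a real Taskcluster response format.
    """
    flagged = []
    for worker in workers:
        if any(state in PROBLEM_STATES for state in worker["recent_states"]):
            flagged.append(worker["worker_id"])
    return flagged


if __name__ == "__main__":
    sample = [
        {"worker_id": "worker-a", "recent_states": ["completed", "completed"]},
        {"worker_id": "worker-b", "recent_states": ["failed", "exception"]},
    ]
    print(workers_to_investigate(sample))
```

Each flagged instance still needs to be investigated separately; the sketch only narrows down where to look.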


Analyze the logs and quarantine the machine
1. If the last 4-5 or more tests on an instance are problematic, inspect a few of them.

Test name.png

Check public/logs/live_backing.log for errors on a few of the latest tests by going to <Test-Name> -> Run Artifacts -> public/logs/live_backing.log as shown below:

Log location.png

2. The error logs indicate whether the machine is faulty; if it is, quarantine it. There is no fixed rule for this judgment, which comes with experience. So far, we know that a machine should be quarantined when the conditions above are met and the error log terminates with error code -1 and a message like this:

Error log.png
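
As a rough aid, the conditions from steps 1 and 2 can be combined into a small Python check. This is a sketch under stated assumptions, not a replacement for judgment: the exact error message is only visible in the screenshot, so the regular expression below is an assumed pattern for an "error code -1" line, and the function name, its parameters, and the 20-line tail are hypothetical.

```python
import re

PROBLEM_STATES = {"exception", "failed"}

# Assumed wording for the "error code -1" line; adjust it to the real message.
EXIT_MINUS_ONE = re.compile(r"(exit|error) code:? -1\b", re.IGNORECASE)


def looks_faulty(recent_states, recent_logs, min_bad=4, tail_lines=20):
    """Return True when the last `min_bad` tasks were all problematic and
    every inspected log ends with an error code -1 line near its tail."""
    last = recent_states[-min_bad:]
    if len(last) < min_bad or any(s not in PROBLEM_STATES for s in last):
        return False
    if not recent_logs:
        return False  # nothing inspected, so no basis for a decision
    for log_text in recent_logs:
        tail = "\n".join(log_text.splitlines()[-tail_lines:])
        if not EXIT_MINUS_ONE.search(tail):
            return False
    return True
```

A result of True only means the known pattern matched; new failure modes still have to be recognized by reading the logs.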


3. Quarantine the machine for which all of the above is true by pressing the Quarantine button and leaving the default expiration date of 1000 years, as shown below:
Quarantine pic.png
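
For reference, the default 1000-year expiration can be computed as follows. This is a minimal sketch: the `quarantineUntil` field name is our understanding of what the Taskcluster queue API expects and should be treated as an assumption, and the function name is hypothetical. The sketch only builds the value and does not call Taskcluster.

```python
from datetime import datetime, timezone


def quarantine_payload(now=None, years=1000):
    """Build a payload whose expiry lies `years` years after `now` (UTC).

    The "quarantineUntil" key is an assumed field name, not confirmed here.
    """
    now = now or datetime.now(timezone.utc)
    try:
        until = now.replace(year=now.year + years)
    except ValueError:
        # 29 February in a target year that is not a leap year.
        until = now.replace(year=now.year + years, day=28)
    return {"quarantineUntil": until.strftime("%Y-%m-%dT%H:%M:%S.000Z")}


if __name__ == "__main__":
    print(quarantine_payload())
```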
Bugzilla
1. Check whether a bug is already open for the affected machine in Bugzilla, under CIDuty and/or RelOps, using the keywords
ALL machine_name
2. If a bug exists, update it with a comment stating that you have quarantined the machine and why.
3. If no bug exists, file one in Bugzilla under RelOps.
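
The search from step 1 can also be expressed as a URL. The sketch below only builds the URL and makes no request. It assumes that the Bugzilla REST endpoint `/rest/bug` on bugzilla.mozilla.org accepts a `quicksearch` parameter with the same `ALL machine_name` syntax as the search box; the function name and the example machine name are hypothetical.

```python
from urllib.parse import urlencode

# Assumed REST endpoint; verify it against the Bugzilla documentation.
BUGZILLA_REST = "https://bugzilla.mozilla.org/rest/bug"


def bug_search_url(machine_name):
    """Return a search URL for all bugs mentioning `machine_name`."""
    query = {
        "quicksearch": f"ALL {machine_name}",
        "include_fields": "id,summary,status",
    }
    return f"{BUGZILLA_REST}?{urlencode(query)}"


if __name__ == "__main__":
    # "example-machine-001" is a placeholder, not a real host name.
    print(bug_search_url("example-machine-001"))
```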
Update Moonshot inventory
1. Update the Master Moonshot Inventory spreadsheet with the bug number for the affected machine.

Also check how to quarantine a machine or multiple machines using the Taskcluster CLI.