When and How to Quarantine Taskcluster Instances
- 1. Choose the worker types you wish to investigate. You can find them here.
- 2. Check which instances have tasks in the exception (orange) or failed (red) state, and investigate each of them separately.
- 3. If the last five or six (or more) tests on an instance are problematic, inspect a few of them.
- Check public/logs/live_backing.log for errors on a few of the latest tests by going to <Test-Name> -> Run Artifacts -> public/logs/live_backing.log as shown below:
- 4. Use the error logs to judge whether the machine is faulty; if it is, quarantine it. There is no clear-cut rule for this, so the judgment comes with experience. So far, a machine is known to be faulty when the above conditions are met and the error log ends with error code -1 and a message like:
- 5. Quarantine every instance for which all of the above is true by pressing the Quarantine button and leaving the expiration date at the default of 1000 years, as shown in this image.
- 6. File a bug in Bugzilla under RelOps, e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1441820
- 7. Update the Master Moonshot Inventory spreadsheet with the details of the bug (usually BUG:<NUMBER>).
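The triage in steps 3 to 5 can be roughly sketched as a small Python helper. This is a minimal sketch under stated assumptions, not part of the documented procedure: the function names, the window of five tasks and the regex for "error code -1" are hypothetical (the exact log message is not reproduced in this runbook). The state names "exception" and "failed" come from step 2. The helper only flags candidates for human review, since step 4 is a judgment call. It also approximates the default 1000-year expiry rather than calling any Taskcluster API.

```python
import re
from datetime import datetime, timedelta, timezone

# Task states that step 2 treats as problematic: exception (orange) and failed (red).
PROBLEM_STATES = {"exception", "failed"}

# Hypothetical pattern for "error code -1"; the real log message is not
# reproduced in this runbook, so adjust it to what live_backing.log prints.
MINUS_ONE_RE = re.compile(r"(?:exit|error) code:?\s*-1(?!\d)", re.IGNORECASE)


def last_tasks_problematic(states, window=5):
    """True if the newest `window` task states are all problematic.

    `states` is ordered oldest to newest. window=5 mirrors the
    "last 5-6+ tests" rule of thumb and is a tunable assumption.
    """
    if len(states) < window:
        return False
    return all(state in PROBLEM_STATES for state in states[-window:])


def log_ends_with_minus_one(log_text, tail_lines=20):
    """True if one of the last `tail_lines` log lines reports code -1."""
    tail = log_text.splitlines()[-tail_lines:]
    return any(MINUS_ONE_RE.search(line) for line in tail)


def is_quarantine_candidate(states, log_text, window=5):
    """Flag an instance for human review; the final call stays manual."""
    return last_tasks_problematic(states, window) and log_ends_with_minus_one(log_text)


def quarantine_until(now=None, years=1000):
    """Approximate the default 1000-year expiry (365.25-day years assumed)."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(days=365.25 * years)


if __name__ == "__main__":
    states = ["completed", "failed", "exception", "failed", "failed", "exception"]
    log = "running tests...\n[taskcluster] Finished with error code -1"
    print(is_quarantine_candidate(states, log))  # True
    print(quarantine_until().isoformat())
```

Requiring both signals (a run of bad task states and the -1 exit code) keeps a single flaky test from flagging a healthy machine. The actual quarantine is still done with the Quarantine button in step 5.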