CIDuty/How To/Troubleshoot Hardware



==== Workers ====
===== How to add/define a worker if it is missing from Taskcluster =====
If you cannot SSH into an OSX node, try restarting it from Taskcluster.<br />
If the node is not visible in the Taskcluster worker explorer, create it with this version of the [[CIDuty/How_To/QuarantineMultipleInstances|quarantine script]], which adds/defines a worker if it is missing.<br />
After setting up the Taskcluster CLI and the script, run a command such as: <code>python quarantine_tc.py --enable -p releng-hardware -w gecko-t-osx-1010 -g mdc2 t-yosemite-r7-449</code><br />
The worker explorer should then show the machine, and you can reboot it from there using roller.<br />
If the issue persists (the machine does not take jobs and SSH still does not work), file a bug for RelOps to physically reboot and reimage/netboot the machine.<br />
The Automatic Bug Generator will create a bug for RelOps if the restart fails.
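
When several nodes are missing, the command above has to be repeated once per machine. The following is a minimal sketch of how those command lines could be generated, assuming the script accepts exactly the flags shown in the example on this page (<code>--enable</code>, <code>-p</code>, <code>-w</code>, <code>-g</code>, then the worker ID). The helper name <code>build_enable_command</code> and the second host name are hypothetical and only serve as illustration.

```python
import shlex

# Script name and flags are copied from the example command on this page;
# their behaviour is taken as given and is not verified here.
SCRIPT = "quarantine_tc.py"


def build_enable_command(provisioner, worker_type, worker_group, worker_id):
    """Return the argument list that enables (and, if missing, defines) one worker."""
    return [
        "python", SCRIPT, "--enable",
        "-p", provisioner,
        "-w", worker_type,
        "-g", worker_group,
        worker_id,
    ]


if __name__ == "__main__":
    # Only the first worker ID appears on this page; the second is a made-up example.
    for worker_id in ["t-yosemite-r7-449", "t-yosemite-r7-450"]:
        cmd = build_enable_command(
            "releng-hardware", "gecko-t-osx-1010", "mdc2", worker_id
        )
        # Print the command so it can be reviewed before running it by hand.
        print(shlex.join(cmd))
```

The sketch only prints the commands; running them still requires the Taskcluster CLI and the script to be set up as described above.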


===== Windows 10 =====
To learn more about a machine (whether it is loaned, has hardware issues, etc.), check the [https://docs.google.com/spreadsheets/d/1IPTmppvqDw0PQV-O1LgXLJg_7TC-H_IAAnSxcur8c7I/edit?pli=1#gid=562893333 Moonshot Inventory] and/or the [https://github.com/mozilla-releng/build-puppet/search?q=T-W1064-MS-072&unscoped_q=T-W1064-MS-072 node definition] (this example searches for T-W1064-MS-072). If you do not find enough information there, search [https://bugzilla.mozilla.org/ Bugzilla] using the keywords: ALL machine_name.<br />
You can also check the current status of the machine in [https://mozilla.service-now.com/nav_to.do ServiceNow].
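
The Bugzilla search described above can also be opened directly from a URL. This is a rough sketch, assuming the search box on bugzilla.mozilla.org corresponds to the <code>quicksearch</code> query parameter of <code>buglist.cgi</code>; the function name <code>bugzilla_search_url</code> is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Bugzilla's quick search is reachable through this endpoint.
BUGZILLA_LIST = "https://bugzilla.mozilla.org/buglist.cgi"


def bugzilla_search_url(machine_name):
    """Build a search URL for the keywords 'ALL <machine_name>'."""
    query = urlencode({"quicksearch": "ALL " + machine_name})
    return BUGZILLA_LIST + "?" + query


# Machine name taken from the node definition example on this page.
print(bugzilla_search_url("T-W1064-MS-072"))
```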


===== No video on all cartridges from a chassis =====


===== SSH not working =====
* Check the [https://papertrailapp.com/ Papertrail logs].
* Reboot the machine from [[CIDuty/How_To/Take_actions_to_RelEng_Hardware_from_TaskCluster_UI|Taskcluster]]. It may have old auth keys, or its re-imaging may not have completed.
* Create a tracking bug or update the existing one.
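
Before working through the steps above, it can help to confirm whether the SSH port is reachable at all. The following is a minimal sketch using only the Python standard library. It assumes SSH listens on the default port 22, and the function name <code>ssh_port_open</code> is hypothetical. A successful TCP connection only shows that something is listening on the port, not that authentication will work.

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (replace the placeholder with the real host name of the machine):
# ssh_port_open("machine_name")
```

If the port is closed, a reboot from Taskcluster is the next step. If it is open but logins fail, old auth keys are the more likely cause.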