148 edits
(aws-manager1 -> aws-manager2)
(Updated instructions on how to manage AWS slaves)
Line 74:
*# ''Check the machine's current status (is it actually running right now?) by either:''
*#* Logging into the [https://mozilla-releng.signin.aws.amazon.com/console AWS web console], looking up the instance, and checking whether it is still running
*#** ['''Note''']: if you don't have credentials for the console, they probably need to be generated for you. Ask :catlee, who has done this before.
*#* Using the releng cloud-tools from aws-manager2.srv.releng.scl3.mozilla.com
*#** see [https://wiki.mozilla.org/index.php?title=ReleaseEngineering/How_To/Manage_AWS_slaves#Usage usage] above for the 'status' command
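The status check above can also be scripted. This is a minimal sketch and not the releng cloud-tools 'status' command, whose exact invocation is not shown here. It assumes you create a boto3 EC2 client yourself and pass it in. It calls that client's describe_instances and reads the instance's state name. Passing the client in keeps the logic testable without AWS credentials.

```python
from typing import Any, Dict, Protocol


class EC2Client(Protocol):
    """The subset of a boto3 EC2 client used by this sketch."""

    def describe_instances(self, **kwargs: Any) -> Dict[str, Any]: ...


def instance_state(ec2: EC2Client, instance_id: str) -> str:
    """Return the EC2 state name, e.g. 'pending', 'running', 'stopped'."""
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    # The response nests instances inside reservations.
    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            if inst["InstanceId"] == instance_id:
                return inst["State"]["Name"]
    raise LookupError(f"instance {instance_id} not found")


def is_running(ec2: EC2Client, instance_id: str) -> bool:
    """True only if the instance is in the 'running' state right now."""
    return instance_state(ec2, instance_id) == "running"
```

With a real client this would be called as `is_running(boto3.client("ec2", region_name=...), "i-...")`. The region and instance ID are placeholders to fill in for the instance being checked.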
Line 81:
*#* If it is a loaner/releng-dev machine:
*#** ssh into the machine as root and run `last`
*#** find the bug associated with the instance and check the latest comments
*#*** the bug number can also be found in the instance tags in the AWS console
*#* If it is one of our Buildbot CI machines:
*#** use [https://secure.pub.build.mozilla.org/builddata/reports/slave_health/ Slave Health], or ssh into the machine and tail twistd.log
*#** ['''Note''']: these machines should not be running for long. A machine is put on the long-running process list once it has been up for more than 2h, so if it has also been idle for a while, further action is required.
*# ''For instances that have had no recent builds/activity and that you are sure are not currently running a build:''
*#* If it is a loaner/releng-dev machine:
*#** poke the owner of the instance via the associated bug to check whether they still need the machine
*#** use judgement about what's fair. ''e.g. if it has been up for 24-48 hrs, that is probably not cause for further action.''
*#** store owner/usage details in the moz-used-by instance tag (if not already set)
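The 2h long-running rule and the tag handling above can be sketched over one entry of the Instances list returned by describe_instances. This sketch assumes boto3's response shape: LaunchTime is a timezone-aware datetime and Tags is a list of Key/Value pairs. The page does not say which tag holds the bug number, so tag_value takes the key as a parameter. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# From the runbook: a CI machine up for more than 2h goes on the
# long-running process list.
LONG_RUNNING = timedelta(hours=2)


def uptime(instance: Dict[str, Any], now: Optional[datetime] = None) -> timedelta:
    """Time since launch; LaunchTime must be a timezone-aware datetime."""
    now = now or datetime.now(timezone.utc)
    return now - instance["LaunchTime"]


def is_long_running(instance: Dict[str, Any], now: Optional[datetime] = None) -> bool:
    """True if the instance has been up for more than 2h."""
    return uptime(instance, now) > LONG_RUNNING


def tag_value(instance: Dict[str, Any], key: str) -> Optional[str]:
    """Return the value of tag `key`, or None if the instance lacks it."""
    for tag in instance.get("Tags", []):
        if tag["Key"] == key:
            return tag["Value"]
    return None


def ensure_used_by(ec2: Any, instance: Dict[str, Any], detail: str) -> bool:
    """Write the moz-used-by tag only if it is not already set; True if written."""
    if tag_value(instance, "moz-used-by"):
        return False
    ec2.create_tags(
        Resources=[instance["InstanceId"]],
        Tags=[{"Key": "moz-used-by", "Value": detail}],
    )
    return True
```

Uptime alone does not show idleness. Slave Health or twistd.log still has to confirm that no build is in progress.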
Line 95:
*#*** 'stop' the instance if the owner wants to use it again soon but won't be working on it for a day or two
*#**** see [https://wiki.mozilla.org/index.php?title=ReleaseEngineering/How_To/Manage_AWS_slaves#Usage usage] above for the 'stop' command
*#**** ['''Note''']: present this to the owner as an appealing option, since turning it back on is ''easy'' and fast!
*#*** 'terminate' the instance if the owner has said they are finished with it for good, or if the bug is resolved
*#**** see [https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave#Reclaiming Reclaiming Loaners]
*#**** ['''Note''']: don't forget to revert the VPN access bug and delete the A/PTR records.
*#* If it is one of our Buildbot CI machines:
*#** decide whether to stop or terminate the instance:
*#*** if this is a spot instance:
*#**** [https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave#AWS_machines_2 terminate it] (don't delete the A/PTR records)
*#**** ['''Note''']: spot instances need to be terminated because they don't really have a 'stopped' state.
*#*** if this is not a spot instance:
*#**** shut it down by:
Line 109:
*#***** logging into the [https://mozilla-releng.signin.aws.amazon.com/console AWS web console] and choosing 'stop' in the dropdown, or
*#***** ssh-ing into the machine and running: $ shutdown -h now
*#**** ['''Note''']: stopping allows aws_watch_pending to decide when the instance needs to be started up again
* '''For repeatedly problematic instances''', further action is required. Ask in #releng and, if needed, escalate to catlee/rail.
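The stop-vs-terminate rules above can be condensed into a small hypothetical sketch. The loaner inputs (owner_finished, bug_resolved, idle_for_days) stand in for the human judgement described on this page. retire_idle_ci assumes a boto3-style EC2 client and calls stop_instances/terminate_instances directly, rather than going through the web console, shutdown -h, or the releng cloud-tools. It does not revert the VPN access bug or touch A/PTR records.

```python
from typing import Any, Dict


def loaner_action(owner_finished: bool, bug_resolved: bool, idle_for_days: bool) -> str:
    """Hypothetical encoding of the loaner/releng-dev rules."""
    if owner_finished or bug_resolved:
        # Done for good: terminate, then revert the VPN access bug and
        # delete the A/PTR records by hand.
        return "terminate"
    if idle_for_days:
        # Owner wants it again soon but won't use it for a day or two.
        return "stop"
    return "keep"


def idle_ci_action(instance: Dict[str, Any]) -> str:
    """Buildbot CI rule: terminate spot instances, stop everything else."""
    # describe_instances reports InstanceLifecycle == "spot" for spot
    # instances and omits the key for on-demand ones.
    if instance.get("InstanceLifecycle") == "spot":
        return "terminate"
    return "stop"


def retire_idle_ci(ec2: Any, instance: Dict[str, Any]) -> str:
    """Apply idle_ci_action through the EC2 client; returns the action taken."""
    action = idle_ci_action(instance)
    ids = [instance["InstanceId"]]
    if action == "terminate":
        ec2.terminate_instances(InstanceIds=ids)
    else:
        # A stopped instance can later be started again by aws_watch_pending.
        ec2.stop_instances(InstanceIds=ids)
    return action
```

Keeping the decision (idle_ci_action) separate from the API call (retire_idle_ci) makes it easy to print what would happen for a batch of instances before acting on any of them.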
== Unknown Type Or State Instances ==