ReleaseEngineering/How To/Loan a Slave

From MozillaWiki


NOTE: If you're a developer looking to request a slave, please see the entry on Requesting a loaner.

Loaning

The high-level steps are:

  • Reserve: pick a password; file a bug if the developer hasn't already; file a tracking bug that depends on the loan bug; allocate hardware using slavealloc (and maybe reboot), or see the detailed instructions for AWS.
  • Cleanup: do some basic cleanup on Mac OS X and Linux. Windows hardware machines need none (GPO handles it), but Windows AWS instances do.
  • Notify: send an email; update and assign the bug to the developer.

Choosing a host

If you're loaning systems *not* in AWS, find an unreserved machine in SlaveAlloc and visit Slave Health to validate that the machine is:

  • not part of a jacuzzi pool
  • not currently running a build

Reserving

  • Choose a new password for the slave and make a note of it. If you're loaning out a t-xp32-ix, t-w732-ix, or t-w864-ix machine, you cannot choose a custom password because the loaner GPO sets it; if you don't know this password, ask someone.
  • File a bug for the developer loan request, if the developer hasn't already. Assign the bug to the developer requesting the slave.
    • The bug will stay open and assigned to the developer while they are still using the slave.
    • Closing the bug is the cue releng uses to recover the slave.
  • If a problem tracking bug (e.g. bld-centos6-hp-005) does not already exist for the slave, file one. Make sure the slave bug depends on the developer request bug.

Hardware machines

  • Pick a machine that is in a known good state, i.e. one that has been actively running jobs (so you know it is being kept up to date), and disable it in slavealloc. Once it is disabled, make sure it is no longer running any jobs before continuing with the rest of the loan process.
  • In slavealloc, add a note with the developer request bug number, username, and password.
  • Grant VPN access. Note: there is no need to do this if the requester is in IT/RelOps.

t-xp32-ix, t-w732-ix, t-w864-ix

Windows slaves need to be moved to the loaner OU (organizational unit). Changing OUs automatically applies the correct GPOs, which change the passwords, etc. Most people on the team should have access to the Windows admin host to make OU changes. Here are the steps:

  • log in to the winadmin host via RDP (winadmin.srv.releng.scl3.mozilla.com, domain: RELENG; credentials are in the e-mail sent by Q around mid-January 2015)
  • open Start->Administrative Tools->Active Directory Users and Computers
  • navigate to the machine you want to move between OUs. It will be somewhere under releng.ad.mozilla.com->machines->windows:
    • test: 7|8|XP->SCL3->tester
  • right-click the machine you're moving and select "Move..." from the context menu
  • select the new OU to move to:
    • test: 7|8|XP->SCL3->tester_loaner
  • reboot the machine (via ssh/slaveapi/whatever). It will automatically come up in the new OU after the loan GPOs are applied. This may take a few minutes.
  • empirically we've determined that we need to reboot the machine again after the GPOs are applied to make sure that the VNC password updates correctly.
  • NOTE: BE SURE TO LOG OFF (Start->Log off) WHEN YOU ARE DONE. Only two users can be connected at once. The next user may need to kick you off so they can connect if you forget to log off.

If you don't have access yourself, or someone who does have access is not readily available, please file a RelOps bug to have the slave moved to the loaner OU and rebooted.

AWS machines

Environment Setup

Log on to aws-manager2.srv.releng.scl3.mozilla.com as yourself and run the following:

sudo su - buildduty
source /builds/aws_manager/bin/activate
cd /builds/aws_manager
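Before running any of the recipes below, it can help to confirm that the virtualenv is active and that the files they reference are in place. This is a minimal sketch, not part of the documented procedure: `preflight_aws_manager` is a hypothetical helper that only checks the paths used on this page (`cloud-tools/scripts/free_ips.py`, `cloud-tools/scripts/aws_create_instance.py`, `secrets/aws-secrets.json`, `~/.ssh/aws-ssh-key`). It assumes the virtualenv's `activate` script exports `VIRTUAL_ENV`, as standard virtualenvs do.

```shell
# Hypothetical preflight check: report any file the AWS recipes rely on
# that is missing. Returns non-zero if something is missing.
# Usage: preflight_aws_manager [base_dir] [ssh_key]
preflight_aws_manager() {
  local base="${1:-/builds/aws_manager}"
  local key="${2:-$HOME/.ssh/aws-ssh-key}"
  local missing=0 f
  for f in \
      "$base/cloud-tools/scripts/free_ips.py" \
      "$base/cloud-tools/scripts/aws_create_instance.py" \
      "$base/secrets/aws-secrets.json" \
      "$key"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # "source .../bin/activate" exports VIRTUAL_ENV; warn (only) if it is unset.
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "warning: no virtualenv active" >&2
  fi
  return "$missing"
}
```

Run it from the buildduty shell; a non-zero exit means one of the paths printed on stderr is missing.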

See below for creation commands for specific types of machines.

Build machines

Set up the environment then run the following, substituting values appropriately:

arch=64
bug=<bug#>
user=<loan-requester>
email="$user@mozilla.com"
slavetype=dev-linux64-ec2
host=$slavetype-$user
ip=`python cloud-tools/scripts/free_ips.py -c cloud-tools/configs/dev-linux$arch -r us-east-1 -n1`
# double-check that the IP address is not in use by some other machine
host $ip
# create a DNS entry
# invtool prompts for LDAP credentials: use your full LDAP username, e.g. user@mozilla.com
invtool A create --ip $ip --fqdn $host.dev.releng.use1.mozilla.com --private  --description "bug $bug: loaner for $user"
# create a DNS reverse-mapping (required for puppet certs to work properly)
invtool PTR create --ip $ip --target $host.dev.releng.use1.mozilla.com --private --description "bug $bug: loaner for $user"
# create a CNAME as well (for ease of lookup)
invtool CNAME create --fqdn $host.build.mozilla.org --target $host.dev.releng.use1.mozilla.com --private --description "Releng convention"
# wait for DNS changes to be SVN up'd onto each DNS server
i=1
while ! host $host.build.mozilla.org; do
   sleep 1m
   echo "Slept ${i} minutes"
   let i++
done
echo "DNS updated"
# the command below may take a while to complete, so feel free to tail the
# per-host logs under buildduty@aws-manager1:/builds/aws_manager/ or /root/puppetize.log on the new
# instance (connect using the aws-releng key)
python cloud-tools/scripts/aws_create_instance.py -c cloud-tools/configs/dev-linux64 -r us-east-1 -s aws-releng \
  --loaned-to $email --bug $bug -k secrets/aws-secrets.json \
  --ssh-key ~/.ssh/aws-ssh-key \
  -i cloud-tools/instance_data/us-east-1.instance_data_dev.json $host

Grant VPN access. Note: there is no need to do this if the requester is in IT/RelOps.
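The DNS wait loop in the recipe above polls forever if the records never propagate. A bounded variant is sketched below; `wait_for_dns` and its 30-minute default limit are illustrative assumptions, not an existing releng tool. It uses the same `host` lookup and one-minute sleep as the original loop.

```shell
# Hypothetical helper: poll until NAME resolves, giving up after MAX_MINUTES.
# Usage: wait_for_dns NAME [MAX_MINUTES]
wait_for_dns() {
  local name="$1" max_minutes="${2:-30}" i=0
  until host "$name" >/dev/null 2>&1; do
    if [ "$i" -ge "$max_minutes" ]; then
      echo "gave up waiting for $name after $i minutes" >&2
      return 1
    fi
    sleep 1m
    i=$((i + 1))
    echo "Slept ${i} minutes"
  done
  echo "DNS updated"
}
```

For example, `wait_for_dns $host.build.mozilla.org 30` could stand in for the while loop; a non-zero exit is a cue to recheck the records with `invtool search`.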

The instance comes with 35G of disk at /. If the user needs more (e.g. for several staging releases), connect up the hidden 150G disk by running the following:

# root@<instance>
pvcreate /dev/xvdb
vgextend cloud_root /dev/xvdb
lvextend -l 100%VG /dev/cloud_root/lv_root
resize2fs /dev/cloud_root/lv_root
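The same four commands can be wrapped with a dry-run mode and a guard, so that running them twice does not try to re-initialize a disk that is already an LVM physical volume. This is a sketch that keeps the names used above (volume group `cloud_root`, logical volume `/dev/cloud_root/lv_root`, spare disk `/dev/xvdb`); `extend_root_disk`, `run`, and `DRY_RUN` are hypothetical names. It relies on `pvs` exiting non-zero for a device that is not yet a physical volume.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper, run as root on the loaner.
extend_root_disk() {
  local disk="${1:-/dev/xvdb}" vg=cloud_root lv=/dev/cloud_root/lv_root
  if [ -z "${DRY_RUN:-}" ]; then
    [ -b "$disk" ] || { echo "$disk is not a block device" >&2; return 1; }
    # pvs succeeds only if the device is already a physical volume.
    if pvs "$disk" >/dev/null 2>&1; then
      echo "$disk is already a physical volume; skipping" >&2
      return 0
    fi
  fi
  # Same sequence as above; stop at the first failure.
  run pvcreate "$disk" &&
    run vgextend "$vg" "$disk" &&
    run lvextend -l 100%VG "$lv" &&
    run resize2fs "$lv"
}
```

Run `DRY_RUN=1 extend_root_disk` first to see the commands it would issue, then `extend_root_disk` to apply them.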

Test machines

Set up the environment then run the following, substituting values appropriately:

arch=64
bug=<bug#>
user=<loan-requester>
email="$user@mozilla.com"
slavetype=tst-linux$arch-ec2
host=$slavetype-$user
ip=`python cloud-tools/scripts/free_ips.py -c cloud-tools/configs/tst-linux$arch -r us-east-1 -n1`
# double-check that the IP address is not in use by some other machine
host $ip
# create a DNS entry
invtool A create --ip $ip --fqdn $host.test.releng.use1.mozilla.com --private  --description "bug $bug: loaner for $user"
# create a DNS reverse-mapping (required for puppet certs to work properly)
invtool PTR create --ip $ip --target $host.test.releng.use1.mozilla.com --private --description "bug $bug: loaner for $user"
# create a CNAME for the loan too
invtool CNAME create --fqdn $host.build.mozilla.org --target $host.test.releng.use1.mozilla.com --private --description "Releng convention"
# wait for DNS changes to be SVN up'd onto each DNS server
i=1
while ! host $host.build.mozilla.org; do
  sleep 1m
  echo "Slept ${i} minutes"
  let i++
done
echo "DNS updated"
# the command below may take a while to complete, so feel free to tail the
# per-host logs under buildduty@aws-manager1:/builds/aws_manager/ or /root/puppetize.log on the new
# instance (connect using the aws-releng key)
python cloud-tools/scripts/aws_create_instance.py -c cloud-tools/configs/tst-linux$arch -r us-east-1 \
  -s aws-releng -k secrets/aws-secrets.json \
  --ssh-key ~/.ssh/aws-ssh-key \
  --loaned-to $email --bug $bug \
  -i cloud-tools/instance_data/us-east-1.instance_data_tests.json $host

Grant VPN access. Note: there is no need to do this if the requester is in IT/RelOps.

Note: Some tests run on larger instance types than m1.medium, the usual loaner type for tst-linux32 and tst-linux64; for instance Android 4.3 opt mochitest, reftests, and all debug tests. In that case, either point the commands that assign the IP and create the instance at the cloud-tools/configs/tst-emulator64 config, or go into the Amazon console, stop the instance you created above, change its instance type to c3.xlarge, and start it again.
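If you prefer the command line to the Amazon console for the stop/resize/start steps, the same sequence can be expressed with the AWS CLI. This is a sketch that assumes the `aws` CLI is installed and configured with credentials for the releng account (this page does not say it is); `resize_instance` and `DRY_RUN` are hypothetical names, and `us-east-1` matches the region used in the recipes above.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: stop, change the instance type, and start again.
# Usage: resize_instance <instance-id> [instance-type]
resize_instance() {
  local id="$1" type="${2:-c3.xlarge}"
  run aws ec2 stop-instances --region us-east-1 --instance-ids "$id" &&
    run aws ec2 wait instance-stopped --region us-east-1 --instance-ids "$id" &&
    run aws ec2 modify-instance-attribute --region us-east-1 \
      --instance-id "$id" --instance-type "Value=$type" &&
    run aws ec2 start-instances --region us-east-1 --instance-ids "$id"
}
```

The `wait instance-stopped` step matters because the instance type can only be changed while the instance is stopped.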

Windows 2008 machines

Set up the environment then run the following, substituting values appropriately:

bug=<bug#>
user=<loan-requester>
email="$user@mozilla.com"
slavetype=b-2008-ec2
host=$slavetype-$user
ip=`python cloud-tools/scripts/free_ips.py -c cloud-tools/configs/b-2008 -r us-east-1 -n1`
# double-check that the IP address is not in use by some other machine
host $ip
# create a DNS entry
# invtool prompts for LDAP credentials: use your full LDAP username, e.g. user@mozilla.com
invtool A create --ip $ip --fqdn $host.build.releng.use1.mozilla.com --private  --description "bug $bug: loaner for $user"
# create a DNS reverse-mapping (required for puppet certs to work properly)
invtool PTR create --ip $ip --target $host.build.releng.use1.mozilla.com --private --description "bug $bug: loaner for $user"
# create a CNAME as well (for ease of lookup)
invtool CNAME create --fqdn $host.build.mozilla.org --target $host.build.releng.use1.mozilla.com --private --description "Releng convention"
# wait for DNS changes to be SVN up'd onto each DNS server
i=1
while ! host $host.build.mozilla.org; do
   sleep 1m
   echo "Slept ${i} minutes"
   let i++
done
echo "DNS updated"
# the command below may take a while to complete (~2 hours), so feel free to tail
# the userdata run log under c:\log\ (connect with RDP as root)
python cloud-tools/scripts/aws_create_instance.py -c cloud-tools/configs/b-2008 -r us-east-1 -s aws-releng \
 --loaned-to $email --bug $bug -k secrets/aws-secrets.json \
 --ssh-key ~/.ssh/aws-ssh-key \
 -i cloud-tools/instance_data/us-east-1.instance_data_prod.json $host

Grant VPN access.


Windows 7 AWS machines

Set up the environment then run the following, substituting values appropriately:

bug=<bug#>
user=<loan-requester>
email="$user@mozilla.com"
slavetype=t-w732-ec2
host=$slavetype-$user
ip=`python cloud-tools/scripts/free_ips.py -c cloud-tools/configs/t-w732 -r us-east-1 -n1`
# double-check that the IP address is not in use by some other machine
host $ip
# create a DNS entry
# invtool prompts for LDAP credentials: use your full LDAP username, e.g. user@mozilla.com
invtool A create --ip $ip --fqdn $host.test.releng.use1.mozilla.com --private  --description "bug $bug: loaner for $user"
# create a DNS reverse-mapping (required for puppet certs to work properly)
invtool PTR create --ip $ip --target $host.test.releng.use1.mozilla.com --private --description "bug $bug: loaner for $user"
# create a CNAME as well (for ease of lookup)
invtool CNAME create --fqdn $host.build.mozilla.org --target $host.test.releng.use1.mozilla.com --private --description "Releng convention"
# wait for DNS changes to be SVN up'd onto each DNS server
i=1
while ! host $host.build.mozilla.org; do
   sleep 1m
   echo "Slept ${i} minutes"
   let i++
done
echo "DNS updated"
# the command below may take a while to complete (~2 hours), so feel free to tail
# the userdata run log under c:\log\ (connect with RDP as root)
python cloud-tools/scripts/aws_create_instance.py -c cloud-tools/configs/t-w732 -r us-east-1 -s aws-releng \
 --loaned-to $email --bug $bug -k secrets/aws-secrets.json \
 --ssh-key ~/.ssh/aws-ssh-key \
 -i cloud-tools/instance_data/us-east-1.instance_data_tests.json $host 

Grant VPN access.
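The four AWS recipes above differ mainly in the cloud-tools config, the internal DNS zone, and the instance_data file. The lookup below collects those values in one place, taken from the recipes on this page; `loaner_params` is a hypothetical helper, and any slave type not listed in it should be treated as unknown rather than guessed.

```shell
# Hypothetical helper: print "<config> <internal DNS zone> <instance_data file>"
# for a loaner slave type, using the values from the recipes above.
loaner_params() {
  local data=cloud-tools/instance_data/us-east-1.instance_data
  case "$1" in
    dev-linux64-ec2)
      echo "cloud-tools/configs/dev-linux64 dev.releng.use1.mozilla.com ${data}_dev.json" ;;
    tst-linux32-ec2|tst-linux64-ec2)
      # tst-linux32-ec2 -> 32, tst-linux64-ec2 -> 64
      local arch="${1#tst-linux}"
      arch="${arch%-ec2}"
      echo "cloud-tools/configs/tst-linux$arch test.releng.use1.mozilla.com ${data}_tests.json" ;;
    b-2008-ec2)
      echo "cloud-tools/configs/b-2008 build.releng.use1.mozilla.com ${data}_prod.json" ;;
    t-w732-ec2)
      echo "cloud-tools/configs/t-w732 test.releng.use1.mozilla.com ${data}_tests.json" ;;
    *)
      echo "unknown slave type: $1" >&2
      return 1 ;;
  esac
}
```

For example, `read -r config zone data < <(loaner_params "$slavetype")` sets all three variables at once in bash.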

Trouble-shooting

  • if aws_create_instance.py gets stuck with a message like the following (10+ times):
   2014-02-06 15:17:09,974 - WARNING - problem assimilating Instance:i-d8022cf8, retrying in 10 sec
  • check the status of the instance in the AWS web console. It may have been created and then hung.
  • if you see 'instance state'=running but 'status check'=1/2 checks,
  • if this doesn't work after a few attempts, ctrl-c aws_create_instance.py, terminate the newly created instance, and run aws_create_instance.py again
  • log in as root using the aws-releng SSH key
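To tell whether aws_create_instance.py has hit the 10+ retries case above without counting by eye, you can count the warning lines. A minimal sketch; `assimilation_stuck` is a hypothetical helper that reads the script's output (or a saved copy of it) on stdin and succeeds once the `problem assimilating Instance` warning has appeared at least 10 times.

```shell
# Hypothetical helper: succeed if the retry warning appears THRESHOLD+ times.
# Usage: assimilation_stuck [THRESHOLD] < saved-output.txt
assimilation_stuck() {
  local threshold="${1:-10}" count
  # grep -c prints 0 (and exits 1) when nothing matches; keep the 0.
  count=$(grep -c 'problem assimilating Instance' || true)
  [ "$count" -ge "$threshold" ]
}
```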

Cleaning

bld-linux64-ec2

  • Change root and cltbld passwords:
passwd root
passwd cltbld
  • Run the following:
rm -rf /var/lib/puppet/ssl/private_keys /etc/init.d/puppet ~cltbld/.bash_history \
   ~root/.bash_history ~root/.sh_history ~cltbld/.ssh /opt/runner
find /builds -maxdepth 1 -type f -print -delete
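Before rebooting, it is worth confirming that nothing in the cleanup list survived. `check_removed` below is a hypothetical helper, sketched here; it prints each path that still exists (including dangling symlinks) and returns non-zero if any do.

```shell
# Hypothetical helper: list any of the given paths that still exist.
check_removed() {
  local leftover=0 p
  for p in "$@"; do
    if [ -e "$p" ] || [ -L "$p" ]; then
      echo "still present: $p"
      leftover=1
    fi
  done
  return "$leftover"
}
```

For example, `check_removed /var/lib/puppet/ssl/private_keys /etc/init.d/puppet ~cltbld/.bash_history ~root/.bash_history ~root/.sh_history ~cltbld/.ssh /opt/runner` should print nothing, and so should `find /builds -maxdepth 1 -type f`.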

Reboot.

b-2008-ec2

  • Run the following:
# cltbld@B-2008-IX-0002 ~
rm -rf /c/opt/runner/tasks.d/* /c/ProgramData/PuppetLabs/puppet/var/ssl/private_keys/
  • Set up VNC access: connect to the machine via RDP as root --> open "c:\program files\uvnc bvba\UltraVnc\uvnc_settings.exe" --> set the VNC password

Reboot


t-w732-ec2

  • Run the following:
# cltbld@t-w732-IX-0002 ~
# ProgramData is a hidden directory
rm -rf /c/opt/runner/tasks.d/* /c/ProgramData/PuppetLabs/puppet/var/ssl/private_keys/
  • Set up VNC access: connect to the machine via RDP as root --> open "c:\program files\uvnc bvba\UltraVnc\uvnc_settings.exe" --> set the VNC password

Reboot

  • Important: the machine needs to be stopped manually after the instance is created, and a manual reboot is needed for the new password to take effect.


bld-lion-r5, talos-mtnlion-r5, t-yosemite-r5, t-yosemite-r7

  • Change root and cltbld passwords:
passwd root
passwd cltbld
  • Set up VNC for the developer by following the instructions here (not applicable for builders).
  • Run the following:
rm -rf /var/lib/puppet/ssl/private_keys /Library/LaunchDaemons/com.mozilla.puppet.plist \
   /Library/LaunchDaemons/org.mozilla.puppetize.plist ~cltbld/.bash_history \
   ~root/.bash_history ~root/.sh_history /etc/kcpassword ~cltbld/.ssh \
   /opt/runner
find /builds -maxdepth 1 -type f -print -delete
sudo reboot

t-snow-r4

  • Change root and cltbld passwords:
passwd root
passwd cltbld
  • Set up VNC for the developer by following the instructions here.
  • Run the following:
rm -rf ~cltbld/.bash_history ~root/.bash_history ~root/.sh_history \
   ~cltbld/.ssh /etc/puppet/ssl /var/lib/puppet/ssl/private_keys \
   /etc/kcpassword /opt/runner
# for good measure, stop the puppet daemon and remove its script; otherwise it keeps running
sudo launchctl stop com.mozilla.puppet
sudo rm /usr/local/bin/run-puppet.sh
sudo reboot
  • For VNC access, go to System Preferences -> Sharing -> Screen Sharing and enable it for the builder user.
  • With Screen Sharing on, click Computer Settings, select "VNC viewers may control screen with password", enter the password, and click OK.
  • This setting is needed to allow non-Mac VNC clients to connect.

talos-linux32-ix, talos-linux64-ix, tst-linux32-ec2, tst-linux64-ec2

  • Change root and cltbld passwords:
passwd root
passwd cltbld
  • Put a cleartext VNC password in /etc/vnc_passwdfile.
  • (Do this quickly!) Run the following:
rm -rf ~cltbld/.bash_history ~root/.bash_history ~root/.sh_history \
   ~cltbld/.ssh /var/lib/puppet/ssl/private_keys /etc/puppet/init \
   ~cltbld/.config/autostart/gnome-terminal.desktop /opt/runner
sed -i 's/manual/start on started Xsession/' /etc/init/x11vnc.conf
sudo reboot
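The `sed` one-liner above replaces the first `manual` on every line that contains it, so it could also change that word elsewhere in the file (in a comment, for instance). An anchored variant is sketched below, to be run in place of that line before the reboot. It assumes the upstart job keeps the `manual` stanza on a line of its own, and `enable_x11vnc_autostart` is a hypothetical name; it also checks afterwards that the `start on started Xsession` line is present, so running it twice is harmless.

```shell
# Hypothetical helper: replace only a bare "manual" stanza line, then verify.
# Usage: enable_x11vnc_autostart [conf]
enable_x11vnc_autostart() {
  local conf="${1:-/etc/init/x11vnc.conf}"
  sed -i 's/^manual$/start on started Xsession/' "$conf"
  # Fails if the stanza was not found, e.g. because of trailing whitespace.
  grep -qx 'start on started Xsession' "$conf"
}
```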

t-xp32-ix, t-w732-ix, t-w864-ix

GPO handles everything. Remember to reboot the machine after moving it to the loaner OU.

t-w864-ix machines currently need a second reboot to ensure that the VNC password is properly updated. This is being tracked in bug 1185116.

Notifying

Send the developer the following e-mail, substituting where appropriate. It's good to cc release+buildduty@mozilla.com on these as well:

Hello <Judith>,

I have just finished setting up <machine> per your request in bug <NNNNNN> and enabled VPN access to this loaner machine over SSH, VNC and RDP (where it applies).

In order to access it you need to:
* Review the information here [1], especially with regard to running tests under VNC.
* Set up the "Mozilla VPN" [2].
** Even if you already have access, you will need to disconnect and reconnect; otherwise
   the VPN server won't recognize that you have access to the host.
* cltbld/root user, with password: <"password">
** VNC password: <"vnc_password">
* The machine's FQDN is: <machine_fqdn> (<private IP>)
* If you need to run mozharness tests, here are some pointers [4]

NOTES:
* Mozilla has recently changed VPN access policies. If you think you should have 
  access and you do not, please file a bug against the MOC requesting vpn_default access [3].
* For SSH access to a Windows loaner, please use "ssh -m hmac-md5 -o PreferredAuthentications=password <user>@<machine_fqdn>"

Let us know if you have any problems or questions!

[1] https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_slave#Accessing_your_slave
[2] For VPN setup please see Mana https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=30769829
[3] http://bit.ly/vpn_default_bug
[4] https://wiki.mozilla.org/ReleaseEngineering/Mozharness/How_to_run_tests_as_a_developer

You can also comment in the bug after this is all done, something along the lines of:

Email sent to <loanee> for further instructions. 

Loaning machines: 
- <machine>

Hi <loanee>, 

I am going to assign this to you to keep track of the loan. 

When you are finished with the loan forever, please comment stating so here in the bug, and mark the bug as RESOLVED.

<#### for aws instances, you can also remind the loanee that we can start/stop the instance ####>
By the way, now that this aws instance has been created, starting and stopping it can happen in a flash!
If you are not going to be using this machine for multiple hours, let us know in this bug and we can stop it.

Comment again when you want it started back up.
* For faster turnaround, ping #releng (look for nick with 'buildduty')
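If you post these comments often, the bug-comment template above can be filled in from the shell. A small sketch; `render_loan_comment` is a hypothetical helper that takes the loanee followed by one or more machine names and prints the comment text, without the optional AWS paragraph.

```shell
# Hypothetical helper: print the loan bug comment for LOANEE and MACHINE(s).
# Usage: render_loan_comment <loanee> <machine> [machine...]
render_loan_comment() {
  local loanee="$1" m
  shift
  printf 'Email sent to %s for further instructions.\n\n' "$loanee"
  printf 'Loaning machines:\n'
  for m in "$@"; do
    printf '%s\n' "- $m"
  done
  printf '\nHi %s,\n\n' "$loanee"
  printf 'I am going to assign this to you to keep track of the loan.\n\n'
  printf 'When you are finished with the loan forever, please comment stating so here in the bug, and mark the bug as RESOLVED.\n'
}
```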

Reclaiming

Hardware machines

# replace placeholder with creds from oob-password.txt.gpg
export IPMI_USERNAME=XXXXXXXX
export IPMI_PASSWORD=XXXXXXXX

AWS machines

Terminate the machine. Log on to aws-manager2.srv.releng.scl3.mozilla.com as buildduty and run the following:

# buildduty@aws-manager2
host=dev-linux64-ec2-coop
terminate ${host}

Next, remove the A, PTR, and CNAME records (you will have to enter your full LDAP username and password four times):

invtool search -q "$host (type=:A OR type=:PTR OR type=:CNAME)"
invtool CNAME delete --pk <first column from output of the CNAME record>
invtool A delete --pk <first column from output of the A record>
invtool PTR delete --pk <first column from output of the PTR record>

This is what it will look like:

$ invtool search -q "$host (type=:A OR type=:PTR OR type=:CNAME)"
ldap username: jwood@mozilla.com
ldap password: 
11895 dev-linux64-ec2-coop.build.mozilla.org. None IN  CNAME dev-linux64-ec2-coop.dev.releng.use1.mozilla.com.
32437 dev-linux64-ec2-coop.dev.releng.use1.mozilla.com. None IN  A    10.134.53.194
33069 194.53.134.10.in-addr.arpa.              3600 IN  PTR  dev-linux64-ec2-coop.dev.releng.use1.mozilla.com.

$ invtool CNAME delete --pk 11895
ldap username: jwood@mozilla.com
ldap password:
http_status: 204 (request fulfilled)

$ invtool A delete --pk 32437
ldap username: jwood@mozilla.com
ldap password: 
http_status: 204 (request fulfilled)

$ invtool PTR delete --pk 33069
ldap username: jwood@mozilla.com
ldap password: 
http_status: 204 (request fulfilled)
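To avoid copying primary keys by hand, the `invtool search` output shown above can be turned into the three delete commands. This sketch only prints the commands so you can review them before running them; `dns_delete_cmds` is a hypothetical helper, and it assumes record lines keep the layout shown above (`<pk> <name> <ttl> IN <type> <target>`).

```shell
# Hypothetical helper: read saved `invtool search` output on stdin and print
# one "invtool <TYPE> delete --pk <pk>" command per A/PTR/CNAME record.
# Usage: dns_delete_cmds < search-output.txt
dns_delete_cmds() {
  # Skips prompt lines such as "ldap username: ..." (first field not numeric).
  awk '$1 ~ /^[0-9]+$/ && $4 == "IN" && ($5 == "A" || $5 == "PTR" || $5 == "CNAME") {
    print "invtool", $5, "delete", "--pk", $1
  }'
}
```

Fed the example search output above, it prints the same three delete commands shown in the example, in the same order.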

Customized loaner requests

If you need to loan a slave with customized AMIs or puppet manifests, which requires the initial puppetization to be made against your user puppet environment (as in bug 1236925), follow this recipe:

 PUPPET_EXTRA_OPTIONS="--environment <your env name>"
  • Follow the regular loaning procedure