Revision as of 20:42, 22 January 2013

If the machine is being re-purposed, more steps are needed than those listed here. See How to create new slaves or move them to other pools.

If your machine has simply been re-imaged, follow the instructions in the appropriate section.

Linux/Mac

hostname verification

Linux

Verify the hostname, checking that it ends in 'build.(datacenter).mozilla.com':

hostname --fqdn

To fix it:

"su -" to become root.
edit the file /etc/sysconfig/network
- changing the hostname to the host's *long* (with datacenter) fully qualified domain name.
reboot before running puppet
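
The suffix check described above can be scripted. The following is a minimal sketch, not an official tool: the function name is_build_fqdn is hypothetical, and it assumes the datacenter part is a single dot-free label (for example scl1).

```shell
# Return success if the given name ends in build.<datacenter>.mozilla.com.
# Assumption: <datacenter> is exactly one non-empty label without dots.
is_build_fqdn() {
    printf '%s\n' "$1" | grep -Eq '\.build\.[^.]+\.mozilla\.com$'
}
```

On a Linux slave it could be used as: is_build_fqdn "$(hostname --fqdn)" || echo "hostname needs fixing"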

Mac

Verify the hostname, checking that it ends in 'build.(datacenter).mozilla.com':

hostname

To fix it (su - to become root):

run the following: scutil --set HostName XXX

armenzg: TODO: Does anyone know why this step is needed? In my experience, even though the hostname looks like talos-r3-leopard-ref (XX), 1) Web Sharing, 2) Remote Login and 3) Remote Management all seem to have the correct HostName after running scutil and rebooting.

open System Preferences -> Sharing and change host name there

From Armen's experience this does not require intervention:

Note: the cltbld user is listed for auto-login in the System Preferences -> Accounts -> Login Options dialog

Aki couldn't get CotVNC to work:

cmd-K vnc://... on mac finder

puppet

Note: For PuppetAgain slaves (e.g. HP Slaves) you should not need to do anything special for it to puppetize after a reimage. Just make sure that ~root/puppetize.log is from some point after it was imaged and that the last lines in it do not show errors.
See PuppetAgain Process Docs for the gritty details on why this is true.

Note that the initial setup of puppet on slaves is very different from that on buildbot masters. On slaves the daemon is not run; instead, updates are polled for at times when doing so won't impact jobs. Do not enable the standard puppet service daemon on slaves.

To find the correct master to use with your slave(s), consult the puppet server list. If your slave isn't using a PuppetAgain master, you'll have to adjust /etc/sysconfig/puppet manually to reflect the correct master value for PUPPET_SERVER. (Search for your slave's hostname in the http://hg.mozilla.org/build/puppet-manifests *production.pp files at the root of the repo)
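
The search through the *production.pp files can be done with grep against a local checkout of the puppet-manifests repository. This is only a sketch: the function name find_slave_manifests and the idea of passing the checkout path as an argument are assumptions, not part of the documented process.

```shell
# List the *production.pp files at the root of a puppet-manifests checkout
# that mention the given slave hostname.
# $1: slave hostname (short or fully qualified)
# $2: path to a local checkout of the repository (hypothetical location)
find_slave_manifests() {
    slave=$1
    repo=$2
    # -l prints only the names of matching files;
    # -F treats the hostname as a fixed string, so dots are not wildcards.
    grep -lF -- "$slave" "$repo"/*production.pp 2>/dev/null
}
```

The name of each file printed should indicate which master's configuration contains the slave.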

Darwin Note: remember to kill all instances of the run-puppet-and-buildbot.sh script. It will be running with the refimage config, and until you kill it, it will keep overwriting your attempts to fix the puppet certs.

talos-r3-fed example (to help you through it):

 uname -a # check that the hostname is correct and is the FQDN
 su - # switch to root
 # on linux slave:
 rm -rf /var/lib/puppet/ssl/certs/*
 # on mac slave:
 rm -rf /etc/puppet/ssl/certs/*

 # on master
 # you have to figure out the master depending on the datacenter the slave belongs to
 puppetca --clean talos-r3-fed64-007.build.scl1.mozilla.com

 # on slave
 puppetd --test --server scl-production-puppet.build.scl1.mozilla.com

 # on puppet master
 puppetca --sign talos-r3-fed64-007.build.scl1.mozilla.com

 # on slave
 puppetd --test --server scl-production-puppet.build.scl1.mozilla.com
 # wait a few seconds and it should reboot
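
Because the example above alternates between the slave and the master, it can help to print the steps for a particular host before starting. The following sketch only echoes the commands and does not run anything; the function name puppet_recert_plan and its arguments are hypothetical, and the certificate paths are the two given in the example.

```shell
# Print (do not run) the certificate-reset steps for one slave.
# $1: slave FQDN
# $2: puppet master FQDN for the slave's datacenter
# $3: "linux" or "mac" (selects the certificate directory)
puppet_recert_plan() {
    slave=$1
    master=$2
    case "$3" in
        linux) certs=/var/lib/puppet/ssl/certs ;;
        mac)   certs=/etc/puppet/ssl/certs ;;
        *)     echo "unknown platform: $3" >&2; return 1 ;;
    esac
    # The glob is inside double quotes, so it is printed, not expanded.
    echo "slave:  rm -rf $certs/*"
    echo "master: puppetca --clean $slave"
    echo "slave:  puppetd --test --server $master"
    echo "master: puppetca --sign $slave"
    echo "slave:  puppetd --test --server $master"
}
```

For instance: puppet_recert_plan talos-r3-fed64-007.build.scl1.mozilla.com scl-production-puppet.build.scl1.mozilla.com linux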

Get the slave talking to puppet. This will require a lot of repetitive work:
- adjust the master it talks to so that it is appropriate to its location
  - NOTE: syncing against the correct master adjusts these values (I believe).
  - Linux builder: /etc/sysconfig/puppet
  - Linux tester: /home/cltbld/.config/autostart/gnome-terminal.desktop
  - Mac: /Library/LaunchDaemons/com.reductiv*.plist
- run puppetd --test --noop --server $server_you_chose
  - If you get an error about directory not existing on linux, run without "--noop" once, so the directories can be created.
  - If you get an error that the slave can't access /N to fetch certain packages, update the fileserver.conf on the slave to ensure that the subnet the slave resides on is included in the list of subnets that can access /N. Otherwise the slave won't be able to reach the resources it needs from the /N directory served by Apache.
  - note that the scl server has a funny name!
  - if you see errors about certificates, remove the certificate files (/var/lib/puppet/ssl/certs/*, /var/puppet/ssl/certs/*, or /etc/puppet/ssl/certs/*, depending on the slave)
  - run puppetca --sign $slave_fqdn repeatedly on the appropriate puppet master. Cron runs it every 60 seconds, but waiting for the cron task just slows you down.
  - if told to, run puppetca --clean $slave_fqdn on the master - this occurs when the master has an old key for this slave
  - Note that if you see a successful run but nothing happens, you're probably talking to a master which has no configuration for this slave - check that you're talking to the right master, and that the master's site.pp file contains the slave's name, and try again.
- once puppet hits the right master, it will both blow away the certificates (even though they were correct) and reboot. So you'll need to wait for a restart, log in, and go through the above process again. Hopefully you'll only need to do this once.
Once puppet is done eviscerating itself, have a look at the slave's twistd.log. If it shows an UnauthorizedLogin for the connection to the staging master, fix the password or add the slave to the master's config. Otherwise, watch the staging master until the slave finishes a job.
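
The twistd.log check can be reduced to a one-line grep. This is a sketch under assumptions: the function name has_unauthorized_login is hypothetical, and the log path is passed in explicitly because its location differs between slave types.

```shell
# Return success if the given twistd.log mentions UnauthorizedLogin.
# $1: path to the slave's twistd.log (pass the real location for your slave)
has_unauthorized_login() {
    grep -q -- 'UnauthorizedLogin' "$1"
}
```

A non-zero exit status means either that the string was not found or that the file could not be read, so confirm the path before concluding that the login is fine.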

How to fix the hostname for Windows

Rather than replicating the information in each section, here are the instructions for all of our Windows platforms.

right-click on 'My Computer', go to 'Properties', 'Computer Name'
change the hostname
the domain name should be build.mozilla.org
- Otherwise, click 'Change', type the computer name, click 'More', type the domain, and click OK until it restarts.

Windows slaves will come back from a re-image with "talos-r3-xp-ref" or "talos-r3-w7-ref" as the hostname.
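
When going through a list of freshly imaged machines from another host, a small helper can flag names that still carry a reference-image hostname. This is an illustrative sketch only: the function name is_ref_hostname is hypothetical, and the list holds just the reference names mentioned on this page, so it is likely incomplete.

```shell
# Return success if the name is one of the reference-image hostnames
# mentioned on this page (illustrative, probably incomplete list).
is_ref_hostname() {
    case "$1" in
        talos-r3-xp-ref|talos-r3-w7-ref|talos-r3-leopard-ref) return 0 ;;
        *) return 1 ;;
    esac
}
```

Any machine for which this succeeds still needs its hostname fixed as described above.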

Windows 2008 64-bit (MDT & unmanaged)

These machines are set up almost entirely by Group Policy; only the following is required after re-imaging:

fix the hostname
follow the post reimaging steps.

Windows 2003 (soon to be obsoleted)

Activation

Nothing needs to be done except keeping track of it. Windows 2003 comes pre-activated. You can check with:

oobe/msoobe /a

Hostname

Change the hostname by following the steps in How to fix the hostname for Windows.

OPSI

No action required

The entry for your slave on OPSI has been created from a template which has a package called "passwordupdate". That package is set to run "always", which ensures that even if the snapshot has an older password, it is updated immediately to the current one.

Windows XP (OPSI partially)

tasklist

Make sure that you can run the command tasklist. If you can't, ask IT to re-image again. This issue is documented in their imaging instructions: https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=28575847

Hostname

Change the hostname by following the steps in How to fix the hostname for Windows.

If you don't add the DNS change for Windows slaves using OPSI, you will most likely get a Mit_Netzlaufwerken_verbinden error before the machine logs in.

OPSI

No action required

The entry for your slave on OPSI has been created from a template which has a package called "passwordupdate". That package is set to run "always", which ensures that even if the snapshot has an older password, it is updated immediately to the current one.

Windows 7 (unmanaged)

The test reference platform is fairly complete.

Activation

Win7 will need to be activated. IT should have done this, but check by going to Control Panel -> System -> Activate Windows - a failure to activate will burn builds later.

If it is not activated, ask IT to do so.

Hostname

Change the hostname by following the steps in How to fix the hostname for Windows.

Slavealloc notes and settings keys

Install the correct set of secrets on the machine. These include:
- Update the ssh keys to the correct values for the destination pool
- if you're troubleshooting a recently returned slave, you may want to also reverse engineer How To/Clean A Slave For Shipment Externally
Change the slave's fields, e.g. for production (non-try):
- Trust: core
- Environ: prod
- Pool: build-scl1 (or whatever is appropriate)
reboot it.

ReleaseEngineering/How To/Set Up a Freshly Imaged Slave: Difference between revisions

