ReleaseEngineering/How To/Set Up a Freshly Imaged Slave: Difference between revisions

Line 1: Line 1:
{{Release Engineering How To|Set Up a Freshly Imaged Slave}}
Generally, we bring up a fresh slave in preproduction first, and watch it do a job or two before moving it to production.  This is one of the few cases where it's OK to have an otherwise-production slave running against a preproduction or personal master.


Consult [[ReferencePlatforms]] for information about this particular slave platform, and update any important steps that are omitted here.
If the machine has been re-purposed, more steps are needed than those listed here. Check [[ReleaseEngineering/How_To/Create_new_slaves_or_move_them_to_other_pools|How to create new slaves or move them to other pools]].
 
If your machine has simply been re-imaged, follow the instructions in the appropriate section below.


= Linux/Mac =
Line 90: Line 91:
* once puppet is done eviscerating itself, have a look at the slave's twistd.log.  If it's getting an UnauthorizedLogin for connection to the staging master, fix the password or add the slave to the master's config.  Otherwise, watch the staging master until the slave finishes a job.
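
The log check in the last step can be sketched as a small script. This is only an illustration, not part of the official procedure: the function name <tt>find_unauthorized_logins</tt> is hypothetical, and the default log path is an assumption (pass the real location of the slave's twistd.log as the first argument).

```python
import sys
from pathlib import Path
from typing import List


def find_unauthorized_logins(log_text: str) -> List[str]:
    """Return every log line that mentions UnauthorizedLogin."""
    return [line for line in log_text.splitlines() if "UnauthorizedLogin" in line]


if __name__ == "__main__":
    # Assumption: the log sits in the current directory unless a path is given.
    log_path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("twistd.log")
    if not log_path.exists():
        print("log not found: %s" % log_path)
    else:
        hits = find_unauthorized_logins(log_path.read_text(errors="replace"))
        if hits:
            print("UnauthorizedLogin seen; fix the password or add the slave to the master's config:")
            for line in hits:
                print("  " + line)
        else:
            print("No UnauthorizedLogin found; watch the staging master until the slave finishes a job.")
```

A simple substring match is used on purpose, so the sketch makes no assumption about the exact layout of the log lines.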


= Windows 2003 & XP (OPSI) =
= How to fix the hostname for Windows =
== Fix hostname ==
Rather than repeating the information in each section, here are the hostname instructions for all of our Windows platforms.
Generally, Windows slaves will come back from a re-image with "talos-r3-xp-ref" as the hostname.
 
* right-click on 'My Computer', go to 'Properties', 'Computer Name'
* change the hostname
* check that the domain is ''build.mozilla.org''
* the domain name should be ''build.mozilla.org''
** Otherwise, click 'Change', type the computer name, click 'More', type the domain, and click OK until it restarts.
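
After the restart, the result can be sanity-checked with a short script. This is a sketch under stated assumptions: it presumes Python is available on the slave, the helper name <tt>hostname_problems</tt> is hypothetical, and the reference hostnames and domain are the ones named on this page.

```python
import socket

# Hostnames a freshly re-imaged Windows slave comes back with (from this page).
REFERENCE_HOSTNAMES = {"talos-r3-xp-ref", "talos-r3-w7-ref"}
# Domain / primary DNS suffix the slave is expected to use (from this page).
EXPECTED_DOMAIN = "build.mozilla.org"


def hostname_problems(hostname, fqdn):
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = []
    short_name = hostname.split(".")[0].lower()
    if short_name in REFERENCE_HOSTNAMES:
        problems.append("hostname is still the reference image name: %s" % short_name)
    if not fqdn.lower().endswith("." + EXPECTED_DOMAIN):
        problems.append("domain is not %s: %s" % (EXPECTED_DOMAIN, fqdn))
    return problems


if __name__ == "__main__":
    found = hostname_problems(socket.gethostname(), socket.getfqdn())
    for problem in found:
        print(problem)
    if not found:
        print("hostname and domain look fine")
```

The check takes the hostname and fully qualified name as arguments, so it can also be run against names copied from another machine.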


If you don't add the DNS change for Windows slaves using OPSI you will most likely get a [[ReleaseEngineering/OPSI#Mit_Netzlaufwerken_verbinden.2C_bitte_noch_etwas_warten|Mit_Netzlaufwerken_verbinden]] error before the machine logs in.
Windows slaves will come back from a re-image with "talos-r3-xp-ref" or "talos-r3-w7-ref" as the hostname.
 
= Windows 2008 64-bit (MDT & unmanaged)=
These machines are set up almost entirely by Group Policy; only the following steps are required after re-imaging:
* fix the hostname
* follow the [[ReferencePlatforms/Win64#Post-reimaging_steps|post reimaging]] steps.


== New machine or moving between production/staging ==
= Windows 2003 (soon to be obsoleted) =
The machine is new or moving between production and staging (so the OPSI server does not know about it). If this is the case, you'll need to follow [[ReleaseEngineering/How_To/Move_a_Slave_Between_Production_and_Staging#OPSI|these instructions]].
== Activation ==
Nothing to be done but keeping track of it.
Windows 2003 already comes pre-activated. You can check with:
  oobe/msoobe /a


OPSI is currently merely used to:
== Hostname ==
* deploy a new package
* change the hostname by following the steps on [[ReleaseEngineering/Set_Up_a_Freshly_Imaged_Slave#How_to_fix_the_hostname_for_Windows|How to fix the hostname for Windows]].
* always run the "cleanup" and "passwordupdate" packages
<strike>
== OPSI (talos-r3-xp & w32-ix slaves) ==
Once you delete a slave's entry from OPSI, a script regenerates the entry for that slave from a template. The template records which packages are already installed on the ref machine and which ones still need to be set up.


=== Reimaged ===
== OPSI ==
The machine has been re-imaged (so the OPSI server already has a key for that host). If this is the case, remove the previous HostKey on the OPSI master and reboot the machine so that it registers again with the OPSI master (this is mentioned at the end of [[ReferencePlatforms/Win32#Add_slave.28s.29_to_configuration_files|this]] section).
* No action required
To do so:
The entry for your slave on OPSI has been created from a template that includes a package called "passwordupdate". That package is set to run "always", so even if the snapshot has an older password, it is updated immediately to the current one.


* start -> run -> eventvwr
= Windows XP (OPSI partially)=
** option -> action menu -> clear all events
== Hostname ==
* find out the OPSI hostname of the slave by looking in ''/etc/opsi/pckeys'' (production-opsi)
* change the hostname by following the steps on [[ReleaseEngineering/Set_Up_a_Freshly_Imaged_Slave#How_to_fix_the_hostname_for_Windows]].
* then delete the old key with  <tt>opsi-admin -d method deleteClient $hostname</tt>.  You can also delete it in the OPSI.jar console through a right click.
* reboot the machine
</strike>


<strike>
If you don't add the DNS change for Windows slaves using OPSI you will most likely get a [[ReleaseEngineering/OPSI#Mit_Netzlaufwerken_verbinden.2C_bitte_noch_etwas_warten|Mit_Netzlaufwerken_verbinden]] error before the machine logs in.
=== Packages already have state ===
When a new slave appears on OPSI it will take the state of its reference machine (e.g. win32-ix-ref and talos-r3-xp-ref). This means that the slave should be ready.


Just as an example, don't take it as a reference.
== OPSI ==
{|
* No action required
| w32-ix-slaveNN
The entry for your slave on OPSI has been created from a template that includes a package called "passwordupdate". That package is set to run "always", so even if the snapshot has an older password, it is updated immediately to the current one.
| talos-r3-xp-NNN
|-
| [[File:Releng-opsi-w32.jpg|200px]]
|
|}
</strike>


= Windows 7 (unmanaged) =
Line 141: Line 135:
Win7 will need to be activated.  IT should have done this, but check by going to Control Panel -> System -> Activate Windows - a failure to activate will burn builds later.


Windows 2003 already comes pre-activated. You can check with:
If it is not activated, ask IT to do so.
oobe/msoobe /a
 
== Hostname and DNS suffix ==
 
* fixing the hostname and the DNS suffix is good enough:
** right-click on the 'Computer' option on the right, and choose properties
** click on the "Change settings" which is to the right under the Windows logo
** Click "Change" in the next screen.  Set the appropriate computer name, and click "More"
** Set the primary DNS suffix to "build.mozilla.org".  Click OK several times until Windows restarts.
 
== Old - delete ==
<strike>
Note that UltraVNC often fails in such a way that it will only repaint under your cursor.  Often a VNC login will come to a blank screen, or will paint the screen but not let you interact.  You can use RDP instead of VNC, but be sure to restart before starting buildbot, as tests will fail if an RDP login has been made since the last reboot.
 
* The OS will have detected a new network, and will want to know what sort it is.  Use "Public".  You may not need to do this - Windows is fickle.
* Set the hostname and domain - open the start menu, right-click on the 'Computer' option on the right, and choose properties.  Mentally compare the Windows Experience Index with your own experiences, then scroll down to "Computer name, domain ..", and click "Change settings".  Click "Change" in the next screen.  Set the appropriate computer name, and click "More" (is your experience index increasing?).  Set the primary DNS suffix to "build.mozilla.org".  Click OK and whatnot until Windows restarts.
* Change the VNC password - find UltraVNC in the systray (click up arrow, it's the yellow box with the eye), and go to "Administrative Properties".  Set both the VNC password and the read-only password, noting that they are not confirmed against one another - type carefully.  You'll need to click through a UAC prompt when you click OK.
* Change the cltbld password.  Doing this via VNC does not appear to work, so login via RDP and reset it via the control panel.
* Change the autologin password - see [[ReleaseEngineering/How To/Change the Autologin on Windows]]
* Open up gVim and edit buildbot.tac to point it to the right (staging) master.
* Make sure that Windows is activated (see {{bug|630108}})
* reboot and see if the slave starts.  If it doesn't, debug it.</strike>
 
= Windows 2008 64-bit (MDT & manual setup)=
These machines are set up almost entirely by Group Policy; only the following steps are required after re-imaging:
* fix the hostname
* follow the [[ReferencePlatforms/Win64#Post-reimaging_steps|post imaging]] steps.
 
= Bake in preproduction =
Review this section.
<strike>
The slave should now be attached to the preproduction master.
 
Once talos slaves are basically in one piece, send them right to production.
 
Build slaves should be run in the preproduction environment to make sure there are no issues.  Before letting the machines bake for a long time, you should ensure that the slave is connected to the preproduction master.  Note that preproduction uses a special set of preproduction keys.  It should also be able to ssh into staging-stage:
ssh -i ~/.ssh/ffxbld_dsa ffxbld@staging-stage.build.mozilla.org
 
Once this is done, wait for a number of different types of jobs to cycle as green.  If the slave is in good working order, it needs to be moved into production.  For that, see [[ReleaseEngineering/How To/Move a Slave Between Production and Staging]].
</strike>
 
== Put slave in preproduction ==


'''NOTE:''' As you move the host between the production and staging environments, you'll need to reset its ssh keys. See [[ReleaseEngineering/How_To/Adjust_SSH_keys_on_a_slave|details]]. Best practice is to completely replace the <tt>~cltbld/.ssh</tt> directory with the contents from a known good host of the same type in the same datacenter.
== Hostname ==
 
* change the hostname by following the steps on [[ReleaseEngineering/Set_Up_a_Freshly_Imaged_Slave#How_to_fix_the_hostname_for_Windows]].
TODO: These instructions say to put the slave in preproduction, but unfortunately that only works well when the slave and the master are in the same location. Most slaves are in scl1 while preproduction-master is in sjc1.
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
 
Change the slave's fields to this:
<pre>
Trust: dev
Environ: dev/pp
Pool: Preprod (or preprod-tests)
</pre>
and reboot it.


== Put slave in production ==
= Slavealloc notes and settings keys =
'''NOTE:''' if you're deploying a try slave, ensure there are no "<tt>~/.ssh/ffx*</tt>" files on the box before continuing.
# [[ReleaseEngineering/How_To/Adjust_SSH_keys_on_a_slave|Update the ssh keys]] to the correct values for the destination pool
# Change the slave's fields, eg production (non-try):