Buildbot/IT Unittest Support Document

Revision as of 14:06, 21 January 2008

Machines

Production

Name                   Platform             CVS Branch for Config  Tree         Support Tier*
qm-rhel02 (master)     Linux                -                      Firefox      1
qm-centos5-01          Linux                Trunk                  Firefox      1
qm-xserve01            Mac OS X 10.4        Trunk                  Firefox      1
qm-winxp01             Windows XP           Trunk                  Firefox      1
qm-win2k3-01 (master)  Windows Server 2003  Trunk                  Firefox      1
qm-leak-centos5-01     Linux                Trunk                  MozillaTest  X
qm-leak-tiger-01       Mac OS X 10.4        -                      MozillaTest  X
qm-leak-winxp01        Windows XP           -                      MozillaTest  X
qm-leak-w23k-01        Windows Server 2003  -                      MozillaTest  X

* tiers explained

Staging

master:

  • qm-unittest02

slaves:

  • qm-stage-centos5-01
  • qm-stage-osx-01
  • qm-stage-winxp01
  • qm-stage-win2k3-01

list of steps to try

If #1 doesn't work, try #2. If that doesn't work, try #3. If #3 doesn't work, contact bhearsum or robcee.

1. first check

  • check waterfall at: http://qm-rhel02.mozilla.org:2005/ (mpt-vpn)
  • see if slave is connected.
  • if so, click the machine name link, try a "Force Build"
    • fill out the name and reason fields, click the button
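
The first check can also be scripted. What follows is a minimal Python sketch, not part of the existing tooling: it fetches the waterfall page and reports whether a slave name appears on it. The function names, the `opener` parameter, and the 10-second timeout are assumptions made for illustration. A name appearing on the page does not by itself prove that the slave is connected, so the waterfall should still be checked by eye.

```python
import urllib.error
import urllib.request

# Waterfall URL as given in this document (reachable only over mpt-vpn).
WATERFALL_URL = "http://qm-rhel02.mozilla.org:2005/"


def fetch_waterfall(url=WATERFALL_URL, timeout=10, opener=urllib.request.urlopen):
    """Return the waterfall HTML as text, or None if the master is unreachable.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        return None


def slave_is_listed(html, slave_name):
    """Rough check only: True if the slave name occurs anywhere in the page."""
    return html is not None and slave_name in html
```

If `fetch_waterfall()` returns None, the master itself is probably unreachable (or the VPN is down), which points at step 4 rather than at a single slave.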

2. second check - restart slave

  • log in to the machine using the provided credentials

2a. Windows Remote Desktop (qm-winxp01, qm-win2k3-01):

  • press Ctrl-C in the command window and answer Yes to terminate the buildbot process
  • check the Task Manager for stuck sh.exe and make.exe processes
    • if there are any, kill them if you can, then reboot the machine by whatever means necessary (stuck sh.exe and make.exe processes can make shutting down tricky)
    • when machine is rebooted, log back in and open a command prompt
    • restart buildbot:
      • cd c:\
      • buildbot start slave (command does not return)
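
The Task Manager check for stuck processes can be approximated from a command prompt. The sketch below is illustrative only: it assumes Python is available on the slave and that the process list comes from `tasklist /FO CSV /NH`, whose rows start with the image name and the PID. The function name and the `STUCK_CANDIDATES` set are hypothetical.

```python
import csv
import io

# Process names this document says can get stuck on the Windows slaves.
STUCK_CANDIDATES = {"sh.exe", "make.exe"}


def find_stuck_processes(tasklist_csv):
    """Parse `tasklist /FO CSV /NH` output into a list of (image name, pid).

    Only processes named in STUCK_CANDIDATES are returned; the comparison
    ignores case because Windows image names are case-insensitive.
    """
    found = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) >= 2 and row[0].lower() in STUCK_CANDIDATES:
            found.append((row[0], int(row[1])))
    return found
```

In practice the input text could be captured with `subprocess.run(["tasklist", "/FO", "CSV", "/NH"], capture_output=True, text=True).stdout`. An empty result suggests buildbot can be restarted without a reboot.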

2b. Mac OS X VNC (qm-xserve01):

  • in the Terminal window, type "buildbot stop slave"
    • note: message about not returning is expected on Mac
  • reboot the machine only if it appears to be responding strangely (as far as we know, it has been rebooted only once or twice since it was set up)
  • in a Terminal window,
    • cd /builds/
    • buildbot start slave

2c. Linux VNC (qm-centos5-01):

  • in the Terminal, type "buildbot stop slave"
  • reboot if necessary (this has not been needed so far)
  • restart buildbot:
    • cd /build/slave (should already be there)
    • DISPLAY=:2 buildbot start slave

2d. Verify that the slave is connected

  • check the waterfall at: http://qm-rhel02.mozilla.org:2005/
  • the slave sometimes takes a couple of minutes to reconnect
  • once it has reconnected, and if a build is needed, click the machine name link and force a build as above (fill in the name and reason fields, click the button)
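
Because a slave can take a couple of minutes to reconnect, any scripted verification needs to poll rather than check once. The helper below is a generic sketch of that idea; `wait_for`, its default timeout of 300 seconds, and its 15-second interval are assumptions, not values taken from the buildbot setup.

```python
import time


def wait_for(check, timeout=300.0, interval=15.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a true value or the timeout expires.

    Returns True as soon as check() succeeds, False once the deadline has
    passed. `sleep` and `clock` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `check` argument would be whatever test of the waterfall is trusted, for example a function that reloads the page and looks for the slave. If `wait_for` returns False, move on to the next step in the list.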

3. clobbering manually

  • Sometimes a machine will need to be "clobbered" (have its objdir directory removed; on Windows, the whole trunk* build directory is removed instead)

3a. Windows Remote Desktop (qm-win2k3-01, qm-winxp01)

  • log in to the machine
  • stop the slave (as above, Ctrl-C in the command window)
  • check the Task Manager to make sure there are no errant sh.exe or make.exe processes.
    • If there are, kill them (End process on sh.exe) or reboot the machine
  • from the Command Line:
    • cd C:\slave
    • rmdir /s /q trunk*
  • restart the slave
    • cd \
    • buildbot start slave (command does not return)

3b. Linux VNC (qm-centos5-01)

  • log in to the machine
  • stop the slave (as above, buildbot stop slave)
  • remove the objdir from the Terminal:
    • cd /build/slave/trunk_linux/mozilla
    • rm -rf objdir
  • restart the slave
    • cd /build
    • DISPLAY=:2 buildbot start slave (as in 2c)

3c. Mac OS X VNC (qm-xserve01)

  • log in to the machine
  • stop the slave (as above, buildbot stop slave)
  • remove the objdir from the Terminal:
    • cd /builds/slave/trunk_osx/mozilla
    • rm -rf objdir
  • restart the slave
    • cd /builds
    • buildbot start slave
      • message that buildbot didn't return is expected
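
The clobber itself is the same operation on all three platforms: delete one or more directories under a known base path while the slave is stopped. A minimal cross-platform sketch of that step follows; the `clobber` function is hypothetical and is only meant to mirror the `rmdir /s /q` and `rm -rf` commands above.

```python
import glob
import os
import shutil


def clobber(base_dir, pattern):
    """Delete every real directory under base_dir whose name matches pattern.

    Plain files and symlinks are left alone. Returns the sorted list of
    paths that were removed, which is empty if nothing matched.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(base_dir, pattern))):
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Paths from this document; left commented out because they delete data.
# clobber(r"C:\slave", "trunk*")                             # Windows
# clobber("/build/slave/trunk_linux/mozilla", "objdir")      # Linux
# clobber("/builds/slave/trunk_osx/mozilla", "objdir")       # Mac OS X
```

As with the manual steps, this should only be run after the slave has been stopped and any stuck sh.exe or make.exe processes have been dealt with; otherwise files in the tree may still be in use.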

4. Restarting the Farm

  • In the worst case, the entire buildbot farm needs to be restarted
  • shut down each slave as per the instructions above
    • Ctrl-C in Command window on Windows
    • buildbot stop slave in Terminal on Mac and Linux
  • shut down the master on qm-rhel02
    • cd /build
    • buildbot stop master
  • reboot qm-rhel02 and slave machines if necessary (stuck processes, strange behavior)
  • restart master on qm-rhel02
    • cd /build
    • buildbot start master
  • restart slaves as above
    • qm-centos5-01, qm-xserve01, qm-winxp01, qm-win2k3-01
  • verify waterfall at http://qm-rhel02.mozilla.org:2005/ is visible and slaves are connected
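
The ordering matters in a farm restart: slaves go down before the master, and the master comes back before the slaves. The sketch below only encodes that order as data, using the host names from this document; it does not run any commands, and `farm_restart_plan` is a hypothetical helper.

```python
# Production hosts as listed in this document.
MASTER = "qm-rhel02"
SLAVES = ["qm-centos5-01", "qm-xserve01", "qm-winxp01", "qm-win2k3-01"]


def farm_restart_plan(slaves=SLAVES, master=MASTER):
    """Return the restart sequence as a list of (action, host) tuples.

    Order: stop every slave, stop the master, start the master,
    then start every slave again.
    """
    plan = [("stop slave", host) for host in slaves]
    plan.append(("stop master", master))
    plan.append(("start master", master))
    plan.extend(("start slave", host) for host in slaves)
    return plan
```

Printing the plan gives a checklist to work through by hand; any reboots of qm-rhel02 or the slaves would fit between the "stop master" and "start master" entries.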