Buildbot/IT Talos Support Document: Difference between revisions

Revision as of 19:53, 26 October 2007

Machines

master:

  • qm-rhel02

slaves:

  • qm-pxp01-05
  • qm-mini*

see also: Buildbot/Talos/Machines

list of steps to try

1. first check

  • check waterfall at: http://qm-rhel02.mozilla.org:2006/ (mpt-vpn)
  • see if slave is connected
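
The first check can be partly scripted. The sketch below is only an illustration, not part of the existing Talos tooling: it fetches the waterfall page at http://qm-rhel02.mozilla.org:2006/ (reachable only over the mpt-vpn) and looks for a slave's name in the page text. The function names are hypothetical, and a name appearing on the page is only a rough hint that the slave is known to the master, so a look at the waterfall itself is still needed.

```python
import urllib.error
import urllib.request
from typing import Optional

# Waterfall URL as given on this page; assumed reachable only over the mpt-vpn.
WATERFALL_URL = "http://qm-rhel02.mozilla.org:2006/"


def fetch_waterfall(url: str = WATERFALL_URL, timeout: float = 10.0) -> Optional[str]:
    """Return the waterfall page as text, or None if the master cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        return None


def slave_listed(waterfall_html: str, slave_name: str) -> bool:
    """Return True if the slave name appears anywhere in the page text.

    Assumption: this is a plain substring match, not a parse of the
    waterfall layout, so it cannot tell a connected slave from a stale entry.
    """
    return slave_name.lower() in waterfall_html.lower()
```

If `fetch_waterfall()` returns None, the master itself is unreachable and the steps for restarting the master apply rather than the slave restart.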

2. second check - restart slave

  • login to machine using provided credentials
    • VNC (qm-pxp01-05, qm-mini*):
  • close running instances of Firefox and any dialog windows (make sure to check the taskbar)
  • on Windows
    • Ctrl-C in the command window, answer Yes to terminate the buildbot process
    • cd c:\
    • on qm-pxp*, 'buildbot start slave' (command does not return)
    • on qm-mini*, 'buildbot start talos-slave' (command does not return)
  • on linux/mac
    • login via ssh
    • 'buildbot stop talos-slave' (ignore 'never saw slave...' message on mac)
    • 'buildbot start talos-slave' (ignore 'never saw slave...' message on mac)
  • verify slave reappears on buildbot waterfall page
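
The slave directory name differs between machine groups, which is easy to get wrong when restarting by hand. As a minimal sketch (the helper name is hypothetical, and the hostname prefixes are taken from the machine list on this page), the choice of start command could be captured like this:

```python
def start_command(hostname: str) -> str:
    """Return the buildbot start command listed on this page for a slave.

    Assumption: machines are identified purely by hostname prefix
    (qm-pxp* and qm-mini*); any other name is rejected rather than guessed.
    """
    name = hostname.lower()
    if name.startswith("qm-pxp"):
        return "buildbot start slave"
    if name.startswith("qm-mini"):
        return "buildbot start talos-slave"
    raise ValueError("no start command documented for host: " + hostname)
```

For example, `start_command("qm-pxp03")` gives `buildbot start slave`. On Windows the command does not return, so it has to be left running in its command window.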

Note: builds are triggered by finished builds on the Tinderbox (Firefox for trunk, Mozilla1.8 for branch). Depending on when the master was started, it may take up to 10 minutes to recognize a change. If the master has been restarted, the first completed Tinderbox builds are often missed, so it can sometimes take upwards of 30-40 minutes to verify that systems are working as expected.
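
Because of these delays, a restart should not be judged a failure too early. The following sketch shows one way to wait with a time budget: 10 minutes normally, 40 minutes after a master restart, both figures taken from the note above. The function names, the 60-second polling interval and the injectable `clock` and `sleep` parameters are assumptions made for illustration; `check` stands for whatever test is used to decide that a new Talos run has appeared.

```python
import time
from typing import Callable

NORMAL_WAIT_MINUTES = 10           # "up to 10 minutes to recognize a change"
AFTER_MASTER_RESTART_MINUTES = 40  # upper end of "30-40 minutes"


def wait_budget_minutes(master_restarted: bool) -> int:
    """Return how long to wait before treating the farm as broken."""
    return AFTER_MASTER_RESTART_MINUTES if master_restarted else NORMAL_WAIT_MINUTES


def wait_until(check: Callable[[], bool],
               budget_seconds: float,
               poll_seconds: float = 60,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or the time budget is used up."""
    deadline = clock() + budget_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

A call such as `wait_until(new_run_seen, wait_budget_minutes(True) * 60)` would then poll once a minute for up to 40 minutes, where `new_run_seen` is a hypothetical callable supplied by the operator.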

3. Restarting the Master

  • In the worst case, the entire buildbot farm needs to be restarted
  • shutdown each slave as per the instructions above
    • Ctrl-C in Command window on Windows
    • buildbot stop slave in Terminal on Mac and Linux
  • shutdown master on qm-rhel02
    • cd /build
    • buildbot stop perfmaster
  • reboot qm-rhel02 and slave machines if necessary (stuck processes, strange behavior)
  • restart master on qm-rhel02
    • cd /build
    • buildbot start perfmaster
  • restart slaves as above
  • verify waterfall at http://qm-rhel02.mozilla.org:2006/ is visible and slaves are connected
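
The order of operations matters here: slaves go down before the master and come back up after it. As a rough sketch, the procedure above can be written out as an ordered checklist. The function name and the `reboot` flag are hypothetical; the commands and the host name are the ones listed in this section. The code only builds the list of steps and does not run anything on the machines.

```python
from typing import List


def farm_restart_plan(master: str = "qm-rhel02", reboot: bool = False) -> List[str]:
    """Return the full-farm restart steps from this section, in order.

    Assumption: rebooting is optional and, when needed, happens while
    both the master and the slaves are stopped.
    """
    plan = [
        "stop each slave (Ctrl-C on Windows; 'buildbot stop slave' on Mac and Linux)",
        "on %s: cd /build && buildbot stop perfmaster" % master,
    ]
    if reboot:
        plan.append("reboot %s and slave machines" % master)
    plan.extend([
        "on %s: cd /build && buildbot start perfmaster" % master,
        "restart slaves as described in the second check",
        "verify waterfall at http://%s.mozilla.org:2006/ and that slaves are connected" % master,
    ])
    return plan
```

Printing the result of `farm_restart_plan(reboot=True)` line by line gives a checklist that can be ticked off while working through the restart.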