Build:Linux:Moz2:ITSupport


Steps to Try

Intermittent Burning (not related to checkins)

This is often caused by a problem on a single slave, such as an unaccepted host key or a full disk.

  • If the problem is obvious (e.g., hanging on a host key):
    1. Log in to the failing slave as 'cltbld'.
    2. Accept the host key, free some disk space (1), or otherwise fix the problem.
  • If you don't know what the problem is:
    1. Log in to the slave as 'cltbld'.
    2. Stop the slave with 'buildbot stop /builds/moz2_slave'.
    3. Notify someone in RelEng to investigate further.
    4. Note: Unless 3 or more slaves from a platform are failing, the tree does not need to be closed.
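
When it is unclear whether a slave is out of disk space, a quick check can help decide which of the two paths above applies. The following is a minimal sketch, not part of the official procedure: the function name disk_use_percent is hypothetical, and it assumes the POSIX output format of 'df -P', where the fifth column is the capacity.

```shell
# Print the percentage of disk space used on the filesystem holding DIR.
# DIR defaults to /builds/moz2_slave, the slave directory named on this page.
# Hypothetical helper; relies on 'df -P' printing capacity in column 5.
disk_use_percent() {
    dup_dir="${1:-/builds/moz2_slave}"
    # Line 1 of 'df -P' is the header, line 2 is the filesystem entry.
    df -P "$dup_dir" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}
```

For example, `[ "$(disk_use_percent)" -ge 90 ] && echo "disk nearly full"` flags a slave whose disk is at least 90% full. The 90% threshold is an arbitrary illustration, not a RelEng policy.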

(1) Slaves sometimes run out of disk space because failed nightly builds did not clean up after themselves. Here's a helper script to remove the leftover build directories:

for i in $(find /builds/moz2_slave -maxdepth 1 -iname "*-nightly" -type d); do find "$i" -maxdepth 1 -type d -iname build -exec rm -rf {} \; ; done
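
Because the helper script deletes directories without asking, it can be useful to see what it would remove first. The following dry-run sketch mirrors the same search but only prints the matching directories. The function name list_nightly_build_dirs is hypothetical, and GNU find is assumed (for -mindepth, -maxdepth, and -iname).

```shell
# List, without deleting, the 'build' directory of every *-nightly builder.
# BASE defaults to /builds/moz2_slave, the slave directory named on this page.
# Hypothetical helper; assumes GNU find.
list_nightly_build_dirs() {
    lnb_base="${1:-/builds/moz2_slave}"
    # Outer find: builder directories whose names end in -nightly.
    find "$lnb_base" -mindepth 1 -maxdepth 1 -type d -iname '*-nightly' |
    while IFS= read -r lnb_dir; do
        # Inner find: the 'build' subdirectory directly inside each one.
        find "$lnb_dir" -mindepth 1 -maxdepth 1 -type d -iname build
    done
}
```

Running `list_nightly_build_dirs` with no arguments prints one path per line; if the list looks right, the helper script above can then be run to delete those directories.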

Fixing Failing Builds (clobber steps)

Log in as cltbld with SSH.

Clobber the failing slave and force a new build. (Note: forcing a new build will not necessarily cause a build on the failing slave. This is OK.)

  1. Find the hostname in the Tinderbox log (look for 'Building on: XXX', where 'XXX' is a hostname).
  2. Log in to the slave using the provided credentials (SSH for Linux).
    1. Go to the builder directory /builds/moz2_slave/$builder_name [where $builder_name matches the name of the failing build on Tinderbox].
    2. Delete the subdirectory named 'build'.
    3. Force a new build:
      1. Click on the appropriate column title on the Waterfall (e.g., 'Linux mozilla-central build').
      2. Fill out the Force Build form. 'Branch to build' MUST be specified (the branch name is part of the builder's name, e.g., mozilla-central or actionmonkey).
      3. Click 'Force Build'.
    4. If the next build doesn't go green, contact RelEng.
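
The on-slave part of the clobber (go to the builder directory and delete 'build') can be sketched as a small function. This is an illustration under stated assumptions rather than an official tool: the name clobber_builder and its argument checks are hypothetical, and forcing the new build still has to be done from the Waterfall.

```shell
# Delete the 'build' subdirectory of one builder on this slave.
# Usage: clobber_builder BUILDER_NAME [BASE_DIR]
# BASE_DIR defaults to /builds/moz2_slave, as named on this page.
# Hypothetical helper; the safety checks are additions for illustration.
clobber_builder() {
    cb_name="$1"
    cb_base="${2:-/builds/moz2_slave}"
    if [ -z "$cb_name" ]; then
        echo "usage: clobber_builder BUILDER_NAME [BASE_DIR]" >&2
        return 2
    fi
    # Refuse names that could point outside the base directory.
    case "$cb_name" in
        */*|.|..)
            echo "invalid builder name: $cb_name" >&2
            return 2
            ;;
    esac
    if [ ! -d "$cb_base/$cb_name" ]; then
        echo "no such builder directory: $cb_base/$cb_name" >&2
        return 1
    fi
    # Remove only the 'build' subdirectory; the builder directory stays.
    rm -rf -- "$cb_base/$cb_name/build"
}
```

For a failing build whose builder directory is named 'some-builder' (a placeholder), this would be run as `clobber_builder some-builder` while logged in as cltbld.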

Builds not happening at all

Make sure all slaves are connected

Note: do NOT use the 'Ping Builder' button on the Waterfall; it will break future builds.
  1. Go to the Buildslave List page.
  2. For each slave listed as 'NOT connected', connect it:
    1. Log in to the slave using the provided credentials (SSH for Linux).
    2. Start Buildbot with 'buildbot start /builds/moz2_slave'.
  3. Wait 30 seconds and check the Buildslave List page again.
  4. If slaves still aren't connected, contact RelEng.
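
Before starting Buildbot on a slave, it can help to check whether a slave process is already running there. The sketch below assumes the slave writes its process ID to a twistd.pid file in its base directory, which is the usual Buildbot behaviour but is not stated on this page. The function name slave_is_running is hypothetical.

```shell
# Succeed if a Buildbot slave process appears to be running locally.
# BASE defaults to /builds/moz2_slave, as named on this page.
# Hypothetical helper; assumes the PID is stored in BASE/twistd.pid.
slave_is_running() {
    sir_pidfile="${1:-/builds/moz2_slave}/twistd.pid"
    # No PID file means the slave was never started or was stopped cleanly.
    [ -f "$sir_pidfile" ] || return 1
    sir_pid=$(cat "$sir_pidfile")
    # 'kill -0' sends no signal; it only tests that the process exists
    # and that the current user is allowed to signal it.
    [ -n "$sir_pid" ] && kill -0 "$sir_pid" 2>/dev/null
}
```

A possible use is `slave_is_running || buildbot start /builds/moz2_slave`. This only confirms that a local process exists; whether the slave is actually connected to the master must still be checked on the Buildslave List page.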


Rebooting the buildmaster VM is almost always a bad idea, because it runs multiple Buildbot masters. Unless it is completely unreachable, please do not restart this machine without talking to someone from RelEng first.