Build:TryServer:Maintenance

From MozillaWiki

Common try server maintenance

Build exceptions on Win32

Symptoms

  • Purple boxes on Waterfall display that say "exception"
  • "SIGKILL failed to kill process" errors

Cause

  • Hanging cygwin processes

Solution

  1. Log on to try1-win32-slave
  2. Stop the buildslave
  3. Run 'ps' and kill every process it lists (ps shows only cygwin processes).
  4. Double-check in Task Manager that no instances of 'make', 'sh', 'mkdepend', or other cygwin processes remain.
  5. Restart the slave
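
The double-check in the steps above can be partly scripted. The following is a minimal sketch, not an established tool: it scans text captured from 'ps' for the process names listed in the steps. The function name leftover_pids and the assumed output layout (PID first, optionally preceded by a one-letter status flag, command path last) are assumptions, so verify them against the real 'ps' output on the slave before relying on it.

```python
import posixpath

# Names taken from the checklist above; extend as needed.
SUSPECT_NAMES = {"make", "sh", "mkdepend"}


def leftover_pids(ps_output, names=SUSPECT_NAMES):
    """Return (pid, command) pairs whose command basename is in `names`."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # Assumption: an optional one-letter status flag may precede the PID.
        if not fields[0].isdigit():
            fields = fields[1:]
        if not fields or not fields[0].isdigit():
            continue  # header line or unrecognised layout
        command = fields[-1]
        if posixpath.basename(command) in names:
            found.append((int(fields[0]), command))
    return found
```

An empty result only means that none of the named processes appear in the captured text; Task Manager remains the final check.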

Build directory clobber fails

Symptoms

  • Clobber step fails and subsequently causes the build as a whole to fail.
  • Usually you will see 'file in use' errors in the log for the clobber BuildStep.

Cause

  • This usually happens when a build starts after the previous build hit the build exception problem described above.

Solution

  1. Log on to try1-win32-slave
  2. Stop the buildslave
  3. Manually delete the entire build directory (D:\buildbot\sendchange-slave\sendchange-win32)
  4. Run 'ps' and kill every process it lists (ps shows only cygwin processes).
  5. Double-check in Task Manager that no instances of 'make', 'sh', 'mkdepend', or other cygwin processes remain.
  6. Restart the slave
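
When deleting the build directory by hand, it helps to know exactly which files refuse to go away, since those are the ones still held open. The following is a minimal sketch, assuming Python is available on the slave; the function name clobber is hypothetical and this is not the clobber BuildStep itself. It deletes bottom-up and returns every path it could not remove.

```python
import os


def clobber(build_dir):
    """Delete build_dir bottom-up; return the paths that could not be removed."""
    if not os.path.isdir(build_dir):
        return []  # nothing to delete
    failed = []
    for root, dirs, files in os.walk(build_dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            try:
                os.remove(path)
            except OSError:
                failed.append(path)  # e.g. a 'file in use' error on Windows
        for name in dirs:
            path = os.path.join(root, name)
            try:
                os.rmdir(path)
            except OSError:
                failed.append(path)  # not empty, or still locked
    try:
        os.rmdir(build_dir)
    except OSError:
        failed.append(build_dir)
    return failed
```

For example, clobber(r"D:\buildbot\sendchange-slave\sendchange-win32") returning a non-empty list suggests a process still holds those files; kill the remaining cygwin processes and run it again.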

Linux slave fails with 'no space left on device' errors

Symptoms

  • Linux build fails with a 'no space left on device' error.
  • The disk appears to have free space, but writes to it fail.

Cause

  • Unknown. A larger disk was added for buildbot; if the error still occurs, its cause has not been identified.

Solution

  1. Reboot the try1-linux-slave
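
Because the cause is unknown, it may be worth recording filesystem statistics before rebooting. The following is a minimal sketch, assuming Python is available on the slave; the function name disk_report is hypothetical. It reports free bytes and free inodes for a path. Inode exhaustion is one general reason a disk can look like it has room yet refuse writes, but nothing here confirms that it is what happened on this slave.

```python
import os


def disk_report(path):
    """Return free space and inode counts for the filesystem holding `path`."""
    st = os.statvfs(path)  # available on Unix-like systems only
    return {
        "free_bytes": st.f_bavail * st.f_frsize,  # space usable by non-root users
        "free_inodes": st.f_favail,               # inodes usable by non-root users
        "total_inodes": st.f_files,
    }
```

If free_bytes is large while free_inodes is zero, the filesystem has run out of inodes rather than space; otherwise the numbers are still useful to note for the next occurrence.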

Mac slave fails when hdiutil hangs

Symptoms

  • The package step takes a very long time and eventually fails with the following error in the log:
hdiutil makehybrid -hfs -hfs-volume-name Minefield -hfs-openfolder pkg-dmg.6574.kke6UGzw/stage -ov pkg-dmg.6574.kke6UGzw/stage -o pkg-dmg.6574.kke6UGzw/hybrid.dmg

command timed out: 1200 seconds without output, killing pid 6370
process killed by signal 9
program finished with exit code -1
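
The signature above can be recognised mechanically in a saved build log. The following is a minimal sketch, assuming the log is available as plain text; the function name hdiutil_hang_timeout is hypothetical. It looks for an 'hdiutil makehybrid' command followed by the 'command timed out' message and returns the timeout value.

```python
import re

TIMEOUT_RE = re.compile(r"command timed out: (\d+) seconds without output")


def hdiutil_hang_timeout(log_text):
    """Return the timeout in seconds if hdiutil is followed by a timeout, else None."""
    pos = log_text.find("hdiutil makehybrid")
    if pos == -1:
        return None  # hdiutil never ran in this log
    match = TIMEOUT_RE.search(log_text, pos)
    return int(match.group(1)) if match else None
```

A return value other than None indicates that the log matches this symptom.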

Cause

  • Unknown

Solution

  1. Reboot bm-xserve15