TinderboxTLC

From MozillaWiki

Congratulations! You may be the proud new user and/or sheriff of a Mozilla Firefox Tinderbox. Here are some important bits of info you should know.

There's a new sheriff in town

Uh oh, something broke

The Tinderbox tends to be inflammable (what a country!). Code checkins can obviously cause build or test failures, but a box can also fail in various ways that have nothing to do with a checkin. Some problems fix themselves on the next cycle; others require filing a server ops bug (in mozilla.org / Server Ops: Tinderbox Maintenance).

A unit test failed

If you see a unit test failure, please check the dependencies of bug 438871 to see whether it has already been filed. If it hasn't, file a bug! A failing test *is always a bug*. The bug is either in the test or in the code, but either way it needs to be tracked.

Why isn't that box doing anything?

  • Some boxes only start a build when there's a checkin. If a cycle fails for some reason, the box will stay red/orange until a checkin triggers a new build.
  • You can force a new build to start by making a trivial checkin.

Failed to kill process

  • The Windows unit test boxes may fail with errors like:
      buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
      Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
  • This might indicate that a test hung.
  • It might also mean that a test threw an exception that broke the test harness itself. The script running the tests then timed out and tried, unsuccessfully, to kill the browser.
  • This condition may not be caused by a specific checkin, and might fix itself upon the next cycle (when someone checks in).

XUL Popup test failures

  • These are due to browser window focus problems on the test box, often because someone logged in to do maintenance and left something else focused.
  • File a server ops bug to have the problem corrected.
  • Sometimes the cause of the failure is unclear, and the box may simply need to be rebooted.

libpr0n reftest failures

  • Multiple failures on Windows in modules/libpr0n/test/reftest/ can be caused by the box reverting to 16-bit color mode. See bug 414720 for history.
    • This happens when someone connects to the box with an RDP client set to 16-bit color mode.
    • File a server ops bug to have the problem corrected.
  • REFTEST UNEXPECTED FAIL (LOADING)

make[6]: *** INTERNAL: readdir: Bad file number

  • Happens occasionally on the Windows unit test box (e.g. qm-win2k3-01).
  • This condition should fix itself upon the next cycle (when someone checks in).

Error: bloat test timed out after 1800 seconds.

  • Seen occasionally on qm-win2k3-01 (possibly on other platforms too).
  • Should fix itself upon the next cycle (when someone checks in)

bzip2: Data integrity error when decompressing.

  • Happens sporadically on Talos boxes
  • Should fix itself on the next cycle?
  • Fixed by bug 427728, 15 May.

Spurious RLk test failure on bm-xserve11

  • Error: Leak Test Failed: Number of leaks 1234 is greater than LeakFailureThreshold 0
  • Infrequent, and often missed because this box cycles quickly
  • Tracked in bug 412545
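The RLk message reports both the leak count and the LeakFailureThreshold it was compared against. The test fails when the count is greater than the threshold, so with a threshold of 0 any nonzero leak count fails. A minimal Python sketch that pulls both numbers out of a log line, assuming the message text is exactly as quoted above (the names LEAK_RE, parse_leak_failure and is_leak_failure are hypothetical, not part of the real leak test):

```python
import re

# Hypothetical pattern, assuming the log text matches the message quoted above:
# "Error: Leak Test Failed: Number of leaks 1234 is greater than LeakFailureThreshold 0"
LEAK_RE = re.compile(
    r"Number of leaks (\d+) is greater than LeakFailureThreshold (\d+)"
)


def parse_leak_failure(line):
    """Return (leaks, threshold) from a leak-test failure line, or None if absent."""
    match = LEAK_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_leak_failure(leaks, threshold):
    """Mirror the message's rule: the test fails when leaks exceed the threshold."""
    return leaks > threshold
```

Pulling out the count makes it easy to compare several red cycles on bm-xserve11 and decide whether they look like the same intermittent problem.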

(add more common failures here)
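The error messages on this page could also be kept in a small lookup table. A log could then be scanned for known, usually transient failures before anyone files a bug. This is a minimal Python sketch. KNOWN_FAILURES and triage are hypothetical names and are not part of Tinderbox or buildbot. The advice strings only paraphrase the entries above.

```python
from typing import List

# Hypothetical table: (substring to look for in a log, first response from this page).
# New common failures would be added here as new entries.
KNOWN_FAILURES = [
    ("SIGKILL failed to kill process",
     "Possible test hang or harness failure; may fix itself on the next cycle."),
    ("INTERNAL: readdir: Bad file number",
     "Occasional on the Windows unit test box; should fix itself on the next cycle."),
    ("bloat test timed out after 1800 seconds",
     "Should fix itself on the next cycle."),
    ("bzip2: Data integrity error when decompressing",
     "Sporadic on Talos boxes; fixed by bug 427728."),
    ("Leak Test Failed",
     "If on bm-xserve11, possibly the spurious RLk failure tracked in bug 412545."),
]


def triage(log_text: str) -> List[str]:
    """Return the advice for every known signature found in log_text, in table order."""
    return [advice for signature, advice in KNOWN_FAILURES if signature in log_text]
```

An empty result only means the failure is not one of the known ones listed here. A new unit test failure still needs a bug filed against bug 438871's dependencies.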

See also: http://wiki.mozilla.org/Buildbot/IT_Unittest_Support_Document