Buildbot/OutageReports/20080904-01

From MozillaWiki

2008-09-04

On 2008-09-04 at 00:42 PDT, try1-win32-slave experienced a service outage that lasted 10 hours.

bug 394841

What was affected:

Windows try builds on trunk.

What was the cause of the outage:

Traceback (most recent call last):
Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process

Has this type of outage happened before?

Yes - I've seen this happen before. This seems to be a pretty common Windows outage.

What was done to repair:

The box was rebooted to fix an unrelated keyboard input issue.

What will be done to prevent this in the future:

  • Upgrade buildbot to trunk code?
  • Use KillableProcess.py?
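As a rough illustration of the second option, the sketch below shows one way a command could be run with a timeout and have its whole process tree killed when the timeout expires. It is not the code in KillableProcess.py and not buildbot's own implementation; it assumes a modern Python standard library, and the names run_with_timeout and kill_tree are hypothetical. On Windows it relies on the stock taskkill tool, on other platforms on a process group.

```python
import os
import signal
import subprocess
import sys


def kill_tree(proc):
    """Forcibly kill proc and its children (hypothetical helper)."""
    if os.name == "nt":
        # taskkill /T also terminates child processes; /F forces termination.
        subprocess.call(
            ["taskkill", "/T", "/F", "/PID", str(proc.pid)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    else:
        try:
            # The child leads its own process group (see start_new_session).
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process group is already gone


def run_with_timeout(argv, timeout_s, grace_s=10):
    """Run argv and return (returncode, timed_out).

    If the command outlives timeout_s, its process tree is killed. If it
    still has not exited grace_s seconds later, RuntimeError is raised
    instead of blocking forever.
    """
    popen_kwargs = {}
    if os.name != "nt":
        # Put the child in a new session so it has its own process group.
        popen_kwargs["start_new_session"] = True

    proc = subprocess.Popen(argv, **popen_kwargs)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        kill_tree(proc)

    try:
        return proc.wait(timeout=grace_s), True
    except subprocess.TimeoutExpired:
        raise RuntimeError("failed to kill process %d" % proc.pid)


if __name__ == "__main__":
    # A command that sleeps far longer than the one-second timeout.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(sleeper, timeout_s=1))
```

Killing the tree rather than only the top-level process matters here because a build command typically spawns children (shells, make, compilers) that can keep running after the parent is gone.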