Tree Closures

From MozillaWiki

Whenever the main tinderbox tree has to be closed, please record the date, the time the tree was closed, a rough time when the problem first started (if different from the close time), and, eventually, the time the tree was reopened. We need this information in order to track infrastructure problems and try to resolve them in the future.

Please keep all times in Mozilla Standard Time (US Pacific, same time as on tinderbox). Put more recent closures on top of old ones. Please include links to any relevant bugs.

Live status for your tree can be found on https://treestatus.mozilla.org/

2012

2011

  • Sept 28: Closed to land PRBool -> bool switch. bug 675553
  • Aug 5: Closed again due to tinderbox messages queue explosion
  • Aug 3: Closed due to tinderbox messages queue explosion bug 676219
  • Jul 29: Closed due to Stage server bustage bug 675170
  • Jul 28: Closed due to Android ndk deployed with wrong build config - bug 674855
  • Jul 1: Closed because Android had a permaorange and we don't have enough Tegra builders
  • June 1: Closed due to issues with surf - bug 661386
  • May 24: Closed due to DNS issues - bug 659238
  • May 23: Closed because people wanted to land lots of patches before the Aurora merge
  • May 6: Closed, bug 655197.
  • Apr 18 - 7:30-9:50: Closed due to some bumpiness while landing Linux PGO - bug 559964
  • Apr 18 - 5:14-7:30: Closed due to bug 653405 and then trying to land
  • Mar 1: Closed tree due to tbpl not reporting results. bug 637594
  • Feb 16: Closed to land bug 626602 and bug 629799.

2010

  • Dec 23: Closed tree due to broken w32 builds, see bug 621183
  • Sep 26: enable universal 10.6 osx builds and updates
  • Sep 19: 650castro network upgrades
  • Aug 10: Builds not showing up on tinderbox bug 586179
  • Aug 5: Closed tree due to tinderbox timeouts & large build backlog bug 584365
  • June 6 - June 8: Closed tree due to hg file rename fallout.
  • May 25 11:30 am - 3:20 pm: Closed due to outages from this morning's MPT connectivity issues. bug 568005
  • May 11 2:30 pm: Closed because builds weren't building and a backlog of unbuilt changes was building up (bsmedberg)
  • May 9 3am - : Closed because Windows slaves can only stay connected for 3 minutes bug 555794
  • May 8 6pm - May 9 3am: Closed because talos slaves that weren't ready to be put in production had been put in production and were failing from not having hg bug 564658
  • May 8 10am - 6pm : Closed because nobody bothered to tell us that the talos slaves had been switched back to the old connection
  • May 7 11pm - May 8 10am : Closed because the talos slaves weren't switched back to the old connection, and were still failing
  • May 6 7pm - May 7 11pm : Closed for "replacing the Netscreen SSG in San Jose with a Juniper SRX240 to better handle the VPN traffic from the build cluster in Castro." bug 563968. (And then reverting the change.)
  • Apr 29 12pm : Closed to land the add-ons manager rewrite
  • Apr 17 2pm : Netapp upgrade, taking hg, svn, others, offline bug 502151
  • Apr 15 11am : Mountain View lost connectivity, taking talos offline bug 559617
  • Apr 9 7pm - 8pm : Mac debug tests missing bug 558501
  • Apr 8 10:30pm - Apr 9 8am : Mac opt tests missing bug 558258
  • Apr 2 8am-9am PDT : scheduled talos downtime bug 555327
  • Mar 29 2pm - 7pm : Windows builders timing out uploading builds bug 555794
  • Mar 18 9pm - Mar 19 1:35 PDT: Linux debug tests missing bug 553769
  • Mar 18 5:30-7pm PDT - Graph server needed to have its ears boxed bug 553750
  • Mar 15 2pm - Mar 16 4pm, PDT - Connectivity issues between MV and MPT caused perma-red on many Talos and build boxes bug 552506
  • Mar 12 - Graph server bustage again. (IT replaced graph server box to fix, I think) (Closed from approx 9am - 5pm PST)
  • Mar 10 0800 - graph server bustage again bug 548371
  • Mar 8 - graph server issues required closing of all trees bug 548371
  • Mar 4 - buildbot master problems on an exceptionally busy day resulted in lost builds, need to figure out which changes are causing the orange
  • Feb 12: Windows compiler crashes during PGO are happening so frequently that we haven't had an opt build to test in nearly 20 hours.

2009

  • Tue Dec 1 4:00pm PST - 7:50pm - buildbot master is overloaded & misreporting failures (bug 532228), followed by colo HVAC issue
  • Tue Nov 24 6:05pm PST - 8:15pm PST - buildbot master restart needed for Firefox 3.6b4 builds
  • Fri Nov 20 6:05pm PST - 1.9.2+mobile for Mountain View power outage bug 524047
  • Thu Nov 19 4:10pm PST - 5:45 PST - ftp.m.o & stage.m.o broken bug 529961
  • Mon Nov 2 1:59pm PST - 2:32pm PST - Closed to let HG recover from a DoS
  • Thu Oct 22 5:48am PDT - 12:30pm PDT - Closed for scheduled testing of split mochitest setup
  • Sun Oct 11 9:38pm PDT - Mon Oct 12 2:40am PDT - Only one Windows build slave still working (bug 521722), trying to cover all the builds for trunk and 1.9.2 and falling further and further behind
  • Fri Sep 25 3:41am PDT - 9:47am PDT - Lots of orange. Caused by fallout from bug 473506 (backed out), with some contribution from bug 518274 (test disabled).
  • Thu Sep 24 5:14am PDT - Scheduled RelEng maintenance
  • Sun Aug 09 4:15pm - midnight PDT - Air conditioning failure at colo, bug 509351 for unfixed fallout
  • Wed July 22 6am - 11am PDT - Adding electrolysis branch bug 500755, Maemo TraceMonkey builds bug 505219, enabling TraceMonkey leaktest builds bug 504435
  • Wed July 21 10:15pm - 11:30pm - bug 505669, buildbot master died
  • Sat Apr 18 11am - ? Talos downtime, roll out of bug 480413 (design test to monitor browser shut down time)
  • Wed Mar 25: 3:30pm - Thu 4:45am, too much randomness while storage array recovering bug 485123
  • Tue Mar 17: 11:50 am - 2pm build timeouts due to swapping? bug 472463 is related
  • Tue Mar 3: 11:30am - ??

2008

    • Closed by platform meeting to let Beta 3 blockers land cleanly
  • Wed Jan 9: 10:30am (approx) - ?
    • Closed because of orange in the tree.
  • Wed Jan 6: 14:30 (approx) - 15:07
    • Closed because of orange and red trees.
  • Wed Dec 31: 9:30am (approx) - 17:25 PST
    • Closed because of DHCP problem in build network. See details in bug 471679.
  • Tuesday Dec 23, 19:30-20:20PST
    • Closed for Joe Drew to work on bug 455508 (20% Tp regression on Linux, September 5)
  • Tuesday Dec 23, 14:00-17:30 PST
  • Friday Dec 12, 06:00-10:57
    • planned release team maintenance downtime (06:00-08:00), see the dev.planning post for details
    • unit test boxes did not cycle green until 10:57
  • Tuesday Dec 9, 13:50-Wed 02:25
    • Emergency ESX maintenance (note from Aravind in dev.planning) took out boxes like graph server, caused rampant network related bustage.
    • Could have reopened much earlier than this, just no one around to do it
  • Monday, Dec 8, 10:25 - 13:11
    • reftests had been broken for 10.5 hours due to error in manifest file that wasn't actually causing orange (bug 468476)
    • closed tree until fix cycled to prevent more from piling on
    • hg.mozilla.org also misbehaving (pushlog db locked); hard to push fixes or load pushlog
  • Friday, Dec 5, 18:20 - 19:05
    • Waiting on Windows talos machines to start a run that includes the perf-sensitive changeset a0c0ed9f461f. (Talos had ignored the last 6 completed builds)
  • Thursday, Dec 4, 15:00 - 20:10
    • bug 468014 Investigating mozilla-central Vista TS, TP3, and TSVG increases. This is likely due to rebooting the Vista talos servers (bug 463020), especially since it brought these numbers up to around the same range as 1.9.1, as well as XP on both 1.9.1 and 1.9.2.
    • Rebooted 1.9.2 Vista talos boxes and waiting on the results.
    • After further investigation it appears that rebooting the talos systems caused this. See bug 468014 for more details.
  • Thursday, Dec 4, 12:10 - 13:10
  • Tuesday, Nov 12, 15:35
    • null pointer dereference causing crashes on linux and OSX leak test build boxes; bug 464571
  • Tuesday, Nov 12, 08:00
    • Tinderbox and various other infrastructure down
  • Friday, Nov 7, 0200 - 05:00
    • Backing out changesets to find the cause of the 10% Ts on OSX
  • Friday, Oct 24, 09:00 - 12:30
  • Sunday, Sep 28, 17:54 -
  • Friday, Sep 26, 13:54 - 15:32
    • reftests orange due to botched reftest.list change by sgautherie
    • windows still leaking from sdwilsh's landing and backout
      • required multiple corrections to patch to bug 455940
  • Friday, Sep 26, 10:45 - 13:52
    • sdwilsh is trying to land the places fsync work again and wants the tree closed for stable perf numbers
    • sdwilsh backed out
    • tree remained closed for tracking down performance regression from day before, bug 457288
    • sdwilsh backed out more
  • Wednesday, Sep 24, 12:00 - 7:00
    • New Windows boxes moz2-win32-slave07 / 08 are orange due to leaks
    • Old qm-win2k3-moz2-01 box had been leaking too
    • Tracked down to bug 454781 from 9/20, which had unfortunately landed in the middle of a period with other bustage and leaks. Fun!
  • Tuesday, Sep 23, 9:00
    • Closed because of MPT power outage.
  • Monday, Sep 22, 3:30
  • Thursday, Sep 18, 6:30 - Friday 8:00
    • Places fsync work will be landing once the tree gets greener - possible perf regressions
    • Window unit test boxes appeared to be hanging, so places fsync was backed out
    • Places fsync work backout caused leaks; clobber requested bug 455934
    • Closed for bug 455791, resolved by backing out bug 454735.
    • Talos bustage may or may not be fixed, see discussion in bug 455791... but it's all green currently, so reopening the tree
  • Tuesday, Sep 16, 12:50 - Wednesday, Sep 17, 3:20
    • sdwilsh's third landing of the new SQLite bug 449443 is still causing a huge Ts spike on Linux, despite no hit on tryserver.
    • Rather than immediately backing it out, we are trying to gather a little bit of data to help him understand what's going on, since he can't reproduce this offline.
    • First step is clobbering the linux machines that feed talos, since tryserver is always a clobber build, but the tinderbox machines aren't, and that's the only real difference (identical images, same hardware).
  • Friday, Sep 12, 10:00 - 11:45
  • Thursday, Sep 11, 13:00 - 17:30
    • memory usage regression (working set/rss) bug 454865 - Started on 09-09-2008 ~ 18:40.
  • Tuesday, Sep 9, 15:30 -
  • Thursday, Sep 4, 16:00 - 19:45
  • Tuesday, Sep 2, 16:15 -
    • Closing tree to get it green in order to land tracemonkey updates, and update tinderbox.
  • Tuesday, Aug 26, 05:40 - 10:00
    • Tree closed to track the perf impact of landing bug 432131
  • Wednesday, Aug 13, ~8am - 10:20pm
    • Tree closed due to perf regression (bug 450401). Unable to find cause, reopened tree.
  • Tuesday, Aug 12, 8:00 -
    • Scheduled unit-test master migration/downtime
  • Saturday, July 26, 09:22 - 11:53, 12:30 - Sun 04:30
    • Mac OS X builder out of disk space, bug 448115
    • problem seems to have gone away on its own, although disk space probably still low
  • Friday, July 25, 09:45 - Saturday, July 26, 09:10
    • talos machines all broke due to stage.mozilla.org
      • no ETA given
    • turned green around 19:00-20:00
    • turned red again around 21:30
    • bug 448019 had already been filed earlier in the day, but not linked from here or tinderbox
    • hardware on stage was replaced; talos went green again
  • Friday, July 18, 20:30-23:00
    • brendan checked in a patch (bug 445893) that made xpcshell hang or crash on Windows
    • bug 446143 filed to get tinderboxes fixed
      • since this requires manual maintenance, see bug 445578
  • Wednesday, July 16, 10am-5pm
    • tree effectively closed most of the day due to multiple sources of orange
      • no active fixing until around noon, when dbaron backed out bug 431842
    • Windows tinderboxes needed manual maintenance (bug 445571) after xpcshell test hang
      • filed bug 445578 on making this case not require manual maintenance
    • filed bug 445610 on making it more likely that multiple simultaneous failures will all be caught
  • Friday, July 11, 4:10pm-6:10pm
    • multiple failures on linux and windows unit tests prompted closure. Backed out a test change that broke other tests that relied on the changed one.
  • Friday, July 11, 7:30am - 11:30am PDT
    • both linux and both windows test boxes were orange, so the tree was closed.
    • WINNT 5.2 mozilla-central qm-win2k3-moz2-01 dep unit test went green all by itself
    • Linux mozilla-central qm-centos5-moz2-01 dep unit test went green all by itself
    • The browser window was not focused for WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test (45 reftests were failing). Box was stopped, focus restored, and a new test run was kicked off by bhearsum (no bug filed).
    • Linux mozilla-central qm-centos5-03 dep unit test went green all by itself.
    • Linux mozilla-central qm-centos5-moz2-01 dep unit test went orange
      • leaked 124036 bytes during test execution
    • WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test went orange
      • 28 reftest failures
    • Linux mozilla-central qm-centos5-03 dep unit test went orange again failing test_bug_406857.js
    • Linux mozilla-central qm-centos5-moz2-01 dep unit test still orange
      • leaked 124036 bytes during test execution
      • failed an xpcshell test case (test_sleep_wake.js) with lots of Gdk-CRITICAL assertions
      • failed one chrome test (bug 443763)
    • WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test still orange.
      • No more test failures.
      • leaked 292389 bytes during test execution
    • Linux mozilla-central qm-centos5-moz2-01 dep unit test still orange.
      • failed one chrome test (bug 443763)
      • leaked 124036 bytes during test execution
    • Linux mozilla-central qm-centos5-03 dep unit test went green.
    • Linux mozilla-central qm-centos5-moz2-01 dep unit test.
      • Tree re-opening since we have coverage on at least one windows machine for metered checkins (11:30am)
    • WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test went green (11:36am)
  • Thursday, July 10, 12:49pm - 10:44pm PDT
    • qm-moz2mini01 went orange, reporting 300k of leaks, and enough other tinderboxes were orange or red (although the issues were understood and being addressed) to warrant the precaution of closing the tree while investigating qm-moz2mini01.
    • qm-moz2mini01 subsequently went green in the following cycle without explanation.
    • After that, the sheriff (Myk) held the tree closed until at least one of the two windows unit test boxes (which had both been clobbered to resolve a residual problem from an earlier checkin that had been backed out) finished building successfully.
    • But those machines both went orange with 45 MochiTest failures, so the sheriff had the four patches since the previous build backed out.
    • After those patches were backed out, the next cycle, which included all the backouts, showed the same problem.
    • The 45 failures all looked popup-related, so maybe the wrong thing was focused on the test machines.
    • The sheriff requested another clobber from IT.
    • IT performed another clobber, which didn't work. IT also confirmed that the machines were in the appropriate state (the cmd window open and minimized, no other windows open) after both clobbers (although later there was discussion that perhaps the window was still focused after minimization, and perhaps it was necessary to click on the desktop to unfocus the window).
    • The sheriff escalated to build, cc:ing robcee and bhearsum on the latest bug about the clobber ( bug 444674), per Unittest:Win2k3:Moz2:ITSupport.
    • lsblakk did a source clobber, and qm-win2k3-moz2-01 cycled green after that (with qm-win2k3-03 expected to do so as well once out-of-space issues were resolved by coop), but qm-pxp-fast03 and mozilla-central turned red in the meantime, so the sheriff left the tree closed and went to look at those.
    • The problem looked related to various IT maintenance that evening (kernel upgrades on hg, ftp, and other servers as well as some DNS changes), so the sheriff waited.
    • qm-pxp-fast03 and mozilla-central turned green on their next cycle, so the sheriff reopened the tree.
  • Tuesday, July 8, 9:30am PDT-1:15pm
    • unit-test failures caused by a typo in code, plus a tinderbox that hung due to a code error and didn't come back up correctly (no display)
  • Friday, July 4, 5am PDT - 10am PDT
  • Tuesday, July 1, 02:04 - 10:57
    • bug 442875 - graph server caused most things to go orange
    • bug 442887 - 1.9-only: qm-xserve01 needs repair
    • bug 442843 - trunk-only: qm-moz2-unittest01 is out of space
  • Wednesday June 25, 08:20 - 12:20
    • Planned downtime for VM host maintenance
  • Sunday, June 8, 11:55 - Tuesday, June 10, 15:57
    • Linux build tinderbox and unit test machine went read-only around 5am
    • all talos machines stopped reporting around 3:30am
    • filed bug 437877
    • filed bug 437893
    • netapp migration started a day early (Sunday) due to failures
    • Tuesday morning status:
      • unit test machines intermittently failing leak check on mochitests on their return
      • talos machines occasionally appearing but still not functional
  • Saturday, June 7, 9:00 - 14:27
    • windows builders (main and debug) both went red due to open files
    • filed bug 437785
  • Friday, June 6, 15:00 - 23:15
    • brendan broke the tree in two different ways
      • windows crashing
      • failing JSON test
    • DNS outage slowed down the fixing
  • Friday, June 6, 5:15 - 9:00
    • Scheduled closure to clobber and land NSPR/NSS
      • bsmedberg goofed and the client.py step had to be removed from the master.cfg of builders and unit-testers, which took longer than expected
  • Tuesday, June 3, 10:00 - 15:20
    • Windows build red for some reason
      • clobbered by bhearsum, didn't help for some reason
  • Sun, May 25 15:51 - Monday, May 26 01:25
    • We are currently experiencing intermittent VMware/netapp problems, which cause entire sets of machines to start failing with cvs conflicts, corrupted .o files, etc., even when no checkins have occurred. Tree reopened to load-test fix. See bug 435134 for details.
  • Fri, May 23
    • 12:45 - 18:25
      • Buildbot master for Talos and Unittest went down (all talos boxes went red)
      • fx-linux-tbox also had cvs conflicts in browser/ and accessible/, tree clobbered
  • Wed, May 7
    • 8:00 AM - 12:50 PM
      • Waiting on backout of (bug 432492) to cycle through.
  • Tuesday, May 6
    • 8:20 - 9:00 PM
      • Talos machines were failing due to cvs-mirror issues (bug 432570).
  • Thursday, May 1
    • Start 12:00 PM
      • TUnit does not complete, qm-centos5-02 orange since yesterday. jst and mkaply in range, jst and sicking investigating.
  • Thursday, April 24
    • 8pm - 11pm
      • expected outage for graph server and buildbot maintenance
  • Wednesday April 16
    • 3PM - 1AM
      • Unexpected outage started ~3pm, bug 429406
      • Tree closed at 8:10 due to qm-xserve-01 still not working
  • Tuesday April 8
    • 2:24AM PDT - 4:31AM
      • Tree closed due to bug 427723 and bug 427728.
      • Windows nightly box restarted and completed, talos boxes started testing
  • Monday April 7
    • 7 PM PDT
      • Tree has been orange for too long (unit test failures) and then someone checked in theme changes (bug 427555) that caused red.
      • The orange was fixed after bug 426501 was backed out.
  • Saturday April 5
    • 00:48 PDT - 20:45
      • unit test failures across 3 platforms
      • filed bug 427248, remaining issues spun off to:
        • bug 425987 worked around reftest failures with a larger timeout
        • bug 426997 for the new PGO test box - still burning, will ignore.
  • Friday, Mar 28, 2008
    • 16:23 PDT - 22:40 PDT
      • Bonsai DB replication issues
  • Tuesday, Mar 25, 2008
    • 14:30 PDT - 15:05 PDT
      • Leak test machines orange
        • Kai's patch for bug 420187 caused a leak, he fixed it
    • 14:30 PDT - 16:10 PDT
      • Windows unit test failures
        • the Windows unit test machine (qm-win2k3-01) had failed for a few cycles for various reasons, wanted to get a green cycle in before accepting more checkins.
        • bug 425081 filed about machine trouble
  • Saturday, Mar 22, 2008
    • 13:48 PDT - 18:10 PDT
      • Windows talos machines are all red
        • fallout was from enabling strict file: URI security policy
        • alice checked in a config change to talos to disable strict URI policy on talos, filed bug 424594 to get talos in line with this strict policy
        • still closed waiting on unit test orange to resolve.
  • Tuesday, March 18, 2008
    • 12:03 Wednesday, Mar 19, 2008
      • johnath re-opened tree after test failures cleared and rapid cycle test boxes were reporting numbers in the pre-closure range
    • 20:22
      • Major network issues at the MPT colo which hosts... everything. Closed until services are back online (including IRC).
      • bug 423882 for details on the missing talos boxes.
    • 17:31 - 17:39
      • Network issues at moco, closing to avoid a mess if tinderboxes and/or bonsai is taken down by it.
  • Friday March 14, 2008
    • 14:25 - 16:03
      • mac and windows unit test boxes stopped cycling sometime between 2am and 7:45am
      • dbaron noticed when firebot announced tinderbox falling off the 12hr waterfall page at 14:23. Waldo had noticed and commented in #developers at 12:16 (and again at 12:42) but was preoccupied and only had time to really follow up at 14:23 to ask for tree closure
      • tree closed, bug 423015 filed
    • 6:00am - 07:40am
      • stage migration, closed to make sure Talos reconfig works OK and builds keep flowing
  • Thursday March 6, 2008
    • 7:00pm - 7:40pm
      • closed due to fx-win32-tbox bustage. Cleared stuck process, reopening assuming that was the issue rather than wait another hour for the PGO-lengthened build to finish
  • Wednesday March 5, 2008
    • 2:40pm - 4:00pm
      • Closed for orange on win2k3-01 that looks like memory corruption
      • Caused by bug 418703, backed out
    • 8am - 11:30am
      • bugzilla/CVS down, closed tree for talos bustage because of no-CVS
  • Wednesday, February 27, 2008
    • 9:30pm - 12:10am
      • Overlooked 2 tests that had also broken during the day (bug 420028 due to removal of DOMi, bug 384370 due to incomplete backout) and were not focus issues
      • Box has some other test/build failures that went away by themselves after being kicked
    • 7:40pm - 9:30pm
      • Windows box was orange with focus problems (bug 420010)
      • Missed from earlier in day due to expected issues while PGO was landing
    • 5:30pm - 7:40pm
      • Closed to do Linux kernel upgrades (bug 407796)
      • Quiet tree due to B4 freeze, and holding approval1.9b4 flags for enabling PGO on Windows, so closure has minimal impact
      • Took a bit longer than expected due to a reboot problem (bug 420007)
  • Tuesday, February 26, 2008
    • 22:38 - 00:49
      • problem started at 22:33 when qm-win2k3-01 turned red again
      • same problem as earlier in the day: "Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process" during tests
      • myk closed the tree around 22:38 to wait out the bustage, since the machine is our only windows unit testerbox
      • got stuck with an open file such that each build failed really quickly
      • bug 419799 filed to get sysadmin help to fix unit test box
      • mrz, who was on-call that evening, jumped into IRC and then went and kicked buildbot; first he closed some dialog about some process having crashed; then he restarted buildbot, but it didn't start building; then he killed both buildbot and its cmd process and restarted it, after which it started building and completed successfully
      • dbaron reopened the tree to metered checkins of b4 blockers
  • Tuesday, February 26, 2008
    • 18:44 - 19:40
      • problems started with linux txul perf regression
      • continued with fxdbug-win32-tb reporting: ###!!! ASSERTION: invalid active window: 'Error', file e:/builds/tinderbox/Fx-Trunk-Memtest/WINNT_5.2_Depend/mozilla/embedding/components/windowwatcher/src/nsWindowWatcher.cpp, line 1086
      • continued with five cross-platform unit test failures, three reftests and two mochitests
      • reed backed out bug 419452, one of two candidates for the perf regression
      • dbaron fixed the three reftests and one mochitest, which were from his checkin for bug 363248
      • myk backed out sicking's fix for bug 416534, which had caused the last mochitest failure
      • various folks speculated that the fxdbug-win32-tb assertion was random (it didn't show up on Mac or Linux)
      • myk reopened the tree, feeling that things were under control
      • reed backed out second perf regression candidate (bug 395609) when initial backout didn't resolve it
      • sicking fixed test failure in bug 416534 and relanded
      • others started landing again
      • unit test tinderboxes cycled green
      • reed's second backout fixed perf regression
      • reed's first or second backout also fixed the fxdbug-win32-tb assertion
  • Tuesday, February 26, 2008
    • 5:10pm - 5:46pm
      • problem started at 5:06pm when qm-win2k3-01 turned red
      • reed said that machine frequently hits this random bustage and then recovers
      • reed previously noted at the top of the page that "qm-win2k3-01 is the only Windows unit test machine, so if it is orange or red, you should NOT check in."
      • myk, the sheriff for the day, closed the tree to wait out the apparently random failure and then reopened it when the next build came up green
      • reed thought there might be an old bug on the problem but wasn't sure, so dbaron filed bug 419761 on the problem to make sure it's tracked and not forgotten
      • wolf also filed bug 419759 on fixing or replacing winxp01 so we aren't entirely reliant on win2k3-01 for windows unittests
  • Sunday, February 24, 2008
    • 11:30am - 4:45pm (with orange lasting longer, with tree open)
      • problem started 9:23am, amid some other bustage
      • closed by dbaron a little after 11:30am
      • filed bug 419328: Windows unit test box stopped cycling
      • aravind hard-rebooted the box, came back with a bunch of popup tests orange.
      • test still orange after another cycle (forced by dbaron)
      • dbaron reopened tree 4:45pm despite the unfixed machine-related orange
      • joduinn rebooted the box again around 4:30pm
      • this time it came back with the color depth wrong, so the PNG reftests failed (but the mochitests worked)
      • color depth issue fixed 6:45am Monday
  • Thursday, February 21, 2008
    • 10pm - 12am
      • Dietrich
      • Closed to facilitate l10n string freeze
  • Tuesday, February 19, 2008
    • 9:10 PM - 10:45 PM
    • 1:00 AM - 3:00 AM (guess)
      • No sheriff (night) (guess)
      • Closed for experimental landing of bug 399852. The checkin stuck.
  • Wednesday, February 13, 2008
    • 1:00 PM - 1:40 PM (problem first noticed around 9:40 AM)
      • No sheriff
      • bug 417313 -- graphs.mozilla.org can't keep up with data being submitted
      • reopened after machines started going green; db load lessened, but underlying issue has not been fixed.