Tree Closures: Difference between revisions

From MozillaWiki
=== Overview ===
Tree closure dates are no longer recorded here; please see the logs at:
 
https://treestatus.mozilla-releng.net/

Whenever the main tinderbox tree has to be closed, please record the date, the close start time, a rough time when the problem first started (if different from the close start time), and, eventually, the tree reopen time. We need this information to track infrastructure problems and try to resolve them in the future.
 
Please keep all times in Mozilla Standard Time (US Pacific, the same time as on tinderbox). Put more recent closures above older ones. Please include links to any relevant bugs.
 
=== Recent Closures ===
* Wednesday, May 7
** 8:00 AM -
*** Waiting for the backout of {{bug|432492}} to cycle through.
 
=== Older Closures ===
* Tuesday, May 6
** 8:20 - 9:00 PM
*** Talos machines were failing due to cvs-mirror issues ({{bug|432570}}).
 
* Thursday, May 1
** 12:00 PM -
*** TUnit does not complete; qm-centos5-02 has been orange since yesterday. jst and mkaply are in the regression range; jst and sicking are investigating.
 
* Friday, April 25
** 11:40 AM - 2:00 PM
*** {{bug|430820}}
 
* Thursday, April 24
** 8pm - 11pm
*** expected outage for graph server and buildbot maintenance
 
* Wednesday, April 16
** 3PM - 1AM
*** Unexpected outage started ~3pm, {{bug|429406}}
*** Tree closed at 8:10 PM because qm-xserve-01 was still not working
 
* Tuesday, April 8
** 2:24AM PDT - 4:31AM
*** Tree closed due to {{bug|427723}} and {{bug|427728}}.
*** Windows nightly box was restarted and completed its build; Talos boxes started testing
 
* Monday, April 7
** 7 PM PDT
*** The tree had been orange for too long (unit test failures), and then someone checked in theme changes ({{bug|427555}}) that turned it red.
*** The orange was fixed after {{bug|426501}} was backed out.
 
* Saturday, April 5
** 00:48 PDT - 20:45
*** unit test failures across 3 platforms
*** filed {{bug|427248}}, remaining issues spun off to:
**** {{bug|425987}} worked around reftest failures with a larger timeout
**** {{bug|426997}} for the new PGO test box, which is still burning; it will be ignored for now.
 
* Friday, April 4
** 03:24 PDT - 09:05 PDT
*** Announced closure to get clean perf numbers from {{bug|425941}}
*** See dev.planning/d.a.f thread: http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/d02e523b8483c914#
 
* Friday, Mar 28, 2008
** 16:23 PDT - 22:40 PDT
*** Bonsai DB replication issues
 
* Tuesday, Mar 25, 2008
** 14:30 PDT - 15:05 PDT
*** Leak test machines orange
**** Kai's patch for {{bug|420187}} caused a leak; he fixed it.
** 14:30 PDT - 16:10 PDT
*** Windows unit test failures
**** The Windows unit test machine (qm-win2k3-01) had failed for a few cycles for various reasons; we wanted a green cycle before accepting more checkins.
**** {{bug|425081}} filed about machine trouble
 
* Saturday, Mar 22, 2008
** 13:48 PDT - 18:10 PDT
*** Windows talos machines are all red
**** fallout was from enabling strict file: URI security policy
**** alice checked in a Talos config change to disable the strict URI policy on Talos, and filed {{bug|424594}} to bring Talos in line with the strict policy
**** Tree still closed, waiting for the unit test orange to resolve.
 
* Tuesday, March 18, 2008
** 12:03 Wednesday, Mar 19, 2008
*** johnath re-opened tree after test failures cleared and rapid cycle test boxes were reporting numbers in the pre-closure range
** 20:22
*** Major network issues at the MPT colo which hosts... everything. Closed until services are back online (including IRC).
*** See {{bug|423882}} for details on the missing Talos boxes.
** 17:31 - 17:39
*** Network issues at MoCo; closed to avoid a mess in case the tinderboxes and/or Bonsai are taken down.
 
* Friday, March 14, 2008
** 14:25 - 16:03
*** Mac and Windows unit test boxes stopped cycling sometime between 2am and 7:45am
*** dbaron noticed at 14:23, when firebot announced that the tinderboxes had fallen off the 12-hour waterfall page. Waldo had noticed and commented in #developers at 12:16 (and again at 12:42), but was preoccupied and only had time to follow up and ask for a tree closure at 14:23.
*** tree closed, {{bug|423015}} filed
** 6:00am - 07:40am
*** stage migration, closed to make sure Talos reconfig works OK and builds keep flowing

* Thursday, March 6, 2008
** 7:00pm - 7:40pm
*** Closed due to fx-win32-tbox bustage. Cleared a stuck process and reopened, assuming that was the issue, rather than waiting another hour for the PGO-lengthened build to finish

* Wednesday, March 5, 2008
** 2:40pm - 4:00pm
*** Closed for orange on win2k3-01 that looks like memory corruption
*** Caused by {{bug|418703}}, backed out
** 8am - 11:30am
*** Bugzilla/CVS down; closed the tree because Talos was broken without CVS

* Wednesday, February 27, 2008
** 9:30pm - 12:10am
*** Overlooked two tests that had also broken during the day and were not focus issues ({{bug|420028}}, due to the removal of DOMi; {{bug|384370}}, due to an incomplete backout)
*** The box had some other test/build failures that went away by themselves after it was kicked
** 7:40pm - 9:30pm
*** Windows box was orange with focus problems ({{bug|420010}})
*** Missed earlier in the day because issues were expected while PGO was landing
** 5:30pm - 7:40pm
*** Closed to do Linux kernel upgrades ({{bug|407796}})
*** The tree was quiet due to the B4 freeze, and approval1.9b4 flags were being held until PGO was enabled on Windows, so the closure had minimal impact
*** Took a bit longer than expected due to a reboot problem ({{bug|420007}})
 
* Tuesday, February 26, 2008
** 22:38 - 00:49
*** problem started at 22:33 when qm-win2k3-01 turned red again
*** same problem as earlier in the day: "Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process" during tests
*** myk closed the tree around 22:38 to wait out the bustage, since the machine is our only Windows unit test box
*** The box got stuck with an open file, so each build failed very quickly
*** {{bug|419799}} filed to get sysadmin help to fix unit test box
*** mrz, who was on call that evening, joined IRC and then went and kicked buildbot: first he closed a dialog about a crashed process; then he restarted buildbot, but it didn't start building; finally he killed both buildbot and its cmd process and restarted it, after which it started building and completed successfully
*** dbaron reopened the tree to metered checkins of b4 blockers
 
* Tuesday, February 26, 2008
** 18:44 - 19:40
*** problems started with linux txul perf regression
*** continued with fxdbug-win32-tb reporting: ###!!! ASSERTION: invalid active window: 'Error', file e:/builds/tinderbox/Fx-Trunk-Memtest/WINNT_5.2_Depend/mozilla/embedding/components/windowwatcher/src/nsWindowWatcher.cpp, line 1086
*** continued with five cross-platform unit test failures, three reftests and two mochitests
*** reed backed out {{bug|419452}}, one of two candidates for the perf regression
*** dbaron fixed the three reftests and one mochitest, which were from his checkin for {{bug|363248}}
*** myk backed out sicking's fix for {{bug|416534}}, which had caused the last mochitest failure
*** various folks speculated that the fxdbug-win32-tb assertion was random (it didn't show up on Mac or Linux)
*** myk reopened the tree, feeling that things were under control
*** reed backed out second perf regression candidate ({{bug|395609}}) when initial backout didn't resolve it
*** sicking fixed test failure in {{bug|416534}} and relanded
*** others started landing again
*** unit test tinderboxes cycled green
*** reed's second backout fixed perf regression
*** reed's first or second backout also fixed the fxdbug-win32-tb assertion
 
* Tuesday, February 26, 2008
** 5:10pm - 5:46pm
*** problem started at 5:06pm when qm-win2k3-01 turned red
*** reed said that machine frequently hits this random bustage and then recovers
*** reed previously noted at the top of the page that "qm-win2k3-01 is the only Windows unit test machine, so if it is orange or red, you should NOT check in."
*** myk, the sheriff for the day, closed the tree to wait out the apparently random failure and then reopened it when the next build came up green
*** reed thought there might be an old bug on the problem but wasn't sure, so dbaron filed {{bug|419761}} to make sure it's tracked and not forgotten
*** wolf also filed {{bug|419759}} on fixing or replacing winxp01 so we aren't entirely reliant on win2k3-01 for windows unittests
 
* Sunday, February 24, 2008
** 11:30am - 4:45pm (the orange lasted longer, with the tree open)
*** problem started 9:23am, amid some other bustage
*** closed by dbaron a little after 11:30am
*** filed {{bug|419328}}: Windows unit test box stopped cycling
*** aravind hard-rebooted the box; it came back with a number of popup tests orange.
*** test still orange after another cycle (forced by dbaron)
*** dbaron reopened tree 4:45pm despite the unfixed machine-related orange
*** joduinn rebooted the box again around 4:30pm
*** this time it came back with the color depth wrong, so the PNG reftests failed (but the mochitests worked)
*** color depth issue fixed 6:45am '''Monday'''
 
* Thursday, February 21, 2008
** 10pm - 12am
*** Dietrich
*** Closed to facilitate l10n string freeze
 
* Tuesday, February 19, 2008
** 9:10 PM - 10:45 PM
*** No sheriff (night)
*** Closed due to at least three people landing on orange ([http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1203480120.1203482373.3631.gz seemingly random TUnit failure on Windows]). The tree was too wide for anyone to notice the orange. [http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1203480120.1203484232.8866.gz A mochitest timeout on Linux], also seemingly random, occurred almost immediately after.
** 1:00 AM - 3:00 AM (guess)
*** No sheriff (night) (guess)
*** Closed for experimental landing of {{bug|399852}}.  The checkin stuck.
 
* Wednesday, February 13, 2008
** 1:00 PM - 1:40 PM (problem first noticed around 9:40 AM)
*** No sheriff
*** {{bug|417313}} -- graphs.mozilla.org can't keep up with data being submitted
*** Reopened after machines started going green; DB load lessened, but the underlying issue has not been fixed.

Latest revision as of 16:41, 16 November 2023
