Releases:Release Post Mortem:2016-02-24
Meeting Details
- 3:30pm ET
- Vidyo - Release Engineering room
- Dial-in:
- 650-903-0800 or 650-215-1282 x92 Conf# 98225 (US/INTL)
- 1-800-707-2533 (pin 369) Conf# 98225 (US)
- #releaseduty on irc.mozilla.org
- https://trello.com/b/MXHaVRcP/release-promotion-meeting
Release Duty
- FF 45 cycle: mtabara
Misc
- ESR45 branch set-up in progress
Shipped
Firefox 45.0b9 (bhearsum/nick/mtabara)
Issues, Build 1:
- a couple of auto-retries in Fennec repacks: one for a terminated instance and one for an out-of-space error
- Both Firefox Linux builds failed during symbol upload due to https://bugzilla.mozilla.org/show_bug.cgi?id=1250374
- They had to be rebuilt manually.
- They were killed afterwards, because we wanted a build2.
Issues, Build 2:
- Fennec push to mirrors failed because build1 had already run it.
- Removed build1 from the CDN and replaced it with build2. This is less than ideal, because some things will have cached build1, but the CDN isn't our primary means of distribution, so it shouldn't hurt too much.
- The usual update verify failures related to Linux GTK.
- AV builder failed with:
00:34:41 FATAL - IncompleteRead: IncompleteRead(0 bytes read, 40957406 more expected)
- A retrigger worked. We should probably make this job more resilient (or maybe BeetMover already is?).
- A replication issue between the Balrog read-write and read-only databases delayed shipping. More details in bug 1250940.
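The AV builder failure above (`IncompleteRead(0 bytes read, 40957406 more expected)`) has the message format of Python's `http.client.IncompleteRead` (`httplib` in Python 2), which is raised when a response body ends before its announced length. A minimal sketch of making such a download step retry on truncation, assuming the download is wrapped in a zero-argument callable returning bytes; `download_with_retries` and `fetch` are hypothetical names, not code from the AV builder or BeetMover:

```python
import time
from http.client import IncompleteRead


def download_with_retries(fetch, attempts=3, backoff_seconds=0.0):
    """Call fetch() until it returns a complete body or attempts run out.

    fetch is a hypothetical zero-argument callable that returns the body as
    bytes and raises IncompleteRead if the transfer is cut short.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except IncompleteRead as exc:
            # Fewer bytes arrived than the server announced; try again.
            last_error = exc
            if attempt < attempts:
                # Linear backoff between attempts (0 by default).
                time.sleep(backoff_seconds * attempt)
    # Every attempt was truncated: surface the last error to the caller.
    raise last_error
```

Only `IncompleteRead` is retried in this sketch; any other error propagates immediately, since retrying something like a missing file is unlikely to help.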
Firefox 45.0b8 (bhearsum/nick/mtabara)
- TODO
Thunderbird 45.0b2 (jlund/rail/callek/nick/mtabara)
- mozilla commit: https://hg.mozilla.org/releases/mozilla-beta/rev/THUNDERBIRD450b2_2016020915_RELBRANCH
- comm commit: https://hg.mozilla.org/releases/comm-beta/rev/9e83eb98346f
Issues:
- win32 build step failed for Thunderbird 45.0b2 build1. nthomas: "we needed to recreate AWS instances of windows builders, and this was collateral damage". There was a Windows slave outage in bug 1249499, so this is probably related.
- failed at repack_7/10 on win32 - release_repacks timed out, retriggered.
- failed at repack_7/10 on linux - intermittent Mercurial cloning issue due to some Amazon certificates. The automatic retry succeeded about 1h later.
- failed at repack_9/10 on win32 - two consecutive failures due to slaves dropping the connection. I suspect it's a lingering effect of recreating the AWS Windows builder instances.
- failed at repack_2/10 on win32 - same story as repack_9/10 on win32
Firefox 45.0b7 (nick/mtabara/rail/jlund)
- https://hg.mozilla.org/releases/mozilla-beta/rev/49d5c178d339
- Both Windows en-US builds just disappeared: something had marked them as complete with results=2 (failed) in the schedulerdb at ~18:39 PST. This was in the middle of the Windows slave outage in bug 1249499, so it is probably related. The DB state was:
mysql> select br.id, br.buildername, from_unixtime(br.complete_at) as complete, br.complete, br.results, substr(br.claimed_by_name,1,20) as claimed_by, from_unixtime(b.start_time) as start, from_unixtime(b.finish_time) as finish from buildrequests as br left join builds as b on b.brid=br.id where br.buildername like 'release-mozilla-beta-%_build' order by br.id desc limit 6;
+----------+-------------------------------------+---------------------+----------+---------+----------------------+---------------------+---------------------+
| id       | buildername                         | complete            | complete | results | claimed_by           | start               | finish              |
+----------+-------------------------------------+---------------------+----------+---------+----------------------+---------------------+---------------------+
| 98369068 | release-mozilla-beta-win64_build    | 2016-02-18 18:39:51 |        1 |       2 | NULL                 | NULL                | NULL                |
| 98369067 | release-mozilla-beta-macosx64_build | 2016-02-18 19:09:57 |        1 |       0 | buildbot-master84.bb | 2016-02-18 17:18:46 | 2016-02-18 19:09:57 |
| 98369066 | release-mozilla-beta-win32_build    | 2016-02-18 18:39:28 |        1 |       2 | NULL                 | NULL                | NULL                |
| 98369065 | release-mozilla-beta-linux64_build  | 2016-02-18 19:33:19 |        1 |       0 | buildbot-master74.bb | 2016-02-18 17:18:49 | 2016-02-18 19:33:19 |
| 98369064 | release-mozilla-beta-linux_build    | 2016-02-18 19:50:35 |        1 |       0 | buildbot-master74.bb | 2016-02-18 17:18:49 | 2016-02-18 19:50:35 |
| 97867785 | release-mozilla-beta-win64_build    | 2016-02-15 18:00:55 |        1 |       0 | buildbot-master70.bb | 2016-02-15 14:10:23 | 2016-02-15 18:00:55 |
+----------+-------------------------------------+---------------------+----------+---------+----------------------+---------------------+---------------------+
- Fixed with this SQL, which reset the buildrequest state; the builds then started very quickly with the right buildID and tag properties:
mysql> select id, buildername, complete, results from buildrequests where buildername like 'release-mozilla-beta-win%_build' order by id desc limit 4;
+----------+----------------------------------+----------+---------+
| id       | buildername                      | complete | results |
+----------+----------------------------------+----------+---------+
| 98369068 | release-mozilla-beta-win64_build |        1 |       2 |
| 98369066 | release-mozilla-beta-win32_build |        1 |       2 |
| 97867785 | release-mozilla-beta-win64_build |        1 |       0 |
| 97867783 | release-mozilla-beta-win32_build |        1 |       0 |
+----------+----------------------------------+----------+---------+

mysql> update buildrequests set complete=0, results=NULL where id in (98369068, 98369066) limit 2;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0
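The manual schedulerdb fix for Firefox 45.0b7 can be sketched as a two-step "find, then reset" procedure. This is a minimal sketch against an in-memory SQLite stand-in with a heavily simplified, hypothetical schema (the real buildbot schedulerdb is MySQL with many more columns), and `find_orphaned_failures` / `reset_buildrequests` are hypothetical helpers. The `LEFT JOIN ... IS NULL` guard, which only selects failed requests that never got a row in `builds`, is an added safety check and not part of the original fix, which reset the two ids directly:

```python
import sqlite3


def find_orphaned_failures(conn, name_pattern):
    """Return ids of buildrequests marked complete+failed that never started a build."""
    rows = conn.execute(
        """
        SELECT br.id
        FROM buildrequests AS br
        LEFT JOIN builds AS b ON b.brid = br.id
        WHERE br.buildername LIKE ?
          AND br.complete = 1
          AND br.results = 2
          AND b.brid IS NULL
        ORDER BY br.id DESC
        """,
        (name_pattern,),
    ).fetchall()
    return [row[0] for row in rows]


def reset_buildrequests(conn, ids):
    """Mark the given buildrequests as pending again (complete=0, results=NULL)."""
    if not ids:
        return 0
    # One "?" placeholder per id; the ids themselves are passed as parameters.
    placeholders = ",".join("?" for _ in ids)
    cur = conn.execute(
        f"UPDATE buildrequests SET complete = 0, results = NULL "
        f"WHERE id IN ({placeholders})",
        ids,
    )
    conn.commit()
    return cur.rowcount


# Hypothetical, simplified schema; rows mirror the state shown in the DB output above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE buildrequests (id INTEGER PRIMARY KEY, buildername TEXT,
                                complete INTEGER, results INTEGER);
    CREATE TABLE builds (id INTEGER PRIMARY KEY, brid INTEGER);
    INSERT INTO buildrequests VALUES
        (98369068, 'release-mozilla-beta-win64_build', 1, 2),
        (98369066, 'release-mozilla-beta-win32_build', 1, 2),
        (97867785, 'release-mozilla-beta-win64_build', 1, 0);
    INSERT INTO builds (brid) VALUES (97867785);
    """
)
orphans = find_orphaned_failures(conn, "release-mozilla-beta-win%_build")
print(orphans)  # [98369068, 98369066]
print(reset_buildrequests(conn, orphans))  # 2
```

Splitting the read from the write means the ids can be eyeballed before anything is changed, which is roughly what was done by hand with the `select` before the `update`.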
Fennec 45.0b6 (nick/mtabara/rail)
- awaiting the Google Play email to run the post-release and move this to the Shipped section
- "Starting the build 6 without the last Hello changes. They broke m-b"
Roundtable
- bhearsum: should we really be blocking shipping a chemspill release on what's new page configuration? I don't have an opinion on this particular what's new page, but holding back in-the-wild fixes because of a what's new page seems bad.
Context:
20:12:19 <bhearsum> a question for the postmortem, maybe: should we really be blocking shipping a chemspill release on what's new page configuration?
20:12:55 <rail> we should stop showing it
20:12:59 * mtabara agrees
20:13:19 <rail> we are about to ship esr45 :)
20:13:27 <bhearsum> i don't have opinion on this particular what's new page
20:13:55 <bhearsum> but holding back in-the-wild fixes because of a what's new page seems bad
20:14:59 <lizzard> we’r only holding it back for a short time
20:15:03 <lizzard> but good question....
20:15:32 <bhearsum> yeah, and it's only for esr in this case
20:15:41 <bhearsum> i doubt it made a practical difference for that userbase
20:15:51 <lizzard> For esr, i can’t imagine enterprise folks can deploy this so quickly as to mind an hour’s diference
20:15:52 <bhearsum> but if were the firefox release channel it might be a different story
20:17:26 <bhearsum> i guess it's also an important point that screwing up the WNP has more effect on the release channel
- mtabara: while deploying TB 38.6.0 with nthomas, we had to change the Balrog update rates to 50%. While attempting this, we realized they had already been changed, and that there was a UI-only issue in Balrog: rate changes were not shown in the rule history (bug 1248475). nthomas ran a DB query to find the answers we were looking for:
mysql> select change_id, changed_by, from_unixtime(substr(timestamp, 1, 10)) as timestamp, backgroundRate from rules_history where rule_id=170 order by change_id desc limit 10;
+-----------+---------------------+----------------------------+----------------+
| change_id | changed_by          | timestamp                  | backgroundRate |
+-----------+---------------------+----------------------------+----------------+
|      4423 | tbirdbld            | 2016-02-15 20:41:30.000000 |             50 |
|      3890 | tbirdbld            | 2016-01-07 22:21:11.000000 |             50 |
|      3889 | jlund@mozilla.com   | 2016-01-07 22:20:21.000000 |             50 |
|      3810 | jwood@mozilla.com   | 2015-12-30 16:03:00.000000 |            100 |
|      3740 | raliiev@mozilla.com | 2015-12-23 19:01:11.000000 |             30 |
|      3734 | tbirdbld            | 2015-12-23 15:18:47.000000 |             30 |
|      3733 | raliiev@mozilla.com | 2015-12-23 15:16:31.000000 |             30 |
|      3425 | jlund@mozilla.com   | 2015-12-02 18:54:06.000000 |            100 |
|      3378 | jlund@mozilla.com   | 2015-11-27 17:07:23.000000 |              0 |
|      3334 | tbirdbld            | 2015-11-25 18:58:35.000000 |             30 |
+-----------+---------------------+----------------------------+----------------+
Question for bhearsum: any chance I can get *read-only* access to that DB as well for future scenarios?
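The rules_history query converts `timestamp` with `from_unixtime(substr(timestamp, 1, 10))`, which suggests the column holds epoch time in milliseconds (13 digits) and the query keeps only the leading 10 digits, i.e. whole seconds. A minimal sketch of the same conversion in Python, assuming millisecond integers; `substr_seconds` and `ms_to_datetime` are hypothetical helpers, and this renders UTC, whereas MySQL's `FROM_UNIXTIME` uses the session time zone:

```python
from datetime import datetime, timezone


def substr_seconds(ms_timestamp):
    # Mirrors the SQL: keep the first 10 characters of the decimal string.
    # Only correct for 13-digit millisecond values.
    return int(str(ms_timestamp)[:10])


def ms_to_datetime(ms_timestamp):
    # Integer division drops the milliseconds, like substr(..., 1, 10),
    # but does not depend on the value having exactly 13 digits.
    seconds = int(ms_timestamp) // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


ts = 1450000000123  # illustrative value, not taken from the Balrog DB
assert substr_seconds(ts) == ts // 1000
print(ms_to_datetime(ts).isoformat())  # 2015-12-13T09:46:40+00:00
```

Integer division is the safer choice for ad-hoc queries: string slicing silently gives wrong answers for any value that isn't exactly 13 digits long.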