Releases/Post-mortems/Firefox 3.6.7

From MozillaWiki

Schedule / Location / Call Information

  • Tuesday, 2010-08-03 @ 1:00 pm PDT
  • In Warp Core
  • 650-903-0800 x92 Conf# 8605 (US/INTL)
  • 1-800-707-2533 (pin 369) Conf# 8605 (US)
  • Join #planning on irc.mozilla.org for the back channel

Overview

  1. Firefox 3.6.7 was released on July 20th, 2010, containing these fixes
  2. A regression involving object/embed handling caused a spike in crash stats
  3. The regression had actually already been caught and fixed on trunk on July 12th, 2010
  4. The regression had been nominated for blocking on July 2nd, 2010, but at the time it was not identified as a regression, and its seriousness (if known) was not conveyed
  5. Firefox 3.6.8 was released on July 23rd, 2010

Things that went right

  • The bug was nominated for branch blocking, so it was not simply ignored
  • Crash-stats quickly showed the spike, making the impact easy to see
  • Quick turnaround on 3.6.8
  • A regression that could have had similar effects was caught before shipping
  • Communication from affected parties was good and quick, and it reached us
    • Sony contacted us via Live Chat; the issue was recognized and escalated
  • Metrics detected it, both before and after release, using a new URL-trending/spidering system

Things that went wrong

  • The bug was fixed on trunk without an additional push to get the fix onto the release branches
  • The issue was not caught by unit tests, automated testing, manual QA testing, or living-on, even though it was essentially 100% reproducible
    • Is the iTunes Music Store (ITMS) in the topsites list? If not, should it be?
  • Christian did notice it, but did not raise an alarm and was not able to tie it back to the regression bug. From his hand-off email to beltzner:
Firefox 3.6.7 is trending a little higher in crashes than 3.6.6 (http://crash-stats.mozilla.com/daily?form_selection=by_version&p=Firefox&v[]=3.6.7&throttle[]=100&v[]=3.6.6&throttle[]=10&v[]=&throttle[]=100&v[]=&throttle[]=100&hang_type=crash&os[]=Windows&os[]=Mac&os[]=Linux&date_start=2010-07-02&date_end=2010-07-16&submit=Generate)

I'm not sure if it's statistically significant, and I don't see any crashes on the top crasher report to be alarmed about, though I am watching "strlen | nsACString_internal::Assign(char const*, unsigned int)"....but it looks like it happened on 3.6.6 as well. Don't know why it spiked, though it looks like a QT, itunes, or ITMS page could have changed.

Suggested Improvements

  • Provide controls to filter known noise out of the crash-stats reports (clegnitto)
  • Richer bug relations in Bugzilla, so that queries can be more robust and regression relationships are explicit. Christian is working on this. It might not have caught this issue, but it should keep others from slipping through (clegnitto)
  • Crash-stats should be viewed as part of the triage meetings (clegnitto)
  • Tomcat is now running top-site tests (top 25,000 pages) on all release branches and platforms for every release, to catch regressions on top sites (tomcat)
  • This bug on crash-stats will likely help (tomcat)
  • Better hand-off of top URLs found during beta/release (beltzner)
  • Topsites testing should include locale-specific topsites
  • Topsites testing should include media sites, even if they aren't in the top-sites list
  • Run automated topsites testing for each build
  • The security triage process should ensure that component owners are aware of new bugs (bsmedberg)
  • This should help prevent update fatigue if we screw up in the future