In the aftermath of 3.0.2+3.0.3, a few of us started looking at the stats around when releases go into scary territory (4 or more respins and/or follow-up regression fix releases). Some of us already suspected this to be the case, but essentially the more fixes we cram into a given timeframe, the higher the likelihood that we will end up in the bad place again. We've also fallen into the trap of taking patches because they're there and probably safe, but compounding risk and stacking QA's plate higher creates a much more difficult landscape.
http://spreadsheets.google.com/pub?key=pI03pN79HP0vol5hjHJDocw&gid=0 has the pure data (big thanks to Sam and Tim for getting the numbers right and cleaned up for easier perusal). One of the graphs there is probably the best single view of the bottom-line stats on when we succeed and when we don't.
Another factor (not yet gussied up with pretty pictures) is the tendency (due to various factors) for the bulk of the checkins for stable branches to land in the last week before code freeze, leaving QA relatively little time before release candidates come out and we're into verification steps. Thus, we're making it harder or even impossible for QA to verify test plan coverage and the quality of individual fixes. BFTs/FFTs don't catch the weird edge cases, and that's all we're really going to have time for if we continue back-loading the release cycle.
The net is that we've been stacking the deck against QA in a pretty crappy way, and despite their best efforts, we're still ending up with regressions that make their way to users (i.e. we've broken password manager twice in the last 18 months (first one was totally my fault)). Burning out QA on a regular basis for our typical releases and still having to do regression-fix releases means we have a lot of room for improvement, and a lot of reasons to try to change things.
There are three key elements to the plan, with the overall goal of reducing stress and pressure on QA, and reducing the number of respins/regression releases that we're forced to do.
1) Aggressively mitigate risk by setting a much higher bar for patch acceptance on the branch.
New Branch Criteria:
- In general, we will only accept patches which address one of the following:
- Security issues
- Topcrashes (top 20)
- Regressions from previous branch fixes
- All patches must have tests provided by the developer:
- Automated tests for all cases that can be tested with current frameworks
- Analysis and plans for those tests that need to go into Litmus
- All exceptions to the above criteria will have to be explicitly approved by the appropriate product drivers (e.g. Beltzner and myself for front end issues, possibly shaver + vlad + damon for platform). As we are moving to a more frequent release schedule for feature releases, it is likely that exceptions will be granted for only the most critical user-facing issues.
2) Ship more frequently (typically every four to five weeks) with smaller scope for each release.
This has a few key benefits, but the primary goal is to spread out the patches that are still going on branch into smaller increments so that we can stay on top of testing and driving. This should reduce the pressure to take patches for the current release rather than waiting until we're confident in the fix. I would like to decouple, as much as possible, fixing bugs for the branch from security releases. To do shorter cycle releases, especially with the schedule below, we'll need to start planning and driving the next release even before the current release has signoff and ships.
3) Change the structure of the release cycle to create more time for QA to verify and analyze fixes and test plans
This is perhaps the most radical change, especially given shorter cycles, but please bear with me. We already know that a large quantity of patches land in the last week of the "development" cycle, leaving QA with a large stack of work to do in a fairly limited time. More frequent cycles and significantly fewer patches do mitigate this in a real way, but I think we can still do better. I believe that giving QA time to really dig in on issues will have wider-reaching effects, from teaching developers to think more critically about how their code interacts with other code, and how to test effectively, to catching more issues before we get to release candidates.
Given all of that, we're proposing that we front-load landing patches, and only have the tree be open for a week per cycle, the week the previous release ships. Once we code freeze, QA will have up to two weeks to verify the bugs we took in the checkin phase. As we likely won't be taking more than 30 bugs per release in general (40 at the outside, but unlikely), we almost certainly won't need two full weeks, but we'll leave space in the general plan so we're sure it'll be enough. All bugs should be verified by QA before RelEng starts builds. Once we have candidate builds, since QA has already done a great deal of testing with the fixes landed, we should be able to push to the beta channel almost immediately, and do any remaining verification testing in parallel with the beta feedback period. Thus, we end up with something like this:
- Week 1: Land fixes
- Week 2: Start bug verification
- Week 3: Finish verification/start builds
- Week 4: Beta baking + QA signoff