Release Management/Rapid Betas

From MozillaWiki

This plan was implemented in 2012. This page is kept for reference.

Benefits of Daily Betas

  • shorter verification turn-around time
  • freed up RelEng/QA resources
  • late-breaking beta issues can be much more easily resolved
  • more user testing of the latest bits over the 6 week cycle

Proposed Timeline

  • Through 4/23
    • Consider major pain points, begin scoping work.
  • 4/24-6/4
    • Complete scoping work, begin implementation
    • Communicate the plan to Mozilla community
    • Move to Tuesday go, Wednesday ship? (see QA questions below)
    • Any necessary in-product changes should land while FF15 is on m-c
  • 6/5-7/16
    • Complete implementation, begin process testing on Nightly/Aurora
  • Week of 7/17
    • Push out our first daily beta while FF15 on mozilla-beta
  • Update: We're now targeting 10/9 as the launch date for Rapid Betas, given higher-priority B2G work in the groups involved

In-Product

Questions

  • What do we do about non-admin users? What's the current experience? Can we back off on offering them an update by 7 days?
    • should be possible by only incrementing the update xml's appVersion attribute once a week; see bug 753103
  • Will the user be prompted at all to "apply the update now"? Can we disable that?
  • If the user's browser downloads an update, then the user doesn't close and open their browser for 3 days, what build will be applied?
  • Is the user going to be prompted every day to update, or will it be truly silent so that the user won't notice it until the next day?
  • Will our current beta users be happy with the additional bandwidth of potentially downloading updates every day?
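The once-a-week appVersion idea from bug 753103 can be sketched as follows. This is only an illustration, not the actual update-server logic: the function name, the `cycle_start` parameter, and the "b<N>" suffix format are all assumptions made for the example.

```python
from datetime import date


def advertised_app_version(base_version: str, cycle_start: date, build_date: date) -> str:
    """Return the appVersion to advertise for a daily beta build.

    Hypothetical helper: the value only changes when a new 7-day window
    begins, so every build within the same week shares one appVersion.
    """
    if build_date < cycle_start:
        raise ValueError("build_date precedes cycle_start")
    week_index = (build_date - cycle_start).days // 7
    return f"{base_version}b{week_index + 1}"
```

Under this scheme, builds from days 0 through 6 of the cycle would all advertise the same value, and the value would change on day 7.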

Blockers

  • Background updates will be landing in FF15.
    • Feature page
    • Tracking bug 307181, bug 753103
    • Telemetry bugs (for getting uptime data): bug 727184 - Increase granularity of telemetry uptime measurement, bug 733591 - Send uptime with telemetry data
    • Decreasing prompt level since download (if you don't close the browser): bug 754470 - Once we have daily betas, we should change the beta channel to prompt users about a downloaded update once a week
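The weekly prompt described in bug 754470 could look roughly like the sketch below. The function and parameter names are hypothetical, and the 7-day interval is taken from the "once a week" wording; the real client logic may differ.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed from the "once a week" wording in bug 754470.
PROMPT_INTERVAL = timedelta(days=7)


def should_prompt(downloaded_at: datetime,
                  last_prompt_at: Optional[datetime],
                  now: datetime) -> bool:
    """Decide whether to prompt about a downloaded but unapplied update.

    The clock starts at download time and restarts after each prompt.
    """
    reference = last_prompt_at if last_prompt_at is not None else downloaded_at
    return now - reference >= PROMPT_INTERVAL
```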

Automation

Previous Proposal

QA

  • Post: Perform triage of failed tests when they occur
  • Post: Continue to perform weekly manual testing
  • Pre: Decide on subset of automated testing to run daily

Automation

  • Pre: Receive and pass off token w/ RelEng (go and finish; could use RabbitMQ for messages between the two automation harnesses)
  • Pre: Hooking all these automated tests into a harness, and report back results - blocked on new harness, updating tests (Q2)
  • Pre: Automating 4 manual tests (Flash video, Silverlight/Netflix video, Switch-to-tab, Downloading a file)
  • Pre: Automating update testing with plugins/add-ons installed
    • Pre: Creation of VMs/system images for profiles below
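The go/finish token handoff with RelEng might be modelled as below. This sketch uses in-process queues in place of a real message broker, and the `Token` fields and the `qa_automation` function are hypothetical names chosen for illustration.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str        # "go" (RelEng -> QA) or "finish" (QA -> RelEng)
    build_id: str
    passed: bool = True


def qa_automation(inbox: queue.Queue, outbox: queue.Queue, run_tests) -> None:
    """Consume one "go" token, run the tests, and reply with a "finish" token."""
    token = inbox.get()
    if token.kind != "go":
        raise ValueError(f"unexpected token kind: {token.kind}")
    result = run_tests(token.build_id)
    outbox.put(Token("finish", token.build_id, passed=result))
```

RelEng automation would put a "go" token on the inbox when a build is ready and wait on the outbox for the matching "finish" token before continuing.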

Automation (not blockers):

  • Pre: Maybe automating a crash report test (not a blocker, we check for this as part of crash-kill)
  • Pre: Maybe smoke testing really basic website functionality (should be covered by update testing in SoftVision)
  • Sync smoketest scenario would be: create an account, add a second client to that account, then sync data between the clients. Mozmill can do the account creation and add the client. TPS can do the data syncing and can run Mozmill tests. However, we don't currently have this scenario automated.

Release Management

  • Pre: List of platforms, plugins, and add-ons for update testing
  • Pre: Talk to engagement about new quality bar
    • Pre: Thinking about whether any top sites need to be tested live - can we get overnight QA to test this?
  • Pre: Crash reporting volume on a cron job?
  • Post: we need to fix any test issues preventing automated daily sign-off

Addons & Plugin Automated Update testing

  • Create profiles as outlined below and run automated tests checking updates & startup with these profiles installed.
  • Based on information gathered here using these top add-ons.
  • Bug on Token passing between QA/Automation & RelEng Automation (bug 764042)
  • Suggested Profiles:
    • Each of the 4 profiles should have several media plugins, one AV plugin, and a couple of toolbars
    • Three of the profiles should have a mix of AMO and Non-AMO add-ons
    • One of the profiles should have older versions of the add-ons
    • Profile 1:
      • Media Plugins: Adobe Acrobat Reader, Java Virtual Machine, Microsoft Silverlight, Adobe Flash Player, Quicktime (itunes integration)
      • AV Plugin: AVG
      • Toolbars: Skype, Ask, AVG
      • Addons: AVG Safe Search, Skype, Java Quick Starter, Babylon, TestPilot, DownloadThemAll, NoScript, GreaseMonkey
    • Profile 2:
      • Media Plugins: Adobe Acrobat Reader, Java Virtual Machine, Microsoft Silverlight, Adobe Flash Player, RealPlayer
      • AV Plugin: avast!
      • Toolbars: Yahoo! Toolbar, SweetIM Toolbar
      • Addons: Skype, Babylon, Adblock Plus, RealPlayer Browser Record Plugin, TestPilot, Download Statusbar, FlashGot
    • Profile 3:
      • Media Plugins: Adobe Acrobat Reader, Java Virtual Machine, Microsoft Silverlight, Adobe Flash Player
      • AV Plugins: Norton, Kaspersky
      • Toolbars: Norton Toolbar, Yandex Toolbar
      • Addons: Norton IPS, HP Smart Web Printing, Video Download Helper, Kaspersky URL Advisor, IDM CC, DataMngr, virtualKeyboard@kaspersky.ru, Anti-Banner(Kaspersky), Easy YouTube Video Downloader
    • Profile 4 (old versions of addons):
      • Media Plugins: Adobe Acrobat Reader, Java Virtual Machine, Microsoft Silverlight, Adobe Flash Player, RealPlayer
      • AV Plugins: McAfee, AVG
      • Toolbars: Searchqu Toolbar
      • Addons: McAfee Script Scan (Extension Version: <= 14.4.0), McAfee SiteAdvisor, Adobe Acrobat - Create PDF (1.1), RealPlayer Browser Record Plugin (15.0.4), DealPly (1.0.7), Java Console (6.0.32), AVG Do Not Track (12.0.0.2166)
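The profile requirements above can be written down as data and checked mechanically. The sketch below is illustrative only: add-on lists are omitted for brevity, and the thresholds (at least two media plugins, one AV plugin, two toolbars) are an assumed reading of "several", "one", and "a couple".

```python
PROFILES = {
    "Profile 1": {
        "media_plugins": ["Adobe Acrobat Reader", "Java Virtual Machine",
                          "Microsoft Silverlight", "Adobe Flash Player",
                          "Quicktime (itunes integration)"],
        "av_plugins": ["AVG"],
        "toolbars": ["Skype", "Ask", "AVG"],
    },
    "Profile 2": {
        "media_plugins": ["Adobe Acrobat Reader", "Java Virtual Machine",
                          "Microsoft Silverlight", "Adobe Flash Player",
                          "RealPlayer"],
        "av_plugins": ["avast!"],
        "toolbars": ["Yahoo! Toolbar", "SweetIM Toolbar"],
    },
    "Profile 3": {
        "media_plugins": ["Adobe Acrobat Reader", "Java Virtual Machine",
                          "Microsoft Silverlight", "Adobe Flash Player"],
        "av_plugins": ["Norton", "Kaspersky"],
        "toolbars": ["Norton Toolbar", "Yandex Toolbar"],
    },
    "Profile 4": {
        "media_plugins": ["Adobe Acrobat Reader", "Java Virtual Machine",
                          "Microsoft Silverlight", "Adobe Flash Player",
                          "RealPlayer"],
        "av_plugins": ["McAfee", "AVG"],
        "toolbars": ["Searchqu Toolbar"],
    },
}


def check_profile(profile: dict) -> list:
    """Return a list of unmet requirements (empty if the profile is fine)."""
    problems = []
    if len(profile["media_plugins"]) < 2:   # assumed meaning of "several"
        problems.append("needs several media plugins")
    if len(profile["av_plugins"]) < 1:
        problems.append("needs an AV plugin")
    if len(profile["toolbars"]) < 2:        # assumed meaning of "a couple"
        problems.append("needs a couple of toolbars")
    return problems
```

With these thresholds, the check flags only Profile 4, which lists a single toolbar.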

Questions

  • Do you care about live site testing (for instance, Netflix)?
  • Where are "fake" updates staged (necessary for add-on compatibility tests)? Probably releasetest, as per bug 748815.
  • What are the communication points that need to happen (what needs to be communicated between QA and RelEng, and when)?
  • What can we do to keep automation running as smoothly as possible with known/unknown failures or automation glitches? I.e., we will still need troubleshooting from QA when automation cuts out due to errors.

Blockers

QA

Previous Actions

  • Clint to investigate sending cfg files as Pulse messages
  • [MISSED] Anthony to consult RelEng about their current status and to get answers to questions

QA sign-off strategy

This is just a proposal and currently open to debate

  • daily Mozmill automation results triage for regressions
  • daily bug triage: qawanted > incoming > fixed
  • stricter component watching/ownership for P1 bugs
  • twice-monthly Beta Bugdays
  • twice-monthly Beta Testdays
  • twice-monthly Feature sign-off
  • twice-monthly Compatibility sign-off
  • utilization of core community members, give them more responsibility (feature ownership, bug triage, event participation, etc)

Dependencies

Need to evaluate these dependencies to determine which are hard blockers

  • [ON TRACK] Improved stability and scalability of Mozmill automation
    • [DONE] 3x Windows nodes in PHX
    • [DONE] 3x Linux nodes in PHX
    • [ON TRACK] 3x Mac nodes in PHX
  • Pulse integration to notify when build is qualified and when updates/builds are ready to test
    • We have this for Aurora & Nightly builds, assuming the same or similar requirement for Rapid Betas
  • Buildbot integration
    • Delayed due to RelEng/B2G resource priorities
    • Not a firm requirement as we can do it partially automated like we do today (Pulse CI + on-demand CFG)
    • How often do we expect to have to qualify updates?
    • Can RelEng create an "updates available" Pulse signal?
  • Mozmill 2.0
    • Not a firm requirement as we lose nothing by sticking with Mozmill 1.5
    • What does Mozmill 2.0 give us?
  • Mozmill results in tbpl
    • What does this give us?
  • Integration of functional/update automation into RelEng workflow
    • Not a firm requirement but we'd like to see us take steps toward this

Outstanding Questions

Questions needing answers so we can further develop a strategy and identify potential dependencies

  • Breakdown of when sign off has uncovered new regressions in a beta? (and if it has, was it through automation or manual (human) tests?)
    • If we find that sign-off has not found new critical beta regressions, can we start shipping betas on Wednesdays after automated testing runs through (prior to daily betas being implemented)? Bug verification, exploratory testing, and other smoke tests could continue out-of-band.
  • Can automating the release test plan be done against nightly/aurora in preparation for moving to faster beta releases?
  • Can we automate mobile update testing? If so, we can even make the weekly push to the Play Store automatic.
  • What happens to the human mechanism of final sign-off before pushing live updates?
  • What's our "update" story for rapid betas?
    • How are people keeping updated?
    • If updates are applied in the background, do people still need to restart?
    • Does this skew our data?
    • How will QA be notified that a new beta build is coming?
    • What is the mechanism/process to shut off updates?
  • What happens when builds are out on weekends/holidays?
  • How do we deal with things like Patch Tuesdays or new third-party releases?
  • Can we host cfg files on hg somewhere?
    • would give buildbot access to cfg files
    • gives QA the ability to alter testruns by pushing changes to the cfg files
    • no gating on RelEng/IT to make changes to our testruns
    • hg.mo/qa/mozmill-automation is probably the sanest place to host the cfg files
  • What is our current Nightly automation story and can it be scaled to Betas?
  • What is the strategy for Stability triage and tracking?

Notes from 2013-01-17

  • identify the minimum testing to evaluate a beta as "usable"
  • once turnaround time is 2-4 hours we can start doing at least a couple Betas a week
  • as manual tasks and handoffs are automated we can move more toward daily Betas
  • the rest of the qualification (testdays, features, compatibility checks, triage) can happen out of band
  • need a checklist to ensure all things are signed off by the end of the cycle
  • Q1 goal: Tuesday and Thursday Betas -- what needs to happen from QA and Softvision

Support

Questions

  • any pain points in supporting Aurora? (expand those into to-do items for supporting daily betas)

Blockers

RelEng/IT

Questions

  • any issues with automated pushsnip to beta channel?
    • Needs to be gated on QA. We already automatically push to betatest. Catlee 12:48, 13 April 2012 (PDT)
  • can we add inputs from QA to incorporate the sign-off as part of automation?
  • any possible issues with load (or mirrors) if we move to more frequent releases of beta on bouncer?
  • Using the signing key appears to be a manual process currently, can this be automated?
    • Already is! (except for android) - Catlee 12:39, 13 April 2012 (PDT)
    • note: upcoming OS X 10.8 signing is explicitly excluded
  • Can the work associated with daily betas be done in such a way that we could move back to weekly betas in the case of unforeseen problems?
  • Do we need to tag/package source for daily betas?

Blockers

  • Tracking bug for RelEng work is bug 747168. Most notable items on it are:
  • unrelated but blocker: supporting new OS/platforms.

Crash-Kill Team & Socorro

Questions

  • What are the implications of switching to daily betas? How would crash analysis be different than m-c or m-a?
  • Do we need to implement windowing over multiple build IDs to emulate having a large beta population on a single beta?
  • What is a unit of 'enough' user testing data (days?) for getting feedback on a regression fix?
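The "windowing over multiple build IDs" question can be made concrete with a small sketch. It assumes crash reports arrive as (build_id, signature) pairs and that build IDs are timestamp strings that sort chronologically; the function name is hypothetical and this is not Socorro's implementation.

```python
from collections import Counter
from typing import Iterable, Tuple


def windowed_crash_counts(reports: Iterable[Tuple[str, str]], window: int) -> Counter:
    """Sum crash counts per signature over the newest `window` build IDs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    reports = list(reports)
    # Assumption: build IDs sort chronologically as plain strings.
    newest = set(sorted({build_id for build_id, _ in reports})[-window:])
    return Counter(sig for build_id, sig in reports if build_id in newest)
```

Treating the last few daily builds as one population in this way would give each signature a sample size closer to what a single weekly beta collects.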

Blockers

L10N

Answers/Resolutions

  • Keep sign-offs on release and beta channel, continue without signoffs on aurora and central
  • Drop the use of milestones at least for daily betas
  • Get l10n-changesets from urls that give the current signed-off state for a product/version
    • These URLs need to be updated on migration day every six weeks
    • These URLs already exist, using the av= keyword argument with the right code instead of the ms= keyword

Questions

  • What do we do with l10n milestones?
  • Could we work towards automatically pulling from a beta branch? This doesn't really scale well.
  • Could l10n_changesets be dynamically created from the csets on beta branch for locales in shipped_locales?
  • Can l10n sign offs be automated? Test suites that check for anything other than string changes?
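One way to picture generating l10n_changesets for the locales in shipped_locales is sketched below. It assumes shipped_locales has one locale per line (optionally followed by platform names), that the output is one "locale revision" line per locale, and that a mapping of signed-off revisions is already available; `build_l10n_changesets` and its inputs are hypothetical.

```python
from typing import Dict, List


def build_l10n_changesets(shipped_locales_text: str,
                          signed_off: Dict[str, str]) -> List[str]:
    """Emit "locale revision" lines for every shipped locale except en-US."""
    lines = []
    for raw in shipped_locales_text.splitlines():
        fields = raw.split()
        if not fields or fields[0] == "en-US":
            continue  # skip blank lines and the source locale
        locale = fields[0]
        if locale not in signed_off:
            raise KeyError(f"no signed-off revision for {locale}")
        lines.append(f"{locale} {signed_off[locale]}")
    return lines
```

Failing loudly on a locale with no signed-off revision keeps a missing sign-off from silently dropping that locale from a build.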

New questions:

  • What happens with non-daily betas, notably TB/SeaMonkey? (No change, continue as usual)
  • Do we need a "sealed" status for release, or is that good to go without milestones, too? (That is up to the needs of the l10n drivers, do you want a tagged repo every 6 weeks when a release goes out?)

Blockers

  • None, but a releng bug on using the new url: bug 751703

Release Management

Update - RelEng (and lsblakk) met and discussed the complete process of releasing betas and have scoped the work involved in meeting this project's goals here

Questions

  • What should we call beta nightlies (Rapid Betas)? A: Daily Betas (and we are not 'announcing' this like it's something new we're doing)
  • How do we communicate beta nightlies?
  • The beta release process is a useful tool for verifying the final release process. If we change the final release process, how will we verify the changes or bring new people up to speed without affecting the timing of the final releases?
    • Do we need to build in a slightly larger window for the final release to allow for these types of issues?

Blockers

  • Need to change the process of approving for Beta to first go through Aurora, for sake of risk mitigation
  • We need to watch for more stretch across multiple versions of beta than we currently have across weekly betas
  • Feature request: getting BuildIDs in Input to get more data on where feedback is coming from
  • What's new page - either always on or always off but we'll need a better plan

Metrics

We ran a sample analysis of the beta crash report data for the last 10 days of April. Nine of the top 10 crashers in the first hour were still in the top 10 in the last hour of the 10-day period. We probably will not collect enough reports to catch the rare crashes on day 1, so these will persist into subsequent builds and be picked up later.

In summary:

  • Using Daily Builds, we will be able to push bug fixes faster.
  • Rare signatures will be picked up later (once enough reports come in); in this regard Daily Builds are not much different from weekly builds.
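The top-crasher comparison described above can be expressed as a small overlap computation. This is a sketch of the idea only, not the analysis actually run; the function names are hypothetical and the inputs are assumed to be plain lists of crash signatures, one entry per report.

```python
from collections import Counter
from typing import Iterable, Set


def top_signatures(signatures: Iterable[str], n: int = 10) -> Set[str]:
    """Return the n most frequent crash signatures."""
    return {sig for sig, _ in Counter(signatures).most_common(n)}


def top_overlap(first_hour: Iterable[str], last_hour: Iterable[str], n: int = 10) -> int:
    """Count signatures that are in the top n of both periods."""
    return len(top_signatures(first_hour, n) & top_signatures(last_hour, n))
```

An overlap of 9 with n = 10 would correspond to the result reported above.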

Product Marketing & PR

Update - Grace & Laura discussed; EJ added input; Laura discussed in channel meeting, wrote down answers.

Questions & Responses

  • This will have an impact on how we could deploy a What's New page, which we've been planning to do for some time. We will likely still be able to, but we'll have to coordinate much more closely on when and how those pages are turned off and on. What would be the details of getting a page turned on or off?
    • No clear answer on this. This is not a blocker, but something to keep in mind for the future to find a solution.
  • How will this affect release notes? Will we update them every day? Once a week?
    • Will keep doing release notes the way we do Aurora--stand them up with initial release and make minor changes as needed; don't think we will see major changes between first release and end of the cycle.
  • How will this affect how we track comments/feedback on Input?
    • Yes, it will. Will need a way to correlate comments to buildIDs.
  • Will this affect how we track ADUs/ADIs with metrics tools?
    • Yes, Alex is filing a bug.
  • How will users be alerted to the updates? Will it cause update fatigue?
    • No, because this will only be implemented after silent updates.
  • Is this the new way of doing things or a short term thing we are testing?
    • It's the plan for the foreseeable future, but may be changed depending on resources, needs, etc.
  • Does this mean new features can land in beta?
    • It makes it less risky to land features, but any feature additions will be discussed in depth before landing; don't think this will cause an increase in the landing of new features.
  • Do we have to update our communications around what to expect for each channel?
    • I don't think it does, apart from a general blog post announcing the change.
  • Will this change the release timeline?
    • No, this shouldn't affect the release timeline at all.

Blockers & Answers

  • Blog Post communicating change on FoF; must be reviewed by PMMs and PR
    • This will be done in conjunction with Alex Keybl, Mark LaVine/EJ and LMesa; timing tbd.
  • Figuring out how to make release notes work for all audiences/communications
    • Don't think this will be a huge concern; see above.
  • This will impact if/when/how often we want to communicate about betas
    • This is unlikely, as the changes made between betas will probably be limited to bug fixes, security fixes, or feature backouts.
  • Updating all channel comms
    • Unlikely outside of blog post mentioned above. EJ/LM/ML to discuss.

Project Status

Release Management

Bug # or Issue | Status | ETA | Contact
Need to change the process of approving for Beta to first go through Aurora, for sake of risk mitigation | on track | |
We need to watch for more stretch across multiple versions of beta than we currently have across weekly betas | on track | |
Feature request: getting BuildIDs in Input to get more data on where feedback is coming from | bug 761781 filed | |
What's new page - either always on or always off but we'll need a better plan | on track | |
Compiling a list of addons & plugins to create automated tests for, to bump up the reliability of automated QA signoffs | on track | | Lukas

Tracked Bugs

Meta: bug 755978

Open

Full Query
ID Summary Priority Status
754470 Once we have rapid betas, we should change the beta channel to prompt users about a downloaded update once a week -- NEW
756302 Automate all applicable smoke tests to be able to scale to the daily betas process -- NEW
764042 Accept token from RelEng automation and send token back to trigger continued automation -- NEW

3 Total; 3 Open (100%); 0 Resolved (0%); 0 Verified (0%);

Resolved

Full Query
ID Summary Priority Status
598757 Running out of space for symbols P3 RESOLVED
714806 Pulse message for nightly builds do not contain previous_buildid P3 RESOLVED
727184 Increase granularity of telemetry uptime measurement -- RESOLVED
733591 Send uptime with telemetry data -- RESOLVED
747168 [tracking bug] release automation support for daily betas P3 RESOLVED
751703 release_sanity.py needs to check locales/locale repos from a different url P3 RESOLVED
752956 [tracker] Add Rapid Beta support to Socorro -- RESOLVED
753103 Only bump the update xml's appVersion attribute once a week P3 RESOLVED
754050 need to implement mechanism to filter input for rapid betas -- RESOLVED
758101 Re-enable background updates on mozilla-central -- RESOLVED
761781 Feature Request: Input.mozilla should get BuildIDs from users P3 RESOLVED
771218 Ongoing problems with Socorro staging P1 RESOLVED

12 Total; 0 Open (0%); 12 Resolved (100%); 0 Verified (0%);


In-Product

Bug # or Issue | Status | ETA | Contact
Background updates | landed in FF15, disabled in Windows for now | FF15 | rstrong

Tracked bugs

Full Query
ID Summary Priority Status
307181 Eliminate wait while restarting Firefox after update (apply update in background) -- RESOLVED
727184 Increase granularity of telemetry uptime measurement -- RESOLVED
733591 Send uptime with telemetry data -- RESOLVED
754470 Once we have rapid betas, we should change the beta channel to prompt users about a downloaded update once a week -- NEW

4 Total; 1 Open (25%); 3 Resolved (75%); 0 Verified (0%);


Automation

Bug # or Issue | Status | ETA | Contact
New test suites needing automation to bump up the reliability of auto-signoffs | waiting on Lukas for addons & plugins groupings for automated update testing & for QA to determine what else needs automation | | ctalbert
Messaging/Token passing with RelEng for complete automation e2e | needs implementation decision by Automation/RelEng | | ctalbert

QA

Bug # or Issue | Status | ETA | Contact
Increase automation coverage and integrate into build/push flow | waiting for confirmed plan of action, ETA | | ashughes
Descoping of unnecessary "busy" work, scoping coverage increases, and risk analysis | waiting for confirmed plan of action, ETA | TBD | ashughes

Tracked Bugs

Full Query
ID Summary Priority Status
756302 Automate all applicable smoke tests to be able to scale to the daily betas process -- NEW

1 Total; 1 Open (100%); 0 Resolved (0%); 0 Verified (0%);


RelEng/IT

Bug # or Issue | Status | ETA | Contact
Implementing daily check for beta repo builds (build when there are changes) | blocked on dealing with new OS/platforms | | joduinn
Messaging/Token passing with RelEng for complete automation e2e | needs implementation decision by Automation/RelEng | |

Tracked bugs

Meta: bug 747168

Open

Full Query
ID Summary Priority Status
748798 replace ReleaseUpdatesFactory with a simplified mozharness script P3 NEW
764042 Accept token from RelEng automation and send token back to trigger continued automation -- NEW

2 Total; 2 Open (100%); 0 Resolved (0%); 0 Verified (0%);

Resolved

Full Query
ID Summary Priority Status
554343 Release builders should always clobber P3 RESOLVED
594930 Release automation should pushsnip automatically P4 RESOLVED
692501 run update verify against beta-cdntest for betas P3 RESOLVED
705807 Android signing-on-demand P3 RESOLVED
725839 adjust post_upload.py to use new candidates mount P2 RESOLVED
748763 automatically version bump release config for rapid betas P3 RESOLVED
748773 read revisions required for a release from a common file P3 RESOLVED
748794 read en-US revision for release builds on the client side P3 RESOLVED
748796 replace SingleSourceFactory with a mozharness script P3 RESOLVED
748800 replace TuxedoEntrySubmitterFactory with a mozharness script P3 RESOLVED
748801 run snippet optimization as part of the release process P3 RESOLVED
748811 automatically archive/remove old candidates directories P3 RESOLVED
748815 run update verify against releasetest channel for rapid betas P3 RESOLVED
749188 carry forward reference to release manifest through entire release automation P3 RESOLVED
749190 adjust builders to read revisions and other release-specific information from a manifest P3 RESOLVED
749312 Firefox beta release source requirements ? P2 RESOLVED
749587 use properties for things in release MailNotifiers that change every release P3 RESOLVED
750294 Update about:license to give link to Hg rev and build instructions -- RESOLVED
751703 release_sanity.py needs to check locales/locale repos from a different url P3 RESOLVED
753103 Only bump the update xml's appVersion attribute once a week P3 RESOLVED
754290 decide whether or not we'll be pushing to internal mirrors for rapid betas P3 RESOLVED

21 Total; 0 Open (0%); 21 Resolved (100%); 0 Verified (0%);


Crash-Kill Team & Socorro

Bug # or Issue | Status | ETA | Contact
Implementation of Socorro handling for daily betas project | blocked on bug 771218 | 7/17 | Smooney/Laura

Tracked bugs

Meta: bug 752956

Open

No results.

0 Total; 0 Open (0%); 0 Resolved (0%); 0 Verified (0%);

Resolved

Full Query
ID Summary Priority Status
755293 [TRACKER] UI changes to support rapid betas P1 RESOLVED
755297 Database changes to support rapid betas P1 RESOLVED
755301 Middleware changes to support rapid betas P1 RESOLVED
755304 Audit FTP scraper cron job for rapid beta support -- RESOLVED
771218 Ongoing problems with Socorro staging P1 RESOLVED
773835 New cron job list for Mobeta -- RESOLVED
778255 remove code/tests/cronjobs which use obsolete pre-mobeta tables -- RESOLVED

7 Total; 0 Open (0%); 7 Resolved (100%); 0 Verified (0%);


Product Marketing & PR

Bug # or Issue | Status | ETA | Contact
Updating all channel comms (blog posts, etc) | on track | | lforrest
Updating release notes to work for all audiences/communications | on track | | lforrest
Naming of this change (DONE: 'daily betas', this is not-news, we aren't doing anything marketing-wise for this) | on track | | lforrest

L10N

Bug # or Issue | Status | ETA | Contact
No current issues (RelEng has the bug on automating the handling of l10n changesets) | on track | | Axel

Metrics

Bug # or Issue | Status | ETA | Contact
No current issues | on track | |

Support

Bug # or Issue | Status | ETA | Contact
No current issues | on track | | Cheng

Legend

  Healthy: work is progressing as expected.
  Blocked: work is currently blocked.
  At Risk: work is at risk of missing the targeted completion date.
  ETA: Estimated date for completion of the current task. Overall ETA for this work is 7/17.