Release:Release Automation on Mercurial:Troubleshooting


Paperwork

Some releases need build notes only when failures occur; others should already have a page created at release start (see the requirements). Either way, document all problems and their solutions.

Restarting failed builders without patching the config

Most builders can be safely rebuilt with the "rebuild" button. Individual locales can be triggered through the standalone repack builders. If you are not sure whether something is safe to restart, ask someone.

Tagging failed partway through

Don't try to recover from this; start a new build.

Re-spinning a single locale

WARNING: These instructions are extremely out of date. Consider them nothing more than a vague guideline, and think the process through yourself if you need to do it.

Individual locales sometimes need to be re-spun, for example after network timeouts or build slave failures. When this happens, the following steps recover and re-spin the missing locale(s):

  1. Manually re-tag the locale's repository (if necessary). You can find the appropriate revision to use for tagging in the shipped-locales file or on the l10n shipping dashboard.
  2. Delete the current build of the locale and clean up the l10n build dirs on the build slaves.
  3. Manually force the repack on each of the "$platform_standalone_repack" builders. See the Standalone_Repack_Builders section for details.
  4. Manually sign it and update the *SUMS files.
    • Download the new locale builds to the signing machine. You also need the SUMS files, the en-US Windows build (used for caching), and the zh-TW builds (monitoring tools check this locale to know when the directory has changed).
    • Run sign-files on the new builds.
    • Manually update the SUMS files with new md5/sha1 sums.
    • Remove the .asc file for the en-US Windows build.
    • Push the signed builds back to stage.
    • The Build Notes for 3.6b4 show an example of how this is done.
  5. Manually create a partial MAR for the locale.
    • Use an appropriate patcher_config file for your release.
    • On a Linux slave (preferably a fast one), download the builds with patcher2.pl.
    • Use patcher2.pl to create the update MAR files and snippets.
    • Ensure file (644) and directory (755) modes are correct for your created files.
    • Transfer the MAR files to stage.
    • Transfer the update snippets to the aus2 server(s); there may be more than one.
      • It is good practice to use a new directory name on the aus2 server to mark the new snippets as part of a distinct respin, e.g. 20091125-Firefox-3.6b4-fr-respin-test. Also add that new directory name to the list of directories to be run through backupsnip/pushsnip in the build notes.
    • The Build Notes for 3.6b4 show an example of how this is done.
  6. Re-run the update verify builder from the waterfall.
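
For the "manually update the SUMS files" step, one possible approach is sketched below. It assumes the SUMS files use the plain `<hash>  <path>` layout that `md5sum` and `sha1sum` write in text mode, and that the command is run from the directory the listed paths are relative to; this page confirms neither. `update_sum_entry` is a hypothetical helper name, not part of the release tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of the release tooling): refresh one entry in
# a checksum file that uses the "<hash>  <path>" layout of md5sum/sha1sum.
# Run it from the directory that the paths in the sums file are relative to.
# Usage: update_sum_entry <md5sum|sha1sum> <sums_file> <relative_path>
update_sum_entry() {
    tool=$1
    sums_file=$2
    path=$3
    tmp=$(mktemp) || return 1

    # Copy every existing line except the stale entry for this path.
    # The path is everything after the first two-space separator.
    if [ -f "$sums_file" ]; then
        awk -v p="$path" 'substr($0, index($0, "  ") + 2) != p' \
            "$sums_file" > "$tmp"
    fi

    # Append a fresh checksum line for the re-signed file.
    "$tool" "$path" >> "$tmp" || { rm -f "$tmp"; return 1; }

    # Overwrite in place so the sums file keeps its existing mode.
    cat "$tmp" > "$sums_file"
    rm -f "$tmp"
}
```

A call might look like `update_sum_entry sha1sum SHA1SUMS some/locale/build.mar`, where both the sums file name and the path are placeholders. Running `sha1sum -c SHA1SUMS` afterwards is a cheap way to confirm that every entry still matches the files on disk.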

Overwriting files that have been pushed to releases/

If a rebuild happens after an earlier build has already been pushed to mirrors, a few steps are needed to make sure that the new files can be pushed and that the CDN serves the new content. (This is always the case for beta respins, because automation pushes the prior build to mirrors.) Do the following before "push to mirrors" runs in the new build; if you're not in a rush, do it before kicking off the new release to make sure it happens in time:

  • Delete the directory from releases. For example:
# from any master...
ssh -i ~/.ssh/ffxbld_rsa ffxbld@stage.mozilla.org
# ffxbld@upload1
rm -rf /pub/mozilla.org/firefox/releases/19.0b20
  • File an IT bug to have the CDN caches purged. These should generally be filed as critical or blocker. Definitely file as a blocker if you're under time pressure.

If you don't delete the releases directory before "push to mirrors" runs, that builder and "check permissions" will both fail. Re-run them after you delete the existing contents of the directory.
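
Because the deletion is a bare `rm -rf` on the upload host, a small guard against typos may be worth having. The sketch below is only an illustration: `is_release_dir` and `remove_release_dir` are hypothetical names, and the accepted layout `/pub/mozilla.org/<product>/releases/<version>` is inferred from the single example path above.

```shell
#!/bin/sh
# Hypothetical guard: succeed only for paths shaped exactly like
# /pub/mozilla.org/<product>/releases/<version> (layout inferred from the
# example on this page, not from any official specification).
is_release_dir() {
    path=${1%/}                      # tolerate one trailing slash
    prefix=/pub/mozilla.org/
    case $path in
        "$prefix"*) rest=${path#"$prefix"} ;;
        *) return 1 ;;
    esac

    product=${rest%%/*}              # text before the first slash
    tail=${rest#*/}                  # text after the first slash
    [ "$tail" != "$rest" ] || return 1

    case $tail in
        releases/*) version=${tail#releases/} ;;
        *) return 1 ;;
    esac

    # Reject empty or dot components and anything deeper than <version>.
    case $product in ''|.|..) return 1 ;; esac
    case $version in ''|.|..|*/*) return 1 ;; esac
    return 0
}

# Delete a release directory only if it passes the shape check above.
remove_release_dir() {
    if is_release_dir "$1"; then
        rm -rf -- "$1"
    else
        echo "refusing to delete unexpected path: $1" >&2
        return 1
    fi
}
```

With this in place, `remove_release_dir /pub/mozilla.org/firefox/releases/` is refused, while a full path ending in a version directory is deleted as before.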

It's a good idea to verify that everything has been purged correctly, too. You can test the individual CDNs with the check_cdn script, giving it the path of a current file. A sample run showing a stale file on one CDN:

   $ ./check_cdn thunderbird/releases/34.0b1/update/linux-x86_64/zh-TW/thunderbird-33.0b1-34.0b1.partial.mar
   http://ftp.mozilla.org/pub/thunderbird/releases/34.0b1/update/linux-x86_64/zh-TW/thunderbird-33.0b1-34.0b1.partial.mar
   < Last-Modified: Thu, 13 Nov 2014 14:47:30 GMT
   < Content-Length: 16759161
   http://wildcard.cdn.mozilla.net.edgesuite.net/pub/thunderbird/releases/34.0b1/update/linux-x86_64/zh-TW/thunderbird-33.0b1-34.0b1.partial.mar
   < Last-Modified: Tue, 11 Nov 2014 00:00:54 GMT
   < Content-Length: 16759952
   http://cds.d6b5y3z2.hwcdn.net/pub/thunderbird/releases/34.0b1/update/linux-x86_64/zh-TW/thunderbird-33.0b1-34.0b1.partial.mar
   < Last-Modified: Thu, 13 Nov 2014 14:47:30 GMT
   < Content-Length: 16759161
   http://wpc.1237.edgecastcdn.net/pub/thunderbird/releases/34.0b1/update/linux-x86_64/zh-TW/thunderbird-33.0b1-34.0b1.partial.mar
   < Last-Modified: Thu, 13 Nov 2014 14:47:30 GMT
   < Content-Length: 16759161
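
The `check_cdn` script itself is not reproduced on this page. As a rough idea of what such a check involves, the sketch below builds one URL per CDN and prints the two headers of interest using `curl -sI`. The base URLs are copied from the sample run above and may no longer resolve; `cdn_urls` and `check_cdn_headers` are hypothetical names, and the real script may work differently.

```shell
#!/bin/sh
# Base URLs taken from the sample run on this page (assumed, possibly stale).
CDN_BASES="http://ftp.mozilla.org/pub
http://wildcard.cdn.mozilla.net.edgesuite.net/pub
http://cds.d6b5y3z2.hwcdn.net/pub
http://wpc.1237.edgecastcdn.net/pub"

# Print one full URL per CDN for a path relative to /pub.
cdn_urls() {
    rel=${1#/}                       # drop a leading slash if present
    for base in $CDN_BASES; do
        printf '%s/%s\n' "$base" "$rel"
    done
}

# Print each URL followed by its Last-Modified and Content-Length headers,
# prefixed with "< " to resemble the sample output.
check_cdn_headers() {
    cdn_urls "$1" | while IFS= read -r url; do
        printf '%s\n' "$url"
        curl -sI "$url" | tr -d '\r' |
            grep -iE '^(Last-Modified|Content-Length):' | sed 's/^/< /'
    done
}
```

Calling `check_cdn_headers thunderbird/releases/34.0b1/update/linux-x86_64/zh-TW/thunderbird-33.0b1-34.0b1.partial.mar` would request the same file as the sample run from each host in turn.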

The "final verify" builder should be rerun after the CDN is cleared. If final verify fails again, the CDNs may not have finished purging. Running the script above against the failing URLs will show when each URL is valid again. The current final-verify builder pulls from "random" CDNs, so a passing final verify doesn't mean all files have been purged successfully. (Note that individual CDNs may not be consistent; see bug 1099048 for an example.)
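
Since a passing final verify is not proof of a complete purge, it can help to compare every CDN against the origin explicitly. The sketch below reads output in the format of the sample run, treats the first URL as the origin, and prints each later URL whose `Last-Modified` or `Content-Length` differs. `find_stale_cdns` is a hypothetical name, and the assumption that the first URL is the authoritative copy comes only from the ordering in the sample.

```shell
#!/bin/sh
# Hypothetical filter: read check_cdn-style output on stdin and print the
# URLs whose headers differ from those of the first URL (assumed origin).
# A URL with missing headers is also reported, since it cannot be confirmed.
find_stale_cdns() {
    awk '
        /^[ \t]*https?:\/\// {
            sub(/^[ \t]+/, "")
            url = $0
            order[++n] = url
            next
        }
        /Last-Modified:/ {
            sub(/^.*Last-Modified:[ \t]*/, "")
            modified[url] = $0
        }
        /Content-Length:/ {
            sub(/^.*Content-Length:[ \t]*/, "")
            length_of[url] = $0
        }
        END {
            origin = order[1]
            for (i = 2; i <= n; i++) {
                u = order[i]
                if (modified[u] != modified[origin] ||
                    length_of[u] != length_of[origin])
                    print u
            }
        }
    '
}
```

Piping the sample run through it, as in `./check_cdn <path> | find_stale_cdns`, would print only the edgesuite.net URL. Because individual CDNs may answer inconsistently, an empty result from a single run is encouraging rather than conclusive, and repeating the check a few times is prudent.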