Sandbox:Release:Automation:Troubleshooting

Common Problems & Resolutions

Restarting the automation from a specific point

The automation can easily skip tagging, source generation, or en-US builds if you want to restart from source, en-US, or l10n repacks. To make use of this, add skip_$builder flags to the release config. For example, to restart from l10n repacks you will need to add the following lines to the release config:

releaseConfig['skip_tag'] = 1
releaseConfig['skip_source'] = 1
releaseConfig['skip_build'] = 1

Only these three steps (and repo_setup, if you're in staging) are skippable. If you need to restart in later parts of the automation, "Force Build" is the supported method of doing so.

Restarting failed builders without patching the config

It's often easier to use Force Build or buildbot sendchange to restart specific builders than it is to patch the configs, check them in, update the master, and so on. If you need to restart tag, source, an en-US build, l10nverify, updates, or update_verify you can use "Force Build" from the Buildbot waterfall.

If you need to restart a single l10n build (that is, one locale) on a single platform you may also use "Force Build", but be sure to set en_revision, l10n_revision, and locale as properties. The revision properties should be set to the _RELEASE tags, and locale to the locale code you want to rebuild (eg, fr).

If you need to restart many or all locales you'll have to fill out the web interface multiple times, until bug 518589 is resolved.

Restarting a builder from a certain point

In some cases only part of a Builder will successfully complete. For example, if the 'updates' builder was able to build partial MARs successfully but failed to upload them you will want to figure out what the problem is, and then restart it from the upload step. To do that, open up /tools/buildbotcustom/buildbotcustom/process/factory.py and comment out all steps before uploading. You should also find out which slave it was started on, and update the 'slavenames' for 'updates' in release_master.py to ensure the job gets started on the same slave.

Depending on how much of a builder is left to run, it can be faster and/or easier to simply run the remaining steps manually. For example, if backupsnip or pushsnip in 'updates' failed, it's probably easier to run those steps by hand than to go to the trouble of starting another job.
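
For example, re-running the snippet push by hand usually boils down to something like the following on the AUS host; the hostname, script locations, and directory name here are assumptions, so check the release's build notes for the real values:

# cltbld @ the aus2 host (hostname and script paths are assumptions)
SNIPPET_DIR=20091125-Firefox-3.6b4-test   # the directory the 'updates' builder pushed
~/bin/backupsnip $SNIPPET_DIR             # back up the live snippets first
~/bin/pushsnip $SNIPPET_DIR               # then push the new snippets live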

In other cases, it may make sense just to start from the beginning. Eg, if the build tools repository fails to clone, it's generally easier just to start from scratch.

Use your judgment here.

Tagging failed out part way through

If this happens, you'll need to restart tagging but have it skip the locales which have already been tagged. To do so, delete any locales which have already been tagged from l10n-changesets and reconfig. If the source repository has already been tagged you should pass 'l10n_repositories' to ReleaseTaggingFactory instead of 'repositories'. Note that the rest of the release automation uses shipped-locales, so removing locales from l10n-changesets won't stop them from being built later.
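
If you're not sure which locales have already been tagged, a quick check along these lines can tell you (a sketch only: the tag name is an example, branch releases live under releases/ rather than l10n-central, and l10n-changesets is assumed to contain 'locale changeset' pairs):

TAG=FIREFOX_3_6b4_RELEASE
for l in $(awk '{print $1}' l10n-changesets); do
    if wget -q -O- http://hg.mozilla.org/l10n-central/$l/raw-file/tip/.hgtags | grep -q $TAG; then
        echo "$l already tagged - remove it from l10n-changesets"
    fi
done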

Re-spinning a single locale

We sometimes run into the case where individual locales need to be re-spun, for example because of network timeouts or build slave failures. When this happens, follow these steps to recover and re-spin the missing locale(s):

  1. Manually re-tag the locale's repository (if necessary). You can find the appropriate revision to use for tagging from the shipped-locales file, or from the l10n shipping dashboard (see the sketch after this list).
  2. Delete the current build of the locale and clean up the l10n build dirs on the build slaves.
  3. Manually force the repack on each of the "$platform_standalone_repack" builders. See the Standalone_Repack_Builders section for details.
  4. Manually sign it and update the *SUMS files
    • You need to download the new locale builds to the signing machine, but you also need the SUMS files, the en-US Windows build (used for caching), and the zh-TW builds (monitoring tools check for this locale to know when the directory has changed).
    • Run sign-files on the new builds.
    • Manually update the SUMS files with new md5/sha1 sums.
    • Remove the .asc file for the en-US Windows build
    • Push the signed builds back to stage.
    • The Build Notes for 3.6b4 show an example of how this is done.
  5. Manually create a partial MAR for the locale
    • use an appropriate patcher_config file for your release.
    • on a linux slave (preferably a fast one), download the builds with patcher2.pl
    • use patcher2.pl to create the update MAR files and snippets.
    • ensure file (644) and directory (755) modes are correct for your created files.
    • transfer the MAR files to stage
    • transfer the update snippets to the aus2 server(s) (there may be more than one)
      • it is good practice to use a new directory name on the aus2 server to mark the new snippets as part of a distinct respin, e.g. 20091125-Firefox-3.6b4-fr-respin-test. Please also add that new directory name to the list of directories to be run through backupsnip/pushsnip in the build notes.
    • The Build Notes for 3.6b4 show an example of how this is done.
  6. Re-run the update verify builder from the waterfall.
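
As a reference for step 1, re-tagging a single locale by hand looks roughly like this; the locale, revision, and tag are placeholders, and branch releases live under releases/ rather than l10n-central:

LOCALE=fr                      # placeholder locale
REV=abcdef123456               # the changeset that should have shipped
TAG=FIREFOX_3_6b4_RELEASE      # placeholder tag
hg clone http://hg.mozilla.org/l10n-central/$LOCALE
cd $LOCALE
hg tag -f -r $REV $TAG
hg push ssh://hg.mozilla.org/l10n-central/$LOCALE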

Staging Specific Notes

Release automation in staging is mostly the same as in production, but does have a few differences you should know about:

  • Use the config files with the prefix "staging_". These have many values already set correctly for staging.
  • All uploading (builds, snippets, symbols) is done to dev-stage01.build.mozilla.org
  • It can take a few tries to get repo_setup to run properly. This is because hgweb sometimes returns a 500 (internal server error) when we query about a locale. The best solution is just to start the automation from scratch until it works, to make sure you get a clean run. If this is too frustrating for you, you can manually clone the repositories you care about and start automation from tag (see Restarting from a specific point).
  • Staging doesn't have a lot of slaves, and you may need to go around and stop currently running builds to get your automation run to finish in a reasonable period of time.

Staging specific preparation

Download the previous release

Because we point the staging releases at dev-stage01.build.mozilla.org the previous release must be downloaded to it. This is done by the "release_downloader" builder, which is fired by a sendchange sent by release_sanity.py. It automatically removes any existing candidates or release directory on dev-stage01.

The builder can be disabled by setting releaseConfig['skip_release_download'] = True
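
If you need to clear out an existing candidates or release directory by hand instead (for example before re-running the builder), it amounts to roughly the following; the paths match the download script in the next section:

# ffxbld @ dev-stage01 -- remove leftovers from a previous staging run
export VERSION=3.5.3
export PRODUCT=firefox
rm -rf /home/ftp/pub/$PRODUCT/nightly/$VERSION-candidates
rm -f /home/ftp/pub/$PRODUCT/releases/$VERSION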

Doing it by hand

If you need to download the previous release by hand for some reason you can use the following shell script to do so. Note that the variables are for the *previous* release, not the one you will be running:

# ffxbld @ dev-stage01
export VERSION=3.5.3
export BUILD=1
export PRODUCT=firefox
cd /home/ftp/pub/$PRODUCT/nightly
mkdir -p $VERSION-candidates/build$BUILD
cd $VERSION-candidates/build$BUILD
wget -r -np -nH --cut-dirs=6 -R "index.html*" -R "*unsigned*" http://stage.mozilla.org/pub/mozilla.org/$PRODUCT/nightly/$VERSION-candidates/build$BUILD/
cd /home/ftp/pub/$PRODUCT/releases
ln -s /home/ftp/pub/$PRODUCT/nightly/$VERSION-candidates/build$BUILD $VERSION

If you're doing a test run with a limited number of locales you may delete any locales you don't care about after the above script finishes (or add another -R "*locale*" pattern to the wget command for each unwanted locale).

Buildbot configs

When working in staging you'll need to swap out the buildbot-configs, build/tools, buildbotcustom, and compare-locales repos for either stage-ffxbld ones, or your own. Here's an example of the changes you'll need:

diff --git a/mozilla/config.py b/mozilla/config.py
--- a/mozilla/config.py
+++ b/mozilla/config.py
@@ -43,1 +43,1 @@ GLOBAL_VARS = {
-    'compare_locales_repo_path': 'build/compare-locales',
+    'compare_locales_repo_path': 'users/bhearsum_mozilla.com/compare-locales',
diff --git a/mozilla/staging_config.py b/mozilla/staging_config.py
--- a/mozilla/staging_config.py
+++ b/mozilla/staging_config.py
@@ -37,2 +37,2 @@ GLOBAL_VARS = {
-    'config_repo_path': 'build/buildbot-configs',
-    'buildbotcustom_repo_path': 'build/buildbotcustom',
+    'config_repo_path': 'users/bhearsum_mozilla.com/buildbot-configs',
+    'buildbotcustom_repo_path': 'users/bhearsum_mozilla.com/buildbotcustom',
diff --git a/mozilla/staging_release-firefox-mozilla-1.9.2.py b/mozilla/staging_release-firefox-mozilla-1.9.2.py
--- a/mozilla/staging_release-firefox-mozilla-1.9.2.py
+++ b/mozilla/staging_release-firefox-mozilla-1.9.2.py
@@ -106,1 +106,1 @@ releaseConfig['doPartnerRepacks']    = F
-releaseConfig['partnersRepoPath']    = 'users/stage-ffxbld/partner-repacks'
+releaseConfig['partnersRepoPath']    = 'users/armenzg_mozilla.com/partner-repacks'
@@ -132,1 +132,1 @@ releaseConfig['enable_repo_setup'] = Tru
-releaseConfig['build_tools_repo_path'] = "users/stage-ffxbld/tools"
+releaseConfig['build_tools_repo_path'] = "users/asasaki_mozilla.com/tools"

As with production releases, you must tag these repositories with the _RELEASE tag prior to starting the release.
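
A rough sketch of tagging those repositories by hand (the tag name is an example, and the user paths should point at whichever copies your configs reference):

TAG=FIREFOX_3_6b4_RELEASE
for repo in buildbot-configs buildbotcustom tools compare-locales; do
    hg clone ssh://hg.mozilla.org/users/stage-ffxbld/$repo
    cd $repo
    hg tag -f $TAG
    hg push
    cd ..
done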


It is possible to use your own hg repositories (for long term release tests, parallel runs, etc). Instead of running the repo_setup builder, you can use the following script (run it on your laptop, using your LDAP credentials):

#!/bin/bash

for repo in compare-locales tools buildbotcustom buildbot buildbot-configs partner-repacks; do
    echo "deleting $repo"
    ssh hg.mozilla.org edit $repo delete YES
    echo "cloning $repo"
    ssh hg.mozilla.org clone $repo build/$repo
done
 
for l in `wget -q -O- http://hg.mozilla.org/mozilla-central/raw-file/tip/browser/locales/shipped-locales |grep -v en-US | awk '{print $1}'`; do
    echo "deleting locale $l"
    ssh hg.mozilla.org edit $l delete YES
    echo "cloning locale $l"
    ssh hg.mozilla.org clone $l l10n-central/$l
done

ssh hg.mozilla.org edit mozilla-central delete YES
ssh hg.mozilla.org clone mozilla-central mozilla-central

Doing a test run with a limited number of locales

To run a test with a limited number of locales, do the following:

  1. Modify l10n-changesets to include whichever locales you want (see the example below).
  2. Reconfig Buildbot and start the automation as normal
  3. Once the tag builder is finished and before the end of the first en-US build, clone the staging source repository (eg, http://hg.mozilla.org/users/stage-ffxbld/mozilla-1.9.1) and do the following:
hg up -C GECKO191_20090115_RELBRANCH # replace the relbranch appropriately, of course
# modify browser/locales/shipped-locales to include the same locales as l10n-changesets
hg commit -m "Reduce number of locales for this test run"
hg tag -f FIREFOX_3_1b3_RELEASE # using the right version number in the tag
hg push ssh://hg.mozilla.org/users/stage-ffxbld/mozilla-1.9.1

If you don't want to clone the full repo you can do this on the slave that did the tagging.
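
For reference, a trimmed l10n-changesets for a two-locale test run looks roughly like this (the revisions are placeholders; use the real changesets for your locales, and match whatever format the existing file uses):

de abcdef012345
fr 123456abcdef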

How to sign in staging

Signing with the staging keys

When you're specifically testing something related to signing or doing a full end-to-end run it's best to sign the builds on the staging signing server. Doing so is very similar to signing production builds and is fully documented in the CombinedSigning doc.

Faking it out

If you're not looking to test signing you can speed up your staging run a bit by shuffling files around so post-signing steps can find them. To do this, log onto dev-stage01.build.mozilla.org and do the following:

# ffxbld@dev-stage01
VERSION=3.5rc1
BUILD=1
cd /home/ftp/pub/firefox/nightly/${VERSION}-candidates/build${BUILD}
mkdir win32 update/win32
rsync -av --exclude=*.zip unsigned/win32/ win32/
rsync -av unsigned/update/win32/ update/win32/
rsync -av unsigned/win32_info.txt .
echo "faked" > win32_signing_build${BUILD}.log

We purposely make copies here rather than symlinks for a couple of reasons: the l10n verify scripts barf when they encounter zip files (hence the --exclude above), and the 'updates' factory will blow away the complete MARs upon upload if update/win32 is a symlink. The echo creates the log file the automation looks for before continuing on to l10n verify and updates.

Creating a CVS mirror for patcher and configs

If you need a new or modified patcher config, which shouldn't be checked into production CVS, you can set up a local CVS mirror using the following method:

WHO=yournamehere
# cltbld@dev-stage01
cd /builds/cvsmirrors
mkdir -p ${WHO}/cvsroot.clean/mozilla/tools/
rsync -av --exclude=CVSROOT/config --exclude=CVSROOT/loginfo cvs-mirror.mozilla.org::mozilla/CVSROOT /builds/cvsmirrors/${WHO}/cvsroot.clean/
rsync -av cvs-mirror.mozilla.org::mozilla/mozilla/tools/patcher-configs /builds/cvsmirrors/${WHO}/cvsroot.clean/mozilla/tools/
rsync -av cvs-mirror.mozilla.org::mozilla/mozilla/tools/patcher /builds/cvsmirrors/${WHO}/cvsroot.clean/mozilla/tools/
rsync -av cvs-mirror.mozilla.org::mozilla/mozilla/tools/release /builds/cvsmirrors/${WHO}/cvsroot.clean/mozilla/tools/
rsync -a --delete-after /builds/cvsmirrors/${WHO}/cvsroot{.clean,}/

To make changes, check out using:

cvs -d dev-stage01.build.mozilla.org:/builds/cvsmirrors/${WHO}/cvsroot co mozilla/

and specify a cvsroot of :ext:cltbld@dev-stage01.build.sjc1.mozilla.com:/builds/cvsmirrors/${WHO}/cvsroot in the config for the release automation.
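
One gotcha with the :ext: method used above: depending on your cvs version it may default to rsh, so point CVS_RSH at ssh before checking out. A sketch, using the same mirror as the checkout above:

export CVS_RSH=ssh
cvs -d :ext:cltbld@dev-stage01.build.mozilla.org:/builds/cvsmirrors/${WHO}/cvsroot co mozilla/tools/patcher-configs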