CIDuty/Other Duties

From MozillaWiki

Tree Maintenance

Repo Errors

If a dev reports a problem pushing to hg (either m-c or try repo) then you need to do the following:

  • File a bug (or have the developer file it), then ping noahm in #ops
    • If he doesn't respond, escalate the bug to page on-call
  • Follow the steps below for "How do I close the tree"

How do I see problems in Treeherder?

All "infrastructure" (that's us!) problems should show up as purple jobs at https://treeherder.mozilla.org. Some don't, so also keep an eye on IRC, and jump on any purples quickly.

How do I close the tree?

See ReleaseEngineering/How_To/Close_or_Open_the_Tree

How do I claim a rentable project branch?

See ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE

Re-run jobs

How to trigger Talos jobs

See ReleaseEngineering/How_To/Trigger_Talos_Jobs

How to re-trigger all Talos runs for a build (by using sendchange)

See ReleaseEngineering/How_To/Trigger_Talos_Jobs

How to re-run a build

Do not go to the page of the build you'd like to re-run and cook up a sendchange to try to re-create the change that caused it. Changes without revlinks trigger releases, which is not what you want.

Find the revision you want, find a builder page for the builder you want (preferably, but not necessarily, on the same master), and enter the revision, your name, and a comment into the "Force Build" form. Note that you MUST specify the branch, so that there are no null keys in builds-running.js; otherwise your build will not show up in self-serve or Treeherder.

Nightlies

How do I re-spin mozilla-central nightlies?

To build new nightlies

To build nightlies on all the platforms

Note: as of Jan 17, 2017 we build nightlies for Android and Linux platform builds on Taskcluster. As of June 21, 2017 we build nightlies for Macosx on Taskcluster. As of July 26, 2017, we build nightlies for Windows builds on Taskcluster. In order to retrigger these nightlies by hand, you need to use the taskcluster tools.

Log in to the Taskcluster tools at https://tools.taskcluster.net and go to the tools.

To retrigger nightlies for all desktop platforms (there is no hook for just Linux nightlies) https://tools.taskcluster.net/hooks/#project-releng/nightly-desktop%252fmozilla-central

To retrigger nightlies for Android https://tools.taskcluster.net/hooks/#project-releng/nightly-fennec%252fmozilla-central

To retrigger nightlies for Macosx https://tools.taskcluster.net/hooks/project-releng/nightly-desktop-osx%2Fmozilla-central

To retrigger nightlies for Windows (both win32 and win64) https://tools.taskcluster.net/hooks/project-releng/nightly-desktop-win%2Fmozilla-central

Select the green "trigger hook" button at the bottom of the page.

If you get an error about scopes, you might have to create a client ID for this hook. See https://tools.taskcluster.net/auth/clients/ for an example, and enter "mozilla-ldap/kmoir@mozilla.com/" in the "Client ids beginning with" field.

Where are nightly and dependent artifacts stored?

With the move to taskcluster, build artifacts are no longer stored on archive.mozilla.org. Instead, they can be downloaded in two ways:

1. Treeherder. With buildbot, building, signing and repackaging Firefox into the correct format for each platform happened in a single job, the nightly build. With Taskcluster, the installable artifacts are produced by separate jobs: for Android and Linux they are in the Ns (nightly signing) job, and for Mac and Windows they are in the Nr (nightly repackage) job. Here is a filter for mozilla-central that displays the jobs you need in order to download the artifacts.

Platform     Nightly Job     Artifact to download
Android*     Ns              target.apk
Linux*       Ns              target.tar.bz2
Mac          Nr              target.dmg
Win*         Nr              installer.exe

For non-nightly (dep) builds, a useful Treeherder filter is here (mozilla-inbound as an example; update as appropriate). In this case, the job symbols are B for Android, Linux and Mac, and Bs for Windows (a build signed on push, which some Windows tests require).

Platform     Dep Job Symbol    Artifact to download
Android*     B                 target.apk
Linux*       B                 target.tar.bz2
Mac          B                 target.dmg
Win*         Bs                target.zip

2. Indexes. Taskcluster indexes identify the artifacts associated with each job. Look here for the latest desktop Taskcluster nightlies and here for mobile. Again, the jobs and artifacts associated with each nightly correspond to the chart above. Click on the appropriate job, then "Taskid" on the right, then "Run artifacts" on the right. For dep builds, look here for desktop or here for mobile.
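The two tables above can be folded into a small lookup, for example when scripting downloads. This is a hypothetical helper (`artifact_for` is not an existing tool): it only encodes the job symbols and artifact names from the tables, and the keywords `nightly`/`dep` and `android`/`linux`/`mac`/`win` are shorthand chosen for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<job symbol> <artifact>" for a build type and
# platform, using only the values from the two tables above.
# Usage: artifact_for <nightly|dep> <android|linux|mac|win>
artifact_for() {
  local kind="$1" platform="$2"
  case "${kind}:${platform}" in
    nightly:android) echo "Ns target.apk" ;;
    nightly:linux)   echo "Ns target.tar.bz2" ;;
    nightly:mac)     echo "Nr target.dmg" ;;
    nightly:win)     echo "Nr installer.exe" ;;
    dep:android)     echo "B target.apk" ;;
    dep:linux)       echo "B target.tar.bz2" ;;
    dep:mac)         echo "B target.dmg" ;;
    dep:win)         echo "Bs target.zip" ;;
    *) echo "unknown build type/platform: ${kind} ${platform}" >&2; return 1 ;;
  esac
}
```

For example, `artifact_for nightly win` prints `Nr installer.exe`, telling you to open the Nr job and download installer.exe.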

Disable updates

See ReleaseEngineering/How_To/Shut_off_all_updates for global shutoff. We use Balrog now for nightly & aurora updates.

Freeze Updates

See ReleaseEngineering/How_To/Enable_or_Disable_Updates_on_Central if you simply need to freeze updates and not completely disable.

Talos

How to update the talos zips

We only need to do this for mobile requests.

This deployment is very safe: it is NPOTB (not part of the build).

# running this from cruncher is faster than downloading/uploading from your localhost
ssh -A cruncher
# the remaining commands run on cruncher
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/talos.07322bbe0f7d.zip
export TALOS_ZIP=$(basename "$URL")
wget "$URL"
# relengwebadm has limited access to the internet; that is why we scp from another host
scp "${TALOS_ZIP}" relengwebadm.private.scl3.mozilla.com:/mnt/netapp/relengweb/talos-bundles/zips
ssh relengwebadm.private.scl3.mozilla.com "chmod 644 /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
# compare this checksum with the one in the bug
ssh relengwebadm.private.scl3.mozilla.com "sha1sum /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
# confirm the new zip is being served
curl -I "http://talos-bundles.pvt.build.mozilla.org/zips/${TALOS_ZIP}"

For talos.zip changes: Once deployed, notify the a-team and let them know that they can land at their own convenience.

  • Please verify that the SHA-1 sum matches the one in the [comment]; we have had a few instances where the talos.zip was incorrect.
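Comparing a 40-character checksum by eye is error-prone. A minimal sketch of the comparison, assuming you copy the expected SHA-1 value from the bug comment by hand; `verify_sha1` is a hypothetical helper, not an existing tool.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's SHA-1 against the value posted in the bug.
# The expected value is supplied by you (copied from the bug comment).
verify_sha1() {
  local file="$1" expected="$2" actual
  # sha1sum prints "<hash>  <filename>"; keep only the hash
  actual=$(sha1sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file matches $expected"
  else
    echo "MISMATCH: $file is $actual, bug says $expected" >&2
    return 1
  fi
}
```

Run it on relengwebadm after the upload, e.g. `verify_sha1 /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP} <sha1 from the bug>`; a non-zero exit status means the zip should not be announced to the a-team.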

Update mobile talos webhosts

Keep track of which revision is being run: copy and paste the output into the bug, and please update our maintenance page.

This could affect mobile talos numbers or break the jobs altogether, so please coordinate with the sheriffs.

NOTE: There's a great deal of data we cannot check into revision control for legal reasons, so there's an extensive .hgignore file. If you're adding new data to the tree that cannot be checked in, please ask a talos developer/reviewer or [file a bug] to request any [.hgignore] changes.

webapp cluster

ssh relengwebadm.private.scl3.mozilla.com
sudo su -
cd /data/releng/src/talos-remote/www/talos-repo
# NOTICE that we have uncommitted files
hg status
# Take note of the current revision to revert to (just in case)
hg id
hg pull -u
# 488bc187a3ef tip
# ..capture the output here; the remainder will be long and not that useful..
/data/releng/src/talos-remote/update

Tp4 Zip

###
### NOTE UNTESTED AFTER BUG 1050769 -- Please remove this warning next use
###
ssh -A cruncher
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/mobile_tp4.zip
export TALOS_ZIP=$(basename "$URL")
wget "$URL"
scp "${TALOS_ZIP}" "$(whoami)"@relengwebadm.private.scl3.mozilla.com:.

# Connect to root @ relengwebadm
ssh "$(whoami)"@relengwebadm.private.scl3.mozilla.com
sudo su -
export ME=jwood  ## BE SURE TO CHANGE to your username
export ZIP=mobile_tp4.zip
chmod 644 /home/$ME/$ZIP
sha1sum /home/$ME/$ZIP
cd /data/releng/src/talos-remote/www/
# extract into the current directory (unzip takes -d for the destination directory)
unzip /home/$ME/$ZIP -d ./
# finally update the web heads
cd ../
./update

Tp5n.zip

export URL=http://people.mozilla.org/~jmaher/taloszips/zips/tp5n.zip
export TALOS_ZIP=$(basename "$URL")
wget "$URL"
scp "${TALOS_ZIP}" "$(whoami)"@relengwebadm.private.scl3.mozilla.com:.

# Connect to root @ relengwebadm
ssh "$(whoami)"@relengwebadm.private.scl3.mozilla.com
sudo su -
export ME=jwood  ## BE SURE TO CHANGE
export ZIP=tp5n.zip
export BUG=1288135   ## BE SURE TO CHANGE
chmod 644 /home/$ME/$ZIP
sha1sum /home/$ME/$ZIP
cd /mnt/netapp/relengweb/talos-bundles/zips/
# keep a copy of the current zip, named after the bug, before replacing it
cp ./tp5n.zip ./tp5n.before_bug_$BUG.zip
mv /home/$ME/$ZIP ./  # YES to overwrite here
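The last two commands keep a per-bug backup of the old tp5n.zip before overwriting it. The same step written as a function, as a sketch only: `replace_with_backup` is hypothetical, and its refusal to overwrite an existing backup for the same bug is an extra safety check that is not part of the steps above.

```shell
#!/usr/bin/env bash
# Sketch: back up <target> as <name>.before_bug_<bug>.zip, then move <new> over it.
# Usage: replace_with_backup <target.zip> <new.zip> <bug number>
replace_with_backup() {
  local target="$1" new="$2" bug="$3"
  local backup="${target%.zip}.before_bug_${bug}.zip"
  # extra check (not in the original steps): never clobber an earlier backup
  if [ -e "$backup" ]; then
    echo "backup $backup already exists; not overwriting" >&2
    return 1
  fi
  # -p keeps the original mode and timestamps on the backup copy
  cp -p "$target" "$backup" || return 1
  mv -f "$new" "$target"
}
```

From /mnt/netapp/relengweb/talos-bundles/zips/ this would be called as `replace_with_backup tp5n.zip /home/$ME/$ZIP "$BUG"`.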

Ganglia

  • If a host is reporting to Ganglia incorrectly, restarting gmond may be all it takes to fix it (e.g. bug 674233). As root:
service gmond restart

Queue Directories

If you see this in #build:

<nagios-sjc1> [54] buildbot-master12.build.scl1:Command Queue is CRITICAL: 4 dead items

It means that there are items in the "dead" queue for the given master. You need to look at the logs and fix any underlying issue and then retry the command by moving *only* the json file over to the "new" queue. See the Queue directories wiki page for details.
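Once the underlying issue is fixed, the retry is a single move. A minimal sketch, assuming the queue root on the master has "dead" and "new" subdirectories (check the Queue directories page for the real layout); `retry_dead_item` is a hypothetical helper, and you pass it only the JSON item, never its logs.

```shell
#!/usr/bin/env bash
# Sketch: retry one dead command-queue item by moving it to the "new" queue.
# ASSUMPTION: <queue_dir>/dead and <queue_dir>/new exist on the master.
# Usage: retry_dead_item <queue_dir> <item file name in dead/>
retry_dead_item() {
  local queue_dir="$1" item="$2"
  if [ ! -f "$queue_dir/dead/$item" ]; then
    echo "no such dead item: $queue_dir/dead/$item" >&2
    return 1
  fi
  if [ ! -d "$queue_dir/new" ]; then
    echo "missing new queue: $queue_dir/new" >&2
    return 1
  fi
  mv "$queue_dir/dead/$item" "$queue_dir/new/"
}
```

List the candidates first with `ls -l <queue_dir>/dead`, read their logs, and then retry items one at a time.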

Cruncher

If you get an alert about cruncher running out of space it might be a sendmail issue (backed up emails taking up too much space and not getting sent out):

<nagios-sjc1> [07] cruncher.build.sjc1:disk - / is WARNING: DISK WARNING - free space: / 384 MB (5% inode=93%):

As root:

du -s -h /var/spool/*
# confirm that mqueue or clientmqueue is the oversized culprit
# stop sendmail, clean out the queues, restart sendmail
/etc/init.d/sendmail stop
rm -rf /var/spool/clientmqueue/*
rm -rf /var/spool/mqueue/*
/etc/init.d/sendmail start
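The `du` check above can be narrowed to the single largest entry before you delete anything. A sketch, with `largest_spool_dir` as a hypothetical helper name; only go ahead with the sendmail cleanup if it reports mqueue or clientmqueue.

```shell
#!/usr/bin/env bash
# Sketch: print the largest entry under a spool directory (default /var/spool).
# du -sk prints "<size in KB><TAB><path>"; sort numerically and keep the last line.
largest_spool_dir() {
  local root="${1:-/var/spool}"
  du -sk "$root"/* 2>/dev/null | sort -n | tail -n 1 | cut -f2
}
```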

hg<->git conversion

This is a production system that RelEng built but that has not yet transitioned to full IT operation. As a production system, it is supported 24x7x365; escalate to IT on-call (who can page) as needed.

We'll get problem reports from two sources:

  • via email from the vcs2vcs user to release+vcs2vcs@m.c; see the email handling instructions for those.
  • via a bug report for a customer-visible condition; this should only happen for a new error we aren't detecting ourselves. See the resources below and/or page hwine.

Documentation for this system:

All services run as user vcs2vcs on one of the following hosts (as of 2013-01-07): github-sync1-dev.dmz.scl3.mozilla.com, github-sync1.dmz.scl3.mozilla.com, github-sync2.dmz.scl3.mozilla.com, github-sync3.dmz.scl3.mozilla.com.

Handling alert_major_errors

# SSH as yourself to the hostname in the 'from' address of the alert_major_errors email.
$ ssh yourname@github-sync3.dmz.scl3.mozilla.com
$ sudo su - vcs2vcs
$ cd etc
# find the repo name that vcs2vcs is complaining about. For example:
$ grep releases-mozilla-central-no-cvs *
job02_cmds:#    "hg:$HOME/repos/releases-mozilla-central-no-cvs" "github"
# discover where that job runs
$ grep job02 status
job02_cmds,github-sync3.dmz.scl3.mozilla.com,m-c w/o cvs as used by b2g
# connect to that host the same as we did above (if not already connected)
# then
$ cd logs/job02 # same job as above
$ show_update_errors update.log
# Note: the command exit code precedes the command itself
# eg. ...;255;hg --cwd...

Continue with instructions here.
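The two `grep` steps in the walkthrough (repo name to job file, job file to host) can be combined. A sketch only: `find_job_host` is hypothetical, and it assumes, purely from the sample output above, that each job is a file in the etc directory (such as job02_cmds) and that `status` holds comma-separated lines of job file, host and description.

```shell
#!/usr/bin/env bash
# Sketch: report which job file mentions a repo and which host runs that job.
# ASSUMPTIONS (inferred from the sample output above): job files live in the
# etc directory, and "status" lines look like "<job file>,<host>,<description>".
# Usage: find_job_host <etc dir> <repo name>
find_job_host() {
  local etc_dir="$1" repo="$2" path job_file
  # grep -l prints only the names of files that mention the repo
  grep -l -- "$repo" "$etc_dir"/* 2>/dev/null | while read -r path; do
    job_file=$(basename "$path")
    # the status file itself is not a job
    [ "$job_file" = "status" ] && continue
    # print "<job file> <host>" for the matching status line
    awk -F, -v job="$job_file" '$1 == job { print $1, $2 }' "$etc_dir/status"
  done
}
```

As the vcs2vcs user, `find_job_host ~/etc releases-mozilla-central-no-cvs` would, with files like the sample above, print `job02_cmds github-sync3.dmz.scl3.mozilla.com`, i.e. the host to connect to before running `show_update_errors`.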

disable/re-enable aurora updates

This is taken care of by the person doing the final release, since merge day activities happen on the Monday before the release.

Upload

Python packages

Warning: Mozharness no longer uses packages from the PuppetAgain repositories! Instead, it uses http://pypi.pub.build.mozilla.org/pub and http://pypi.pvt.build.mozilla.org/pub, both served from the same directory.

See https://hg.mozilla.org/build/braindump/file/default/utils/publish_package_our_pypi.sh

Download the tool above, and then run this from your local machine:

publish_package_our_pypi.sh <your_python_package.tar.gz>

How to upload to Tooltool

See ReleaseEngineering/Applications/Tooltool#How_to_upload_to_tooltool

How to enable a user to run Tooltool uploads

See ReleaseEngineering/Applications/Tooltool#How_to_enable_a_user_to_run_tooltool_uploads.

How to upload Talos ZIPs

See How to update the talos zips.

How to add NPM packages

See ReleaseEngineering/How To/Mirror NPM Packages

How to upload new xre.zip files for B2G tests

  • You can use the script at https://github.com/jonallengriffin/xregen/blob/master/xre_gen.sh to generate a new xre.zip for any OS, based on a gecko release version. If you need an xre.zip for which there are only nightly builds (but not release builds), you can use xre.zip as a guide for how to construct the package, but you'll need to do it manually.
  • After you create the xre.zip files (currently needed for linux64 and macosx64), upload them to tooltool, and then update the relevant mozharness config files, currently:
    • b2g/gaia_unit_production_config.py
    • b2g/gaia_integration_config.py
    • marionette/gaia_ui_test_prod_config.py