CIDuty/Other Duties

From MozillaWiki

Revision as of 17:07, 20 November 2013

Tree Maintenance

Repo Errors

If a dev reports a problem pushing to hg (either m-c or try repo) then you need to do the following:

  • File a bug (or have the dev file it), then ping noahm in #ops
    • If he doesn't respond, escalate the bug so that it pages the on-call
  • Follow the steps below for "How do I close the tree"

How do I see problems in TBPL?

All "infrastructure" (that's us!) problems should be purple at http://tbpl.mozilla.org. Some aren't, so keep your eyes open in IRC, but get on any purples quickly.

How do I close the tree?

See ReleaseEngineering/How_To/Close_or_Open_the_Tree

How do I claim a rentable project branch?

See ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE

Clean up the scheduler DB

Sometimes we get some jobs pending for days: https://secure.pub.build.mozilla.org/buildapi/pending

Here's how to clean them: TODO

Re-run jobs

How to trigger Talos jobs

see ReleaseEngineering/How_To/Trigger_Talos_Jobs

How to re-trigger all Talos runs for a build (by using sendchange)

see ReleaseEngineering/How_To/Trigger_Talos_Jobs

How to re-run a build

Do not go to the page of the build you'd like to re-run and cook up a sendchange to try to re-create the change that caused it. Changes without revlinks trigger releases, which is not what you want.

Find the revision you want, find a builder page for the builder you want (preferably, but not necessarily, on the same master), and plug the revision, your name, and a comment into the "Force Build" form. Note that you MUST specify the branch so that there are no null keys in builds-running.js; otherwise your build will not show up in self-serve or TBPL.

Nightlies

How do I re-spin mozilla-central nightlies?

To rebuild the same nightly, buildbot's Rebuild button works fine.

To build a different revision, Force build all builders matching /.*mozilla-central.*nightly/, on any of the regular build masters. Set revision to the desired revision. With no revision set, the tip of the default branch will be used, but it's probably best to get an explicit revision from hg.mozilla.org/mozilla-central. (For b2g, the revision set can only be the mercurial gecko revision.)

You can use https://build.mozilla.org/buildapi/self-serve/mozilla-central to initiate this build, using the changeset at the tip of http://hg.mozilla.org/mozilla-central. Sometimes the developer will request a specific changeset in the bug.

To respin just the android nightlies, find the revisions in the fennec*txt file here and here. Then kick off a build (specifying the revision in the revision field) for armv6 and armv7 and 4.2.

Mozilla-aurora nightlies:

To start a new b2g Unagi nightly, force a build on a build master such as bm58. You may want to provide a value for the 'buildid' property such as 20130828155234 (which represents a Pacific timezone date/time).
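
A value in that format can be generated with date. This is a minimal sketch, assuming the builder expects a 14-digit YYYYMMDDHHMMSS timestamp in Pacific time (inferred from the example value above, not confirmed against the builder configuration). make_buildid is a hypothetical helper name.

```shell
# Print a buildid such as 20130828155234 (YYYYMMDDHHMMSS, Pacific time).
# Assumption: the format is inferred from the example value in this section.
make_buildid() {
    TZ=America/Los_Angeles date +%Y%m%d%H%M%S
}

# Usage: paste the printed value into the 'buildid' property of the form.
# make_buildid
```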

Builder links to start new mozilla-central Windows nightlies:

WINNT 5.2 mozilla-central nightly
WINNT 5.2 mozilla-central xulrunner nightly
WINNT 6.1 x86-64 mozilla-central nightly

Remember to set branch: mozilla-central, revision: <revision>

Trigger B2G device image nightlies

The current builder list as of November 6, 2013 is:

http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_hamachi_nightly
http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_hamachi_eng_nightly
http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_helix_nightly
http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_inari_nightly
http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_inari_eng_nightly
http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_leo_nightly
http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_leo_eng_nightly
http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_nexus-4_nightly
http://buildbot-master65.srv.releng.usw2.mozilla.com:8001/builders/b2g_mozilla-central_unagi_nightly

Disable updates

If you're asked to disable updates for any reason, you can do it from aus3-staging. Depending on what you're asked to shut off, you'll have to chmod a different directory (or directories) to 700. Log on to aus3-staging.mozilla.org with your LDAP account and use 'sudo su - ffxbld' (or tbirdbld) to gain the correct privileges. Some examples of shutting off different updates are below:

  • 64-bit Windows on the ux branch:
chmod 700 /opt/aus2/incoming/2/Firefox/ux/WINNT_x86_64-msvc
  • All updates on Nightly:
chmod 700 /opt/aus2/incoming/2/Firefox/mozilla-central
  • Linux (32-bit + 64-bit) updates on Aurora:
chmod 700 /opt/aus2/incoming/2/Firefox/mozilla-aurora/Linux_x86-gcc3 /opt/aus2/incoming/2/Firefox/mozilla-aurora/Linux_x86_64-gcc3
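
This section does not say which mode the directories have before they are shut off, so it may help to record it before running chmod. The sketch below is one way to do that, assuming GNU stat is available on the host; the function names and the log file location are hypothetical.

```shell
# Disable updates for one or more AUS directories, recording each directory's
# current mode first so that it can be restored later.
# Assumptions: GNU stat (for `stat -c %a`); the log file path is up to you.
disable_updates() {
    mode_log=$1; shift
    for dir in "$@"; do
        printf '%s %s\n' "$(stat -c %a "$dir")" "$dir" >> "$mode_log"
        chmod 700 "$dir"
    done
}

# Restore the modes recorded by disable_updates.
restore_updates() {
    while read -r mode dir; do
        chmod "$mode" "$dir"
    done < "$1"
}

# Usage:
# disable_updates ~/aus-modes.log /opt/aus2/incoming/2/Firefox/mozilla-central
# restore_updates ~/aus-modes.log
```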

Talos

How to update the talos zips

We only need to do this for mobile requests.

This deployment is super safe. NPOTB

# on your localhost
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/talos.07322bbe0f7d.zip
export TALOS_ZIP=`basename $URL`
wget $URL
# wget from people doesn't work anymore
export RELENGWEB_USER=`whoami`
scp ${TALOS_ZIP} ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com:/mnt/netapp/relengweb/talos-bundles/zips
ssh ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com "chmod 644 /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
ssh ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com "sha1sum /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
ssh cruncher "curl -I http://talos-bundles.pvt.build.mozilla.org/zips/${TALOS_ZIP}"
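
The sha1sum step above only prints the remote digest. To check that the upload matches the local file, the two digests can be compared; a small sketch follows, where same_digest is a hypothetical helper and the usage reuses the variables exported above.

```shell
# Succeed only if two sha1sum output lines carry the same digest.
# sha1sum prints "<digest>  <path>"; only the first field is compared.
same_digest() {
    [ "${1%% *}" = "${2%% *}" ]
}

# Usage (on your localhost, after the scp step):
# local_sum=$(sha1sum "${TALOS_ZIP}")
# remote_sum=$(ssh ${RELENGWEB_USER}@relengwebadm.private.scl3.mozilla.com \
#     "sha1sum /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}")
# same_digest "$local_sum" "$remote_sum" || echo "checksum mismatch" >&2
```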

Note that you can get to root by running |sudo su -|

For talos.zip changes: Once deployed, notify the a-team and let them know that they can land at their own convenience.

Update mobile talos webhosts

Keep track of which revision is being run. Copy/paste the output into the bug. Please update our maintenance page.

This could affect mobile talos numbers or break the jobs altogether. Please coordinate with sheriffs

NOTE: For now, we're only using the "old, mtv1" setup. Update both but only talk about the old setup

new, webapp cluster

ssh relengwebadm.private.scl3.mozilla.com
sudo su -
cd /data/releng/src/talos-remote/www/talos-repo
# NOTICE that we have uncommitted files
hg st
# ? talos/page_load_test/tp4
# Take note of the current revision to revert to (just in case)
hg id
hg pull -u
# 488bc187a3ef tip
# ..capture the output here; the remainder will be long and not that useful..
/data/releng/src/talos-remote/update

old, mtv1

We have a load balancer (bm-remote) in front of three web hosts (bm-remote-talos-0{1,2,3}). Update procedure:

ssh root@bm-remote-talos-webhost-01
cd /var/www/html/talos-repo
# NOTICE that we have uncommitted files
hg st
# ? talos/page_load_test/tp4
# Take note of the current revision to revert to (just in case)
hg id
hg pull -u
# 488bc187a3ef tip
rsync -azv --delete /var/www/html/. bm-remote-talos-webhost-02:/var/www/html/.
rsync -azv --delete /var/www/html/. bm-remote-talos-webhost-03:/var/www/html/.
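
Because --delete removes files on the receiving hosts, it can be worth previewing the sync first. The sketch below wraps the two rsync commands in a loop; the RUN and RSYNC_EXTRA variables are hypothetical conveniences, and --dry-run is rsync's standard preview option.

```shell
# Mirror /var/www/html from webhost-01 to the other two web hosts.
# RSYNC_EXTRA=--dry-run previews the changes without applying them.
# RUN=echo prints the commands instead of executing them.
sync_webhosts() {
    for n in 02 03; do
        ${RUN:-} rsync -azv --delete ${RSYNC_EXTRA:-} \
            /var/www/html/. "bm-remote-talos-webhost-${n}:/var/www/html/."
    done
}

# Usage:
# RSYNC_EXTRA=--dry-run; sync_webhosts   # preview
# RSYNC_EXTRA=; sync_webhosts            # real run
```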

B2G Dogfood promotion

XXX: is this a valid section?

https://intranet.mozilla.org/RelEngWiki/index.php/How_To/Perform_b2g_dogfood_tasks

TBPL

How to deploy changes

RelEng no longer has access to do this. TBPL devs will request a push from Server Ops.

How to hide/unhide builders

  • In the 'Tree Info' menu select 'Open tree admin panel'
  • Filter/select the builders you want to change
  • Save changes
  • Enter the sheriff password and a description (with bug number if available) of your changes
  • CC :edmorley & :philor on the relevant bug so that they know what to expect when sheriffing.

Ganglia

  • If a host is reporting to ganglia incorrectly, restarting gmond may be all it takes to fix it (e.g. bug 674233). Switch to root, then run:
service gmond restart

Queue Directories

If you see this in #build:

<nagios-sjc1> [54] buildbot-master12.build.scl1:Command Queue is CRITICAL: 4 dead items

It means that there are items in the "dead" queue for the given master. You need to look at the logs and fix any underlying issue and then retry the command by moving *only* the json file over to the "new" queue. See the Queue directories wiki page for details.
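
Once the underlying issue is fixed, the retry step might look like the sketch below. It assumes that "dead" and "new" are sibling subdirectories of a single queue directory, which is not stated here; check the Queue directories wiki page for the real layout. retry_dead_items is a hypothetical helper name.

```shell
# Move only the .json files from a queue's "dead" directory to its "new"
# directory, leaving any other files (such as logs) in place for inspection.
# Assumption: dead/ and new/ live side by side under one queue directory.
retry_dead_items() {
    queue_dir=$1
    for item in "$queue_dir"/dead/*.json; do
        [ -e "$item" ] || continue   # the glob matched nothing
        mv "$item" "$queue_dir/new/"
    done
}

# Usage: retry_dead_items /path/to/the/command/queue
```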

Cruncher

If you get an alert about cruncher running out of space it might be a sendmail issue (backed up emails taking up too much space and not getting sent out):

<nagios-sjc1> [07] cruncher.build.sjc1:disk - / is WARNING: DISK WARNING - free space: / 384 MB (5% inode=93%):

As root:

du -s -h /var/spool/*
# confirm that mqueue or clientmqueue is the oversized culprit
# stop sendmail, clean out the queues, restart sendmail
/etc/init.d/sendmail stop
rm -rf /var/spool/clientmqueue/*
rm -rf /var/spool/mqueue/*
/etc/init.d/sendmail start
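
The "confirm the culprit" step above can be made explicit before anything is deleted. The following is a sketch only; both function names are hypothetical, and it relies on du -sk printing the size and path separated by a tab.

```shell
# Print the largest entry directly under a directory, by disk usage in KB.
largest_entry() {
    du -sk "$1"/* | sort -n | tail -n 1 | cut -f2-
}

# Succeed only if the largest entry is one of the two sendmail queues.
sendmail_queue_is_culprit() {
    case "$(basename "$(largest_entry "$1")")" in
        mqueue|clientmqueue) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (as root):
# sendmail_queue_is_culprit /var/spool || echo "look elsewhere first" >&2
```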

hg<->git conversion

This is a production system that RelEng built but that has not yet transitioned to full IT operation. As a production system, it is supported 24x7x365; escalate to IT oncall (who can page) as needed.

We'll get problem reports from 2 sources:

  • via email from vcs2vcs user to release+vcs2vcs@m.c - see email handling instructions for those.
  • via a bug report for a customer visible condition - this should only be if there is a new error we aren't detecting ourselves. See the resources below and/or page hwine.

Documentation for this system:

All services run as user vcs2vcs on one of the following hosts (as of 2013-01-07): github-sync1-dev.dmz.scl3.mozilla.com, github-sync1.dmz.scl3.mozilla.com, github-sync2.dmz.scl3.mozilla.com, github-sync3.dmz.scl3.mozilla.com.

Handling alert_major_errors

# SSH as yourself to the hostname in the 'from' address of the alert_major_errors email.
$ ssh yourname@github-sync3.dmz.scl3.mozilla.com
$ sudo su - vcs2vcs
$ cd etc
# find the repo name that vcs2vcs is complaining about. For example:
$ grep releases-mozilla-central-no-cvs *
job02_cmds:#    "hg:$HOME/repos/releases-mozilla-central-no-cvs" "github"
# discover where that job runs
$ grep job02 status
job02_cmds,github-sync3.dmz.scl3.mozilla.com,m-c w/o cvs as used by b2g
# connect to that host the same as we did above (if not already connected)
# then
$ cd logs/job02 # same job as above
$ show_update_errors update.log
# Note: the command exit code precedes the command itself
# eg. ...;255;hg --cwd...

Continue with instructions here.

disable/re-enable aurora updates

Taken care of by the person doing the final release, since merge day activities are on the Monday before the release.

Upload

Python packages

Warning: Mozharness no longer uses packages from the PuppetAgain repositories! Instead, it uses http://pypi.pub.build.mozilla.org/pub and http://pypi.pvt.build.mozilla.org/pub, both served from the same directory.
  • SSH to relengwebadm.private.scl3.mozilla.com, or any host with the relengweb volume mounted.
  • sudo to root
  • copy the file to /mnt/netapp/relengweb/pypi/pub
  • ensure the file has "read" permissions for all users (e.g. 'chmod a+r python_package.tar.gz')
  • c'est fini - verify your file appears at http://pypi.pub.build.mozilla.org/pub
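
The copy and chmod steps above can be sketched as a single function. publish_package is a hypothetical helper; the destination defaults to the directory named in the list, and the per-file URL in the usage comment is an assumption based on the listing URL.

```shell
# Copy a package into the pypi directory and make it world-readable.
# The second argument overrides the destination (useful for a trial run).
publish_package() {
    pkg=$1
    dest=${2:-/mnt/netapp/relengweb/pypi/pub}
    cp "$pkg" "$dest/" && chmod a+r "$dest/$(basename "$pkg")"
}

# Usage (as root on relengwebadm):
# publish_package python_package.tar.gz
# curl -I http://pypi.pub.build.mozilla.org/pub/python_package.tar.gz  # assumed URL
```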

How to upload to Tooltool

SSH to relengwebadm.private.scl3.mozilla.com, or any host with the relengweb volume mounted.

FILE=~/emulator.zip # or whatever you're uploading
export SHA512=`openssl sha512 $FILE | cut -d' ' -f2`
sudo mv -i $FILE /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
sudo chmod 644  /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
ls -l  /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}

Copy and save the file size (from ls -l) and the sha512 so you can add them to the tooltool manifests later.
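
To avoid reading the size out of the ls -l output by hand, both values can be printed together. This is a sketch, assuming sha512sum from GNU coreutils is available on the host (it computes the same digest as the openssl command above); manifest_values is a hypothetical helper name.

```shell
# Print "<size-in-bytes> <sha512>" for a file, the two values to record
# for the tooltool manifest.
manifest_values() {
    size=$(wc -c < "$1" | tr -d ' ')
    digest=$(sha512sum "$1" | cut -d' ' -f1)
    printf '%s %s\n' "$size" "$digest"
}

# Usage (after the mv and chmod steps):
# manifest_values /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
```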

How to upload Talos ZIPs

See How to update the talos zips.

How to add NPM packages

See ReleaseEngineering/How To/Mirror NPM Packages