ReleaseEngineering/How To/Setup buildbot masters in AWS

This page describes how to set up a new Buildbot master in AWS.

Production masters

A production buildbot master serves as the central authority for all decisions about what, when and how to build. It sends job requests (builds, tests, etc.) to the attached workers, which execute the commands and return the results.

Steps to add a new master

To add a new master, several places need to be updated:

Support files

The new master needs to be added to production-masters.json in the tools repo. Do not enable the master until DNS has been updated with the corresponding entries and the master has been set up in puppet.


  • Make sure your masters are listed in PuppetAgain manifests.
    • This can be done by hand, but you can also use a small helper to generate snippets (use the "--help" option to see the options for generating the other snippets used below):
python bm118
# or even better:
hg -R tools export $your_commit |grep '"name":' | grep ^+ | awk -F: '{print $2}' | awk -F'"' '{print $2}' | xargs python
  • To create a master instance use the following snippet as a template. Don't forget to adjust the name and region accordingly (or use the snippets produced with the "--bash FILE" option):
source /builds/aws_manager/bin/activate
cd /builds/aws_manager

#get an ip in the correct region
ip=$(python cloud-tools/scripts/ -c cloud-tools/configs/buildbot-master -r us-west-2 -n1)
# double-check that the IP address is not in use by some other machine
host $ip
# create a DNS entry
# use full LDAP e.g.
invtool A create --ip $ip --fqdn --private --description "bug #: bm118"
# create a DNS reverse-mapping (required for puppet certs to work properly)
invtool PTR create --ip $ip --target  --private --description "bug #: bm118"
#create a CNAME
invtool CNAME create --fqdn --target --private --description "bug #: bm118"
sleep 20m # wait for DNS to propagate

#create the instance
python cloud-tools/scripts/ -c cloud-tools/configs/buildbot-master -r us-west-2 -s aws-releng \
-k secrets/aws-secrets.json --ssh-key ~/.ssh/aws-ssh-key  \
-i cloud-tools/instance_data/us-west-2.instance_data_master.json buildbot-master118
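
The fixed `sleep 20m` in the snippet above could instead be a polling loop that returns as soon as the new name resolves. The following is a minimal sketch, not part of the documented procedure: `resolve_ip` and `wait_for_dns` are hypothetical helper names, and it assumes `getent` is available and that the local resolver is the one that needs to see the new record.

```shell
# Hypothetical helpers (not part of cloud-tools or invtool).

# Print the first IPv4 address the local resolver returns for a name.
# Assumption: getent is available; `host` or `dig +short` would also work.
resolve_ip() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# Poll until FQDN resolves to EXPECTED_IP, or give up after TIMEOUT seconds.
# usage: wait_for_dns FQDN EXPECTED_IP [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_dns() {
    wfd_fqdn=$1
    wfd_expected=$2
    wfd_timeout=${3:-1200}   # default matches the original 20-minute sleep
    wfd_interval=${4:-30}
    wfd_waited=0
    while [ "$wfd_waited" -lt "$wfd_timeout" ]; do
        if [ "$(resolve_ip "$wfd_fqdn")" = "$wfd_expected" ]; then
            echo "resolved after ${wfd_waited}s"
            return 0
        fi
        sleep "$wfd_interval"
        wfd_waited=$((wfd_waited + wfd_interval))
    done
    echo "timed out after ${wfd_timeout}s" >&2
    return 1
}
```

Calling `wait_for_dns "$fqdn" "$ip"`, where `$fqdn` is a placeholder for the name passed to `invtool A create`, keeps the 20-minute ceiling but continues as soon as the record is visible.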

Puppet will reboot the master. You can follow the log in a different terminal:

tail -F buildbot-master118.log
  • You can get the IP address of the master from the log or from the AWS web console.
  • Once the master has rebooted, log in as cltbld and stop the master:
cd /builds/buildbot/*1* && make stop
  • For build/try masters, add the master's IP address to secrets::network::masterIPs in master-puppet1:/etc/puppet/manifests/secrets.pp. The signing instances will be reloaded automatically.
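
The grep/awk one-liner earlier in this section pulls the names of newly added masters out of an exported commit. As an illustration of what it does, here is a minimal sketch that does the same extraction in a single awk call. `added_master_names` is a hypothetical name, and the sketch assumes each `"name"` entry sits on its own line of the diff, as the original one-liner also does.

```shell
# Hypothetical helper: read a diff on stdin and print the value of every
# "name" key found on an added line (a line starting with "+").
added_master_names() {
    # Splitting on double quotes turns  +    "name": "value",  into
    # $2 = name and $4 = value; "+++ file" header lines never match.
    awk -F'"' '/^\+/ && $2 == "name" { print $4 }'
}

# Usage (REV is a placeholder for the commit being exported):
#   hg -R tools export REV | added_master_names
```

Unlike a plain substring match, the test on `$2` only accepts lines whose first quoted key is exactly `name`.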

Add masters to inventory

Create System in Inventory

As of 2013-05-13, there are two ways to do this: manual data entry and CSV import.

CSV Import

Manual Entry

  • Go to
  • Fill the following fields:
    • Hostname: use FQDN
    • System Status: production
    • System Rack: Releng-AWS-VPC - Releng-AWS-USE1/USW2
    • System Type: Virtual Server
    • Operating System: Centos 6
    • Allocated To: release
  • Click "Create"

Common Setup

  • Edit the newly created entry
  • Switch to "Key/Value Store"
  • Click "Add adapter"
  • Fill the following fields:
    • Adapter Number (0-99): 0
    • IP Address: IP address
    • Mac Address: output of ip link show eth0 | tail -n -1 | awk '{print $2}' on the master
    • Adapter Name (nic0-99 or mgmt0-99): nic0
    • Host Name: fqdn
    • Leave the rest as is
  • Press "Create", then close the box to dismiss the adapter dialog
  • Press "Save" to save the adapter (even though adapter may not show)
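
The MAC address command in the adapter fields above relies on the `link/ether` line being the last line of the `ip link show` output. A slightly more defensive sketch matches that line by name. `mac_from_ip_link` is a hypothetical helper name, and the interface name `eth0` is taken from the step above.

```shell
# Hypothetical helper: read `ip link show <iface>` output on stdin and print
# the hardware address from the link/ether line.
mac_from_ip_link() {
    awk '$1 == "link/ether" { print $2; exit }'
}

# Usage on the master:
#   ip link show eth0 | mac_from_ip_link
```

On Linux the same value can usually also be read from `/sys/class/net/eth0/address`.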

Add masters to slavealloc

See Adding your master to slavealloc

Add master's SSH key to known_hosts

See bug 889992

  • Add the master's SSH key (scheduler and build masters only) to known_hosts in puppet, so that release-runner works:
  • Dump the key by running the following:
ssh-keyscan $master_name
  • Verify the change by connecting to the masters over SSH as cltbld from bm81:
ssh -i .ssh/release-runner
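
Before attempting the SSH connection, it can help to confirm that the key actually landed in the deployed known_hosts file. The following is a minimal sketch under stated assumptions: `known_hosts_has` is a hypothetical helper, it only understands unhashed entries, and it ignores marker lines such as `@cert-authority`. For hashed files, `ssh-keygen -F <hostname>` is the usual tool.

```shell
# Hypothetical helper: succeed if an unhashed known_hosts file contains an
# entry whose host field lists the given name or IP.
# usage: known_hosts_has HOST KNOWN_HOSTS_FILE
known_hosts_has() {
    awk -v host="$1" '
        /^#/ || NF < 3 { next }           # skip comments and malformed lines
        {
            n = split($1, names, ",")      # host field may list several names
            for (i = 1; i <= n; i++)
                if (names[i] == host) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# Usage ($master_name as in the ssh-keyscan step; the file path is an assumption):
#   known_hosts_has "$master_name" ~/.ssh/known_hosts && echo "key present"
```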


  1. File an Operations Server Operations bug to:
    • add the master(s) to Nagios (See bug 1207411 or bug 1253601)
      • take note of the checks that are requested in the sample bug
      • you should be able to see the masters in here: use1 and usw2

Lock a slave and let it take jobs

  • Through slavealloc, lock a slave to the newly set-up master.
  • Let it run for a couple of hours and check that the jobs complete successfully.
  • Also check the #ci channel for Nagios alerts going off (e.g. queue directory checks).

If no issues are found then go ahead and enable the master on slavealloc.

Final Verification

The following steps should be performed to confirm that the setup worked (run all of them as user cltbld):

  • SSH verification (and associated netflows).
   ssh -i ~/.ssh/tbirdbld_dsa id
   ssh -i ~/.ssh/trybld_dsa id
   ssh -i ~/.ssh/ffxbld_rsa id
   ssh -i ~/.ssh/ffxbld_rsa id
  • MySQL verification (and associated netflows):
nc -zv 3306
  • Ensure that all Nagios checks are green and that notifications are enabled (i.e. not disabled).