ReleaseEngineering/How To/Setup a buildbot master

From MozillaWiki
m (ChrisCooper moved page ReleaseEngineering/Master Setup to ReleaseEngineering/How To/Setup a buildbot master: Striving for some consistency in HowTo docs)

Revision as of 22:22, 16 June 2015

This page describes how to set-up a new Buildbot Master.

AWS Masters

AWS master setup is covered by ReleaseEngineering/AWS_Master_Setup

Production masters

This section covers buildbot masters intended for production builds, tests, etc.

Hardware

  • Current policy is one buildbot master instance per VM
  • 64-bit guest
  • 2 virtual CPUs
  • 6 GB RAM
  • 6 GB swap
  • 30 GB partition mounted at /
  • 100MB partition mounted at /boot
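To confirm that a freshly provisioned VM matches the hardware list above, you can compare its CPU count and /proc/meminfo against it. The following is a minimal sketch and is not part of any existing RelEng tooling. The function names are hypothetical, and so is the 90% tolerance. That tolerance exists because MemTotal is usually reported a little below the RAM assigned to the VM.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking a master VM against the
# hardware list (2 vCPUs, 6 GB RAM, 6 GB swap). Not existing RelEng tooling.

# meminfo_kb FIELD FILE: print the value (in kB) of a /proc/meminfo field.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "$2"
}

# at_least_gb KB GB: succeed if KB is at least 90% of GB gigabytes.
# The 90% tolerance is an assumption, not a documented threshold.
at_least_gb() {
    [ "$1" -ge $(( $2 * 1024 * 1024 * 90 / 100 )) ]
}

# check_master_vm CPUS MEMINFO_FILE: print OK/FAIL per item and
# return 1 if any item is below the spec.
check_master_vm() {
    cpus=$1
    meminfo=$2
    rc=0
    if [ "$cpus" -ge 2 ]; then
        echo "OK   cpus=$cpus"
    else
        echo "FAIL cpus=$cpus (want 2)"
        rc=1
    fi
    for field in MemTotal SwapTotal; do
        kb=$(meminfo_kb "$field" "$meminfo")
        if [ -n "$kb" ] && at_least_gb "$kb" 6; then
            echo "OK   $field=$kb kB"
        else
            echo "FAIL $field=${kb:-?} kB (want about 6 GB)"
            rc=1
        fi
    done
    return $rc
}
```

On the new VM this would be run as:

    check_master_vm "$(getconf _NPROCESSORS_ONLN)" /proc/meminfo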

OS

  • Install CentOS 5.5
  • Make sure the hostname is set to the fully qualified domain name. This can be changed in /etc/sysconfig/network and with the hostname command.
  • Install puppet
rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install puppet
  • (Optional, but recommended if this isn't "just another box install" for any reason) Run puppetd manually until no further work is required. (If you don't have or need your own puppet environment on the puppet master, remove the --environment option from the commands below.)
# get the hostname & key issues out of the way - repeat until no more
# "warning: peer certificate won't be verified in this SSL session" messages.
# (you'll need to do the master-puppet1 signing listed below)
puppetd --test --server master-puppet1.build.scl1.mozilla.com --environment YOUR_ENV --noop
# now work on actual config issues
# start in directory accessible to all users (not '/root')
cd /tmp
puppetd --test --server master-puppet1.build.scl1.mozilla.com --environment YOUR_ENV 
# repeat above until you get no actions taken or error reported
  • Configure the daemon to point to master-puppet1.build.mozilla.org (regardless of which datacenter the master resides in):
vim /etc/sysconfig/puppet
# and set:
PUPPET_SERVER=master-puppet1.build.scl1.mozilla.com
  • Run these:
chkconfig puppet on
/etc/init.d/puppet start
# On master-puppet1:
puppetca --sign your-new-master.build.scl1.mozilla.com
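The hostname and PUPPET_SERVER steps above are easy to get subtly wrong when they are repeated by hand. Below is a minimal shell sketch of two hypothetical helpers, neither of which is existing RelEng tooling:

* is_fqdn applies a rough "looks fully qualified" heuristic: at least one dot and no empty labels.
* set_puppet_server sets PUPPET_SERVER in a sysconfig-style file idempotently, rather than appending a duplicate line.

```shell
#!/bin/sh
# Hypothetical helpers for the OS setup steps; not part of any RelEng tooling.

# is_fqdn NAME: rough heuristic only. Succeeds if NAME contains a dot and
# has no empty labels (no leading, trailing or doubled dots).
is_fqdn() {
    case "$1" in
        ''|.*|*.|*..*) return 1 ;;
        *.*) return 0 ;;
        *) return 1 ;;
    esac
}

# set_puppet_server FILE SERVER: replace an existing uncommented
# PUPPET_SERVER= line in FILE, or append one if none exists.
set_puppet_server() {
    file=$1
    server=$2
    if grep -q '^PUPPET_SERVER=' "$file"; then
        tmp=$(mktemp) || return 1
        # cat (rather than mv) keeps FILE's original owner and mode
        sed "s|^PUPPET_SERVER=.*|PUPPET_SERVER=$server|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return $status
    else
        printf 'PUPPET_SERVER=%s\n' "$server" >> "$file"
    fi
}
```

For example, run `is_fqdn "$(hostname)" || echo "hostname is not fully qualified"` followed by `set_puppet_server /etc/sysconfig/puppet master-puppet1.build.scl1.mozilla.com`. Commented-out lines such as `#PUPPET_SERVER=puppet` are left untouched.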

Support files, Wikis

Update production-masters.json in tools.

Puppet manifests

  • Make sure your masters are listed in buildmaster-production.pp

When you're ready, update the manifests on the master with:

hg -R /etc/puppet/manifests pull
hg -R /etc/puppet/manifests update

Once the manifests are updated the masters' build dirs should be automatically created.
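Before pulling, it can save a round trip to confirm that the new master actually appears in buildmaster-production.pp. This is a minimal sketch with hypothetical helper names. It assumes the master's fully qualified hostname appears verbatim somewhere in that manifest. The manifest syntax isn't described here, so the check is only a plain-text search, and you pass in the manifest's path yourself.

```shell
#!/bin/sh
# Hypothetical helpers around the manifest update; not existing RelEng tooling.

# manifest_lists_master MANIFEST HOSTNAME: succeed if HOSTNAME appears
# verbatim (fixed-string match) anywhere in MANIFEST. Use the FQDN, since a
# short name such as "bm3" would also match "bm35".
manifest_lists_master() {
    grep -qF -- "$2" "$1"
}

# update_manifests [REPO]: the same hg pull/update as above, skipping the
# update if the pull fails. REPO defaults to /etc/puppet/manifests.
update_manifests() {
    repo=${1:-/etc/puppet/manifests}
    hg -R "$repo" pull && hg -R "$repo" update
}
```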

  • For build masters, add the master's IP address to secrets::network::masterIPs on master-puppet1:/etc/puppet/manifests/secrets.pp. The signing instances will then need to be reloaded. See [1]

Add masters to slavealloc

See Adding your master to slavealloc

IT-related things

Follow the steps for AWS masters.

SSH Keys

  • Copy production ssh keys (for ffxbld, trybld, xrbld and tbirdbld) and known_hosts to ~/.ssh.
  • Verify as described below
  • Add the master's ssh key to known_hosts (to make release-runner work):
    • dump the key by running the following:
ssh-keyscan $master_name
    • add it to /N/production/home/cltbld/.ssh/known_hosts on master-puppet1.build.scl1.mozilla.com
    • add it to bm36:~cltbld/.ssh/known_hosts (puppet doesn't work here)
    • TODO: add steps for puppetagain
    • verify the change the same way as for AWS masters.
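When keys are added to known_hosts by hand, it is easy to end up with duplicate entries. Here is a minimal sketch of a hypothetical helper that appends only the ssh-keyscan lines not already present as exact lines, skipping blank lines and # comments. It assumes plain (unhashed) known_hosts entries. Hashed entries will not compare equal to ssh-keyscan's plain output.

```shell
#!/bin/sh
# Hypothetical helper; not part of any RelEng tooling.
# add_known_hosts_keys FILE: read known_hosts-format lines on stdin (for
# example ssh-keyscan output) and append each one that FILE does not
# already contain as an exact line. Prints the number of lines added.
add_known_hosts_keys() {
    file=$1
    touch "$file" || return 1
    added=0
    while IFS= read -r line; do
        case "$line" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        if ! grep -qxF -- "$line" "$file"; then
            printf '%s\n' "$line" >> "$file"
            added=$((added + 1))
        fi
    done
    echo "$added"
}
```

For example:

    ssh-keyscan "$master_name" 2>/dev/null | add_known_hosts_keys ~/.ssh/known_hosts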

Lock a slave and let it take jobs

Use slavealloc to lock a slave to the newly set up master. Let it run for a couple of hours and check that its jobs completed successfully. Also watch the #buildduty channel for any Nagios checks going off (e.g. queue directory checks).

If no issues are found then go ahead and enable the master on slavealloc.

Final Verification

We don't "burn in" buildbot masters; they go directly into their assigned roles. Perform the following steps to ensure everything else has worked (run all steps as the cltbld user):

  • SSH verification (and associated netflows).
    $ for h in ffxbld trybld xrbld tbirdbld; do ssh -i ~/.ssh/${h}_dsa $h@stage.mozilla.org id ; done
    $ for h in ffxbld; do ssh -i ~/.ssh/${h}_dsa $h@pvtbuilds2.dmz.scl3.mozilla.com id ; done
  • mySQL verification (and associated netflows):
    $ mysql -h buildbot-ro-vip.db.scl3.mozilla.com
    ERROR 1045 (28000): Access denied for user 'cltbld'@'10.22.70.209' (using password: NO)
    $ mysql -h buildbot-rw-vip.db.scl3.mozilla.com
    ERROR 1045 (28000): Access denied for user 'cltbld'@'10.22.70.209' (using password: NO)
  • puppet verification (check server, as well as running status)
    [cltbld@buildbot-master35 ~]$ /sbin/chkconfig --list puppet
    puppet         	0:off	1:off	2:on	3:on	4:on	5:on	6:off
    [cltbld@buildbot-master35 ~]$ ps $(pgrep puppet)
      PID TTY      STAT   TIME COMMAND
    20751 ?        Ssl    0:02 /usr/bin/ruby /usr/sbin/puppetd --server=master-puppet1.build.scl1.mozilla
    [cltbld@buildbot-master35 ~]$ 
  • Ensure that the Nagios checks are all green and that notifications are enabled (i.e. not disabled), e.g.
http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/status.cgi?navbarsearch=1&host=buildbot-master43
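The expected outputs above can also be checked mechanically. This is a minimal sketch with hypothetical helper names:

* MySQL: an ERROR 1045 (Access denied) reply counts as success, as in the sample output. The server had to be reachable in order to refuse the login.
* puppet: the chkconfig line is checked for runlevels 3 and 5 being on.
* SSH: the loop mirrors the ones above but adds BatchMode, so a missing or wrong key fails instead of prompting for a password.

```shell
#!/bin/sh
# Hypothetical verification helpers; run as cltbld. Not existing RelEng tooling.

# mysql_reachable OUTPUT: succeed if mysql's output shows the server refused
# the login (ERROR 1045), which implies the network path is open.
mysql_reachable() {
    case "$1" in
        *"ERROR 1045"*) return 0 ;;
        *) return 1 ;;
    esac
}

# puppet_on_at_boot LINE: succeed if a `chkconfig --list puppet` line has
# runlevels 3 and 5 switched on.
puppet_on_at_boot() {
    case "$1" in *"3:on"*) ;; *) return 1 ;; esac
    case "$1" in *"5:on"*) return 0 ;; *) return 1 ;; esac
}

# verify_ssh HOST ACCOUNT...: run `id` as each account using
# ~/.ssh/ACCOUNT_dsa, printing OK/FAIL per account; returns 1 on any failure.
verify_ssh() {
    host=$1
    shift
    rc=0
    for acct in "$@"; do
        if ssh -o BatchMode=yes -i "$HOME/.ssh/${acct}_dsa" "$acct@$host" id >/dev/null 2>&1; then
            echo "OK   $acct@$host"
        else
            echo "FAIL $acct@$host"
            rc=1
        fi
    done
    return $rc
}
```

For example:

    verify_ssh stage.mozilla.org ffxbld trybld xrbld tbirdbld
    mysql_reachable "$(mysql -h buildbot-ro-vip.db.scl3.mozilla.com </dev/null 2>&1)" && echo "mysql ro OK"
    puppet_on_at_boot "$(/sbin/chkconfig --list puppet)" && echo "puppet enabled at boot"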

Personal / development masters

See ReleaseEngineering/How To/Setup Personal Development Master