ReleaseEngineering/How To/Setup a buildbot master

From MozillaWiki
Revision as of 00:04, 10 April 2013

This page describes how to set-up a new Buildbot Master.

AWS Masters

AWS master setup is covered by ReleaseEngineering/AWS_Master_Setup

Production masters

For Buildbot masters that are intended to run production builds, tests, etc.

Hardware

  • Current policy is one buildbot master instance per VM
  • 64-bit guest
  • 2 virtual CPUs
  • 6 GB RAM
  • 6 GB swap
  • 30 GB partition mounted at /
  • 100 MB partition mounted at /boot
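One way to compare a freshly provisioned guest with the specification above is sketched below. It is a minimal, hypothetical helper rather than an established tool. It assumes a Linux guest that exposes /proc/meminfo, and the function names `meminfo_gb` and `check_arch` are made up for this example.

```bash
#!/bin/bash
# Hypothetical helpers for checking a new master VM against the hardware spec.

# Print a /proc/meminfo field (e.g. MemTotal, SwapTotal) rounded to whole GB.
# /proc/meminfo reports sizes in kB, and 1 GB = 1048576 kB. Adding half a GB
# before truncating rounds to the nearest GB, because the kernel usually
# reports slightly less than the configured size.
meminfo_gb() {
    local key=$1 file=${2:-/proc/meminfo}
    awk -v k="$key:" '$1 == k { printf "%d\n", ($2 + 524288) / 1048576 }' "$file"
}

# Print "ok" on a 64-bit x86 guest, otherwise the machine type uname reports.
check_arch() {
    local arch
    arch=$(uname -m)
    if [ "$arch" = "x86_64" ]; then
        echo ok
    else
        echo "$arch"
    fi
}

# Usage on the new guest; compare against the list above
# (64-bit, 2 CPUs, 6 GB RAM, 6 GB swap, / and /boot partitions):
#   check_arch
#   getconf _NPROCESSORS_ONLN
#   meminfo_gb MemTotal
#   meminfo_gb SwapTotal
#   df -h / /boot
```
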

OS

  • Install CentOS 5.5
  • Make sure the hostname is set to the fully qualified domain name. It can be changed in /etc/sysconfig/network and with the hostname command.
  • Install puppet
rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install puppet
  • (Optional, but recommended if this isn't "just another box install" for any reason) Run puppetd manually until no further work is required. (If you don't have or need your own puppet environment on the puppet master, just remove the --environment option below.)
# get the hostname & key issues out of the way - repeat until no more
# "warning: peer certificate won't be verified in this SSL session" messages.
# (you'll need to do the master-puppet1 signing listed below)
puppetd --test --server master-puppet1.build.scl1.mozilla.com --environment YOUR_ENV --noop
# now work on actual config issues
# start in directory accessible to all users (not '/root')
cd /tmp
puppetd --test --server master-puppet1.build.scl1.mozilla.com --environment YOUR_ENV 
# repeat above until you get no actions taken or error reported
  • Configure the puppet daemon to point to master-puppet1.build.mozilla.org (regardless of which datacenter the master resides in):
vim /etc/sysconfig/puppet
# and set:
PUPPET_SERVER=master-puppet1.build.scl1.mozilla.com
  • Enable and start the puppet service, then sign the new master's certificate on master-puppet1:
chkconfig puppet on
/etc/init.d/puppet start
# On master-puppet1:
puppetca --sign your-new-master.build.scl1.mozilla.com
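The interactive steps above (checking the hostname, editing /etc/sysconfig/puppet, and re-running puppetd until it comes back clean) can also be scripted. The sketch below is illustrative only: `is_fqdn`, `set_sysconfig_var` and `run_until_clean` are hypothetical helper names. `run_until_clean` relies on puppetd exiting 0 only when it made no changes and hit no errors, which is the behaviour of puppet's `--detailed-exitcodes` option; check `puppetd --help` to confirm that your version supports it.

```bash
#!/bin/bash
# Hypothetical helpers for the OS / puppet bootstrap steps.

# Crude check that a name is fully qualified: it must contain a dot and
# must not end with one (e.g. buildbot-master35.build.scl1.mozilla.com).
is_fqdn() {
    case "$1" in
        "" | *.) return 1 ;;
        *.*)     return 0 ;;
        *)       return 1 ;;
    esac
}

# Set KEY=VALUE in a sysconfig-style file without an interactive editor.
# Any existing KEY= line is removed and the new setting is appended at the end.
# Writing back through `cat >` keeps the file's owner and permissions.
set_sysconfig_var() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file" || { rm -f "$tmp"; return 1; }
    rm -f "$tmp"
}

# Run a command repeatedly until it exits 0, giving up after MAX attempts.
run_until_clean() {
    local max=$1 i
    shift
    for ((i = 1; i <= max; i++)); do
        "$@" && return 0
        echo "attempt $i/$max did not come back clean" >&2
    done
    return 1
}

# Usage sketch, as root on the new master:
#   is_fqdn "$(hostname)" || echo "hostname is not fully qualified" >&2
#   set_sysconfig_var /etc/sysconfig/puppet PUPPET_SERVER master-puppet1.build.scl1.mozilla.com
#   cd /tmp
#   run_until_clean 5 puppetd --test --detailed-exitcodes \
#       --server master-puppet1.build.scl1.mozilla.com --environment YOUR_ENV
```
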

Support files, Wikis

Add the new master to production-masters.json in the tools repository.

Puppet manifests

  • Make sure your masters are listed in buildmaster-production.pp

When you're ready, update the manifests on the puppet master (master-puppet1) with:

hg -R /etc/puppet/manifests pull
hg -R /etc/puppet/manifests update

Once the manifests are updated, the masters' build directories should be created automatically.

  • For build masters, add the master's IP address to secrets::network::masterIPs in master-puppet1:/etc/puppet/manifests/secrets.pp. The signing instances will then need to be reloaded. See [1]

Add masters to slavealloc

See Adding your master to slavealloc

Lock a slave and let it take jobs

Lock a slave to the newly set-up master through slavealloc. Let it run for a couple of hours and check that its jobs succeeded. Also watch the #buildduty channel for Nagios checks going off (e.g. queue directory checks).

If no issues are found then go ahead and enable the master on slavealloc.

IT-related things

File separate bugs for Nagios (e.g. bug 717804) and MySQL access (e.g. bug 717806):

  • Nagios
    • PING
    • Swap
    • avg load
    • buildbot
    • disk - /
    • disk - /builds
  • MySQL access to the DB server
  • Verify that the master can send mail to tinderbox via dm-mail01 (see e.g. bug 717808)
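Before filing or closing the mail bug, a basic reachability check of the relay can save a round trip. This is a hedged sketch. It assumes mail to tinderbox is relayed over SMTP on port 25 and that bash and GNU coreutils `timeout` are available. `timeout` may be missing on a stock CentOS 5 install; without it the `bash -c` line still works but can hang until the TCP connect timeout. `port_open` and `MAIL_RELAY` are placeholder names, not existing tools or settings. An open port only shows that the netflow is in place; actual delivery still needs a test message.

```bash
#!/bin/bash
# Succeed if a TCP connection to HOST:PORT opens within SECS seconds (default 5).
# Uses bash's /dev/tcp redirection and coreutils `timeout`.
port_open() {
    local host=$1 port=$2 secs=${3:-5}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage sketch; set MAIL_RELAY to dm-mail01's fully qualified name first:
#   if port_open "$MAIL_RELAY" 25; then
#       echo "SMTP port on $MAIL_RELAY is reachable"
#   else
#       echo "cannot reach $MAIL_RELAY:25; check the netflow" >&2
#   fi
```
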

SSH Keys

  • Copy production ssh keys (for ffxbld, trybld, xrbld and tbirdbld) and known_hosts to ~/.ssh.
  • Verify as described below
  • Add the master's ssh key to known_hosts (to make release-runner work):
    • dump the key by running the following:
ssh-keyscan $master_name
    • add it to /N/production/home/cltbld/.ssh/known_hosts on master-puppet1.build.scl1.mozilla.com
    • add it to bm36:~cltbld/.ssh/known_hosts (puppet doesn't work here)
    • TODO: add steps for puppetagain
    • verify the change (ssh to masters as cltbld from bm36)
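Appending scanned keys by hand makes it easy to add the same entry twice. A small idempotent helper such as the hypothetical `add_known_hosts` below can be used on whichever copy of known_hosts is being edited. It is a sketch, and it assumes the file uses plain (unhashed) host names, which is what ssh-keyscan prints by default.

```bash
#!/bin/bash
# Append each key line read from stdin to the known_hosts file named in $1,
# skipping blank lines, comment lines (ssh-keyscan may emit "# host ..." lines)
# and lines that are already present verbatim.
add_known_hosts() {
    local file=$1 line
    touch "$file" || return 1
    while IFS= read -r line; do
        case "$line" in
            "" | "#"*) continue ;;
        esac
        grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}

# Usage sketch, on bm36 as cltbld:
#   ssh-keyscan "$master_name" 2>/dev/null | add_known_hosts ~/.ssh/known_hosts
```
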

Final Verification

We don't "burn in" buildbot masters; they go directly into their assigned roles. The following steps should be performed to confirm that the setup above worked (run all steps as user cltbld):

  • SSH verification (and associated netflows):
    $ for h in ffxbld trybld xrbld tbirdbld; do ssh -i ~/.ssh/${h}_dsa $h@stage.mozilla.org id ; done
    $ for h in ffxbld; do ssh -i ~/.ssh/${h}_dsa $h@pvtbuilds2.dmz.scl3.mozilla.com id ; done
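  • MySQL verification (and associated netflows). An "Access denied" error is the expected result: it shows that the DB server is reachable from the master: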
    $ mysql -h buildbot-ro-vip.db.scl3.mozilla.com
    ERROR 1045 (28000): Access denied for user 'cltbld'@'10.22.70.209' (using password: NO)
    $ mysql -h buildbot-rw-vip.db.scl3.mozilla.com
    ERROR 1045 (28000): Access denied for user 'cltbld'@'10.22.70.209' (using password: NO)
  • Puppet verification (check that the service is enabled and running, and which server it points to):
    [cltbld@buildbot-master35 ~]$ /sbin/chkconfig --list puppet
    puppet         	0:off	1:off	2:on	3:on	4:on	5:on	6:off
    [cltbld@buildbot-master35 ~]$ ps $(pgrep puppet)
      PID TTY      STAT   TIME COMMAND
    20751 ?        Ssl    0:02 /usr/bin/ruby /usr/sbin/puppetd --server=master-puppet1.build.scl1.mozilla
    [cltbld@buildbot-master35 ~]$ 
  • Ensure the Nagios checks are all green and that notifications are enabled (not disabled), e.g.:
http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/status.cgi?navbarsearch=1&host=buildbot-master43
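The puppet and MySQL checks above can be turned into pass/fail results with a couple of small helpers. This is a sketch, not part of any existing tooling. `chkconfig_enabled` and `mysql_netflow_status` are hypothetical names, and the check assumes puppet should be enabled in runlevels 3 and 5, as in the sample output. The MySQL helper relies on the standard client error codes: 1045 means the server answered and refused the password-less login, while 2003 means no connection could be made.

```bash
#!/bin/bash
# Succeed if `chkconfig --list SERVICE` output (read from stdin) shows the
# service enabled in runlevels 3 and 5.
chkconfig_enabled() {
    awk '{ for (i = 2; i <= NF; i++) on[$i] = 1 }
         END { if (on["3:on"] && on["5:on"]) exit 0; exit 1 }'
}

# Classify the error text from a password-less `mysql -h HOST` attempt.
mysql_netflow_status() {
    case "$1" in
        *"ERROR 1045"*) echo reachable ;;    # server answered: flow is open
        *"ERROR 2003"*) echo unreachable ;;  # no connection: flow likely blocked
        *)              echo unknown ;;
    esac
}

# Usage sketch, as cltbld on the new master:
#   /sbin/chkconfig --list puppet | chkconfig_enabled && echo "puppet is enabled"
#   for db in buildbot-ro-vip.db.scl3.mozilla.com buildbot-rw-vip.db.scl3.mozilla.com; do
#       echo "$db: $(mysql_netflow_status "$(mysql -h "$db" </dev/null 2>&1)")"
#   done
```
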

Personal / development masters

See ReleaseEngineering/How To/Setup Personal Development Master