ReleaseEngineering/How To/Setup a buildbot master: Difference between revisions

{{Release Engineering How To|Setup a buildbot master}}
This page describes how to set up a new Buildbot master.
= AWS Masters =
AWS master setup is covered by [[ReleaseEngineering/AWS_Master_Setup]].
= Production masters =
This section applies to buildbot masters intended for production builds, tests, etc.
== Hardware ==
* Current policy is one buildbot master instance per VM
* 64-bit guest
* 2 virtual CPUs
* 6 GB RAM
* 6 GB swap
* 30 GB partition mounted at /
* 100 MB partition mounted at /boot
== OS ==
* Install CentOS 5.5
* Make sure the hostname is set to the fully qualified name. It can be changed in /etc/sysconfig/network and with the <tt>hostname</tt> command.
* Install puppet
<pre>
rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install puppet
</pre>
* ''(optional; recommended if this isn't "just another box install" for any reason)'' Run <tt>puppetd</tt> manually until no further work is required. (If you don't have or need your own puppet environment on the puppet master, omit the <tt>--environment</tt> option below.)
<pre>
# get the hostname & key issues out of the way - repeat until no more
# "warning: peer certificate won't be verified in this SSL session" messages.
# (you'll need to do the master-puppet1 signing listed below)
puppetd --test --server master-puppet1.build.scl1.mozilla.com --environment YOUR_ENV --noop
# now work on actual config issues
# start in directory accessible to all users (not '/root')
cd /tmp
puppetd --test --server master-puppet1.build.scl1.mozilla.com --environment YOUR_ENV
# repeat above until you get no actions taken or error reported
</pre>
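The "repeat until no more warnings" step above can be sketched as a small retry loop. This is only an illustration: <tt>run_until_verified</tt> and <tt>RETRY_DELAY</tt> are hypothetical names, not part of puppet, and the loop only looks for the warning text quoted above. The certificate still has to be signed on master-puppet1 before the warning goes away.

```shell
# Hypothetical helper (not part of puppet): rerun a command until its output
# no longer contains the certificate warning, or until max_tries is reached.
run_until_verified() {
    max_tries=$1
    shift
    try=1
    while [ "$try" -le "$max_tries" ]; do
        output=$("$@" 2>&1)
        printf '%s\n' "$output"
        case $output in
            *"peer certificate won't be verified"*) ;;   # still unsigned, retry
            *) return 0 ;;                               # warning is gone
        esac
        try=$((try + 1))
        if [ "$try" -le "$max_tries" ]; then
            sleep "${RETRY_DELAY:-30}"   # assumed pause between attempts
        fi
    done
    return 1
}

# Usage (same command as above, at most 5 attempts):
#   run_until_verified 5 puppetd --test \
#       --server master-puppet1.build.scl1.mozilla.com --environment YOUR_ENV --noop
```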
* Configure the daemon to point to '''<tt>master-puppet1.build.scl1.mozilla.com</tt>''' ''(regardless of which datacenter the master resides in)''
<pre>
vim /etc/sysconfig/puppet
PUPPET_SERVER=master-puppet1.build.scl1.mozilla.com
</pre>
* Enable the puppet service at boot and start it:
<pre>
chkconfig puppet on
/etc/init.d/puppet start
</pre>
* Sign the new master's certificate:
<pre>
# On master-puppet1:
puppetca --sign your-new-master.build.scl1.mozilla.com
</pre>
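Because puppet certificates are tied to the hostname, it can help to confirm the fully qualified name before the first <tt>puppetd</tt> run. A minimal sketch, assuming a purely syntactic check is enough; <tt>is_fqdn</tt> is a hypothetical helper and does not consult DNS:

```shell
# Hypothetical helper: succeed only if the given name looks fully qualified.
is_fqdn() {
    case $1 in
        ''|.*|*.|*..*) return 1 ;;   # empty, leading/trailing dot, or empty label
        *.*) return 0 ;;             # at least two dot-separated labels
        *) return 1 ;;               # short name only
    esac
}

# Usage on the new master:
#   is_fqdn "$(hostname)" || echo "hostname is not fully qualified" >&2
```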
== Support files, Wikis ==
Update [http://hg.mozilla.org/build/tools/file/default/buildfarm/maintenance/production-masters.json production-masters.json] in tools.
== Puppet manifests ==
* Make sure your masters are listed in buildmaster-production.pp
When you're ready, update the manifests on the master with:
 hg -R /etc/puppet/manifests pull
 hg -R /etc/puppet/manifests update
Once the manifests are updated the masters' build dirs should be automatically created.
* For build masters, add the master's IP to secrets::network::masterIPs in master-puppet1:/etc/puppet/manifests/secrets.pp. The signing instances will then need to be reloaded. See [https://intranet.mozilla.org/RelEngWiki/index.php/SigningServers#Reloading]
== Add masters to slavealloc ==
See [https://wiki.mozilla.org/ReleaseEngineering/How_To/Setup_Personal_Development_Master#Adding_your_master_to_slavealloc Adding your master to slavealloc]
== IT-related things ==
Follow the steps for [[ReleaseEngineering/AWS_Master_Setup#IT|AWS masters]].
== SSH Keys ==
* Copy production ssh keys (for ffxbld, trybld, xrbld and tbirdbld) and <tt>known_hosts</tt> to ~/.ssh.
* Verify as described under [[#Final Verification|Final Verification]] below
* Add the master's ssh key to known_hosts (to make release-runner work):
** dump the key by running the following:
ssh-keyscan $master_name
** add it to /N/production/home/cltbld/.ssh/known_hosts on master-puppet1.build.scl1.mozilla.com
** add it to bm36:~cltbld/.ssh/known_hosts (puppet doesn't work here)
** TODO: add steps for puppetagain
** verify the change the same way as for [[ReleaseEngineering/AWS_Master_Setup#Add_master.27s_SSH_key_to_known_hosts|AWS masters]].
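Appending the <tt>ssh-keyscan</tt> output by hand makes it easy to add the same key twice. The sketch below appends only lines that are not already present; <tt>add_known_hosts</tt> is a hypothetical helper, and the target path in the usage line is an assumption to be replaced with the file relevant to the host you are on.

```shell
# Hypothetical helper: append each line read from stdin to a known_hosts file
# unless that exact line is already there.
add_known_hosts() {
    known_hosts=$1
    touch "$known_hosts"
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;   # skip blank lines and comment lines
        esac
        # -x: match the whole line, -F: treat the key as a fixed string
        grep -qxF -- "$line" "$known_hosts" || printf '%s\n' "$line" >> "$known_hosts"
    done
}

# Usage (target path is an assumption):
#   ssh-keyscan "$master_name" | add_known_hosts ~/.ssh/known_hosts
```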
== Lock a slave and let it take jobs ==
Lock a slave to the newly set-up master through slavealloc.
Let it run for a couple of hours and check that the jobs complete successfully.
Also watch the #buildduty channel for nagios checks going off (e.g. [[ReleaseEngineering/Queue_directories|Queue directories]] checks).
If no issues are found, go ahead and enable the master in slavealloc.
== Final Verification ==
We don't "burn in" buildbot masters; they go directly to their assigned roles. Perform the following steps to confirm that the setup above worked (run all steps as user <tt>cltbld</tt>):
* SSH verification (and associated netflows).
<pre>
    $ for h in ffxbld trybld xrbld tbirdbld; do ssh -i ~/.ssh/${h}_dsa $h@stage.mozilla.org id ; done
    $ for h in ffxbld; do ssh -i ~/.ssh/${h}_dsa $h@pvtbuilds2.dmz.scl3.mozilla.com id ; done
</pre>
* MySQL verification (and associated netflows). An "Access denied" error, as shown here, means the server was reached:
<pre>
    $ mysql -h buildbot-ro-vip.db.scl3.mozilla.com
    ERROR 1045 (28000): Access denied for user 'cltbld'@'10.22.70.209' (using password: NO)
    $ mysql -h buildbot-rw-vip.db.scl3.mozilla.com
    ERROR 1045 (28000): Access denied for user 'cltbld'@'10.22.70.209' (using password: NO)
</pre>
* puppet verification (check server, as well as running status)
<pre>
    [cltbld@buildbot-master35 ~]$ /sbin/chkconfig --list puppet
    puppet        0:off 1:off 2:on 3:on 4:on 5:on 6:off
    [cltbld@buildbot-master35 ~]$ ps $(pgrep puppet)
      PID TTY      STAT  TIME COMMAND
    20751 ?        Ssl    0:02 /usr/bin/ruby /usr/sbin/puppetd --server=master-puppet1.build.scl1.mozilla
    [cltbld@buildbot-master35 ~]$
</pre>
* Ensure nagios checks are all green and notifications are enabled, e.g.
http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/status.cgi?navbarsearch=1&host=buildbot-master43
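The MySQL and puppet checks above can be partly scripted. The sketch below assumes that an "Access denied" reply (error 1045) is the expected result for a passwordless connection, as in the sample output, and that the service should be on in runlevels 2 through 5, as in the <tt>chkconfig</tt> listing. Both function names are hypothetical.

```shell
# Hypothetical helper: an "Access denied" (error 1045) reply comes from the
# MySQL server itself, so the netflow is open; any other reply needs a look.
mysql_flow_open() {
    case $1 in
        *"ERROR 1045"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: check that a `chkconfig --list` line shows the service
# enabled in runlevels 2 through 5.
service_enabled() {
    for level in 2 3 4 5; do
        case $1 in
            *"$level:on"*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# Usage as cltbld on the new master:
#   mysql_flow_open "$(mysql -h buildbot-ro-vip.db.scl3.mozilla.com </dev/null 2>&1)" \
#       || echo "check netflow to buildbot-ro-vip" >&2
#   service_enabled "$(/sbin/chkconfig --list puppet)" \
#       || echo "puppet is not enabled at boot" >&2
```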
= Personal / development masters =
See [[ReleaseEngineering/How To/Setup Personal Development Master]]

Latest revision as of 21:46, 19 November 2018