ReleaseEngineering/How To/Restart Buildbot Masters: Difference between revisions



Revision as of 21:37, 15 September 2016

We occasionally need to restart buildbot masters for various reasons:

  • upgrades to the underlying OS
  • gradual increase in memory usage over time, leading to reduced master performance

Manually

If you need to restart a single master by hand, here's the sequence you should follow:

  • disable the master in slavealloc. This prevents the master from taking more slave connections while you're waiting for it to shut down.
  • click the "Clean Shutdown" button on the web interface for the given master, e.g. http://buildbot-master82.bb.releng.scl3.mozilla.com:8001/
  • wait for the jobs currently running on that master to complete. You can track progress by searching in-page for "Running" on the master's buildslaves page, e.g. http://buildbot-master82.bb.releng.scl3.mozilla.com:8001/buildslaves?no_builders=1
  • once the master is shut down, perform whatever upgrades are required, etc.
  • restart the master. NOTE: buildbot masters are configured to start buildbot automatically on boot, so if you reboot the master, buildbot will restart itself. To restart manually:
xebec:buildduty ccooper$ ssh cltbld@buildbot-master82
Unauthorized access prohibited
[cltbld@buildbot-master82.bb.releng.scl3.mozilla.com ~]$ cd /builds/buildbot/build1/
[cltbld@buildbot-master82.bb.releng.scl3.mozilla.com build1]$ source bin/activate
(build1)[cltbld@buildbot-master82.bb.releng.scl3.mozilla.com build1]$ make start
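
The progress check in the third step above (searching the buildslaves page for "Running") can be scripted. The following is a minimal sketch and not part of the releng tooling: it assumes the page marks each busy slave with the literal text "Running", as the in-page search suggests, and the helper names `count_running` and `fetch_page` are hypothetical.

```python
import urllib.request


def count_running(page_html):
    """Count occurrences of the literal text "Running" in a buildslaves page.

    Assumption: each job still in progress shows up as the word "Running",
    which is what the manual in-page search relies on.
    """
    return page_html.count("Running")


def fetch_page(url, timeout=30.0):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example use (requires network access to the master):
#   url = "http://buildbot-master82.bb.releng.scl3.mozilla.com:8001/buildslaves?no_builders=1"
#   print(count_running(fetch_page(url)))
```

A count of zero only suggests that the running jobs have finished; confirm on the master's web interface before doing any upgrade work.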

By script

The above actions have been encapsulated into a script: https://hg.mozilla.org/build/tools/file/default/buildfarm/maintenance/restart_masters.py

The script requires a bash-format config file like the one used by the end_to_end_reconfig.sh script. At the very least the config file must define values for LDAP_USERNAME, LDAP_PASSWORD, and CLTBLD_PASSWORD.
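
Before starting a run, it can help to confirm that the config file defines the three required values. The sketch below is illustrative only and is not taken from restart_masters.py: the function name `missing_keys` is hypothetical, and it assumes the file uses simple one-line assignments (`KEY=value` or `export KEY="value"`), so anything more elaborate in bash syntax will not be recognized.

```python
import re

REQUIRED_KEYS = ("LDAP_USERNAME", "LDAP_PASSWORD", "CLTBLD_PASSWORD")

# Matches simple assignments such as `KEY=value` or `export KEY="value"`.
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def missing_keys(config_text):
    """Return the required keys that have no non-empty assignment."""
    defined = set()
    for line in config_text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue  # blank lines, comments, and anything that is not an assignment
        key, value = match.groups()
        # Treat empty values and empty quotes ("" or '') as undefined.
        if value.strip().strip("\"'"):
            defined.add(key)
    return [key for key in REQUIRED_KEYS if key not in defined]
```

Pass it the text of the config file; an empty list means all three values are present.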

The script is currently set up to run on dev-master2 in a venv under coop's account. We are in the process of moving this environment to a shared user on the buildduty-tools machine (bug 1299421).

Here is an example invocation:

# dev-master2
$ screen -R restart_masters
$ cd ~coop/restart_masters
$ source bin/activate
$ cd tools/buildfarm/maintenance/
$ ./restart_masters.py -v -m production-masters.json

Automated

The above script requires sensitive credentials that shouldn't be stored on disk. For now, we're still running this script by hand.