ReleaseEngineering/Buildslave Startup Process
Note that the version of Buildbot installed on the slaves is not the same as that on the master. See ReleaseEngineering/Buildslave Versions for details. The remainder of this page describes how that version is started.
In general, the goal is to run runslave.py, which takes care of contacting the slave allocator, setting up buildbot.tac, and starting the buildslave process.
Build (CentOS 5)
/etc/init.d/runner depends on /etc/init.d/puppet, which blocks until puppet has run to completion.
Builds run in /builds/slave.
Test (Fedora 12)
- note: may be deprecated
On startup, test boxes log in automatically and run /home/cltbld/.config/autostart/gnome-terminal.desktop. This file runs /home/cltbld/run-puppet-and-buildbot.sh, which runs puppet in a loop and, once puppet completes, runs runslave.py.
Puppet runs as root during startup, from /Library/LaunchDaemons/com.mozilla.puppet.plist. This runs run-puppet.sh, which runs puppet repeatedly until it succeeds, rebooting after too many failures. When this script is finished, it touches /var/tmp/puppet.finished, which signals the runner startup launchd script to start.
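The retry-then-signal behavior described above can be sketched in Python. This is only an illustration of the control flow, not the contents of run-puppet.sh: the function name, the `max_failures` default, and the injected `run_puppet` and `reboot` callables are all hypothetical, and the real script's failure threshold is not stated on this page.

```python
import pathlib
from typing import Callable


def run_puppet_until_success(
    run_puppet: Callable[[], bool],
    reboot: Callable[[], None],
    marker: pathlib.Path,
    max_failures: int = 10,  # hypothetical threshold; the real value is not documented here
) -> bool:
    """Call run_puppet() until it succeeds, then touch the marker file.

    If run_puppet() fails max_failures times, call reboot() and return False
    without touching the marker.
    """
    failures = 0
    while failures < max_failures:
        if run_puppet():
            # Success: touching the marker is what lets the next stage start.
            marker.touch()
            return True
        failures += 1
    reboot()
    return False
```

Passing the puppet and reboot actions in as callables keeps the sketch testable without running puppet or rebooting anything; on a real slave the marker would be /var/tmp/puppet.finished.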
That script is a LaunchAgent, which means it will run as any logged-in user. Using users::builder::autologin, the builder user is logged in. The script is configured to run when /var/tmp/puppet.finished is changed. It merely invokes runslave.py.
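For readers unfamiliar with launchd, a job that fires when a file changes is normally expressed with the `WatchPaths` key. The sketch below builds such a job definition with Python's standard plistlib. It is an assumption that the real LaunchAgent uses `WatchPaths`; the label and the command line shown are hypothetical placeholders, not the values deployed on the slaves.

```python
import plistlib


def build_runner_agent(label, program_args, watch_path):
    """Return XML plist bytes for a launchd job triggered by changes to watch_path."""
    job = {
        "Label": label,
        "ProgramArguments": list(program_args),
        # launchd starts the job whenever one of these paths is modified.
        "WatchPaths": [watch_path],
    }
    return plistlib.dumps(job)


plist_bytes = build_runner_agent(
    "org.example.runslave",                       # hypothetical label
    ["/usr/bin/python", "/path/to/runslave.py"],  # hypothetical command line
    "/var/tmp/puppet.finished",                   # marker file named on this page
)
print(plist_bytes.decode("utf-8"))
```

Because the file is installed as a LaunchAgent rather than a LaunchDaemon, the job runs in the session of whichever user is logged in, which is why the autologin of the builder user matters.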
The /Library/LaunchDaemons/com.reductivelabs.puppet.plist runs /usr/local/bin/sleep-and-run-puppet.sh as root, which is presumably installed as part of the base image. This script sleeps for 60 seconds, then runs puppet in the foreground every 60 seconds until it succeeds.
The buildbot launchd script, /Library/LaunchAgents/org.mozilla.build.runner.plist, waits until puppet has run, and then invokes runner, which starts buildslave via its buildbot task. It runs as whatever user logs in on the GUI console, which should be cltbld.
buildbot.bat is in cltbld's "Startup Items". It is installed via OPSI, but it is not in the OPSI hg repo, because it contains passwords. buildbot.bat runs buildbot-tac.py, via a checkout of http://hg.mozilla.org/build/tools at d:\tools. That checkout is only done once (in the refimage, I think).
Once buildbot-tac.py has created the tac file, buildbot.bat runs start-buildbot.bat, which sets up some VC++ variables (using guess-msvc.bat, which is part of MozillaBuild) and runs start-buildbot.sh, which finally runs 'buildbot start' in the appropriate directory.
Both start-buildbot.bat and start-buildbot.sh have loops in them to try to start buildbot multiple times. These were introduced in bug 550815 because (by my read) a race condition with some OPSI startup stuff was occasionally killing the buildslave during the startup process.
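A start loop of this kind can be sketched as follows. This is a guess at the general shape rather than a transcription of either script: the function name, the attempt count, the grace period, and the idea of checking liveness after each start are assumptions made for illustration.

```python
import time
from typing import Callable


def start_with_retries(
    start: Callable[[], None],
    is_running: Callable[[], bool],
    attempts: int = 5,            # hypothetical; the scripts' real count is not stated here
    grace_seconds: float = 0.0,   # hypothetical pause before checking liveness
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start the slave up to `attempts` times.

    Returns the 1-based number of the attempt after which the slave was
    still running, or 0 if every attempt left it dead.
    """
    for attempt in range(1, attempts + 1):
        start()
        # Give anything racing with startup a chance to interfere before checking.
        sleep(grace_seconds)
        if is_running():
            return attempt
    return 0
```

Checking liveness after a pause, rather than trusting the exit status of the start command, is what would let a loop like this recover from a slave that starts cleanly and is killed shortly afterwards.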
startTalos.bat is in the Startup Items subfolder of the Programs folder, so at login by cltbld (which happens automatically), this batch file runs. Note that it contains passwords, so it is not in opsi-packages, and apparently not on the OPSI server at all, meaning it ships with the refimage. It skips tac file generation for refimages, or if the tac file or a control file (c:\buildbot-tac.control) already exists. Otherwise it invokes c:\tools\buildbot-helpers\buildbot-tac.py to generate the tac file. Once that is done, it sets the screen size (using dc.exe), stops and starts Apache, empties the twistd.log, waits 60 seconds, and starts the slave. There's no loop here, as there is for build slaves.
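The tac-generation decision described above reduces to a small predicate. The Python below is a minimal sketch of that logic only; the function name is hypothetical, and how startTalos.bat actually detects a refimage is not described here, so it is passed in as a plain flag.

```python
import os


def needs_tac_generation(is_refimage: bool, tac_path: str, control_path: str) -> bool:
    """Return True only when buildbot-tac.py should be invoked.

    Generation is skipped on refimages, and also when either the tac file
    or the control file already exists.
    """
    if is_refimage:
        return False
    return not (os.path.exists(tac_path) or os.path.exists(control_path))
```

On a slave, `control_path` would be c:\buildbot-tac.control; the location of the tac file is not given on this page, so it is left as a parameter.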
Talos Windows 7 systems are not administered by OPSI, so the various scripts are installed by hand. I don't know how they work yet.