| Our buildbot masters use several queuedirs to perform out-of-process tasks such as pushing events to Pulse or uploading logs.
| |
|
| |
|
| == Queuedirs ==
| |
| A queuedir is a simple directory structure on disk where individual jobs are stored in files. The files are moved between directories depending on what state they're in.
| |
|
| |
| * <tt>'''tmp'''</tt>: write out new job files here before moving into <tt>'''new'''</tt>.
| |
| * <tt>'''new'''</tt>: when job files are moved into here, the queue processors will pick them up and move them into <tt>'''cur'''</tt> to indicate they're currently being processed.
| |
| * <tt>'''cur'''</tt>: jobs currently in progress are here.
| |
| * <tt>'''logs'''</tt>: output and debugging information for current and finished jobs is stored here. Logs older than 5 minutes get deleted.
| |
| * <tt>'''dead'''</tt>: failed jobs go here. Anything in this directory needs investigation.
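
The lifecycle described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in <tt>queuedir.py</tt>: the function names are hypothetical, and deleting a job file on success is an assumption, since this page only says what happens to failed jobs.

```python
import os
import tempfile


def make_queuedir(root):
    """Create the five state directories under root (hypothetical helper)."""
    for state in ("tmp", "new", "cur", "logs", "dead"):
        os.makedirs(os.path.join(root, state), exist_ok=True)


def add_job(root, data):
    """Write a job into tmp, then rename it into new so it appears all at once."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.join(root, "tmp"))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    job_id = os.path.basename(tmp_path)
    os.rename(tmp_path, os.path.join(root, "new", job_id))
    return job_id


def claim_job(root):
    """Move one job from new to cur; return (job_id, data), or None if empty."""
    new_dir = os.path.join(root, "new")
    for job_id in sorted(os.listdir(new_dir)):
        cur_path = os.path.join(root, "cur", job_id)
        try:
            os.rename(os.path.join(new_dir, job_id), cur_path)
        except FileNotFoundError:
            continue  # another processor claimed this job first
        with open(cur_path, "rb") as f:
            return job_id, f.read()
    return None


def finish_job(root, job_id, ok):
    """Assumption: delete a successful job; move a failed one into dead."""
    cur_path = os.path.join(root, "cur", job_id)
    if ok:
        os.remove(cur_path)
    else:
        os.rename(cur_path, os.path.join(root, "dead", job_id))
```

Writing into <tt>tmp</tt> first means a processor scanning <tt>new</tt> never sees a half-written file, because a rename within one filesystem is atomic on POSIX systems.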
| |
|
| |
| We currently have two queuedirs: '''/dev/shm/queue/commands''' and '''/dev/shm/queue/pulse'''.
| |
|
| |
| == Processors ==
| |
| We currently have two processors: '''command_runner.py''' and '''pulse_publisher.py'''. Both run as services, start on boot, and are managed by Puppet. Both run out of a virtualenv in /builds/buildbot/queue.
| |
|
| |
| We have Nagios checks in place to ensure that the queue processors are running and that there are no dead jobs.
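
As an illustration of the dead-job check only, a Nagios-style test might look like the sketch below. It is not the check actually deployed: <tt>check_dead</tt> is a hypothetical name, and it assumes that any entry in a <tt>dead</tt> directory should raise an alert. The exit codes follow the standard Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
import os

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_dead(queue_roots):
    """Return (exit_code, message) based on entries found in each dead directory."""
    dead = {}
    for root in queue_roots:
        dead_dir = os.path.join(root, "dead")
        entries = os.listdir(dead_dir) if os.path.isdir(dead_dir) else []
        if entries:
            dead[root] = len(entries)
    if dead:
        detail = ", ".join("%s: %d" % (r, n) for r, n in sorted(dead.items()))
        return CRITICAL, "CRITICAL - dead queue entries (%s)" % detail
    return OK, "OK - no dead jobs"
```

A wrapper script would call <tt>check_dead(["/dev/shm/queue/commands", "/dev/shm/queue/pulse"])</tt>, print the message, and exit with the returned code.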
| |
|
| |
| == Troubleshooting ==
| |
| If a processor isn't running, it can be restarted via /etc/init.d/command_runner or /etc/init.d/pulse_publisher.
| |
|
| |
| If there are dead jobs, you can read the per-job log files in the <tt>'''dead'''</tt> directory. After resolving the issue, either move the job files back into <tt>'''new'''</tt> to be retried, or delete them.
| |
| * ''Note: there is a <tt>retry_dead_queue</tt> sub-command for [http://hg.mozilla.org/build/tools/file/5053b0ea4564/buildfarm/maintenance/manage_masters.py manage_masters.py].'' For Ansible users, there is also <tt>deadqueue.yml</tt> in [https://github.com/mozilla/build-ansible build-ansible]. Example invocation: <tt>ansible-playbook -i master-inventory.py deadqueue.yml</tt>
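
For a manual retry on a single master, moving jobs from <tt>dead</tt> back into <tt>new</tt> could be sketched as below. This is not the <tt>retry_dead_queue</tt> implementation. The function name is hypothetical, and the sketch assumes that per-job logs in <tt>dead</tt> can be recognised by a <tt>.log</tt> suffix, which this page does not confirm. Check the actual file names before relying on it.

```python
import os


def retry_dead(root, log_suffix=".log"):
    """Move job files from dead back into new; return the names moved."""
    dead_dir = os.path.join(root, "dead")
    new_dir = os.path.join(root, "new")
    moved = []
    for name in sorted(os.listdir(dead_dir)):
        if name.endswith(log_suffix):
            continue  # assumed to be a per-job log, not a job file
        os.rename(os.path.join(dead_dir, name), os.path.join(new_dir, name))
        moved.append(name)
    return moved
```

For example, <tt>retry_dead("/dev/shm/queue/commands")</tt> would requeue every dead command job. Skipped log files stay in <tt>dead</tt> and need to be cleaned up separately.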
| |
|
| |
| == Implementation ==
| |
| http://hg.mozilla.org/build/tools/file/739018ba9ff1/lib/python/buildtools/queuedir.py
| |
|
| |
| http://hg.mozilla.org/build/tools/file/739018ba9ff1/buildbot-helpers/command_runner.py
| |
|
| |
| http://hg.mozilla.org/build/tools/file/739018ba9ff1/buildbot-helpers/pulse_publisher.py
| |
|
| |
| http://hg.mozilla.org/build/puppet-manifests/file/0deb57fc17ae/modules/buildmaster/manifests/queue.pp
| |
|
| |
| == See also ==
| |
| https://mana.mozilla.org/wiki/display/NAGIOS/Command+Queue
| |
|
| |
| {{Release Engineering How To|Queue Directories}}
| |