ReleaseEngineering/Queue directories: Difference between revisions

Revision as of 21:32, 15 November 2014

Our buildbot masters use several queuedirs to perform out-of-process tasks, such as pushing events to pulse or uploading logs.

queuedirs

A queuedir is a simple directory structure on disk where individual jobs are stored in files. The files are moved between directories depending on what state they're in.

tmp: write out new job files here before moving into new.
new: when job files are moved into here, the queue processors will pick them up and move them into cur to indicate they're currently being processed.
cur: jobs currently in progress are here.
logs: output and debugging information for current and finished jobs are here. Logs older than 5 minutes get deleted.
dead: failed jobs go here. This is bad: jobs in this directory need manual attention.

We currently have two queuedirs: /dev/shm/queue/commands and /dev/shm/queue/pulse.
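
The state transitions described above can be sketched in a few lines of Python. This is only an illustration of the tmp/new/cur/dead layout, not the code in queuedir.py; the function names (make_queuedir, add_job, claim_job, fail_job) are hypothetical, and it assumes all state directories live on the same filesystem so that a rename is atomic.

```python
import os

# Directory names taken from the list above.
STATES = ("tmp", "new", "cur", "logs", "dead")


def make_queuedir(root):
    """Create the state directories under root (hypothetical helper)."""
    for state in STATES:
        os.makedirs(os.path.join(root, state), exist_ok=True)


def add_job(root, name, data):
    """Write the job into tmp first, then rename it into new.

    Because the rename is atomic on a single filesystem, a processor
    scanning new never sees a half-written job file.
    """
    tmp_path = os.path.join(root, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(data)
    new_path = os.path.join(root, "new", name)
    os.rename(tmp_path, new_path)
    return new_path


def claim_job(root):
    """Move one job from new to cur and return its name, or None."""
    new_dir = os.path.join(root, "new")
    for name in sorted(os.listdir(new_dir)):
        try:
            os.rename(os.path.join(new_dir, name),
                      os.path.join(root, "cur", name))
        except FileNotFoundError:
            # Another processor claimed this job first; try the next one.
            continue
        return name
    return None


def fail_job(root, name):
    """Move a job that failed from cur to dead."""
    os.rename(os.path.join(root, "cur", name),
              os.path.join(root, "dead", name))
```

A producer would call add_job, and a processor would loop on claim_job, moving a job to dead with fail_job when it cannot be completed.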

processors

We currently have two processors: command_runner.py and pulse_publisher.py. These are run as services, start on boot, and are managed by puppet. They both run out of a virtualenv in /builds/buildbot/queue.

We have nagios checks in place to ensure that the queue processors are running, and that there are no dead jobs.
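
As a rough idea of what a dead-job check could look like, here is a minimal sketch. It is not the check we actually deploy: the function name check_dead_jobs is hypothetical, and it simply counts every entry in each dead directory. The return codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
import os

# Conventional Nagios plugin exit codes.
OK = 0
CRITICAL = 2


def check_dead_jobs(queuedirs):
    """Return (exit_code, message) for a list of queuedir roots.

    Hypothetical helper: counts the entries in each root's dead
    directory; a missing dead directory counts as zero entries.
    """
    total = 0
    for root in queuedirs:
        dead = os.path.join(root, "dead")
        if os.path.isdir(dead):
            total += len(os.listdir(dead))
    if total:
        return CRITICAL, "CRITICAL: %d entries in dead" % total
    return OK, "OK: no dead jobs"
```

A wrapper script would print the message and exit with the returned code, for example after calling check_dead_jobs(["/dev/shm/queue/commands", "/dev/shm/queue/pulse"]).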

troubleshooting

If a processor isn't running, it can be restarted via /etc/init.d/command_runner or /etc/init.d/pulse_publisher.

If there are dead jobs, you can read the log files in the dead directory. After resolving the issue, either move the job files back into new to be retried, or delete them. Note: manage_masters.py has a retry_dead_queue sub-command.
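
Retrying by hand amounts to moving files from dead back into new. The sketch below shows that step only; it is not how retry_dead_queue is implemented, and retry_dead_jobs is a hypothetical name. It takes the job file names explicitly, on the assumption that the dead directory also holds log files that should not be re-queued.

```python
import os


def retry_dead_jobs(root, names):
    """Move the named job files from dead back into new.

    Hypothetical helper. Returns the names that were actually moved;
    names that are no longer present in dead are skipped.
    """
    moved = []
    for name in names:
        src = os.path.join(root, "dead", name)
        dst = os.path.join(root, "new", name)
        try:
            os.rename(src, dst)
        except FileNotFoundError:
            continue
        moved.append(name)
    return moved
```

Once a file is back in new, the queue processor picks it up again as it would any other new job.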

implementation

http://hg.mozilla.org/build/tools/file/739018ba9ff1/lib/python/buildtools/queuedir.py

http://hg.mozilla.org/build/tools/file/739018ba9ff1/buildbot-helpers/command_runner.py

http://hg.mozilla.org/build/tools/file/739018ba9ff1/buildbot-helpers/pulse_publisher.py

http://hg.mozilla.org/build/puppet-manifests/file/0deb57fc17ae/modules/buildmaster/manifests/queue.pp

http://hg.mozilla.org/build/tools/file/5053b0ea4564/buildfarm/maintenance/manage_masters.py
