ReleaseEngineering/Queue directories: Difference between revisions

Added link to the mana page for the command queue Nagios alert
Our buildbot masters use several queuedirs to perform out-of-process tasks such as pushing events to Pulse or uploading logs.


== Queuedirs ==
A queuedir is a simple directory structure on disk where individual jobs are stored in files. The files are moved between directories depending on what state they're in.


* <tt>'''dead'''</tt>: failed jobs go here. This is bad.
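
The idea of moving a job file between state directories can be sketched in a few lines of Python. This is only an illustration, not the code in <tt>queuedir.py</tt>: the helper name <tt>move_job</tt> and the one-subdirectory-per-state layout are assumptions made for the example.

```python
import os
import tempfile


def move_job(queue_root, job_name, src_state, dst_state):
    """Move one job file from one state directory to another.

    os.rename is atomic when both directories are on the same
    filesystem, so a reader never sees a half-moved job file.
    """
    src = os.path.join(queue_root, src_state, job_name)
    dst_dir = os.path.join(queue_root, dst_state)
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, job_name)
    os.rename(src, dst)
    return dst


if __name__ == "__main__":
    # Demonstrate on a throwaway directory, not a real queuedir.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "new"))
    with open(os.path.join(root, "new", "job-1"), "w") as f:
        f.write("example payload")
    # A job that failed would end up in "dead".
    print(move_job(root, "job-1", "new", "dead"))
```

Because a job's state is simply the directory it sits in, inspecting a queue only requires listing directories.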


We currently have two queuedirs: '''/dev/shm/queue/commands''' and '''/dev/shm/queue/pulse'''


== Processors ==
We currently have two processors: '''command_runner.py''' and '''pulse_publisher.py'''. Both run as services, start on boot, and are managed by Puppet. They both run out of a virtualenv in /builds/buildbot/queue.


We have Nagios checks in place to ensure that the queue processors are running and that there are no dead jobs.
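
As a rough illustration of what a "no dead jobs" check involves, the sketch below counts files in each queuedir's <tt>dead</tt> directory and reports a Nagios-style status. It is not the check that is actually deployed: the function names are hypothetical, and treating files ending in <tt>.log</tt> as per-job logs rather than jobs is an assumption.

```python
import os

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def count_dead_jobs(queuedir):
    """Count job files in <queuedir>/dead.

    Assumption: files ending in ".log" are per-job logs, not jobs.
    A missing dead directory is treated as zero dead jobs.
    """
    dead = os.path.join(queuedir, "dead")
    if not os.path.isdir(dead):
        return 0
    return sum(1 for name in os.listdir(dead) if not name.endswith(".log"))


def check_queuedirs(queuedirs):
    """Return (status, message) summarising dead jobs across queuedirs."""
    problems = []
    for queuedir in queuedirs:
        count = count_dead_jobs(queuedir)
        if count:
            problems.append("%s: %d dead" % (queuedir, count))
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, "OK - no dead jobs"


if __name__ == "__main__":
    status, message = check_queuedirs(
        ["/dev/shm/queue/commands", "/dev/shm/queue/pulse"]
    )
    print(message)
```

A real plugin would also exit with the returned status code so that Nagios can act on it.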


== Troubleshooting ==
If a processor isn't running, it can be restarted via /etc/init.d/command_runner or /etc/init.d/pulse_publisher.


If there are dead jobs, you can read the per-job log files in the dead directory. After resolving the issue, job files should be moved back into <tt>'''new'''</tt> to be retried, or deleted.
* ''Note: there is a <tt>retry_dead_queue</tt> sub-command for [http://hg.mozilla.org/build/tools/file/5053b0ea4564/buildfarm/maintenance/manage_masters.py manage_masters.py].'' Alternatively, Ansible users can run <tt>deadqueue.yml</tt> from [https://github.com/mozilla/build-ansible build-ansible]. Example invocation: <tt>ansible-playbook -i master-inventory.py deadqueue.yml</tt>
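
For a single master, moving dead jobs back into <tt>'''new'''</tt> by hand amounts to something like the sketch below. It is a minimal illustration and does not reproduce what <tt>retry_dead_queue</tt> or <tt>deadqueue.yml</tt> actually do: the function name is hypothetical, and skipping files that end in <tt>.log</tt> (assumed to be the per-job logs) is an assumption.

```python
import os


def retry_dead_jobs(queuedir):
    """Move every job file from <queuedir>/dead back into <queuedir>/new.

    Assumption: files ending in ".log" are per-job logs and stay in dead.
    Returns the sorted list of job file names that were moved.
    """
    dead = os.path.join(queuedir, "dead")
    new = os.path.join(queuedir, "new")
    moved = []
    for name in sorted(os.listdir(dead)):
        src = os.path.join(dead, name)
        if name.endswith(".log") or not os.path.isfile(src):
            continue
        # Same-filesystem rename, so the processor never sees a partial file.
        os.rename(src, os.path.join(new, name))
        moved.append(name)
    return moved
```

For example, <tt>retry_dead_jobs("/dev/shm/queue/commands")</tt> would requeue everything in the command queue; run it only after the underlying failure has been fixed, otherwise the jobs are likely to fail again.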


== Implementation ==
http://hg.mozilla.org/build/tools/file/739018ba9ff1/lib/python/buildtools/queuedir.py




http://hg.mozilla.org/build/puppet-manifests/file/0deb57fc17ae/modules/buildmaster/manifests/queue.pp
== See also ==
https://mana.mozilla.org/wiki/display/NAGIOS/Command+Queue


{{Release Engineering How To|Queue Directories}}