Auto-tools/Projects/Pulse/PulseGuardian

Revision as of 17:06, 12 April 2014 by Mcote

Team

  • mcote, dkl, ahmed

Problem

We use RabbitMQ as a pub/sub service that currently allows anyone to subscribe to any queue via a common user account. Some client applications use durable queues in case they crash; however, some of these queues are created by accident, and some apps crash without their admins noticing. In both cases the queues grow without bound, which can eventually cause the RabbitMQ host to run out of memory. Our current solution is to have Nagios monitor the queues and send an alert when any queue exceeds a certain number of unread or unacknowledged messages, at which point a RabbitMQ admin tries to find the person responsible and/or deletes the offending queue.
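
The kind of threshold check described above can be sketched as follows. This is only an illustration, not the actual Nagios check: it assumes queue statistics shaped like the objects returned by the RabbitMQ management plugin's `/api/queues` endpoint (`name`, `messages_ready`, `messages_unacknowledged`), and the function name and the threshold of 2000 messages are hypothetical.

```python
def queues_over_threshold(queues, max_messages=2000):
    """Return (name, backlog) pairs for queues whose backlog exceeds max_messages.

    `queues` is a list of dicts; the backlog counts both unread (ready)
    and unacknowledged messages. The default threshold is an assumption.
    """
    offenders = []
    for q in queues:
        backlog = q.get("messages_ready", 0) + q.get("messages_unacknowledged", 0)
        if backlog > max_messages:
            offenders.append((q["name"], backlog))
    # Report the largest backlog first.
    return sorted(offenders, key=lambda item: item[1], reverse=True)
```

In practice the input list would come from the management API rather than being built by hand; fetching it is left out here to keep the sketch free of network and credential details.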

Goals & Considerations

We need an intelligent system to handle queues that grow too large. The system should automatically alert the queue's owner and, if no action is taken, eventually delete the queue.
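
One possible shape for this alert-then-delete policy is sketched below. Everything in it is an assumption made for illustration: the `Action` names, the warning threshold, and the idea of a fixed grace period between the first warning and deletion are not specified on this page.

```python
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    NONE = "none"                  # nothing to do (healthy, or waiting on the owner)
    WARN_OWNER = "warn_owner"      # send the first alert to the queue's owner
    DELETE_QUEUE = "delete_queue"  # the owner did not react in time


WARN_THRESHOLD = 2000              # hypothetical backlog limit
GRACE_PERIOD = timedelta(days=2)   # hypothetical time the owner has to react


def next_action(backlog, first_warned_at, now):
    """Decide what to do with one queue.

    backlog:         current number of unread plus unacknowledged messages
    first_warned_at: datetime of the first warning, or None if never warned
    now:             current datetime
    """
    if backlog <= WARN_THRESHOLD:
        return Action.NONE
    if first_warned_at is None:
        return Action.WARN_OWNER
    if now - first_warned_at >= GRACE_PERIOD:
        return Action.DELETE_QUEUE
    return Action.NONE
```

Keeping the decision a pure function of the backlog and two timestamps makes the policy easy to test separately from whatever code talks to RabbitMQ and sends the alerts.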

A further improvement would be to automatically consume messages from an overgrown queue and write them to disk for later use, since this would at least free up memory. This archive would also need a size limit to avoid consuming too much disk space; once the limit is reached, the queue would be deleted (with a further alert). There would need to be a convenient way to consume archived messages.
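
A minimal sketch of such a capped on-disk archive follows. The `DiskArchive` class, its length-prefixed file format, and the byte limit are all hypothetical; the sketch only shows how a size cap and a replay mechanism could fit together, and it does not connect to RabbitMQ.

```python
import os


class DiskArchive:
    """Append-only message archive with a hard cap on bytes written."""

    def __init__(self, path, max_bytes):
        self.path = path
        self.max_bytes = max_bytes

    def size(self):
        """Current archive size in bytes (0 if nothing has been written)."""
        return os.path.getsize(self.path) if os.path.exists(self.path) else 0

    def append(self, body):
        """Store one message body (bytes).

        Returns False without writing if the record would exceed the cap,
        which is the point at which the caller would alert and delete the queue.
        """
        # Each record is a 4-byte big-endian length followed by the body.
        record = len(body).to_bytes(4, "big") + body
        if self.size() + len(record) > self.max_bytes:
            return False
        with open(self.path, "ab") as f:
            f.write(record)
        return True

    def replay(self):
        """Yield archived message bodies in the order they were written."""
        if not os.path.exists(self.path):
            return
        with open(self.path, "rb") as f:
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break
                yield f.read(int.from_bytes(header, "big"))
```

A consumer that comes back online could iterate over `replay()` to catch up before resuming normal consumption from the live queue.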

Non-Goals

Design and Approach

Implementation