Auto-tools/Projects/Pulse/PulseGuardian: Difference between revisions

→‎Problem: past tense
(Add address, remove deployment bug, update team)
(→‎Problem: past tense)
Line 11: Line 11:
= Problem =
= Problem =


[[Auto-tools/Projects/Pulse|Pulse]] uses RabbitMQ as a pub/sub service which currently allows anyone to subscribe to any exchange via a common user account.  Some client applications use durable queues in case they crash; however, sometimes these queues are created by accident, and sometimes apps crash without admins noticing.  In these cases, the queues continue to grow without bound, which can eventually result in the RabbitMQ host running out of memory.  Our current solution is to have Nagios monitor the queues and send alerts when any queues exceed a certain number of unread or unacknowledged messages, at which point a RabbitMQ admin attempts to find the person responsible and/or delete the offending queue.
[[Auto-tools/Projects/Pulse|Pulse]] uses RabbitMQ as a pub/sub service, which formerly allowed anyone to subscribe to any exchange via a common user account. Some client applications use durable queues in case they crash; however, sometimes these queues are created by accident, and sometimes apps crash without admins noticing. In these cases, the queues continue to grow without bound, which can eventually result in the RabbitMQ host running out of memory. Our previous solution was to have Nagios monitor the queues and send an alert whenever a queue exceeded a certain number of unread or unacknowledged messages, at which point a RabbitMQ admin attempted to find the person responsible and/or delete the offending queue.
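
The threshold check described above can be illustrated with a minimal sketch. It is not the actual Nagios check: the function name, the sample data, and the limit of 10,000 messages are all hypothetical, and the dictionary keys assume the per-queue fields (<code>name</code>, <code>messages_ready</code>, <code>messages_unacknowledged</code>) reported by the RabbitMQ management plugin's queue listing.

```python
# Hypothetical alert threshold; the page does not state the real value.
MAX_MESSAGES = 10_000


def find_overgrown_queues(queues, limit=MAX_MESSAGES):
    """Return the names of queues whose unread (ready) or
    unacknowledged message count exceeds the given limit."""
    overgrown = []
    for queue in queues:
        # Treat a missing counter as zero rather than failing.
        ready = queue.get("messages_ready", 0)
        unacked = queue.get("messages_unacknowledged", 0)
        if ready > limit or unacked > limit:
            overgrown.append(queue["name"])
    return overgrown


# Made-up sample data shaped like a queue listing.
sample_queues = [
    {"name": "healthy-queue", "messages_ready": 12, "messages_unacknowledged": 3},
    {"name": "stale-durable-queue", "messages_ready": 250_000, "messages_unacknowledged": 0},
    {"name": "stuck-consumer-queue", "messages_ready": 5, "messages_unacknowledged": 40_000},
]

flagged = find_overgrown_queues(sample_queues)
print(flagged)
```

A check like this only reports the offending queues; as the paragraph notes, someone still had to track down the owner or delete the queue by hand.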


= Goals & Considerations=
= Goals & Considerations=