CloudServices/Sagrada/Queuey: Difference between revisions

no edit summary
No edit summary
No edit summary
Line 22: Line 22:
a message guarantee that is generally "Deliver once, and exactly once." These
a message guarantee that is generally "Deliver once, and exactly once." These
guarantee's can be altered based on deployment configuration.
guarantee's can be altered based on deployment configuration.
= Terminology =
Cassandra
    Apache Cassandra, the default storage back-end for the queue
CL
    Consistency Level
DC
    Data-center
MQ
    message-queue
qdo
    Worker library for queuey
queuey
    The message queue web application Python package that provides the
    RESTful API for the message queue.


= Engineers =
= Engineers =
Line 107: Line 91:


When using queue's that are to be consumed, they must be declared up-front as  
When using queue's that are to be consumed, they must be declared up-front as  
a ''partioned'' queue. The amount of partitions should also be specified, and
a ''partitioned'' queue. The amount of partitions should also be specified, and
new messages will be randomly partioned. If messages should be processed in
new messages will be randomly partitioned. If messages should be processed in
order, they can be inserted into a single partition to enforce ordering. All
order, they can be inserted into a single partition to enforce ordering. All
messages that are randomly partitioned should be considered loosely ordered.
messages that are randomly partitioned should be considered loosely ordered.
Line 125: Line 109:
This model is based exactly on how Apache Kafka workers divide up queues to
This model is based exactly on how Apache Kafka workers divide up queues to
work on.
work on.
== Example Deployment Configurations ==
=== Multiple DC Reliability ===
This setup is for messages that are critical to deliver, performance is
substantially slower as writes will not return until a quorum of storage
nodes in each data-center acknowledge the message was written.
Depending on how urgent it is for the clients to see the message, the read
CL can be between ONE and LOCAL_QUORUM. Higher read availability
will be achieved by using ONE though, with only a slight delay before a
written message will be seen regardless of the data-center.
Using a read CL of ONE allows for an entire data-center to become unavailable
and messages will still be delivered, however no new messages can be written
as all data-centers must have quorum's available to write messages. Changing
the write CL to a lower value adversely affects how clients may read the queue
as they may need to track what has been read to ensure they don't miss messages
that are still propagating.
''Queuey''
* Deploy to webheads in each data-center
''Storage back-end''
* Select Cassandra
* Deploy Cassandra machines (3+) in each data-center
* Set write CL to EACH_QUORUM
* Set read CL between ONE - LOCAL_QUORUM
* Set delay as appropriate for the read/write CL's
=== High Throughput, Occasional Unavailability of Messages, Single DC ===
In this setup, Queuey behaves like Apache Kafka. The queuey service runs on the
same nodes as Cassandra, and each node runs Cassandra as a single node. This
provides no data durability beyond the disks, so if message persistence is
important than the drives should be RAID. If a node goes down, the messages on
that node are unavailable until the node returns.
Applications writing messages should hit a load balancer that has the queuey
nodes behind it, so writes will be unaffected by individual machine outages and
message producers will not need configuration updates to add/remove queuey
nodes.
Clients reading queues will have to know the individual hostnames of the queuey
nodes and track how far they've read into the queue per-host, as well as be
able to communicate directly with the queuey nodes (bypassing the load
balancer).
''Queuey''
* Deploy to desired nodes
''Storage back-end''
* Select Cassandra
* Deploy as single-node instance to every queuey node
* Set read/write CL to ONE, as there is only ONE node
* No delay needed
=== Good Throughput, Available Messages, Single DC ===
This is a good generic setup that provides message durability in the event a
few nodes are lost. Webheads run queuey, and all connect to the same Cassandra
cluster.
Message guarantee's can be tweaked by altering the read/write CL's, and the
enforced message availability delay. For example, to raise availability the
read/write CL can be set to ONE and a delay can be set that will provide a
fairly high 99% change of seeing a message within 50ms of writing it (assuming
no nodes go down).
See http://www.eecs.berkeley.edu/~pbailis/projects/pbs/#demo for more details
on tweaking the delay and CL's to achieve the desired message delivery
guarantee's.
''Queuey''
* Deploy to webheads
''Storage back-end''
* Select Cassandra
* Deploy Cassandra cluster (3+) machines
* Set write CL to ONE - QUORUM
* Set read CL to ONE - QUORUM
* Set delay appropriate for read/write CL's


= Initial User Requirements =
= Initial User Requirements =
Line 255: Line 150:
= API =
= API =


Applications allowed to use the '''Message Queue'''
API can be found on the [http://readthedocs.org/docs/queuey/en/latest/api.html queuey API docs] page.  
will be given an application key that must be sent with every request, and
their IP must be on an accepted IP list for the given application key.
 
All methods must include the header 'X-Application-Name' which indicates the name of the
application using the queue. Queue's are unique within an application.
 
For methods requiring authentication (internal apps), the application key must
be sent as a HTTP header named 'ApplicationKey'.
 
== Internal Apps ==
 
These methods are authenticated by IP, and are intended for use by Services Applications.
 
 
'''POST /queue'''
 
    Creates a new queue
 
    Params:
        ''queue_name'' (Optional) - Name of the queue to create
        ''partitions'' (Optional) - How many partitions the queue should have (defaults to 1)
 
'''DELETE /queue/{queue_name}'''
   
    Deletes a given queue created by the App.  
 
    Params:
        ''delete'' (Optional) - If set to ''false'', then the queue contents will be deleted,
                                but the queue will remain registered.
 
'''POST /queue/{queue_name}'''
 
    Create a message on the given queue. Contents is expected to be
    a JSON object.
   
    Raises an error if the queue does not exist.
 
== Clients ==
 
'''GET /queue/{queue_name}'''
   
    Returns messages for the queue.
 
    Params:
        ''since_timestamp'' (Optional) - All messages newer than this timestamp, should be
                                        formatted as seconds since epoch in GMT
        ''limit''          (Optional) - Only return N amount of messages, defaults to 100
        ''order''          (Optional) - 'descending' or 'ascending', defaults to descending
   
    Messages are returned in order of newest to oldest.
 
'''GET /queue/{queue_name}/info'''
 
    Returns information about the queue.
 
    Example response:
        {
            'partitions': 4,
            'created': 1322521547
        }


= Design Background =
= Design Background =
Confirmed users
192

edits