Auto-tools/Projects/Pulse: Difference between revisions

(47 intermediate revisions by 9 users not shown)
Line 1: Line 1:
== Mozilla Pulse ==
= Introducing Pulse =


https://pulse.mozilla.org/
Pulse is a managed [http://www.rabbitmq.com RabbitMQ] cluster designed to provide loose coupling between automation and infrastructure tools.  The goal of Pulse is to add visibility to Mozilla's tools and systems and to eliminate polling and other brittle methods of scraping data. This allows more robust, dynamic, and informative tools.


Mozilla currently has a ton of different systems that are inter-connected via polling, screen scraping, email, and other brittle methods. To make their lives easier, community members often build tools on top of this house of cards, adding yet another level of scraping and polling. Many systems don't even export important data for others to scrape and use, preventing better tools from being written.
Pulse is available at pulse.mozilla.org:5671 (AMQP over SSL).  It is hosted by [http://cloudamqp.com CloudAMQP].


The goal of Pulse is to eliminate polling and add visibility into all aspects of Mozilla and its systems. This allows more robust, dynamic, and informative tools.
[[Auto-tools/Projects/Pulse/PulseGuardian|PulseGuardian]] is a tool that manages Pulse's users and queues (and eventually exchanges).  It is available at https://pulseguardian.mozilla.org and hosted by [http://heroku.com Heroku].


We have a discussion forum available via the standard trio of [news:mozilla.tools.pulse USENET newsgroup], [https://lists.mozilla.org/listinfo/tools-pulse mailing list], and [https://groups.google.com/forum/#!forum/mozilla.tools.pulse Google Group].


File bugs under [https://bugzilla.mozilla.org/enter_bug.cgi?product=Webtools&component=Pulse Webtools :: Pulse].
File bugs under [https://bugzilla.mozilla.org/enter_bug.cgi?product=Webtools&component=Pulse Webtools :: Pulse].  We don't have a separate component for PulseGuardian; rather, we just start the summaries with "[PulseGuardian]".


=== System Description ===
Also see the [https://tools.taskcluster.net/pulse-inspector/ Pulse Inspector] web app, which displays Pulse messages in real time, and the (manually updated) [[/Exchanges|list of Pulse exchanges]].


Pulse isn't any one thing.  At its heart, it is a RabbitMQ system with a particular configuration and a set of conventions for using it along with a management tool, [[Auto-tools/Projects/Pulse/PulseGuardian|PulseGuardian]], to make Pulse as automated and self-serve as possible.  Pulse follows the pub-sub pattern, in which publishers send messages to topic exchanges, and consumers create queues bound to these exchanges in order to subscribe to the publishers' messages.  The [https://pypi.python.org/pypi/MozillaPulse mozillapulse] Python package provides classes for existing publishers, consumers, and messages so you can quickly build Pulse applications.
= System Description =


=== Contributing ===
Pulse isn't any one thing.  At its heart, it is a RabbitMQ system with a particular configuration and a set of conventions for using it along with a management tool, [[Auto-tools/Projects/Pulse/PulseGuardian|PulseGuardian]], to make Pulse as automated and self-serve as possible.  Pulse follows the pub-sub pattern, in which publishers send messages to topic exchanges, and consumers create queues bound to these exchanges in order to subscribe to the publishers' messages.  In general, publishers create and own exchanges, and consumers create and own queues.
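As a rough illustration of how a consumer's queue binding selects messages from a topic exchange, here is a minimal sketch of AMQP topic matching in plain Python.  It is not Pulse or RabbitMQ code; the function name and the sample routing keys are made up for the example.  In a topic binding, words are separated by dots, <code>*</code> matches exactly one word, and <code>#</code> matches zero or more words.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key.

    Words are separated by dots; '*' matches exactly one word and
    '#' matches zero or more words.
    """
    def match(p, k):
        if not p:
            # Pattern exhausted: match only if the key is exhausted too.
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


# Hypothetical routing keys, for illustration only.
print(topic_matches("build.*.finished", "build.linux.finished"))
print(topic_matches("build.#", "build.linux.opt.started"))
```

A queue bound with the pattern <code>#</code> receives every message published to the exchange; narrower patterns let a consumer subscribe to a subset.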


[http://mzl.la/1pc2iGd Browse] the list of open, unassigned, mentored Pulse bugs to see how you can contribute!
= Specification =


To set up a local system for development, see the [https://hg.mozilla.org/automation/mozillapulse/file/tip/HACKING.md HACKING.md] file included in the mozillapulse source.
Pulse is a managed [https://www.rabbitmq.com/resources/specs/amqp0-9-1.pdf AMQP 0-9-1]
service with [https://www.rabbitmq.com/extensions.html RabbitMQ extensions] for publishing messages from Mozilla
infrastructure. The aim is to provide hooks that subscribers can
use to integrate and extend Mozilla infrastructure.


=== Status ===
== Authentication ==


At the moment, only BuildBot messages (BuildMessage, TestMessage) and [[BMO/ChangeNotificationSystem|SimpleBugMessages]] are being published to Pulse.
Pulse credentials are managed and issued by [[Auto-tools/Projects/Pulse/PulseGuardian|PulseGuardian]],
available at https://pulseguardian.mozilla.org. This service SHALL issue
an ''accessToken'' for any ''clientId'' that is registered
with an ''authorized'' email address.
The accessToken is strictly secret and MUST NOT be shared
publicly. The clientId is not secret. When establishing an AMQP
connection, the clientId and accessToken MUST be used as
''username'' and ''password'', respectively.
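A minimal sketch of how the clientId and accessToken map onto an AMQP connection URL, using only the Python standard library.  The host and port are the ones given on this page; the <code>amqps://</code> URL form, the default virtual host <code>/</code>, and the function name are assumptions made for the example, so check them against your AMQP client library.

```python
from urllib.parse import quote

PULSE_HOST = "pulse.mozilla.org"
PULSE_PORT = 5671  # AMQP over SSL, as stated on this page


def pulse_url(client_id: str, access_token: str, vhost: str = "/") -> str:
    """Build an amqps:// URL with clientId as username and accessToken as password.

    The default vhost "/" is an assumption, not something this page specifies.
    """
    # Percent-encode every component so that characters such as '/' or '@'
    # in a token or vhost cannot break the URL structure.
    user = quote(client_id, safe="")
    password = quote(access_token, safe="")
    path = quote(vhost, safe="")
    return f"amqps://{user}:{password}@{PULSE_HOST}:{PULSE_PORT}/{path}"


# Placeholder credentials; a real accessToken is secret and should be read
# from configuration or the environment, never committed to source control.
print(pulse_url("my-client-id", "not-a-real-token"))
```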


There used to be two other publishers, which have been disabled:
== Authorized Users ==


* HgPublisher: the original shim "crashed on various occasions, in particular file additions/removals/renames and merges made it go funky."
Pulse is intended to be open to all Mozillians who want to
** {{bug|1022701}} on file to fix and re-enable.
extend or integrate with Mozilla infrastructure. To guard against
* BugzillaPublisher: this produced too much traffic for the original prototype system, and for security reasons it could publish only changes to public bugs, making it of questionable value. The [[BMO/ChangeNotificationSystem|SimpleBugzillaPublisher]] is a lightweight replacement that publishes only bug ID and change time, but for all bugs, public or otherwise.
abuse, PulseGuardian users MUST authenticate via Persona. PulseGuardian SHOULD verify that users have a vouched Mozillians profile.


=== Technology used ===
== Publishers ==


* The message broker used is [http://www.rabbitmq.com RabbitMQ].
Publishers MUST name ''exchanges'' in the form <code>exchange/<clientId>/<name></code> where clientId is the userid used to bind/connect to the server. Attempts to name an exchange otherwise SHALL result in an authorization error. Exchanges MUST be ''topic exchanges'' and they MUST be declared ''durable''.
* Protocol used to talk to the broker is [http://en.wikipedia.org/wiki/AMQP AMQP].
* Messages are in JSON.
* For the Python mozillapulse package, the underlying library currently used to talk AMQP is [http://kombu.readthedocs.org/ Kombu].


=== Road Map ===
Messages MUST contain a UTF-8-encoded [http://tools.ietf.org/html/rfc7159 JSON] payload, and
their <code>Content-Type</code> MUST be <code>application/json</code>.
Messages SHOULD NOT be larger than 8&nbsp;kB; deviations may be
feasible for low-traffic exchanges. Messages MUST NOT contain
secret or sensitive information; all exchanges and messages
SHALL be considered public.
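The publisher rules above can be sketched in plain Python as follows.  This is an illustration, not the mozillapulse API: the helper names are hypothetical, the 8&nbsp;kB guideline is read as 8000 bytes, and the property names <code>content_type</code> and <code>delivery_mode</code> follow the usual AMQP 0-9-1 basic properties (delivery mode 2 marks a message as persistent).

```python
import json

# The spec's 8 kB guideline, interpreted here as 8000 bytes (an assumption).
MAX_RECOMMENDED_BYTES = 8 * 1000


def exchange_name(client_id: str, name: str) -> str:
    """Return an exchange name in the required exchange/<clientId>/<name> form."""
    return f"exchange/{client_id}/{name}"


def encode_message(payload):
    """Serialize a payload as UTF-8 JSON and build matching message properties.

    Returns (body, properties, oversized), where oversized is True when the
    body exceeds the recommended size.
    """
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    properties = {
        "content_type": "application/json",
        "delivery_mode": 2,  # persistent message
    }
    return body, properties, len(body) > MAX_RECOMMENDED_BYTES


# Hypothetical client and payload, for illustration only.
body, properties, oversized = encode_message({"id": 12345, "status": "finished"})
print(exchange_name("my-client-id", "builds"), properties, oversized)
```

Because all messages are considered public, a publisher should check the payload for secrets before it ever reaches a function like <code>encode_message</code>.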


See the [http://mzl.la/1pc2F3M prioritized bug list] for all open issues.
A message SHOULD carry a ''routing key'', in which fields have a
fixed index from the left. Additionally, a message MAY be
''cced'' to multiple routing keys, using the RabbitMQ
[https://www.rabbitmq.com/sender-selected.html ''Sender-selected Distribution''] extension.
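A small sketch of what a fixed index from the left can look like in practice.  The field list, the <code>_</code> placeholder for missing fields, and the function names are all hypothetical; the <code>CC</code> header name is the one used by RabbitMQ's Sender-selected Distribution extension.

```python
# Hypothetical routing-key schema: every message uses the same field order,
# so a consumer can bind on a position (e.g. "*.linux.*") reliably.
ROUTING_FIELDS = ("tree", "platform", "status")


def make_routing_key(**values) -> str:
    """Join fields in a fixed order, filling missing ones with '_'."""
    parts = []
    for field in ROUTING_FIELDS:
        value = str(values.get(field, "_"))
        if "." in value:
            # A dot would shift every later field one position to the right.
            raise ValueError(f"field {field!r} must not contain '.': {value!r}")
        parts.append(value)
    return ".".join(parts)


def cc_headers(extra_routing_keys):
    """Build message headers that copy the message to additional routing keys."""
    return {"CC": list(extra_routing_keys)}


key = make_routing_key(tree="mozilla-central", status="success")
print(key)
print(cc_headers(["mozilla-central.linux.success"]))
```

Keeping a placeholder for absent fields means the key always has the same number of words, so existing bindings keep working when a publisher has nothing to put in a given position.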


==== Website ====
Messages SHOULD be ''durable'' and SHOULD be published over
* {{bug|1017957}} Merge above in with PulseGuardian; no point in having two websites.
RabbitMQ [https://www.rabbitmq.com/confirms.html ''confirm-publish'' channels]. Otherwise, the
* Indicate current Pulse status (at least just up/down).
documentation MUST clearly reflect that messages from the
* (Maybe) Display published messages on the Pulse website (mostly decorative but also an example of use in the browser).
given exchange do not exhibit deliver-''at-least-once'' semantics.


==== Management ====
== Subscribers ==
* (Almost done!) Intelligently handle queues that start filling up.
** See [[Auto-tools/Projects/Pulse/PulseGuardian|PulseGuardian]].


==== Security ====
Subscribers MUST name ''queues'' in the form
* {{done|}} Enable SSL.
<code>queue/<clientId>/<name></code>; attempts to name a queue otherwise
** {{bug|1013980}} Enable SSL by default in clients.
SHALL result in an authorization error. Queues MAY ''consume''
** Close non-SSL port eventually?
from any exchange prefixed <code>exchange/</code>; attempts to consume
* Move to a tighter permission model. See the Security Model section below.
from any other exchange SHALL result in an authorization error.


==== Shims ====
Subscribers MAY limit the size of their queues using the RabbitMQ
* Re-enable hg shim?
[https://www.rabbitmq.com/maxlength.html ''Queue Length Limit''] extension. Subscribers MUST NOT let
* Add git shim?
their queues grow unbounded; if left unattended, ''Pulse''
* Other shims?
SHALL notify the owner by email. Additionally, ''Pulse'' MAY delete
a queue which exceeds defined limits. Subscribers SHOULD specify a
prefetch limit using the RabbitMQ [https://www.rabbitmq.com/consumer-prefetch.html ''Consumer Prefetch'' limit] extension.
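The subscriber rules can be collected into a settings dictionary like the one sketched below.  The function names and the default numbers are assumptions chosen for illustration, not Pulse-defined limits; <code>x-max-length</code> is the queue argument used by RabbitMQ's Queue Length Limit extension, and <code>prefetch_count</code> is the value passed to <code>basic.qos</code>.

```python
def queue_name(client_id: str, name: str) -> str:
    """Return a queue name in the required queue/<clientId>/<name> form."""
    return f"queue/{client_id}/{name}"


def queue_settings(client_id, name, max_length=10000, prefetch_count=20, durable=True):
    """Collect declaration settings for a bounded, prefetch-limited queue.

    max_length and prefetch_count defaults are illustrative assumptions.
    """
    if max_length <= 0 or prefetch_count <= 0:
        raise ValueError("max_length and prefetch_count must be positive")
    return {
        "queue": queue_name(client_id, name),
        "durable": durable,
        # A queue that is not durable is declared auto-delete instead.
        "auto_delete": not durable,
        # RabbitMQ drops messages from the head once this length is reached.
        "arguments": {"x-max-length": max_length},
        # Upper bound on unacknowledged messages delivered to the consumer.
        "prefetch_count": prefetch_count,
    }


print(queue_settings("my-client-id", "build-watcher"))
```

Most AMQP client libraries accept these values directly when declaring a queue and setting the channel's quality of service; consult your library's documentation for the exact parameter names.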


==== Other ====
Subscribers SHOULD use either ''durable'' queues or
* Upgrade RabbitMQ to latest 3.x version (ideally with zero downtime).
''auto-delete'' queues. Implementors are recommended to aim
* Enable STOMP or some other method of accessing Pulse via the browser.
for deliver-''at-least-once'' semantics.
* Create a JavaScript library along the lines of the mozillapulse Python package.


=== Security Model ===
== Appendix A: Everything in Bullet Points ==
 
This is a summary of the above.
 
Pulse:
* MUST offer registration at <code>pulse.mozilla.org</code>
* MUST support [https://www.rabbitmq.com/resources/specs/amqp0-9-1.pdf AMQP 0-9-1] and these RabbitMQ extensions:
** [https://www.rabbitmq.com/confirms.html Confirms]
** [https://www.rabbitmq.com/consumer-prefetch.html Consumer Prefetch]
** [https://www.rabbitmq.com/maxlength.html Queue Length Limit]
** [https://www.rabbitmq.com/sender-selected.html Sender-selected Distribution]
* SHOULD exhibit deliver-''at-least-once'' semantics
* MAY delete queues that grow beyond ''Pulse''-defined limits
* SHALL notify owner by email when a queue grows close to ''Pulse''-defined limits.
 
Publishers:
* SHOULD use [https://www.rabbitmq.com/confirms.html confirm-publish channels]
 
Exchanges:
* MUST be named <code>exchange/<clientId>/<name></code>
* MUST be topic exchanges
* MUST be durable
 
Messages:
* MUST be UTF-8-encoded [http://tools.ietf.org/html/rfc7159 JSON]
* MUST carry <code>application/json</code> as <code>Content-Type</code>
* SHOULD be durable
* SHOULD be less than 8 KiB (for good performance)
* MAY be CC'ed to [https://www.rabbitmq.com/sender-selected.html multiple routing keys]
* MUST NOT contain private or sensitive information
* SHOULD have a routing key where fields have a fixed index from the left
 
Subscribers:
* SHOULD specify a [https://www.rabbitmq.com/consumer-prefetch.html ''consumer prefetch'' limit]
 
Queues:
* MUST be named <code>queue/<clientId>/<name></code>
* MAY have a [https://www.rabbitmq.com/maxlength.html limited length]
* MUST NOT grow unbounded
 
= Let's Use It =
 
There are currently two Pulse clients available. Please note that you can also connect to Pulse in other languages, provided you have an AMQP 0.9.1 library that will let you interact with AMQP exchanges. See https://github.com/rabbitmq/rabbitmq-tutorials#languages for example.
 
== Python Pulse client library ==
 
The [https://github.com/mozilla-services/mozillapulse mozillapulse] Python package provides classes for existing publishers, consumers, and messages so you can quickly build Pulse applications.  See the [https://github.com/mozilla-services/mozillapulse/blob/master/README.md README] to get started.  Note that the test publisher is currently offline (see {{bug|1218976}}); you can use another consumer, e.g. BuildConsumer, to verify your setup.
 
This library is somewhat inflexible, however, and should be rewritten. One idea is to turn TaskCluster's Python client into a standalone package.
 
== Go (golang) Pulse client library ==
 
This can be found here:
* http://taskcluster.github.io/pulse-go/
 
Extensions for TaskCluster exchanges here (see section "AMQP APIs"):
* http://taskcluster.github.io/taskcluster-client-go/
 
= Contributing =
 
To set up a local system for development, see the [https://github.com/mozilla-services/mozillapulse/blob/master/HACKING.md HACKING.md] file included in the mozillapulse source.
 
The main Pulse library (mozillapulse) and publisher shims (pulseshims) are written in Python, although there is also a Go library as mentioned in the section above.  We also want to provide a canonical JavaScript library at some point.  To hack on the main Pulse library, you should be comfortable in Python, and it's helpful to understand the basics of AMQP.  Knowledge of kombu is also useful.
 
To hack on PulseGuardian, you should know some Python and JavaScript.  Experience with Flask, SQLAlchemy, and RabbitMQ is useful, but you can probably learn what you need as you fix bugs.
 
Feel free to stop by #pulse or #ateam with questions!
 
Here is the list of open, unassigned, mentored Pulse and PulseGuardian bugs to get you started.
 
<bugzilla>
    {
        "quicksearch": "status:new",
        "product": "Webtools",
        "component": "Pulse",
        "f1": "bug_mentor",
        "o1": "isnotempty",
        "include_fields": "id,summary,priority,status"
    }
</bugzilla>
 
Once you have your feet wet and are ready to take on a more involved project, here is a list of all current Pulse bugs:
 
<bugzilla>
    {
        "quicksearch": "status:new,assigned,reopened,unconfirmed",
        "product": "Webtools",
        "component": "Pulse"
    }
</bugzilla>
 
For mentored bugs, we use the User Story to provide a link back to this page, as well as any extra information for contributors, such as required knowledge or tools.  The basic text for mentored bugs should be "This is a mentored Pulse bug.  For general information on Pulse, see https://wiki.mozilla.org/Auto-tools/Projects/Pulse, which includes a section on Contributing."  An example of extra text is "This bug also requires you to have a working mail server."
 
= Road Map =
 
See the [https://bugzilla.mozilla.org/buglist.cgi?resolution=---&query_format=advanced&component=Pulse&product=Webtools prioritized bug list] for all open issues and feature requests.
 
= Security Model =
 
This is summarized in the formal Pulse specification above.  What follows is the rationale and some technical implementation notes.


In order to have a reliable, well-behaved system, the following assertions will need to be true.
Line 82: Line 195:
With this security model, we technically don't really need vhosts, since the names of the queues and exchanges the users can use are so specific.  There may still be a benefit in allowing apps to use the same queue name for different exchanges, though, which would be possible if each exchange had its own vhost.  The downside is that you cannot specify "all vhosts" when setting a user's permissions, so they would either have to list all vhosts they want to use when creating the user in PulseGuardian, and be able to update that list later, or PulseGuardian or some other app would have to automatically add new permissions to all users when a vhost is created.


=== Admin Procedures ===
= Admin Procedures =
 
dustin and the taskcluster team have access to the Pulse cluster on CloudAMQP and the following related services:
 
* PulseGuardian should be deleting queues that are too long. If you need to manually delete a queue, use the Management UI. If possible, ping the queue owner before deleting it.
 
== To upgrade an SSL certificate on pulse.mozilla.org ==
 
Open a bug with IT to generate a new certificate: https://bugzilla.mozilla.org/enter_bug.cgi?product=Infrastructure%20%26%20Operations&component=SSL%20Certificates
See {{bug|1532325}} for an example.
 
IT needs to email support@cloudamqp.com with the new cert.  The CloudAMQP support team will install it on all of our CloudAMQP nodes.  After it has been installed, you can log in to the administrative console and restart the nodes one by one, which will not result in any downtime.  (Ensure you wait for each node to finish restarting before restarting another one.)  Verify that the certs are installed on the nodes:
 
* https://sslanalyzer.comodoca.com/?url=pulse.mozilla.org
* https://sslanalyzer.comodoca.com/?url=orange-antelope-01.rmq.cloudamqp.com
* https://sslanalyzer.comodoca.com/?url=orange-antelope-02.rmq.cloudamqp.com
* https://sslanalyzer.comodoca.com/?url=orange-antelope-01.rmq.cloudamqp.com
 
This should show the dates for the new certificate and that the cert is trusted by Mozilla and Microsoft.
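If you prefer to check the certificate dates from a script rather than a web page, the following standard-library sketch is one way to do it.  The function names are hypothetical, and note the difference from the analyzer above: a successful handshake here only shows that the certificate is trusted by the local machine's trust store.

```python
import socket
import ssl
import time


def fetch_not_after(host: str, port: int = 5671, timeout: float = 10.0) -> str:
    """Connect over TLS and return the peer certificate's notAfter string.

    The handshake fails if the certificate is not trusted by the local
    trust store or does not match the hostname.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now=None) -> float:
    """Return the number of days between now and a certificate's notAfter time."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


# Example (requires network access), not run here:
#   print(days_until_expiry(fetch_not_after("pulse.mozilla.org")))
```

Run the check against each node's hostname as well as pulse.mozilla.org, since the nodes are restarted one at a time.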


These should largely become obsolete when PulseGuardian is deployed.
CloudAMQP has updated their web page since we last did this, so you should now be able to upload the cert yourself and have it propagate.  See the admin console under "Certificate".


* When a queue becomes stuck, you can use the Admin UI to kill it. Try to ping the queue owner first before killing if possible.
= More reading =
** More than half of the queues are QA related (whimboo)
* pulsetranslator service, which normalizes buildbot messages, is currently running on pulsetranslator.ateam.phx1.mozilla.com and may need to be reset from time to time.
* logparser service, used by [http://brasstacks.mozilla.com/orangefactor/ Orange Factor], runs on orangefactor1.dmz.phx1.mozilla.com


=== More reading ===
* [http://slides.com/mcote/pulse Slides] from a presentation on Pulse.
* [https://mrcote.info/blog/2015/02/16/pulse-update/ Update] on Pulse from 2015/02/16.


LegNeato wrote several blog posts on Pulse as he was building it.  They contain some more background if you're really interested.  They are linked below, in chronological order.
LegNeato also wrote several blog posts on Pulse as he was building it.  They contain some more background if you're really interested.  They are linked below, in chronological order.


* [http://christian.legnitto.com/blog/2010/07/17/mozilla-pulse-and-rabbitmq/ Mozilla Pulse and RabbitMQ]