{{SecReviewInfo
|SecReview name=Notifications Backend
|SecReview target=* https://wiki.mozilla.org/Services/Notifications
* Source: https://github.com/jbalogh/push
* API docs: http://push.rtfd.org/
Review Bug:
<bugzilla>
{
"id":"749806"
}
</bugzilla>
}}
{{SecReview
|SecReview feature goal=* Provide a semi-anonymous method for a site to send a brief message to an interested user via any registered Agent acting on behalf of the user.
|SecReview alt solutions=* There are several ways this could be achieved, including a permanent WebSocket, an IM protocol (e.g. XMPP), a hidden iframe, etc.
|SecReview solution chosen=* This method was the easiest for third-party sites to implement and provided the most control and privacy to the user.
|SecReview threats considered=* Spam: a remote site could attempt to send spam messages to randomly chosen URLs.
** The URL namespace is 256-bit random, so it is very large and a guess has a low chance of success.
* A site could send malicious or annoying content to the user.
** Messages are format-limited to plain text, with separate elements for the action URL and image.
** The user can easily disable overly chatty or annoying sites.
* The transmit channel could alter or inspect messages.
** Notifications can optionally be encrypted via AES, where the UserAgent generates and shares the per-site keypair. The channel is unable to decrypt or decipher the message.
* UserAgent / site keypair negotiation happens outside of the Notification channel.
|SecReview threat brainstorming=* Denial of service
** A site sends a malformed encrypted message to use up resources on the client.
*** This matters more for slower devices.
** Attempt to exhaust the keyspace
*** Unlikely due to the size of the keyspace.
*** What happens in the event of a collision?
** Rate limiting of notifications?
*** A leaked URL could result in mass spam to a particular user.
* Abuse of service
** Are we concerned with malicious parties using notifications as a medium for illegal activity?
* Are there plans for bidirectional notifications?
** Related to bipostal for BrowserID.
}}
{{SecReviewActionStatus
|SecReview action item status=In Progress
|SecReview action items=[dchan] - Are WebSockets torn down when going into privacy mode? - 6/21
[dchan] - Are iframes allowed to generate notification doorhangers? Should follow the same model as geolocation. - 6/21
[dchan] - Testing for notifications
[dchan] - Follow up with jonas on B2G apps that want to listen for notifications from their domain - 6/21
}}
Notifications let websites send small messages (<1024 bytes) to users without
the user having that website open. Websites ask for push permission when a user
has the website open; the JavaScript API returns a URL like
https://notifications.mozilla.org/long-string-of-random-characters that is
specific to that user and website. The website backend can then POST messages to
that URL, and our notification server will forward the messages to the user's
Firefox/B2G/etc. User devices try to maintain a persistent connection to the
notification server so that messages are delivered immediately.
* A user may have multiple devices/clients receiving notifications.
* A user may receive notifications from multiple websites/apps.
* Websites send messages to users by POSTing to a URL created by the notification server.
* Each user+website pair has a unique notification URL.
* Clients use long-term socket connections (e.g. WebSockets) to receive messages.
* All other API interactions use HTTP.
* Clients get a list of socket server addresses through an HTTP call. They attempt to connect to each of the addresses and back off if they all fail.
* Socket servers can't be behind Zeus due to licensing restrictions.
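
As a rough illustration of the website side of this flow, the sketch below builds the POST request a site backend might send to its notification URL. The JSON encoding and the field names <code>title</code> and <code>body</code> are assumptions made for the example and are not taken from the API docs; only the POST-to-URL model and the 1024-byte limit come from the description above. The request is built but not sent.

```python
import json
import urllib.request

MAX_BODY_BYTES = 1024  # the description above says messages are "<1024 bytes"


def build_push_request(push_url, title, body_text):
    """Build (but do not send) a POST request for one notification.

    The JSON layout with "title" and "body" is hypothetical.
    """
    payload = json.dumps({"title": title, "body": body_text}).encode("utf-8")
    if len(payload) >= MAX_BODY_BYTES:
        raise ValueError("notification payload must be under 1024 bytes")
    return urllib.request.Request(
        push_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_push_request(
        "https://notifications.mozilla.org/QUEUE_TOKEN", "Hello", "You have mail"
    )
    print(req.get_method(), req.full_url, len(req.data))
```

A real backend would pass the request to <code>urllib.request.urlopen</code> (or an HTTP client of its choice) and handle errors and retries.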
http://research.google.com/pubs/archive/37474.pdf describes the architecture of
a similar system built by Google that is used to notify their applications of
data changes.
== Data Stored ==
* Recent Notifications
Notifications are stored in Cassandra, through Queuey, with a three-day TTL.
Each user has a single queue, which stores messages from multiple domains.
* Message state
When a client reads a message, it sends an update marker to the notification
server, which is stored in the same queue. Other clients (from the same user)
can read this marker and treat the message as read. These markers expire
with the same three-day TTL.
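
One way a client could interpret such a queue is sketched below. The entry layout (a <code>type</code> of <code>"message"</code> or <code>"read"</code> plus an <code>id</code>) is a hypothetical shape chosen for the example; the actual Queuey message format is not described here.

```python
def unread_messages(queue):
    """Return the messages that no client has marked as read.

    `queue` is a list of dicts in arrival order. Each entry is either
    {"type": "message", "id": ...} or a read marker {"type": "read", "id": ...}.
    This layout is an assumption made for illustration.
    """
    read_ids = {entry["id"] for entry in queue if entry["type"] == "read"}
    return [
        entry
        for entry in queue
        if entry["type"] == "message" and entry["id"] not in read_ids
    ]


if __name__ == "__main__":
    queue = [
        {"type": "message", "id": "m1", "body": "first"},
        {"type": "message", "id": "m2", "body": "second"},
        {"type": "read", "id": "m1"},  # written by another device of the same user
    ]
    print([m["id"] for m in unread_messages(queue)])
```

Because markers live in the same queue as the messages, a second device only needs one read of the queue to learn both the messages and their read state.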
* Registration Data
To deliver messages, the server maps notification URLs to users:
 {QUEUE_TOKEN: {"user": USER_TOKEN, "domain": DOMAIN}}
The website POSTs a message to a URL like
https://notifications.mozilla.org/QUEUE_TOKEN. The notification server looks up
the user mapping in its data store, writes the message to Cassandra, and
publishes the message to the WebSocket servers for immediate delivery.
The mapping is currently stored in MySQL.
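
The lookup, store, and publish steps described above can be sketched roughly as follows. The dict stands in for the MySQL mapping and the two callables stand in for the Cassandra write and the publish step; the function name, the message layout, and the HTTP status codes are all assumptions for the example.

```python
def handle_post(registrations, store, publish, queue_token, body):
    """Handle one POST to /QUEUE_TOKEN and return an HTTP-style status code.

    registrations: dict of QUEUE_TOKEN -> {"user": USER_TOKEN, "domain": DOMAIN}
    store, publish: callables taking (user_token, message); stand-ins for the
    durable write and the broadcast to socket servers.
    """
    registration = registrations.get(queue_token)
    if registration is None:
        return 404  # unknown push URL (status code is an assumption)
    message = {"domain": registration["domain"], "body": body}
    store(registration["user"], message)    # durable copy for later sync
    publish(registration["user"], message)  # immediate delivery attempt
    return 202


if __name__ == "__main__":
    stored, published = [], []
    regs = {"tok123": {"user": "user-a", "domain": "example.com"}}
    status = handle_post(
        regs,
        lambda user, msg: stored.append((user, msg)),
        lambda user, msg: published.append((user, msg)),
        "tok123",
        "hello",
    )
    print(status, stored, published)
```

Writing to the store before publishing means a client that is offline at publish time can still pick the message up on its next sync.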
== Write Traffic ==
* Creating a new push URL. Only happens once per site per user.
* New push notifications from websites. This is the largest traffic source. Urban Airship is ramping up their architecture to handle 100,000 notifications per second.
* Marking messages as read. Occurs (at most) once per device per user.
== Read Traffic ==
* Clients starting up, syncing registered notification URLs.
* Clients starting up, doing initial message state sync.
* Clients starting up, getting list of socket server addresses.
== Push Traffic ==
* Messages coming from websites.
* State updates: mark messages as read, add new notification URL.
== Internal Traffic ==
* Database lookups for queue => user mapping.
* Cassandra writes for new messages.
* Publishing messages from API servers to socket servers.
The current plan is to use ZeroMQ to broadcast messages to the socket servers,
tagged with the user token. Each server knows which clients it has connected,
so a socket server can pick out and deliver the messages for its connected
users.
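
The broadcast-and-filter idea can be illustrated without a message bus at all. The sketch below uses a plain in-memory loop as a stand-in for the ZeroMQ broadcast, so it shows only the filtering logic, not real ZeroMQ usage; the class and function names are hypothetical.

```python
class SocketServer:
    """Toy socket server that keeps an inbox per connected user token."""

    def __init__(self, name):
        self.name = name
        self.connected = {}  # user_token -> delivered messages (stand-in for sockets)

    def connect(self, user_token):
        self.connected.setdefault(user_token, [])

    def on_broadcast(self, user_token, message):
        """Deliver the message if this server holds a connection for the user."""
        inbox = self.connected.get(user_token)
        if inbox is None:
            return False  # not our user; drop the broadcast
        inbox.append(message)
        return True


def broadcast(servers, user_token, message):
    """Send the tagged message to every server; return names of those that delivered."""
    return [s.name for s in servers if s.on_broadcast(user_token, message)]


if __name__ == "__main__":
    a, b = SocketServer("ws-a"), SocketServer("ws-b")
    a.connect("user-1")
    b.connect("user-2")
    print(broadcast([a, b], "user-1", "hello"))
```

Every server sees every message in this model, which keeps the API servers simple at the cost of fan-out traffic that grows with the number of socket servers.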
Urban Airship was using Kafka for pubsub here, but they were moving to an
architecture with direct RPC calls from API servers to socket servers. Socket
servers tell a central registration server which clients are connected, and the
API servers look up that state to send direct RPC calls.

Here are some potential sharding plans, with a look at scalability,
performance, and fault tolerance.
== No Sharding ==
* Everything in a single data center.
* All data in a single MySQL cluster and Cassandra cluster.
* Internal pubsub stays in the data center.
PRO:
* easy to develop
* performant as long as it scales
* lets us rely on the power of positive thinking
CON:
* running into the scalability limits of a single data center
* having a meteor hit the data center
== Full Sharding ==
* Multiple data centers.
* Clients are assigned to a cluster like notifications17.mozilla.org when they first start up, and stick there forever. Clusters are completely independent.
* The notification server provides push URLs with shard-aware domains like notifications17.mozilla.org.
* Internal pubsub stays in the data center.
PRO:
* fault tolerant
* almost infinitely scalable
* performant
* easy to develop
CON:
* sharding is locked in forever since external websites have the shards in their push URLs.
* Websites are sending messages to many domains, so they can't optimize HTTP connections as well with something like SPDY.
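
To make the lock-in concrete, here is a small sketch of shard-aware push URLs. The hostname pattern follows the notifications17.mozilla.org example above; the helper names are hypothetical.

```python
import re
from urllib.parse import urlparse

SHARD_HOST = re.compile(r"^notifications(\d+)\.mozilla\.org$")


def push_url_for(shard, queue_token):
    """Build a shard-aware push URL for the given shard number and token."""
    return "https://notifications%d.mozilla.org/%s" % (shard, queue_token)


def shard_of(push_url):
    """Return the shard number baked into a push URL, or None if there is none."""
    match = SHARD_HOST.match(urlparse(push_url).hostname or "")
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    url = push_url_for(17, "QUEUE_TOKEN")
    print(url, shard_of(url))
```

Since websites store the full URL, the shard number in the hostname cannot change later without invalidating URLs that third parties already hold.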
== Limited Sharding ==
* Multiple data centers.
* The notification server provides push URLs pointing to the canonical notifications.mozilla.org domain.
* Clients are sticky to a cluster, but their data can be migrated.
* Notifications come into notifications.mozilla.org and may need to be propagated across data centers to reach the right user.
PRO:
* fault tolerant, scalable
* websites can optimize connections to notifications.mozilla.org
CON:
* more moving parts, harder to develop.
* cross-data center communication could be slow
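
A minimal sketch of the routing decision this plan implies is shown below, assuming a lookup table from user token to data center. The table, the data center names, and the class are hypothetical; the source does not say how the user-to-cluster mapping would be stored.

```python
class Router:
    """Decide whether a notification is handled locally or forwarded."""

    def __init__(self, local_dc, user_to_dc):
        self.local_dc = local_dc
        self.user_to_dc = dict(user_to_dc)  # user_token -> data center name

    def route(self, user_token):
        """Return ("local", dc) or ("forward", dc) for the user's current home."""
        dc = self.user_to_dc[user_token]
        return ("local", dc) if dc == self.local_dc else ("forward", dc)

    def migrate(self, user_token, new_dc):
        """Move a user to another data center; push URLs stay unchanged."""
        self.user_to_dc[user_token] = new_dc


if __name__ == "__main__":
    router = Router("dc-east", {"user-1": "dc-east", "user-2": "dc-west"})
    print(router.route("user-1"), router.route("user-2"))
    router.migrate("user-2", "dc-east")
    print(router.route("user-2"))
```

The forward case is where the cross-data center latency listed under CON would show up.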
== Optimizations ==
* If all clients have read a message, delete it early.
* If there's only one client, don't mark messages as read.
* Randomize reconnect backoff to avoid thundering herd.
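
The randomized backoff could look something like the following sketch, which uses exponential backoff with full jitter. The base delay, the cap, and the choice of full jitter are assumptions for illustration, not values from this design.

```python
import random


def reconnect_delays(attempts, base=1.0, cap=300.0, rng=random):
    """Yield one randomized delay (in seconds) per reconnect attempt.

    The ceiling doubles on each attempt up to `cap`; the actual delay is drawn
    uniformly between 0 and that ceiling so that clients which lost their
    connection at the same moment do not all reconnect at once.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0, ceiling)


if __name__ == "__main__":
    for delay in reconnect_delays(5):
        print("would sleep %.2f seconds before reconnecting" % delay)
```

A client would sleep for each yielded delay between connection attempts and reset the sequence after a successful connect.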
== Security/Stability Concerns ==
* Websites DoSing us with valid push notification traffic.
* Spammers DoSing us with invalid push traffic.
* Attackers trying to guess users' queue URLs.
* Attackers connecting as valid users and exhausting socket server resources.
* Attackers filling valid user queues and exhausting resources.
* Mozilla storing data telling what sites you have push notifications for.
* Mozilla storing your push notifications.
* Mozilla storing your IP as your device (re)connects to socket servers.
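
On the URL-guessing concern, the review notes above describe the URL namespace as 256-bit random. A minimal sketch of generating such a token with Python's standard library follows; the function name is hypothetical and this is not necessarily how the push server generates its tokens.

```python
import secrets


def new_queue_token():
    """Return a URL-safe token carrying 256 bits of randomness."""
    return secrets.token_urlsafe(32)  # 32 random bytes = 256 bits


if __name__ == "__main__":
    token = new_queue_token()
    print("https://notifications.mozilla.org/" + token)
```

Tokens should come from a cryptographically secure source such as <code>secrets</code> rather than the <code>random</code> module, since predictable tokens would undo the benefit of the large namespace.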