Security/Reviews/NotificationsBackend: Difference between revisions

Revision as of 17:24, 15 June 2012

8,359 bytes added, 15 June 2012

no edit summary

Curtisk

@@ Line 1: / Line 1: @@
 {{SecReviewInfo
 |SecReview name=Notifications Backend
+|SecReview target=* https://wiki.mozilla.org/Services/Notifications
+* Source: https://github.com/jbalogh/push
+* API docs: http://push.rtfd.org/
+Review Bug:
+<bugzilla>
+{
+"id":"749806"
+}
+</bugzilla>
+}}
+{{SecReview
+|SecReview feature goal=* Provide a semi-anonymous method for a site to send a brief message to an interested user via any registered Agent acting on behalf of the user.
+|SecReview alt solutions=* There are several ways this could be achieved, including a permanent WebSocket, an IM protocol (e.g. XMPP), or a hidden iframe.
+|SecReview solution chosen=* This method was the easiest for third-party sites to implement and provided the most control and privacy to the user.
+|SecReview threats considered=* Spam: remote site could attempt to send spam messages to randomly chosen URLs
+** The URL namespace is 256-bit random, so it is very large and a guessed URL has a low chance of success
+* Site could send malicious or annoying content to user
+** Messages are limited to plain text, with separate elements for the action URL and img
+** User can disable overly chatty or annoying sites easily.
+* Transmit channel could alter or inspect messages:
+** Notifications can optionally be encrypted with AES, where the UserAgent generates and shares the per-site keypair. The channel is unable to decrypt or decipher the message.
+* UserAgent / Site keypair negotiation happens outside the notification channel
+|SecReview threat brainstorming=* Denial of service
+** Site sends a malformed encrypted message to consume resources on the client.
+*** This matters more for slower devices
+** Attempt to exhaust keyspace
+*** unlikely due to size of keyspace
+*** what happens in the event of a collision?
+** rate limiting of notifications?
+*** a leaked URL could result in mass spam to a particular user
+* Abuse of service
+** Are we concerned with malicious parties using notifications as a medium for illegal activity?
+* Are there plans for bidirectional notifications?
+** related to bipostal for BrowserID
 }}
-{{SecReview}}
 {{SecReviewActionStatus
-|SecReview action item status=None
+|SecReview action item status=In Progress
+|SecReview action items=[dchan] - are WebSockets torn down when going to privacy mode? - 6/21
+[dchan] - are iframes allowed to generate notification doorhangers? Should follow same model as geolocation. - 6/21
+[dchan] - testing for notifications
+[dchan] - follow up with Jonas on B2G apps wanting to listen for notifications from their domain - 6/21
 }}
+Notifications let websites send small messages (<1024 bytes) to users without
+the user having that website open. Websites ask for push permission when a user
+has the website open; the JavaScript API returns a URL like
+https://notifications.mozilla.org/long-string-of-random-characters that is
+specific to that user and website. The website backend can then POST messages to
+that URL, and our notification server will forward the messages to the user's
+Firefox/B2G/etc. User devices try to maintain a persistent connection to the
+notification server so that messages are delivered immediately.
+* A user may have multiple devices/clients receiving notifications.
+* A user may receive notifications from multiple websites/apps.
+* Websites send messages to users by POSTing to a URL created by the
+  notification server.
+* Each user+website pair has a unique notification URL.
+* Clients use long-term socket connections (e.g. WebSockets) to receive
+  messages.
+* All other API interactions use HTTP.
+* Clients get a list of socket server addresses through an HTTP call. They
+  attempt to connect to each of the addresses and back off if they all fail.
+* Socket servers can't be behind Zeus due to licensing restrictions.
+http://research.google.com/pubs/archive/37474.pdf describes the architecture of
+a similar system built by Google that is used to notify their applications of
+data changes.
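As a rough illustration of the website side of this flow, the sketch below builds (but does not send) the POST a site backend might make to its push URL. The form field names `body`, `actionUrl`, and `img`, the helper `build_push_request`, and the example URLs are assumptions made for illustration; the API docs linked above remain the authority on the real message format. The size check reflects the "<1024 bytes" limit mentioned above.

```python
import urllib.parse
import urllib.request

MAX_MESSAGE_BYTES = 1024  # messages are described above as "<1024 bytes"


def build_push_request(push_url, body, action_url=None, img=None):
    """Build a POST request for a push URL without sending it.

    Field names are hypothetical; consult the API docs for the real ones.
    """
    fields = {"body": body}
    if action_url:
        fields["actionUrl"] = action_url
    if img:
        fields["img"] = img
    payload = urllib.parse.urlencode(fields).encode("utf-8")
    if len(payload) >= MAX_MESSAGE_BYTES:
        raise ValueError("notification payload must be under 1024 bytes")
    return urllib.request.Request(push_url, data=payload, method="POST")


request = build_push_request(
    "https://notifications.mozilla.org/QUEUE_TOKEN",  # placeholder token
    "Your order has shipped",
    action_url="https://shop.example/orders/42",
)
# urllib.request.urlopen(request) would actually deliver it.
```

Rejecting oversized payloads before sending is a choice made here for the sketch; the server may enforce the limit differently.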
+== Data Stored ==
+* Recent Notifications
+Notifications are stored in Cassandra, through Queuey, with a three-day TTL.
+Each user has a single queue which stores messages from multiple domains.
+* Message state
+When a client reads a message it sends an update marker to the notification
+server, which is stored in the same queue. Other clients (from the same user)
+can read this message and treat the message as read. These messages will expire
+with the same three-day TTL.
+* Registration Data:
+To deliver messages, the server maps notification URLs to users:
+  {QUEUE_TOKEN: {"user": USER_TOKEN, "domain": DOMAIN}}
+The website POSTs a message to a URL like
+https://notifications.mozilla.org/QUEUE_TOKEN. The notification server looks up
+the user mapping in its data store, writes the message to Cassandra, and
+publishes the message to the websocket servers for immediate delivery.
+The mapping is currently stored in MySQL.
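A minimal sketch of registration and the POST path described above, using plain Python containers in place of MySQL, Cassandra, and the socket servers. The function names, the status codes, and the choice of `secrets.token_urlsafe(32)` (32 random bytes, matching the 256-bit namespace mentioned in the threat list) are assumptions, not the actual implementation.

```python
import secrets

registrations = {}   # QUEUE_TOKEN -> {"user": USER_TOKEN, "domain": DOMAIN}
stored = []          # stand-in for the Cassandra-backed queue
published = []       # stand-in for the publish step to socket servers


def register(user_token, domain):
    """Create a push URL for one user+website pair."""
    queue_token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
    registrations[queue_token] = {"user": user_token, "domain": domain}
    return "https://notifications.mozilla.org/" + queue_token


def handle_post(queue_token, body):
    """Look up the mapping, store the message, then publish it."""
    mapping = registrations.get(queue_token)
    if mapping is None:
        return 404  # unknown token: nothing is stored or published
    message = {"user": mapping["user"], "domain": mapping["domain"], "body": body}
    stored.append(message)
    published.append(message)
    return 200  # status codes here are illustrative


push_url = register("user-123", "example.com")
token = push_url.rsplit("/", 1)[1]
status = handle_post(token, "hello")
```

The website only ever holds the opaque token; the user token and domain stay on the server side of the lookup.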
+== Write Traffic ==
+* Creating a new push URL. Only happens once per site per user.
+* New push notifications from websites. This is the largest traffic source.
+  Urban Airship is ramping up their architecture to handle 100,000
+  notifications per second.
+* Marking messages as read. Occurs (at most) once per device per user.
+== Read Traffic ==
+* Clients starting up, syncing registered notification URLs.
+* Clients starting up, doing initial message state sync.
+* Clients starting up, getting list of socket server addresses.
+== Push Traffic ==
+* Messages coming from websites.
+* State updates: mark messages as read, add new notification URL.
+== Internal Traffic ==
+* Database lookups for queue => user mapping.
+* Cassandra writes for new messages.
+* Publishing messages from API servers to socket servers.
+The current plan is to use ZeroMQ to broadcast messages to the socket servers,
+tagged with the user token. Each server knows which clients it has connected,
+so a socket server can pick out and deliver the messages for its connected
+users.
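The broadcast-and-filter plan can be modelled in-process as follows. This does not use ZeroMQ itself; the `SocketServer` class and `broadcast` function are hypothetical stand-ins that only show the filtering each socket server would do on messages tagged with a user token.

```python
class SocketServer:
    """Stand-in for one socket server; `connected` maps user token -> inbox."""

    def __init__(self, name):
        self.name = name
        self.connected = {}

    def connect(self, user_token):
        self.connected[user_token] = []

    def on_broadcast(self, user_token, message):
        # Every server sees every message and drops those for users
        # that are not connected to it.
        inbox = self.connected.get(user_token)
        if inbox is None:
            return False
        inbox.append(message)
        return True


def broadcast(servers, user_token, message):
    """Fan the tagged message out to all servers; return names that delivered."""
    return [s.name for s in servers if s.on_broadcast(user_token, message)]


a, b = SocketServer("a"), SocketServer("b")
a.connect("user-1")
b.connect("user-2")
b.connect("user-1")  # same user, second device on another server
delivered = broadcast([a, b], "user-1", "hello")
```

The cost of this design is that every socket server receives every message, which is the trade-off the direct-RPC approach below avoids.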
+Urban Airship was using Kafka for pubsub here, but they were moving to an
+architecture with direct RPC calls from API servers to socket servers. Socket
+servers tell a central registration server which clients are connected, and the
+API servers look up that state to send direct RPC calls.
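For comparison, the registration-server approach attributed to Urban Airship might look roughly like the sketch below. The `Registry` class, its method names, and the `send_rpc` callback are assumptions used to illustrate the lookup-then-call pattern; nothing here reflects their actual code.

```python
class Registry:
    """Stand-in for the central registration server."""

    def __init__(self):
        self.locations = {}  # user token -> set of socket server names

    def client_connected(self, server, user_token):
        self.locations.setdefault(user_token, set()).add(server)

    def client_disconnected(self, server, user_token):
        self.locations.get(user_token, set()).discard(server)

    def servers_for(self, user_token):
        return sorted(self.locations.get(user_token, ()))


def route(registry, user_token, send_rpc):
    """API-server side: call only the servers holding this user's clients."""
    targets = registry.servers_for(user_token)
    for server in targets:
        send_rpc(server, user_token)
    return targets


registry = Registry()
registry.client_connected("socket-1", "user-1")
registry.client_connected("socket-3", "user-1")
calls = []
targets = route(registry, "user-1", lambda server, user: calls.append((server, user)))
```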
+Here are some potential sharding plans, with a look at scalability,
+performance, and fault tolerance.
+== No Sharding ==
+* Everything in a single data center.
+* All data in a single MySQL cluster and Cassandra cluster.
+* Internal pubsub stays in the data center.
+PRO:
+* easy to develop
+* performant as long as it scales
+* lets us rely on the power of positive thinking
+CON:
+* running into the scalability limits of a single data center
+* having a meteor hit the data center
+== Full Sharding ==
+* Multiple data centers.
+* Clients are assigned to a cluster like notifications17.mozilla.org when they
+  first start up, and stick there forever. Clusters are completely independent.
+* The notification server provides push URLs with shard-aware domains like
+  notifications17.mozilla.org.
+* Internal pubsub stays in the data center.
+PRO:
+* fault tolerant
+* almost infinitely scalable
+* performant
+* easy to develop
+CON:
+* sharding is locked in forever since external websites have the shards
+  in their push URLs.
+* Websites are sending messages to many domains, so they can't optimize HTTP
+  connections as well with something like SPDY.
+== Limited Sharding ==
+* Multiple data centers.
+* The notification server provides push URLs pointing to the canonical
+  notifications.mozilla.org domain.
+* Clients are sticky to a cluster, but their data can be migrated.
+* Notifications come into notifications.mozilla.org and may need to be
+  propagated across data centers to reach the right user.
+PRO:
+* fault tolerant, scalable
+* websites can optimize connections to notifications.mozilla.org
+CON:
+* more moving parts, harder to develop.
+* cross-data center communication could be slow
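The difference between the two sharded plans comes down to where the shard assignment lives. The sketch below is an illustration only: the function names and the `home_cluster` table are hypothetical, and the cluster numbers are arbitrary.

```python
def sharded_push_url(shard, queue_token):
    """Full sharding: the shard is baked into the URL the website stores."""
    return "https://notifications%d.mozilla.org/%s" % (shard, queue_token)


def canonical_push_url(queue_token):
    """Limited sharding: websites only ever see the canonical domain."""
    return "https://notifications.mozilla.org/" + queue_token


# Limited sharding keeps the user -> cluster assignment on the server side,
# so it can change without touching any URL a website has stored.
home_cluster = {"user-1": 17}


def migrate(user_token, new_cluster):
    home_cluster[user_token] = new_cluster


before = canonical_push_url("QUEUE_TOKEN")
migrate("user-1", 4)
after = canonical_push_url("QUEUE_TOKEN")  # unchanged by the migration
```

With full sharding, moving `user-1` off cluster 17 would invalidate every `notifications17` URL already handed out, which is why that plan is described as locked in.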
+== Optimizations ==
+* If all clients have read a message, delete it early.
+* If there's only one client, don't mark messages as read.
+* Randomize reconnect backoff to avoid thundering herd.
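One common way to randomize reconnect backoff is exponential backoff with "full jitter". The sketch below assumes a one-second base and a five-minute cap; both values and the function name `reconnect_delay` are illustrative, not taken from the design.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Return a randomized delay in seconds for the given retry attempt.

    Full jitter: pick uniformly between 0 and the capped exponential
    ceiling, so clients that dropped together do not reconnect together.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


# A seeded generator makes the sequence reproducible for testing.
seeded = random.Random(0)
delays = [reconnect_delay(n, rng=seeded) for n in range(6)]
```

A client would sleep for `reconnect_delay(attempt)` after each failed pass over the socket server address list and reset `attempt` to zero once a connection succeeds.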
+== Security/Stability Concerns ==
+* Websites DOSing us with valid push notification traffic.
+* Spammers DOSing us with invalid push traffic.
+* Attackers trying to guess users' queue URLs.
+* Attackers connecting as valid users and exhausting socket server resources.
+* Attackers filling valid user queues and exhausting resources.
+* Mozilla storing data telling what sites you have push notifications for.
+* Mozilla storing your push notifications.
+* Mozilla storing your IP as your device (re)connects to socket servers.
