Security/Reviews/NotificationsBackend: Difference between revisions

{{SecReviewInfo
|SecReview name=Notifications Backend
|SecReview target=* https://wiki.mozilla.org/Services/Notifications
* Source: https://github.com/jbalogh/push
* API docs: http://push.rtfd.org/
Review Bug:
<bugzilla>
{
"id":"749806"
}
</bugzilla>
}}
{{SecReview
|SecReview feature goal=* Provide a semi-anonymous method for a site to send a brief message to an interested user via any registered Agent acting on behalf of the user.
|SecReview alt solutions=* There are several ways this could be achieved, including a permanent websocket, an IM protocol (e.g. XMPP), a hidden iframe, etc.
|SecReview solution chosen=* This method was the easiest for third-party sites to implement and provided the most control and privacy to the user.
|SecReview threats considered=* Spam: a remote site could attempt to send spam messages to randomly chosen URLs
** The URL namespace is 256-bit random, so it is very large and a guess has a low chance of success
* A site could send malicious or annoying content to the user
** Message format is limited to plain text, with separate elements for the action URL and image
** The user can easily disable overly chatty or annoying sites
* The transmit channel could alter or inspect messages
** Notifications can optionally be encrypted with AES, with the UserAgent generating and sharing the per-site keypair. The channel is then unable to decrypt or decipher the message.
* UserAgent / Site keypair negotiation happens outside of the Notification channel
|SecReview threat brainstorming=* Denial of service
** A site sends a malformed encrypted message to consume resources on the client.
*** This matters more for slower devices
** Attempt to exhaust the keyspace
*** Unlikely due to the size of the keyspace
*** What happens in the event of a collision?
** Rate limiting of notifications?
*** A leaked URL could result in mass spam to a particular user
* Abuse of service
** Are we concerned with malicious parties using notifications as a medium for illegal activity?
* Are there plans for bidirectional notifications?
** Related to bipostal for BrowserID
}}
{{SecReviewActionStatus
|SecReview action item status=In Progress
|SecReview action items=[dchan] - are websockets torn down when going to privacy mode? - 6/21
[dchan] - are iframes allowed to generate notification doorhangers? Should follow the same model as geolocation. - 6/21
[dchan] - testing for notifications
[dchan] - follow up with Jonas on B2G apps that want to listen for notifications from their domain - 6/21
}}
Notifications let websites send small messages (<1024 bytes) to users without
the user having that website open. Websites ask for push permission when a user
has the website open; the JavaScript API returns a URL like
https://notifications.mozilla.org/long-string-of-random-characters that is
specific to that user and website. The website backend can then POST messages to
that URL, and our notification server will forward the messages to the user's
Firefox/B2G/etc. User devices try to maintain a persistent connection to the
notification server so that messages are delivered immediately.
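As a rough illustration of the website side, the sketch below builds the POST request a backend might send to its push URL. The form field names (`title`, `body`) and the `build_push_request` helper are assumptions made for this example, not the documented API, and applying the size limit to the encoded payload is also a guess. Only the <1024-byte limit and the URL shape come from the description above.

```python
import urllib.parse
import urllib.request

MAX_MESSAGE_BYTES = 1024  # "small messages (<1024 bytes)" per the overview


def build_push_request(push_url, title, body):
    """Build (but do not send) a POST request for one notification.

    The `title` and `body` field names are hypothetical.
    """
    payload = urllib.parse.urlencode({"title": title, "body": body}).encode("utf-8")
    if len(payload) >= MAX_MESSAGE_BYTES:
        raise ValueError("notification payload must be under 1024 bytes")
    return urllib.request.Request(push_url, data=payload, method="POST")


# Example with a placeholder token; urllib.request.urlopen(req) would send it.
req = build_push_request(
    "https://notifications.mozilla.org/long-string-of-random-characters",
    "New comment",
    "Someone replied to your post.",
)
```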
* A user may have multiple devices/clients receiving notifications.
* A user may receive notifications from multiple websites/apps.
* Websites send messages to users by POSTing to a URL created by the
  notification server.
* Each user+website pair has a unique notification URL.
* Clients use long-term socket connections (e.g. WebSockets) to receive
  messages.
* All other API interactions use HTTP.
* Clients get a list of socket server addresses through an HTTP call. They
  attempt to connect to each of the addresses and back off if they all fail.
* Socket servers can't be behind Zeus due to licensing restrictions.
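The connection step in the list above could look roughly like this. `fetch_addresses` and `connect` are hypothetical callables standing in for the HTTP call and the socket connection, and the doubling delay is an assumption, since the text only says that clients back off.

```python
import time


def connect_with_backoff(fetch_addresses, connect, max_rounds=5,
                         base_delay=1.0, sleep=time.sleep):
    """Try every socket server address; back off if all of them fail.

    fetch_addresses() returns a list of addresses (the HTTP call).
    connect(address) returns a connection or raises OSError.
    """
    delay = base_delay
    for _ in range(max_rounds):
        for address in fetch_addresses():
            try:
                return connect(address)
            except OSError:
                continue  # try the next address in the list
        sleep(delay)      # every address failed: wait before retrying
        delay *= 2        # assumed policy: double the wait each round
    raise ConnectionError("no socket server reachable")
```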
http://research.google.com/pubs/archive/37474.pdf describes the architecture of
a similar system built by Google that is used to notify their applications of
data changes.
== Data Stored ==
* Recent Notifications
Notifications are stored in Cassandra, through Queuey, with a three-day TTL.
Each user has a single queue which stores messages from multiple domains.
* Message state
When a client reads a message it sends an update marker to the notification
server, which is stored in the same queue. Other clients (from the same user)
can read this marker and treat the message as read. These markers expire
with the same three-day TTL.
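One way to picture the shared read state: every entry in the user's queue is either a notification or a read marker, and each client derives the unread set from both. The entry layout below (`type`, `id`, `read_id`, `ts`) is invented for illustration; Queuey's actual message format is not shown here.

```python
TTL_SECONDS = 3 * 24 * 60 * 60  # three-day TTL from the text


def unread_messages(queue, now):
    """Return unexpired notifications with no matching read marker."""
    live = [e for e in queue if now - e["ts"] < TTL_SECONDS]
    read_ids = {e["read_id"] for e in live if e["type"] == "read"}
    return [e for e in live
            if e["type"] == "notification" and e["id"] not in read_ids]


queue = [
    {"type": "notification", "id": "m1", "ts": 1000, "body": "hello"},
    {"type": "notification", "id": "m2", "ts": 2000, "body": "world"},
    {"type": "read", "read_id": "m1", "ts": 2500},  # written by another client
]
unread = unread_messages(queue, now=3000)
```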
* Registration Data:
To deliver messages, the server maps notification URLs to users:
  {QUEUE_TOKEN: {"user": USER_TOKEN, "domain": DOMAIN}}
The website POSTs a message to a URL like
https://notifications.mozilla.org/QUEUE_TOKEN. The notification server looks up
the user mapping in its data store, writes the message to Cassandra, and
publishes the message to the websocket servers for immediate delivery.
The mapping is currently stored in MySQL.
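The lookup-and-deliver path described above can be sketched with in-memory stand-ins. Here a dict replaces the MySQL mapping, a list replaces the Cassandra queue, and `publish` is a placeholder for the hand-off to the socket servers. Token generation with `secrets.token_urlsafe(32)` (256 bits of randomness) is an assumption consistent with the 256-bit URL namespace mentioned in the review, and the 202/404 status codes are likewise illustrative rather than confirmed behaviour.

```python
import secrets

registrations = {}   # stand-in for MySQL: QUEUE_TOKEN -> {"user", "domain"}
queues = {}          # stand-in for Cassandra: USER_TOKEN -> list of messages


def register(user_token, domain):
    """Create a new push URL token for one user+website pair."""
    queue_token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
    registrations[queue_token] = {"user": user_token, "domain": domain}
    return queue_token


def handle_post(queue_token, body, publish):
    """Handle a website's POST to /QUEUE_TOKEN; returns an HTTP-style status."""
    mapping = registrations.get(queue_token)
    if mapping is None:
        return 404  # unknown token: nothing is stored or delivered
    message = {"domain": mapping["domain"], "body": body}
    queues.setdefault(mapping["user"], []).append(message)  # stored copy
    publish(mapping["user"], message)                       # immediate delivery
    return 202
```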
== Write Traffic ==
* Creating a new push URL. Only happens once per site per user.
* New push notifications from websites. This is the largest traffic source.
  Urban Airship is ramping up their architecture to handle 100,000
  notifications per second.
* Marking messages as read. Occurs (at most) once per device per user.
== Read Traffic ==
* Clients starting up, syncing registered notification URLs.
* Clients starting up, doing initial message state sync.
* Clients starting up, getting list of socket server addresses.
== Push Traffic ==
* Messages coming from websites.
* State updates: mark messages as read, add new notification URL.
== Internal Traffic ==
* Database lookups for queue => user mapping.
* Cassandra writes for new messages.
* Publishing messages from API servers to socket servers.
The current plan is to use ZeroMQ to broadcast messages to the socket servers,
tagged with the user token. Each server knows which clients it has connected,
so a socket server can pick out and deliver the messages for its connected
users.
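The broadcast-and-filter idea can be shown without ZeroMQ itself. In this sketch a plain function call stands in for the pubsub broadcast, and the `SocketServer` class and its method names are made up for the example.

```python
class SocketServer:
    """Holds connected clients and delivers only their users' messages."""

    def __init__(self):
        self.clients = {}  # USER_TOKEN -> list of per-client inboxes

    def connect(self, user_token):
        inbox = []
        self.clients.setdefault(user_token, []).append(inbox)
        return inbox

    def on_broadcast(self, user_token, message):
        # Every server sees every message; ignore users not connected here.
        for inbox in self.clients.get(user_token, []):
            inbox.append(message)


def broadcast(servers, user_token, message):
    """Stand-in for the pubsub fan-out from an API server."""
    for server in servers:
        server.on_broadcast(user_token, message)


s1, s2 = SocketServer(), SocketServer()
phone = s1.connect("alice")
laptop = s2.connect("alice")
other = s2.connect("bob")
broadcast([s1, s2], "alice", "hello")
```

The cost of this design is that every socket server receives every message, so filtering work grows with total traffic rather than with the number of locally connected users.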
Urban Airship was using Kafka for pubsub here, but they were moving to an
architecture with direct RPC calls from API servers to socket servers. Socket
servers tell a central registration server which clients are connected, and the
API servers look up that state to send direct RPC calls.
Here are some potential sharding plans, with a look at scalability,
performance, and fault tolerance.
== No Sharding ==
* Everything in a single data center.
* All data in a single MySQL cluster and a single Cassandra cluster.
* Internal pubsub stays in the data center.
PRO:
* easy to develop
* performant as long as it scales
* lets us rely on the power of positive thinking
CON:
* running into the scalability limits of a single data center
* having a meteor hit the data center
== Full Sharding ==
* Multiple data centers.
* Clients are assigned to a cluster like notifications17.mozilla.org when they
  first start up, and stick there forever. Clusters are completely independent.
* The notification server provides push URLs with shard-aware domains like
  notifications17.mozilla.org.
* Internal pubsub stays in the data center.
PRO:
* fault tolerant
* almost infinitely scalable
* performant
* easy to develop
CON:
* sharding is locked in forever since external websites have the shards
  in their push URLs.
* Websites are sending messages to many domains, so they can't optimize HTTP
  connections as well with something like SPDY.
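A sketch of what shard-aware URLs imply. The shard count and the random assignment policy are assumptions; the text only says that clients are assigned to a cluster once and stay there.

```python
import random
import secrets

NUM_SHARDS = 32       # hypothetical shard count
user_shards = {}      # USER_TOKEN -> shard number, fixed after first start


def shard_for(user_token, rng=random):
    """Assign a shard on first contact and never change it afterwards."""
    if user_token not in user_shards:
        user_shards[user_token] = rng.randrange(NUM_SHARDS)
    return user_shards[user_token]


def new_push_url(user_token):
    """Build a push URL whose hostname names the user's shard."""
    shard = shard_for(user_token)
    return "https://notifications%d.mozilla.org/%s" % (
        shard, secrets.token_urlsafe(32))
```

Because the shard number is baked into URLs held by third-party sites, moving a user later would invalidate every URL already handed out, which is the lock-in listed under CON.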
== Limited Sharding ==
* Multiple data centers.
* The notification server provides push URLs pointing to the canonical
  notifications.mozilla.org domain.
* Clients are sticky to a cluster, but their data can be migrated.
* Notifications come into notifications.mozilla.org and may need to be
  propagated across data centers to reach the right user.
PRO:
* fault tolerant, scalable
* websites can optimize connections to notifications.mozilla.org
CON:
* more moving parts, harder to develop.
* cross-data center communication could be slow
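With a canonical domain, routing moves from the URL into a server-side table. The following sketch assumes a simple user-to-cluster dict (`user_cluster`) and hypothetical cluster names; it shows why migration becomes possible without touching any push URL.

```python
user_cluster = {"alice": "dc-west"}   # hypothetical routing table


def route(user_token, local_cluster):
    """Decide where a notification arriving at local_cluster must go."""
    target = user_cluster[user_token]
    if target == local_cluster:
        return ("deliver-locally", target)
    return ("forward", target)        # cross-data-center hop


def migrate(user_token, new_cluster):
    """Move a user; push URLs are unaffected because they hold no shard."""
    user_cluster[user_token] = new_cluster
```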
== Optimizations ==
* If all clients have read a message, delete it early.
* If there's only one client, don't mark messages as read.
* Randomize reconnect backoff to avoid thundering herd.
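Two of these optimizations are small enough to sketch directly. The base delay and cap in `reconnect_delay` are assumed values, and the randomization shown (a uniform draw up to an exponentially growing ceiling) is one common choice rather than a decided design. `can_delete_early` assumes the server knows the set of client identifiers for each user.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Pick a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0, min(cap, base * (2 ** attempt)))


def can_delete_early(read_by, user_clients):
    """True once every one of the user's clients has read the message."""
    return set(user_clients) <= set(read_by)
```

Spreading reconnects over a random interval keeps clients that were dropped at the same moment from all returning at the same moment.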
== Security/Stability Concerns ==
* Websites DoSing us with valid push notification traffic.
* Spammers DoSing us with invalid push traffic.
* Attackers trying to guess users' queue URLs.
* Attackers connecting as valid users and exhausting socket server resources.
* Attackers filling valid user queues and exhausting resources.
* Mozilla storing data telling what sites you have push notifications for.
* Mozilla storing your push notifications.
* Mozilla storing your IP as your device (re)connects to socket servers.
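The review's threat brainstorming asks whether notifications should be rate limited. One common option, shown here purely as a sketch, is a token bucket per push URL. The rate and capacity values are placeholders, and nothing in this document says a token bucket is what will be deployed.

```python
class TokenBucket:
    """Allow `rate` messages per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # QUEUE_TOKEN -> TokenBucket


def accept(queue_token, now, rate=1.0, capacity=5):
    """Return True if a POST to this push URL is within its budget."""
    bucket = buckets.setdefault(queue_token, TokenBucket(rate, capacity, now))
    return bucket.allow(now)
```

A per-URL budget limits the damage from a single leaked URL, but it does not address floods of requests to invalid URLs, which would need a separate limit keyed on something like the sender.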