Identity/BrowserID/ServerImplementation

From MozillaWiki

BrowserID is built on Node.js.

To scale the service to 1 million users, and later to 10 million, we should examine which technology choices fit best.

Requirements

  • CPU-bound workload, mostly crypto
  • C10K-style concurrency (many simultaneous connections per server) to reduce the need for hardware
  • Prefer horizontal scaling over vertical scaling

Application Server Platforms

For every option below, we'll assume Zeus (frontend caching, load balancing), nginx (static web server), MySQL (RDBMS) and SocketLabs (outbound email).

Node.js

BrowserID is currently a Node.js application with multi-master MySQL.

When we picked a platform, Node.js was chosen because:

  • We make a lot of requests to other systems, so Node's event-driven approach is a good fit
  • JavaScript for web devs is pretty awesome
  • We have the ultimate (client-side) JavaScript expertise in-house
  • We saw a number of upcoming Node.js projects internally

We don't have much production experience with Node.js. Servers like reredis are waiting to ship because webops is not yet comfortable deploying them to production.

Django and Node.js were evaluated against each other. Sync hadn't shipped on Gunicorn yet at the time, and Gunicorn was dropped from the selection as less supportable than Django.

Scaling tips

Anything that takes a while (for example, crypto API calls) should be evented so that it doesn't block the Node.js event loop.

It's early days. Although projects like Cluster exist, Mozilla may end up trailblazing how to scale the system.

Gunicorn / Gevent

Firefox Sync's service is Gunicorn plus PyMySQL. Sync handles around 700 requests per second (RPS) on ?? servers, 70+ master/slave DBs, and ???

In Sync, 80% of the time is spent doing SQL queries and most of the time spent on the web heads is I/O bound, waiting for the queries to return.

The same stack could be used to write non-blocking async code, similar to Node.js.

Services applications have continuous integration and sophisticated deployment practices. Right now, writing an app using the "base app" from Services makes it compliant with services-ops deployment practices.

Playdoh

This is a secured Django plus Jinja templates and other goodness. It is the baseline framework for webdev projects. It powers AMO, which does roughly 50 million API calls per day (counting cache misses only; 91 million including requests answered by a cache somewhere in the stack). AMO runs on roughly 25 app servers, 6 slave DBs and 3 memcached servers.

AMO has continuous integration and sophisticated deployment practices.

Scaling tips

Hotspots bypass the ORM and issue straight SQL built with string interpolation.

Evaluation Matrix

TODO: topics

  • Mozilla dev pool (small, medium, big)
  • Ops familiarity (low, high)
  • External community
  • In production today
  • C10K
  • Sweet spot (webapps, services, etc)
  • Libraries
  • Development (PITA, pleasure)
  • Profiling tools (voodoo, yes, many)
  • CI support (no, yes)
  • Deployment (easy, hard)
  • Infrasec comfort level (low, high)