CloudServices/Sagrada/TokenServer
Goals
Here's the challenge we face. The current login flow for Sync looks like this:
- provide username and password
- we log into LDAP with that username and password and grab your sync node
- we check the sync node against the URL you've accessed, and use that to configure where your data is stored.
This solution works great for centralized login. It's fast, has a minimum number of steps, and caches the data centrally. The system that does node-assignment is lightweight, since the client and server both cache the result, and has support for multiple applications with the /node/<app> API protocol.
However, this breaks horribly when we don't have centralized login, and adding support for BrowserID to the SyncStorage protocol means we're now there. We're going to get valid requests from users who don't have an account in LDAP. We won't even know, when they make a first request, whether the node-assignment server has ever heard of them.
So, we have a bunch of requirements for the system. Not all of them are must-haves, but they're all things we need to think about trading off in whatever system gets designed:
- need to support multiple services (not necessarily centrally)
- need to be able to assign users to different machines as a service scales out, or somehow distribute them
- need to consistently send a user back to the same server once they've been assigned
- need to give operations some level of control over how users are allocated
- need to provide some recourse if a particular node dies
- need to handle exhaustion attacks. For example, I could set up an RP that just auto-approved any username, then loop through users until all nodes were full.
- need support for future developments like bucketed assignment
- need the system to scale out without practical limit
Proposed Design
This design proposes a token-based authentication system. A user who wants to connect to one of our services asks a central server for an access token.
The central server, a.k.a. the Node Assignment Server, checks the authenticity of the user with a supported authentication method, and assigns the user a server to use along with that token.
The server that gets called, a.k.a. the Node Server, checks the validity of the token included in the request. Tokens have a limited lifespan.
Definitions and assumptions
First, a few definitions:
- Service: a service Mozilla provides, like Sync or Easy Setup.
- Node Assignment Server: authenticates users and returns tokens that can be used to authenticate to our services.
- Node: a URL that identifies a service, like http://phx345
- Node Server: a server that hosts the service, and can be mapped to several Nodes (URLs)
- Service Cluster: a group of servers for a Service. It contains Node Assignment Servers and Node Servers
Some assumptions:
- A Node Assignment Server is dedicated to a single Service and has a list of all the Nodes for that Service.
- Each Node Server can receive calls for virtually any Node.
- The Node Assignment Server will support only BrowserID at first, but could support any authentication protocol in the future, as long as authentication can be done with a single call.
Flow
Here's the proposed flow for BrowserID:
Client                 Node Assignment Server   BrowserID (may be local)   Node Server
======================================================================================
  |                             |                         |                    |
  request token ---- [1] ------>|------ verify --- [2] -->|                    |
  |                    attribute node [3][4]              |                    |
  |                             |<----- build token       |                    |
  keep token <------ [5] -------|                         |                    |
  |                             |                         |                    |
  create signed auth header     |                         |                    |
  |                             |                         |                    |
  call node -------- [6] -------|-------------------------|------------------->|--> verify token
  |                             |                         |                    |<-- process request
  get response <----------------|-------------------------|--------------------|
- the client requests a token, presenting its BrowserID assertion [1]
POST /request_token HTTP/1.1
Host: tokenserver.services.mozilla.com
X-Authentication-Method: Browser-ID

audience=XXX&assertion=XXX
- the node assignment server checks the BrowserID assertion [2]; this step will probably be done locally, without calling the BrowserID service
- the node assignment server checks in a DB if the user is already allocated to a node. [3]
- if the user is not allocated to a node, the node assignment server picks one by selecting the node that has the fewest users [4]
- the node assignment server creates a token using the user id, a timestamp and a secret string known only to the selected node and itself, and sends it back to the user along with a secret derived from the shared secret using HKDF (https://tools.ietf.org/html/rfc5869) [5]
HTTP/1.1 200 OK
Content-Type: application/x-www-form-urlencoded

oauth_consumer_key=<token>&oauth_consumer_secret=<derived-secret>
- the client uses the information received to calculate an OAuth authorization header
- the client calls the right node, using the special Authorization header [6]
POST /request HTTP/1.1
Host: some.node.services.mozilla.com
Authorization: OAuth realm="Example",
    oauth_consumer_key="9djdj82h48djs9d2",
    oauth_token="kkk9d7dh3k39sjv7",
    oauth_signature_method="HMAC-SHA1",
    oauth_timestamp="137131201",
    oauth_nonce="7d8f3e4a",
    oauth_signature="bYT5CMsGcbgUdFHObYMEfcx6bsw%3D"
- the node is able, with its secret, to validate the token. If the token is invalid or expired, the node returns a 401
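The token round-trip in steps [5] and [6] can be sketched as follows. This is a simplified, dependency-free illustration: the real design also encrypts the token with AES-CBC (see the Tokens section), and the function names here are illustrative, not the actual API.

```python
import base64
import hashlib
import hmac
import json
import time

def make_token(uid, shared_secret, ttl=1800):
    # Node Assignment Server side: build the payload and sign it with the
    # per-node shared secret (HMAC-SHA1, as in the proposed design).
    payload = json.dumps({"uid": uid, "expires": time.time() + ttl}).encode()
    sig = hmac.new(shared_secret, payload, hashlib.sha1).digest()
    return base64.urlsafe_b64encode(payload + sig).decode()

def validate_token(token, shared_secret):
    # Node Server side: check the signature and the expiry; on failure the
    # node would answer with a 401.
    raw = base64.urlsafe_b64decode(token.encode())
    payload, sig = raw[:-20], raw[-20:]  # SHA-1 digests are 20 bytes
    expected = hmac.new(shared_secret, payload, hashlib.sha1).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # bad signature -> 401
    data = json.loads(payload)
    if data["expires"] < time.time():
        return None  # expired token -> 401
    return data

secret = b"x" * 128  # the per-node secret both servers share
token = make_token("123", secret)
assert validate_token(token, secret)["uid"] == "123"
assert validate_token(token, b"z" * 128) is None
```

Because only the Node Assignment Server and the Node share the secret, the Node can validate requests without any per-request call back to the central server.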
Tokens
A token is a JSON-encoded mapping composed of app-specific information and an expiry timestamp.
The keys are:
- expires: an expiry timestamp (UTC)
- uid: the app-specific user id (the user id integer in the case of sync)
Example:
app_token = {'uid': '123', 'expires': 1324654308.907832}
The token is encrypted and signed using the shared secret, then base64-encoded. The encryption is AES-CBC and the signature is HMAC-SHA1:
app_token, signature = AES-CBC+HMAC-SHA1(app_token, secret_key)
app_token = b64encode(app_token, signature)
Note that the token doesn't contain any information about the chosen node.
XXX to be changed Implementation example: https://github.com/mozilla-services/tokenserver/blob/master/crypto.py
Example XXX :
$ python crypto.py
Creating a secret
ae6c3407ccf354f4d029061a5de97b188791e078398256a1f78b1b47...b40f834e570f74d9987ac9aa9cc7fa9fa

========= SERVER ==========
Creating the signed token
{'node': 'phx345', 'uid': '123', 'timestamp': 1324654308.907832, 'ttl': 30,
 'signature': '452671cf538528cc427e98d42c0fd43ebf285ae5', 'email': 'tarek@mozilla.com'}
creating a header with it
Authorization: MozToken {"node": "phx345", "uid": "123", "timestamp": 1324654308.907832, "ttl": 30, "signature": "452671cf538528cc427e98d42c0fd43ebf285ae5", "email": "tarek@mozilla.com"}

========= NODE ==========
extracting the token from the header
Authorization: MozToken {"node": "phx345", "uid": "123", "timestamp": 1324654308.907832, "ttl": 30, "signature": "452671cf538528cc427e98d42c0fd43ebf285ae5", "email": "tarek@mozilla.com"}
validating the signature
[Trying to think of ways in which we might care about exposing uid.]
[Also email. Security may have an issue with that, as it's theoretically loggable. Need to talk to them.]
Derived Secret
The node assignment server sends back, along with the token, a session secret derived from the shared secret:
secret = HKDF(sharedSecret, token)
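A minimal stdlib sketch of that derivation, implementing HKDF as specified in RFC 5869. Treating the token as the HKDF "info" parameter and using SHA-256 with a 32-byte output are assumptions; the document only states `secret = HKDF(sharedSecret, token)`.

```python
import hashlib
import hmac

def hkdf(key, info, length=32, salt=b"", hash=hashlib.sha256):
    # Extract: mix the input key material into a pseudorandom key (PRK).
    prk = hmac.new(salt or b"\x00" * hash().digest_size, key, hash).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until we
    # have `length` bytes of output keying material.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash).digest()
        okm += block
        counter += 1
    return okm[:length]

shared_secret = b"s" * 128       # the per-node shared secret
token = b"<opaque token bytes>"  # the token returned to the client
derived = hkdf(shared_secret, token)
assert len(derived) == 32
```

Since the Node knows the shared secret and receives the token on each request, it can re-derive the same session secret and verify the client's OAuth signature without storing any per-client state.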
User <> Node Database
The node assignment server has a DB listing, for each user, their applications. An application record is composed of:
- the user's app-specific id
- the node id (url)
- XXX other things ?
Nodes Database
XXX explain the two dbs
The node assignment server has a DB listing all the nodes for the application. For each node it has:
- PubKey - the public RSA key of the node
- URL - The specific node in the cluster that a user is assigned to
- SharedSecret - The current secret
- OldSharedSecret - The old secret
- Available - A count of remaining assignments available in this period.
- CurrentLoad - incremented by 1 for each user added; not a direct reflection of active users.
- Capacity - The theoretical maximum weight to be associated with the node
- Down - a binary value indicating whether a node is down. Users will not be assigned to a node that has been marked as down.
- XXX Backoff - a value in seconds that should be added to responses from services that support backoff.
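As an illustration of how the fields above fit together, here is a hypothetical node record plus the selection rule from step [4] of the flow (pick the least-loaded node). Field names, types, and the exact selection criteria are assumptions; the real schema is still to be defined.

```python
from dataclasses import dataclass

@dataclass
class NodeRecord:
    pubkey: str             # public RSA key of the node
    url: str                # e.g. "http://phx345"
    shared_secret: str      # current 128-char secret
    old_shared_secret: str  # previous secret, still honored for live tokens
    available: int          # remaining assignments in this period
    current_load: int       # incremented by 1 for each user added
    capacity: int           # theoretical maximum load for the node
    down: bool              # no new users assigned while True

def pick_node(nodes):
    # Consider only nodes that are up and still have assignments available,
    # then pick the one with the fewest users.
    candidates = [n for n in nodes if not n.down and n.available > 0]
    return min(candidates, key=lambda n: n.current_load, default=None)

a = NodeRecord("k", "http://phx1", "s", "o", 5, 10, 100, False)
b = NodeRecord("k", "http://phx2", "s", "o", 5, 3, 100, False)
c = NodeRecord("k", "http://phx3", "s", "o", 5, 1, 100, True)
assert pick_node([a, b, c]) is b   # least loaded among the nodes that are up
assert pick_node([c]) is None      # a down node is never assigned
```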
Operations
By order of frequency:
- [read] lookup for the shared secret [node assignment server, automatically, every user token request]
- [write] update the current load, available, capacity [node assignment server, automatically, every user allocation]
- [read] lookup for the best node [node assignment server, automatically, every user allocation]
- [write] update the public key, url, down flag, secrets and backoff [ops, manually, very rare]
Each Node Server has a unique secret per Node it serves, shared with the Node Assignment Server. A secret is an ASCII string of 128 chars. Example of generating such a string: https://github.com/mozilla-services/tokenserver/blob/master/crypto.py
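For reference, one way to generate such a 128-char ASCII secret with the stdlib, comparable to the linked crypto.py example (the exact alphabet used there may differ):

```python
import string
from secrets import choice  # cryptographically strong randomness

ALPHABET = string.ascii_letters + string.digits

def make_secret(length=128):
    # One random ASCII character at a time, 128 characters total.
    return "".join(choice(ALPHABET) for _ in range(length))

s = make_secret()
assert len(s) == 128
assert all(c in ALPHABET for c in s)
```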
Ops create secrets for each Node and set them in the Node Assignment Server database. They then deploy them on each Node Server in a CSV file, /var/moz/shared_secrets, that contains for each Node the secret and possibly the old one:
phx1; secret; oldsecret
phx2; secret; oldsecret
...
Ideally, a Node that's not managed by a given Node Server does not appear in its shared_secrets file.
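A sketch of how a Node Server might load that file, following the semicolon-separated layout shown above. Treating a missing or empty third column as "no old secret" is an assumption, covering nodes that have never had a rotation.

```python
import csv
import io

def load_shared_secrets(fileobj):
    # Returns {node: (current_secret, old_secret_or_None)}.
    secrets = {}
    for row in csv.reader(fileobj, delimiter=";"):
        if not row:
            continue  # skip blank lines
        node = row[0].strip()
        current = row[1].strip()
        old = row[2].strip() if len(row) > 2 else ""
        secrets[node] = (current, old or None)
    return secrets

sample = io.StringIO("phx1; secret1; oldsecret1\nphx2; secret2;\n")
secrets = load_shared_secrets(sample)
assert secrets["phx1"] == ("secret1", "oldsecret1")
assert secrets["phx2"] == ("secret2", None)
```

When validating a token, the Node Server would try the current secret first and fall back to the old one, which is what makes the rotation scheme below work without rejecting live tokens.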
When an existing secret needs to be changed for whatever reason, the current secret becomes the old secret. This prevents existing tokens from being rejected as soon as the secret changes.
XXX see how to update with no downtime
Node Deactivation
When a node needs to be shut down,
- the backoff flag is set in the token db
- if the user asks for a new token for the node, the server returns a 403 with Retry-After: ttl + 1
[we should provide a script for the whole process of downing a server]
Backward Compatibility
XXX TBU
Older versions of the system will use completely different API entrypoints - the old /user api, and the 1.1 /sync api. Those will need to be maintained during the transition, though new clusters should spin up with only 2.0 support.
We should watch logs to study 1.1 falloff and consolidate those users through migration as they diminish.
However, there are a couple of points that need to be synced up:
- The database that assigns nodes needs to be shared between the two. We should add a column for "1.0 acceptable" and update the old system to only look at those columns. Alternately, could work with ops to just have an "all old assignments go to this one cluster", in which case, the db doesn't need to be shared.
- There will be a migration that moves all the user node data from LDAP to the tokenserver. However, we need to make sure that any subsequent migrations update this data. This ensures that a user with a pre-2 client and post-2 client point at the same place, and that people moving to the new systems will have the right node. We can't punt this, because if a node goes down post-migration, a user who switches over afterwards is stuck on it. (at the very least, we need to purge these nodes from the 2.0 db).
- will need to migrate all user login data over to the browserid servers, but that's not relevant to tokenserver.
Infra/Scaling
On the node assignment server
The flow is:
- the user asks for a token, with a BrowserID assertion
- the server calls the authority [I/O bound]
- the server builds the token and sends it back [CPU bound]
- the user uses the node for the duration of the TTL (30 min)
So, with a 30-minute TTL each user requests two tokens per hour. For 100k users that means 200k requests per hour on the node assignment server, roughly 55 RPS. For 1M users, ~550 RPS. For 10M users, ~5,500 RPS. For 100M users, ~55,000 RPS.
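The traffic estimate can be recomputed directly; the steady-state rate is simply users divided by the TTL in seconds (figures in the text are rounded).

```python
TTL_SECONDS = 30 * 60  # 30-minute token lifetime

def token_rps(users):
    # Each user requests one fresh token per TTL window, so the sustained
    # request rate on the node assignment server is users / TTL.
    requests_per_hour = users * (3600 / TTL_SECONDS)
    return requests_per_hour / 3600  # equivalent to users / TTL_SECONDS

for users in (100_000, 1_000_000, 10_000_000, 100_000_000):
    print(f"{users:>11,} users -> {token_rps(users):,.0f} RPS")
```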
- XXX Scaling check : each new allocation does a couple of writes - check the volume
- High availability: we should avoid any single point of failure on the node assignment server
- Deployment
- A node assignment server is stateless, so we can deploy as many as we want and have Zeus load balance over them
- The database containing the nodes information is a shared MySQL instance on its own server (the MySQL server should be redundant/replicated, reached via Zeus)
- The shared secrets are kept in memory so we don't lookup the MySQL DB on every request
- The database of user/node mapping is the current LDAP, and may evolve into a more specialised metadata DB
On each node
- Memory: There is no need to store anything in memory apart from the shared secret
- CPU: each time a request is made, there is a need to validate the token.
- Network: Nothing.
APIs
XXX application/x-www-form-urlencoded vs json XXX
POST /request_token
Asks for a new token given some credentials. By default, the authentication mechanism is BrowserID, but the X-Authentication-Method header can be used to explicitly pick a protocol. If the server does not support the authentication protocol provided, a 400 is returned.
When the authentication method requires something other than an Authorization header, the data is provided in the request body. In that case, the request Content-Type must be application/x-www-form-urlencoded and the body contains the values to pass to the server.
Example for BrowserID:
POST /request_token
Host: token.services.mozilla.com
Content-Type: application/x-www-form-urlencoded

audience=XXX&assertion=XXX
Returns an encrypted token (oauth_consumer_key) and a secret (oauth_consumer_secret) in an application/x-www-form-urlencoded response.
Example:
HTTP/1.1 200 OK
Content-Type: application/x-www-form-urlencoded

oauth_consumer_key=<token>&oauth_consumer_secret=<secret>
XXX
Phase 1
[End of January? Need to check with ally]
End to end prototype with low-level scaling
- Fully defined API, including headers and errors
- Assigns Nodes
- Maintains Node state for a user (in the existing LDAP)
- Issues valid tokens
- Downs nodes if needed
Phase 2
[End of Q1?]
Scalable implementation of the above in place.
- Migration
- Operational support scripts (TBD)
- Logging and Metrics