Identity/AttachedServices/Architecture

From MozillaWiki
Last updated: 2014/02/10

Overview

[NOTE: This document is outdated if you are looking for information on the latest Sync efforts. You'll probably want to look at https://wiki.mozilla.org/User_Services/Sync/Relaunch]

Profile In The Cloud (PICL) is a mechanism for synchronizing browser state between a user's various devices. The user attaches a given local profile to a remote account by "logging into their browser", which then uploads and downloads data as necessary to bring the local profile into harmony with the server-held data. Possible PICL services include: bookmarks/history/tabs/passwords backup/syncing, social API preferences, sharing providers, WebRTC bridge provider, file-storage service, etc.

Architectural Overview

There are roughly five areas of concern in the PICL system:

  • 1: Signup/Signin: How does the user attach a new device to their account? This area involves passwords, usernames, email addresses, recovery options, revocation, and device management.
  • 2: Conversion: How do we extract (and inject) data from the various native data sources (PlacesDB for bookmarks and history, the Password Manager, etc.)? This data should be converted into a neutral format so the synchronization code doesn't need to know the details. This code must also merge conflicting data when necessary.
  • 3: Synchronization: The neutral data must be encrypted, signed, batched, and delivered to/from a storage server. This process must tolerate dropped messages, interrupted connections, overload conditions, and arbitrarily long periods of server unreachability.
  • 4: Storage Server Authorization: The browser code must prove to the Storage Server that it has a right to read/write the encrypted records.
  • 5: Storage Server Format: The storage server must store large quantities of data reliably, and provide fast access.
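The delivery requirements in area 3 (dropped messages, interrupted connections, overload, long outages) are commonly met by batching plus retry with capped exponential backoff and jitter. The following Python is a minimal sketch under that assumption, not PICL's actual sync code: `send` is a hypothetical transport that raises `TransientError` for retryable failures, and the batch size and delay constants are illustrative. Encryption and signing of the records are left out.

```python
import random
from typing import Callable, List, Sequence


class TransientError(Exception):
    """Raised by a transport for retryable failures (dropped, overloaded, unreachable)."""


def make_batches(records: Sequence[dict], size: int) -> List[List[dict]]:
    """Split records into fixed-size batches; the last batch may be shorter."""
    return [list(records[i:i + size]) for i in range(0, len(records), size)]


def upload_with_retry(
    records: Sequence[dict],
    send: Callable[[List[dict]], None],
    sleep: Callable[[float], None],
    batch_size: int = 50,
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 300.0,
) -> int:
    """Deliver records batch by batch, backing off exponentially on transient errors.

    Returns the number of batches delivered. If a batch still fails after
    max_attempts, the error propagates so the caller can keep the remaining
    records queued and try again later (e.g. after a long outage).
    """
    delivered = 0
    for batch in make_batches(records, batch_size):
        for attempt in range(max_attempts):
            try:
                send(batch)
                delivered += 1
                break
            except TransientError:
                if attempt == max_attempts - 1:
                    raise
                # "Full jitter": wait a random time up to the capped exponential delay,
                # so many clients recovering at once do not retry in lockstep.
                delay = min(max_delay, base_delay * 2 ** attempt)
                sleep(random.uniform(0, delay))
    return delivered
```

`sleep` is a parameter rather than a direct call to `time.sleep` so that a browser could plug in a timer and a test can run without waiting.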

Architecture Map

This document describes our current plans for these five areas. The etherpad page at https://id.etherpad.mozilla.org/picl-backend contains links to other design documents.

Data Security

[NOTE: As noted above, this document does not describe the latest Sync efforts. Check out https://wiki.mozilla.org/User_Services/Sync/Relaunch for that.]

We are exploring various models for data security: https://wiki.mozilla.org/Identity/CryptoIdeas/03-ID-Attached-Data

The user will have a single "PICL Password", which they must type into the browser during the sign-in process. The user's browser proves (to the Key Server) that it knows this password, and in return obtains data-encryption keys and a signed certificate that authorizes Storage Server reads and writes. No server learns this password directly: the closest any server comes is the Key Server, which receives a stretched "verifier" (useful to an attacker only as the starting point for a brute-force dictionary attack).

User data is stored in one of two categories. Anything put in the "class-A" category can be recovered as long as the user can still access their email (i.e. can get a Persona assertion for it from their IdP); consequently, it is also technically retrievable by the operators of that IdP and of the Key Server (or by anyone who compromises either). Data put in the "class-B" category requires the PICL password to retrieve: it cannot be recovered if the password is forgotten, but (if the password is well chosen) it cannot be retrieved by the IdP or any other server-side attacker.

We do not yet know which data will be assigned to which category by default, but saved passwords will likely go into class-B, while many other datatypes will default to class-A. There will be an option to put all data into class-B.

Sign-Up / Sign-In

Attaching a profile to an account is called "Sign In To The Browser". The UI for this is still under discussion, but will involve the user typing an email address and a password into chrome browser UI (for both new-account creation and signing into an existing account, as well as password reset). This password will be stretched on the client side (using techniques from Identity/CryptoIdeas/01-PBKDF-scrypt) and used to generate an "SRP password" and a wrapping key (using techniques from Identity/CryptoIdeas/02-Recoverable-Keywrapping).
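A minimal sketch of the client-side stretching and splitting step, assuming PBKDF2 followed by scrypt, then HKDF-SHA256 (RFC 5869) to derive the two outputs. The salt and label strings (`picl-sketch/...`), the iteration count, and the scrypt parameters are hypothetical placeholders, not the values specified in Identity/CryptoIdeas/01-PBKDF-scrypt or 02-Recoverable-Keywrapping.

```python
import hashlib
import hmac
from typing import Tuple


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-SHA256 (RFC 5869) with the default all-zero salt."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def stretch_password(email: str, password: str) -> bytes:
    """PBKDF2 followed by scrypt, salted with a hypothetical email-based label.

    The work factors are illustrative and far too small for production.
    """
    salt = b"picl-sketch/stretch:" + email.encode("utf-8")
    inner = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 1000)
    return hashlib.scrypt(inner, salt=salt, n=2 ** 12, r=8, p=1, dklen=32)


def derive_client_secrets(email: str, password: str) -> Tuple[bytes, bytes]:
    """Split one stretched password into an SRP password and a wrapping key."""
    stretched = stretch_password(email, password)
    srp_pw = hkdf_sha256(stretched, b"picl-sketch/srpPW")
    unwrap_key = hkdf_sha256(stretched, b"picl-sketch/unwrapBKey")
    return srp_pw, unwrap_key
```

Both values come from the same stretched password but under different HKDF labels, so the SRP password used with the Key Server should not reveal the key that unwraps the class-B master key.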

The SRP Password is then used in a protocol (see Identity/AttachedServices/KeyServerProtocol and the picl-idp-protocol etherpad) to speak with the Key Server. SRP is an interactive "zero-knowledge" protocol which gives the participants exactly one chance to show that they agree on a password. The outcome of SRP is a random session key: if the password was correct, both sides will wind up with the same key (otherwise their keys will be different). This session key is used to protect and authenticate some additional messages, which are used to retrieve the class-A and class-B master data-encryption keys, and a "certificate renewal token". This token allows the browser to obtain a signed certificate for a special "PICL Account" identifier (e.g. GUID@picl.persona.org). These certificates will be used for Persona/BrowserID authentication to the storage servers (described below).
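To illustrate how SRP lets both sides reach the same session key without the password crossing the wire, here is a toy SRP-6a exchange in Python. Everything here is an assumption for illustration: the group (the prime 2**255 - 19 with generator 2) is NOT a secure SRP group, and the hash layout and key derivation are simplified; the real message formats are in Identity/AttachedServices/KeyServerProtocol.

```python
import hashlib
import secrets
from typing import Tuple

# TOY GROUP, NOT SECURE: the prime 2**255 - 19 with generator 2 keeps the sketch
# short. A real deployment would use a standardized safe-prime group (RFC 5054).
N = 2 ** 255 - 19
g = 2
WIDTH = (N.bit_length() + 7) // 8  # bytes per group element


def H(*parts) -> int:
    """SHA-256 over byte strings and fixed-width big-endian integers, as an int."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p if isinstance(p, bytes) else p.to_bytes(WIDTH, "big"))
    return int.from_bytes(h.digest(), "big")


k = H(N, g)  # SRP-6a multiplier


def make_verifier(salt: bytes, srp_pw: bytes) -> int:
    """At sign-up the client sends (salt, v); the Key Server never sees srp_pw."""
    return pow(g, H(salt, srp_pw), N)


def client_hello() -> Tuple[int, int]:
    a = secrets.randbelow(N - 2) + 1
    return a, pow(g, a, N)  # keep a secret, send A


def server_hello(v: int) -> Tuple[int, int]:
    b = secrets.randbelow(N - 2) + 1
    return b, (k * v + pow(g, b, N)) % N  # keep b secret, send B


def client_session_key(salt: bytes, srp_pw: bytes, a: int, A: int, B: int) -> bytes:
    if B % N == 0:
        raise ValueError("invalid B")
    u = H(A, B)
    x = H(salt, srp_pw)
    # (B - k*g^x) = g^b when the password matches, so S = g^(b*(a + u*x)).
    S = pow((B - k * pow(g, x, N)) % N, a + u * x, N)
    return hashlib.sha256(S.to_bytes(WIDTH, "big")).digest()


def server_session_key(v: int, b: int, A: int, B: int) -> bytes:
    if A % N == 0:
        raise ValueError("invalid A")
    u = H(A, B)
    # (A * v^u)^b = g^((a + u*x)*b), the same S the client computes.
    S = pow(A * pow(v, u, N) % N, b, N)
    return hashlib.sha256(S.to_bytes(WIDTH, "big")).digest()
```

The server stores only `salt` and `v`. In the real protocol the resulting key would then protect the follow-up messages that return the master keys and the certificate renewal token; that step is not shown.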

The class-B master key is encrypted by a derivative of the stretched user password. The master keys are then used to derive per-datatype encryption keys. We use different keys for each datatype so that in the future, we can share e.g. bookmarks with a third party (by telling them the decryption key) without also sharing e.g. stored-passwords.
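The key hierarchy above can be sketched in Python, assuming XOR wrapping of kB under a password-derived key and HKDF-SHA256 with the datatype name as the `info` label for per-datatype keys. Both choices, the label strings, and the hypothetical `CLASS_B_DEFAULT` policy (saved passwords in class-B, as the text suggests is likely) are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical default policy: saved passwords in class-B, everything else in
# class-A, with an option to put all data into class-B.
CLASS_B_DEFAULT = {"passwords"}


def new_master_key() -> bytes:
    """A fresh random 256-bit master key (kA or kB)."""
    return secrets.token_bytes(32)


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-SHA256 (RFC 5869) with the default all-zero salt."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def xor_wrap(key: bytes, wrapping_key: bytes) -> bytes:
    """XOR wrapping: the same operation wraps and unwraps."""
    if len(key) != len(wrapping_key):
        raise ValueError("key length mismatch")
    return bytes(x ^ y for x, y in zip(key, wrapping_key))


def datatype_key(datatype: str, kA: bytes, kB: bytes, all_class_b: bool = False) -> bytes:
    """Derive a per-datatype key from the master key of its category."""
    master = kB if all_class_b or datatype in CLASS_B_DEFAULT else kA
    return hkdf_sha256(master, b"picl-sketch/datatype:" + datatype.encode("utf-8"))
```

Handing a third party the bookmarks key lets them decrypt bookmarks only; under the usual HKDF assumptions it should not let them recover kA or the passwords key.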

The KeyServer/PiCL-IdP is a small server which holds a few values for each user: email, SRP verifier, kA (the class-A master key), and wrapped(kB) (the class-B master key, encrypted as described above). This server also keeps track of which devices have been attached to the account (to help the user with device management and revocation).

If the user forgets their password, they can reset the account (and establish a new password) by providing a Persona assertion for their account's email address. The class-B data is deleted, but the class-A data is retained.
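The Key Server's per-user state, device tracking, and password reset can be summarized as a small Python data structure. This is a hypothetical model of the values listed above, not the real KeyServer schema; verifying the Persona assertion itself is assumed to have happened before `reset_password` is called.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AccountRecord:
    """Per-user state on the Key Server (sketch). The password itself is never stored."""
    email: str
    srp_salt: bytes
    srp_verifier: int
    kA: bytes
    wrapped_kB: bytes
    devices: Dict[str, bool] = field(default_factory=dict)  # device id -> revoked?

    def attach_device(self, device_id: str) -> None:
        self.devices[device_id] = False

    def revoke_device(self, device_id: str) -> None:
        if device_id not in self.devices:
            raise KeyError(device_id)
        self.devices[device_id] = True

    def active_devices(self) -> List[str]:
        return sorted(d for d, revoked in self.devices.items() if not revoked)

    def reset_password(self, verified_email: str, new_salt: bytes,
                       new_verifier: int, new_wrapped_kB: bytes) -> None:
        """Reset after the caller has verified a Persona assertion for verified_email.

        kA is kept, so class-A data stays readable. The old wrapped_kB is replaced
        by a wrapping of a brand-new kB, so class-B data encrypted under the old
        kB can no longer be decrypted and should be deleted from storage.
        """
        if verified_email != self.email:
            raise PermissionError("assertion is for a different email address")
        self.srp_salt = new_salt
        self.srp_verifier = new_verifier
        self.wrapped_kB = new_wrapped_kB
```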

Conversion / Data Adapters

Synchronization

We've developed the Delta-Sync protocol for getting full sets of encrypted key-value records from browser to server and back again.

However, as of 04-Jun-2013, we are investigating a scheme named "Queue-Sync" for uploading batched change records to the server and merging downstream records back into the local datastore. Compared to Delta-Sync, we expect Queue-Sync to:

  • avoid expensive full-dataset hashes for computing revision identifiers (at the cost of giving up some full-dataset integrity guarantees)
  • handle "re-sync" more naturally (which occurs at initial connection, and later when either server or browser falls behind)
  • avoid keeping a full shadow copy on the browser
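A minimal Queue-Sync-style sketch in Python, assuming an append-only server log with server-assigned sequence numbers; a client that keeps only its datastore, a queue of unsent changes, and a cursor; and last-writer-wins merging in server order. These are illustrative assumptions; the actual Queue-Sync design may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# One change record: (key, value); value None marks a deletion. In PICL these
# records would be encrypted before upload; plaintext keeps the sketch short.
Change = Tuple[str, Optional[str]]


class QueueServer:
    """Append-only change log; the server assigns increasing sequence numbers."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, Change]] = []

    def append(self, batch: List[Change]) -> int:
        for change in batch:
            self.log.append((len(self.log) + 1, change))
        return len(self.log)

    def since(self, seq: int) -> List[Tuple[int, Change]]:
        return [entry for entry in self.log if entry[0] > seq]


@dataclass
class QueueClient:
    """Keeps the local datastore, unsent changes, and a cursor; no shadow copy."""
    store: Dict[str, str] = field(default_factory=dict)
    pending: List[Change] = field(default_factory=list)
    cursor: int = 0  # highest server sequence number already merged locally

    def local_change(self, key: str, value: Optional[str]) -> None:
        self._apply(key, value)
        self.pending.append((key, value))

    def sync(self, server: QueueServer) -> None:
        if self.pending:  # upstream: send the batch of queued changes
            server.append(self.pending)
            self.pending = []
        # Downstream: merge everything after the cursor in server order, so every
        # device replays the same log and converges (the later record wins).
        for seq, (key, value) in server.since(self.cursor):
            self._apply(key, value)
            self.cursor = seq

    def _apply(self, key: str, value: Optional[str]) -> None:
        if value is None:
            self.store.pop(key, None)
        else:
            self.store[key] = value
```

In this sketch a brand-new device, or one that has fallen far behind, re-syncs through the same code path simply by starting from cursor 0.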


As of late June 2013, we're experimenting with CouchDB, the embeddable form known as PouchDB, and their built-in data synchronization protocol.

Storage Server Authorization

The browser will speak Queue-Sync to the Storage Server. A Persona (BrowserID) assertion for the "PICL Account Identifier" (e.g. GUID@picl.persona.org) is what allows the browser to read and write its encrypted Queue-Sync records.

This assertion must be verified with the usual public-key signature checks and .well-known lookup process. For performance, the Storage Server will only verify it once, then exchange it for a token that is easier to validate (either a nonce that maps to the validated account identifier and expiration time, or an encrypted/HMACed copy of the session data). Subsequent requests will be authorized by the token.

An initial draft of the storage-server protocol is here and here.

Storage Server Format

The storage server is free to use any database technology or schema, mostly independent of the client browser. The main goals are to achieve high durability and uptime, fast service, and minimum cost.

The workload will be pretty write-heavy. Accounts with just a single device (i.e. just backup, not sync) are pretty common, and these will be almost entirely writes. Some datatypes are more "hot" than others (e.g. open-tabs are updated with every click, while bookmarks only change when the user specifically creates or modifies one), so we might want to mark the collections with hints that tell the server how to best manage the records.