Identity/CryptoIdeas/05-Queue-Sync
Queue-Based Data Synchronization
- Chris Karlof, Brian Warner, May-2013
Summary: like Identity/CryptoIdeas/04-Delta-Sync, but more stream-oriented than whole-version-oriented.
(this borrows ideas liberally from Chromium, so there is some terminology overlap with that project)
Syncable Service, Sync Mediator, Registration
A "Syncable Service" is any service that wants to synchronize data with a PICL account (and thus with corresponding services on other devices). Bookmarks, passwords, open-tabs, etc., are all examples of Syncable Services.
Each Syncable Service is required to register with the "Sync Mediator" at browser startup. In the registration call, the service includes parameters to identify:
- the name of the service (this allows a service on one device to connect with the same service on other devices: both must use the same name)
- whether this service uses one-collection-per-device or one-shared-collection
- whether this service's data goes into class-A or class-B
The service also provides callback functions as follows:
- mergeDataAndStartSyncing
- a callback for applying downstream changes (exact name not yet settled)
Registration returns a function for the service to call when it has upstream changes that need delivery to the server.
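As a sketch of what registration might look like (TypeScript for type clarity; registerSyncableService, applyDownstreamChanges, and the option names are all illustrative, not a settled API):

  type Guid = string;
  // Minimal change shape; see "Changes vs Records" below.
  type Change =
    | { op: "ADD/SET"; guid: Guid; value: unknown }
    | { op: "DELETE"; guid: Guid };

  interface RegistrationOptions {
    name: string;                             // must match across devices
    collectionModel: "per-device" | "shared"; // one-collection-per-device
                                              // vs one-shared-collection
    dataClass: "A" | "B";                     // class-A or class-B data
  }

  interface SyncableServiceCallbacks {
    // reconcile local data with the server's records at startup
    mergeDataAndStartSyncing(serverRecords: Map<Guid, unknown>): void;
    // apply changes arriving from other devices (name not settled)
    applyDownstreamChanges(changes: Change[]): void;
  }

  // Returns the function the service later calls to report upstream
  // changes. (Stub body; a real Mediator would wire up the two queues.)
  function registerSyncableService(
    opts: RegistrationOptions,
    callbacks: SyncableServiceCallbacks
  ): (changes: Change[]) => void {
    void opts; void callbacks;
    return () => {};
  }

  // Example: a bookmarks service using one shared collection, class-B.
  const reportUpstream = registerSyncableService(
    { name: "bookmarks", collectionModel: "shared", dataClass: "B" },
    {
      mergeDataAndStartSyncing(serverRecords) { void serverRecords; },
      applyDownstreamChanges(changes) { void changes; },
    }
  );

The returned reportUpstream function is the only handle the service needs for upstream delivery.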
Changes vs Records
PICL models the local datastore as a collection of "records", each of which has a globally-unique key (GUID) and some arbitrary value. The server must be able to supply a full set of (encrypted) records at any time (both for new clients which are not yet in sync, and for existing clients that fall out-of-sync for whatever reason).
Once clients are "in sync", they exchange "changes" instead of records. In the current design, each change takes one of two forms:
- "ADD/SET", guid, value
- "DELETE", guid
The differences are small, but for clarity we'll try to be precise about whether a record or a change is involved in any given function call or protocol message.
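A minimal sketch of the two shapes (field names illustrative):

  // A record: a full GUID/value pair, as the server stores it (encrypted).
  interface SyncRecord {
    guid: string;     // globally-unique key
    value: unknown;   // arbitrary service-defined value
  }

  // A change: an upsert or a deletion of a single record.
  type SyncChange =
    | { op: "ADD/SET"; guid: string; value: unknown }
    | { op: "DELETE"; guid: string };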
Queues
For each service, the Mediator maintains two queues. The "upstream" or "outbound" queue contains local changes that were made to the native datastore (e.g. the Places database), in response to user actions. The upstream queue holds these changes until:
- network connectivity is available
- some batching timeout has expired (e.g. Nagle-style batching to improve efficiency by sending infrequent large updates instead of frequent tiny ones)
- any downstream changes have been applied and merged in
After upstream entries have been sent to the server, they may remain in the queue until the server acknowledges receipt, at which point they are finally deleted. If the server receives an update from some other device (which has not yet been seen by the local device), the server sends a NACK instead, at which point the client will try to merge the other change into the local datastore. Entries in the upstream queue may be removed or modified before transmission as a result of merge actions.
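A minimal sketch of the flush rule and the ACK/NACK handling described above (the function and field names are assumptions):

  // All three conditions from the list above must hold before the
  // upstream queue is sent.
  interface QueueState {
    online: boolean;                // network connectivity available
    batchTimerExpired: boolean;     // Nagle-style batching timeout done
    downstreamQueueEmpty: boolean;  // downstream changes applied/merged
  }

  function canFlushUpstream(s: QueueState): boolean {
    return s.online && s.batchTimerExpired && s.downstreamQueueEmpty;
  }

  // After transmission, entries stay queued until the server ACKs them.
  // On NACK (the server already accepted a change from another device),
  // merge that other change locally first; the merge may edit or drop
  // entries still sitting in the upstream queue, then delivery retries.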
The "downstream" or "inbound" queue contains changes that arrive from the server which have not yet been fully applied to the local datastore.
Each queue contains plaintext changes. The client exchanges only encrypted records/changes with the server. Upstream changes are encrypted just before transmission, and downstream changes are decrypted before being added to the queue.
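A sketch of that boundary (the JSON passthrough below is a stand-in for real encryption, not a proposal):

  type Change = { op: "ADD/SET" | "DELETE"; guid: string; value?: unknown };
  type EncryptedBlob = string;

  // Placeholder crypto so the sketch runs; a real client would encrypt
  // with the account's class-A/class-B keys.
  async function encryptChange(c: Change): Promise<EncryptedBlob> {
    return JSON.stringify(c); // stand-in for real encryption
  }
  async function decryptBlob(b: EncryptedBlob): Promise<Change> {
    return JSON.parse(b); // stand-in for real decryption
  }
  async function postToServer(blobs: EncryptedBlob[]): Promise<void> {
    void blobs; // stand-in for the actual transport
  }

  // Upstream: the queue holds plaintext; encrypt only at send time.
  async function sendUpstream(queue: Change[]): Promise<void> {
    const blobs = await Promise.all(queue.map(encryptChange));
    await postToServer(blobs);
  }

  // Downstream: decrypt on arrival, before enqueueing plaintext changes.
  async function receiveDownstream(
    blobs: EncryptedBlob[], downstreamQueue: Change[]
  ): Promise<void> {
    for (const b of blobs) downstreamQueue.push(await decryptBlob(b));
  }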
Server Data Model
"build numbers", combined change/record rows, tombstones, "fetch changes since X", hash chain, conflict detection
Downstream Change Application
race detection and merge, scanning/modifying the upstream queue. Filtering downstream changes from the upstream observer.
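Also just an outline; one plausible shape for the observer-filtering piece (the suppression flag is a guess at the mechanism):

  type Change = { op: "ADD/SET" | "DELETE"; guid: string; value?: unknown };

  function applyToNativeStore(c: Change): void {
    void c; // stand-in for writing to Places etc.
  }
  const upstreamQueue: Change[] = [];

  // While downstream changes are being written into the native datastore,
  // the normal change observer would see them and wrongly re-enqueue them
  // as upstream changes. A suppression flag drops those echoes.
  let applyingDownstream = false;

  function applyDownstream(changes: Change[]): void {
    applyingDownstream = true;
    try {
      for (const c of changes) applyToNativeStore(c);
    } finally {
      applyingDownstream = false;
    }
  }

  function onNativeStoreChanged(c: Change): void {
    if (applyingDownstream) return;  // echo of a downstream change: ignore
    upstreamQueue.push(c);           // genuine local edit: queue upstream
  }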
Upstream Change Delivery
new build-number calculation, hash-chain calculation, ACK/NACK.
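Outline only; the sketch below assumes the chain hash is SHA-256 over the previous head plus the encrypted change, which is a guess at the construction, not a decision:

  import { createHash } from "crypto";

  // Each upstream delivery proposes a new build number and a new chain
  // head. The server ACKs if its current head matches prevHash, and
  // NACKs if another device already extended the chain first.
  function extendChain(prevHash: string, encryptedChange: string): string {
    return createHash("sha256")
      .update(prevHash)
      .update(encryptedChange)
      .digest("hex");
  }

  // Example: chain a batch of two encrypted changes onto head h0.
  const h1 = extendChain("h0", "blob1");
  const h2 = extendChain(h1, "blob2");
  void h2;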
Initial Merge
mergeDataAndStartSyncing, per-datatype merge functions, race detection and merge. Large upstream change stream.
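A sketch of the shape such a merge might take (the three-way split and the per-datatype mergeValues hook are assumptions; values are compared by identity and tombstones are ignored, for brevity):

  type Change =
    | { op: "ADD/SET"; guid: string; value: unknown }
    | { op: "DELETE"; guid: string };

  // Given the local and server record sets, produce the changes to apply
  // locally and the (possibly large) stream of changes to send upstream.
  function mergeDataAndStartSyncing(
    local: Map<string, unknown>,
    server: Map<string, unknown>,
    // per-datatype conflict resolution, supplied by the Syncable Service
    mergeValues: (localValue: unknown, serverValue: unknown) => unknown
  ): { applyLocally: Change[]; sendUpstream: Change[] } {
    const applyLocally: Change[] = [];
    const sendUpstream: Change[] = [];  // may be a large change stream
    for (const [guid, localValue] of local) {
      if (!server.has(guid)) {
        // local-only record: send it upstream
        sendUpstream.push({ op: "ADD/SET", guid, value: localValue });
      } else {
        const serverValue = server.get(guid);
        if (serverValue !== localValue) {
          // conflict: defer to the per-datatype merge function
          const merged = mergeValues(localValue, serverValue);
          applyLocally.push({ op: "ADD/SET", guid, value: merged });
          sendUpstream.push({ op: "ADD/SET", guid, value: merged });
        }
      }
    }
    for (const [guid, serverValue] of server) {
      if (!local.has(guid)) {
        // server-only record: apply it locally
        applyLocally.push({ op: "ADD/SET", guid, value: serverValue });
      }
    }
    return { applyLocally, sendUpstream };
  }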
Downstream Cache
For simplicity (in particular to decouple transactionality between the native datastore and the downstream queue), we may have the browser perform a "resync" at every boot. To avoid re-fetching the entire server dataset each time, we can maintain a full copy of the server's dataset in a "Downstream Cache". This is updated when we receive downstream changes, with a transaction that simultaneously updates the cached data and the new build number. With this, we can safely request only new changes each time. In the ideal case (where nothing has changed on the server), a single roundtrip (returning or confirming the current build number) is enough to make sure we're up-to-date.
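A sketch of that boot-time resync (the cache and server interfaces are hypothetical):

  type Change =
    | { op: "ADD/SET"; guid: string; value: unknown }
    | { op: "DELETE"; guid: string };

  interface DownstreamCache {
    buildNumber: number;
    // one transaction covers both the data and the new build number
    transaction(body: (tx: {
      apply(c: Change): void;
      setBuildNumber(n: number): void;
    }) => void): Promise<void>;
  }

  interface SyncServer {
    changesSince(buildNumber: number):
      Promise<{ buildNumber: number; changes: Change[] }>;
  }

  // At boot, ask only for changes after the cached build number. If the
  // server's build number matches ours, one round-trip confirms we are
  // already up-to-date.
  async function resyncAtBoot(cache: DownstreamCache, server: SyncServer) {
    const { buildNumber, changes } = await server.changesSince(cache.buildNumber);
    if (buildNumber === cache.buildNumber) return; // nothing new
    await cache.transaction(tx => {
      for (const c of changes) tx.apply(c);
      tx.setBuildNumber(buildNumber); // data + build number move together
    });
  }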