Identity/CryptoIdeas/05-Queue-Sync
Queue-Based Data Synchronization
- Chris Karlof, Brian Warner, May-2013
Summary: like Identity/CryptoIdeas/04-Delta-Sync, but more stream-oriented than whole-version-oriented.
(this borrows ideas liberally from Chromium, so there is some terminology overlap with that project)
Syncable Service, Sync Mediator, Registration
A "Syncable Service" is any service that wants to synchronize data with a PICL account (and thus with corresponding services on other devices). Bookmarks, passwords, open-tabs, etc., are all examples of Syncable Services.
Each Syncable Service is required to register with the "Sync Mediator" at browser startup. In the registration call, the service includes parameters to identify:
- the name of the service (this allows a service on one device to connect with the same service on other devices: both must use the same name)
- whether this service uses one-collection-per-device or one-shared-collection
- whether this service's data goes into class-A or class-B
The service also provides callback functions as follows:
- mergeDataAndStartSyncing
- a callback for applying downstream changes (exact name not yet settled)
Registration returns a function for the service to call when it has upstream changes that need delivery to the server.
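As a sketch of what registration might look like (TypeScript for type clarity; registerSyncableService, applyDownstreamChanges, and the option names are all illustrative, not a settled API):

  type Guid = string;
  // Minimal change shape; see "Changes vs Records" below.
  type Change =
    | { op: "ADD/SET"; guid: Guid; value: unknown }
    | { op: "DELETE"; guid: Guid };

  interface RegistrationOptions {
    name: string;                             // must match across devices
    collectionModel: "per-device" | "shared"; // one-collection-per-device
                                              // vs one-shared-collection
    dataClass: "A" | "B";                     // class-A or class-B data
  }

  interface SyncableServiceCallbacks {
    // reconcile local data with the server's records at startup
    mergeDataAndStartSyncing(serverRecords: Map<Guid, unknown>): void;
    // apply changes arriving from other devices (name not settled)
    applyDownstreamChanges(changes: Change[]): void;
  }

  // Returns the function the service later calls to report upstream
  // changes. (Stub body; a real Mediator would wire up the two queues.)
  function registerSyncableService(
    opts: RegistrationOptions,
    callbacks: SyncableServiceCallbacks
  ): (changes: Change[]) => void {
    void opts; void callbacks;
    return () => {};
  }

  // Example: a bookmarks service using one shared collection, class-B.
  const reportUpstream = registerSyncableService(
    { name: "bookmarks", collectionModel: "shared", dataClass: "B" },
    {
      mergeDataAndStartSyncing(serverRecords) { void serverRecords; },
      applyDownstreamChanges(changes) { void changes; },
    }
  );

The returned reportUpstream function is the only handle the service needs for upstream delivery.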
Changes vs Records
PICL models the local datastore as a collection of "records", each of which has a globally-unique key (GUID) and some arbitrary value. The server must be able to supply a full set of (encrypted) records at any time (both for new clients which are not yet in sync, and for existing clients that fall out-of-sync for whatever reason).
Once clients are "in sync", they exchange "changes" instead of records. In the current design, each change takes one of two forms:
- "ADD/SET", guid, value
- "DELETE", guid
The differences are small, but for clarity we'll try to be precise about whether a record or a change is involved in any given function call or protocol message.
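A minimal sketch of the two shapes (field names illustrative):

  // A record: a full GUID/value pair, as the server stores it (encrypted).
  interface SyncRecord {
    guid: string;     // globally-unique key
    value: unknown;   // arbitrary service-defined value
  }

  // A change: an upsert or a deletion of a single record.
  type SyncChange =
    | { op: "ADD/SET"; guid: string; value: unknown }
    | { op: "DELETE"; guid: string };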
Queues
For each service, the Mediator maintains two queues. The "upstream" or "outbound" queue contains local changes that were made to the native datastore (e.g. the Places database), in response to user actions. The upstream queue holds these changes until:
- network connectivity is available
- some batching timeout has expired (e.g. Nagle-style batching to improve efficiency by sending infrequent large updates instead of frequent tiny ones)
- any downstream changes have been applied and merged in
After upstream entries have been sent to the server, they may remain in the queue until the server acknowledges receipt, at which point they are finally deleted. If the server receives an update from some other device (which has not yet been seen by the local device), the server sends a NACK instead, at which point the client will try to merge the other change into the local datastore. Entries in the upstream queue may be removed or modified before transmission as a result of merge actions.
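A minimal sketch of the flush rule and the ACK/NACK handling described above (the function and field names are assumptions):

  // All three conditions from the list above must hold before the
  // upstream queue is sent.
  interface QueueState {
    online: boolean;                // network connectivity available
    batchTimerExpired: boolean;     // Nagle-style batching timeout done
    downstreamQueueEmpty: boolean;  // downstream changes applied/merged
  }

  function canFlushUpstream(s: QueueState): boolean {
    return s.online && s.batchTimerExpired && s.downstreamQueueEmpty;
  }

  // After transmission, entries stay queued until the server ACKs them.
  // On NACK (the server already accepted a change from another device),
  // merge that other change locally first; the merge may edit or drop
  // entries still sitting in the upstream queue, then delivery retries.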
The "downstream" or "inbound" queue contains changes that arrive from the server which have not yet been fully applied to the local datastore.
Each queue contains plaintext changes. The client exchanges only encrypted records/changes with the server. Upstream changes are encrypted just before transmission, and downstream changes are decrypted before being added to the queue.
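A sketch of that boundary (the JSON passthrough below is a stand-in for real encryption, not a proposal):

  type Change = { op: "ADD/SET" | "DELETE"; guid: string; value?: unknown };
  type EncryptedBlob = string;

  // Placeholder crypto so the sketch runs; a real client would encrypt
  // with the account's class-A/class-B keys.
  async function encryptChange(c: Change): Promise<EncryptedBlob> {
    return JSON.stringify(c); // stand-in for real encryption
  }
  async function decryptBlob(b: EncryptedBlob): Promise<Change> {
    return JSON.parse(b); // stand-in for real decryption
  }
  async function postToServer(blobs: EncryptedBlob[]): Promise<void> {
    void blobs; // stand-in for the actual transport
  }

  // Upstream: the queue holds plaintext; encrypt only at send time.
  async function sendUpstream(queue: Change[]): Promise<void> {
    const blobs = await Promise.all(queue.map(encryptChange));
    await postToServer(blobs);
  }

  // Downstream: decrypt on arrival, before enqueueing plaintext changes.
  async function receiveDownstream(
    blobs: EncryptedBlob[], downstreamQueue: Change[]
  ): Promise<void> {
    for (const b of blobs) downstreamQueue.push(await decryptBlob(b));
  }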
Server Data Model
"build numbers", combined change/record rows, tombstones, "fetch changes since X", hash chain, conflict detection
Downstream Change Application
race detection and merge, scanning/modifying the upstream queue. Filtering downstream changes from the upstream observer.
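Also just an outline; one plausible shape for the observer-filtering piece (the suppression flag is a guess at the mechanism):

  type Change = { op: "ADD/SET" | "DELETE"; guid: string; value?: unknown };

  function applyToNativeStore(c: Change): void {
    void c; // stand-in for writing to Places etc.
  }
  const upstreamQueue: Change[] = [];

  // While downstream changes are being written into the native datastore,
  // the normal change observer would see them and wrongly re-enqueue them
  // as upstream changes. A suppression flag drops those echoes.
  let applyingDownstream = false;

  function applyDownstream(changes: Change[]): void {
    applyingDownstream = true;
    try {
      for (const c of changes) applyToNativeStore(c);
    } finally {
      applyingDownstream = false;
    }
  }

  function onNativeStoreChanged(c: Change): void {
    if (applyingDownstream) return;  // echo of a downstream change: ignore
    upstreamQueue.push(c);           // genuine local edit: queue upstream
  }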
Upstream Change Delivery
new build-number calculation, hash-chain calculation, ACK/NACK.
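Outline only; the sketch below assumes the chain hash is SHA-256 over the previous head plus the encrypted change, which is a guess at the construction, not a decision:

  import { createHash } from "crypto";

  // Each upstream delivery proposes a new build number and a new chain
  // head. The server ACKs if its current head matches prevHash, and
  // NACKs if another device already extended the chain first.
  function extendChain(prevHash: string, encryptedChange: string): string {
    return createHash("sha256")
      .update(prevHash)
      .update(encryptedChange)
      .digest("hex");
  }

  // Example: chain a batch of two encrypted changes onto head h0.
  const h1 = extendChain("h0", "blob1");
  const h2 = extendChain(h1, "blob2");
  void h2;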
Initial Merge
mergeDataAndStartSyncing, per-datatype merge functions, race detection and merge. Large upstream change stream.
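A sketch of the shape such a merge might take (the three-way split and the per-datatype mergeValues hook are assumptions; values are compared by identity and tombstones are ignored, for brevity):

  type Change =
    | { op: "ADD/SET"; guid: string; value: unknown }
    | { op: "DELETE"; guid: string };

  // Given the local and server record sets, produce the changes to apply
  // locally and the (possibly large) stream of changes to send upstream.
  function mergeDataAndStartSyncing(
    local: Map<string, unknown>,
    server: Map<string, unknown>,
    // per-datatype conflict resolution, supplied by the Syncable Service
    mergeValues: (localValue: unknown, serverValue: unknown) => unknown
  ): { applyLocally: Change[]; sendUpstream: Change[] } {
    const applyLocally: Change[] = [];
    const sendUpstream: Change[] = [];  // may be a large change stream
    for (const [guid, localValue] of local) {
      if (!server.has(guid)) {
        // local-only record: send it upstream
        sendUpstream.push({ op: "ADD/SET", guid, value: localValue });
      } else {
        const serverValue = server.get(guid);
        if (serverValue !== localValue) {
          // conflict: defer to the per-datatype merge function
          const merged = mergeValues(localValue, serverValue);
          applyLocally.push({ op: "ADD/SET", guid, value: merged });
          sendUpstream.push({ op: "ADD/SET", guid, value: merged });
        }
      }
    }
    for (const [guid, serverValue] of server) {
      if (!local.has(guid)) {
        // server-only record: apply it locally
        applyLocally.push({ op: "ADD/SET", guid, value: serverValue });
      }
    }
    return { applyLocally, sendUpstream };
  }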
Downstream Cache
For simplicity (in particular to decouple transactionality between the native datastore and the downstream queue), we may have the browser perform a "resync" at every boot. To avoid re-fetching the entire server dataset each time, we can maintain a full copy of the server's dataset in a "Downstream Cache". This is updated when we receive downstream changes, with a transaction that simultaneously updates the cached data and the new build number. With this, we can safely request only new changes each time. In the ideal case (where nothing has changed on the server), a single roundtrip (returning or confirming the current build number) is enough to make sure we're up-to-date.
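A sketch of that boot-time resync (the cache and server interfaces are hypothetical):

  type Change =
    | { op: "ADD/SET"; guid: string; value: unknown }
    | { op: "DELETE"; guid: string };

  interface DownstreamCache {
    buildNumber: number;
    // one transaction covers both the data and the new build number
    transaction(body: (tx: {
      apply(c: Change): void;
      setBuildNumber(n: number): void;
    }) => void): Promise<void>;
  }

  interface SyncServer {
    changesSince(buildNumber: number):
      Promise<{ buildNumber: number; changes: Change[] }>;
  }

  // At boot, ask only for changes after the cached build number. If the
  // server's build number matches ours, one round-trip confirms we are
  // already up-to-date.
  async function resyncAtBoot(cache: DownstreamCache, server: SyncServer) {
    const { buildNumber, changes } = await server.changesSince(cache.buildNumber);
    if (buildNumber === cache.buildNumber) return; // nothing new
    await cache.transaction(tx => {
      for (const c of changes) tx.apply(c);
      tx.setBuildNumber(buildNumber); // data + build number move together
    });
  }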