CloudServices/Sync/FxSync/StoreRedesign

From MozillaWiki

The current separation between Store, Engine, and Tracker is artificial and perverse. It adds excessive complexity and limits what we can build. We can fix this.

Proposal

Reduce the design to two main classes, Repository and Synchronizer:

  • Repositories are both sinks and sources of records
  • Synchronizers exist (from one perspective, at least) to reify the relationship between two Repositories (including tracking lastSync). In practice this means connecting a pair of repositories, with each acting as source and then as sink in turn.

These actions actually take place via RepositorySessions and SynchronizerSessions, which represent single sync events.

Repository(Session)s are entirely responsible for providing a timestamp- and record-centric API over a source of data. This interface abstracts the tracking of changed items and application and retrieval of records, and is uniform across both remote and local storage. For example, we would build a FxBookmarkRepository layer above Places, a ServerRepository layer in front of the v5/v1.1 Sync API, and connect the two with a simple Synchronizer. Both Repository implementations would implement exactly the same interface, which would allow us to trivially implement:

  • Direct sync between two devices, without a server intermediary
  • Sync to a backup file
  • Sync connectors to external data stores
  • Sync to multiple destinations
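To illustrate why a uniform interface makes these targets cheap, here is a minimal sketch in Python. The method names `fetch_since` and `store`, the `Record` fields, and both repository classes are assumptions made for illustration, not the proposed API. The sketch is also synchronous for brevity, whereas the proposal describes a callback-based API.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict
from typing import Dict, Iterable, List, Protocol


@dataclass
class Record:
    guid: str
    modified: float  # timestamp of last modification
    payload: str


class Repository(Protocol):
    """Hypothetical uniform interface: every repository is a source and a sink."""

    def fetch_since(self, timestamp: float) -> List[Record]: ...
    def store(self, records: Iterable[Record]) -> None: ...


class MemoryRepository:
    """Stands in for a local store, such as a layer above Places."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def fetch_since(self, timestamp: float) -> List[Record]:
        return [r for r in self.records.values() if r.modified > timestamp]

    def store(self, records: Iterable[Record]) -> None:
        for r in records:
            self.records[r.guid] = r


class BackupFileRepository:
    """Persists records to a JSON file: same interface, different backing store."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> Dict[str, Record]:
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return {d["guid"]: Record(**d) for d in json.load(f)}

    def fetch_since(self, timestamp: float) -> List[Record]:
        return [r for r in self._load().values() if r.modified > timestamp]

    def store(self, records: Iterable[Record]) -> None:
        current = self._load()
        for r in records:
            current[r.guid] = r
        with open(self.path, "w") as f:
            json.dump([asdict(r) for r in current.values()], f)


def copy_changes(source: Repository, sink: Repository, since: float) -> int:
    """One direction of a sync: the source's changes since `since` go to the sink."""
    changed = source.fetch_since(since)
    sink.store(changed)
    return len(changed)


local = MemoryRepository()
local.store([Record("a", 10.0, "bookmark A"), Record("b", 20.0, "bookmark B")])

peer = MemoryRepository()
backup = BackupFileRepository(os.path.join(tempfile.mkdtemp(), "backup.json"))

# The same function serves a device-to-device sync and a backup to a file.
copied_to_peer = copy_changes(local, peer, since=0.0)
copied_to_backup = copy_changes(local, backup, since=15.0)
```

Because `copy_changes` only depends on the shared interface, adding another destination means writing one more class, not another sync path.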

Furthermore, middleware (classes that implement the Repository interface and wrap another Repository) can be used to implement:

  • Encryption: consume and emit encrypted WBOs; pass decrypted WBOs to the inner store; implement crypto recovery (key refetches)
  • Archiving, logging, etc.
  • Version translation, giving Sync multiple version support
  • Storage item translation: e.g., define Repository in terms of deltas, but maintain storage version compatibility by translation into full objects.
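The middleware idea can be sketched as a class that exposes the repository interface and delegates to an inner repository. Everything named here is hypothetical: the `fetch_since`/`store` methods, the tuple record shape, and `EncodingMiddleware` itself. Base64 is used only as a stand-in transform so the example stays self-contained; it is not encryption, and a real layer would use proper cryptography and handle key recovery.

```python
import base64
from typing import Dict, List, Tuple

# A record is (guid, modified, payload) in this sketch.
Rec = Tuple[str, float, str]


class MemoryRepository:
    """Hypothetical inner repository holding plain records."""

    def __init__(self) -> None:
        self.records: Dict[str, Rec] = {}

    def fetch_since(self, timestamp: float) -> List[Rec]:
        return [r for r in self.records.values() if r[1] > timestamp]

    def store(self, records: List[Rec]) -> None:
        for r in records:
            self.records[r[0]] = r


def toy_encode(text: str) -> str:
    # Placeholder for real encryption: base64 is NOT encryption.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def toy_decode(text: str) -> str:
    return base64.b64decode(text.encode("ascii")).decode("utf-8")


class EncodingMiddleware:
    """Wraps an inner repository and exposes the same interface.

    Callers consume and emit encoded payloads; the inner repository
    only ever sees plain ones.
    """

    def __init__(self, inner: MemoryRepository) -> None:
        self.inner = inner

    def fetch_since(self, timestamp: float) -> List[Rec]:
        return [(g, m, toy_encode(p)) for g, m, p in self.inner.fetch_since(timestamp)]

    def store(self, records: List[Rec]) -> None:
        self.inner.store([(g, m, toy_decode(p)) for g, m, p in records])


inner = MemoryRepository()
wrapped = EncodingMiddleware(inner)

# The caller hands over an encoded record; the inner store receives plain text.
wrapped.store([("a", 10.0, toy_encode("bookmark A"))])
plain_inside = inner.records["a"][2]
encoded_outside = wrapped.fetch_since(0.0)[0][2]
```

Since the wrapper is itself a repository, wrappers can be stacked (for example, logging around version translation around encoding) without the Synchronizer knowing.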

API

The API for Repository/RepositorySession is defined in terms of callbacks, each of which can be called multiple times (e.g., as batches of records arrive). A callback invocation takes an error argument, and optionally one or more records, as input. Each invocation can be provided with a DONE constant to indicate completion. Callbacks can invoke an abort method on the session to (optionally) prevent further cycles.
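The callback protocol described above might look like the following Python sketch. The class `ListSession`, the method names, and the choice to pass `DONE` in place of the records argument are assumptions for illustration; the text only says that a completion constant and an abort method exist.

```python
from typing import Callable, List, Optional

DONE = object()  # sentinel standing in for the DONE constant

# callback(error, value): value is a batch of records, or DONE.
FetchCallback = Callable[[Optional[Exception], object], None]


class ListSession:
    """Hypothetical session that delivers records from a list in batches."""

    def __init__(self, records: List[str], batch_size: int = 2) -> None:
        self.records = records
        self.batch_size = batch_size
        self.aborted = False

    def abort(self) -> None:
        # Ask the session to stop invoking callbacks.
        self.aborted = True

    def fetch(self, callback: FetchCallback) -> None:
        # The callback is invoked once per batch, then once with DONE.
        for start in range(0, len(self.records), self.batch_size):
            if self.aborted:
                return
            callback(None, self.records[start:start + self.batch_size])
        if not self.aborted:
            callback(None, DONE)


received: List[str] = []
finished: List[bool] = []


def collect(error, value):
    if error is not None:
        raise error
    if value is DONE:
        finished.append(True)
    else:
        received.extend(value)


session = ListSession(["r1", "r2", "r3"])
session.fetch(collect)

# A callback may also abort the session to prevent further invocations.
seen: List[str] = []
aborting = ListSession(["r1", "r2", "r3", "r4"])


def stop_after_first(error, value):
    if value is not DONE:
        seen.extend(value)
        aborting.abort()


aborting.fetch(stop_after_first)
```

In the first run the callback fires three times (two batches, then `DONE`); in the second, aborting after the first batch suppresses both the remaining batch and the `DONE` call.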

Classes are (links will rot):

createSession returns (via a callback) a RepositorySession:

Synchronizer holds two Repositories, creating sessions appropriately.
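As a rough sketch of that arrangement, the Python below shows a Synchronizer that holds two repositories, obtains a session from each through a callback, and uses each side as source and then sink. The names (`create_session`, `fetch_since`, `store`, `last_sync`) and the newest-record-wins rule in `store` are assumptions for illustration, not part of the proposal.

```python
from typing import Callable, Dict, List, Tuple

Rec = Tuple[str, float, str]  # (guid, modified, payload)


class MemorySession:
    """Hypothetical session over an in-memory record map."""

    def __init__(self, records: Dict[str, Rec]) -> None:
        self._records = records

    def fetch_since(self, timestamp: float) -> List[Rec]:
        return [r for r in self._records.values() if r[1] > timestamp]

    def store(self, records: List[Rec]) -> None:
        # Assumed conflict rule: the most recently modified record wins.
        for r in records:
            existing = self._records.get(r[0])
            if existing is None or existing[1] < r[1]:
                self._records[r[0]] = r


class MemoryRepository:
    def __init__(self) -> None:
        self.records: Dict[str, Rec] = {}

    def create_session(self, callback: Callable[[MemorySession], None]) -> None:
        # The session is handed back through a callback, as in the text.
        callback(MemorySession(self.records))


class Synchronizer:
    """Holds two repositories and remembers when they were last synced."""

    def __init__(self, a: MemoryRepository, b: MemoryRepository) -> None:
        self.a = a
        self.b = b
        self.last_sync = 0.0

    def synchronize(self, now: float) -> None:
        sessions: List[MemorySession] = []
        self.a.create_session(sessions.append)
        self.b.create_session(sessions.append)
        session_a, session_b = sessions

        # Read both change sets before writing, so records that were just
        # stored are not echoed back to the side they came from.
        from_a = session_a.fetch_since(self.last_sync)
        from_b = session_b.fetch_since(self.last_sync)
        session_b.store(from_a)  # a as source, b as sink
        session_a.store(from_b)  # then b as source, a as sink
        self.last_sync = now


local = MemoryRepository()
remote = MemoryRepository()
local.records["a"] = ("a", 10.0, "local only")
remote.records["b"] = ("b", 12.0, "remote only")

sync = Synchronizer(local, remote)
sync.synchronize(now=20.0)
```

After one call both repositories hold records `a` and `b`, and the next run only considers records modified after `last_sync`.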

Synchronizer classes are:


Questions and considerations

  • Q: Where does batching happen? A: Within each RepositorySession implementation.
  • As much storage as possible should be pushed down into the data store. That should allow these classes to be effectively stateless; the two query inputs are modified time and GUID. Tracking should be reduced to deleted items, and even that can be elided.
  • I'm very intrigued by deltas. It should be possible to stage rollout through an appropriate wrapper store.
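The first two points (batching inside the session, and near-stateless classes that query by modified time and GUID) can be sketched with Python's standard `sqlite3` module standing in for the data store. The table layout, the `StoreSession` class, and its method names are hypothetical; the only tracking kept is a table of tombstones for deleted items.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[str, float, str]  # (guid, modified, payload)


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (guid TEXT PRIMARY KEY, modified REAL, payload TEXT)")
    db.execute("CREATE TABLE deleted (guid TEXT PRIMARY KEY, modified REAL)")
    return db


class StoreSession:
    """Keeps no change-tracking state of its own; every answer is a query."""

    def __init__(self, db: sqlite3.Connection, batch_size: int = 2) -> None:
        self.db = db
        self.batch_size = batch_size

    def fetch_since(self, timestamp: float) -> Iterator[List[Row]]:
        # Batching is an implementation detail of the session.
        cursor = self.db.execute(
            "SELECT guid, modified, payload FROM items "
            "WHERE modified > ? ORDER BY modified",
            (timestamp,),
        )
        while True:
            batch = cursor.fetchmany(self.batch_size)
            if not batch:
                return
            yield batch

    def fetch(self, guid: str) -> List[Row]:
        # The second query input: lookup by GUID.
        return self.db.execute(
            "SELECT guid, modified, payload FROM items WHERE guid = ?", (guid,)
        ).fetchall()

    def deleted_since(self, timestamp: float) -> List[str]:
        rows = self.db.execute(
            "SELECT guid FROM deleted WHERE modified > ? ORDER BY guid", (timestamp,)
        )
        return [guid for (guid,) in rows]

    def delete(self, guid: str, now: float) -> None:
        # The only tracking kept: a tombstone for the removed item.
        self.db.execute("DELETE FROM items WHERE guid = ?", (guid,))
        self.db.execute("INSERT OR REPLACE INTO deleted VALUES (?, ?)", (guid, now))


db = open_store()
db.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("a", 10.0, "A"), ("b", 20.0, "B"), ("c", 30.0, "C"), ("d", 40.0, "D")],
)
session = StoreSession(db)
session.delete("d", now=50.0)

batches = list(session.fetch_since(5.0))
tombstones = session.deleted_since(5.0)
```

Callers never see the batch size; a different session implementation could choose another size, or none at all, without changing the interface.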