User:Rnewman/TreeSync: Difference between revisions



Note that trees don't strictly need to be encrypted: left in the clear, they reveal only some opaque structure. Leaving them unencrypted would allow the server to do a large amount of this work on the client's behalf; for example, the server could traverse the current tree to find the latest version of an object.
== Terminology ==
; Ref : an identifier which refers to a particular object via content hash.
; Object : a blob of JSON with an encrypted part ("body") and a public part ("envelope").
; Record : a kind of object whose body refers to a domain entity.
; Tree : a kind of object whose body is a structured set of objects, identified either by ref or by hash.
; Collection : a named ref in the global namespace. Collections are an enclosing scope.
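The terms above can be made concrete with a minimal sketch. Everything here is an assumption made for illustration: the class name <code>SyncObject</code>, the use of SHA-256 over canonical JSON as the content hash, and a plain string standing in for the encrypted body. None of it is a settled wire format.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncObject:
    """An object: a public envelope plus a body (hypothetical layout)."""
    envelope: dict
    body: str  # stands in for the encrypted part; no real encryption here

    def content_hash(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that equal
        # objects always produce the same hash.
        payload = json.dumps(
            {"envelope": self.envelope, "body": self.body},
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# A record: an object whose body refers to a domain entity.
record = SyncObject(envelope={"type": "record"}, body="<ciphertext for a bookmark>")

# A tree: an object whose body lists other objects, here by content hash.
tree = SyncObject(
    envelope={"type": "tree"},
    body=json.dumps([record.content_hash()]),
)

# A collection: a name in the global namespace that points at a root object.
collections = {"bookmarks": tree.content_hash()}
```

In this sketch a collection is simply a dictionary entry; the point is only that it is looked up by name, while records and trees are looked up by hash.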
== Breaking the chain ==
How do we achieve the following goals:
* "Rooting" of all objects for garbage collection purposes
* Discoverability of roots (e.g., Places roots/top-level folders)
* Transactional behavior where it counts
without having to modify the entire chain of trees, right up to a single global root, when we modify a leaf?
The answer, I think, is to generalize the concept of a ref to be an entity that is referred to by name, not by value (content hash).
Each collection is a root. Its children could be ordinary hash-addressed values, if we wish to mutate the entire collection consistently. Or its children could be a set of refs. Each ref is a standalone tree; as its contents change, other trees that refer to it can remain unchanged.
Each Places root could be considered a ref, and we can even go so far as to introduce folders by reference.
This opens the door to partial tree synchronization: clients are at liberty to synchronize any individual ref, without even considering the structure of the rest of the collection, let alone values.
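The difference between hash-addressed and name-addressed children can be illustrated with a small sketch. The helper names (<code>tree_hash</code>, <code>blob_hash</code>), the <code>ref:toolbar</code> naming scheme, and the choice of SHA-256 are all hypothetical; the sketch only shows which hashes change when a leaf is added.

```python
import hashlib
import json
from typing import List


def blob_hash(content: str) -> str:
    """Content hash of a leaf value."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def tree_hash(children: List[str]) -> str:
    """Content hash of a tree, given its children's identifiers.

    A child identifier is either a content hash or a ref name.
    """
    payload = json.dumps(children, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two versions of a toolbar folder: the second adds a bookmark.
toolbar_v1 = tree_hash([blob_hash("bookmark A")])
toolbar_v2 = tree_hash([blob_hash("bookmark A"), blob_hash("bookmark B")])

# Hash-addressed child: the root embeds the folder's content hash,
# so changing the folder forces a new root as well.
root_by_hash_v1 = tree_hash([toolbar_v1])
root_by_hash_v2 = tree_hash([toolbar_v2])

# Name-addressed child: the root embeds only the ref's name.
# Updating the folder moves the ref; the root is untouched.
refs = {"ref:toolbar": toolbar_v1}
root_by_name_v1 = tree_hash(["ref:toolbar"])
refs["ref:toolbar"] = toolbar_v2
root_by_name_v2 = tree_hash(["ref:toolbar"])
```

Here <code>root_by_hash_v1</code> and <code>root_by_hash_v2</code> differ, while <code>root_by_name_v1</code> and <code>root_by_name_v2</code> are identical. A client that cares only about the toolbar can fetch <code>refs["ref:toolbar"]</code> and ignore the root entirely.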
So how do we model events like bookmark moves between roots? With care. This is analogous to a move across a filesystem boundary: it's a copy followed by a delete. We have several options:
* Versioning or otherwise evolving the ref records themselves to denote a dependency on a particular version of another ref. ("When you update the toolbar, be sure to also update mobile bookmarks".)
* Performing the two ref updates within the same request and server-side transaction. (This takes care of writes but not necessarily reads.)
* Trusting to luck.
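The second option, updating both refs in one server-side transaction, might look roughly like the following. This is a sketch under stated assumptions: <code>RefStore</code> and <code>update_many</code> are hypothetical names, the store is an in-memory dictionary guarded by a lock rather than a real server, and the compare-and-swap check is one possible way to detect concurrent writers.

```python
import threading
from typing import Dict, Optional


class RefStore:
    """Hypothetical in-memory stand-in for the server's table of refs."""

    def __init__(self) -> None:
        self._refs: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, name: str) -> Optional[str]:
        with self._lock:
            return self._refs.get(name)

    def update_many(
        self, expected: Dict[str, Optional[str]], new: Dict[str, str]
    ) -> bool:
        """Apply every update in `new`, or none of them.

        The update is rejected if any ref named in `expected` no longer
        holds the value the caller last saw.
        """
        with self._lock:
            for name, old_value in expected.items():
                if self._refs.get(name) != old_value:
                    return False  # another writer got there first; retry
            self._refs.update(new)
            return True


store = RefStore()
store.update_many({}, {"toolbar": "hash-t1", "mobile": "hash-m1"})

# Move a bookmark from the toolbar to mobile: both refs advance together.
moved = store.update_many(
    expected={"toolbar": "hash-t1", "mobile": "hash-m1"},
    new={"toolbar": "hash-t2", "mobile": "hash-m2"},
)

# A stale client that still believes the toolbar is at hash-t1 is
# rejected as a whole; neither ref changes.
stale = store.update_many(
    expected={"toolbar": "hash-t1"},
    new={"toolbar": "hash-t3", "mobile": "hash-m3"},
)
```

As noted in the list, this covers writes only. A client that reads <code>toolbar</code> and <code>mobile</code> in two separate calls to <code>get</code> can still observe one ref before the move and the other after it.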
And what about unsorted flat collections? If every item in a collection is named by reference and the collection is flat, this approach is equivalent to the Sync 1.1 system, with the added ability for clients to manage the set of included items by listing them explicitly.
So what's the relationship between GUIDs and refs? Surely it makes sense for these to be the same, and for a record with a GUID to implicitly be made available by reference?
Yeah, probably. But there's a difficulty: what about two different records with the same GUID?
Perhaps we allow three kinds of identifiers:
* Refs: names managed by the server.
* GUIDs: names managed implicitly by clients.
* Hashes: content-addressing.


== Open questions/flaws ==