Identity/CryptoIdeas/05-Queue-Sync: Difference between revisions

m
→‎Server Data Model: explain version numbers better
m (→‎Changes vs Records: more updates)
m (→‎Server Data Model: explain version numbers better)
Line 76: Line 76:
= Server Data Model =
= Server Data Model =


Concepts: "build numbers", combined change/record rows, tombstones, "fetch changes since X", hash chain, conflict detection
Concepts: collection version numbers, combined change/record rows, tombstones, "fetch changes since X", hash chain, conflict detection


Browsers send encrypted records up to the server. Each record contains the following fields:
Browsers send encrypted change records up to the server. Each change record contains the following fields:


* record id: hash of header
* record id: hash of header
* "header": (build number, PreviousRecordId, unencrypted key (GUID), hash of encrypted value)
* "header": (version number, PreviousRecordId, unencrypted key (GUID), hash of encrypted value)
* encrypted value (or "DELETE")
* encrypted value (or "DELETE")
* signature: HMAC (using a client-managed key) of record id
* signature: HMAC (using a client-managed key) of record id
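As a rough illustration of the field list above, a client might assemble a change record along these lines. This is only a sketch under stated assumptions: the proposal does not fix a hash function, a header serialization, or field names, so SHA-256, sorted-key JSON, the dictionary layout, and the names <code>make_change_record</code> and <code>ZERO_ID</code> are all hypothetical choices made for the example.

```python
import hashlib
import hmac
import json

# Assumption: the first record's PreviousRecordId is 32 zero bytes, hex-encoded.
ZERO_ID = "00" * 32


def make_change_record(version, prev_id, key, encrypted_value, mac_key):
    """Build one change record.

    encrypted_value is the ciphertext as bytes, or b"DELETE" for a tombstone.
    mac_key is the client-managed HMAC key (never sent to the server).
    """
    header = {
        "version": version,      # collection version number
        "prev": prev_id,         # PreviousRecordId
        "key": key,              # unencrypted key (GUID)
        "value_hash": hashlib.sha256(encrypted_value).hexdigest(),
    }
    # Assumption: a canonical header encoding (sorted-key JSON) so that every
    # party hashes exactly the same bytes.
    header_bytes = json.dumps(header, sort_keys=True, separators=(",", ":")).encode()
    record_id = hashlib.sha256(header_bytes).hexdigest()
    # The signature covers the record id, which in turn covers the header.
    signature = hmac.new(mac_key, record_id.encode(), hashlib.sha256).hexdigest()
    return {
        "id": record_id,
        "header": header,
        "value": encrypted_value,
        "signature": signature,
    }
```

Because the record id is a hash of the header and the header contains the hash of the encrypted value, signing the record id alone is enough to bind every field of the record.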


Each change record represents an ADD/SET or a DELETE of a specific key. A complete collection is represented by a bunch of ADD/SET changes for non-overlapping keys (exactly one change per key).


These records form a hash chain: each record id validates all previous records back to the very first one (which has a PreviousRecordId of all zeros). Clients which are "in-sync" and receiving only new records will require valid signatures, sequential build numbers, and matching PreviousRecordId values. These clients cannot be made to accept false records, or be tricked into omitting a valid record (some attacks are still possible during a "resync", see below).
The "collection version number" is a collection-wide sequential integer, incremented with each change. It will eventually get very large (think 8-byte storage). For any given version number, there is a specific set of key/value pairs which make up that version (although the server may not be able to produce that set for arbitrary version numbers). Each version differs from the previous one by exactly one key ('''note''': this is a significant constraint, and needs more discussion). Each change record carries the collection version number of the first version that includes its change. Version numbers are generated by the client, when it produces (and hashes/signs) a new change record for delivery to the server.


The server receiving upstream records cannot check the (symmetric) signature, but it validates all the other fields. It then stores the various fields in a database. The server schema needs to support two kinds of reads:
These change records form a hash chain: each record id validates all previous records back to the very first one (which has a PreviousRecordId of all zeros). Clients which are "in-sync" and receiving only new records will require valid signatures, sequential version numbers, and matching PreviousRecordId values. These clients cannot be made to accept false records, or be tricked into omitting a valid record (some attacks are still possible during a "resync", see below).
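The checks an "in-sync" client applies to incoming records could be sketched roughly as follows. The hash function (SHA-256), the sorted-key JSON header encoding, the record layout, and the names <code>header_id</code>, <code>sign</code> and <code>verify_new_records</code> are assumptions made for illustration, not part of the proposal.

```python
import hashlib
import hmac
import json


def header_id(header):
    """Record id: hash of the header (assumed encoding: sorted-key JSON)."""
    data = json.dumps(header, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def sign(record_id, mac_key):
    """HMAC of the record id, using the client-managed key."""
    return hmac.new(mac_key, record_id.encode(), hashlib.sha256).hexdigest()


def verify_new_records(last_id, last_version, records, mac_key):
    """Check a batch of new records against the client's current chain tip.

    Returns the new (last_id, last_version) on success and raises ValueError
    on the first record that fails a check.
    """
    for rec in records:
        header = rec["header"]
        if header_id(header) != rec["id"]:
            raise ValueError("record id does not match header")
        if not hmac.compare_digest(sign(rec["id"], mac_key), rec["signature"]):
            raise ValueError("bad signature")
        if header["version"] != last_version + 1:
            raise ValueError("non-sequential version number")
        if header["prev"] != last_id:
            raise ValueError("PreviousRecordId mismatch")
        if hashlib.sha256(rec["value"]).hexdigest() != header["value_hash"]:
            raise ValueError("encrypted value does not match header hash")
        # Advance the chain tip only after every check has passed.
        last_id, last_version = rec["id"], header["version"]
    return last_id, last_version
```

A client starting from an empty collection would call this with an all-zeros <code>last_id</code>; whether the first version number is 0 or 1 is not specified here, so the caller supplies <code>last_version</code>.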


* 1: "please give me all records from build number N to the present"
The server receiving upstream records cannot check the (symmetric) signature, but it validates all the other fields. It then stores the various fields in a database. The server schema needs to support two kinds of read operations:
* 2: "please give me all current records"


* read op 1: "please give me all changes from version number N to the present"
* read op 2: "please give me all current records"


If the server does not have enough history to answer the first kind of read with a contiguous set of changes for the requested range, it should return an error, causing the client to perform a resync. But the server must always be able to answer the second kind of read.
 
If the server does not have enough history to answer the first kind of read with a contiguous set of changes for the requested range, it should return a distinctive error, causing the client to perform a resync. But the server must always be able to answer the second kind of read.
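The two read operations and the resync error can be illustrated with a small in-memory model. This is a sketch only, not a proposed schema: the class and method names (<code>CollectionStore</code>, <code>ResyncRequired</code>, <code>changes_since</code>, <code>current_records</code>, <code>prune</code>) are hypothetical, "from version number N" is assumed to include N itself, and dropping tombstones from the current set is a simplification.

```python
class ResyncRequired(Exception):
    """Hypothetical 'distinctive error': the server lacks contiguous history."""


class CollectionStore:
    def __init__(self):
        self.current = {}   # key -> latest ADD/SET change record
        self.history = []   # contiguous recent changes, oldest first
        self.version = 0    # latest collection version number stored

    def append(self, record):
        """Store one upstream change record (already validated elsewhere)."""
        header = record["header"]
        if header["version"] != self.version + 1:
            raise ValueError("non-sequential version number")
        self.version = header["version"]
        self.history.append(record)
        if record["value"] == b"DELETE":
            # Simplification: a DELETE just removes the key from the current set.
            self.current.pop(header["key"], None)
        else:
            self.current[header["key"]] = record

    def prune(self, keep):
        """Discard all but the newest `keep` history entries."""
        self.history = self.history[-keep:] if keep > 0 else []

    def changes_since(self, n):
        """Read op 1: all changes with version number >= n (assumed inclusive)."""
        if n > self.version:
            return []  # the client is already up to date
        if not self.history or self.history[0]["header"]["version"] > n:
            raise ResyncRequired("history does not reach back to version %d" % n)
        return [r for r in self.history if r["header"]["version"] >= n]

    def current_records(self):
        """Read op 2: exactly one record per live key; always answerable."""
        return list(self.current.values())
```

In this model the server is free to prune history at any time: read op 1 then fails with <code>ResyncRequired</code> for old version numbers, while read op 2 keeps working because the current set is maintained separately.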


The server should use whatever schema proves to be most efficient, but one possible approach would be:
The server should use whatever schema proves to be most efficient, but one possible approach would be: