= Server Data Model =
Concepts: "build numbers", combined change/record rows, tombstones, "fetch changes since X", hash chain, conflict detection
Browsers send encrypted records up to the server. Each record contains the following fields:
* record id: hash of header
* "header": (build number, PreviousRecordId, unencrypted key (GUID), hash of encrypted value)
* encrypted value (or "DELETE")
* signature: HMAC (using a client-managed key) of record id
These records form a hash chain: each record id validates all previous records back to the very first one (which has a PreviousRecordId of all zeros). Clients that are "in-sync" and receiving only new records will require valid signatures, sequential build numbers, and matching PreviousRecordId values. These clients cannot be made to accept false records or be tricked into omitting a valid record (some attacks are still possible during a "resync"; see below).
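The chain construction and client-side checks above can be sketched as follows. This is illustrative only: the serialization format, SHA-256 as the hash, and the field names are assumptions, not part of the spec.

```python
import hashlib
import hmac
import json

CLIENT_KEY = b"client-managed-secret"  # hypothetical symmetric key, never sent to the server

def make_record(build_number, prev_record_id, guid, encrypted_value):
    # record id = hash of header; signature = HMAC of record id
    header = json.dumps([build_number, prev_record_id, guid,
                         hashlib.sha256(encrypted_value).hexdigest()])
    record_id = hashlib.sha256(header.encode()).hexdigest()
    signature = hmac.new(CLIENT_KEY, record_id.encode(), hashlib.sha256).hexdigest()
    return {"recordId": record_id, "header": header,
            "encryptedValue": encrypted_value, "signature": signature}

def validate_chain(records, start_prev_id="0" * 64, start_build=1):
    """Check signatures, sequential build numbers, and PreviousRecordId links."""
    prev_id, expected_build = start_prev_id, start_build
    for rec in records:
        build, prev, guid, value_hash = json.loads(rec["header"])
        if build != expected_build or prev != prev_id:
            return False  # gap, reorder, or fork in the chain
        if hashlib.sha256(rec["header"].encode()).hexdigest() != rec["recordId"]:
            return False  # record id does not match header
        sig = hmac.new(CLIENT_KEY, rec["recordId"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, rec["signature"]):
            return False  # forged or tampered record
        if hashlib.sha256(rec["encryptedValue"]).hexdigest() != value_hash:
            return False  # value swapped out from under the header
        prev_id, expected_build = rec["recordId"], build + 1
    return True
```

Because each record id covers the previous record id, an in-sync client that validates the newest record has transitively validated everything before it.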
The server receiving upstream records cannot check the (symmetric) signature, but it validates all the other fields. It then stores the various fields in a database. The server schema needs to support two kinds of reads:
* 1: "please give me all records from build number N to the present"
* 2: "please give me all current records"
If the server does not have enough history to answer the first kind of read with a contiguous set of changes for the requested range, it should return an error, causing the client to perform a resync. But the server must always be able to answer the second kind of read.
The server should use whatever schema proves most efficient, but one possible approach would be:
* columns: userid, collectionId, buildNumber, guid, isCurrent (bool), recordId, header, encryptedValueOrDELETE, signature
* UNIQUE INDEX (userid, collectionId, buildNumber)
* for any (userid, collectionId, guid), there is exactly one row with isCurrent=True
* read #1: SELECT … WHERE buildNumber > N ORDER BY buildNumber
* read #2: SELECT … WHERE isCurrent=True
* deleted keys are represented by "tombstones": rows whose encryptedValue is set to a special "DELETE" token
* writes that modify or delete an existing key clear isCurrent on the old row and set isCurrent on the new row
* servers can periodically compress their datasets: they can remove rows where isCurrent=false, and also rows where isCurrent=true and encryptedValue=DELETE. Ideally, servers will retain enough history (including encryptedValue=DELETE rows) to bring all known clients up to date, but if a client has not caught up for a long time, servers can compress away that history anyway, and the client will perform a resync if/when it comes back.
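The schema and both reads can be sketched with sqlite3. Column names follow the bullet list above; the column types, the "DELETE" token handling, and the function names are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE records (
        userid TEXT, collectionId TEXT, buildNumber INTEGER,
        guid TEXT, isCurrent BOOLEAN, recordId TEXT,
        header TEXT, encryptedValueOrDELETE TEXT, signature TEXT,
        UNIQUE (userid, collectionId, buildNumber)
    )
""")

def write_record(userid, coll, build, guid, record_id, header, value, sig):
    # Clear isCurrent on any older row for this key, then insert the new row.
    db.execute("UPDATE records SET isCurrent=0 "
               "WHERE userid=? AND collectionId=? AND guid=?",
               (userid, coll, guid))
    db.execute("INSERT INTO records VALUES (?,?,?,?,1,?,?,?,?)",
               (userid, coll, build, guid, record_id, header, value, sig))

def changes_since(userid, coll, n):
    # Read #1: all records after build number N, in order.
    return db.execute("SELECT buildNumber, guid, encryptedValueOrDELETE "
                      "FROM records WHERE userid=? AND collectionId=? "
                      "AND buildNumber>? ORDER BY buildNumber",
                      (userid, coll, n)).fetchall()

def current_records(userid, coll):
    # Read #2: one row per live key; tombstones are filtered out.
    return db.execute("SELECT guid, encryptedValueOrDELETE FROM records "
                      "WHERE userid=? AND collectionId=? AND isCurrent=1 "
                      "AND encryptedValueOrDELETE != 'DELETE'",
                      (userid, coll)).fetchall()
```

Note that read #1 returns tombstones (so clients learn about deletions), while read #2 can drop them, since a fresh resync has no stale key to delete.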
Servers must be able to answer these queries with paginated responses, rather than holding the entire response in memory or transmitting it in a single network message.
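One way to paginate read #1 is keyset pagination: each page is bounded by a LIMIT, and the last buildNumber returned becomes the cursor for the next page. The table below is a simplified stand-in for the schema above; names and the page size are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (buildNumber INTEGER PRIMARY KEY, "
           "guid TEXT, encryptedValueOrDELETE TEXT)")
db.executemany("INSERT INTO records VALUES (?,?,?)",
               [(n, f"g{n}", f"ct{n}") for n in range(1, 11)])

def page_of_changes(since, page_size=4):
    # One bounded page: never materializes the whole history at once.
    return db.execute("SELECT buildNumber, guid, encryptedValueOrDELETE "
                      "FROM records WHERE buildNumber>? "
                      "ORDER BY buildNumber LIMIT ?",
                      (since, page_size)).fetchall()

def all_changes_since(since):
    # Client-side loop: feed the last buildNumber seen back in as the cursor.
    out, cursor = [], since
    while True:
        page = page_of_changes(cursor)
        if not page:
            return out
        out.extend(page)
        cursor = page[-1][0]
```

Keyset pagination stays correct even if new records arrive between pages, because the cursor is a buildNumber rather than a row offset.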
== Server Conflict Detection ==
When the server receives upstream records, it compares the submitted buildNumber and PreviousRecordId against its current record: the buildNumber should be one greater, and the PreviousRecordId should match the current RecordId. The server will NACK the upstream message and discard the record unless these match. If both match, the server will store the upstream record (thus incrementing its current buildNumber) and ACK the message.
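A minimal sketch of that check, with the class and method names invented for illustration:

```python
class Collection:
    """Per-(userid, collectionId) head state on the server."""

    def __init__(self):
        self.build_number = 0
        self.record_id = "0" * 64  # all-zeros id before the first record

    def submit(self, build_number, previous_record_id, record_id):
        # Accept only a record built directly on the current head.
        if (build_number != self.build_number + 1
                or previous_record_id != self.record_id):
            return "NACK"  # discard; client must merge and re-submit
        self.build_number, self.record_id = build_number, record_id
        return "ACK"
```

The server never inspects record contents for this check, so two clients racing on the same head are serialized purely by the chain pointers.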
When two clients race to submit different records, the first will be accepted, and the second will be NACKed. The second client will receive both the NACK and the announcement of the other record, causing it to merge the other record and then re-submit its own (possibly modified as a result of the merge).
= Downstream Change Application =