Services/KeyValueStorage
Planning Questions
How simple can we get away with while still providing useful functionality?
- maximum key size, value size?
- key => single value?
- key => set of values? (like e.g. riak with siblings enabled)
- key + column => value? (like e.g. bigtable or cassandra)
- keys in sorted order? (i.e. hash or btree?)
- handling of concurrent edits, conflicts?
- bulk insert, update or delete operations?
Crypto
- can it sensibly be done at this layer, or do we need to defer to the application?
- encrypted keys would make key-ordering useless
Authentication to App
- can AppKeys be used for authentication, e.g. some sort of request signing with the app key?
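One way the request-signing idea could look, as a minimal sketch only: the app key is treated as a shared secret and used to compute an HMAC over the method, path and a hash of the body. The function names, the choice of SHA-256 and the message layout are all assumptions made for illustration, not part of any agreed design.

```python
import hashlib
import hmac


def sign_request(app_key: bytes, method: str, path: str, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature for a request (hypothetical scheme)."""
    # Assumed layout: method, path and body digest joined by newlines.
    body_digest = hashlib.sha256(body).hexdigest()
    message = "\n".join([method, path, body_digest]).encode("utf-8")
    return hmac.new(app_key, message, hashlib.sha256).hexdigest()


def verify_request(app_key: bytes, method: str, path: str, body: bytes,
                   signature: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(app_key, method, path, body)
    return hmac.compare_digest(expected, signature)


app_key = b"example-app-key"  # placeholder secret, for illustration only
sig = sign_request(app_key, "PUT", "/appkey/bucketname/items/my%20key",
                   b"my exciting value")
print(verify_request(app_key, "PUT", "/appkey/bucketname/items/my%20key",
                     b"my exciting value", sig))
```

A real scheme would also need something like a timestamp or nonce in the signed message to prevent replay, which this sketch leaves out.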
We will probably want some sort of partitioning or "buckets".
- AppKey => list of buckets?
- AppKey + UserID => list of buckets?
- can a bucket be shared between multiple apps? multiple users?
It would be good to find and isolate some use cases in the existing Services apps.
- Build a SyncStorage plugin that stores data in the KVStore?
Management features
- built-in quota system? Will be more efficient than simulating it at a higher level.
Strawman Proposal
Level 0: Basic Key-Value Storage
This is the most primitive functionality: a simple map from keys to values. Nothing fancy, but a plain map can be hard to work with in a distributed environment.
Python API
bucket.put("my key","my exciting value")
bucket.get("my key")
=> "my exciting value"
bucket.delete("my key")
bucket.get("my key")
=> KeyError
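To make the Level 0 semantics concrete, here is a minimal in-memory sketch of a bucket with the same three calls. The `Bucket` class is a stand-in invented for illustration; a real implementation would talk to a storage backend, and whether deleting a missing key should raise is an open choice (here it raises `KeyError`, like `get`).

```python
class Bucket:
    """In-memory stand-in for a Level 0 bucket (illustrative only)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Unconditionally store or overwrite the value.
        self._items[key] = value

    def get(self, key):
        # Raises KeyError for a missing key, matching the API above.
        return self._items[key]

    def delete(self, key):
        # Assumption: deleting a missing key also raises KeyError.
        del self._items[key]


bucket = Bucket()
bucket.put("my key", "my exciting value")
print(bucket.get("my key"))
bucket.delete("my key")
try:
    bucket.get("my key")
except KeyError:
    print("KeyError")
```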
HTTP API
PUT /appkey/bucketname/items/my%20key
Content-Length: 17
my exciting value
GET /appkey/bucketname/items/my%20key
=> 200 OK
Content-Length: 17
my exciting value
GET /appkey/bucketname/items/otherkey
=> 404 Not Found
Level 1: Atomic Compare-and-Swap
Well, as atomic as is reasonable once things like vector clocks are taken into account. This is provided in lieu of transactions, so that the application can check that it is not, for example, deleting updates made by another process.
Python API
item = bucket.getitem("my key")
item.value # the value previously stored
item.version # opaque version identifier
bucket.put("my key", "new value", ifmatch="WRONGVERSION")
=> VersionMismatchError
bucket.put("my key", "new value", ifmatch=item.version)
=> OK
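The compare-and-swap behaviour above can be sketched in a single process as follows. This is only an illustration under stated assumptions: versions are random opaque tokens rather than vector clocks, a lock stands in for whatever atomicity the real store provides, and the `Item` tuple and the return value of `put` are invented for the example.

```python
import threading
import uuid
from collections import namedtuple

# Hypothetical record type: the stored value plus an opaque version token.
Item = namedtuple("Item", ["value", "version"])


class VersionMismatchError(Exception):
    """Raised when ifmatch does not equal the current version."""


class Bucket:
    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def getitem(self, key):
        with self._lock:
            return self._items[key]  # KeyError if missing

    def put(self, key, value, ifmatch=None):
        # The check and the write happen under one lock, so no other
        # writer can slip in between them.
        with self._lock:
            current = self._items.get(key)
            if ifmatch is not None:
                if current is None or current.version != ifmatch:
                    raise VersionMismatchError(key)
            item = Item(value, uuid.uuid4().hex)
            self._items[key] = item
            return item.version


bucket = Bucket()
bucket.put("my key", "my exciting value")
item = bucket.getitem("my key")
try:
    bucket.put("my key", "new value", ifmatch="WRONGVERSION")
except VersionMismatchError:
    print("VersionMismatchError")
bucket.put("my key", "new value", ifmatch=item.version)
print(bucket.getitem("my key").value)
```

An application would typically wrap the read, modify and conditional write in a retry loop, re-reading the item whenever `VersionMismatchError` is raised.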
HTTP API
GET /appkey/bucketname/items/my%20key
=> 200 OK
Content-Length: 17
X-Weave-Version: XXXYYY
my exciting value
PUT /appkey/bucketname/items/my%20key
X-Weave-If-Match: WRONGVALUE
Content-Length: 2
Hi
=> 412 Precondition Failed
PUT /appkey/bucketname/items/my%20key
X-Weave-If-Match: XXXYYY
Content-Length: 2
Hi
=> 204 No Content
We could treat the version number like an ETag and use the standard HTTP headers for it (ETag and If-Match), but I haven't checked whether this would violate anything in the RFC. Riak uses a custom header for its vclock.
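As a rough sketch of how a server might map the conditional PUT exchange onto status codes, the handlers below operate on an in-memory dict. The function names, the counter-based version strings, and returning X-Weave-Version on a successful PUT are assumptions made for illustration; only the status codes and the two X-Weave headers come from the examples above.

```python
import itertools

_versions = itertools.count(1)  # assumed: simple counter as version source
store = {}  # key -> (version, body)


def handle_get(key):
    """Return (status, headers, body) for GET on an item."""
    if key not in store:
        return 404, {}, b""
    version, body = store[key]
    headers = {"Content-Length": str(len(body)), "X-Weave-Version": version}
    return 200, headers, body


def handle_put(key, body, headers):
    """Return (status, headers, body) for PUT, honouring X-Weave-If-Match."""
    expected = headers.get("X-Weave-If-Match")
    current = store.get(key)
    if expected is not None:
        # Precondition fails if the item is missing or the version differs.
        if current is None or current[0] != expected:
            return 412, {}, b""
    version = "v%d" % next(_versions)
    store[key] = (version, body)
    return 204, {"X-Weave-Version": version}, b""


handle_put("my key", b"my exciting value", {})
status, hdrs, body = handle_get("my key")
print(status, hdrs["Content-Length"])
print(handle_put("my key", b"Hi", {"X-Weave-If-Match": "WRONGVALUE"})[0])
print(handle_put("my key", b"Hi",
                 {"X-Weave-If-Match": hdrs["X-Weave-Version"]})[0])
```

Switching to the standard headers would only change the header names in this sketch; the 412 response is the same one HTTP defines for a failed If-Match precondition.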