Confirmed users
358
edits
| Line 36: | Line 36: | ||
== Strawman Proposal == | == Strawman Proposal == | ||
In order to guide the design process, I have developed a strawman refactoring of the server-storage code to use a KeyValueStore instead of an RDBMS. The dummy KVS implementation is available TODO:HERE while the refactored storage interface is availabe TODO:HERE. | |||
The design is based mostly on my high-level understanding of the functions provided by Riak, but I've also been thinking about how it might be implemented on top of other backend technology e.g. HBase or Cassandra. So it's definitely intended to be backend-agnostic as much as possible. | |||
===== Python API< | Key points of the API follow, in increasing level of sophistication (i.e. difficulty to implement). | ||
=== Namespacing === | |||
Most KVStore implementations provide key namespacing via "collections" or "buckets", so that each bucket is an isolated set of key/value pairs. | |||
Based on the needs of server-storage and of other potential applications, I think it makes the most sense to associate a bucket with an (appkey, user_id) pair - i.e. each application can map each user to a set of buckets. | |||
===== Python API ===== | |||
<pre> store = magically_get_kvstore("THIS IS MY APPKEY") | |||
store.create_bucket("ryan@rfk.id.au", "notes") | |||
store.list_buckets("ryan@rfk.d.au") | |||
=> ["notes"] | |||
bucket = store.get_bucket("ryan@rfk.id.au", "notes") | |||
</pre> | |||
===== HTTP API ===== | |||
<pre> GET /appkey/notes | |||
=> 404 Not Found | |||
PUT /appkey/notes | |||
=> 204 No Content | |||
GET /appkey/ | |||
=> 200 OK | |||
Content-Type: text/json | |||
["notes"] | |||
GET /appkey/notes | |||
=> 200 OK | |||
</pre> | |||
Some apps might find it useful to have an application-wide namespace, but I think it should be avoided if possible - as I understand it, much of our infrastructure is predicated on being able to shard on user-id to scale out storage and processing needs. | |||
=== Basic Key-Value Storage === | |||
This is the most primitive of all the functionality - a simple map from keys to values. Nothing fancy, and probably too primitive to build interesting apps on it alone. | |||
===== Python API ===== | |||
<pre> bucket.put("my key","my exciting value") | <pre> bucket.put("my key","my exciting value") | ||
| Line 51: | Line 94: | ||
=> KeyError | => KeyError | ||
</pre> | </pre> | ||
===== HTTP | ===== HTTP API ===== | ||
<pre> PUT /appkey/bucketname/items/my%20key | <pre> PUT /appkey/bucketname/items/my%20key | ||
Content-Length: 17 | Content-Length: 17 | ||
| Line 65: | Line 108: | ||
=> 404 Not Found | => 404 Not Found | ||
</pre> | </pre> | ||
Well, as atomic as is reasonable taking into account e.g. vector clocks etc. This is provided in lieu of transactions so that the application can check that it's not e.g. deleting updates made by another process. | |||
=== Atomic Compare-and-Swap === | |||
Well, as atomic as is reasonable taking into account e.g. vector clocks etc. This is provided in lieu of transactions so that the application can check that it's not e.g. deleting updates made by another process. | |||
===== Python API ===== | ===== Python API ===== | ||
| Line 101: | Line 146: | ||
=> 204 No Content | => 204 No Content | ||
</pre> | </pre> | ||
We could treat the version number like etag and use standard HTTP headers for it, but I haven't check if this would violate anything in the RFC. Riak uses a custom header for its vclock thingo.< | We could treat the version number like etag and use standard HTTP headers for it, but I haven't check if this would violate anything in the RFC. Riak uses a custom header for its vclock thingo. | ||
=== Secondary Indexes === | |||
While we don't want to evolve back to a full-blown RDBMS, the server-storage code does need to be able to query its data based on some simple single-field indices - in particular it needs to perform queries based on last modification time or on a user-defined "sort order". | |||
To assist, we let the application specify secondary indexes against which the data can be queries. It is responsible for constructing and maintaining these indexes manually. The KVStore only provides the ability to query an existing index, and to atomically update indexes when storing a new value. | |||
===== Python API ===== | |||
<pre> bucket.put("key1", "data1", indexes = { | |||
"modified": "2011-09-12" | |||
}) | |||
bucket.put("key2", "data2", indexes = { | |||
"modified": "2011-09-13" | |||
}) | |||
list(bucket.scan("modified")) | |||
=> [("2011-09-12", "key1"), ("2011-09-13", "key2")] | |||
list(bucket.scan("modified", reverse=True)) | |||
=> [("2011-09-13", "key2"), ("2011-09-12", "key1")] | |||
list(bucket.scan("modified", max="2011-09-12")) | |||
=> [("2011-09-12", "key1")] | |||
</pre> | |||
===== HTTP API ===== | |||
<pre> PUT /appkey/bucketname/items/key1 | |||
Content-Length: 5 | |||
X-Weave-Index-modified 2011-09-12 | |||
data1 | |||
=> 204 No Content | |||
PUT /appkey/bucketname/items/key2 | |||
Content-Length: 5 | |||
X-Weave-Index-modified 2011-09-13 | |||
data2 | |||
=> 204 No Content | |||
GET /appkey/bucketname/indexes/modified | |||
=> 200 OK | |||
Content-Type: text/json | |||
[["2011-09-12", "key1"], ["2011-09-13", "key2"]] | |||
GET /appkey/bucketname/indexes/modified?reverse=true | |||
=> 200 OK | |||
Content-Type: text/json | |||
[["2011-09-13", "key2"], ["2011-09-12", "key1"]] | |||
GET /appkey/bucketname/indexes/modified?max=2011-09-12 | |||
=> 200 OK | |||
Content-Type: text/json | |||
[["2011-09-12", "key1"]] | |||
</pre> | |||
Keys are always returned in sorted order. Accepted arguments to the scan would be "min", "max", "reverse" and "limit". | |||
It may be quite challenging to implement this efficiently, but I think it's worth pursuing as it will make the KVStore much more widely useful. For example, the server-storage app can't be written without indexing of some sort. So we would either have a KVS/DB hybrid or push the indexes into the KVStore. | |||
Furthermore, several popular key-value stores are working on secondary indexes as a built-in feature. The above API is inspired by how Riak does it, but Cassandra also offers something similar. Whether they provide a performant-enough implementation remains to be seen. | |||
=== Future: Lightweight Meta-Data === | |||
If the KVStore being used to store large values, then it may be valuable to offer lightweight metadata alongside the main "body" data. This would make it possible to update just the metadata/indexes without re-uploading the entire body. | |||
===== Python API ===== | |||
<pre> bucket.put("key1", "lots of data here", meta = { | |||
"modified": "2011-09-12", | |||
"owner": "Ryan Kelly" | |||
}, indexes = { | |||
"modified": "2011-09-12" | |||
}) | |||
item = bucket.getitem("key1") | |||
item.meta | |||
=> {"modified": "2011-09-12", "owner": "Ryan Kelly"} | |||
# Update metadata without changing the value | |||
bucket.put("key1", meta = { | |||
"modified": "2011-09-12", | |||
"owner": "Ryan Kelly" | |||
}, indexes = { | |||
"modified": "2011-09-12" | |||
}) | |||
</pre> | |||
Is it worth it? I don't know. It's a fair chunk of complication for the API. | |||
=== Future: Explicit Conflict Resolution === | |||
If we build on a KVStore that has vector blocks (e.g. riak) then it might be worthwhile to expose any conflicts to the calling application, rather than going with last-write-wins. Again, it's a lost of complication in the API so I don't know that it's worth it at this stage. | |||
But, we should be careful not to exclude the possibility. | |||
===== Python API ===== | |||
<pre> item = bucket.getitem("key1") | |||
# for simple use-cases we have a single "version" string | |||
item.version | |||
=> "XXXYYY-AAABBB" | |||
# but it actually identifies all conflicting versions | |||
item.versions.keys() | |||
=> ["XXXYYY", "AAABBB"] | |||
# and you can get at each particular version if you need | |||
item.versions["XXXYYY"].value | |||
=> "some value" | |||
item.versions["AAABBB"].value | |||
=> "some other value" | |||
# putting will resolve the conflict | |||
bucket.put("key1", "resolved", ifmatch=item.version) | |||
=> OK | |||
</pre> | |||