Services/KeyValueStorage

From MozillaWiki

Planning Questions

How simple can we get away with while still providing useful functionality?

  • maximum key size, value size?
  • key => single value?
  • key => set of values? (like e.g. riak with siblings enabled)
  • key + column => value? (like e.g. bigtable or cassandra)
  • keys in sorted order? (i.e. hash or btree?)
  • handling of concurrent edits, conflicts?
  • bulk insert, update or delete operations?

Crypto

  • can it sensibly be done at this layer, or do we need to defer to the application?
  • encrypted keys would make key-ordering useless
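
The key-ordering concern can be illustrated with a small sketch. It uses an HMAC as a deterministic stand-in for key encryption (this is not the proposal's scheme; `blind_key`, the secret and the sample keys are all hypothetical). Exact-match lookups keep working, but the order the server would sort on no longer reflects the plaintext order, so range or prefix scans stop being meaningful.

```python
import hashlib
import hmac


def blind_key(secret: bytes, key: str) -> str:
    """Deterministically obscure a key (HMAC stand-in for key encryption)."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"client-side secret"  # hypothetical; would never be sent to the server
keys = ["user/001", "user/002", "user/003", "video/001"]

# What the server would store and sort on: blinded key -> original key.
stored = {blind_key(secret, k): k for k in keys}

# Exact-match lookup still works because the blinding is deterministic.
found = stored[blind_key(secret, "user/002")]

# The server can only sort the blinded values, which says nothing about
# how the plaintext keys sort relative to each other.
server_order = [stored[b] for b in sorted(stored)]
print(server_order)
```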

Authentication to App

  • can AppKeys be used for authentication, e.g. some sort of request signing with the app key?
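
One possible shape for request signing is an HMAC over the request method, path and body. The sketch below assumes each AppKey has an associated shared secret known to both the app and the server (the AppKey itself appears in the URL, so it cannot be the secret); the function names and message layout are hypothetical, and a real scheme would also need a timestamp or nonce to prevent replay.

```python
import hashlib
import hmac


def sign_request(app_secret: bytes, method: str, path: str, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over method, path and body digest."""
    body_digest = hashlib.sha256(body).hexdigest()
    message = "\n".join([method.upper(), path, body_digest]).encode("utf-8")
    return hmac.new(app_secret, message, hashlib.sha256).hexdigest()


def verify_request(app_secret: bytes, method: str, path: str,
                   body: bytes, signature: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(app_secret, method, path, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical secret tied to the AppKey; not part of the original proposal.
secret = b"secret-for-appkey"
signature = sign_request(secret, "PUT", "/appkey/bucketname/items/my%20key",
                         b"my exciting value")
```

The client would send `signature` in a request header, and the server would call `verify_request` with the secret it holds for that AppKey.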

We will probably want some sort of partitioning or "buckets".

  • AppKey => list of buckets?
  • AppKey + UserID => list of buckets?
  • can a bucket be shared between multiple apps? multiple users?

It would be good to find and isolate some use cases in the existing Services apps.

  • Build a SyncStorage plugin that stores data in the KVStore?


Management features

  • built-in quota system? Will be more efficient than simulating it at a higher level.

Strawman Proposal

Level 0:  Basic Key-Value Storage

This is the most primitive level of functionality: a simple map from keys to values.  Nothing fancy, but it can be hard to work with in a distributed environment.

Python API
    bucket.put("my key","my exciting value")

    bucket.get("my key")
    => "my exciting value"

    bucket.delete("my key")

    bucket.get("my key")
    => KeyError
HTTP API
    PUT /appkey/bucketname/items/my%20key
    Content-Length: 17
    my exciting value


    GET /appkey/bucketname/items/my%20key
    =>  200 OK
        Content-Length: 17
        my exciting value

    GET /appkey/bucketname/items/otherkey
    =>  404 Not Found
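
As a rough illustration of the Level 0 semantics, here is a minimal in-memory sketch of the Python API. The class name `Bucket` and the dict-backed storage are assumptions for illustration only; a real bucket would talk to the storage backend. Raising `KeyError` when deleting a missing key is also an assumption, since the proposal only specifies it for `get`.

```python
class Bucket:
    """In-memory stand-in for a Level 0 bucket (hypothetical)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Last write wins; Level 0 has no versioning or conflict detection.
        self._items[key] = value

    def get(self, key):
        # Raises KeyError for a missing key, as in the strawman API.
        return self._items[key]

    def delete(self, key):
        # Assumption: deleting a missing key also raises KeyError.
        del self._items[key]


bucket = Bucket()
bucket.put("my key", "my exciting value")
value = bucket.get("my key")
bucket.delete("my key")
```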

Level 1: Atomic Compare-and-Swap

Or at least as atomic as is reasonable once things like vector clocks are taken into account.  This is provided in lieu of transactions, so that the application can check that it is not, for example, overwriting or deleting updates made by another process.

Python API
    item = bucket.getitem("my key")
    item.value    # the value previously stored
    item.version  # opaque version identifier

    bucket.put("my key", "new value", ifmatch="WRONGVERSION")
    =>  VersionMismatchError

    bucket.put("my key", "new value", ifmatch=item.version)
    =>  OK


HTTP API
    GET /appkey/bucketname/items/my%20key
    =>  200 OK
        Content-Length: 17
        X-Weave-Version: XXXYYY
        my exciting value

    PUT /appkey/bucketname/items/my%20key
    X-Weave-If-Match:  WRONGVALUE
    Content-Length: 2
    Hi
    =>  412 Precondition Failed

    PUT /appkey/bucketname/items/my%20key
    X-Weave-If-Match:  XXXYYY
    Content-Length: 2
    Hi
    =>  204 No Content
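
The compare-and-swap behaviour can be sketched in a single process with a lock and a counter-based version string. Everything except the names `getitem`, `put`, `ifmatch`, `value`, `version` and `VersionMismatchError` is an assumption: the class name, the `"v%d"` version format, and the choice to return the new version from `put`. A distributed store would derive versions from something like vector clocks instead.

```python
import itertools
import threading
from collections import namedtuple

Item = namedtuple("Item", ["value", "version"])


class VersionMismatchError(Exception):
    """Raised when ifmatch does not equal the currently stored version."""


class VersionedBucket:
    """In-memory stand-in for a Level 1 bucket (hypothetical)."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()
        self._counter = itertools.count(1)

    def getitem(self, key):
        with self._lock:
            return self._items[key]

    def put(self, key, value, ifmatch=None):
        # The check and the write happen under one lock, so no other
        # writer can slip in between them.
        with self._lock:
            current = self._items.get(key)
            if ifmatch is not None and (current is None
                                        or current.version != ifmatch):
                raise VersionMismatchError(key)
            version = "v%d" % next(self._counter)
            self._items[key] = Item(value, version)
            return version


bucket = VersionedBucket()
bucket.put("my key", "my exciting value")
item = bucket.getitem("my key")
bucket.put("my key", "new value", ifmatch=item.version)
```

A caller that hits `VersionMismatchError` would typically re-read the item, reapply its change to the fresh value, and retry the `put` with the new version.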

We could treat the version identifier like an ETag and use the standard HTTP headers for it, but I haven't checked whether this would violate anything in the RFC.  Riak uses a custom header for its vclock.