Services/Sync/Server/Archived/HereComesEverybody/Cassandra

From MozillaWiki
Revision as of 03:47, 6 February 2010 by LesOrchard

Cassandra is a "highly scalable, eventually consistent, distributed, structured key-value store", originally open sourced by Facebook and now community-maintained.

Overview

Cassandra apparently scales well: it automatically spreads key/value data across a cluster of machines, with tunable replication and availability settings.
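As a rough illustration of what spreading keys across a cluster with tunable replication can look like, here is a minimal hash-ring sketch. It is not Cassandra's actual partitioner or replication strategy; the names `token` and `replicas_for`, the node names, and the use of MD5 are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for(row_key, nodes, replication_factor=2):
    """Return the nodes that would hold row_key in this toy model.

    Each node sits on the ring at token(node name). The row key is placed
    on the first node clockwise from its own token, and copies go to the
    following nodes until replication_factor nodes have been chosen.
    """
    ring = sorted((token(name), name) for name in nodes)
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token(row_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical node names, for illustration only.
nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("lmorchard-bookmarks", nodes, replication_factor=2)
```

Raising `replication_factor` in this sketch trades storage for availability, which is roughly the kind of knob the settings above refer to.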

Operations notes

Storage implementation progress

Architecture notes

  • Mostly schema-free, but some aspects (such as Keyspace and ColumnFamily definitions) live in configuration, and changing them requires a cluster restart.
  • No query language; all indexes need to be designed and maintained by hand.
    • The Cassandra API does provide memcache-like basic get/set/delete, as well as various kinds of range searches and interesting batch gets.
  • All based around keys and values, like a 4- or 5-dimensional associative array (5 when a SuperColumn is used)
    • Data access looks like:
      • Keyspace (cluster-wide, eg. "WeaveStorage") ->
      • Row Key (hashed to a machine in cluster, eg. "lmorchard-abcd3%~d") ->
      • ColumnFamily (DB file on machine, eg. "WBO") ->
      • Column (key in DB file, eg. "payload") ->
      • Value (value in DB file, eg. "{...}")
    • Or, using a SuperColumn:
      • Keyspace (cluster-wide) ->
      • Row Key (machine in cluster) ->
      • ColumnFamily (DB file on disk) ->
      • SuperColumn (key in DB file) ->
      • Column ->
      • Value
    • Simple columns reference binary values
      • They're called columns, but unlike in MySQL, a Cassandra "row" can have millions of "columns" within a ColumnFamily, which makes them suitable for use as indices.
    • SuperColumns reference key/value sub-structures
    • Column names can be range-queried as an ordered set, with several choices of sort comparator
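The key path above can be pictured as nested dictionaries. The following is only an in-memory sketch of the data model, not the Cassandra API; `store` and `get_value` are hypothetical names, and the sample keys are taken from the notes on this page.

```python
# Nested dicts standing in for Keyspace -> Row Key -> ColumnFamily -> ...
store = {
    "WeaveStorage": {                          # Keyspace (cluster-wide)
        "lmorchard-abcd3%~d": {                # Row Key (hashed to a machine)
            "WBO": {                           # ColumnFamily
                "payload": "{...}",            # Column -> Value
            },
        },
        "lmorchard-bookmarks": {               # Row Key
            "WBO_MatchParent": {               # ColumnFamily of SuperColumns
                "toolbar": {                   # SuperColumn
                    "00126531833761-ABCDEF": "lmorchard-bookmarks-ABCDEF",
                },
            },
        },
    },
}


def get_value(store, keyspace, row_key, column_family, column, super_column=None):
    """Follow the 4-key path, or the 5-key path when super_column is given."""
    columns = store[keyspace][row_key][column_family]
    if super_column is not None:
        columns = columns[super_column]
    return columns[column]


simple = get_value(store, "WeaveStorage", "lmorchard-abcd3%~d", "WBO", "payload")
nested = get_value(
    store, "WeaveStorage", "lmorchard-bookmarks", "WBO_MatchParent",
    "00126531833761-ABCDEF", super_column="toolbar",
)
```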

For example, a date-range index can be built like so, with zero-padded column names so that they sort in numeric order:

 WeaveStorage > lmorchard-bookmarks > WBO_RangeSortindex
    0000000010-ABCDEF = lmorchard-bookmarks-ABCDEF
    0000000010-XYZHFH = lmorchard-bookmarks-XYZHFH
    0000001000-A%ds^& = lmorchard-bookmarks-A%ds^&
    0000055500-CnmKOs = lmorchard-bookmarks-CnmKOs
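A range lookup against an index like this amounts to slicing an ordered set of column names. The sketch below imitates that with a sorted list and `bisect`; it assumes a plain byte-order comparator and a zero-padding width of 10, and `pad` and `range_query` are hypothetical helpers, not Cassandra calls.

```python
from bisect import bisect_left

# Column name -> value, copied from the WBO_RangeSortindex example above.
index = {
    "0000000010-ABCDEF": "lmorchard-bookmarks-ABCDEF",
    "0000000010-XYZHFH": "lmorchard-bookmarks-XYZHFH",
    "0000001000-A%ds^&": "lmorchard-bookmarks-A%ds^&",
    "0000055500-CnmKOs": "lmorchard-bookmarks-CnmKOs",
}


def pad(value, width=10):
    """Zero-pad an integer so that string order matches numeric order."""
    return str(value).zfill(width)


def range_query(columns, low, high):
    """Return values whose numeric column-name prefix is in [low, high]."""
    names = sorted(columns)
    start = bisect_left(names, pad(low))
    # pad(high + 1) sorts after every name that starts with pad(high) + "-".
    stop = bisect_left(names, pad(high + 1))
    return [columns[name] for name in names[start:stop]]


matches = range_query(index, 10, 1000)
```

Without the zero padding, "1000" would sort before "55", which is why the column names carry fixed-width prefixes.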

And an exact-match index can be built like so:

 WeaveStorage > lmorchard-bookmarks > WBO_MatchParent 
     toolbar (SuperColumn) =
         00126531833761-ABCDEF = lmorchard-bookmarks-ABCDEF
         00126531833761-XYZHFH = lmorchard-bookmarks-XYZHFH
     menu (SuperColumn) = 
         00126531833761-A%ds^& = lmorchard-bookmarks-A%ds^&
         00126531833761-CnmKOs = lmorchard-bookmarks-CnmKOs
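Since there is no query language, both the lookup and the upkeep of such an index fall to the application. A minimal in-memory sketch of that, using plain dicts rather than the Cassandra client API (`lookup` and `reparent` are hypothetical helper names):

```python
# SuperColumn name -> {column name -> value}, copied from the example above.
match_parent = {
    "toolbar": {
        "00126531833761-ABCDEF": "lmorchard-bookmarks-ABCDEF",
        "00126531833761-XYZHFH": "lmorchard-bookmarks-XYZHFH",
    },
    "menu": {
        "00126531833761-A%ds^&": "lmorchard-bookmarks-A%ds^&",
        "00126531833761-CnmKOs": "lmorchard-bookmarks-CnmKOs",
    },
}


def lookup(index, parent):
    """Exact match: return all values filed under one SuperColumn, in name order."""
    columns = index.get(parent, {})
    return [columns[name] for name in sorted(columns)]


def reparent(index, column_name, old_parent, new_parent):
    """Hand-maintain the index when an item moves to a different parent."""
    value = index[old_parent].pop(column_name)
    index.setdefault(new_parent, {})[column_name] = value


toolbar_items = lookup(match_parent, "toolbar")
```

Every write that changes an item's parent would need a matching `reparent`-style update, otherwise the index drifts out of step with the stored objects.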