Services/Sync/Server/Archived/HereComesEverybody/Cassandra

Cassandra is a "highly scalable, eventually consistent, distributed, structured key-value store", originally open-sourced by Facebook and now community-maintained.

Overview

Cassandra appears to have some nice scalability characteristics: it automatically spreads key/value data across a cluster of machines, with tunable replication and availability settings.

Implementation progress

Load testing

  • Cluster setup
    • For a better test, need separate machines for Cassandra, webapp, and load source.
    • Cassandra cluster
      • 4-8 machines to run Cassandra nodes.
        • Need more than 1-2 nodes, to make sure there's real inter-node communication for redundancy and balancing
        • 4 nodes is an educated guess: an even number greater than 2 and less than 8
      • Best as real hardware, identical machines
    • API cluster
    • Load source
  • Next steps?

Operations notes

  • http://wiki.apache.org/cassandra/Operations
  • How stable is this thing?
  • Need to investigate all the basics of:
    • upgrading between cassandra versions
    • adding/removing cluster nodes
    • performing backups
    • monitoring health and performance
    • performing maintenance on data changes, etc.

Architecture notes

  • Mostly schema-free, but some aspects (such as keyspace and ColumnFamily definitions) need to be configured in a way that requires a cluster restart on change
  • No query language, all indexes need to be designed and maintained by hand.
    • The Cassandra API does provide memcache-like basic get/set/delete, as well as various kinds of range searches and interesting batch gets (see the pycassa sketch after this list).
  • All based around keys and values, like a 4- or 5-dimensional associative array
    • Data access looks like:
      • Keyspace (cluster-wide, e.g. "WeaveStorage") ->
      • Row Key (hashed to one or more machines, e.g. "lmorchard-bookmarks-abcd3%~d") ->
      • ColumnFamily (DB file on machine, e.g. "WBO") ->
      • Column (key in DB file, e.g. "payload") ->
      • Value (value in DB file, e.g. "{...}")
    • Or, using a SuperColumn:
      • Keyspace (cluster-wide) ->
      • Row Key (machine in cluster) ->
      • ColumnFamily (DB file on disk) ->
      • SuperColumn (key in DB file) ->
      • Column ->
      • Value
  • Simple columns reference binary values
    • They're called columns, but unlike MySQL columns, a Cassandra "row" can have millions of "columns" within a ColumnFamily, which makes them suitable for use as indexes.
  • Column names can be range-queried as an ordered set, with several choices of sort comparators
  • SuperColumns reference key/value sub-structures
    • Useful, but they don't scale to millions of sub-columns: the whole sub-structure of a SuperColumn is currently loaded into server memory on access.
  • Cluster-wide keys can be range-queried as an ordered set, but only in lexical order, and only by using an order-preserving partitioner on the server, which has negative load-balancing implications.
    • Data can pile up on servers in key order, rather than being spread evenly.
    • So it's best to use keys for random access and columns for range-based indexes.
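
For illustration, here is a minimal sketch of that access hierarchy using the Python pycassa client. The host, keyspace, and ColumnFamily names are assumptions drawn from the data layout examples below, not a working Weave configuration:

 import pycassa
 
 # Connect to the cluster and bind to the "WeaveStorage" keyspace.
 pool = pycassa.ConnectionPool('WeaveStorage', server_list=['localhost:9160'])
 
 # A ColumnFamily handle: roughly "one DB file" on each node.
 wbo = pycassa.ColumnFamily(pool, 'WBO')
 
 # Set: Keyspace -> Row Key -> ColumnFamily -> Column -> Value
 wbo.insert('lmorchard-bookmarks-ABCDEF', {'payload': '{...}'})
 
 # Get: memcache-like fetch of named columns from one row.
 row = wbo.get('lmorchard-bookmarks-ABCDEF', columns=['payload'])
 
 # Delete: remove one column, or the whole row if no columns are named.
 wbo.remove('lmorchard-bookmarks-ABCDEF', columns=['payload'])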

Data layout

Plain storage can look like this:
 WeaveStorage > lmorchard-bookmarks-ABCDEF > WBO
     id = ABCDEF
     collection = bookmarks
     sortindex = 10
     parentid = toolbar
     modified = 126531833761
     payload = {...}

In theory, this scatters all WBOs evenly throughout the cluster, with a tunable degree of replication. Access to the data seems pretty fast, since the servers in a cluster talk to each other and mass gets are performed in parallel.
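
As a concrete sketch (reusing the pycassa connection from the earlier example, with all names assumed from the layout above), writing and mass-reading WBOs might look like:

 # Store one WBO as a row of simple columns.
 # Column values are binary, so numbers are stored as strings here.
 wbo.insert('lmorchard-bookmarks-ABCDEF', {
     'id': 'ABCDEF',
     'collection': 'bookmarks',
     'sortindex': '10',
     'parentid': 'toolbar',
     'modified': '126531833761',
     'payload': '{...}',
 })
 
 # Mass get: fetch several WBOs at once; the cluster resolves
 # these in parallel across the nodes that own each row key.
 rows = wbo.multiget(['lmorchard-bookmarks-ABCDEF',
                      'lmorchard-bookmarks-XYZHFH'])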

A date-range index can be built like so:
 WeaveStorage > lmorchard-bookmarks > WBO_RangeSortindex
    0000000010-ABCDEF = lmorchard-bookmarks-ABCDEF
    0000000010-XYZHFH = lmorchard-bookmarks-XYZHFH
    0000001000-A%ds^& = lmorchard-bookmarks-A%ds^&
    0000055500-CnmKOs = lmorchard-bookmarks-CnmKOs

This puts the sortindex index for a single user's bookmarks on one server, which seems fast to search. The indexes themselves should be distributed evenly across the servers in a cluster, since their row keys hash on the user and collection name.
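
A range scan over that index might then look like the following pycassa sketch, using column_start / column_finish over the ordered column names (the bounds, count, and sentinel character are assumptions):

 # The per-user index row, with columns sorted by name; zero-padded
 # sortindex prefixes make lexical order match numeric order.
 index = pycassa.ColumnFamily(pool, 'WBO_RangeSortindex')
 cols = index.get('lmorchard-bookmarks',
                  column_start='0000000010',
                  column_finish='0000001000~',  # '~' as a crude upper bound
                  column_count=1000)
 
 # Each column value is a WBO row key; fetch the WBOs in parallel.
 wbos = wbo.multiget(list(cols.values()))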

And an exact-match index can be built like so:
 WeaveStorage > lmorchard-bookmarks > WBO_MatchParent 
     toolbar (SuperColumn) =
         00126531833761-ABCDEF = lmorchard-bookmarks-ABCDEF
         00126531833761-XYZHFH = lmorchard-bookmarks-XYZHFH
     menu (SuperColumn) = 
         00126531833761-A%ds^& = lmorchard-bookmarks-A%ds^&
         00126531833761-CnmKOs = lmorchard-bookmarks-CnmKOs
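
Reading one parent's entries is then a single SuperColumn fetch, sketched here with pycassa (names again assumed from the layout above):

 # Fetch every index entry under the 'toolbar' SuperColumn. Note the
 # caveat above: the whole sub-structure is loaded into server memory.
 parent_index = pycassa.ColumnFamily(pool, 'WBO_MatchParent')
 entries = parent_index.get('lmorchard-bookmarks', super_column='toolbar')
 
 # Column values are WBO row keys, ready for a parallel multiget.
 wbos = wbo.multiget(list(entries.values()))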