Services/Sync/Server/Archived/HereComesEverybody/Cassandra: Difference between revisions

Latest revision as of 02:51, 12 July 2010

Cassandra is a "highly scalable, eventually consistent, distributed, structured key-value store", originally open sourced by Facebook and now community-maintained.

Overview

Apparently has some nice scalability characteristics in that it automatically spreads key/value data across a cluster of machines, with tweakable replication and availability settings.

Implementation progress

Clone of weaveserver-sync:
- http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync/

Load testing

Simple setup
- For a simple bakeoff, Cassandra should work fine with a single node.
- Notes from an initial test on 06 Mar 2010.
- 1 webapp frontend (mv-weavetest03)
  - Make sure to install Apache Thrift and the native PHP extension
  - weaveserver-sync with Cassandra and weaveserver-registration with MySQL
  - Cassandra config defines for default_constants.php (ie. mainly the hostname)
- 1 Cassandra node (mv-weavetest04)
  - Use a nightly build or build from trunk
  - Needs Java 1.6+ to get started.
  - Storage config in cassandra.php comments
- 1 load source (mv-weavetest02)
  - Use weave-loadtest (what command options?)

Cluster setup
- For a better test, need separate machines for Cassandra, webapp, and load source.
- Cassandra cluster
  - 4-8 machines to run Cassandra nodes.
    - Need more than 1-2, in order to make sure there's inter-node chat for redundancy and balancing
    - 4 nodes is a magic number guess, even number more than 2 and less than 8
  - Best as real hardware, identical machines
- API cluster
  - 1-2 machines to run the API webapp on Apache/PHP (ie. weaveserver-sync and weaveserver-registration)
    - 1 is probably fine
  - Better as real hardware, could be VMs?
- Load source
  - 1-2 machines to run weave-loadtest
    - 1 is probably fine
  - Could be VMs?
Next steps?

Operations notes

http://wiki.apache.org/cassandra/Operations
How stable is this thing?
Need to investigate all the basics of:
- upgrading between cassandra versions
- adding/removing cluster nodes
- performing backups
- monitoring health and performance
- performing maintenance on data changes & etc

Architecture notes

Mostly schema-free, but some aspects need to be configured in a way that requires cluster restart on change
No query language, all indexes need to be designed and maintained by hand.
- The Cassandra API does provide memcache-like basic get/set/delete, as well as various kinds of range searches and interesting batch gets.
All based around keys and values, like a 4 or 5 dimensional associative array
- Data access looks like:
  - Keyspace (cluster-wide, eg. "WeaveStorage") ->
  - Row Key (hashed to 1+ machine(s), eg. "lmorchard-bookmarks-abcd3%~d") ->
  - ColumnFamily (DB file on machine, eg. "WBO") ->
  - Column (key in DB file, eg. "payload") ->
  - Value (value in DB file, eg. "{...}")
- Or, using a SuperColumn:
  - Keyspace (cluster-wide) ->
  - Row Key (machine in cluster) ->
  - ColumnFamily (DB file on disk) ->
  - SuperColumn (key in DB file) ->
  - Column ->
  - Value
Simple columns reference binary values
- They're called columns - but as opposed to MySQL, a Cassandra "row" can have millions of "columns" within a ColumnFamily, thus making them suitable for use as indices.
Column names can be range-queried as an ordered set, with several choices of sort comparators
SuperColumns reference key/value sub-structures
- Useful, but not quite millions of sub-columns - the whole sub-structure of a SuperColumn is currently loaded into server memory on access.
Cluster-wide keys can be range-queried as an ordered set, but only in lexical order and only by using an order-preserving server partitioner that has negative load balancing implications.
- Data can pile up on servers in sequence, rather than being evenly distributed.
- So it's best to use keys for random access and columns for range-based indices.

Data layout

Plain storage can look like this:

 WeaveStorage > lmorchard-bookmarks-ABCDEF > WBO
     id = ABCDEF
     collection = bookmarks
     sortindex = 10
     parentid = toolbar
     modified = 126531833761
     payload = {...}

This causes all WBOs to be scattered evenly throughout the cluster, in theory, with tweakable degrees of replication. Access to the data seems pretty fast, since servers in a cluster talk to each other, and many mass gets are done in parallel.

A date-range index can be built like so:

 WeaveStorage > lmorchard-bookmarks > WBO_RangeSortindex
    0000000010-ABCDEF = lmorchard-bookmarks-ABCDEF
    0000000010-XYZHFH = lmorchard-bookmarks-XYZHFH
    0000001000-A%ds^& = lmorchard-bookmarks-A%ds^&
    0000055500-CnmKOs = lmorchard-bookmarks-CnmKOs

This puts the sortindex index for a single user's bookmarks on one server, which seems fast to search. The indexes themselves should be distributed evenly between servers in a cluster, hashed on the user and collection name.

And an exact-match index can be built like so:

 WeaveStorage > lmorchard-bookmarks > WBO_MatchParent 
     toolbar (SuperColumn) =
         00126531833761-ABCDEF = lmorchard-bookmarks-ABCDEF
         00126531833761-XYZHFH = lmorchard-bookmarks-XYZHFH
     menu (SuperColumn) = 
         00126531833761-A%ds^& = lmorchard-bookmarks-A%ds^&
         00126531833761-CnmKOs = lmorchard-bookmarks-CnmKOs

@@ Line 11: / Line 11: @@
 == Implementation progress ==
-* Patch queue with in-progress work:
-** http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync-patches/
 * Clone of weaveserver-sync:
 ** http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync/
-* Cassandra SVN branch with required bugfixes:
-** https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5/
+== Load testing ==
+* Simple setup
+** For a simple bakeoff, Cassandra should work fine with a single node.
+** [http://pastebin.mozilla.org/706953 Notes from an initial test on 06 Mar 2010].
+** 1 webapp frontend (mv-weavetest03)
+*** Make sure to install [http://incubator.apache.org/thrift/download/ Apache Thrift] and [https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP#UsingCassandrawithPHP-BuildingandinstallingthenativePHPextension the native PHP extension]
+*** [http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync/ weaveserver-sync] with Cassandra and [http://hg.mozilla.org/labs/weaveserver-registration/ weaveserver-registration] with MySQL
+*** [http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync/file/6aaeb20df0ad/1.0/weave_storage/cassandra.php#l41 Cassandra config defines for default_constants.php] (ie. mainly the hostname)
+** 1 Cassandra node (mv-weavetest04)
+*** Use a [http://hudson.zones.apache.org/hudson/job/Cassandra/lastSuccessfulBuild/artifact/cassandra/build/ nightly build] or [https://svn.apache.org/repos/asf/incubator/cassandra/trunk/ build from trunk]
+*** Needs Java 1.6+ to [http://wiki.apache.org/cassandra/GettingStarted#Step_2:_Running_a_single_node get started].
+*** [http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync/file/6aaeb20df0ad/1.0/weave_storage/cassandra.php#l62 Storage config in cassandra.php comments]
+** 1 load source (mv-weavetest02)
+*** Use [http://hg.mozilla.org/labs/weave-loadtest/ weave-loadtest] (what command options?)
+* Cluster setup
+** For a better test, need separate machines for Cassandra, webapp, and load source.
+** Cassandra cluster
+*** 4-8 machines to run Cassandra nodes.
+**** Need more than 1-2, in order to make sure there's inter-node chat for redundancy and balancing
+**** 4 nodes is a magic number guess, even number more than 2 and less than 8
+*** Best as real hardware, identical machines
+** API cluster
+*** 1-2 machines to run the API webapp on Apache/PHP (ie. [http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync/ weaveserver-sync] and [http://hg.mozilla.org/labs/weaveserver-registration/ weaveserver-registration])
+**** 1 is probably fine
+*** Better as real hardware, could be VMs?
+** Load source
+*** 1-2 machines to run [http://hg.mozilla.org/labs/weave-loadtest/ weave-loadtest]
+**** 1 is probably fine
+*** Could be VMs?
+* Next steps?
 == Operations notes ==
@@ Line 23: / Line 52: @@
 * How stable is this thing?
 * Need to investigate all the basics of:
-** adding/removing nodes
+** upgrading between cassandra versions
+** adding/removing cluster nodes
 ** performing backups
 ** monitoring health and performance