Services/Sync/Server/Archived/HereComesEverybody: Difference between revisions

Revision as of 21:40, 9 February 2010

The "Here Comes Everybody" release is intended to scale services.mozilla.com and Weave Sync up to match potential usage as a built-in feature in Firefox.

Storage

What are our selection criteria?
- Cost per user?
- Ops complexity?
- Feature development constraints?
- Notes on our data characteristics

Candidates

MySQL variants
- Classic - single WBO table, InnoDB, no memcached
  - Current implementation
  - Need to add memcached caching for collection counts / dates?
  - tests/python/run_server_tests.py yields failures=11, errors=2
- Table-per-user, MyISAM
  - Can MySQL handle (or directory-hash) millions of tables?
- Maybe Flickr has some useful MySQL hacks?
  - http://code.flickr.com/blog/2010/02/08/using-abusing-and-scaling-mysql-at-flickr/
MongoDB
- telliott has a PHP storage backend
  - http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync-patches/file/tip/db-mongo
- tests/python/run_server_tests.py yields failures=22, errors=4
Cassandra
- User:LesOrchard is working on a PHP storage backend
  - http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-sync-patches/file/tip/db-cassandra
- Cassandra backend implementation notes
- tests/python/run_server_tests.py yields failures=25, errors=10
HBase?
- HBase notes
Hypertable?
- HyperTable notes

Rejected

MySQL variants
- Collection-per-table, InnoDB
  - Not enough space savings to justify reworking and restrictions introduced
CouchDB
- Rejected already? Concerns over the ability to keep a large-scale installation running
SQLite DB per user
- MHanson has experimented with a SQLite-based server that creates one DB per user. In the default configuration it runs very slowly because of fsync overhead on ext3. Turning off file synchronization helps a lot (250x speedup) but is obviously more dangerous. A different file system could help. Note that the Minimal Server does run on SQLite, because we do not expect heavy concurrent usage there.
Amazon Web Services
- Seems prohibitively expensive for heavy writes.
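
For reference, the SQLite-DB-per-user trade-off noted above can be sketched as follows. This is a minimal illustration, not MHanson's server: the file layout, the `wbo` table schema, and the `durable` flag are all assumptions made for the example. The only real mechanism shown is SQLite's `PRAGMA synchronous`, which is one way to turn file synchronization off.

```python
import os
import sqlite3
import tempfile


def open_user_db(root, username, durable=True):
    """Open (creating if needed) one SQLite file per user.

    `root`, `username`, the `wbo` schema and the `durable` flag are
    hypothetical; `username` is assumed to be safe to use in a file name.
    """
    path = os.path.join(root, username + ".db")
    conn = sqlite3.connect(path)
    # synchronous=OFF skips the fsync calls: much faster, but a crash or
    # power loss can lose or corrupt recent writes. FULL waits for the disk.
    conn.execute("PRAGMA synchronous = %s" % ("FULL" if durable else "OFF"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wbo ("
        " collection TEXT, id TEXT, modified REAL, payload TEXT,"
        " PRIMARY KEY (collection, id))"
    )
    return conn


if __name__ == "__main__":
    conn = open_user_db(tempfile.mkdtemp(), "alice", durable=False)
    conn.execute(
        "INSERT OR REPLACE INTO wbo VALUES (?, ?, ?, ?)",
        ("tabs", "abc", 1265751600.0, "{}"),
    )
    conn.commit()
```

Timing the same batch of inserts with `durable=True` and `durable=False` on the target file system would be one way to re-check the speedup figure quoted above.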

Service

There's been mention of rebuilding the service in a new language / framework.

Needs investigation: is the current implementation lacking and, if so, how / why? Are things more storage-bound, or does the service itself currently introduce significant CPU / latency issues?

What are our selection criteria?
- Cost per user?
- Ops complexity?
- Feature development constraints?

Considerations:

High latency is okay
Tabs are updated very frequently; history somewhat frequently; everything else less so.

Candidates

The choice of language / framework isn't (yet?) a concern in terms of performance / storage capacity. It's more a concern about maintainability and feature velocity.

PHP
- Current implementation in plain PHP
- Kohana?
  - May not be worth it (vs switching to Python) unless a significant web UI is added atop the REST API
Python
- web.py
  - Hosted atop apache / mod_wsgi
  - Minimal python web framework
  - May be too minimal to use beyond a REST API
- Django
  - Hosted atop apache / mod_wsgi
  - May be overkill for just a REST service, but could help with an added web UI
  - See also, AMO / Zamboni
- Twisted
  - Event-driven networking engine
  - Can be high-performance, but is semi-exotic
- Tornado
  - Non-blocking / event-driven web server
  - Used by FriendFeed
  - Comparable to Twisted, possibly less exotic, though still somewhat unusual

Capacity / Load Testing

Methodology

Need to get our science on.
Formulate the questions / testing criteria
- Max users per cluster unit?
- Cost / user
Develop a traffic / load model
- Based on 1 day / 1 week of current Weave service logs?
Employ a load cluster
- Machines to apply load to the test cluster
- e.g. using Grinder or similar
- Do we already have this available?
- We have been using pm-weave05.mozilla.com as our load initiator.
Build out a test cluster based on a storage / service arch prototype
- We have pm-weave06.mozilla.com available to act as a prototype; it talks to pm-weavefs03.mozilla.com as a DB. pm-weavefs06 has a clean database setup on it as well.
Perform experimental run of the load model using load cluster on the test cluster
Variables to monitor over time during load test:
- Concurrent users
- Service
  - CPU load
  - Latency per request
  - Network usage
- Storage
  - CPU load
  - Disk space usage
  - Disk I/O usage
Repeat load model runs, explore capacity by increasing intensity until failure modes are encountered
- What failure modes?
  - Unacceptable latency
  - Storage space exhausted
  - Connection refusal due to runtime resource exhaustion (e.g. LDAP socket usage, MySQL socket usage)
Form conclusions to answer questions
Use shared spreadsheet [1] to gather results.
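
The "increase intensity until failure modes are encountered" loop above could be sketched roughly like this. It is only an outline under stated assumptions: `send_request` is a hypothetical callable that returns a latency in milliseconds or raises `ConnectionError`, the thresholds are placeholders, and requests are issued sequentially here, whereas a real run would drive concurrent load from the load cluster (Grinder or similar).

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def run_step(send_request, users, requests_per_user):
    """Issue the requests for one intensity level.

    Returns (latencies_ms, refused_count). `send_request(users)` is a
    hypothetical hook standing in for a real request to the test cluster.
    """
    latencies, refused = [], 0
    for _ in range(users * requests_per_user):
        try:
            latencies.append(send_request(users))
        except ConnectionError:
            refused += 1
    return latencies, refused


def find_capacity(send_request, start=10, step=10, limit=1000,
                  max_p95_ms=1000, max_refusal_rate=0.01,
                  requests_per_user=5):
    """Ramp up load; return (last passing user count, failure mode or None)."""
    last_ok = 0
    for users in range(start, limit + 1, step):
        latencies, refused = run_step(send_request, users, requests_per_user)
        total = len(latencies) + refused
        if refused / total > max_refusal_rate:
            return last_ok, "connection refusal"
        if percentile(latencies, 95) > max_p95_ms:
            return last_ok, "unacceptable latency"
        last_ok = users
    return last_ok, None
```

Storage-space exhaustion is not modelled here; it would need a separate check against the storage hosts during each step.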

Plan / Schedule

We need one.
- Who does what and by when?
We need to select & build the storage / service prototypes
Build a schedule for setting up each prototype and running the experiment

Load Model

Do we have logs / log analysis?
- Reads per day? Rate & amount
  - Per each hour of a day, to model usage patterns?
- Writes per day? Rate & amount
  - Per each hour of a day, to model usage patterns?
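
If access logs are available, an hourly read / write profile could be pulled out along these lines. This is a sketch that assumes Apache-style common log lines, treats GET / HEAD as reads and every other method as a write, and sums the logged response size; the actual Weave log format and the example paths are not known from this page and are assumptions.

```python
import re
from collections import defaultdict

# Assumed shape: ... [09/Feb/2010:21:40:03 -0800] "GET /path HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'\[\d{2}/\w{3}/\d{4}:(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"(?P<method>[A-Z]+) \S+ [^"]*" \d{3} (?P<size>\d+|-)'
)
READ_METHODS = {"GET", "HEAD"}


def hourly_profile(lines):
    """Count reads, writes and response bytes per hour of the day."""
    profile = defaultdict(lambda: {"reads": 0, "writes": 0, "bytes": 0})
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        bucket = profile[int(match.group("hour"))]
        kind = "reads" if match.group("method") in READ_METHODS else "writes"
        bucket[kind] += 1
        if match.group("size") != "-":
            bucket["bytes"] += int(match.group("size"))
    return dict(profile)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - alice [09/Feb/2010:21:40:03 -0800] '
        '"GET /1.0/alice/storage/tabs HTTP/1.1" 200 512',
        '10.0.0.1 - alice [09/Feb/2010:21:41:10 -0800] '
        '"PUT /1.0/alice/storage/tabs/abc HTTP/1.1" 200 20',
    ]
    for hour, counts in sorted(hourly_profile(sample).items()):
        print(hour, counts)
```

The common log format records the response size only, so write volume (request body size) would need an extra log field or a different source.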

Testing Hardware

What do we have, what can we get?
Need a cluster of machines to retask for experimental evaluation of each storage tech under consideration
Need machines to apply load to the experiment cluster

Tools

Grinder
ab
log_replay
- Ask oremj?
XDebug
- For profiling PHP, though it's doubtful there's enough PHP in play for it to make an order-of-magnitude difference vs storage concerns.

We have a hand-crafted load tool, which is a blessing and a curse: [2]. It can simulate more complex interactions, e.g. create a bunch of users, do a bunch of inserts, delete users, with a rolling window, which might be harder to do with scripted tools.
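
The rolling-window interaction described above might look something like the following. This is not the hand-crafted tool itself, just a sketch of the pattern: `InMemoryStore` is a hypothetical stand-in for the real service, and the user names and payloads are made up.

```python
from collections import deque


class InMemoryStore:
    """Hypothetical stand-in for the sync service, for illustration only."""

    def __init__(self):
        self.users = {}

    def create_user(self, name):
        self.users[name] = []

    def insert(self, name, item):
        self.users[name].append(item)

    def delete_user(self, name):
        del self.users[name]


def rolling_window_run(store, total_users, window, inserts_per_user):
    """Create users one by one, insert items for each, and delete the
    oldest user whenever more than `window` users are alive.

    Returns the names of the users still alive at the end.
    """
    active = deque()
    for n in range(total_users):
        name = "loadtest-user-%d" % n
        store.create_user(name)
        active.append(name)
        for i in range(inserts_per_user):
            store.insert(name, {"id": i, "payload": "x"})
        if len(active) > window:
            store.delete_user(active.popleft())
    return list(active)
```

Swapping `InMemoryStore` for a thin client that issues the equivalent HTTP requests would turn the same loop into an actual load script.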

Developer Relations

https://wiki.mozilla.org/Labs/Weave/Developer
http://mozillalabs.com/weave/2010/02/05/weave-sync-new-apis-and-resources-for-developers/
Need to build a developers.service.mozilla.com?
See also: https://addons.mozilla.org/en-US/developers
Present Weave / services.mozilla.com as an open service
Messaging / copy / design needed
Forums?
Offer updated service docs and example clients
- Weave/Experimental_Clients/Web
- Weave/Experimental_Clients/iPhone
- Weave/Experimental_Clients/WebOS
  - User:LesOrchard has worked on this in free time - time to apply official time to it?
- python command-line client
  - User:Mhanson is working on this.

Marketing

Marketing the Weave Sync addon
- Who to target?
- What expected growth?
