Services/Sync/Server/Archived/HereComesEverybody

From MozillaWiki

Revision as of 21:40, 9 February 2010

The "Here Comes Everybody" release is intended to scale services.mozilla.com and Weave Sync up to match potential usage as a built-in feature in Firefox.

Storage

Candidates

Rejected

  • MySQL variants
    • Collection-per-table, InnoDB
      • Not enough space savings to justify reworking and restrictions introduced
  • CouchDB
    • Rejected already? Concerns over the ability to keep a large-scale installation running
  • SQLite DB per user
    • MHanson has experimented with a SQLite-based server that creates one database per user. In the default configuration it runs very slowly because of fsync overhead on ext3. Turning off file synchronization gives roughly a 250x speedup, but risks losing or corrupting data if the operating system crashes or power is lost. A different file system could help. Note that the Minimal Server does run on SQLite, because we do not expect heavy concurrent usage there.
  • Amazon Web Services
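
The SQLite-per-user experiment above can be illustrated with a minimal sketch. It is not MHanson's server: the `wbo` table, its columns, and the function names are hypothetical, and the only real SQLite feature relied on is `PRAGMA synchronous`, which controls whether commits wait for fsync.

```python
import os
import sqlite3
import tempfile
import time


def open_user_db(root, username, durable=True):
    """Open (or create) the per-user database file for `username`."""
    conn = sqlite3.connect(os.path.join(root, username + ".db"))
    # FULL waits for fsync on commit; OFF never waits, which is much
    # faster but can lose or corrupt data on an OS crash or power loss.
    conn.execute("PRAGMA synchronous = %s" % ("FULL" if durable else "OFF"))
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wbo ("
        "collection TEXT, id TEXT, payload TEXT, "
        "PRIMARY KEY (collection, id))"
    )
    return conn


def time_inserts(conn, count):
    """Insert `count` rows, committing after each one; return seconds."""
    start = time.perf_counter()
    for i in range(count):
        conn.execute(
            "INSERT OR REPLACE INTO wbo VALUES (?, ?, ?)",
            ("history", str(i), "{}"),
        )
        conn.commit()
    return time.perf_counter() - start


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for durable in (True, False):
        conn = open_user_db(root, "user_%s" % durable, durable)
        print("durable=%s: %.3fs" % (durable, time_inserts(conn, 200)))
```

Committing after every insert is deliberate: it is the per-write fsync cost that the experiment measured, so batching writes into one transaction would hide the effect. The measured ratio will depend heavily on the file system and disk.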

Service

There's been mention of rebuilding the service in a new language / framework.

Needs investigation: is the current implementation lacking, and if so, how and why? Is the system mostly storage-bound, or does the service itself introduce significant CPU load or latency?

  • What are our selection criteria?
    • Cost per user?
    • Ops complexity?
    • Feature development constraints?

Considerations:

  • High latency is okay
  • Tabs are updated very frequently; history somewhat frequently; everything else less so.

Candidates

This isn't (yet?) a concern in terms of performance / storage capacity. It's more a concern about maintainability and feature velocity.

  • PHP
    • Current implementation in plain PHP
    • Kohana?
      • May not be worth it (vs. switching to Python) unless a significant web UI is added atop the REST API
  • Python
    • web.py
      • Hosted atop Apache / mod_wsgi
      • Minimal Python web framework
      • May be too minimal to use beyond a REST API
    • Django
      • Hosted atop Apache / mod_wsgi
      • May be overkill for just a REST service, but could help with an added web UI
      • See also, AMO / Zamboni
    • Twisted
      • Event-driven networking engine
      • Can be high-performance, but is semi-exotic
    • Tornado
      • Non-blocking / event-driven web server
      • Used by FriendFeed
      • Comparable to Twisted, possibly less exotic, though still somewhat unusual
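
For a sense of how small a REST-only service can be when hosted atop Apache / mod_wsgi, here is a minimal sketch of a bare WSGI callable with no framework at all. The `/ping` route and its JSON body are hypothetical and are not part of the Weave API; the only convention relied on is that mod_wsgi looks for a callable named `application` by default.

```python
import json


def application(environ, start_response):
    """WSGI entry point for a single, hypothetical GET /ping route."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")
    if method == "GET" and path == "/ping":
        status, body = "200 OK", {"ok": True}
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


if __name__ == "__main__":
    # Local test server from the standard library, not for production.
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, application).serve_forever()
```

web.py and Django both ultimately expose a callable of this shape to mod_wsgi, so the choice between them is mostly about how much routing, templating, and admin tooling is wanted on top.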

Capacity / Load Testing

Methodology

  • Need to get our science on.
  • Formulate the questions / testing criteria
    • Max users per cluster unit?
    • Cost / user
  • Develop a traffic / load model
    • Based on 1 day / 1 week of current Weave service logs?
  • Employ a load cluster
    • Machines to apply load to the test cluster
    • e.g. using Grinder or similar
    • Do we already have this available?
    • We have been using pm-weave05.mozilla.com as our load initiator.
  • Build out a test cluster based on a storage / service arch prototype
    • We have pm-weave06.mozilla.com available to act as a prototype; it talks to pm-weavefs03.mozilla.com as a DB. pm-weavefs06 has a clean database setup on it as well.
  • Perform experimental run of the load model using load cluster on the test cluster
  • Variables to monitor over time during load test:
    • Concurrent users
    • Service
      • CPU load
      • Latency per request
      • Network usage
    • Storage
      • CPU load
      • Disk space usage
      • Disk I/O usage
  • Repeat load model runs, explore capacity by increasing intensity until failure modes are encountered
    • What failure modes?
      • Unacceptable latency
      • Storage space exhausted
      • Connection refusal due to runtime resource exhaustion (e.g. LDAP socket usage, MySQL socket usage)
  • Form conclusions to answer questions
  • Use shared spreadsheet [1] to gather results.
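
The monitoring variables and failure modes listed above could be recorded and evaluated along these lines. This is only a sketch: the `Sample` fields, the threshold values, and the function names are assumptions made for illustration, and real thresholds would come from the testing criteria once they are formulated.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample taken during a load run."""
    concurrent_users: int
    latency_ms: float          # service: latency per request
    disk_free_bytes: int       # storage: remaining disk space
    refused_connections: int   # refusals seen since the previous sample


# Hypothetical thresholds, chosen only to make the example concrete.
MAX_LATENCY_MS = 5000.0
MIN_DISK_FREE_BYTES = 1 << 30


def failure_modes(sample):
    """Return the names of the failure modes that a sample exhibits."""
    modes = []
    if sample.latency_ms > MAX_LATENCY_MS:
        modes.append("unacceptable latency")
    if sample.disk_free_bytes < MIN_DISK_FREE_BYTES:
        modes.append("storage space exhausted")
    if sample.refused_connections > 0:
        modes.append("connection refusal")
    return modes


def capacity(samples):
    """Return the highest user count reached before the first failure."""
    best = 0
    for s in sorted(samples, key=lambda s: s.concurrent_users):
        if failure_modes(s):
            break
        best = s.concurrent_users
    return best
```

Given samples from runs of increasing intensity, `capacity(samples)` answers the "max users per cluster unit" question for one prototype, and `failure_modes` records which limit was hit first.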

Plan / Schedule

  • We need one.
    • Who does what and by when?
  • We need to select & build the storage / service prototypes
  • Build a schedule for setting up each prototype and running the experiment

Load Model

  • Do we have logs / log analysis?
    • Reads per day? Rate & amount
      • Per each hour of a day, to model usage patterns?
    • Writes per day? Rate & amount
      • Per each hour of a day, to model usage patterns?
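
If the service logs turn out to be usable, the per-hour read and write rates could be extracted roughly as follows. This sketch assumes Apache common log format, which may not match the real Weave logs, and it treats GET and HEAD as reads and every other method as a write. The byte count in that format is the response size, so the size of written payloads would need an extra log field.

```python
import re
from collections import defaultdict

# Assumes Apache common log format; adjust for the real log layout.
LINE = re.compile(
    r'\[\d{2}/\w{3}/\d{4}:(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"(?P<method>[A-Z]+) \S+ [^"]*" \d{3} (?P<size>\d+|-)'
)
READ_METHODS = {"GET", "HEAD"}


def hourly_model(lines):
    """Return {hour: {"reads": n, "writes": n, "bytes_sent": n}}."""
    model = defaultdict(lambda: {"reads": 0, "writes": 0, "bytes_sent": 0})
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # skip lines that do not look like requests
        bucket = model[int(m.group("hour"))]
        if m.group("method") in READ_METHODS:
            bucket["reads"] += 1
        else:
            bucket["writes"] += 1
        if m.group("size") != "-":
            bucket["bytes_sent"] += int(m.group("size"))
    return dict(model)
```

Running this over one day or one week of logs gives 24 buckets that can drive the load cluster at a realistic hourly shape rather than a flat rate.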

Testing Hardware

  • What do we have, what can we get?
  • Need a cluster of machines to retask for experimental evaluation of each storage tech under consideration
  • Need machines to apply load to the experiment cluster

Tools

  • Grinder
  • ab
  • log_replay
    • Ask oremj?
  • XDebug
    • For profiling PHP, though it's doubtful there's enough PHP in play for it to make an order-of-magnitude difference vs storage concerns.
  • We have a hand-crafted load tool, which is a blessing and a curse: [2]. It can simulate more complex interactions, e.g. create a bunch of users, do a bunch of inserts, delete users, with a rolling window, which might be harder to do with scripted tools.
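
The rolling-window interaction described above (create users, insert, delete users) can be sketched as a generator of operations. This is not the existing hand-crafted tool: the function name, the `loadtest` user naming, and the tuple layout are hypothetical, and a real driver would turn each tuple into an HTTP request.

```python
from collections import deque


def rolling_window_ops(total_users, window, inserts_per_user):
    """Yield (action, user, item) tuples for a rolling window of users.

    At most `window` users (window >= 1) exist at once: when the window
    is full, the oldest live user is deleted before a new one is created.
    """
    live = deque()
    for n in range(total_users):
        if len(live) == window:
            yield ("delete_user", live.popleft(), None)
        user = "loadtest%d" % n
        live.append(user)
        yield ("create_user", user, None)
        for i in range(inserts_per_user):
            yield ("insert", user, i)
    while live:  # clean up the users still inside the window
        yield ("delete_user", live.popleft(), None)


if __name__ == "__main__":
    for op in rolling_window_ops(total_users=3, window=2, inserts_per_user=1):
        print(op)
```

Keeping the sequence as plain data makes it easy to replay the same scenario against each storage prototype and to log exactly which step was in flight when a failure occurred.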

Developer Relations

Marketing

  • Marketing the Weave Sync addon
    • Who to target?
    • What expected growth?