Services/Sync/Server/Archived/HereComesEverybody

From MozillaWiki

The "Here Comes Everybody" release is intended to scale services.mozilla.com and Weave Sync up to match potential usage as a built-in feature in Firefox.

Storage

Candidates

Rejected

  • MySQL variants
    • Collection-per-table, InnoDB
      • Not enough space savings to justify reworking and restrictions introduced
  • CouchDB
    • Rejected already? Concerns over the ability to keep a large-scale installation running
    • We should unreject this candidate until after some discussion. Given the proper architecture, we could distribute the collections across many databases, many of them outside Mozilla's control. CouchDB would be an interesting candidate for that. See TimelessN (tellis at mozillaDotCom) for details. (Note that we will probably re-reject this after the discussion.)
  • SQLite DB per user
    • MHanson has experimented with a SQLite-based server that creates one database per user. In the default configuration it runs very slowly because of fsync overhead on ext3. Turning off file synchronization helps a lot (250x speedup) but is obviously more dangerous, since a crash can lose or corrupt recent writes. A different file system could help. Note that the Minimal Server does run on SQLite, because we do not expect heavy concurrent usage there.
  • Amazon Web Services
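
To make the SQLite trade-off above concrete, here is a minimal sketch of a one-database-per-user layout with file synchronization switched on or off. It is not MHanson's implementation: the function names, the `items` table, and the file naming are assumptions made for illustration, and `username` is assumed to be filesystem-safe.

```python
import os
import sqlite3


def open_user_db(root, username, durable=True):
    """Open (creating if needed) the SQLite file for a single user.

    Hypothetical layout: one file per user under `root`.
    """
    path = os.path.join(root, username + ".db")
    conn = sqlite3.connect(path)
    # synchronous=OFF skips the fsync on commit: much faster, but an OS
    # crash or power loss can lose or corrupt recent writes.
    conn.execute("PRAGMA synchronous = %s" % ("FULL" if durable else "OFF"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " collection TEXT NOT NULL,"
        " id TEXT NOT NULL,"
        " payload TEXT,"
        " PRIMARY KEY (collection, id))"
    )
    return conn


def put_item(conn, collection, item_id, payload):
    """Insert or overwrite one item in the user's database."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO items (collection, id, payload)"
            " VALUES (?, ?, ?)",
            (collection, item_id, payload),
        )
```

Timing a batch of `put_item` calls with `durable=True` and again with `durable=False` on the target file system would be one way to check the fsync overhead on our own hardware.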

Service

There's been mention of rebuilding the service in a new language / framework.

Needs investigation: is the current implementation lacking, and if so, how and why? Is the system mostly storage-bound, or does the service layer itself introduce significant CPU / latency issues?

  • What are our selection criteria?
    • Cost per user?
    • Ops complexity?
    • Feature development constraints?

Considerations:

  • High latency is okay
  • Tabs are updated very frequently; history somewhat frequently; everything else less so.

Candidates

The choice of language / framework isn't (yet?) a concern in terms of performance / storage capacity. It's more a concern about maintainability and feature velocity.

  • PHP
    • Current implementation in plain PHP
    • Kohana?
      • May not be worth it (vs switching to Python) unless there's a significant web UI added atop the REST API
  • Python
    • web.py
      • Hosted atop Apache / mod_wsgi
      • Minimal Python web framework
      • May be too minimal to use beyond a REST API
    • Django
      • Implementation in progress (jensd)
      • Hosted atop Apache / mod_wsgi
      • May be overkill for just a REST service, but could help with an added web UI
      • See also, AMO / Zamboni
      • A lot of Django's goodness comes from the ORM, which is not available if you're not on a SQL db.
    • Twisted
      • Event-driven networking engine
      • Can be high-performance, but is semi-exotic
    • Tornado
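
For a sense of how small a REST-only service can be on the Python side, here is a sketch of a bare WSGI application of the kind mod_wsgi can host. It uses no framework at all, so it sits below even web.py in the list above. The URL shape (user/collection/id) and the in-memory `STORE` are assumptions for illustration, not the actual Weave API or storage layer.

```python
import json

# In-memory stand-in for the real storage backend (an assumption).
STORE = {}


def application(environ, start_response):
    """WSGI entry point; mod_wsgi looks for a callable named `application`."""
    method = environ["REQUEST_METHOD"]
    # Hypothetical key shape: user/collection/id
    key = environ.get("PATH_INFO", "").strip("/")

    if method == "PUT":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        STORE[key] = environ["wsgi.input"].read(length).decode("utf-8")
        status, body = "200 OK", json.dumps({"stored": key})
    elif method == "GET" and key in STORE:
        status, body = "200 OK", STORE[key]
    elif method == "GET":
        status, body = "404 Not Found", json.dumps({"error": "not found"})
    else:
        status, body = "405 Method Not Allowed", json.dumps({"error": "method"})

    payload = body.encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

Anything beyond this (routing, auth, templating for a web UI) is where a framework starts to pay for itself, which is the trade-off the candidates above are weighing.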

Capacity / Load Testing

Methodology

  • Need to get our science on.
  • Formulate the questions / testing criteria
    • Max users per cluster unit?
    • Cost / user
  • Develop a traffic / load model
    • Based on 1 day / 1 week of current Weave service logs?
  • Employ a load cluster
    • Machines to apply load to the test cluster
    • e.g. using Grinder or similar
    • Do we already have this available?
    • We have been using pm-weave05.mozilla.com as our load initiator.
  • Build out a test cluster based on a storage / service arch prototype
    • We have pm-weave06.mozilla.com available to act as a prototype; it talks to pm-weavefs03.mozilla.com as a DB. pm-weavefs06 has a clean database setup on it as well.
  • Perform experimental run of the load model using load cluster on the test cluster
  • Variables to monitor over time during load test:
    • Concurrent users
    • Service
      • CPU load
      • Latency per request
      • Network usage
    • Storage
      • CPU load
      • Disk space usage
      • Disk I/O usage
  • Repeat load model runs, explore capacity by increasing intensity until failure modes are encountered
    • What failure modes?
      • Unacceptable latency
      • Storage space exhausted
      • Connection refusal due to runtime resource exhaustion (e.g. LDAP socket usage, MySQL socket usage)
  • Form conclusions to answer questions
  • Use shared spreadsheet [1] to gather results.
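
The "increase intensity until failure modes are encountered" loop above could be sketched roughly as follows. This is only an illustration: `request_fn` stands in for whatever issues one request against the test cluster, and the step sizes, the 95th-percentile cutoff, and the failure labels are assumptions, not agreed test criteria.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(request_fn, concurrent_users, requests_per_user):
    """Run one load step; return (sorted latencies in seconds, error count)."""
    def one_user(_user_index):
        latencies, errors = [], 0
        for _ in range(requests_per_user):
            start = time.perf_counter()
            try:
                request_fn()
            except Exception:
                errors += 1  # e.g. connection refused by the service
            latencies.append(time.perf_counter() - start)
        return latencies, errors

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        results = list(pool.map(one_user, range(concurrent_users)))
    all_latencies = sorted(l for lats, _ in results for l in lats)
    return all_latencies, sum(errs for _, errs in results)


def find_capacity(request_fn, max_latency_s, steps=(1, 2, 4, 8, 16),
                  requests_per_user=5):
    """Raise concurrency step by step; stop at the first failure mode.

    Returns (highest concurrency that passed, failure mode or None).
    """
    last_ok = 0
    for users in steps:
        latencies, errors = run_step(request_fn, users, requests_per_user)
        p95 = latencies[int(0.95 * (len(latencies) - 1))]
        if errors:
            return last_ok, "errors"
        if p95 > max_latency_s:
            return last_ok, "unacceptable latency"
        last_ok = users
    return last_ok, None
```

A real run would also sample CPU load, disk I/O, and disk space on the service and storage machines during each step, since this sketch only sees what the client sees.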

Plan / Schedule

Load Model

  • Do we have logs / log analysis?
    • Reads per day? Rate & amount
      • Per each hour of a day, to model usage patterns?
    • Writes per day? Rate & amount
      • Per each hour of a day, to model usage patterns?
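
If the service logs turn out to be ordinary Apache-style access logs (an assumption; the actual log format is not confirmed here), the per-hour read and write rates asked about above could be pulled out with something like this sketch. Treating GET and HEAD as reads and every other method as a write is also an assumption.

```python
import re
from collections import defaultdict

# Matches the timestamp, request line, status, and response size of a
# common / combined log format entry.
LOG_LINE = re.compile(
    r'\[[^:]+:(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"(?P<method>[A-Z]+) \S+ [^"]*" \d{3} (?P<size>\d+|-)'
)
READ_METHODS = {"GET", "HEAD"}


def hourly_profile(lines):
    """Return {hour: {"reads": n, "writes": n, "bytes_out": b}}."""
    profile = defaultdict(lambda: {"reads": 0, "writes": 0, "bytes_out": 0})
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        bucket = profile[int(match.group("hour"))]
        kind = "reads" if match.group("method") in READ_METHODS else "writes"
        bucket[kind] += 1
        # The logged size is the response size; request body sizes (the
        # "amount" written) would need a custom log format.
        size = match.group("size")
        bucket["bytes_out"] += 0 if size == "-" else int(size)
    return dict(profile)
```

Feeding it one day or one week of logs gives a 24-bucket usage curve that a load model could replay or scale up.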

Testing Hardware

  • What do we have, what can we get?
  • Need a cluster of machines to retask for experimental evaluation of each storage tech under consideration
  • Need machines to apply load to the experiment cluster

Tools

  • Grinder
  • ab
  • log_replay
    • Ask oremj?
  • XDebug
    • For profiling PHP, though it's doubtful there's enough PHP in play for it to make an order-of-magnitude difference vs storage concerns.
  • We have a hand-crafted load tool, which is a blessing and a curse: [2]. It can simulate more complex interactions, e.g. create a bunch of users, do a bunch of inserts, delete users, with a rolling window, which might be harder to do with scripted tools.
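
The rolling-window interaction described for the hand-crafted tool might look roughly like the sketch below. This is not the tool linked at [2]: `FakeBackend` and its methods are a hypothetical in-memory stand-in for the service, used only so the control flow can be run and read on its own.

```python
from collections import deque


class FakeBackend:
    """In-memory stand-in for the service under test (hypothetical API)."""

    def __init__(self):
        self.users = {}

    def create_user(self, name):
        self.users[name] = []

    def insert(self, name, item):
        self.users[name].append(item)

    def delete_user(self, name):
        del self.users[name]


def rolling_window_load(backend, total_users, window_size, inserts_per_user):
    """Create users one by one, insert data for each, and keep at most
    `window_size` users alive by deleting the oldest first.

    Returns the peak number of live users observed.
    """
    live = deque()
    peak = 0
    for n in range(total_users):
        if len(live) == window_size:
            backend.delete_user(live.popleft())  # window full: drop oldest
        name = "loadtest-user-%d" % n
        backend.create_user(name)
        live.append(name)
        for i in range(inserts_per_user):
            backend.insert(name, {"id": i, "payload": "x" * 32})
        peak = max(peak, len(backend.users))
    while live:  # clean up whatever is still in the window
        backend.delete_user(live.popleft())
    return peak
```

Swapping `FakeBackend` for a thin client that makes real requests would turn this into an actual load driver; the stateful create / insert / delete sequencing is the part that is awkward to express in purely scripted tools.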

Developer Relations

Marketing

  • Marketing the Weave Sync addon
    • Who to target?
    • What expected growth?