Services/AppsInTheCloud

From MozillaWiki

Supporting Apps in Services

The Services team exists to build and support products/services for Mozilla. As part of the consumer launch for Apps, we need to deliver server-side support for storing/sharing some key data for use by Apps clients (web and native). To deliver on this, we need to identify the requirements and the options for fulfilling them. This page is an attempt to define and document those requirements.

Requirements

Background

  • Need services server support for two types of data: App Receipts & Device State
  • These data types have different requirements

App Receipts

Key Considerations

  • App Receipts are used to allow install/use of all apps, from all app stores (not just Mozilla's)
  • This service must be up for users to purchase and install apps
  • Marketplace depends directly on this service for its own SLA
  • Data has minimal personal information

Data protection

  • Normal disk encryption is good enough

Durability

  • Data loss is not acceptable. Complete loss of data would be catastrophic to the ecosystem.
  • Data must survive the loss of a physical location (datacenter)
  • Backup frequency: every N hours (N still to be defined)
  • Clients may be able to recreate server data, but this cannot be relied upon as a part of service continuity

Uptime

  • Target uptime must match/exceed Marketplace, as Marketplace depends on this service
  • Dependent services must be aware of service interruptions to gracefully degrade UX
  • Launch target: 99.9% uptime
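To make the launch target concrete, the following is a minimal sketch of the downtime budget that a given uptime percentage implies. The function name and the 365-day year and 30-day month are assumptions made for illustration; they are not part of the requirements.

```python
# Downtime budget implied by an availability target (illustrative sketch).
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime permitted in a period at this availability."""
    return period_hours * (1 - availability_pct / 100)


yearly_hours = allowed_downtime_hours(99.9)
monthly_minutes = allowed_downtime_hours(99.9, period_hours=30 * 24) * 60

print(f"99.9% uptime allows about {yearly_hours:.2f} hours of downtime per year")
print(f"99.9% uptime allows about {monthly_minutes:.1f} minutes per 30-day month")
```

Under these assumptions, 99.9% works out to a little under nine hours of downtime per year, which is the budget any Marketplace SLA built on this service would inherit.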

API

Must be able to retrieve receipts for:

  • A single application
  • All applications
  • All receipts updated since a given timestamp

Must be able to upload new receipts:

  • With a specific ID
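The retrieval and upload requirements above can be sketched as a small in-memory store. This is only an illustration of the required operations, not the actual API design: the class, method, and field names (`ReceiptStore`, `for_app`, `app_id`, and so on) are hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Receipt:
    receipt_id: str   # caller-chosen ID ("upload with a specific ID")
    app_id: str       # hypothetical key identifying the application
    body: str         # opaque receipt payload
    updated: float    # last-modified time, seconds since the epoch


class ReceiptStore:
    """In-memory stand-in for one user's receipt collection."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Receipt] = {}

    def put(self, receipt_id: str, app_id: str, body: str, now: float) -> Receipt:
        """Upload a receipt under a specific ID, replacing any earlier version."""
        receipt = Receipt(receipt_id, app_id, body, now)
        self._by_id[receipt_id] = receipt
        return receipt

    def for_app(self, app_id: str) -> List[Receipt]:
        """Receipts for a single application."""
        return [r for r in self._by_id.values() if r.app_id == app_id]

    def all(self) -> List[Receipt]:
        """Receipts for all applications."""
        return list(self._by_id.values())

    def updated_since(self, timestamp: float) -> List[Receipt]:
        """Receipts modified strictly after the given timestamp."""
        return [r for r in self._by_id.values() if r.updated > timestamp]


store = ReceiptStore()
store.put("r1", "app-a", "receipt-1", now=100.0)
store.put("r2", "app-b", "receipt-2", now=200.0)
store.put("r1", "app-a", "receipt-1-renewed", now=300.0)  # overwrite by ID

print([r.body for r in store.for_app("app-a")])
print(len(store.all()))
print(sorted(r.receipt_id for r in store.updated_since(150.0)))
```

Keeping a per-receipt modification time is what makes the "updated since" query possible, so clients can poll for changes instead of re-fetching the full set.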

Build Assumptions

  • Manifests will average 2K and not exceed 16K
  • Manifests per user will average 16 and not exceed 200
  • Initial support is for 5M users

Therefore, total initial storage is ~160GB (2K average size × 16 per user × 5M users).

Given that sync machines have 3TB disks, one machine will be sufficient to hold the data with plenty of spare capacity. How the data is replicated affects the overall number of machines.
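The sizing arithmetic can be reproduced with a short sketch. It assumes decimal units (1K = 1,000 bytes), which is what makes the average case come out at 160GB. The ceiling figure simply multiplies the stated maximums together; it is a theoretical bound that assumes every user is at both limits at once, not a forecast.

```python
import math

# Build assumptions from the list above; decimal units are an assumption.
KB = 1_000
GB = 1_000 ** 3

USERS = 5_000_000
AVG_SIZE, MAX_SIZE = 2 * KB, 16 * KB   # bytes per stored item
AVG_PER_USER, MAX_PER_USER = 16, 200   # items per user
DISK_GB = 3_000                        # 3TB disk per sync machine


def total_gb(item_bytes: int, items_per_user: int, users: int = USERS) -> float:
    """Total storage in GB for the given per-item size and per-user count."""
    return item_bytes * items_per_user * users / GB


average_gb = total_gb(AVG_SIZE, AVG_PER_USER)
ceiling_gb = total_gb(MAX_SIZE, MAX_PER_USER)
machines_for_average = math.ceil(average_gb / DISK_GB)

print(f"average case: {average_gb:.0f} GB on {machines_for_average} machine(s)")
print(f"theoretical ceiling: {ceiling_gb:.0f} GB")
```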

Options

Use Sync Servers:

  • Cost: Free (drop in the bucket versus other data)
  • Reliability: 99%
  • Durability: Minimal. No data guarantees
  • Delivery Date: Available on dev now.
  • Notes: Not intended as an option, just here as a potential development solution

Dedicated Sync Servers, Single Colo:

  • Cost: 3 Webheads at $10K. Each storage machine is $15K. Replication overhead cost is minimal. Yearly maintenance on each machine is $800.
  • Reliability: Hot failover. What is acceptable failover delay?
  • Durability: One slave box: 99.9%; two slave boxes: 99.99%
  • Delivery Date: June 6, though we're starting to push it
  • Notes: No plans to do periodic tape backups
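As a rough aid for comparing the one-slave and two-slave configurations, the sketch below totals hardware and yearly maintenance from the figures quoted above. It rests on assumptions the page does not state: that $10K is the price per webhead, that there is a single primary storage machine, and that maintenance applies to webheads and storage machines alike. Replication overhead is treated as zero because it is described as minimal.

```python
from typing import Tuple

# Figures quoted in the option above; their interpretation is an assumption.
WEBHEADS = 3
WEBHEAD_COST = 10_000            # assumed to be per webhead
STORAGE_COST = 15_000            # per storage machine
MAINTENANCE_PER_MACHINE = 800    # per machine, per year


def single_colo_cost(slaves: int, primaries: int = 1) -> Tuple[int, int]:
    """Return (one-time hardware cost, yearly maintenance cost) in dollars."""
    storage_machines = primaries + slaves
    machines = WEBHEADS + storage_machines
    hardware = WEBHEADS * WEBHEAD_COST + storage_machines * STORAGE_COST
    yearly_maintenance = machines * MAINTENANCE_PER_MACHINE
    return hardware, yearly_maintenance


for slave_count in (1, 2):
    hardware, yearly = single_colo_cost(slave_count)
    print(f"{slave_count} slave(s): ${hardware:,} hardware, ${yearly:,}/year maintenance")
```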

Dedicated Sync Servers, Multicolo:

  • Cost: As above for hardware.
  • Reliability: as above
  • Durability: Marginally higher, due to avoiding danger of a colo exploding
  • Delivery Date: June 6, but with higher risk
  • Notes: Can migrate to this solution from Single Colo

Custom API

  • Cost: As above, plus annual expenditure of $200K for roughly one-third of the time of two engineers and an ops headcount
  • Reliability: as above
  • Durability: as above
  • Delivery Date: Dependent on hiring. Possible for June 6
  • Notes: Will enable more flexibility going forward. May be able to split the costs with other products if the underlying API matches, but that cuts down on flexibility.

Device State

Key Considerations

  • This is per-device data; App Receipts would be sufficient to recover/reinstall applications on a device
  • Durability is not deeply required, as clients will store local state and can recreate this in most cases, falling back to the full set of available apps (similar to iOS recovery)
  • This data primarily exists for future "clone setup to device" and "install app on remote device" features that are currently being specified.
    • Because of the above features, server data is a potential stepping stone to remote code installs
    • This data may have minimal/no value before those features are a part of the offering
  • Must contain, or link to, device metadata such as device name, type, capabilities

Data Protection

  • Needs further security review, but as this will be a remote install vector, some form of user secret (BID key wrapping?) is likely preferred

Durability

  • In the short term, missing server data can be recreated by the client. In the longer term, missing data will impede future features.

Uptime

  • Until future features are available, this feature has minimal uptime requirements

API

  • Must be able to retrieve the device state for:
    • A single device
    • All devices
    • All devices updated since a given timestamp (optional?)
  • Must be able to upload new device states
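One possible shape for a device state record and its store is sketched below. Nothing here is specified by the requirements beyond the listed operations and the need to carry device name, type, and capabilities; the field names, the `installed_apps` list, and the store interface are all hypothetical. Data protection (for example, wrapping with a user secret) is deliberately left out, since that is still subject to security review.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeviceState:
    device_id: str
    name: str                     # device metadata: human-readable name
    device_type: str              # device metadata: e.g. "phone", "desktop"
    capabilities: List[str]       # device metadata: hypothetical capability tags
    installed_apps: List[str] = field(default_factory=list)  # hypothetical payload
    updated: float = 0.0          # set by the store on upload


class DeviceStateStore:
    """In-memory stand-in for one user's set of device states."""

    def __init__(self) -> None:
        self._devices: Dict[str, DeviceState] = {}

    def put(self, state: DeviceState, now: float) -> None:
        """Upload a device state, replacing any earlier state for that device."""
        state.updated = now
        self._devices[state.device_id] = state

    def get(self, device_id: str) -> Optional[DeviceState]:
        """State for a single device, or None if unknown."""
        return self._devices.get(device_id)

    def all(self) -> List[DeviceState]:
        """States for all devices."""
        return list(self._devices.values())

    def updated_since(self, timestamp: float) -> List[DeviceState]:
        """The optional query: devices modified strictly after the timestamp."""
        return [d for d in self._devices.values() if d.updated > timestamp]


devices = DeviceStateStore()
devices.put(DeviceState("d1", "Work laptop", "desktop", ["native"], ["app-a"]), now=10.0)
devices.put(DeviceState("d2", "Phone", "phone", ["web", "native"]), now=20.0)

print(devices.get("d1").name)
print([d.device_id for d in devices.updated_since(15.0)])
```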

API Design

WIP at https://etherpad.mozilla.org/apps-in-the-cloud-api-rev00

Implementation Options

  • Work is proceeding; more updates soon