Identity/AttachedServices/DeploymentPlanning/TrafficModel

Overview

Herein we will construct a high-level aggregate model of how users will interact with the PICL backend services. It's an attempt to inform capacity and loadtest planning based on some real data from existing services, some educated guesswork, and some hand-waving.

Assumptions:

  • ten times as many users on the new system as there are on current sync.
  • roughly similar per-user behaviour as current sync (e.g. data volume, change rates).
  • roughly similar per-user keysigning/authentication overheads as current persona.
  • err on the side of caution; overestimate in the face of ambiguity.

Raw data and some preliminary analysis can be found in https://id.etherpad.mozilla.org/picl-user-model


Users and Devices

Due to the enormously more effective Setup/Signin UI, we expect to be supporting the following userbase:

  • 20 million total user accounts
  • 15 million active users hitting the service in a 24-hour period

This is only a small fraction of the Firefox userbase, but it's a lot more than current sync.

The number of devices per user follows a power-law distribution in current sync, and we anticipate the same for PICL. Erring on the side of caution:

  • 90% of users have 1 device
  • 8% of users have 2 devices
  • 2% of users have 3 devices
  • negligible number of users with more than 3 devices
    • That averages out to about 1.12; let's round it up to: 1.2 devices-per-user
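
The devices-per-user figure can be checked as a share-weighted mean of the distribution listed above. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from any PICL tooling.

```python
# Device-count distribution from the list above: {devices: share of users}.
device_share = {1: 0.90, 2: 0.08, 3: 0.02}

# Expected devices per user is the share-weighted mean (about 1.12).
mean_devices = sum(n * share for n, share in device_share.items())

# Planning figure used in the rest of this page, rounded up for caution.
DEVICES_PER_USER = 1.2

print(f"weighted mean: {mean_devices:.2f}, planning figure: {DEVICES_PER_USER}")
```

The gap between 1.12 and 1.2 is deliberate headroom, in line with the "overestimate in the face of ambiguity" assumption.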

The split between desktop and mobile devices will mirror that seen in current sync:

  • 85% of devices are desktop
  • 15% of devices are mobile

Note that this is a much higher percentage of mobile devices than seen in the overall Firefox userbase. It seems safe to skew towards more mobile devices; it will affect the requirements for the Scrypt Helper but otherwise does not change the numbers.

Firefox Account Operations

These are the major operations that can be performed on the Firefox Accounts server.

TODO: estimated data sizes, which translates into data volume rate in/out

TODO: any other potentially-troublesome operations that we should scope out here?


Create Account

During the changeover, we expect an influx of account setup activity as users migrate over. Let's say it takes two weeks for the existing sync userbase (roughly 2 million users, taking one tenth of the 20 million target) to migrate across, with that representing the peak rate of account creation:

  • Rounds up to: 2 create-account operations per second
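
The create-account figure can be reproduced with a few lines of arithmetic. This sketch assumes the existing sync userbase is one tenth of the 20 million target, which is inferred from the "ten times as many users" assumption in the Overview rather than stated directly.

```python
import math

TOTAL_ACCOUNTS = 20_000_000   # from "Users and Devices"
GROWTH_FACTOR = 10            # Overview assumption: 10x the current sync userbase
MIGRATION_DAYS = 14           # two-week migration window
SECONDS_PER_DAY = 24 * 60 * 60

# Inferred, not stated in the text: size of the existing sync userbase.
existing_sync_users = TOTAL_ACCOUNTS / GROWTH_FACTOR

# Average rate if every existing user migrates evenly across the window.
create_rate = existing_sync_users / (MIGRATION_DAYS * SECONDS_PER_DAY)  # about 1.65
create_rate_rounded = math.ceil(create_rate)

print(f"{create_rate:.2f} per second, rounded up to {create_rate_rounded}")
```

An even spread over the two weeks is itself an optimistic simplification; a front-loaded migration would push the real peak above this average.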

Note that each create-account operation on the Firefox Accounts server involves sending a confirmation email.

Sanity-check: according to KPI data, this is higher than the rate of account creation on persona.org.


Establish Session

Each of the 1.2 devices-per-user needs to establish a session token with the Firefox Accounts server. They must do this at initial device setup, whenever the user changes their password, and whenever the device is updated. Let's posit that password changes are negligible and so each device establishes a fresh session once every six weeks:

  • (2 * 1.2) establish-session operations per second due to initial setup
  • (20mil * 1.2) establish-session operations every six weeks due to device update etc
    • Rounds up to: 10 establish-session operations per second
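
The two contributions to the establish-session rate can be added up as follows. This is a sketch of the arithmetic only, using the figures quoted above; the constant names are illustrative.

```python
import math

CREATE_RATE = 2               # create-account operations per second
DEVICES_PER_USER = 1.2
TOTAL_ACCOUNTS = 20_000_000
REFRESH_DAYS = 6 * 7          # each device refreshes its session every six weeks
SECONDS_PER_DAY = 24 * 60 * 60

# Sessions established by newly created accounts setting up their devices.
setup_rate = CREATE_RATE * DEVICES_PER_USER                      # 2.4 per second

# Sessions re-established by existing devices, spread evenly over six weeks.
refresh_rate = (TOTAL_ACCOUNTS * DEVICES_PER_USER) / (REFRESH_DAYS * SECONDS_PER_DAY)

total_rate = setup_rate + refresh_rate                            # about 9.0
planning_rate = math.ceil(total_rate)

print(f"setup {setup_rate:.1f} + refresh {refresh_rate:.2f} "
      f"= {total_rate:.2f}, rounded up to {planning_rate}")
```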


Sign Certificate

Each device needs a valid BrowserID identity certificate in order to talk to the storage servers. Certificates last for 1 hour. Let's say each device is online and syncing for 12 hours each day, making 12 sign-certificate requests per device per day.

  • (15mil * 1.2 * 12) sign-certificate requests per day
    • 2500 sign-certificate operations per second

Sanity-check: according to KPI data, this is higher than the rate of signins on persona.org.

TODO: That's a lot. Adjust certificate duration to bring this number down?
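
To get a feel for how much the certificate duration matters, here is a rough sketch of the sign-certificate rate as a function of certificate lifetime. It makes the simplifying assumption that a device requests exactly one new certificate each time the old one expires while it is online; the 6-hour and 12-hour lifetimes are hypothetical values for comparison, not proposals from this page.

```python
ACTIVE_USERS = 15_000_000     # active users per 24-hour period
DEVICES_PER_USER = 1.2
ONLINE_HOURS_PER_DAY = 12     # hours per day each device is online and syncing
SECONDS_PER_DAY = 24 * 60 * 60


def sign_rate(cert_lifetime_hours: float) -> float:
    """Average sign-certificate requests per second for a given lifetime."""
    # Simplifying assumption: one request per certificate lifetime while online.
    requests_per_device_per_day = ONLINE_HOURS_PER_DAY / cert_lifetime_hours
    devices = ACTIVE_USERS * DEVICES_PER_USER
    return devices * requests_per_device_per_day / SECONDS_PER_DAY


# 1 hour is the current duration; 6 and 12 are hypothetical alternatives.
for hours in (1, 6, 12):
    print(f"{hours:>2}h certificates: {sign_rate(hours):,.0f} requests per second")
```

Under this model the rate scales inversely with the lifetime, so any reduction in load has to be weighed against the longer window in which an issued certificate stays valid.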

Scrypt Helper

The scrypt helper is used by mobile devices during authentication, required as part of the establish-session operation. At 10 establish-session operations per second and 15% mobile devices, we have:

  • 1.5 scrypt-helper requests per second


Sync Storage Operations

Here we're basically taking the current traffic to sync and multiplying it by 10.

XXX TODO: item size breakdown for each datatype; we have this data from sync, just need to run the numbers...


Create Account

Since the storage servers are a separate system, they will also require some initialization for each new user.

  • 2 create-account operations per second


History Data

Based on FHR data, the distribution of history item counts is roughly (after lots of rounding-up):

  • 25% have fewer than 200 history entries
  • 50% have fewer than 1,500 history entries
  • 75% have fewer than 6,000 history entries
  • 95% have fewer than 30,000 history entries
  • expected peak is around 100,000 history entries

Extrapolating up from sync read/write rates, we expect:

  • 175 history-data reads per second
  • 1200 history-data writes per second


Bookmarks Data

Based on FHR data, the distribution of bookmark item counts is roughly (after lots of rounding-up):

  • 25% have fewer than 30 bookmarks
  • 50% have fewer than 40 bookmarks
  • 75% have fewer than 70 bookmarks
  • 95% have fewer than 400 bookmarks
  • expected peak is around 10,000 bookmarks

The data from server-side sync databases skews higher, with more users in the 100-1000 bookmark bracket. We'll assume FHR is more representative of the general Firefox population.

Extrapolating up from sync read/write rates, we expect:

  • 130 bookmark-data reads per second
  • 250 bookmark-data writes per second


Passwords Data

Based on server-side sync data:

  • 75% have fewer than 100 passwords
  • expected peak is around 1,000 passwords

Extrapolating up from sync read/write rates, we expect:

  • 100 password-data reads per second
  • 240 password-data writes per second


Tabs Data

Extrapolating up from sync read/write rates, we expect:

  • 420 tabs-data reads per second
  • 1010 tabs-data writes per second


Aggregate

Based on current sync traffic, we would aim for the following:

  • 3500 reads per second
  • 3900 writes per second

However, the reads-per-second figure here is skewed by some very read-heavy collections that will not appear in PICL. Instead, let's just add up the figures for individual data-types above (825 reads and 2700 writes per second) and then round it up a little:

  • 900 reads per second
  • 3000 writes per second
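
The aggregate targets can be cross-checked by summing the per-datatype rates quoted in the sections above. This is a minimal sketch; the dictionary layout is illustrative only.

```python
# (reads per second, writes per second) for each data type, as quoted above.
rates = {
    "history":   (175, 1200),
    "bookmarks": (130, 250),
    "passwords": (100, 240),
    "tabs":      (420, 1010),
}

total_reads = sum(reads for reads, _ in rates.values())
total_writes = sum(writes for _, writes in rates.values())

# Rounded-up planning targets from the list above.
PLANNED_READS, PLANNED_WRITES = 900, 3000

print(f"reads:  {total_reads} summed, {PLANNED_READS} planned "
      f"({PLANNED_READS / total_reads - 1:.0%} headroom)")
print(f"writes: {total_writes} summed, {PLANNED_WRITES} planned "
      f"({PLANNED_WRITES / total_writes - 1:.0%} headroom)")
```

Both targets leave roughly 10% headroom over the summed figures, and the workload is write-heavy by a factor of more than three.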