Identity/AttachedServices/DeploymentPlanning

From MozillaWiki

Revision as of 04:41, 13 August 2013

Overview

Herein we will construct a solid plan for deployment of the various Mozilla-hosted services that make up PiCL aka "New Sync".

Details on the overall product plan are available at User_Services/Sync. From a server-side perspective it consists of three mostly-independent services. We can work on deployment of each independently, but they all need to be standing before New Sync goes into production.

  • Storage Service: This service provides storage of encrypted blobs of data, and is used as the backend storage for New Firefox Sync. Users authenticate to it using BrowserID assertions issued by the Firefox Accounts Service.

Goals and Milestones

The goal for Q3 2013 is to have the authentication pieces Production Ready.

This does not mean having a fully-deployed production environment! With the implementation of the storage component still outstanding, there's no point in standing up an authentication service all by itself. It does mean that we need the ability to do automated deployments that pass loadtests, meet operational and security criteria, and generally inspire confidence that we could roll things out to production without major hiccups.

An analysis of the expected traffic to the services is available at TrafficModel.

These are the top-level milestones on our way to said goal, broken down into weekly chunks:

  • Aug 09:
    • usable manual-deploy dev environment tooling for Firefox Accounts and Scrypt Helper.
  • Aug 16:
    • defined testable "success criteria" for Firefox Accounts deployment (req/sec etc).
    • defined testable "success criteria" for Scrypt Helper deployment (req/sec etc).
    • loadtesting code written and debugged for Firefox Accounts.
  • Aug 23:
    • automated single-region staging deployment of Firefox Accounts.
    • loadtests run against Firefox Accounts staging environment.
  • Aug 30:
    • fixed any load-related issues in Firefox Accounts.
    • loadtesting code written and debugged for Scrypt Helper.
  • Sep 06:
    • automated single-region staging deployment of Scrypt Helper.
    • loadtests run against Scrypt Helper staging environment.
  • Sep 13:
    • automated two-region staging deployment of Firefox Accounts.
    • loadtests run against Firefox Accounts staging environment.
  • Sep 20:
    • fixed any load-related issues in both services.
    • a bit of slack time to account for inevitable slippage.
  • Sep 27:
    • security review signoff for both services.
    • svcops signoff for both services.
  • Sep 30:
    • Production Ready!
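The "testable success criteria" milestone for Aug 16 could take a form like the following minimal sketch. Everything here is an assumption for illustration: the class names, the choice of metrics (throughput, error rate, 95th-percentile latency), and the numbers in the usage example are hypothetical and are not figures from this plan.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SuccessCriteria:
    """Hypothetical pass/fail thresholds for one service's loadtest."""
    min_requests_per_sec: float
    max_error_rate: float        # fraction of requests, e.g. 0.01 for 1%
    max_p95_latency_ms: float


@dataclass(frozen=True)
class LoadtestResult:
    """Hypothetical summary of a single loadtest run."""
    total_requests: int
    failed_requests: int
    duration_sec: float
    p95_latency_ms: float


def check_loadtest(result: LoadtestResult, criteria: SuccessCriteria) -> List[str]:
    """Return a list of human-readable failures; an empty list means 'pass'."""
    if result.total_requests <= 0 or result.duration_sec <= 0:
        return ["loadtest recorded no requests"]

    failures = []
    rps = result.total_requests / result.duration_sec
    error_rate = result.failed_requests / result.total_requests

    if rps < criteria.min_requests_per_sec:
        failures.append(f"throughput {rps:.1f} req/sec is below target")
    if error_rate > criteria.max_error_rate:
        failures.append(f"error rate {error_rate:.3f} is above target")
    if result.p95_latency_ms > criteria.max_p95_latency_ms:
        failures.append(f"p95 latency {result.p95_latency_ms:.0f} ms is above target")
    return failures


# Example with made-up numbers: 60000 requests over 300 seconds is 200 req/sec.
criteria = SuccessCriteria(min_requests_per_sec=100.0,
                           max_error_rate=0.01,
                           max_p95_latency_ms=500.0)
run = LoadtestResult(total_requests=60000, failed_requests=60,
                     duration_sec=300.0, p95_latency_ms=250.0)
print(check_loadtest(run, criteria) or "all criteria met")
```

Expressing the criteria as data rather than prose means the same check could gate each automated staging deployment once real target figures are agreed.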

And one stretch goal, to work on as opportunity presents itself:

  • Sep 30:
    • auto-updating dev deployments for Firefox Accounts and Scrypt Helper.

See the sub-pages for each service for more detailed plans, dependencies, etc.

Some things to note about this plan:

  • There are no specific milestones for the Storage Service. It's not yet defined well enough. That may have to change depending on how client work progresses.
  • We're not taking any dependencies on the client-side work. Assuming the protocol stays stable, we'll have the servers ready regardless of whether there's client code landed in Firefox.

Deployment Environment

We will follow the standard dev, stage, prod deployment scheme. Developers do all their work in the dev environment, spinning up and tearing down small testing stacks at will. QA and Ops collaborate to manage the stage environment. Prod is for SvcOps to control in its entirety.

All deployments go into AWS. Dev and stage live in the "moz-svc-dev" account; prod lives in the "mozilla" account.
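As a rough sketch, deployment tooling could encode the environment-to-account mapping as a small lookup so that a stack can never be launched into the wrong account by accident. The dictionary and function names below are hypothetical; only the environment and account names come from this page.

```python
# Hypothetical lookup table: which AWS account hosts each environment.
AWS_ACCOUNT_FOR_ENV = {
    "dev": "moz-svc-dev",
    "stage": "moz-svc-dev",
    "prod": "mozilla",
}


def aws_account_for(env: str) -> str:
    """Return the AWS account name for an environment, rejecting unknown ones."""
    try:
        return AWS_ACCOUNT_FOR_ENV[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


for env in ("dev", "stage", "prod"):
    print(env, "->", aws_account_for(env))
```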

Common Tooling

We will use the following across all projects and all deployment environments:

Domain Names

Note: entirely made-up and provisional, but you gotta start somewhere. We should talk with SvcOps about using mozaws.net and whatever other standard domains they hang off of.

Each of the three services will have a single canonical endpoint URL, and they will all be under a common domain suffix. Individual deployment stacks will be identified by that common domain suffix.

Dev deployments will have:

  • account.<stack-name>.lcip.org
  • scrypt.<stack-name>.lcip.org
  • storage.<stack-name>.lcip.org

Stage will have:

  • account.stage.picl.mozilla.com
  • scrypt.stage.picl.mozilla.com
  • storage.stage.picl.mozilla.com

Production will have:

  • account.picl.mozilla.com
  • scrypt.picl.mozilla.com
  • storage.picl.mozilla.com
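The naming scheme above can be captured in one helper, sketched here under the assumption that these provisional domains are the ones eventually used. The function name and its arguments are hypothetical.

```python
SERVICES = ("account", "scrypt", "storage")


def endpoint_host(service: str, env: str, stack: str = "") -> str:
    """Build the canonical hostname for a service in a given environment.

    Dev deployments need a stack name; stage and prod have fixed suffixes.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    if env == "dev":
        if not stack:
            raise ValueError("dev deployments require a stack name")
        return f"{service}.{stack}.lcip.org"
    if env == "stage":
        return f"{service}.stage.picl.mozilla.com"
    if env == "prod":
        return f"{service}.picl.mozilla.com"
    raise ValueError(f"unknown environment: {env!r}")


print(endpoint_host("account", "dev", stack="mybranch"))
print(endpoint_host("scrypt", "stage"))
print(endpoint_host("storage", "prod"))
```

Keeping the rule in one place means a later change of domain suffix, for example after talking with SvcOps, touches a single function.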

Dev Deployment Environment

Managed entirely by dev, for testing or experiments or whatever.

Individual deployments hang off of <stack>.lcip.org where <stack> may be e.g. the name of a particular feature branch.
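Branch names often contain characters such as "/" or "_" that are not valid in a DNS hostname label, so a stack name derived from a branch needs normalizing first. The helper below is a hypothetical sketch of one way to do that; it is not part of awsbox or any existing tooling.

```python
import re


def stack_name_from_branch(branch: str) -> str:
    """Turn a branch name into a DNS-safe label usable as <stack>.

    Hostname labels may contain only letters, digits, and hyphens, must not
    start or end with a hyphen, and are limited to 63 characters.
    """
    label = re.sub(r"[^a-z0-9-]+", "-", branch.lower())
    label = label.strip("-")[:63].strip("-")
    if not label:
        raise ValueError(f"cannot derive a stack name from {branch!r}")
    return label


name = stack_name_from_branch("feature/New_Login")
print(f"account.{name}.lcip.org")
```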

The stack dev.lcip.org will (eventually) be auto-deployed with the latest version of the code for each project. Probably we can even run automated tests against it. This is not a priority though.

We will use awsbox wherever possible for simple deployments.

We will use awsboxen for anything more complicated, e.g. a testing deployment of a production-like stack.

Stage Deployment Environment

XXX TODO: fill this in as we nail down the details:

  • externally visible domains
  • VPC and subnet setup
  • Logging setup

Ideally, the production environment will exactly mirror what we set up in Stage, just managed entirely by SvcOps. Since we don't plan to have a full production environment set up by the end of Q3, there are no firmer details to be had here just yet.

Open Questions

Here are the TODOs from various sub-projects, all collected into one master list. This needs to reach Inbox Zero ASAP.

  • Who will be in charge of this from SvcOps?
    • mmayo and lloyd to figure out resource allocation
  • Target capacity requirements?
    • Firefox Accounts:
      • Total number of users, devices per user?
      • Rate of account creation?
      • Number of logins per second?
      • Cross-region replication latency?
      • ckarlof or warner to posit these figures
    • Scrypt Helper
      • Number of operations per second
      • rfkelly to extrapolate this from Firefox Accounts capacity data
    • Storage Service
      • Total number of users, devices per user?
      • Reads/writes per second for different datatypes
      • Storage volume profile per user, per datatype
      • rfkelly to synthesize this data from current sync stats
    • Other opsy metrics like MTTR
  • Budget and costs of running all this stuff.
  • Protocol details to be finalised:
    • storage service
    • scrypt helper
  • Do client teams have everything they need to proceed?
    • In particular, do we need to stand up dev instances right away, or can they run servers locally? This matters particularly for the desktop team.
  • Organizational stuff
    • estimate hours of work, headcount needed
    • how do we manage all this, e.g. repos and bugs? This is also a larger question for the product as a whole.
  • detailed instructions for spinning up dev/stage/prod stacks, especially important for QA