Identity/AttachedServices/DeploymentPlanning/ScryptHelperService
Overview
This is a working proposal for the backend architecture and deployment of the Scrypt Helper service.
The immediate and only goal of this service is to let New Sync devices outsource some of the computational costs of the Firefox Accounts authentication process. It is a stateless but computationally-expensive service.
Goals
TBD
Need to define capacity, uptime, various success criteria here.
Dev Deployment
Development deployments are done using awsboxen - plain awsbox is not suitable since this is not a nodejs app. :rfkelly will take responsibility for a basic awsboxen script that just stands up a single box.
Stage/Prod Deployment
To begin we will script this in the scrypt-helper repo, using awsboxen/cloudformation. That should suffice for initial QA and loadtesting purposes. If and when we need to migrate to other SvcOps tools, the cloudformation stuff will be a good starting point.
Architecture
This will be a multi-region high-availability deployment. Since it is a prerequisite for use of the Firefox Accounts service by low-powered devices, it should have at least the same availability as the Firefox Accounts service itself.
This is a very skinny and simple service, consisting of a single URL endpoint. We'll run an autoscale cluster of machines behind a simple public ELB:
client +------------+ +-----------------------+
requests --> | Public ELB |--->| Scrypt Helper Cluster |
+------------+ +-----------------------+
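To make the "single URL endpoint" concrete, here is a minimal sketch of what a cluster node might serve. The request/response shape, parameter names, and scrypt cost factors are all assumptions for illustration; the real scrypt-helper API may differ.

```python
# Hypothetical scrypt endpoint sketch: POST a JSON body containing
# hex-encoded input and salt, receive the hex-encoded scrypt output.
# Cost parameters below are illustrative, not the production settings.
import hashlib
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SCRYPT_N, SCRYPT_R, SCRYPT_P, DKLEN = 2 ** 14, 8, 1, 32

def scrypt_hex(input_hex: str, salt_hex: str) -> str:
    """Run the expensive scrypt KDF and return the derived key as hex."""
    dk = hashlib.scrypt(
        bytes.fromhex(input_hex),
        salt=bytes.fromhex(salt_hex),
        n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=DKLEN,
    )
    return dk.hex()

class ScryptHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        payload = json.dumps({"output": scrypt_hex(body["input"], body["salt"])})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload.encode())

if __name__ == "__main__":
    # Each node in the cluster runs this behind the ELB.
    HTTPServer(("0.0.0.0", 8080), ScryptHandler).serve_forever()
```

The service is stateless, so any node can answer any request and the ELB can round-robin freely.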
Since the work of this service is compute-bound, we'll probably run a small number of beefy compute nodes. :warner has some benchmarking results on the appropriate machines to use here.
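Since sizing decisions hinge on per-core scrypt throughput, a quick probe like the following can sanity-check candidate instance types (this is not :warner's benchmark; the cost parameters here are placeholders):

```python
# Rough single-core throughput probe for scrypt, to inform node sizing.
# Parameters are illustrative only.
import hashlib
import time

def time_scrypt(n: int, r: int, p: int, rounds: int = 3) -> float:
    """Return mean seconds per scrypt call for the given cost parameters."""
    pw, salt = b"benchmark-password", b"benchmark-salt--"
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.scrypt(pw, salt=salt, n=n, r=r, p=p, dklen=32)
    return (time.perf_counter() - start) / rounds

if __name__ == "__main__":
    for n in (2 ** 12, 2 ** 14):
        secs = time_scrypt(n, r=8, p=1)
        print(f"N={n}: {secs:.4f}s/call, ~{1 / secs:.1f} calls/sec/core")
```

Multiplying calls/sec/core by core count gives a ceiling on cluster throughput per node, which is the number the autoscaling policy ultimately cares about.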
Security
We need some good abuse-detection and abuse-prevention strategies here; an open compute-bound endpoint like this is a great big DoS waiting to happen.
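One candidate abuse-prevention primitive (purely illustrative, not a decided strategy) is a per-client token bucket that caps how many expensive scrypt requests a single source can make; the capacity and refill rate below are arbitrary:

```python
# Illustrative per-client token-bucket throttle for the scrypt endpoint.
# Bucket size and refill rate are arbitrary placeholders.
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, capacity: float = 10.0, refill_per_sec: float = 1.0):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = defaultdict(lambda: capacity)  # client id -> tokens left
        self.last = defaultdict(time.monotonic)      # client id -> last refill

    def allow(self, client: str) -> bool:
        """Spend one token for `client`; False means throttle the request."""
        now = time.monotonic()
        elapsed = now - self.last[client]
        self.last[client] = now
        self.tokens[client] = min(self.capacity,
                                  self.tokens[client] + elapsed * self.refill)
        if self.tokens[client] >= 1.0:
            self.tokens[client] -= 1.0
            return True
        return False
```

A real deployment would need this state shared or sharded across the cluster (per-node buckets are trivially bypassed by spreading requests), which is part of what the TBD below has to settle.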
It's also a high-value target, in that the machines will be holding cheap verifiers of user passwords, rather than the more expensive scrypt verifiers held by the Firefox Accounts server.
So, what do we do to keep these secure?
TBD.
Supporting Infrastructure
Each machine will run a local Heka agent to aggregate logs. They will be shipped to a stand-alone Heka router associated with the region, which will in turn forward them to the shared ElasticSearch/Kibana stack.
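For flavour, a per-machine hekad config might look something like the following. Plugin names and option keys are recalled from Heka's docs and should be double-checked; the paths and router address are hypothetical.

```toml
# Sketch of a per-node hekad config (verify plugin/option names against
# the Heka docs): tail the service log, forward to the regional router.
[scrypt_helper_logs]
type = "LogstreamerInput"
log_directory = "/var/log/scrypt-helper"
file_match = 'server\.log'

[to_region_router]
type = "TcpOutput"
address = "heka-router.internal:5565"   # hypothetical regional router
message_matcher = "TRUE"
```

The regional router would carry a matching TcpInput plus an output pointed at the shared ElasticSearch cluster.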