User:Ffledgling/Senbonzakura
This service (I'm calling it Senbonzakura for now) will generate partial .mar (Mozilla ARchive) files for updates from Version A to Version B on demand.
Benefits
- Generate updates on the fly
- Generate updates on a need-only basis
- Separate the update mar generation process from the build process (speed up ze builds!)
- Greater flexibility in what update paths we need/want
Structure
Function Signature (?)
Input: URL for Cmar1, URL for Cmar2, Cmar1 Hash, Cmar2 Hash
Output: Pmar1-2
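The signature above could be sketched in Python roughly as follows; the class and field names here are placeholders, not a settled interface:

```python
from dataclasses import dataclass


@dataclass
class PartialRequest:
    """One request to Senbonzakura: build the partial taking Cmar1 -> Cmar2.

    All names are hypothetical; the hashes double as sanity checks on the
    downloads and as potential cache keys.
    """
    cmar1_url: str
    cmar2_url: str
    cmar1_hash: str
    cmar2_hash: str

    def identifier(self) -> str:
        # A hypothetical identity for the output Pmar, derived from the
        # hashes of the two input complete MARs.
        return f"{self.cmar1_hash}-{self.cmar2_hash}"
```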
Internally:
- Fetch the Cmars
  Use a resilient retry library here; verify hashes after download (sanity check).
- Cache the Cmars
  Where and how still needs to be decided, so ideally have abstraction functions for: storage of a Cmar, lookup of a Cmar based on its hash, and retrieval of a Cmar based on its hash.
- Determine which versions of the mar and mbsdiff tools to use, and use them
  These probably need to be cached as well, maybe based on their own version, maybe based on the Gecko version. Simply keep a function that decides which one to use and points you to the right one; use the one given by that function and assume the abstraction holds. We might have to cache these as well based on the version of the update paths we're given.
- Generate the partial mar file based on the input .mars and the chosen mar and mbsdiff tools
- Cache the generated partial mar file based on the update path, or on a combination of the hashes of the input mar files
  Where and how the partial mars are actually cached again depends on our caching strategy; we simply use our abstraction functions.
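The fetch-with-retry-and-verify step could look something like the sketch below. This is a minimal stand-in for the "resilient retry library" mentioned above; the parameter names and the choice of SHA-512 are assumptions, and `fetch` is injectable so the retry logic can be exercised without a network:

```python
import hashlib
import time
import urllib.request


def fetch_with_retry(url, expected_sha512, fetch=None, attempts=3, delay=1.0):
    """Download a Cmar, retrying on failure, and verify its hash.

    `fetch` defaults to a plain HTTP GET; pass a callable to substitute a
    different transport (or a test double).
    """
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).read()
    last_error = None
    for attempt in range(attempts):
        try:
            data = fetch(url)
            digest = hashlib.sha512(data).hexdigest()
            if digest != expected_sha512:
                raise ValueError(f"hash mismatch for {url}")
            return data  # sanity check passed
        except Exception as exc:
            last_error = exc
            time.sleep(delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

Treating a hash mismatch as just another retryable failure means a truncated or corrupted download gets re-fetched instead of poisoning the cache.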
API & Frontend
- have a web API that allows one to request partial mar generation between two given mar files
- have a GUI/webpage front end that does roughly the same thing
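The trigger endpoint could be sketched as a tiny WSGI app like the one below. The path, parameter names, and response shape are all placeholders, not a settled API; the real service would enqueue or perform generation where the stub just echoes back:

```python
import json
from urllib.parse import parse_qs


def app(environ, start_response):
    """Hypothetical sketch: GET /partial?mar_from=..&mar_to=..&hash_from=..&hash_to=..

    Validates the query parameters and acknowledges the request; the actual
    partial generation would be kicked off behind this.
    """
    params = parse_qs(environ.get("QUERY_STRING", ""))
    required = ("mar_from", "mar_to", "hash_from", "hash_to")
    if environ.get("PATH_INFO") != "/partial" or not all(k in params for k in required):
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [json.dumps({"error": "missing parameters"}).encode()]
    # Stub response: identify the update path by the two input hashes.
    body = {"status": "accepted",
            "update_path": f"{params['hash_from'][0]}-{params['hash_to'][0]}"}
    start_response("202 Accepted", [("Content-Type", "application/json")])
    return [json.dumps(body).encode()]
```

Returning 202 rather than 200 hints that generation may be asynchronous; the client would poll or be redirected to the finished partial.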
Scaling, Resilience and Caching
It is probably best to design for scalability, resilience and caching from the ground up, so things to keep in mind are:
- Retry, retry, retry
- Log more than enough to debug
- Have our application/service start up from a config file
- Do not trust your machine to store state; keep it on disk or in external storage?
- Abstraction, abstraction, abstraction?
When trying to combine scaling and caching, we need to think about how and where we'll store all our cached data:
- locally on each machine?
- S3?
How do we optimize caching? This will depend on the caching strategy we choose.
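The "abstraction functions" idea above can be made concrete with a small backend interface, so the rest of the service never knows whether cached mars live on local disk, in S3, or somewhere else. The class names and methods below are assumptions; only a local-disk backend is sketched:

```python
import os


class CacheBackend:
    """Abstract cache interface (hypothetical): store/look up blobs by key."""

    def put(self, key, data):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError

    def has(self, key):
        raise NotImplementedError


class LocalDiskCache(CacheBackend):
    """One possible backend: a flat directory of files named by cache key."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key)

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()

    def has(self, key):
        return os.path.exists(self._path(key))
```

An S3-backed class with the same three methods could then be swapped in by configuration without touching the generation code.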
Level 1 Caching
We simply cache partialMar.versionA.versionB
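Spelled out as code, the Level 1 key is just the two version strings joined under the naming scheme above (the helper name is hypothetical):

```python
def level1_cache_key(version_a, version_b):
    """Cache key for the partial taking version A to version B.

    Mirrors the partialMar.versionA.versionB scheme described above.
    """
    return f"partialMar.{version_a}.{version_b}"
```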
Deliverables
I do not have a concrete idea of the deliverables so everything below is subject to possibly radical change, but for now, this is what makes sense to me:
Prototype 0.1
The initial prototype will simply be a bunch of Python that takes the input MAR URLs, diffs them, and spits out the partial.
Prototype 0.2
The second prototype starts to add the caching functions, resilience logic, mar/mbsdiff tool versioning logic and generally attempts to map out the entire structure/flow of code.
We should probably have some ideas about the certs at this point as well.
Deliverable 1.0
Have all the basic services up and running with our partial mar (Level 1) caching in place. Ideally, try deployment on a machine in the cloud and let it run for a while to see how things go.
Deliverable 1.x
Change things around based on feedback from various team members, fine-tune the system, add requested features and, most importantly, iron out glitches and swat those bugs.
Signing and Certs
Still very hazy on how this plugs into the rest of the system, where it's needed, and how, if at all, it changes things.
Issues
- Catlee's partials on demand vs. nthomas's ... something else
- Signing explanation
- What do we do about the tool versioning?
People to contact
In no particular order:
- bhearsum
- catlee
- nthomas
- hwine