User:Ffledgling/Senbonzakura

This service (I'm calling it Senbonzakura for now) will generate partial .mar (Mozilla ARchive) files for updates from Version A to Version B on demand.

Benefits

  • Generate updates on the fly
  • Generate updates on a need-only basis
  • Separate the update mar generation process from the build process (speed up ze builds!)
  • Greater flexibility in what update paths we need/want

Structure

Function Signature (?)

Input: URL for Cmar1, URL for Cmar2, Cmar1 Hash, Cmar2 Hash
Output: Pmar1-2
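
A minimal sketch of what this might look like in Python (the function name and return convention are placeholders, not a settled interface):

<pre>
# Hypothetical signature; all names are placeholders.
def generate_partial_mar(cmar1_url, cmar2_url, cmar1_hash, cmar2_hash):
    """Fetch the two complete mars, verify their hashes, and return
    the path to the generated partial mar (Pmar1-2)."""
    raise NotImplementedError
</pre>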

Web API implementation

There are a few questions we might want to answer before we begin designing our API.

  • What kind of requests will be sent to the API?
  • Who will be sending them?
  • What will they look like? What format will they use?
    • HEAD: return pmar's meta information?
    • GET: retrieve pmar if it exists, else error?
    • POST: request pmar generation, given the input mar URLs and hashes
    • PUT: ?
    • PATCH: ?
    • DELETE: Delete mentioned update path? (do we want this?)
  • Do we need a separate admin API?
  • What might we need it for?
    • Cache control (invalidation, flush, changes?)
    • Starting/restarting service (or will this be done via ssh?)
    • ?
  • What kind of information is to be exposed by the API?
    • Available update paths?
    • ?

Resources:

  • http://blog.luisrei.com/articles/rest.html
  • http://blog.luisrei.com/articles/flaskrest.html
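
Since one of the linked resources covers Flask, here is a rough Flask sketch of the GET/HEAD/POST ideas above; the routes, field names, and helpers (lookup_pmar, trigger_generation) are assumptions, not a settled design:

<pre>
from flask import Flask, request, jsonify, abort

app = Flask(__name__)

# Hypothetical stubs standing in for the real cache and generator.
def lookup_pmar(identifier):
    return None

def trigger_generation(url1, url2, hash1, hash2):
    return '%s-%s' % (hash1[:8], hash2[:8])

@app.route('/partial/<identifier>', methods=['GET', 'HEAD'])
def get_partial(identifier):
    meta = lookup_pmar(identifier)
    if meta is None:
        abort(404)  # GET: error if the pmar does not exist
    # Flask serves HEAD as GET without a body, so this also covers
    # "HEAD: return pmar's meta information".
    return jsonify(meta)

@app.route('/partial', methods=['POST'])
def request_partial():
    # POST: request pmar generation, given the input mar URLs and hashes.
    data = request.get_json()
    identifier = trigger_generation(data['cmar1_url'], data['cmar2_url'],
                                    data['cmar1_hash'], data['cmar2_hash'])
    # 202 Accepted leaves room for asynchronous generation
    # (see the Pertinent Questions section).
    return jsonify({'identifier': identifier}), 202
</pre>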

Internally

  • Fetch the Cmars.
    Use a resilient retry library here.
  • Verify the hashes (sanity check).
  • Cache the Cmars.
    Where and how needs to be decided, so ideally have abstraction functions approximating: storage of a Cmar, lookup of a Cmar by its hash, and retrieval of a Cmar by its hash (see the sketch after this list).
  • Determine which versions of the mar and mbsdiff tools to use, and use them.
    These probably need to be cached as well, maybe keyed on their own version, maybe on the Gecko version; simply keep a function that decides which one to use and points you to the right one, and treat everything behind it as an abstraction.
    We might also have to cache these based on the versions in the update paths we're given.
  • Generate the partial mar file from the input mars using the chosen mar and mbsdiff tools.
  • Cache the generated partial mar file, keyed on the update path or on a combination of the hashes of the input mar files.
    Where and how the partial mars are actually cached again depends on our caching strategy; we simply use our abstraction functions.
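
As a rough illustration of the storage/lookup/retrieval abstraction above, here is a sketch where a flat local directory stands in for whatever backing store (local disk, S3, ...) is eventually chosen; all names and the location are placeholders:

<pre>
import os

CACHE_DIR = '/tmp/senbonzakura-cache'  # placeholder location

def cache_store(hash_, data):
    """Store a blob (e.g. a Cmar) keyed by its hash."""
    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    with open(os.path.join(CACHE_DIR, hash_), 'wb') as f:
        f.write(data)

def cache_lookup(hash_):
    """Return True if a blob with this hash is cached."""
    return os.path.exists(os.path.join(CACHE_DIR, hash_))

def cache_retrieve(hash_):
    """Return the cached blob for this hash, or None if absent."""
    path = os.path.join(CACHE_DIR, hash_)
    if not os.path.exists(path):
        return None
    with open(path, 'rb') as f:
        return f.read()
</pre>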

API & Frontend

  • have a web API that allows one to request partial mar generation between two given mar files. (Priority)
  • have a GUI/webpage front end that does roughly the same

Scaling, Resilience and Caching

It is probably best to design for scalability, resilience and caching from the ground up, so things to keep in mind are:

  • Retry, retry, retry.
  • Log more than enough to debug.
  • Have our application/service start up from a config file.
  • Do not trust your machine to store state; keep it on disk or in a file?
  • Abstraction, abstraction, abstraction?

When trying to combine scaling and caching, we need to think about how and where we'll store all our cached stuff:

  • locally on each machine?
  • S3?

How do we optimize caching? That will depend on the caching strategy.
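
In the "retry, retry, retry" spirit, a minimal retry-with-backoff helper might look like the following; requests is an assumed dependency, and the attempt count and delay are arbitrary:

<pre>
import time
import requests

def fetch_with_retries(url, attempts=5, delay=1.0):
    """GET url, retrying with exponential backoff before giving up."""
    for attempt in range(attempts):
        try:
            response = requests.get(url)
            response.raise_for_status()
            return response.content
        except requests.RequestException:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * 2 ** attempt)
</pre>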

Level 1 Caching/Storage

We simply store partialMar.versionA.versionB somewhere, perhaps centrally on an FTP server or on S3.
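
A direct translation of that naming scheme (the exact key format and backend are still open):

<pre>
def pmar_cache_key(version_a, version_b):
    """Key under which the partial mar for A -> B is stored."""
    return 'partialMar.%s.%s' % (version_a, version_b)

# e.g. pmar_cache_key('29.0', '30.0') == 'partialMar.29.0.30.0'
</pre>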

Level 2 Caching

A lot of the bigger stuff between releases, like the XUL libs on every platform, remains the same despite different locales; this locale-independent stuff should probably be cached and re-used. Since we plan to do things at the file level, we might also want to cache the diffs between the commonly used files to speed things up further. What kind of speedup will this give us? (Is this possible with the way our scripts currently work? I think it is; confirmation needed.)
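
One way this could work, sketched under the assumption that we diff file-by-file: cache each per-file diff keyed on the (source, destination) content hashes, so identical locale-independent files reuse a single diff. run_mbsdiff stands in for invoking the real mbsdiff tool, and SHA-512 is an arbitrary choice:

<pre>
import hashlib

def file_hash(path):
    """Hash of a file's contents."""
    with open(path, 'rb') as f:
        return hashlib.sha512(f.read()).hexdigest()

_diff_cache = {}  # (src_hash, dst_hash) -> diff bytes

def cached_file_diff(src_path, dst_path, run_mbsdiff):
    """Return the binary diff between two files, reusing a cached diff
    when the same (source, destination) pair was diffed before."""
    key = (file_hash(src_path), file_hash(dst_path))
    if key not in _diff_cache:
        _diff_cache[key] = run_mbsdiff(src_path, dst_path)
    return _diff_cache[key]
</pre>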

Signing and Certs

Still very hazy on how this plugs into the rest of the system, where it's needed, and how, if at all, it changes things. Feedback needed from catlee, nthomas, bhearsum.

Pertinent Questions

  • does the client require the request to be synchronous or asynchronous?
  • does the client require any progress information?
  • will any client need to ask if the partial mar already exists?
  • how will cache maintenance/invalidation be handled? (same api, admin api, cli, scripts, docs?)
  • what type of docs are planned?

Issues

  1. Catlee's partials on demand vs. nthomas's ... something else (https://bugzilla.mozilla.org/show_bug.cgi?id=770995#c0)
  2. Signing explanation
  3. What do we do about the tool versioning?

Deliverables

I do not have a concrete idea of the deliverables, so everything below is subject to possibly radical change, but for now, this is what makes sense to me:

Prototype 0.1

The initial prototype will simply be a bunch of Python that takes the input MAR URLs, diffs them, and spits out the result.

Prototype 0.2

The second prototype starts to add the caching functions, resilience logic, and mar/mbsdiff tool versioning logic, and generally attempts to map out the entire structure/flow of the code.
We should probably have some ideas about the certs at this point as well.

Deliverable 1.0

Have all the basic services up and running with our partial mar (Level 1) caching in place; we should ideally try deployment on a machine in the cloud and let it run for a bit to see how things go.

Deliverable 1.x

Change things around based on feedback from various team members, fine-tune the system, add requested features, and, most importantly, iron out glitches and swat those bugs.

Unit Tests

Unit-Test as much code as possible
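
A token example in this spirit, exercising the hypothetical pmar_cache_key helper sketched in the Level 1 caching section (redefined here so the test is self-contained):

<pre>
import unittest

def pmar_cache_key(version_a, version_b):  # from the Level 1 sketch
    return 'partialMar.%s.%s' % (version_a, version_b)

class TestCacheKeys(unittest.TestCase):
    def test_pmar_cache_key(self):
        self.assertEqual(pmar_cache_key('29.0', '30.0'),
                         'partialMar.29.0.30.0')

if __name__ == '__main__':
    unittest.main()
</pre>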

Docs

Keep documenting stuff being done

Environment

What's required for:

  • Dev environment
  • Deployment/production

Possible stuff at the moment:

  1. Python
  2. pip
  3. virtualenv

People to contact

In no particular order:

  1. bhearsum
  2. catlee
  3. nthomas
  4. hwine

Related Bug #s