User:Ffledgling/Senbonzakura
This service (I'm calling it [http://en.wikipedia.org/wiki/Byakuya_Kuchiki#Senbonzakura Senbonzakura] for now, or 'SBZ' for those who prefer TL;DRing everything) will generate partial MAR (Mozilla ARchive) files for updates from Version A to Version B of Firefox on demand.


== Benefits ==


* Generate updates on the fly.
* Generate updates on a need-only basis.
* Separate the update MAR generation process from the build process (speed up ze builds!).
* Update generation as a service rather than a step that simply happens during the build process. This makes the updates available to a wider audience, although the consequences of doing so are a little unclear at the moment.
* Helps transition older/no-longer-supported Firefox versions to newer Firefox versions, without adding delays and/or adding to compute time during the build process.
* Greater flexibility in what update paths we need/want.
 
== Open Issues ==
 
This is a list of 'issues' that have no definite solution at the moment but are important in some way or another, and thus need to be noted.
 
* Figure out tool versioning.
* Integration with Releng API (need to talk to dustin after we have a concrete prototype)
* Parallelizing the MAR build process further by using separate celery workers or subprocess calls to fetch the MARs and do diffs on larger files (ref: Level 2 caching)
* Do we need end-to-end testing? Mozmill has a suite of tests called [http://hg.mozilla.org/qa/mozmill-tests/file/tip/firefox/tests/update/testDirectUpdate/ update tests] that apply a MAR and check if the update applied correctly; can we/do we want to use this to test our prototype? How is QA affected when we change the way we generate our updates? Can they still test whether Firefox updates correctly? We might want to talk to Henrik (:whimboo) or Clint (:ctalbert) eventually. (See the conversation snippet at the end.)
* What do we want to use for our Caching layer? Why is X better/preferred over Y?
* There seems to be some confusion about whether all the required tooling will be available somewhere (even in-tree) for some of the older Firefox versions (talk to bhearsum & catlee)
* Other open issues?
 
=== Pertinent Questions ===
 
Subset of Open Issues; using this as a scratchpad to note down issues and later polish them and move them up to the Open Issues section.
 
* does the client require the request to be synchronous or asynchronous?
* does the client require any progress information?
* will any client need to ask if the partial mar already exists?
* how will cache maintenance/invalidation be handled? (same api, admin api, cli, scripts, docs?)
* what type of docs are planned?


== Structure ==


=== Service Signature ===


Input: URL for CompleteMAR1, URL for CompleteMAR2, CompleteMAR1 Hash, CompleteMAR2 Hash

Output: PartialMAR1-2 (available to the user/client in some form)
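Restated as a (hypothetical) Python stub, just to pin down the inputs and outputs; the function name and return value are illustrative, not actual project code:

<pre>
def generate_partial_mar(mar_from_url, mar_to_url, mar_from_hash, mar_to_hash):
    """Fetch the two complete MARs, verify them against the given hashes,
    and return some handle (a cache identifier or URL) to the partial MAR
    that updates the first to the second."""
</pre>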


=== Web API implementation ===


# '''GET'''


Sent to <code>/partial/&lt;identifier&gt;</code>, where <code>identifier</code> is a valid identifier returned by a POST request to the <code>/partial/</code> endpoint.


<ol start="2" style="list-style-type: decimal;">
<li>'''POST'''</li></ol>


Sent to <code>/partial/</code>. The POST request is used to request the generation of a partial MAR file to update from a source MAR to a destination MAR.

The parameters that need to be passed in as part of the POST request are:

<pre>  mar_from      : HTTP URL to the complete source MAR file
  mar_to        : HTTP URL to the complete destination MAR file
  mar_from_hash : MD5 hash of the source MAR file
  mar_to_hash   : MD5 hash of the destination MAR file</pre>

Internally, handling such a request roughly involves:

* Fetching the CompleteMARs (use a resilient retry library here).
* Verifying the hashes (sanity check).
* Caching the CompleteMARs. Where and how needs to be decided, so ideally have functions approximating storage of a CompleteMAR, lookup of a CompleteMAR based on its hash, and retrieval of a CompleteMAR based on its hash.
* Determining which versions of the <code>mar</code> and <code>mbsdiff</code> tools to use, and using them. These probably need to be cached as well, maybe based on their own version, maybe based on the Gecko version; simply keep a function that decides which one to use and points you to the right one, and assume the abstraction. We might have to cache these based on the versions in the update paths we're given.
* Generating the partial MAR file based on the input MARs and the chosen <code>mar</code> and <code>mbsdiff</code> tools.
* Caching the generated partial MAR file based on the update path, or on a combination of the hashes of the input MAR files. Where and how the partial MARs are actually cached again depends on our caching strategy; we simply use our abstraction functions.
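To make the shape of the API concrete, here is a minimal sketch of the two routes described above, assuming Flask (which the service already uses) and an in-memory job table; the handler names, the <code>JOBS</code> dict and the identifier scheme are illustrative assumptions, not the actual implementation:

<pre>
import hashlib

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Stand-in for the SQL database / caching layer used by the real service.
JOBS = {}

@app.route('/partial/', methods=['POST'])
def request_partial():
    params = ('mar_from', 'mar_to', 'mar_from_hash', 'mar_to_hash')
    if not all(request.form.get(p) for p in params):
        abort(400, 'missing or empty parameter')

    # Identify the requested update path by the two input hashes.
    identifier = hashlib.md5(
        (request.form['mar_from_hash'] + request.form['mar_to_hash']).encode()
    ).hexdigest()

    if identifier not in JOBS:
        JOBS[identifier] = 'pending'
        # The real service would enqueue a Celery task here, e.g.
        # tasks.build_partial_mar.delay(**request.form)
    return jsonify({'identifier': identifier}), 202

@app.route('/partial/<identifier>', methods=['GET'])
def get_partial(identifier):
    if identifier not in JOBS:
        abort(404)
    # Once generation completes, the real service would hand back the
    # cached partial MAR (or a URL to it) instead of just the status.
    return jsonify({'identifier': identifier, 'status': JOBS[identifier]})
</pre>

A client would POST the four parameters, receive an identifier, and then poll <code>GET /partial/&lt;identifier&gt;</code> until the partial MAR is available.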
Although the API has been concretized to some extent, it is still subject to change based on one or more of the following factors:


* The kind of requests that will be sent to the API
* The Client and/or the target audience using the API. Possible clients include:
** Balrog
** Developers
** QA
* Requirements for usage of the other HTTP verbs
** HEAD : Return a partial MAR's meta information?
** PUT : What would we want to 'PUT' on our servers?
** PATCH : What would we want to PATCH on our servers? (Monkey-patching code? Probably not a good idea.)
** DELETE : Delete/stop serving a partial MAR from Senbonzakura
* The need for a separate Admin API. The admin API would theoretically allow an authenticated entity known as the 'Admin' to perform administrative tasks on the service, such as but not limited to:
** Restarting partial MAR builds that have aborted, stalled or failed for some reason
** Starting/restarting the service (or will this be done via ssh?)
** Control over the cache, such as:
*** Flushing/invalidating the cache
*** Removing selected files
** Resetting the entire database
* Extra information that the API might need to expose, e.g. available update paths.


Resources regarding REST API design:

* http://blog.luisrei.com/articles/rest.html
* http://blog.luisrei.com/articles/flaskrest.html
 
== Caching ==
 
This service will be dealing with and generating a lot of files. It therefore makes sense to have an underlying caching layer that stores the generated and downloaded files/tools.
 
The caching layer can be implemented in a number of ways, some of the initial ideas being:

* As storage on Amazon S3
* As a shared NFS file-system
* Local storage on the nodes (probably not the best way)
 
There are certain requirements imposed on the caching layer, and more might be added as the requirements clear up. Some of these requirements are as follows (a rough interface sketch follows the list):
 
* Must be agnostic to the file type being stored in the cache.
* Accessing the cache must be much faster than fetching the files via a direct download.
* The caching layer should provide an identifier that can be used to uniquely identify and reference the files in the cache.
* The caching layer should ideally have fast read, write and lookup, but in a toss-up among the three, lookup and read need to be the faster operations (they will ideally be used much more than anything else).
* OPTIONAL: A method to access files via the identifier over the network, so that clients/users can directly access the files in the cache without Senbonzakura acting as a middleman.
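As a rough illustration of these requirements, the abstraction might look something like the stub below (method names are assumptions; <code>cache.py</code> is currently just prototypes):

<pre>
class Cache(object):
    """File-type-agnostic cache keyed by a unique identifier."""

    def save(self, identifier, path):
        """Store the file at `path` under `identifier`."""
        raise NotImplementedError

    def exists(self, identifier):
        """Fast lookup: is anything stored under `identifier`?"""
        raise NotImplementedError

    def retrieve(self, identifier, dest):
        """Copy the file stored under `identifier` to `dest`."""
        raise NotImplementedError

    def url_for(self, identifier):
        """OPTIONAL: network-accessible location of the cached file, so
        clients can fetch it without Senbonzakura as a middleman."""
        raise NotImplementedError
</pre>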
 
There are two levels of caching that are planned for this service, detailed as follows:
 
=== Level 0 ===
 
This level simply keeps track of the downloaded files and their hashes on the worker's local file system. This cache is not persistent and is not meant to be; it is simply a cache that exists for convenience.
 
This cache level has not been stubbed out yet and may or may not make it into the service. ''Requires discussion.''


=== Level 1 Caching ===


This level does caching at the MAR level. Downloaded complete MARs are cached to save bandwidth and improve speed during the partial MAR generation phase.


Partial MARs are stored in the cache after generation and are returned after a cache lookup when requested by the client.


Each file is identified by a unique identifier, which at the moment is the MD5 hash of the file, for lack of a better function.
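For illustration, computing such an identifier is straightforward (cf. <code>csum.py</code>; a sketch, assuming MD5 as stated above):

<pre>
import hashlib

def file_identifier(path, blocksize=2**20):
    """MD5 hex digest of a file, read in blocks to handle large MARs."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(blocksize), b''):
            md5.update(block)
    return md5.hexdigest()
</pre>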


=== Level 2 Caching ===


A lot of the bigger stuff between releases, like the XUL libs on every platform, remains the same across locales. This locale-independent stuff should probably be cached and re-used; this level will cache the files inside the different MAR versions.
 
The idea is to avoid re-doing work that has already been done, either by caching diffs between files or by recognizing files that don't need to be diff'd at all.
 
Take the XUL binary as an example: it is an extremely large binary that accounts for a very large chunk of the total time it takes to generate a partial MAR. If we can recognize that the XUL binary has not changed, we can skip the binary diff'ing step, which should theoretically save a lot of compute time and resources. Caching the binary diff between two different XUL binaries is also worthwhile: that diff is likely to be common across all Firefox version updates regardless of locale, so once we have it, it should speed up partial MAR generation, as long as we can recognize the duplicated effort.
 
The actual recognition logic will be separate from the caching layer, and ideally a part of the partial MAR generation/diff'ing service.
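A minimal sketch of what that recognition logic could look like, reusing the <code>Cache</code> stub from above; the helper name and diff-key scheme are assumptions:

<pre>
import subprocess

def file_patch(src, dst, src_hash, dst_hash, cache, mbsdiff='mbsdiff'):
    """Return a patch file taking `src` to `dst`, reusing cached work."""
    if src_hash == dst_hash:
        return None  # unchanged file (e.g. the XUL binary): skip the diff

    patch = 'diff-%s-%s.patch' % (src_hash, dst_hash)
    if cache.exists(patch):
        # Same file pair already diff'd for another locale/update path.
        cache.retrieve(patch, patch)
    else:
        subprocess.check_call([mbsdiff, src, dst, patch])
        cache.save(patch, patch)
    return patch
</pre>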
 
== Implementation details ==
 
=== Dependencies ===
 
Nearly everything the application uses is pip-installable, but the host machine must provide a few things that might not be. The known ones are:
 
* RabbitMQ (or any kind of message queue to be used by Celery)
* Virtualenv
* Python 2.7
 
=== File Structure ===
 
# <code>api.py</code><br /> This file contains all the Flask related code for routing and handling the API call parameters.
# <code>cache.py</code><br /> This is currently a stub file that contains function prototypes for the caching layer.
# <code>core.py</code><br /> This file contains all the core logic for Building and generating MARs.
# <code>csum.py</code><br /> This file contains checksum calculation and verification functions, mostly just a convenient wrapper over Python's built-in hashlib.
# <code>db.py</code><br /> This file contains the database utilities, in essence wrapper functions that make Insert, Update and Search operations on the database more convenient to use.
# <code>db_classes.py</code><br /> Defines the database schema and provides other convenient exceptions and Enum-style dicts for status codes. Used directly only by <code>db.py</code>.
# <code>fetch.py</code><br /> This file contains methods to download/fetch files given a URI.
# <code>flasktask.py</code><br /> Contains a class for Flask and Celery integration, but isn't actually used at the moment. Might be removed.
# <code>tasks.py</code><br /> This file contains wrappers that call the core functions from <code>core.py</code> for Celery (see the sketch below).
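For illustration, the <code>tasks.py</code> pattern boils down to something like this (a sketch; the task name, options and the <code>core</code> function called are assumptions, not the actual code):

<pre>
from celery import Celery

import core  # the project's core.py

app = Celery('senbonzakura', broker='amqp://localhost//')  # RabbitMQ

@app.task(bind=True, max_retries=3)
def build_partial_mar(self, mar_from, mar_to, mar_from_hash, mar_to_hash):
    """Thin Celery wrapper around the core partial MAR build."""
    try:
        return core.build_partial_mar(mar_from, mar_to,
                                      mar_from_hash, mar_to_hash)
    except Exception as exc:
        # Retry transient failures instead of losing the build.
        raise self.retry(exc=exc)
</pre>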
 
=== Known Issues: ===
 
* DB Errors are handled very poorly at the moment.
* Parameter validation in the Flask API as well as in the DB wrapper functions is poor; invalid parameters or empty strings slip through.
* The <code>unwrap_full_update.pl</code> and <code>make_incremental_update.sh</code> scripts are known to require <code>chmod +x</code> or the equivalent, otherwise the subprocess calls fail with a PermissionDenied error.
* The DB doesn't seem to handle repeat triggers very well; something needs to be improved in that portion of the code.
 
=== Things to take care about: ===
 
Use a resilient retry library while fetching (bhearsum's redo is a good one to look at)
 
Catching exceptions and raising the correct exceptions at different parts in the code. Currently a lot of places have a commented-out <code>raise</code>; these need actual custom exceptions that are raised instead. These and other exceptions need to be caught and handled properly, so that the build does not fail midway, and if it does, there's enough traceback or logging to debug.
 
Replace all the print statements with logging statements and LOG ALL THE THINGS ~!
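A small sketch of the exception-plus-logging pattern being suggested here (the exception and helper names are placeholders, not existing code):

<pre>
import logging

log = logging.getLogger('senbonzakura')

class PartialMARError(Exception):
    """Base class for errors during partial MAR generation."""

class ToolNotFoundError(PartialMARError):
    """Raised when no suitable mar/mbsdiff tools can be found."""

def run_step(step, *args):
    """Run one build step, logging a full traceback before re-raising."""
    try:
        return step(*args)
    except Exception:
        log.exception('step %s failed', getattr(step, '__name__', step))
        raise
</pre>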
 
Unit-test ALL of teh things!
 
== Tooling ==
 
We need to figure out which tools to use with any given combination of CompleteMAR files. There are at least three different versions of these tools, and there is no central location for them.
 
Tools also fall into two categories:
 
# The partial mar generation scripts.
# The <code>mar</code> and <code>mbsdiff</code> binaries.
 
These live in separate locations and it might be in our best interest to consolidate them.
 
To be able to decide which tools to use with the targeted version of Firefox, we need to figure out a Tool Version → FF Version mapping. To the best of my knowledge, and based on feedback from Ben and Catlee, such a mapping does not exist at the moment and will need to be built as part of the project going forward.
 
How do we handle fetching/building/using the tools? Issues:

* Tools like <code>mar</code> and <code>mbsdiff</code> are built as part of a Firefox build. Their source code exists in Mozilla Central, but the compiled binaries are built as part of the build and available on FTP.m.o after a build has completed. Do we pull the source in and compile them? Do we keep pre-compiled versions at hand?
* To move to a central repo or not to move to a central repo, that is the question.
* As noted above, versioning.
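Hypothetical sketch of the Tool Version → FF Version mapping discussed above; no such mapping exists yet, so the version ranges and bundle names below are made up purely for illustration:

<pre>
# (inclusive range of Firefox major versions) -> tools bundle to use
TOOLS_BY_FF_VERSION = [
    ((1, 29),   'mar-tools-old'),   # made-up range and bundle name
    ((30, 999), 'mar-tools-new'),   # made-up range and bundle name
]

def tools_for(ff_version):
    """Pick the mar/mbsdiff bundle for a given Firefox version string."""
    major = int(ff_version.split('.')[0])
    for (lo, hi), bundle in TOOLS_BY_FF_VERSION:
        if lo <= major <= hi:
            return bundle
    raise ValueError('no known tools for Firefox %s' % ff_version)
</pre>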
 
== Note on Scaling, Resilience and Caching ==
 
It is probably best to design for scalability, resilience and caching from the ground up so things to keep in mind are:
 
* Retry retry retry
* Log more than enough to debug (See Things to care about above)
* Have our application/service start up from a config file
* <s>Do not trust your machine to store state, keep it on disk or on file?</s><br /> We now use an SQL database to do this.
* abstraction abstraction abstraction?
 
How do we optimize our caching? It will depend on caching strategy and underlying caching layer in use.


== Signing and Certs ==

Still very hazy on how this plugs into the rest of the system, where it's needed, and how, if at all, it changes things. Feedback needed from catlee, nthomas, bhearsum.


== Issues ==


# Catlee's partials-on-demand vs. nthomas's [https://bugzilla.mozilla.org/show_bug.cgi?id=770995#c0 something else]
# Signing explanation
# What do we do about the tool versioning?
== Implementation questions ==
* Will we have mar installed?
* How do we handle multiple mar versions?
* Where do we put them?
* Does it make sense to modify the script? Probably not, because we have no control over the older scripts
* How do we fetch the tools? Just the ones we need without cloning all of MC.


== Deliverables ==
Change things around based on feedback from various team members, fine-tune the system, add requested features and, most importantly, iron out glitches and swat those bugs.


== Unit Tests ==


Unit-test as much code as possible.


== Docs ==


Keep documenting stuff being done.<br /> Using this wiki for general documentation purposes.<br /> Use Sphinx for API-level documentation.


== Repository ==

We have a GitHub repo: [https://github.com/ffledgling/Senbonzakura/ Senbonzakura]


== People to contact ==
* https://bugzilla.mozilla.org/show_bug.cgi?id=941949
* https://bugzilla.mozilla.org/show_bug.cgi?id=797033
== Relevant Links ==
* https://developer.mozilla.org/en-US/docs/Mozilla/Projects/XULRunner/Application_Update#Update_Server
== IRC Conversation snippets ==
Conversation with Henrik re: Browser update testing
<pre>22:15 < ffledgling> I was wondering if it's possible to use mozmill to test browser updates
22:15 < ffledgling> but with custom MAR files and make sure they applied correctly?
22:16 <@whimboo> browser updates? thats something we are doing for a long time
22:16 < ffledgling> I think I found some tests that do what I want with the actual updation from offical servers -- http://hg.mozilla.org/qa/mozmill-tests/file/tip/firefox/tests/update/testDirectUpdate/
22:16 <@whimboo> the only thing you would have to do is to set the right update url
22:16 < ffledgling> whimboo: yes, but I want to use a custom MAR
22:16 <@whimboo> right
22:17 < ffledgling> ah, can you point me to how I can configure that?
22:17 <@whimboo> as said you would have to modify the update server url
22:17 <@whimboo> app.update.url
22:18 <@whimboo> just change that pref and ensure to send correct update snippets</pre>