Balrog

From MozillaWiki
Revision as of 00:21, 29 September 2015 by Maybe (talk | contribs) (→‎Filing Bugs: escape space in link)

Overview

Balrog is the software that runs the server-side component of the update system used by Firefox and other Mozilla products. It is the successor to AUS (Application Update Service), which did not scale to our current needs or allow us to adapt to more recent business requirements. Balrog helps us ship updates faster and with much more flexibility than we've had in the past.

Filing Bugs

Bugs and feature requests should be filed in the Backend or Frontend components of Bugzilla.

Database Model

Balrog's model centres around two concepts: Rules and Releases. When an update request from an application is received, it is matched against the rules. The rule that is chosen contains a pointer to a Release, which contains all of the metadata needed to construct a proper update response. Rules and Releases are described in greater detail below:

Rules

The most important part of Balrog to understand is its rules. When a request comes in, it is matched against each of Balrog's rules to find the one that best suits it (more on this below). Once found, Balrog looks at that rule's "mapping", which points at a release that has the required information to serve an update back to the client. Without any rules, Balrog will never serve an update. With badly configured rules, Balrog could do bad things like serve Firefox updates to B2G devices.

In addition to the information found in the request, each rule also has a priority, which allows us to override updates for specific things while letting the rest "fall down" to a more general rule. For example, we block unsupported OS versions with a rule of the highest priority but continue to serve updates to users of other OSes with a rule of slightly lower priority. The ability to override and fall back by setting the priority is one of the reasons Balrog is so flexible, and a key way that it distinguishes itself from its predecessor.

What's in a rule?

Each rule has quite a lot of columns, but they all fall into one of the buckets below:

  • Matchable - these correspond to information provided in the update request, and are used to filter out rules that don't apply to the request.
  • Decision - these are also used to filter rules, but do not correspond to information in the request.
  • Response - these contain information that ends up in the response
  • Info - informational columns, not used as part of serving updates

Individual columns are detailed in the table below:

Attribute | Category | Description | Matching logic | Examples
Product | Matchable | The name of the application requesting an update. | Exact string match only | "Firefox" or "B2G"
Version | Matchable | The version of the application requesting an update. | Exact string match or operator plus version to compare the incoming one against | "36.0" or ">=38.0a1"
Channel | Matchable | The update channel of the application requesting an update. | Exact string match or a string with "*" character to glob | "nightly" or "beta*"
buildTarget | Matchable | The "build target" of the application requesting an update. This is usually related to the target platform the app was built for. | Exact string match only | "Darwin_x86_64-gcc3-u-i386-x86_64" or "flame-kk-userdebug"
buildID | Matchable | The build ID of the application requesting an update. | Exact string match or operator plus buildid to compare the incoming one against | "201410010830" or "<201512010830"
Locale | Matchable | The locale of the application requesting an update. | Exact string match or comma separated list of locales to do an exact match on | "de" or "en-US,en-GB,id"
osVersion | Matchable | The OS Version of the application requesting an update. This field is primarily used to point desupported operating systems to their last supported build. | Partial string match or comma separated list of partial strings to match on | "Windows_NT 5.0" or "Darwin 6,Darwin 7,Darwin 8"
distribution | Matchable | The partner distribution name of the application requesting an update or "default" if the application is not a partner build. | Exact string match only | "default" or "yahoo"
distVersion | Matchable | The version of the partner distribution of the application requesting an update or "default" if the application is not a partner build. | Exact string match only | "default" or "1.19"
headerArchitecture | Matchable | The architecture of the OS of the client as guessed based on build target. This field is mostly deprecated now that this information is included in the build target. | Exact string match only | "PPC" and "Intel" are the only possible values
Priority | Decision | The priority of the rule, relative to other rules. If multiple rules match an incoming request based on the Matchable columns, the rule with the highest priority is chosen. | N/A | Any number, by convention positive integers.
backgroundRate | Decision | The percentage of background update requests that should receive an update if they match this rule. Generally this is used as a throttle to increase or decrease the rate at which the majority of users receive an update. | N/A | Any number 0 to 100.
Mapping | Response | The Release to construct an update out of. This is a foreign key to the "name" column of the Releases table. | N/A | Any valid release name, or NULL.
update_type | Response | The update_type to use in the XML response. It's very rare for a rule to use anything other than "minor" these days. | N/A | "minor" or "major"
id | Info | The id of the rule. This id is necessary to make changes to the rule through the REST API. | N/A | Autoincrementing integer
Comment | Info | A string describing the purpose of the rule. Not always necessary for obvious rules. | N/A | Any string

How are requests matched up to rules?

The parts of an incoming request map directly onto parts of the request URL. For example, most update requests use a URL in the following format:

/update/3/<product>/<version>/<buildID>/<buildTarget>/<locale>/<channel>/<osVersion>/<distribution>/<distVersion>/update.xml?force=1 # force can also be left off
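
As a rough illustration of how the URL parts line up with the Matchable columns, the sketch below splits an update path into named fields. The function name parse_update_path and the dictionary it returns are assumptions made for this example; this is not Balrog's own request handling code.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Field order taken from the version 3 URL format shown above.
FIELDS = ["product", "version", "buildID", "buildTarget", "locale",
          "channel", "osVersion", "distribution", "distVersion"]


def parse_update_path(url):
    """Split a /update/3/... URL into a dict of request fields (hypothetical helper)."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: "update", "3", the nine fields, then "update.xml".
    well_formed = (len(segments) == len(FIELDS) + 3
                   and segments[:2] == ["update", "3"]
                   and segments[-1] == "update.xml")
    if not well_formed:
        raise ValueError("not a version 3 update URL: %r" % url)
    # Values such as the OS version may be percent-encoded in the URL.
    request = {name: unquote(value) for name, value in zip(FIELDS, segments[2:-1])}
    # "force" is optional; it is treated as set only when it is exactly "1".
    request["force"] = parse_qs(parts.query).get("force", ["0"])[0] == "1"
    return request
```

The returned dictionary uses the same names as the rule columns, which keeps the comparison against a rule a simple per-column lookup.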

The following logic is used to figure out which rule an update matches and what to respond with:

  1. If a rule specifies one of these fields and a request's field doesn't match it, the rule is considered not to be a match and is ignored for that request. See above for details on how specific columns perform matching. Of the rules that remain, the one with the highest priority is selected.
  2. If "force" wasn't specified, the backgroundRate of the selected rule is looked at.
  3. If we still choose to serve an update after accounting for backgroundRate, we look at the rule's mapping. This is a foreign key that points at an entry in the releases table. That row has most of the information we need to construct the update.
  4. Using the update_type and release that the mapping points to, construct and return an XML response with the details of the update for the client.
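
The steps above can be sketched in a few dozen lines of Python. This is a simplified illustration built only from the column descriptions on this page, not Balrog's implementation: rules and requests are plain dictionaries, all function names are made up for the example, version comparison looks only at the leading dotted numbers (so "38.0a1" is treated as 38.0), and a request that is throttled by backgroundRate is assumed to get no update at all.

```python
import fnmatch
import operator
import random
import re

MATCHABLE = {"product", "version", "channel", "buildTarget", "buildID", "locale",
             "osVersion", "distribution", "distVersion", "headerArchitecture"}
OPERATORS = [(">=", operator.ge), ("<=", operator.le), (">", operator.gt), ("<", operator.lt)]


def split_operator(pattern):
    """Return (comparison, operand); a pattern without an operator means equality."""
    for symbol, compare in OPERATORS:  # two-character operators are tried first
        if pattern.startswith(symbol):
            return compare, pattern[len(symbol):]
    return operator.eq, pattern


def version_key(version):
    # Simplification: only the leading dotted numbers count, so "38.0a1" sorts as 38.0.
    return tuple(int(n) for n in re.match(r"\d+(?:\.\d+)*", version).group(0).split("."))


def column_matches(column, wanted, got):
    """Apply the matching logic of one Matchable column."""
    if column in ("version", "buildID"):
        compare, operand = split_operator(wanted)
        if compare is operator.eq:
            return got == operand
        key = version_key if column == "version" else int
        return compare(key(got), key(operand))
    if column == "channel":
        return fnmatch.fnmatchcase(got, wanted)  # "*" globs, e.g. "beta*"
    if column == "locale":
        return got in wanted.split(",")
    if column == "osVersion":
        return any(part in got for part in wanted.split(","))
    return got == wanted  # every other Matchable column is an exact match


def rule_matches(rule, request):
    """A rule matches when every Matchable column it sets accepts the request."""
    return all(column_matches(column, wanted, request.get(column, ""))
               for column, wanted in rule.items()
               if column in MATCHABLE and wanted is not None)


def choose_rule(rules, request, rand=random.random):
    """Return the rule to serve from, or None when no update should be served."""
    candidates = [rule for rule in rules if rule_matches(rule, request)]
    if not candidates:
        return None
    rule = max(candidates, key=lambda r: r["priority"])
    # backgroundRate throttles background checks only; force=1 skips it.
    if not request.get("force") and rand() * 100 >= rule["backgroundRate"]:
        return None
    return rule
```

The rand parameter exists only so the throttle can be exercised deterministically, for example choose_rule(rules, request, rand=lambda: 0.5). The mapping and update_type of the returned rule would then be used to build the XML response, which is left out here.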

Releases

needs fleshing out
To Balrog, a "release" is data about a related set of builds. This does _not_ match up with the concept of a "release" being on the "beta", "release" or "esr" channel elsewhere. In Balrog, each set of nightlies on any branch is considered a release.

While there's no enforced format on release names, there are a few conventions that we use:

  • Nightly-style builds submit to releases named by product and branch. Each nightly generally submits to two different releases, one "dated" (eg: Firefox-mozilla-central-nightly-20150513010203) and one "latest" (eg: Firefox-mozilla-central-nightly-latest).
  • Release-style builds submit to releases named by product, version number, and build number, eg: Firefox-38.0-build1
  • GMP blobs are created by hand and generally named with the version of each plugin they contain in the name, eg: GMP-20150423-CDM-v4-OpenH264-v1.4
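
To make the first two conventions concrete, here is a small sketch that builds release names from their parts. The helper names are hypothetical and exist only for this example; Balrog itself does not enforce these formats.

```python
def nightly_release_names(product, branch, build_id):
    """Return the (dated, latest) release names a nightly build submits to."""
    base = "%s-%s-nightly" % (product, branch)
    return "%s-%s" % (base, build_id), "%s-latest" % base


def release_name(product, version, build_number):
    """Return the name used for a release-style build."""
    return "%s-%s-build%d" % (product, version, build_number)


dated, latest = nightly_release_names("Firefox", "mozilla-central", "20150513010203")
print(dated)
print(latest)
print(release_name("Firefox", "38.0", 1))
```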

Permissions

The permissions table is a simple list of usernames and the ACLs that they have. A user could be an "admin", giving them write access to everything, or could have one or more specific permissions. For example, our "ffxbld" system account has access to make PUT requests to add data to "Firefox" or "Fennec". These specific ACLs let us do things such as give B2G folks access to Balrog without the risk of them or their tools accidentally messing up Firefox updates.
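
A minimal sketch of how such a check might work is shown below. Everything in it is an assumption made for illustration: the in-memory PERMISSIONS dictionary stands in for the real table, and the permission name "release", the option keys "methods" and "products", and the user "alice" are invented; the actual schema is not described on this page.

```python
# Hypothetical stand-in for the permissions table: username -> {permission: options}.
PERMISSIONS = {
    "alice": {"admin": None},
    "ffxbld": {"release": {"methods": ["PUT"], "products": ["Firefox", "Fennec"]}},
}


def has_permission(username, permission, method, product):
    """Return True if the user may make this kind of request for this product."""
    acls = PERMISSIONS.get(username, {})
    if "admin" in acls:
        return True  # admins have write access to everything
    if permission not in acls:
        return False
    options = acls[permission] or {}
    # An option that is absent is treated as "no restriction" in this sketch.
    if "methods" in options and method not in options["methods"]:
        return False
    if "products" in options and product not in options["products"]:
        return False
    return True
```

With this data, has_permission("ffxbld", "release", "PUT", "Firefox") is allowed while the same request for "B2G" is refused, which is the kind of separation described above.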

History Tables

Change attribution and recording is embedded deeply into Balrog. The rules, releases, and permissions tables all have a corresponding history table that records the time a change was made and who made it. This allows us to look back in time when debugging issues, attribute changes to people (aka blame), and quickly roll back bad changes.
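
The idea can be illustrated with a toy example using an in-memory SQLite database. The table and column names below (rules_history, changed_by, change_id and so on) are assumptions chosen for the sketch and are not taken from Balrog's schema; the point is only that every write also appends a history row in the same transaction, and that rolling back is itself recorded as a new change.

```python
import sqlite3
import time


def connect():
    """Create a toy rules table and its history table in memory."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE rules (rule_id INTEGER PRIMARY KEY, mapping TEXT)")
    db.execute("CREATE TABLE rules_history ("
               "change_id INTEGER PRIMARY KEY AUTOINCREMENT, "
               "changed_by TEXT, timestamp INTEGER, rule_id INTEGER, mapping TEXT)")
    return db


def set_mapping(db, rule_id, mapping, changed_by):
    """Write the new value and its history row in one transaction."""
    with db:  # commits on success, rolls back if either statement fails
        db.execute("INSERT OR REPLACE INTO rules (rule_id, mapping) VALUES (?, ?)",
                   (rule_id, mapping))
        db.execute("INSERT INTO rules_history (changed_by, timestamp, rule_id, mapping) "
                   "VALUES (?, ?, ?, ?)",
                   (changed_by, int(time.time() * 1000), rule_id, mapping))


def roll_back(db, rule_id, changed_by):
    """Restore the value recorded before the most recent change."""
    rows = db.execute("SELECT mapping FROM rules_history WHERE rule_id = ? "
                      "ORDER BY change_id DESC LIMIT 2", (rule_id,)).fetchall()
    if len(rows) < 2:
        raise ValueError("no earlier version to roll back to")
    set_mapping(db, rule_id, rows[1][0], changed_by)
```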

Admin UI Use Cases

Locking/Unlocking Nightlies

One of the most common uses for the Balrog UI is to lock a nightly update channel to a specific release for a period of time, and then unlock it later (so that users on that channel start receiving the latest available build again). This is often done when a serious bug has been introduced, to minimize the number of users affected by it.

Taking the B2G nightly channel as an example, let's see how we would lock it to the nightlies from 20150505160203:

  1. Log in to https://aus4-admin.mozilla.org
  2. Click on the "Rules" link at the top of the page
  3. Use the filter in the top right to narrow down the rules to "product:B2G channel:nightly"
  4. Locate the rule (or rules) on the "nightly" channel
    • Changing the sort to "Product, Channel" will group things together better.
  5. For each rule on the channel:
    • Click the "Update" button to enter edit mode
    • Find the mapping field and replace the "-latest" part with "-20150505160203" (the UI will autocomplete this for you if you start typing).
    • Scroll down and click "Save Changes"

When you're ready to unlock the updates, follow the same steps as above but replace the "-20150505160203" part of the mapping with "-latest" again.
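
The edit made to the mapping field in both directions is just a suffix swap, sketched below. The helper names are hypothetical and the functions only compute the new mapping string; they do not talk to Balrog.

```python
LATEST = "-latest"


def lock_mapping(mapping, build_id):
    """Point a "-latest" mapping at the dated release for build_id."""
    if not mapping.endswith(LATEST):
        raise ValueError("mapping is not tracking -latest: %r" % mapping)
    return mapping[:-len(LATEST)] + "-" + build_id


def unlock_mapping(mapping, build_id):
    """Point a mapping locked to build_id back at "-latest"."""
    suffix = "-" + build_id
    if not mapping.endswith(suffix):
        raise ValueError("mapping is not locked to %s: %r" % (build_id, mapping))
    return mapping[:-len(suffix)] + LATEST
```

Refusing to touch a mapping that does not end in the expected suffix guards against locking a rule twice or unlocking one that was locked to a different build.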

Adding a rule for a new update channel

When nightly builds are set up on a new branch, rules need to be added to Balrog for updates to be served. Note that the nightly build automation is responsible for providing metadata about each new set of builds to Balrog.

As an example, here is how B2G updates could be set up on a hypothetical "mozilla-b2g40" branch:

  1. Log in to https://aus4-admin.mozilla.org
  2. Click on the "Rules" link at the top of the page
  3. Click on "Add a new Rule" near the top left of the page
  4. Fill out the form as follows:
    • Product: B2G
    • Channel: nightly-b2g40
    • Mapping: B2G-mozilla-b2g40-nightly-latest
    • Rate: 100
    • Priority: 90
  5. Click "Save Changes"

Modify an existing release

Most modifications to releases are done by automation, but sometimes we need to tweak them by hand, for example when adjusting What's New page configuration.

To modify the "Firefox-38.0-build3" release, for instance, follow these steps:

  1. Log in to https://aus4-admin.mozilla.org/
  2. Click the "Releases" link at the top of the page
  3. Find the "Firefox-38.0-build3" release and click the "Download" link
  4. Save the file locally and modify it to your liking
  5. Click the "Update" link for "Firefox-38.0-build3"
  6. Click "Browse" and choose your new local version
  7. Click "Save Changes"

Code Overview

Balrog's code is organized into roughly the following parts:

  • The blobs - These contain most of the brains (business logic) behind Balrog. They know how to validate new data coming into the system and translate existing data into useful responses to update requests.
  • The database abstraction layer - This layer sits between the actual database and the applications. It defines the database schema, performs permissions checking, and ensures all changes are written to history tables. Applications should never touch the database directly - they should always go through this layer.
  • The user-facing application - The entry point to requests from applications looking for updates.
  • The admin API - A simple RESTful API that allows the Admin UI and automation to make changes to Balrog's database.
  • The admin UI - A human-friendly interface to manage updates.

Hacking

Balrog's code is split between the backend server and the admin UI. Both GitHub repositories use Travis for continuous integration and accept pull requests. To get both repositories, do the following:

git clone https://github.com/mozilla/balrog
cd balrog
git submodule init
git submodule update

Basic Balrog development and testing can be done on your local machine. It is recommended that you use Vagrant to test your changes, because it is configured very similarly to production. How to use it, and the standalone applications, is described below.

Vagrant

Vagrant provides a VM that is configured very similarly to production and includes the user-facing application, admin API, admin UI, and some basic sample data. To use it, follow these steps:

  • Install Vagrant
  • Open a command line/terminal and browse to your Balrog clone
  • Type "vagrant up" to bring up the development environment
  • Add this to your hosts file:
127.0.0.1 balrog.mozilla.dev
127.0.0.1 balrog-admin.mozilla.dev

Backend

Development Environment

Balrog is bundled with most of its required libraries, but you should still use a virtualenv to install the compiled and dev-only packages. To get started, create a new virtualenv and run the following:

pip install -r requirements/compiled.txt
pip install -r requirements/dev.txt

Once you've installed the necessary packages you can bring up the admin API and user-facing applications with the following commands:

python admin.py
python balrog-server.py

Unit Tests

Balrog comes with an extensive suite of unit tests which are a crucial component of ensuring that we can push changes without any interruption to service. You should always run tests before asking for review, and you should be adding new tests for most types of changes. To run the unit tests for the backend simply run:

make test

or if you want to generate a code coverage report:

make test COVERAGE=1

Frontend

Development Environment

The frontend is NOT bundled with all of its required dependencies. To install them, run:

npm install
npm install -g lineman

With that done, you can bring up an instance of the UI with:

lineman run

In order for it to have data and be functional you'll need the admin API running as well (see above).

Unit Tests

To run the unit tests you'll need the UI running first ("lineman run"), and then run:

lineman spec

Deploying Changes

The dev, stage, and production deployments of Balrog are managed by the Web Operations team. Details of the overall deployment can be found on Mana. This page describes how to go from a reviewed patch to deploying it in production.

Is now a good time?

Before you deploy, consider whether or not it's an appropriate time to do so. Some factors to consider:

  • Are we in the middle of an important release such as a chemspill? If so, it's probably not a good time to land non-trivial changes.
  • How risky are your changes? If they're high risk, deploying on a Friday is probably a bad idea.
  • Do you need to migrate any data? If you do, make sure you have time to do so right after deploying.

Landing

UI Changes

If you've made a change to the UI repository, make sure you run "lineman build" and commit the result before pushing that change. This will rebuild the UI application and put the result into "dist", which is where the deployed instances run from. You also need to commit the subrepository change to the Balrog repository and push that back. Generally, the workflow for this is something like:

cd /your/balrog/repo
cd ui
lineman build
git commit -a -m "Rebuild UI."
git push origin
cd ..
git commit -a -m "Update to latest UI."
git push origin

Backend Changes

Just push your change to the master branch of the Balrog repository.

Testing

The dev environment will automatically pick up any changes you push. You should do some testing against the admin interface and public interface before proceeding further.

Pushing to stage and production

Once you're satisfied with your results while testing in dev, simply file an IT bug to have the changes pushed to stage and production. Make sure to block the bug(s) containing the code that's being pushed on it. If you'd like to be around when the change is pushed be sure to give a time frame, too.

After the change is pushed it's a good idea to do some quick verification in production. This could mean making changes by hand, checking update URLs, watching jobs that submit to Balrog, or other things -- it depends on what type of change you've made.

Scripts

These tools all live in tools/scripts/updates. We use them to programmatically adjust the update server.

balrog-submitter.py

Used to submit nightly and release style builds into release blobs. Called on build slaves, once for each combination of platform and locale.

balrog-release-pusher.py

Used by the release automation to add the metadata to a release blob, and to push the new release onto the test channels. Example builder name - release-mozilla-beta-firefox_updates.

balrog-release-shipper.py

Used by the release automation to push a release to the production channel. Example builder name - release-mozilla-beta-update_shipping.

balrog-tweaker.py

We don't have UI support for modifying blobs, so this is a helper to submit a blob fragment to replace content on the server. Doesn't handle removing keys.

Example:

python scripts/updates/balrog-tweaker.py --json json -b 'Firefox-33.0-build1' --api-root 'https://aus4-admin.mozilla.org' --credentials-file cred -u 'nthomas@mozilla.com' -v

where json is a file containing the fragment of JSON to submit, and cred is a file containing the password for the user given with -u, in the following format:

balrog_credentials = {
     'username': 'password'
}
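
Since the credentials file is a Python assignment rather than JSON, one cautious way to read it is to parse it without executing it. The sketch below does that with the standard ast module; parse_credentials is a hypothetical helper, and how the real scripts load the file is not covered on this page.

```python
import ast


def parse_credentials(source):
    """Return the dict assigned to balrog_credentials in the given file contents."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "balrog_credentials":
                # literal_eval accepts only plain literals, so no code is run.
                return ast.literal_eval(node.value)
    raise ValueError("balrog_credentials is not defined in the file")


def load_credentials(path):
    """Read a credentials file from disk and parse it."""
    with open(path) as f:
        return parse_credentials(f.read())
```

The password for the user passed with -u would then be looked up by username in the returned dictionary.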

balrog-nightly-locker.py

Used to 'freeze' nightly updates by pointing to a dated release blob instead of the latest (e.g. for big code landings or merges), and to unfreeze them afterwards. More details on usage at Enable/disable updates on Aurora

Common code

cli.py and api.py in tools/lib/python/balrog/submitter/ provide the shared code.