ReleaseEngineering/Jacuzzis

From MozillaWiki

Background

Implementation

Buildbot masters poll the Jacuzzi Allocator (http://jacuzzi-allocator.pub.build.mozilla.org/v1/) for builder/machine assignments. The Jacuzzi Allocator is backed by static files currently hosted in a GitHub repository (https://github.com/mozilla/releng-jacuzzis).

These allocations are updated periodically based on load. The allocate.py script is responsible for determining the proper allocations per jacuzzi. It first reads config.json, then looks at build history, and writes out an adjusted version of config.json. There are several parameters in the code that may be tweaked to adjust its behaviour. Currently it tries to make sure that no jacuzzi would have been more than 90% busy for more than 20 minutes per day over the past week.
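The 90% / 20 minute rule above can be illustrated with a minimal sketch. This is not the code in allocate.py: the per-minute sample format, the function names and the max_size cap are assumptions made for illustration. Only the two thresholds come from the description above.

```python
from collections import defaultdict

BUSY_THRESHOLD = 0.90   # fraction of machines in use (from the text above)
MAX_BUSY_MINUTES = 20   # minutes per day allowed above the threshold (from the text)


def busy_minutes_per_day(samples, machine_count, threshold=BUSY_THRESHOLD):
    """Count, per day, the minutes in which utilisation exceeded the threshold.

    `samples` is a hypothetical list of (day, busy_machines) tuples,
    one tuple per minute of build history.
    """
    minutes = defaultdict(int)
    for day, busy in samples:
        if busy / machine_count > threshold:
            minutes[day] += 1
    return minutes


def smallest_adequate_size(samples, max_size=100):
    """Return the smallest jacuzzi size that satisfies the rule on every day.

    `max_size` is an assumed upper bound, returned if no size qualifies.
    """
    for size in range(1, max_size + 1):
        per_day = busy_minutes_per_day(samples, size)
        if all(m <= MAX_BUSY_MINUTES for m in per_day.values()):
            return size
    return max_size
```

For example, a history with 30 minutes on one day during which 5 machines were busy yields a size of 6, because with 5 machines the jacuzzi would have been 100% busy for more than 20 minutes.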

The manage_jacuzzis.py script is responsible for managing the specific machines in each jacuzzi, and populating the files. It reads config.json to determine how many of each type of machine should be in each jacuzzi. It also consults the usable slaves report and slavealloc to pick appropriate slaves.
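As a rough sketch of the selection step, the function below fills one jacuzzi from a set of usable slaves. It is a simplification and not the logic in manage_jacuzzis.py: the function name, its arguments and the "keep existing members first" policy are assumptions, and the usable set stands in for whatever the usable slaves report and slavealloc provide.

```python
def pick_slaves(wanted, current, usable, taken=()):
    """Return a sorted list of slave names for one jacuzzi.

    wanted  -- {prefix: count}, shaped like a builder entry in config.json
    current -- slave names already in this jacuzzi
    usable  -- hypothetical set of slave names reported as usable
    taken   -- slave names already assigned to other jacuzzis
    """
    chosen = []
    for prefix, count in sorted(wanted.items()):
        # Keep usable existing members first so assignments stay stable.
        keep = [s for s in sorted(current)
                if s.startswith(prefix) and s in usable]
        # Then top up from usable slaves that nobody else is using.
        pool = [s for s in sorted(usable)
                if s.startswith(prefix) and s not in keep and s not in taken]
        chosen.extend((keep + pool)[:count])
    return sorted(chosen)
```

Unlike this sketch, the real script does not always drop unusable slaves; see the known issues at the end of this page.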

Runnable Code

The runnable scripts, allocate.py and manage_jacuzzis.py, live in a separate repository from the static files: https://github.com/mozilla/build-jacuzzi-allocator

To modify anything other than configuration, submit a pull request to this repository, then run git pull in /data/releng/src/jacuzzi-allocator on the relengwebadm host, which is where crontask.sh lives.

Updating jacuzzis

To make changes to the jacuzzis, do the following:

Disabling dynamic allocator

The dynamic allocator will normally override any changes to 'bld-linux64-spot-' and 'w64-ix-' amounts. To disable this behaviour (in the case of a bug), add a top-level key to config.json: "disabled": true
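If you prefer not to edit the file by hand, a small helper along these lines can set the flag. This is a sketch: the function name is hypothetical, and it assumes config.json is a single JSON object. Note that it rewrites the file with its own indentation and key order.

```python
import json


def set_disabled(path, disabled=True):
    """Set the top-level "disabled" key in config.json and rewrite the file."""
    with open(path) as f:
        config = json.load(f)
    config["disabled"] = disabled
    with open(path, "w") as f:
        json.dump(config, f, indent=4, sort_keys=True)
        f.write("\n")
    return config
```

Calling set_disabled("config.json", False) turns the dynamic allocator back on.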

Adding a new builder to a jacuzzi

For linux64-based or Windows-based builds, this is simple: add the builder name to the dictionary of builders in config.json with zero slaves allocated:

   "b2g_mozilla-central_emulator-debug_dep": {
       "bld-linux64-spot-": 0
   },

Make sure to verify you've still got valid json. `python -m json.tool config.json` is a handy way to test this.

The next time the allocator runs, it will calculate the proper number of machines for this jacuzzi.
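The same edit can be scripted. The sketch below assumes the builders dictionary sits under a top-level "builders" key, which this page does not confirm, so check the layout of config.json before relying on it. The function name is also hypothetical.

```python
import json


def add_builder(config, builder, slave_prefix="bld-linux64-spot-"):
    """Add `builder` with zero slaves allocated, leaving existing entries alone.

    Assumes the builders dictionary is stored under a top-level
    "builders" key (an assumption, not confirmed by this page).
    """
    builders = config.setdefault("builders", {})
    builders.setdefault(builder, {slave_prefix: 0})
    return config


def to_valid_json(config):
    """Serialise the config and parse it back to confirm it is valid JSON."""
    text = json.dumps(config, indent=4, sort_keys=True)
    json.loads(text)  # raises ValueError if the output is not valid JSON
    return text
```

Using setdefault means that running the helper for a builder that already exists does not reset its current allocation to zero.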

For other types of builds, allocate.py will need to be modified to support the new slave type.

Removing a builder from a jacuzzi

Simply delete it from config.json.

Troubleshooting

The allocation is run on relengwebadm.private.scl3.mozilla.com from /data/releng/src/jacuzzi-allocator/crontask.sh

Take a look at the current allocations at http://jacuzzi-allocator.pub.build.mozilla.org/v1/. Do they match what's in the repo?

If the jacuzzi looks like it has enough machines, but the machines aren't running, check what aws_watch_pending is doing, and that the slaves are actually usable.

If all else fails, contact catlee.

Current limitations / known issues

  • manage_jacuzzis.py won't always remove unusable or disabled slaves from the allocations
  • allocate.py doesn't notice pending load. This means that if a builder has little or no activity for a long time and then suddenly gets high load, the allocations won't be adjusted in time