= Background =
* http://atlee.ca/blog/posts/initial-jacuzzi-results.html
* https://bugzilla.mozilla.org/show_bug.cgi?id=970738
= Implementation =
Buildbot masters poll the Jacuzzi Allocator (http://jacuzzi-allocator.pub.build.mozilla.org/v1/) for builder/machine assignments. The Jacuzzi allocator is backed by static files currently hosted in bhearsum's github repository (https://github.com/bhearsum/static-jacuzzis).
These allocations are updated periodically based on load. The allocate.py script is responsible for determining the proper allocations per jacuzzi. It first reads config.json, then looks at build history, and writes out an adjusted version of config.json. Several parameters in the code can be tweaked to adjust its behaviour. Currently it tries to make sure that no jacuzzi would have been more than 90% busy for more than 20 minutes per day over the past week.
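The 90% / 20 minutes rule can be illustrated with a minimal sketch. This is not the code in allocate.py: the per-minute sample format, the function names, and the linear search are all assumptions made for the example. Only the two thresholds come from the description above.

```python
from collections import defaultdict

BUSY_THRESHOLD = 0.90          # "more than 90% busy"
MAX_BUSY_MINUTES_PER_DAY = 20  # "for more than 20 minutes per day"


def busy_minutes_per_day(samples, machine_count, threshold=BUSY_THRESHOLD):
    """Count, per day, the minutes in which usage exceeded the threshold.

    samples is a hypothetical list of (day, machines_in_use) tuples,
    one tuple per minute of build history.
    """
    minutes = defaultdict(int)
    for day, in_use in samples:
        if in_use > threshold * machine_count:
            minutes[day] += 1
    return dict(minutes)


def is_sufficient(samples, machine_count):
    """True if no day had more than the allowed number of busy minutes."""
    per_day = busy_minutes_per_day(samples, machine_count)
    return all(m <= MAX_BUSY_MINUTES_PER_DAY for m in per_day.values())


def smallest_sufficient_allocation(samples, max_machines):
    """Smallest machine count that satisfies the rule, capped at max_machines."""
    for count in range(1, max_machines + 1):
        if is_sufficient(samples, count):
            return count
    return max_machines
```

Feeding this a week of samples for one jacuzzi gives the smallest allocation that would have stayed within the rule, which is the kind of number the allocator writes back into config.json.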
The manage_jacuzzis.py script is responsible for managing the specific machines in each jacuzzi, and populating the files. It reads config.json to determine how many of each type of machine should be in each jacuzzi. It also consults the [https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/machine_sanity/usable_slaves.json usable slaves report] and slavealloc to pick appropriate slaves.
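As a rough illustration of the selection step, the sketch below fills a jacuzzi with usable machines of one type. It is an assumption-laden example, not the logic of manage_jacuzzis.py: the function name, the plain lists of machine names, and the "keep current members first" policy are hypothetical, and the slavealloc lookup is left out entirely.

```python
def pick_machines(prefix, wanted, usable, current=()):
    """Choose up to `wanted` machines whose names start with `prefix`.

    usable  -- hypothetical list of machine names taken from the usable slaves report
    current -- machines already in the jacuzzi; kept first if still usable
    """
    usable_set = set(usable)
    # Keep existing members that still match the prefix and are usable.
    kept = [m for m in current if m.startswith(prefix) and m in usable_set][:wanted]
    # Fill the remaining places from the other usable machines, in name order.
    candidates = sorted(
        m for m in usable_set if m.startswith(prefix) and m not in kept
    )
    return kept + candidates[: wanted - len(kept)]
```

Keeping existing members first avoids reshuffling machines between jacuzzis on every run; whether the real script behaves this way is not stated here.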
= Updating jacuzzis =
To make changes to the jacuzzis, do the following:
* Clone the static-jacuzzis repo
* Modify the repo as necessary. Usually this requires you to add one or more files to the builders and machines directories, and modify allocated/all.
* Run the allocation report to look for inconsistencies: https://hg.mozilla.org/build/braindump/file/default/jacuzzi-related/allocation-report.py
* Push your changes
* The crontask on relengwebadm will automatically pull in changes on its next run
== Disabling dynamic allocator ==
The dynamic allocator will normally override any changes to 'bld-linux64-spot-' and 'w64-ix-' amounts. To disable this behaviour (in the case of a bug), add a top-level key to config.json: "disabled": true
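If you prefer not to edit the file by hand, a small helper along these lines could set the key. The function name is hypothetical, and removing the key to re-enable the allocator is an assumption; only "disabled": true is described above.

```python
import json


def set_allocator_disabled(path, disabled=True):
    """Add (or remove) the top-level "disabled" key in config.json."""
    with open(path) as f:
        config = json.load(f)

    if disabled:
        config["disabled"] = True
    else:
        # Assumption: dropping the key restores normal behaviour.
        config.pop("disabled", None)

    with open(path, "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)
        f.write("\n")
    return config
```

Because the file is rewritten through the json module, the result is always valid JSON, although key order and indentation may differ from the hand-edited original.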
== Adding a new builder to a jacuzzi ==
For linux64-based or Windows-based builds, this is simple: add the builder name to the dictionary of builders in config.json with zero slaves allocated:
 "b2g_mozilla-central_emulator-debug_dep": {
     "bld-linux64-spot-": 0
 },
Make sure to verify you've still got valid json. `python -m json.tool config.json` is a handy way to test this.
The next time the allocator runs, it will calculate the proper number of machines for this jacuzzi.
For other types of builds, allocate.py will need to be modified to support the new slave type.
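The same edit can be sketched in Python. The helper names are hypothetical, `builders` stands for whichever dictionary in config.json holds the builder entries, and the list of supported prefixes is inferred from the two slave types the dynamic allocator is said to manage.

```python
import json

# Assumption: these are the only slave types allocate.py currently handles.
SUPPORTED_PREFIXES = ("bld-linux64-spot-", "w64-ix-")


def add_builder(builders, name, slave_prefix):
    """Add a builder with zero slaves so the allocator sizes it on its next run."""
    if slave_prefix not in SUPPORTED_PREFIXES:
        raise ValueError("allocate.py would need changes for %r" % slave_prefix)
    if name in builders:
        raise ValueError("builder %r is already present" % name)
    builders[name] = {slave_prefix: 0}
    return builders


def is_valid_json(text):
    """Rough equivalent of checking a file with `python -m json.tool`."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True
```

For example, `add_builder(builders, "b2g_mozilla-central_emulator-debug_dep", "bld-linux64-spot-")` produces the same entry as the hand-written snippet above.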
== Removing a builder from a jacuzzi ==
Simply delete it from config.json.
= Troubleshooting =
The allocation is run on relengwebadm.private.scl3.mozilla.com from /data/releng/src/jacuzzi-allocator/crontask.sh
Take a look at the current allocations at http://jacuzzi-allocator.pub.build.mozilla.org/v1/. Do they match what's in the repo?
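Once both sides have been loaded, the comparison itself is a simple set difference. The sketch below assumes each side has already been turned into a dictionary mapping builder names to lists of machine names; how to obtain those dictionaries from the allocator and from the repo files is not covered, and the function name is hypothetical.

```python
def diff_allocations(repo, live):
    """Return the builders whose machine lists differ between repo and allocator.

    repo, live -- hypothetical dicts of builder name -> list of machine names
    """
    differences = {}
    for builder in sorted(set(repo) | set(live)):
        expected = set(repo.get(builder, ()))
        actual = set(live.get(builder, ()))
        if expected != actual:
            differences[builder] = {
                "missing_from_live": sorted(expected - actual),
                "unexpected_in_live": sorted(actual - expected),
            }
    return differences
```

An empty result means the allocator is serving what the repo describes; anything else suggests the crontask has not yet picked up the latest push.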
If the jacuzzi looks like it has enough machines but the machines aren't running, check what aws_watch_pending is doing, and check that the slaves are actually usable.
If all else fails, contact catlee.
= Current limitations / known issues =
* manage_jacuzzis.py won't always remove unusable or disabled slaves from the allocations
* allocate.py doesn't notice pending load. This means that if a builder has low or no activity for a long time and then gets sudden high load, the allocations won't be adjusted in time