User:Djmitche/Slave Allocator Proposal
Slave-side Operation
- currently implemented for Puppet-administered Linux and Mac slaves, but not for mobile, Windows, or misc slaves.
Using runslave.py, each slave requests a new buildbot.tac before each start of the buildslave, by issuing an HTTP GET to a simple URL containing the slave's hostname.
If the GET fails and a buildbot.tac already exists, the existing file is used. If the file DO_NOT_START exists in the basedir, then no .tac file is requested and the buildslave is not started.
Runslave currently guesses the buildslave directory based on the hostname. We should have a more reliable way of doing this - bug 622980 - but this will do for now.
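For illustration, a minimal sketch of that slave-side logic in Python; the allocator URL, basedir handling, and error handling here are assumptions, not the actual runslave.py:

 # Sketch of the slave-side .tac retrieval; the URL and paths are hypothetical.
 import os
 import socket
 import urllib2

 ALLOCATOR_URL = "http://slavealloc/gettac/%s"   # hypothetical URL

 def fetch_tac(basedir):
     tacfile = os.path.join(basedir, "buildbot.tac")
     if os.path.exists(os.path.join(basedir, "DO_NOT_START")):
         return None  # don't request a .tac; don't start the buildslave
     hostname = socket.gethostname().split(".")[0]
     try:
         tac = urllib2.urlopen(ALLOCATOR_URL % hostname).read()
         open(tacfile, "w").write(tac)
     except IOError:
         # the GET failed; fall back to an existing buildbot.tac, if any
         if not os.path.exists(tacfile):
             return None
     return tacfile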
Allocator Design
- partially implemented locally; nothing committed
Intention
When a slave starts up, it should be allocated to the master that most needs it at that moment. This means that the appropriate master is calculated for each HTTP request, rather than being statically calculated and stored, e.g., in a database.
This should be a minimally disruptive service. That is, we should not often need to poke and prod the slave allocator. It is not a slave management console by any stretch (that's another project).
Data
The allocator uses lists of slaves and masters - slaves are currently statically calculated from inventory; masters are based on Masters and Machine Bookings, but could potentially be based on catlee's masters JSON file.
Silos
Slaves are assigned to silos based on their innate characteristics. Those characteristics are:
- environment
- purpose
- distro
- bitlength
- datacenter
- trustlevel (corresponds to commit levels)
Each 6-tuple of such values constitutes a silo. Current silos, with counts, are (roughly - I know there are some problems with this list):
 environment  purpose    distro         bits  dc   trustlevel  count
 preprod      build      centos5        32    mpt  core            1
 preprod      build      centos5        64    mpt  core            1
 preprod      build      darwin10       32    mpt  core            1
 preprod      build      darwin9        32    mpt  core            1
 preprod      test       darwin10       32    scl  core            1
 preprod      test       darwin9        32    scl  core            1
 preprod      test       fedora12       32    scl  core            1
 preprod      test       fedora12       64    scl  core            1
 production   build      centos5        32    mpt  core           42
 production   build      centos5        32    mpt  tryuser        33
 production   build      centos5        32    mtv  core           25
 production   build      centos5        32    mtv  tryuser        10
 production   build      centos5        32    scl  core           27
 production   build      centos5        64    mpt  core           10
 production   build      centos5        64    mpt  tryuser        10
 production   build      darwin10       32    mpt  core           24
 production   build      darwin10       32    mtv  core           11
 production   build      darwin9        32    mpt  core           56
 production   build      darwin9        32    mpt  tryuser        36
 production   build      darwin9        32    mtv  core            9
 production   build      darwin9        32    mtv  tryuser        13
 production   build      darwin9        64    mpt  tryuser        26
 production   build      win2k3sp2      32    mpt  core           56
 production   build      win2k3sp2      32    mpt  tryuser        35
 production   build      win2k3sp2      32    mtv  core           47
 production   build      win2k3sp2      32    scl  core           17
 production   build      win2k3sp2      64    mtv  core            1
 production   geriatric  darwin8        32    mpt  core           18
 production   geriatric  darwin8        32    mpt  tryuser         2
 production   test       android        32    mtv  core           14
 production   test       android-n900   32    mtv  core           40
 production   test       android-tegra  32    mtv  core           93
 production   test       darwin10       32    scl  core           52
 production   test       darwin9        32    scl  core           50
 production   test       fedora12       32    scl  core           50
 production   test       fedora12       64    scl  core           51
 production   test       win7           32    scl  core           48
 production   test       win7           64    scl  core           50
 production   test       winxp          32    scl  core           46
 staging      build      centos5        32    mpt  core            4
 staging      build      centos5        32    mtv  core            3
 staging      build      centos5        64    mpt  core            1
 staging      build      darwin10       32    mpt  core            4
 staging      build      darwin10       32    mtv  core            2
 staging      build      darwin9        32    mpt  core            3
 staging      build      win2k3sp2      32    mpt  core            4
 staging      build      win2k3sp2      32    mtv  core            3
 staging      test       darwin10       32    scl  core            2
 staging      test       darwin9        32    scl  core            2
 staging      test       fedora12       32    scl  core            2
 staging      test       fedora12       64    scl  core            1
 staging      test       win7           32    scl  core            3
 staging      test       winxp          32    scl  core            6
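For illustration, a silo can be represented as a simple 6-tuple of the characteristics above (the field names here just mirror the table headings):

 from collections import namedtuple

 # A silo is the 6-tuple of a slave's innate characteristics.
 Silo = namedtuple("Silo", "environment purpose distro bits dc trustlevel")

 # e.g. the silo used in the balancing example below:
 silo = Silo("production", "test", "darwin9", 32, "scl", "core")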
Pools
Each master and slave is assigned to a pool. My rough sketch of the available pools is:
 pool            nslaves  masters
 bm-scl               44  bm01 bm02
 geriatric            20  geriatric
 pm-mpt              285  pm pmX pm01 pm02 pm03
 pmm-mtv             147  pmm01 pmm02
 preprod               8  pp
 sched                 0  scheduler_master tests_scheduler
 staging              24  sm01 sm02 sm03 sm04 rail bhearsum
 staging-mobile        0  smm smm2
 staging-tests        16  mozilla-tests1 mozilla-tests2
 tm-mpt                0  talos-master
 tm-mtv                0  talos-master02 tm01 tm02
 tm-scl              347  tm03 tm04 tm05 tm06
 try                 159  try_trunk_master
- Are the pm (production master) and bm (builder master) masters doing roughly the same thing? Should I name them 'bm-scl' and 'bm-mpt'? --Dmitchell@mozilla.com 02:23, 5 January 2011 (PST)
- Aki sez: yes. also, smm* and pmm* can probably be taken out unless you're dealing with n900s
- Great, so pm == bm; I want to leave the mobile masters in there for now, even if they aren't being allocated, so that they're ready to roll if/when we decide to start hooking phones or foopies up to them. --Dmitchell@mozilla.com 16:07, 5 January 2011 (PST)
Runtime Data
The allocator needs some estimate of which slaves are attached to which masters. A reasonable approximation is simply to remember allocations when they are made: if slave S is assigned to master M, then assume S is attached to M until it is reassigned.
The allocator also needs an up-to-date list of active masters. This doesn't change so often, so a static list (e.g., a human-edited DB table) is a reasonable approximation.
A more accurate determination of both can be made by polling the buildmasters: request a buildmaster's /about page before allocating a slave to it (to check up-ness), and periodically request and scrape the /buildslaves?no_builders=1 page to determine which slaves are actually attached to the master.
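A rough sketch of both checks, assuming the standard buildbot web-status URLs; the scraping regex is only a placeholder, since the exact markup of the /buildslaves page isn't spelled out here:

 import re
 import urllib2

 def master_is_up(master_url):
     """A master is considered up if its /about page responds."""
     try:
         urllib2.urlopen(master_url + "/about", timeout=10).read()
         return True
     except IOError:
         return False

 def attached_slaves(master_url):
     """Scrape /buildslaves?no_builders=1 for the attached slave names."""
     html = urllib2.urlopen(master_url + "/buildslaves?no_builders=1").read()
     # placeholder parse: pick up links of the form .../buildslaves/<name>
     return set(re.findall(r'buildslaves/([\w.-]+)', html))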
Balancing
Within each pool, the allocator attempts to balance the slaves in each silo across the available masters. Let's take an example from the data above. Specifically, let's look at the tm-scl pool and the <production, test, darwin9, 32, scl, core> silo. There are 50 slaves in this silo, all of which are assigned to this pool, and 4 masters in the pool. The ideal allocation, then, will attach 12 or 13 slaves to each of tm03..tm06.
The balancing algorithm will proceed as follows (a short sketch in code appears after the list):
- determine the pool for the slave
- determine the silo for the slave
- for each active master in the pool, count the number of attached slaves from the silo
- attach the new slave to the master with the lowest count, sorting by master name where counts are equal
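A minimal sketch of that selection step, assuming the allocator already tracks the pool's active masters and a per-(master, silo) count of attached slaves; silo_of is a hypothetical helper that returns the slave's 6-tuple:

 def pick_master(slave, active_masters, attached_counts):
     """Return the master that should receive `slave`.

     active_masters  -- names of active masters in the slave's pool
     attached_counts -- dict mapping (master, silo) -> number of
                        attached slaves from that silo
     """
     silo = silo_of(slave)
     # lowest count wins; ties are broken by master name
     return min(active_masters,
                key=lambda m: (attached_counts.get((m, silo), 0), m))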
Implementation
The slave allocator needs to both serve HTTP requests for .tac files and run background operations such as polling masters. None of the "cool" Python web frameworks that I know of (Pylons, Django, Plone) support this without lots of gymnastics, but trusty old Twisted supports it quite nicely.
I'm building the slave allocator to run as a Twisted daemon. It has a command-line utility for all other management tasks, so there is no web-based management UI. The command-line utility currently speaks directly to the database, but that can certainly be changed later.
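To illustrate the shape of that daemon, here is a minimal Twisted .tac-style sketch; allocate and poll_masters are hypothetical stand-ins for the allocator's real logic:

 from twisted.application import internet, service
 from twisted.web import resource, server

 class TacResource(resource.Resource):
     """Answer GET /gettac/<slavename> with that slave's buildbot.tac."""
     isLeaf = True

     def render_GET(self, request):
         slavename = request.postpath[-1]
         return allocate(slavename)   # hypothetical: returns .tac contents

 application = service.Application("slavealloc")

 # HTTP side: serve .tac requests from slaves.
 internet.TCPServer(8010, server.Site(TacResource())).setServiceParent(application)

 # background side: poll the masters every five minutes.
 internet.TimerService(300, poll_masters).setServiceParent(application)

Run under twistd, this serves the .tac requests and runs the periodic polling in the same process.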
Not Another Standalone System!
Yep! I couldn't find any other system into which this would fit nicely.
Running a Downtime
When it's time for a rolling downtime, we select at most one master at a time from each pool, configure the allocator to allocate slaves away from it, and wait until it has no more slaves attached before shutting it down (no need to be graceful, but it won't hurt!). It will require a little bit of coordination between engineers to ensure that we don't shut down too many masters in a pool, but assuming that each pool has a sufficient number of masters to run with one master down, we can perform a rolling downtime with no loss in capacity.
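One simple way to implement "allocate slaves away from" a master is a per-master enabled flag that the balancing step respects; a hypothetical sketch:

 def active_masters_in_pool(pool, masters):
     """Masters eligible to receive new slaves.

     `masters` is assumed to be a list of dicts like
     {'name': 'tm03', 'pool': 'tm-scl', 'enabled': True}; during a rolling
     downtime the operator flips 'enabled' off for one master per pool.
     """
     return [m['name'] for m in masters
             if m['pool'] == pool and m['enabled']]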
Relationship to Pods
Pods are a great way to conceptualize ensuring sufficient redundancy for release engineering infrastructure. However, they inform the decision of where to put particular types of slaves and masters. Once that decision is made, the slave allocator's job is very simple: connect slaves to appropriate masters as described above. So the allocator is not particularly concerned with pods in and of themselves, although pod-related considerations guide the decisions that create the data in the allocator's masters and slaves tables.