ReleaseEngineering/How To/Set Up a New AWS Worker Class



How to Set up a New AWS Worker Class

We currently run most of our Linux-based tests on AWS. AWS offers many instance types, and we use spot instances because they are cheaper than on-demand instances. The default instance type we use for most of our Linux-based tests is m1.medium. It is low cost, but it is not powerful enough for some of our tests, such as crashtests and reftests on Android emulators, or media and gaia-ui tests on B2G. So how do you add a new worker class so that different tests for the same platform can run on different instance types? For example, this is how the worker platforms are defined for Android 2.3:

PLATFORMS['android']['ubuntu64_vm_mobile'] = {
    'name': "Android 2.3 Emulator",
}
PLATFORMS['android']['ubuntu64_vm_large'] = {
    'name': "Android 2.3 Emulator",
}

This example will focus on adding the tst-emulator64-spot worker type, because this is the one I recently added in bug 1031083 (implement c3.xlarge worker class for Linux64 test spot instances). The repos you'll need to work with are puppet, cloud-tools and buildbot-configs. You'll also need to work with the tools repo if you need to enable new buildbot masters as part of this change.

Create new AMI

  • To do: describe how to create an AMI from scratch; I just reused an existing AMI

The configs for each AMI are stored in cloud-tools. For the tst-emulator64-spot AMI I created, I reused the existing tst-linux64 AMI id and copied it into the new configs I created. However, I changed the instance type to c3.xlarge instead of m1.medium, and I also had to tweak the subnets because this instance type is not available in some AWS regions. You'll note that the configs define the AMI in two regions: us-east-1 and us-west-2.
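
As a rough illustration of what these configs contain, here is a sketch of the shape of such an instance config. This is not the real tst-emulator64 file; the key names, AMI ids and subnet ids are made up for this example:

# Illustrative sketch only; the real configs live in cloud-tools and the
# exact keys and values differ.
TST_EMULATOR64 = {
    "us-east-1": {
        "instance_type": "c3.xlarge",       # changed from m1.medium
        "ami": "ami-xxxxxxxx",              # reused from the tst-linux64 config
        "subnet_ids": ["subnet-aaaaaaaa"],  # subnets tweaked for this instance type
    },
    "us-west-2": {
        "instance_type": "c3.xlarge",
        "ami": "ami-yyyyyyyy",
        "subnet_ids": ["subnet-bbbbbbbb"],
    },
}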

Update Cloud Tools with new platform

I also updated cloud-tools so this new AMI would be listed as a platform. You can see the changes required here and here.

Create golden AMI image

This config was then added to puppet, so the AMI should have been created automatically the next time the job ran. We also had to use invtool to add A and PTR records to DNS for the golden image. The cron job that creates the golden images is stored in puppet. You can log in to the aws-manager machine and run the cron jobs manually to speed things up.

Loan yourself a machine and test on dev-master

Once you have the AMI up and running, you can loan yourself a slave of that type and try to run tests on it from your dev-master. You'll have to change the config in the loan document to match your new machine type, i.e. instead of cloud-tools/configs/tst-linux$arch it should be cloud-tools/configs/tst-emulator64 to follow my example.
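
For example, the only thing that changes in the loan steps is which config you point the instance-creation step at; the variable below is purely illustrative:

# Hypothetical sketch: the loan procedure itself is unchanged, only the config
# path used when creating the loaner instance differs for this worker class.
LOAN_CONFIG = "cloud-tools/configs/tst-emulator64"   # instead of cloud-tools/configs/tst-linux64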

Write patches to enable the new platform in buildbot and Puppet

Example of a change to add a new machine class to puppet. In this example I added two new puppet machine classes even though both map to the same AWS instance type. This is because of duplicate builder issues in buildbot: the name of the platform (ubuntu64_vm_armv6_large or ubuntu64_vm_large) is included in the builder directory, so if you have crashtests running on both platforms under a single platform name you'll run into a duplicate builder issue.
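
In buildbot-configs this shows up as two separate slave platforms that happen to be backed by the same instance type, along the lines of the Android example above. The platform keys and display names below are assumptions for illustration, not the literal patch:

# Sketch only (illustrative platform keys and display names): two distinct
# slave platform names backed by the same c3.xlarge instances, so the builder
# directories, and therefore builder names, stay unique.
PLATFORMS = {'android': {}, 'android-armv6': {}}   # normally defined in mozilla-tests/config.py
PLATFORMS['android']['ubuntu64_vm_large'] = {
    'name': "Android 2.3 Emulator",
}
PLATFORMS['android-armv6']['ubuntu64_vm_armv6_large'] = {
    'name': "Android 2.3 Armv6 Emulator",
}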

Here's an example of the buildbot changes required to enable this new machine platform on the ash branch.
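
The general shape of such a change is to add the new slave platform to ash's branch config only, so every other branch keeps running on the existing pool. The structure below is an assumption for illustration, not the actual patch:

# Sketch only: enable the new slave platform on ash and leave every other
# branch on the existing m1.medium pool. BRANCHES here is a stand-in for the
# per-branch config in mozilla-tests/config.py.
BRANCHES = {
    'ash': {'platforms': {'android': {}}},
    'mozilla-inbound': {'platforms': {'android': {}}},
}
for name, branch in BRANCHES.items():
    if name == 'ash':
        branch['platforms']['android']['slave_platforms'] = [
            'ubuntu64_vm_mobile', 'ubuntu64_vm_large',
        ]
    else:
        branch['platforms']['android']['slave_platforms'] = ['ubuntu64_vm_mobile']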

Update Slavealloc db

This also requires adding the names of the new machines (and masters, if applicable) to slavealloc. An example is here. To add machines to slavealloc you can use the dbimport tool as described here.

Add new buildbot masters if required

Add new masters to handle the load from the new slaves if required; see https://wiki.mozilla.org/ReleaseEngineering/AWS_Master_Setup. A good example of the changes required is bug 1035863. As that doc states, lock the master to some slaves and verify that the jobs run green before enabling the new masters in production. Then enable the masters in the tools repo and run a reconfig to activate the new buildbot masters and buildbot-configs changes.

Write patches so watch_pending.cfg will allocate machines to this pool

At first, I tested this on ash, so the regexp only matched certain tests on ash.
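
The real rules live in watch_pending.cfg in cloud-tools; the snippet below is only a Python illustration of the idea, with a made-up pattern, routing pending Android emulator crashtests and reftests on ash to the new pool:

# Illustration only: pending builders whose names match this pattern get
# tst-emulator64-spot instances; everything else keeps using the existing pools.
PENDING_BUILDER_MAP = {
    r'Android 2\.3 Emulator ash .* (crashtest|reftest)': 'tst-emulator64-spot',
}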

Ensure tests run green on this branch.

Enable builders on other relevant branches

Do this by landing patches in buildbot-configs, running a reconfig, and then adjusting the regexp in watch_pending.cfg so that machines are allocated to these tests on all branches.

Adjust size of slave pool if pending counts are high

Update buildbot-configs/mozilla-tests/production_config.py, add the new slaves to slavealloc, and run a reconfig.
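
As a sketch of that kind of change (the variable name, slave naming scheme and count are made up), the pool in production_config.py boils down to a list of slave names per slave platform, and the same names also have to exist in slavealloc:

# Sketch only: grow the pool by extending the naming range; the same names must
# also be added to slavealloc before the new slaves can take jobs.
SLAVES = {}   # stand-in for the dict in production_config.py
SLAVES['ubuntu64_vm_large'] = ['tst-emulator64-spot-%03d' % i for i in range(1, 201)]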

Close bug!

Video presentation of this topic

https://wiki.mozilla.org/ReleaseEngineering/Blackbox_Sessions/08-15-2014