ReleaseEngineering/How To/Manage spot AMIs

From MozillaWiki


Something wrong with spot AMIs.


Spot AMIs are generated (almost) from scratch daily by

The script uses base AMIs generated by These are generated manually and contain only the base system. The particular base AMIs used to generate the final spot AMIs are listed in the instance configs (e.g. bld-linux64).

The script takes a base AMI, puppetizes it, and cleans up some files. Spot AMIs use cloud-init to bootstrap their hostnames, which are specified in the instance-specific user-data.

Sometimes things can go wrong during this process.

  • The most recent puppet configs can produce bad AMIs, in which case jobs running on the corresponding instances may fail or never start. If that happens, we should unregister the bad AMIs, delete the associated snapshots, and terminate the faulty instances. That way, automation falls back to the previous AMIs while we figure out what caused the issue.
  • Some golden AMIs may get stuck in the creation process, in which case we get a notification in #buildduty:
<nagios-releng> Fri 14:36:24 UTC [7351] [] age - golden AMI is CRITICAL: ELAPSED CRITICAL: 5 crit, 0 warn out of 5 processes with args 'ec2-golden' (

We should check the logs on both aws-manager2 (/var/log/messages) and on the instance itself to see why it got stuck. If the issue is an isolated one, we generally terminate the golden AMI instance in the AWS console and kill the associated processes; we can then force the generation of a new golden AMI. If, on the other hand, golden AMI generation gets stuck frequently, we should file a bug against Releng::Buildduty and work on a fix.


To find out which AMI a particular instance uses, run the following:

ssh cltbld@instance_ip curl

Verify that it matches the latest AMI for that instance type and region in the output of:

$ python scripts/
bld-linux64, us-east-1: ami-42cf4d38 (spot-bld-linux64-2017-11-17-09-43, ebs)
try-linux64, us-east-1: ami-38cc4e42 (spot-try-linux64-2017-11-17-09-36, ebs)
tst-linux64, us-east-1: ami-2fdb5955 (spot-tst-linux64-2017-11-17-10-11, ebs)
tst-linux32, us-east-1: ami-eed45694 (spot-tst-linux32-2017-11-17-10-13, ebs)
tst-emulator64, us-east-1: ami-90d95bea (spot-tst-emulator64-2017-11-17-10-33, ebs)
av-linux64, us-east-1: ami-4fd25035 (spot-av-linux64-2017-11-17-09-57, ebs)
bld-linux64, us-west-2: ami-c7d201bf (spot-bld-linux64-2017-11-17-09-43, ebs)
try-linux64, us-west-2: ami-f5a6758d (spot-try-linux64-2017-11-17-09-36, ebs)
tst-linux64, us-west-2: ami-c3a97abb (spot-tst-linux64-2017-11-17-10-11, ebs)
tst-linux32, us-west-2: ami-a5a774dd (spot-tst-linux32-2017-11-16-10-13, ebs)
tst-emulator64, us-west-2: ami-3bd20143 (spot-tst-emulator64-2017-11-17-10-33, ebs)
av-linux64, us-west-2: ami-9dae7de5 (spot-av-linux64-2017-11-17-09-57, ebs)
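
The comparison can also be done mechanically. The following is a minimal sketch, not part of the releng scripts: it assumes the listing has exactly the line format shown above and that the trailing part of each AMI name is a creation timestamp (year-month-day-hour-minute). The names parse_listing and is_latest are hypothetical helpers introduced here for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
#   bld-linux64, us-east-1: ami-42cf4d38 (spot-bld-linux64-2017-11-17-09-43, ebs)
LINE_RE = re.compile(
    r"^(?P<config>[\w-]+), (?P<region>[\w-]+): (?P<ami>ami-[0-9a-f]+) "
    r"\((?P<name>[\w-]+), (?P<root>\w+)\)$"
)
# Assumption: AMI names end in YYYY-MM-DD-HH-MM.
STAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2}-\d{2}-\d{2})$")


def parse_listing(text):
    """Map (config, region) -> (ami_id, creation time parsed from the AMI name)."""
    latest = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip shell prompts, blank lines, anything unexpected
        stamp = STAMP_RE.search(match["name"])
        created = (
            datetime.strptime(stamp.group(1), "%Y-%m-%d-%H-%M") if stamp else None
        )
        latest[(match["config"], match["region"])] = (match["ami"], created)
    return latest


def is_latest(latest, config, region, ami_id):
    """True if ami_id is the AMI listed for this instance type and region."""
    entry = latest.get((config, region))
    return entry is not None and entry[0] == ami_id


SAMPLE = """\
bld-linux64, us-east-1: ami-42cf4d38 (spot-bld-linux64-2017-11-17-09-43, ebs)
tst-linux32, us-east-1: ami-eed45694 (spot-tst-linux32-2017-11-17-10-13, ebs)
tst-linux32, us-west-2: ami-a5a774dd (spot-tst-linux32-2017-11-16-10-13, ebs)
"""

listing = parse_listing(SAMPLE)
print(is_latest(listing, "bld-linux64", "us-east-1", "ami-42cf4d38"))
```

Parsing the timestamp also makes it easy to spot a region that is lagging: in the listing above, the tst-linux32 AMI in us-west-2 is dated a day earlier than its us-east-1 counterpart.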

Delete the AMIs and the corresponding snapshots using the AWS Web Console (us-east-1, us-west-2). To terminate the instances launched from those AMIs, use the following script.

e.g. python scripts/ -v <ami_id>
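
For reference, the whole cleanup (unregister the AMI, delete its snapshots, terminate its instances) can be sketched against the EC2 API. This is an illustration only and not the releng script referenced above: it assumes an object that behaves like boto3.client("ec2", region_name=...), it ignores pagination of describe_instances, and retire_ami and snapshot_ids are hypothetical names. It defaults to a dry run that only reports what would be removed.

```python
def snapshot_ids(image):
    """Collect EBS snapshot IDs from one entry of describe_images()['Images']."""
    return [
        mapping["Ebs"]["SnapshotId"]
        for mapping in image.get("BlockDeviceMappings", [])
        if "SnapshotId" in mapping.get("Ebs", {})
    ]


def retire_ami(ec2, ami_id, dry_run=True):
    """Deregister ami_id, delete its snapshots, terminate instances launched from it.

    `ec2` is assumed to behave like boto3.client("ec2", region_name=...).
    Returns the planned actions; nothing is changed while dry_run is True.
    """
    images = ec2.describe_images(ImageIds=[ami_id])["Images"]
    snapshots = [s for image in images for s in snapshot_ids(image)]

    # Only the first page of results is read here (sketch, no pagination).
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "image-id", "Values": [ami_id]},
            {"Name": "instance-state-name", "Values": ["pending", "running"]},
        ]
    )["Reservations"]
    instances = [i["InstanceId"] for r in reservations for i in r["Instances"]]

    plan = {"ami": ami_id, "snapshots": snapshots, "instances": instances}
    if dry_run:
        return plan

    # The AMI is deregistered first: EC2 refuses to delete a snapshot that
    # still backs a registered AMI.
    ec2.deregister_image(ImageId=ami_id)
    for snapshot_id in snapshots:
        ec2.delete_snapshot(SnapshotId=snapshot_id)
    if instances:
        ec2.terminate_instances(InstanceIds=instances)
    return plan
```

A typical use would be to call retire_ami(client, ami_id) once per region, review the returned plan, and only then repeat the call with dry_run=False.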