Congratulations. You have been chosen to set up a new reference platform. Armen summarized this journey as "It will be difficult". In addition to testing the new image on the test or build machines, there are several other steps that must be taken to ensure that our build infrastructure is ready to work with the new platform and the associated slaves. Several of these tasks can be done ahead of time to make the process easier; they are noted in the checklist below. Many of them involve IT, so open bugs accordingly.
Do you need a new master?
- Do you have enough capacity on your master to accommodate the new slaves?
- roughly, testing masters should have no more than 80 slaves (these days) (Todo: how to check what slaves are assigned to each master)
- Does your master reside in the same data center as your new slaves?
- If not, you should set up a new test master so both slaves and master reside in the same data center.
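The capacity question above can be sketched as a quick count. This is only an illustration: the `assignments` mapping and the master and slave names are hypothetical, and the checklist still lists "how to check what slaves are assigned to each master" as a Todo, so you would have to export that mapping yourself.

```python
from collections import Counter

# Rough per-master ceiling quoted in the checklist above.
MAX_SLAVES_PER_TEST_MASTER = 80


def masters_with_room(assignments, new_slaves, limit=MAX_SLAVES_PER_TEST_MASTER):
    """Return the masters that can take `new_slaves` more slaves within `limit`.

    `assignments` maps slave name -> master name (hypothetical input).
    """
    counts = Counter(assignments.values())
    return sorted(m for m, n in counts.items() if n + new_slaves <= limit)


# Hypothetical example data: 78 slaves on "bm-a", 40 slaves on "bm-b".
assignments = {f"slave-a-{i:03d}": "bm-a" for i in range(1, 79)}
assignments.update({f"slave-b-{i:03d}": "bm-b" for i in range(1, 41)})

print(masters_with_room(assignments, new_slaves=10))
```

If the returned list is empty, or none of the listed masters is in the same data center as the new slaves, that points to setting up a new master.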
If so, open bugs for the new master
If the answer is no to any of the questions above, you'll need to set up a new master, unless there are unused masters that are already provisioned and available to you. Open a bug with IT to bring up some VMs where you can install a new master (example: bug 782870). Read ReleaseEngineering/Master_Setup to understand the steps to set up a new master. This document also describes some bugs that need to be opened with various teams when setting up the new master, so read it now.
Open a bug to establish network flows to the sql server from the new master
- Open a bug (Server Operations::ACL Request) for network flows from your new master(s) to the sql server. Example: bug 783055.
Open bugs so you can puppetize the new master(s) and add them to productionmasters.json
Example: bug 783455. The buildmaster-production.pp (puppet-manifests) needs to have the new masters added to the nodes so you can puppetize them. The productionmasters.json (tools) needs to have the new masters listed. Set them to disabled initially; they will be enabled when we are ready for production. Before the reconfig to enable the new master occurs, you should add ssh keys and an updated authorized_keys file to the master. Once the reconfig is complete, you'll need to start the new master.
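A minimal sketch of the "list the new masters, but disabled" step. The key names `name`, `hostname` and `enabled`, and the host names, are assumptions made for illustration; check the real productionmasters.json for the fields it actually uses before copying anything.

```python
import json


def add_disabled_masters(masters, new_masters):
    """Return a new list with `new_masters` appended, each marked disabled.

    Entries whose "name" is already present are skipped. The key names are
    assumptions, not the confirmed schema of productionmasters.json.
    """
    known = {m["name"] for m in masters}
    result = list(masters)
    for entry in new_masters:
        if entry["name"] in known:
            continue
        result.append({**entry, "enabled": False})
        known.add(entry["name"])
    return result


# Hypothetical entries.
existing = [{"name": "bm-old", "hostname": "bm-old.example.com", "enabled": True}]
new = [{"name": "bm-new", "hostname": "bm-new.example.com"}]

updated = add_disabled_masters(existing, new)
print(json.dumps(updated, indent=2, sort_keys=True))
```

Flipping `enabled` to true is then a separate, reviewable change made when the platform is ready for production.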
Are you able to send mail to the tinderbox server from the new master?
bug 717808 is an example. Tested this tonight from bm37, think it works
Aug 19 16:47:06 buildbot-master38 sendmail: q7JNl5wW017859: to=<firstname.lastname@example.org>, ctladdr=<email@example.com> (0/0), delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=120434, relay=mx1.corp.phx1.mozilla.com. [220.127.116.11], dsn=5.1.1, stat=User unknown
Aug 19 16:47:06 buildbot-master38 sendmail: q7JNl5wW017859: q7JNl6wW017861: DSN: User unknown
Aug 19 16:47:06 buildbot-master38 sendmail: q7JNl6wW017861: to=<firstname.lastname@example.org>, delay=00:00:00, xdelay=00:00:00, mailer=local, pri=31748, dsn=2.0.0, stat=Sent
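When reading log output like the above, the `dsn=` and `stat=` fields tell you whether a message was accepted. The sketch below pulls them out of a sendmail log line. The helper names are made up for this example; the only outside fact it relies on is that delivery status codes starting with 2 mean success and codes starting with 5 mean permanent failure.

```python
import re

# Matches key=value pairs for the three fields of interest; a value ends at
# the next ", " separator or at the end of the line.
FIELD = re.compile(r"\b(to|dsn|stat)=(.*?)(?:,\s|$)")


def delivery_status(line):
    """Return (dsn, stat) for a sendmail log line, or None if it has no dsn field."""
    fields = dict(FIELD.findall(line))
    if "dsn" not in fields:
        return None
    return fields["dsn"], fields.get("stat")


def delivered(line):
    """True only when the line reports a 2.x.x (success) status code."""
    status = delivery_status(line)
    return status is not None and status[0].startswith("2.")


# The first and last log lines from the example above.
FAILED_LINE = (
    "Aug 19 16:47:06 buildbot-master38 sendmail: q7JNl5wW017859: "
    "to=<firstname.lastname@example.org>, ctladdr=<email@example.com> (0/0), "
    "delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=120434, "
    "relay=mx1.corp.phx1.mozilla.com. [220.127.116.11], dsn=5.1.1, stat=User unknown"
)
SENT_LINE = (
    "Aug 19 16:47:06 buildbot-master38 sendmail: q7JNl6wW017861: "
    "to=<firstname.lastname@example.org>, delay=00:00:00, xdelay=00:00:00, "
    "mailer=local, pri=31748, dsn=2.0.0, stat=Sent"
)

for line in (FAILED_LINE, SENT_LINE):
    print(delivery_status(line), delivered(line))
```

In this sample the relayed message is rejected with 5.1.1 ("User unknown") and only the local bounce is delivered, so it is worth checking the status of the message you actually care about.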
Open a bug for buildbot and puppet changes
There are changes needed to buildbotcustom, buildbot-configs and puppet-manifests to support the new platform (example: bug 777759). The buildbot-configs patch will have to wait to be released until all your testing is complete and the platform is ready to land in a reconfig. The buildbotcustom and puppet-manifests changes can be landed at any time. There are also changes required to enable a platform's tests on mozilla-central and peers; see https://bugzilla.mozilla.org/show_bug.cgi?id=777759#c31 for an example.
The changes in the puppet-manifests repo are to the modules/buildmaster/templates/BuildSlaves-tests.py.erb file. (From rail: the BuildSlaves-tests.py.erb file is in the puppet-manifests repo, but these machines are running from the puppet-again master.) You'll also need to update the secrets.pp.template and secrets.pp on master-puppet1, and replicate these changes to the other masters. Changes to the puppet-again servers are deployed automatically; changes to the puppet-manifests servers are not. See ReleaseEngineering/Puppet/Usage#Deploy_changes for how to deploy changes to servers pulling from the old puppet-manifests repo.
Open bugs for graph server changes
- testing machines and each type of build need graph server changes
- You can insert graph changes by logging in to slavealloc
- graph server work needs to be run on staging (graphs_mozilla_org_new) and production (graphs_stage2_prod) graph server
- need to land changes to 'sql/data.sql' on the default branch of http://hg.mozilla.org/graphs (to match your inserts).
- If this is a new build platform, make sure that graph server knows about the build platform
- insert a machine name like %OS%_%branch% (e.g. "WINNT_5.2_mozilla-central" and "WINNT_5.2_mozilla-central_leak_test")
start transaction;
insert into os_list values (null, "WINNT 6.2 x64");
SET @lastid = LAST_INSERT_ID();
insert into machines values (null, @lastid, 0, "2.67", "t-w864-ix-001", 1, unix_timestamp());
insert into machines values (null, @lastid, 0, "2.67", "t-w864-ix-002", 1, unix_timestamp());
commit;
NOTE: If you inadvertently add an incorrect entry to the graph database, it is best to remove that entry so it doesn't appear as an option on this page, for example: http://graphs.mozilla.org/graph.html
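With more than a couple of machines, typing the inserts by hand invites exactly the kind of incorrect entry the note warns about. The sketch below generates the same transaction shape as the example above for a list of machine names. The function name, the `cpu_speed` parameter name (a guess at what the "2.67" column means) and the name check are assumptions for illustration; review the output before running it against staging.

```python
import re

# Conservative whitelist so a stray quote cannot end up inside the SQL text.
SAFE_NAME = re.compile(r"[A-Za-z0-9_. -]+")


def graph_server_inserts(os_name, machine_names, cpu_speed="2.67"):
    """Build the os_list + machines transaction for a new platform.

    Mirrors the statement shape of the example above. `cpu_speed` is passed
    through as the fourth column; its meaning is an assumption.
    """
    for value in [os_name, cpu_speed, *machine_names]:
        if not SAFE_NAME.fullmatch(value):
            raise ValueError(f"unexpected characters in {value!r}")
    lines = [
        "start transaction;",
        f'insert into os_list values (null, "{os_name}");',
        "SET @lastid = LAST_INSERT_ID();",
    ]
    for name in machine_names:
        lines.append(
            f'insert into machines values (null, @lastid, 0, "{cpu_speed}", '
            f'"{name}", 1, unix_timestamp());'
        )
    lines.append("commit;")
    return "\n".join(lines)


machines = [f"t-w864-ix-{i:03d}" for i in range(1, 3)]
sql = graph_server_inserts("WINNT 6.2 x64", machines)
print(sql)
```

The generated text should also be what lands in 'sql/data.sql', so the repository and the database stay in step.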
Open a bug for tbpl changes
- TBPL needs to be patched to support the new platform (see bug 782826 for an example of this change). Please file a bug here as soon as possible once the buildernames are known, to avoid delays - since TBPL pushes to production require IT intervention, in addition to the usual patch+review+staging turnaround time.
Open a bug for buildfaster changes
As easy as this (buildfaster_report.py):
     ('winxp', ['Rev3 WINNT 5.1']),
+    ('win8', ['Rev3 WINNT 6.2']),
 ]
https://hg.mozilla.org/build/tools/rev/5dbaa5080bcd
https://hg.mozilla.org/users/coop_mozilla.com/slave_health/rev/f3eda2cc7d72
https://hg.mozilla.org/users/coop_mozilla.com/slave_health/rev/128bcd9c0e58
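To see why a missing entry matters, here is a sketch of how a pattern list of this shape can map builder names to a platform. The two entries come from the change above; the substring matching, the function name and the builder names are assumptions, and buildfaster_report.py itself may match differently.

```python
# Two entries taken from the example change above.
_os_patterns = [
    ('winxp', ['Rev3 WINNT 5.1']),
    ('win8', ['Rev3 WINNT 6.2']),
]


def platform_for(buildername, patterns=_os_patterns):
    """Return the first platform whose pattern occurs in `buildername`, else None.

    Plain substring matching is an assumption made for this sketch.
    """
    for platform, needles in patterns:
        if any(needle in buildername for needle in needles):
            return platform
    return None


# Hypothetical builder names.
print(platform_for("Rev3 WINNT 6.2 mozilla-central opt test mochitest-1"))
print(platform_for("Some Unlisted Platform mozilla-central opt test mochitest-1"))
```

A builder that matches no pattern falls through to `None` here, which is the case a new platform hits until its entry is added.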
Disable tests on branches where this platform isn't needed
- bug 786424 shows an example: disabled Mountain Lion tests for mozilla-esr10
- bug 803248: buildbot config changes to support panda_android*
When testing, ensure that you initiate sendchanges to both the superset of branches and the branches you want to limit the tests on, so that there aren't any unexpected builds when you run it in production.
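One way to check the result of those sendchanges is to compare what actually ran against the branches where the platform is supposed to be enabled. This is a sketch only: the function name, the branch list and the builder names are hypothetical, and the `triggered` pairs would have to be collected from your development master by hand.

```python
def unexpected_builds(triggered, enabled_branches):
    """Return (branch, buildername) pairs that ran on a branch not in `enabled_branches`."""
    enabled = set(enabled_branches)
    return sorted(
        (branch, builder) for branch, builder in triggered if branch not in enabled
    )


# Hypothetical observations gathered after sending changes to every branch.
triggered = [
    ("mozilla-central", "NewPlatform mozilla-central opt test mochitest-1"),
    ("mozilla-esr10", "NewPlatform mozilla-esr10 opt test mochitest-1"),
]
enabled_branches = ["mozilla-central", "mozilla-inbound"]

for branch, builder in unexpected_builds(triggered, enabled_branches):
    print(f"unexpected: {builder} ran on {branch}")
```

An empty result means the branch restrictions behaved as intended for the sendchanges you tried.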
Slavealloc changes and cnames for slaves
- Open a bug with IT for the cnames for the slaves. Example bug 782870
- Add the new slaves to slavealloc (initially disabled)
- Add the slave password to the slave_passwords table for the appropriate poolid and distro
- Once your slaves are in slavealloc, tie them to your development master. Patch your development master so it includes the changes to accommodate the new platform. Set up your development master to test the new slaves and ensure that the tests run successfully for that platform.
Things to check for when running the tests:
- That the machines reboot after the tests complete
- That they can input results into the graphing server
- The Apache server serves the content correctly for the performance tests
- ...more to add
Moving new platform into production
- After testing is complete, you should be ready to move your new platform into production.
Once the slaves are in production, monitor last builds per slave http://build.mozilla.org/builds/last-job-per-slave.html to ensure there aren't any problems with hung slaves.
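The monitoring step can be sketched as a staleness check. The input mapping, the six-hour threshold and the slave names below are all hypothetical; the format of the last-job-per-slave page is not described here, so getting the timestamps out of it is left to you.

```python
from datetime import datetime, timedelta


def stale_slaves(last_job_times, now, max_idle=timedelta(hours=6)):
    """Return slaves whose last job finished more than `max_idle` before `now`.

    `last_job_times` maps slave name -> datetime of its last job (hypothetical
    input). The default threshold is an arbitrary example value.
    """
    return sorted(
        name for name, finished in last_job_times.items() if now - finished > max_idle
    )


# Hypothetical data.
now = datetime(2012, 8, 19, 18, 0)
last_job_times = {
    "t-w864-ix-001": datetime(2012, 8, 19, 17, 30),
    "t-w864-ix-002": datetime(2012, 8, 19, 9, 0),
}

print(stale_slaves(last_job_times, now))
```

A slave that shows up here is not necessarily hung (the pool may simply be idle), so treat the list as a prompt to look, not as a verdict.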
Notes on configuring the client
(perhaps this should be moved to another section or document)
- disable screensaver
- disable power savings
- test that resolution meets requirements set forth by devs. This may require a dongle.
Update releng monitoring/reporting
Wait times emails
Your new platform will appear as 'other' in the wait times emails unless you add a pattern match to the buildapi libs:
You will need to update the buildapi code in /home/buildapi/src on buildapi01.build.mozilla.org, and then restart the buildapi daemon for your code changes to take effect.
# buildapi@buildapi01
cd /home/buildapi/src
hg pull && hg up -r default
su -
# root@buildapi01
/etc/init.d/buildapi restart
The buildfaster report will break unless you add your new platform to the list of _os_patterns in buildfaster_report.py:
Add support for the new build slave type to kittenherder so it can be auto-rebooted by kittenherder if/when needed. See bug 874957 for an example, and add a crontab entry on buildduty@cruncher similar to the following:
38 */6 * * * /home/buildduty/production/run_kittenherder.sh tegra
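When adding your own entry, stagger it so the jobs for different slave types do not all start at once. The sketch below expands the minute and hour fields of a crontab entry into the times of day it runs. It is a simplified reading of cron syntax (it handles `*`, `*/n`, single values, ranges and comma lists, and ignores the day and month fields), and the function names are made up for this example.

```python
def expand_field(field, lo, hi):
    """Expand one cron field into a sorted list of integers.

    Supports '*', '*/n', 'a', 'a-b', 'a-b/n' and comma-separated lists.
    """
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_run_times(entry):
    """Return the HH:MM times at which `entry` runs, ignoring day and month fields."""
    minute, hour = entry.split()[:2]
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]


ENTRY = "38 */6 * * * /home/buildduty/production/run_kittenherder.sh tegra"
print(daily_run_times(ENTRY))
# ['00:38', '06:38', '12:38', '18:38']
```

The example entry therefore runs four times a day, at 38 minutes past every sixth hour.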