ReleaseEngineering/Buildbot Best Practices
This doc is superseded by ReleaseEngineering/Landing_Buildbot_Master_Changes, although some of the rules below (such as those on local changes) still apply. NEEDS LOVE
We now run many different Buildbot masters, and they are increasingly complex. It is becoming more and more important to keep their setup consistent: where configuration files are stored, where supporting code is stored, and other such details. This document is a rulebook and guideline for maintaining these machines and instances.
- buildbot-configs - This repository contains Buildbot master.cfg and mozconfig files for most of our Buildbot instances.
- buildbotcustom - This repository contains custom steps, factories, and other Mozilla specific Buildbot code required by our Buildbot masters.
- buildbot - This repository is an import of upstream Buildbot code, plus some patches that we require.
- Patches should not be landed during code freeze periods.
- We track configuration updates to our masters on Maintenance. Any change to a production Buildbot Master should be tracked on this page.
- Patches should not be checked in until you are ready to update the master with them. This helps to avoid situations where an urgent fix needs to go in, and a random unrelated patch ends up getting enabled at the same time.
- All affected masters running production instances must be updated.
- Production masters should never contain local changes. Even if you are just testing something you should check it in and pull it rather than making the change locally. Having temporary changes in the version control history is useful when debugging things later.
- After landing a patch and reconfiguring the master, you should rebuild some old builds of affected components. This helps narrow down whether problems come from changes to the masters or from product code changes.
- If you have a patch that affects multiple masters you should update all of the masters when you land your patch.
- This is less strict for staging masters. For example, you shouldn't update a staging master while someone is using it.
- There are no staging repositories for buildbot or buildbotcustom. Because of this you should never check in code to either of these repositories until it is properly reviewed and about to be deployed in production.
- If you want to keep these repositories version controlled while doing development/testing you should create a user repository for them.
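The rule above that production masters never contain local changes can be checked mechanically. The following is a minimal sketch, not existing tooling: `is_clean_checkout` is a hypothetical helper name, and the paths in the usage comment are placeholders for the master's real checkouts. It relies only on `hg status` with its standard `--modified`, `--added`, `--removed` and `--deleted` filters.

```shell
#!/bin/sh
# Sketch: succeed only if a Mercurial checkout has no uncommitted changes.
# is_clean_checkout is a hypothetical helper name used for illustration.

is_clean_checkout() {
    # $1: path to a Mercurial working directory.
    # List modified, added, removed and deleted (missing) files;
    # untracked files are deliberately ignored.
    changes=$(hg -R "$1" status --modified --added --removed --deleted) || return 2  # hg itself failed
    [ -z "$changes" ]  # 0 = clean, 1 = local changes present
}

# Example use (placeholder paths, substitute the master's real checkouts):
# for repo in /path/to/buildbotcustom /path/to/buildbot-configs; do
#     is_clean_checkout "$repo" || echo "local changes (or hg error) in $repo" >&2
# done
```

Because a failure of `hg` itself also yields a non-zero status, a checkout that cannot be inspected is treated as not clean rather than silently passing.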
How to land a patch
Below is one way to land a patch for a Buildbot master. Your technique may vary, but this will ensure there are no other patches sneaking in with yours, and no local changes. Before doing any of the below, land your patches into the repositories.
For older masters (like production-master*, test-master*, talos-master02) the checkouts are typically at /tools/buildbotcustom/buildbotcustom and /builds/buildbot/configs. For buildbot-master* there are buildbotcustom and buildbot-configs checkouts in /builds/buildbot/$master. The following text describes the first case.
- Look for local changes on the master.
    cd /tools/buildbotcustom/buildbotcustom
    hg diff
    cd /builds/buildbot/configs
    hg diff
- Pull in changes and inspect them before you update the local repository's working directory. If there are changes other than yours which affect the master you're updating you should check with the person that landed them and make sure they are safe to take at the same time. If you are unsure or the person is unavailable they may be backed out to avoid complicating the update. Use your judgment here.
    cd /tools/buildbotcustom/buildbotcustom
    hg pull
    hg diff -rdefault
    # same thing for configs
    cd /builds/buildbot/configs
    hg pull
    hg diff -rdefault
- Now, assuming there are no interfering patches, update the local repository's working directory.
    cd /tools/buildbotcustom/buildbotcustom
    hg up
    cd /builds/buildbot/configs
    hg up
- Run checkconfig to make sure the reconfig/restart will succeed.
    cd /builds/buildbot/$master
    buildbot checkconfig
- If everything looks OK, reconfig or restart the master.
    buildbot reconfig `pwd`
    # OR
    # You may need to run stop multiple times before Buildbot recognizes that the process is dead
    # You can also check 'ps auxwww | grep buildbot' for it.
    buildbot stop `pwd`
    buildbot start `pwd`
- Watch the Buildbot Waterfall for problems.
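Two of the steps above lend themselves to small helpers: seeing whose changesets were pulled but not yet applied, and refusing to reconfig when checkconfig fails. The sketch below is illustrative only. The function names `pending_changesets` and `checked_reconfig` are hypothetical, the revset query assumes a Mercurial version with revset support, and the gating assumes `buildbot checkconfig` exits non-zero on a bad configuration.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; hg and buildbot are the
# same commands used in the steps above.

# List changesets that have been pulled into the repository but are not
# yet part of the working directory, one per line with hash, author and
# summary, so you know who to check with before updating.
pending_changesets() {
    # $1: path to a Mercurial checkout on which "hg pull" has already run
    hg -R "$1" log -r 'ancestors(default) - ancestors(.)' \
        --template '{node|short} {author|user} {desc|firstline}\n'
}

# Reconfig a master only if checkconfig passes in its directory.
checked_reconfig() {
    # $1: absolute path to the master directory
    # Assumption: "buildbot checkconfig" returns non-zero on a bad config.
    if ( cd "$1" && buildbot checkconfig ); then
        buildbot reconfig "$1"
    else
        echo "checkconfig failed in $1; master left untouched" >&2
        return 1
    fi
}

# Example use:
# pending_changesets /builds/buildbot/configs
# checked_reconfig "/builds/buildbot/$master"
```

The checkconfig call runs in a subshell so the caller's working directory is unchanged. Pass an absolute path, as `pwd` does in the manual steps.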