CIDuty/Reconfigs: Difference between revisions

From MozillaWiki
m (Jlund moved page Buildduty/Reconfigs to CIDuty/Reconfigs: changing team name)
 
Latest revision as of 23:31, 25 May 2018

A reconfig (short for "reconfigure the buildbot masters") is how changes to buildbot configurations make it into production. CIDuty is responsible for running reconfigs.

In a nutshell, a reconfig consists of:

  • moving the production tag for the buildbot-configs repository, and the production-0.8 tag for the buildbotcustom repository, to the current tip (or a chosen revision)
  • updating the source checkout of both repositories on all the buildbot masters
  • updating the tools checkout on each master to the current tip
  • executing a buildbot reconfig command on each buildbot master
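
As a rough illustration of the four steps above, the following dry-run sketch only prints commands and changes nothing. The function name plan_reconfig, the local clone names, the checkout layout under each master directory, and the exact hg invocations are all assumptions made for illustration; they are not taken from end_to_end_reconfig.sh, which remains the supported way to run a reconfig.

```shell
# Dry-run sketch: prints one plausible command per step, runs nothing.
# ASSUMPTIONS: the function name, the local clone names, and the checkout
# layout under each master directory are hypothetical.
plan_reconfig() {
    # Step 1: move the tags to the current tip (hg tag -f moves an existing tag).
    echo "hg -R buildbot-configs tag -f -r tip production"
    echo "hg -R buildbotcustom tag -f -r tip production-0.8"
    for master_dir in "$@"; do
        # Step 2: update both source checkouts to the moved tags.
        echo "hg -R $master_dir/buildbot-configs pull"
        echo "hg -R $master_dir/buildbot-configs update -r production"
        echo "hg -R $master_dir/buildbotcustom pull"
        echo "hg -R $master_dir/buildbotcustom update -r production-0.8"
        # Step 3: update the tools checkout to the current tip.
        echo "hg -R $master_dir/tools pull -u"
        # Step 4: reconfig the master.
        echo "make -C $master_dir reconfig"
    done
}

plan_reconfig /builds/buildbot/tests1-linux64
```

Printing the plan first makes it easy to review which masters would be touched before anything is run by hand.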

Please see the instructions for how to land buildbot changes for more information.

It is polite to ask in #releng if anyone has further changes to land before starting the reconfig process.

How to reconfig

The current state of the art is to use the end_to_end_reconfig.sh script. To see all the available options the script provides:

bash ./end_to_end_reconfig.sh -h

The end_to_end_reconfig.sh script uses the original fabric scripts as its core, but also takes care of updating the wiki, updating bugs affected by the merge, and updating the tools checkouts on foopies. As such, the script has a few indirect Python module dependencies:

  • fabric
  • requests

However, the script will try to install the dependencies automatically if they are missing.

In order to update the wiki/bugzilla, the script also requires your credentials for those services in a config file:

# Needed if updating wiki - note the wiki does *not* use your LDAP credentials...
export WIKI_USERNAME=XXX
export WIKI_PASSWORD=XXXXXXXX

# Needed if updating Bugzilla bugs to mark them as in production - *no* Persona integration - must be a native Bugzilla account.
# Details for the 'Release Engineering SlaveAPI Service' <slaveapi@mozilla.releng.tld> Bugzilla user can be found in the RelEng
# private repo, in file passwords/slaveapi-bugzilla.txt.gpg (needs decrypting with an approved gpg key).
export BUGZILLA_USERNAME='XXX@mozilla.com'
export BUGZILLA_PASSWORD='XXXXXXXXX'

# Used for slaveapi actions
export LDAP_USERNAME='XXX@mozilla.com'
export LDAP_PASSWORD='XXXXXXXX'
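
Before starting a reconfig it can help to confirm that the config file has actually been filled in. The sketch below sources the file in a subshell and lists any of the six variables shown above that are empty or still look like the XXX placeholder. The function name check_reconfig_config and the placeholder heuristic are assumptions for illustration; the script itself may validate credentials differently.

```shell
# Sketch: report credentials that are unset or still at the XXX placeholder.
# ASSUMPTIONS: the function name is hypothetical, and treating values that
# start with "XXX" as unfilled is a heuristic based on the template above.
check_reconfig_config() {
    config_file="${1:-$HOME/.reconfig/config}"
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 2
    fi
    (
        # Work in a subshell so the credentials never reach the calling shell.
        names="WIKI_USERNAME WIKI_PASSWORD BUGZILLA_USERNAME BUGZILLA_PASSWORD LDAP_USERNAME LDAP_PASSWORD"
        for name in $names; do unset "$name"; done
        . "$config_file"
        missing=0
        for name in $names; do
            eval "value=\${$name:-}"
            case "$value" in
                ''|XXX*) echo "$name is not set"; missing=1 ;;
            esac
        done
        exit "$missing"
    )
}
```

Calling check_reconfig_config with no argument checks the default ~/.reconfig/config; a non-zero exit status means at least one credential still needs attention.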

The script will also:

  • create a template config file (default is ~/.reconfig/config) if it does not exist, but you'll need to fill in your own credentials before it will work.
  • attempt to update IRC with reconfig status. To do so, it uses a minimal, filesystem-based IRC client called ii. You can download ii from its website (http://tools.suckless.org/ii/), or install it via a package manager (e.g. port install ii). A failed IRC update is non-fatal, but make sure ii is in your PATH if you want it to work.
  • create a temporary folder (/tmp/reconfig) to store the reconfig files. When starting a new reconfig, you should delete the reconfig folder corresponding to an older run of the script before attempting to run it again.
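
The two manual checks implied by the list above (is ii available, and is there a leftover working folder) can be sketched as small helpers. The function names have_tool and clean_stale_reconfig are hypothetical, and the cleanup helper is deliberately not called automatically.

```shell
# Sketch of two pre-flight helpers. ASSUMPTION: both function names are
# hypothetical and are not part of end_to_end_reconfig.sh.

# Succeeds when the named tool can be found in PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Removes the working folder left by an earlier run (default /tmp/reconfig).
clean_stale_reconfig() {
    reconfig_dir="${1:-/tmp/reconfig}"
    if [ -d "$reconfig_dir" ]; then
        rm -rf -- "$reconfig_dir"
        echo "removed $reconfig_dir"
    fi
}

# A missing ii is only worth a warning, since IRC updates are non-fatal.
have_tool ii || echo "ii not found in PATH; IRC status updates will not work" >&2
```

Only run clean_stale_reconfig when you intend to start a fresh reconfig; if you plan to resume a failed run, the old folder is presumably still needed.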

Updating the pinned version of mozharness on mozilla-central

The revision of mozharness used by a particular branch of Mozilla code is now tracked in-tree. As a courtesy to developers and sheriffs, CIDuty is expected to update the pinned revision in the mozilla-central integration branch whenever it moves the production tag. Sheriffs will merge the change from mozilla-central to other branches as part of their normal duties.

The pinned revision is tracked in this file: http://hg.mozilla.org/mozilla-central/file/920ded6a1f77/testing/mozharness/mozharness.json

Update the revision in the file to point to the revision of the new mozharness production tag and land normally.
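
The edit itself is a one-line change. The sketch below assumes that mozharness.json contains an entry of the form "revision": "<hash>"; that layout is an assumption, so check the file before relying on it. The function name pin_mozharness_revision is hypothetical.

```shell
# Sketch: rewrite the pinned revision in a mozharness.json-style file.
# ASSUMPTION: the file holds a line like  "revision": "<hash>"  and the new
# revision is a plain hex changeset id (no characters special to sed).
pin_mozharness_revision() {
    json_file="$1"
    new_rev="$2"
    tmp_file="$(mktemp)" || return 1
    status=0
    # Replace only the quoted value that follows the "revision" key.
    sed 's/\("revision"[[:space:]]*:[[:space:]]*"\)[^"]*"/\1'"$new_rev"'"/' \
        "$json_file" > "$tmp_file" &&
        cat "$tmp_file" > "$json_file" || status=1
    rm -f "$tmp_file"
    return "$status"
}
```

The changeset id behind the production tag can be read from a mozharness clone with hg identify -i -r production and passed as the second argument; review the resulting diff before landing.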

NOTE: once all of mozharness moves in-tree, this step will be unnecessary.

Updating master/master_config.json

If you have added or removed a platform in a way that changes the content of master/master_config.json (which the example below regenerates from tools/buildfarm/maintenance/production-masters.json), you'll need to manually update the affected masters, because the end_to_end_reconfig.sh script does not perform this step. Bug 1215294 was opened to add this step to the script.

Example

cd /builds/buildbot/tests1-linux64/ 
export PRODUCTION_MASTERS=tools/buildfarm/maintenance/production-masters.json
python buildbot-configs/update-master-json.py $PRODUCTION_MASTERS master/master_config.json
make checkconfig
make reconfig
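
When several masters are affected, the same commands can be wrapped in a function so that a failing checkconfig stops the reconfig for that master. This is a sketch built only from the commands in the example above; the function name update_master_config is hypothetical.

```shell
# Sketch: regenerate master_config.json for one master, then reconfig it.
# Each step runs only if the previous one succeeded, so a failing
# "make checkconfig" prevents "make reconfig" from running.
# ASSUMPTION: the function name is hypothetical.
update_master_config() {
    master_dir="$1"
    (
        # Subshell keeps the directory change local to this call.
        cd "$master_dir" || exit 1
        python buildbot-configs/update-master-json.py \
            tools/buildfarm/maintenance/production-masters.json \
            master/master_config.json &&
            make checkconfig &&
            make reconfig
    )
}

# Example call (not executed here):
#   update_master_config /builds/buildbot/tests1-linux64
```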

Help, my reconfig failed!

Assuming you're using the end_to_end_reconfig.sh script, you can resume after fixing the error. Errors and exit state can be found in the manage_masters-##########.log file, which is created in /tmp/reconfig by default. The hourly auto reconfig logs are in the bb dir as reconfig.{log,lock}.
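
To find the relevant log quickly, something like the following sketch can help. It picks the most recently modified manage_masters log and prints lines that look like failures. The function names and the search pattern (error, traceback, failed) are assumptions; the real log may word its failures differently.

```shell
# Sketch: locate the newest manage_masters log and show suspicious lines.
# ASSUMPTIONS: function names are hypothetical, and the grep pattern is a
# guess at how failures appear in the log.
latest_reconfig_log() {
    log_dir="${1:-/tmp/reconfig}"
    # ls -t lists the most recently modified file first.
    ls -t "$log_dir"/manage_masters-*.log 2>/dev/null | head -n 1
}

show_reconfig_errors() {
    log_file="$(latest_reconfig_log "$1")"
    if [ -z "$log_file" ]; then
        echo "no manage_masters log found" >&2
        return 1
    fi
    # -n prefixes each match with its line number in the log.
    grep -n -i -E 'error|traceback|failed' "$log_file"
}
```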

Running the script again will yield the following menu:

* Please select one of the following options:
1) Continue with existing reconfig (e.g. if you have resolved a merge conflict)
2) Delete saved state for existing reconfig, and start from fresh
3) Abort and exit reconfig process

Option 1 is usually the best choice here, especially if the hg operations actually succeeded during the previous attempt. This way the wiki/bugzilla updates will still be properly applied.

Help, my reconfig is stuck!

Reconfigs can take as little as 30 minutes to run, but can take up to 2 hours depending on how busy the systems are.

In general, Linux test masters are the slowest to reconfig. You can tail the manage_masters-##########.log to keep up with progress. By default, this log is created in /tmp/reconfig. The hourly auto reconfig logs are in the bb dir as reconfig.{log,lock}.
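
One rough way to tell "slow" from "stuck" is to check how long ago the log was last written. The sketch below reports a log as stale when it has not been modified for a given number of minutes. The function name log_is_stale and the 30-minute default are arbitrary assumptions, and a quiet log is only a hint, not proof, that the reconfig is stuck.

```shell
# Sketch: succeed when the log has not been modified for more than N minutes.
# ASSUMPTIONS: the function name and the 30-minute default are hypothetical.
# Note: -mmin is supported by GNU and BSD find but is not part of POSIX.
log_is_stale() {
    log_file="$1"
    minutes="${2:-30}"
    # find prints the path only if it was last modified more than $minutes ago.
    [ -n "$(find "$log_file" -mmin +"$minutes" 2>/dev/null)" ]
}
```

For live progress, tail -f on the same file remains the simplest option.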

If the reconfig gets stuck, see How To/Unstick a Stuck Slave From A Master.