User:Ianconnolly/runner: Difference between revisions

== Long-term To-Dos/Stretch goals ==


Catlee keeps a tentative list of things he wants added to Runner in its README: https://github.com/catlee/runner/blob/master/README.md
<i>Currently we reboot after almost every job on buildbot in order to do things like:
 
* Make sure we re-puppetize
* Clean up temporary files
* Make sure no old processes are running
* Clean up memory fragmentation
 
However, by rebooting, we cause some problems:
* We lose the filesystem cache between every job. In AWS this turns into lots of extra IO to read back the same files over and over after each reboot
* We waste 2-5 minutes per job doing a reboot
* Extra load on puppet masters
 
We can address nearly all of the issues we reboot for in pre-flight checks:
 
* Check if there are puppet (or AMI) changes that need to be applied
* We can still clean up temporary files
* We can kill stray processes
* I don't think memory fragmentation is an issue any more. We used to have problems on 32-bit linux machines that were up for a long time. Eventually they weren't able to link large libraries. All our build machines are 64-bit now I believe.
 
<b>This will require that 'runner' be in charge of starting and stopping buildbot</b>. I imagine we'd do something like this:
* Run through pre-flight checks
* Start buildbot
* Watch twistd.log, make sure buildbot actually starts and connects to a master
* Initiate graceful shutdown of buildbot after X minutes (30?). There are ways to do this locally (e.g. by touching a shutdown.stamp file) instead of poking the buildbot master.
* Run any post-flight tasks
* Go back to beginning</i>
 
From [https://bugzilla.mozilla.org/show_bug.cgi?id=1028191 Bug 1028191 - Stop rebooting after every job]
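
The loop described in the quote above could be sketched roughly as follows. This is only an illustration of the idea, not Runner's actual code: the function names (<code>wait_for_log_line</code>, <code>touch_shutdown_stamp</code>, <code>run_cycle</code>), the location of <code>shutdown.stamp</code> in the buildbot base directory, and the log text to wait for are all assumptions made for the example. The pre-flight checks, the buildbot start command and the post-flight tasks are passed in as plain callables, since the bug does not specify them.

```python
import os
import time


def wait_for_log_line(log_path, needle, timeout, poll=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll a log file (e.g. twistd.log) until `needle` appears.

    `needle` is a placeholder: the exact text buildbot logs when it
    connects to a master is not given in the bug, so it is a parameter.
    Returns True if the text was seen, False if `timeout` seconds passed.
    """
    deadline = clock() + timeout
    while True:
        try:
            with open(log_path, errors="replace") as f:
                if needle in f.read():
                    return True
        except FileNotFoundError:
            pass  # buildbot may not have created the log yet
        if clock() >= deadline:
            return False
        sleep(poll)


def touch_shutdown_stamp(basedir):
    """Ask for a graceful shutdown locally by touching shutdown.stamp.

    Assumption: the stamp file lives in the buildbot base directory.
    """
    path = os.path.join(basedir, "shutdown.stamp")
    with open(path, "a"):
        os.utime(path, None)
    return path


def run_cycle(preflight, start_buildbot, wait_connected, request_shutdown,
              postflight, run_for=30 * 60, sleep=time.sleep):
    """Run one pass of the loop; return True if the pass completed.

    preflight: callables returning True when the machine is ready
    (puppet up to date, temp files cleaned, stray processes killed).
    """
    for check in preflight:
        if not check():
            return False  # do not start buildbot on a dirty machine
    start_buildbot()
    if not wait_connected():
        request_shutdown()  # never connected to a master, give up this pass
        return False
    sleep(run_for)  # let buildbot take jobs for X seconds (30 minutes?)
    request_shutdown()
    for task in postflight:
        task()
    return True
```

"Go back to beginning" would then just be calling <code>run_cycle(...)</code> in a <code>while True:</code> loop. A real version would also need to wait for buildbot to actually exit after the shutdown request before running post-flight tasks, which this sketch leaves out.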


== Stuff I Need ==  