Build farm infrastructure update and discussion
Build System
Previous discussion
Pain Points in Current Build Farm
- Tinderbox is unmaintained
  - preed/rhelmer maintaining, looking at alternatives
- Magic strings that you have to know about
  - config vars vs. switches
- Adding a new test is ridiculous (need to edit build-seamonkey-util.pl etc.)
  - rhelmer splitting out tests
  - two kinds of tests: pass/fail and perf
  - drive from make (unit, acceptance and performance)
- Finding out if a build is actually running
  - pings
- Failure mode is bad (tinderbox just has a yellow build that goes on for 12 hours, no indication of machine death, respin, whatever)
- Difficult for developers to make mozconfig/tinderconfig changes
  - preed moving to public CVS (cf. Lightning)
- No separation between nightly tinderbox builds and official builds
  - "official" builds are "magic" from tinderboxes
- Separation between official/dev builds
  - official builds should be correct and reproducible
  - developer builds need to be fast
- checkins that require clobber require manual build eng involvement
- (mostly) everyone has access to everything
- config changes are not tracked/versioned/auditable (they are now, but manually)
- adding new machines is a pain
- no easy way to integrate ccache
- what is a build machine deliverable?
- no ability to save artifacts for important builds (e.g. release builds don't save srcdir, objdir, etc.)
- better data output from build status (so that other tools can display it in other formats)
- showbuilds.cgi sucks (list of 500 l10n tinderboxes, separate builds per machine per locale, instead of just showing them all in sequence under one column, etc.)
- tinderbox doesn't die on a lot of errors (e.g. talkback build failure, rsync failure)
- tinderbox is sloppy about stderr/stdout
- real-time logs
- no sandbox developer patch submission / one-off build
- management/maintenance of build farm doesn't scale as number of tinderboxen increase (some kind of central management?)
- (req) community build status submission (Joe Random's OS/2 build sending tinderbox updates)
- run builds on fast machine, run performance tests on slow machines, run unit/correctness tests on fast machine/VM
- no correlation between commit and build
- (req) builds should be triggered from a commit, instead of cycles going for no reason (except for tests)
- waterfall UI - time fields are random
- waterfall UI - historic commits are broken (bonsai integration is crap)
- no monitoring whether expected tinderbox builds actually make it to ftp (e.g. no notification that partial/complete update generation fails) -- tinderbox columns just drop off
- log management is bad
  - should show commands, not just output
- tests should be part of main tree
- should be a way for developers to trigger clobber
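Several of the points above (finding out whether a build is actually running, and the yellow build that sits for 12 hours with no sign of machine death) come down to the server not noticing when a build stops reporting. A minimal sketch of heartbeat-based stall detection follows. It assumes each running build periodically sends a ping. The class name BuildMonitor and the timeout value are hypothetical and are not part of Tinderbox.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical default: how long a build may go without a ping before it is stalled.
DEFAULT_TIMEOUT_SECS = 15 * 60


@dataclass
class BuildMonitor:
    """Tracks the last heartbeat ("ping") from each running build.

    A build that stops pinging is reported as stalled instead of
    staying yellow indefinitely on the waterfall.
    """
    timeout_secs: float = DEFAULT_TIMEOUT_SECS
    last_ping: Dict[str, float] = field(default_factory=dict)

    def ping(self, build_id: str, now: Optional[float] = None) -> None:
        # Record the most recent heartbeat time for this build.
        self.last_ping[build_id] = time.time() if now is None else now

    def finish(self, build_id: str) -> None:
        # A completed build no longer needs to be watched.
        self.last_ping.pop(build_id, None)

    def stalled(self, now: Optional[float] = None) -> List[str]:
        # Builds whose last ping is older than the timeout, sorted for stable output.
        now = time.time() if now is None else now
        return sorted(
            build_id
            for build_id, seen in self.last_ping.items()
            if now - seen > self.timeout_secs
        )
```

For example, with a 600-second timeout, a build that last pinged at t=0 is reported as stalled at t=700. A build that pinged at t=500 is not.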
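Three other points in the list could be addressed by a step runner along these lines: tinderbox not dying on errors, sloppy handling of stderr/stdout, and logs that show output but not commands. This is a sketch under stated assumptions. The names run_steps and BuildStepFailed are hypothetical, and the code uses only Python's standard subprocess and shlex modules rather than anything in build-seamonkey-util.pl.

```python
import shlex
import subprocess
import sys
from typing import Sequence, TextIO


class BuildStepFailed(Exception):
    """Raised when a build step exits non-zero, so the whole build stops."""

    def __init__(self, argv: Sequence[str], returncode: int) -> None:
        super().__init__(f"step failed with exit code {returncode}: {shlex.join(argv)}")
        self.argv = list(argv)
        self.returncode = returncode


def run_steps(steps: Sequence[Sequence[str]], log: TextIO = sys.stdout) -> None:
    """Run each step in order, logging the command line before its output.

    stderr is merged into stdout so both streams end up in a single log.
    The first non-zero exit aborts the remaining steps.
    """
    for argv in steps:
        # Log the command itself, not just what it printed.
        log.write(f"$ {shlex.join(argv)}\n")
        result = subprocess.run(
            list(argv),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        log.write(result.stdout)
        if result.returncode != 0:
            raise BuildStepFailed(argv, result.returncode)
```

Merging stderr into stdout gives up the ability to filter the two streams separately. In exchange, each step's messages stay together in one log.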