Build farm infrastructure update and discussion

Build System

OSAF tbox2 page

Previous discussion

Pain Points in Current Build Farm

  • Tinderbox is unmaintained
    • preed/rhelmer maintaining, looking at alternatives
  • Magic strings that you have to know about
    • config vars vs. switches
  • Adding a new test is ridiculous (need to edit etc)
    • rhelmer splitting out test
    • two kinds of test - pass/fail and perf
    • drive from make (unit, acceptance and performance)
  • Finding out if a build is actually running
    • pings
  • Failure mode is bad (tinderbox just shows a yellow build that goes on for 12 hours, with no indication of machine death, respin, or anything else)
  • Difficult for developers to make mozconfig/tinderconfig changes
    • preed moving to public CVS (cf. Lightning)
  • No separation between nightly tinderbox builds and official builds
    • "official" builds are "magic" from tinderboxes
  • Separation between official/dev builds
    • official builds should be correct and reproducible
    • developer builds need to be fast
  • checkins that require a clobber need manual build engineering involvement
  • (mostly) everyone has access to everything
  • config changes are not tracked/versioned/auditable (they are now, but only manually)
  • adding new machines is a pain
  • being able to integrate ccache
  • what is a build machine deliverable?
    • no ability to save artifacts for important builds (e.g. release builds don't save srcdir, objdir, etc.)
    • better data output from build status (so that other tools can display it in other formats)
    • showbuilds.cgi sucks (list of 500 l10n tinderboxes, separate builds per machine per locale, instead of just showing them all in sequence under one column, etc)
  • tinderbox doesn't die on a lot of errors (e.g. talkback build failure, rsync failure)
  • tinderbox is sloppy about stderr/stdout
  • real-time logs
  • no sandbox developer patch submission / one-off build
  • management/maintenance of the build farm doesn't scale as the number of tinderboxen increases (some kind of central management?)
    • (req) community build status submission (joe random's OS/2 build sending tinderbox updates)
  • run builds on fast machine, run performance tests on slow machines, run unit/correctness tests on fast machine/VM
  • no correlation between commit and build
  • (req) builds should be triggered from a commit, instead of cycles going for no reason (except for tests)
  • waterfall UI - time fields are random
  • waterfall UI - historic commits are broken (bonsai integration is crap)
  • no monitoring of whether expected tinderbox builds actually make it to ftp (e.g. no notification when partial/complete update generation fails); tinderbox columns just drop off
  • log management is bad
  • should show commands, not just output
  • tests should be part of main tree
  • should be a way for developers to trigger clobber
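
Three of the points above (not dying on errors, sloppy stderr/stdout handling, and logs that show output but not commands) describe behaviour a build step runner could have. The following is a minimal sketch in Python, not anything tinderbox does today. The names run_steps and build_succeeded and the record layout are hypothetical, chosen only for illustration.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first one that fails.

    Returns one record per executed step so a caller can log or
    publish the results in whatever format it needs.
    """
    records = []
    for argv in steps:
        # Capture stdout and stderr as separate streams rather than mixing them.
        proc = subprocess.run(argv, capture_output=True, text=True)
        records.append({
            "command": argv,  # keep the command itself, not just its output
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        })
        if proc.returncode != 0:
            break  # a failed step ends the run instead of being ignored
    return records


def build_succeeded(records):
    """True only if at least one step ran and every executed step exited with 0."""
    return bool(records) and all(r["returncode"] == 0 for r in records)


if __name__ == "__main__":
    # Stand-in commands; a real build would list its own steps here.
    demo = [
        [sys.executable, "-c", "print('compile ok')"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    for record in run_steps(demo):
        print(record["command"], "->", record["returncode"])
```

Because every record carries the command, exit status, and both streams, the same data could also feed the "better data output from build status" item without scraping logs.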
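
The "pings" idea for finding out whether a build is actually running can be sketched as a staleness check: each machine reports a timestamp periodically, and anything silent for too long is flagged rather than left showing as a yellow build. This is only an illustration. The function name find_stale_builds, the machine names, and the 15-minute threshold are assumptions, not part of any existing tool.

```python
from datetime import datetime, timedelta


def find_stale_builds(last_ping, now, max_silence=timedelta(minutes=15)):
    """Return the names of machines whose last ping is older than max_silence.

    last_ping maps a machine name to the datetime of its most recent ping.
    The 15-minute default is an arbitrary placeholder value.
    """
    return sorted(
        name for name, seen in last_ping.items() if now - seen > max_silence
    )


if __name__ == "__main__":
    now = datetime(2007, 1, 1, 12, 0)
    pings = {
        "linux-build-1": now - timedelta(minutes=2),   # hypothetical machine names
        "win32-build-1": now - timedelta(hours=12),
    }
    print(find_stale_builds(pings, now))
```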

Other continuous integration systems