Auto-tools/Projects/Bisect in the cloud

From MozillaWiki

Team

David Burns (AutomatedTester)

Problem

As we move towards running only specific tests against a revision, there may be times when we miss catching a regression. To deal with this, we need a mechanism that allows us to bisect builds quickly on a number of different platforms.

This will allow developers and sheriffs to find the bad revision quickly, so that it can either be fixed or backed out of the tree.

Goals & Considerations

Overall Requirements

  • The ability to input a bad and a good revision, the test that reproduces the issue, and the tree to run against
  • Be able to work against different types of regression, such as test failures or build failures
  • Bisect on pushes (i.e. using the hgweb pushlog, like TBPL does) rather than picking random changesets, since the last changeset in a push is more likely to build than one picked from the middle of a 10-changeset landing (ideally devs should make each changeset buildable on its own, but that doesn't always happen).
  • Ability to handle a custom test as well as running an existing test suite
  • Have the ability to spawn multiple builds based on which group is making the request. E.g. if sheriffs have the tree closed and need to bisect quickly, pre-empt the next builds for both the bisect "good" and "bad" outcomes, even if it means greater overall resource usage.
  • Run the test against all versions that are returned, at least for version 1, so that we can manage intermittent oranges properly.
    • Have a mechanism for re-running tests to handle intermittents
  • Push results back to be processed by TBPL v2
  • Allow a local machine to be used for testing, via a CLI tool, to save resources
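
The push-based bisection, the re-runs for intermittents and the pre-empted builds described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the function names, the in-memory list of pushes, the `run_test` callback and the rule "bad only if every attempt fails" are all hypothetical. A real tool would obtain the pushes from the hgweb pushlog and run tests on RelEng infrastructure.

```python
from typing import Callable, List, Sequence


def push_tip(push: Sequence[str]) -> str:
    """Return the last changeset of a push, the one most likely to build."""
    return push[-1]


def is_bad(revision: str, run_test: Callable[[str], bool], attempts: int = 3) -> bool:
    """Assumed policy: a revision is bad only if the test fails on every attempt.

    run_test returns True when the test passes. A single pass is taken to
    mean that any earlier failures were intermittent.
    """
    for _ in range(attempts):
        if run_test(revision):
            return False
    return True


def bisect_pushes(
    pushes: Sequence[Sequence[str]],
    run_test: Callable[[str], bool],
    attempts: int = 3,
) -> Sequence[str]:
    """Return the first bad push.

    pushes is ordered oldest to newest; pushes[0] is known good and
    pushes[-1] is known bad. Only the tip of each push is tested.
    """
    if len(pushes) < 2:
        raise ValueError("need at least a good push and a bad push")
    good, bad = 0, len(pushes) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(push_tip(pushes[mid]), run_test, attempts):
            bad = mid
        else:
            good = mid
    return pushes[bad]


def speculative_midpoints(good: int, bad: int) -> List[int]:
    """Indices worth building ahead of time, one per possible outcome
    of the test that is currently running on the midpoint."""
    mid = (good + bad) // 2
    candidates = []
    if mid - good > 1:  # midpoint turns out bad: next range is (good, mid)
        candidates.append((good + mid) // 2)
    if bad - mid > 1:  # midpoint turns out good: next range is (mid, bad)
        candidates.append((mid + bad) // 2)
    return candidates
```

For example, with five pushes where the regression landed in the fourth, `bisect_pushes` tests only the tips of the third and fourth pushes. `speculative_midpoints` trades extra builds for lower latency, which matches the closed-tree case mentioned in the requirements.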

Considerations

  • How can we use mach?
  • Can we extend MozRegression to be core to this, so we don't have to do builds again?
  • Can we use the Self-Serve API for some tasks to make things quicker?
  • We need to be able to reuse RelEng infrastructure to run tasks.

Non-Goals

This project is not about building out infrastructure, nor the tools to run tests. It will just call a scheduler and download the builds (if they are available) to be distributed to the relevant machines for running.

Design and Approach

Ideas for the approach

  • Re-use MozRegression
    • Use nightly builds to narrow the range to 24 hours.
    • If the range is within the last 30 days, we can use the existing per-push builds to narrow it further (after that, any non-Nightly builds/logs are purged).
    • If the range is still not reduced to a single push (e.g. it is not in the last 30 days, or it was recent but builds were coalesced), then schedule builds to reduce it further.
  • Push the different items into a queue that we can then pop from, so tasks are managed properly
  • Have a web interface, using Playdoh, to manage tasks
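
As a rough illustration of the staged narrowing and the task queue listed above, the following sketch plans which stages a bisection needs and feeds them through a first-in, first-out queue. Everything here is an assumption made for the example: the `Task` type, the stage names, the `plan_stages` signature and the choice of treating exactly 30 days as still within retention. It does not call MozRegression or any scheduler.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional

# From the text: non-Nightly builds/logs are purged after about 30 days.
PER_PUSH_RETENTION_DAYS = 30


@dataclass
class Task:
    stage: str   # hypothetical stage name
    detail: str  # human-readable description of the work


def plan_stages(range_age_days: int, builds_coalesced: bool) -> List[Task]:
    """Return the narrowing stages, in order, for a regression range.

    range_age_days is the age of the oldest end of the range.
    builds_coalesced says whether some pushes in the range have no build.
    """
    tasks = [Task("nightly", "narrow the range to 24 hours using nightly builds")]
    per_push_available = range_age_days <= PER_PUSH_RETENTION_DAYS
    if per_push_available:
        tasks.append(Task("per-push", "narrow further using existing per-push builds"))
    if not per_push_available or builds_coalesced:
        tasks.append(Task("scheduled", "schedule builds for pushes that have none"))
    return tasks


def next_task(queue: Deque[Task]) -> Optional[Task]:
    """Pop the oldest queued task, or return None when the queue is empty."""
    return queue.popleft() if queue else None


# Usage: queue the plan, then let a worker pop tasks in order.
queue: Deque[Task] = deque()
queue.extend(plan_stages(range_age_days=10, builds_coalesced=True))
first = next_task(queue)
```

A plain in-process `deque` is used only to keep the sketch self-contained. A deployed service would more likely need a persistent or shared queue so that the web interface and the workers see the same tasks.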

Implementation

  • Use Playdoh for the Web Site since that is very widely used within Mozilla by Web Dev and IT
  • The repository is on GitHub

Initial Notes