Auto-tools/Projects/Autolog
== Goal ==
The Autolog project seeks to implement a TBPL-like system for viewing test results produced by the a-team's various tools, at least those that aren't hooked up to TBPL. Such projects potentially include mobile automation, Crossweave, profile manager, etc.
== Proposed Implementation ==
The system will be composed of a front-end web UI and a backend database. The database will be ElasticSearch, using the same instance we're using for OrangeFactor, since our experience there indicates it is fast, reliable, and easy to use.
Communication with the db will be provided by two channels: a REST API, which we can probably build by extending OrangeFactor's woo_server.py, and a python library that automation tools can use to post results to the database.
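To make the posting channel concrete, here is a minimal sketch of what a tool might do if it posted a result document straight to ES over HTTP. The host, index, and document-type names are placeholders, not a settled API:

<pre>
import json
import urllib.request

ES_URL = 'http://localhost:9200'  # placeholder; the real ES host is TBD

def post_doc(doc, index='autolog', doctype='testgroups'):
    """POST one result document to ElasticSearch; ES assigns the id."""
    req = urllib.request.Request(
        '%s/%s/%s/' % (ES_URL, index, doctype),
        data=json.dumps(doc).encode('utf-8'),
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
</pre>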
The front-end UI should at least superficially resemble TBPL. Whether we will re-use TBPL code is an open question; the current implementation of TBPL is JS-based and tightly tied to Tinderbox, so it may or may not be a good starting point for a new UI.
asuth of MoMo created an alternative view to TBPL called ArbPL (code). It might be interesting for UI ideas and as a source of more code that interfaces with Tinderbox.
== Data Structure ==
There are two types of data structures we are concerned with. One is the structure of data that a test suite will have to provide in order to insert test results into ElasticSearch. The second is the structure of data inside ElasticSearch itself.
For the former:
<pre>
{
  // testgroup definition
  'testgroup': 'mochitest-other',
  'machine': 'talos-r3-fed64-044',
  'testsuite_count': 1,          // supplied by python lib
  'starttime': 1297879654,
  'date': '2011-02-16',          // supplied by python lib
  'logurl': '...',               // optional
  'os': 'fedora12',
  'platform': 'linux',

  // Base product definition: the primary product under test. For
  // Crossweave, this is the fx-sync code; for Android it is
  // the mobile-browser code. In a TBPL-like display, this
  // product's rev would be displayed in the "commit" column.
  'tree': 'fx-sync',
  'branch': '1.7',               // optional?
  'revision': '553f7b1974a3',
  'buildtype': 'xpi',
  'buildid': '20110210030206',   // optional
  'version': '1.7.pre',          // optional
  'buildurl': '...',             // optional

  // Secondary product definitions: additional products involved
  // in the test. For Crossweave or Android, this might be
  // 'mozilla-central', etc. There can be as many secondary
  // products as needed.
  'tree2': 'mozilla-central',
  'branch2': 'default',          // optional?
  'revision2': '553f7b1974a3',
  'buildtype2': 'opt',
  'buildid2': '20110210030206',  // optional
  'version2': '4.0b13pre',       // optional
  'buildurl2': '...',            // optional

  // Testsuite definition. This is an array of objects (to support cases
  // like mochitest-other); only one member is shown in the example below.
  'testsuites': [
    {
      // for cases other than mochitest-other, this is probably the
      // same as 'testgroup' above
      'suitename': 'mochitest-a11y1',
      'cmdline': '...',          // optional
      'testfailure_count': 1,    // provided by python lib
      'elapsedtime': 665,        // in seconds
      'passed': 85152,
      'failed': 1,
      'todo': 124,

      // These errors are testsuite errors that cannot be pinned to a
      // specific test case (e.g., a crash that occurred after all tests
      // had finished). Each member of 'errors' can contain additional
      // keys, depending on the type of test, e.g., 'stacktrace' for
      // crashes, etc.
      'errors': [
        {
          'status': 'PROCESS-CRASH',
          'text': 'application crashed (minidump found)'
        }
      ],

      // These are failures that occur during specific test cases.
      'testfailures': [
        {
          'test': 'xpcshell/tests/toolkit/components/places/tests/autocomplete/test_download_embed_bookmarks.js',
          // Like testsuite errors, each member of 'failures' can contain
          // additional metadata depending on failure type.
          'failures': [
            {
              'status': 'TEST-UNEXPECTED-FAIL',
              'text': 'Acceleration enabled on Windows XP or newer - didn\'t expect 0, but got it'
            }
          ]
        }
      ]
      // Do we want/need to record passed/todo tests?
    }
  ]
}
</pre>
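For illustration, a tool using the proposed python library might build and submit the structure above roughly as follows. The AutologTestGroup class and all of its methods are purely hypothetical here; the real API hasn't been designed yet.

<pre>
# Hypothetical usage of the proposed python library; none of these
# names exist yet.
from autolog import AutologTestGroup

testgroup = AutologTestGroup(
    testgroup='mochitest-other',
    machine='talos-r3-fed64-044',
    os='fedora12',
    platform='linux',
    tree='fx-sync',
    revision='553f7b1974a3',
    buildtype='xpi')

suite = testgroup.add_testsuite('mochitest-a11y1', elapsedtime=665,
                                passed=85152, failed=1, todo=124)
suite.add_test_failure(
    test='xpcshell/tests/toolkit/components/places/tests/autocomplete/test_download_embed_bookmarks.js',
    status='TEST-UNEXPECTED-FAIL',
    text="Acceleration enabled on Windows XP or newer - "
         "didn't expect 0, but got it")

# The library would fill in testsuite_count, date, etc., and post the
# resulting documents to ES.
testgroup.submit()
</pre>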
For the structure in ElasticSearch, the data will be separated into three document types (by the python library, if that's used; a test suite posting to ES directly via HTTP will have to do this itself), similar to the way that Tinderbox logs are currently spread across three document types:
- a testgroup document (corresponding to Tinderbox build documents, example here)
- one or more testsuite documents (corresponding to Tinderbox testrun documents, example here)
- one or more testfailure documents (corresponding to Tinderbox testfailure documents, example here)
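As a rough sketch of the splitting step the python library would have to perform (the 'testgroup_id' and 'testsuite_id' linking fields are assumptions, not a settled schema):

<pre>
import uuid

def split_for_es(data):
    """Split one submitted result structure into the three ES document
    types described above."""
    testgroup_id = str(uuid.uuid4())
    testgroup = {k: v for k, v in data.items() if k != 'testsuites'}
    testgroup['_id'] = testgroup_id

    testsuites, testfailures = [], []
    for suite in data.get('testsuites', []):
        testsuite_id = str(uuid.uuid4())
        doc = {k: v for k, v in suite.items() if k != 'testfailures'}
        doc.update({'_id': testsuite_id, 'testgroup_id': testgroup_id})
        testsuites.append(doc)
        # flatten each per-test failure into its own document
        for tf in suite.get('testfailures', []):
            for failure in tf.get('failures', []):
                testfailures.append({'testsuite_id': testsuite_id,
                                     'test': tf['test'],
                                     'status': failure.get('status'),
                                     'text': failure.get('text')})
    return testgroup, testsuites, testfailures
</pre>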
'''Q''': Why do we separate the data into three document types; why not just use one big document?<br/>
'''A''': Because searches in ElasticSearch are much faster and easier with basic data types; searching inside complex nested JSON is slower and the syntax is much more complex.
'''Q''': Can't the python library automatically provide 'os' and 'platform'?<br/>
'''A''': It would be nice, wouldn't it? Unfortunately, there are lots of things that can confuse the issue; e.g., if you're using mozilla-build on Windows, it will see your 64-bit version of Windows as win32, regardless of what you're testing. Similarly, we sometimes test 32-bit Mac stuff on macosx64. It seems safest to have the test tools provide this data instead of trying to guess.
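For instance, python's own platform module reports the interpreter's view of the machine, not the tested product's:

<pre>
import platform

# A 32-bit python (e.g., from mozilla-build) on 64-bit Windows reports
# the interpreter's architecture, not the machine's or the build's:
print(platform.system())        # 'Windows'
print(platform.architecture())  # ('32bit', 'WindowsPE'), even on win64
</pre>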
'''Q''': Why do we have both testgroup and testsuite?<br/>
'''A''': It's entirely to support mochitest-other. :( In most cases, each testgroup will have one testsuite.
'''Q''': Where are the test runs in this structure?<br/>
'''A''': We've been using the term 'testrun' to mean different things in different places. In this structure, I imagine 'testrun' to mean the same thing as it does in OrangeFactor: that is, a collection of testgroups that are run against the same primary changeset.
'''Q''': Is this really the best way to include data about multiple products, or code from multiple repos?<br/>
'''A''': I'm not sure. I suggested this structure because it's easy to use when searching ES. Other structures are possible. For instance, we could create a 'product' document type, store all the products there, and then just include references to these documents in the 'testgroup' document (a rough sketch follows). The downside to this is that getting certain data out of ES would require multiple queries.
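A sketch of that alternative, with all field names assumed:

<pre>
# Alternative structure: a separate 'product' document type holds the
# product data (all field names here are assumptions)...
product_doc = {
    '_id': 'prod-553f7b1974a3-xpi',   # hypothetical document key
    'tree': 'fx-sync',
    'branch': '1.7',
    'revision': '553f7b1974a3',
    'buildtype': 'xpi',
}

# ...and the 'testgroup' document only references products by id, so
# any query that needs product fields costs a second ES lookup.
testgroup_doc = {
    'testgroup': 'mochitest-other',
    'machine': 'talos-r3-fed64-044',
    'products': ['prod-553f7b1974a3-xpi', 'prod-553f7b1974a3-opt'],
}
</pre>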
=== Open Issues ===
The above structure would work fine for displaying a TBPL-like result view. It might be problematic if we intend to feed this data into OrangeFactor, however.
The problem is in identifying unique test runs. For OrangeFactor, we rely on the fact that buildbot uses the same buildid for all related 'testgroups'. If buildbot reruns the same testgroups (because it's bored), it generates a new buildid, even though the revision is still the same. This lets us identify unique test runs.
For non-buildbot cases (and I'm specifically thinking of Crossweave), we don't have an analogous buildid. In the Crossweave case, each 'testgroup' fired off against a given revision is independent and doesn't share any metadata (like a buildid) with other testgroups run at the same time.
If we want to maintain consistency with OrangeFactor, we may have to require a 'buildid' value that is the same across all testgroups initiated by the same event. This would require some refactoring of Crossweave and possibly other tools.
Using 'buildid' isn't a perfect solution, though, as some buildbot jobs (like the once-a-day win64 runs) use a distinct buildid that doesn't match other buildbot jobs run on the same changeset. We're currently excluding these in OrangeFactor so they aren't skewing the data, but it illustrates the drawbacks of relying on buildid.
The other option is to change the way OrangeFactor identifies testruns. Instead of relying on buildid, we could implement an algorithm like this (see the sketch after the list):
- identify all the changesets that have testgroups; the number of changesets is our preliminary testrun_count
- for each changeset, identify the list of testgroups (L) and the unique set of testgroups (S), based on 'testgroup' and 'platform'
- if len(L) > len(S), create a list of 'extra testgroups' (E) by removing the members of (S) from (L)
- sort (E) and look for the maximum number of duplicate entries (e.g., if (E) contains three 'mochitest-other'/'win32' entries and two 'mochitest-1/5'/'linux64' entries, return 3), and add this number to the preliminary testrun_count
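A minimal sketch of this algorithm in python, assuming each testgroup document carries 'revision', 'testgroup', and 'platform' fields:

<pre>
from collections import Counter

def estimate_testrun_count(testgroups):
    """Estimate the number of distinct testruns from testgroup docs."""
    by_changeset = {}
    for tg in testgroups:
        key = (tg['testgroup'], tg['platform'])
        by_changeset.setdefault(tg['revision'], []).append(key)

    # preliminary count: one run per changeset
    testrun_count = len(by_changeset)

    for keys in by_changeset.values():
        counts = Counter(keys)
        # the 'extra testgroups' (E) are occurrences beyond the first of
        # each testgroup/platform pair; the maximum multiplicity among
        # them is our guess at how many additional runs happened
        extras = [n - 1 for n in counts.values() if n > 1]
        if extras:
            testrun_count += max(extras)
    return testrun_count
</pre>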
There's always going to be some guesswork involved, since the idea of a 'testrun' as we're using it in OrangeFactor is purely a conceptual construct, not something supported intrinsically by our test frameworks.
== Tasks ==
{| class="wikitable"
! Task !! Owner !! Notes
|-
| Investigate TBPL code and determine how much to re-use || Mcote? ||
|-
| Propose common data structure for test results in db || Jgriffin ||
|-
| Clean up woo_server.py and modularize in order to make future additions easier || ? ||
|-
| Create python library for test tools to use to post results to db || ? ||
|}