EngineeringProductivity/Projects/Treeherder/Design Draft

Architecture Bottlenecks

One of the first steps in this refactor project is to identify architectural bottlenecks. The following list was derived from an email exchange between jgriffin and ctalbert.

  • The application can only accept data that comes from buildbot
  • A large portion of the TBPL data model is defined in JavaScript rather than in the underlying database, which prevents other applications from accessing or reusing TBPL's nomenclature.
  • There is a dependency on run-time log parsing to gather data about test failures, even though buildbot has already parsed the logfile once. The parsed data is not stored anywhere accessible, so third-party applications like OrangeFactor cannot use it; every application parses the log files again. This is error prone and a poor use of resources.
  • There's no interface that allows other tools to write data to TBPL; at present, you'd have to be on the same internal network as TBPL's DB and have MySQL credentials.
  • It's implicitly tied to mercurial; there's no support for git-based commits

Requirements

  • The complete data model for test runs, test results, and star data would be defined by the DB, and would be consumed by TBPL
  • There would be a single set of OS/platform/test names in the DB that would be used by TBPL and any other tool that wanted to interface with it (probably including buildbot)
  • There would be a well-defined, web-accessible interface for reading data from and writing data to the DB (a hypothetical sketch follows this list).
  • Any tool could write results to the DB, and TBPL would render them.
  • Log files would be parsed exactly once and the resulting data stored in the DB.
  • TBPL would work with git-based commits and mercurial.
  • Adding a new tree or a new test type to TBPL would simply be a matter of writing appropriate data to the DB using interfaces designed for this; TBPL would automatically pick this up and render it. It wouldn't be necessary to tweak constants in a JS file and then file an IT bug to deploy the changes.
  • TBPL must support intermittently run tests.
  • TBPL and OrangeFactor functionality should continue to merge together.
  • The UI should not be thrown out, but should be reviewed closely for expansion/drill-down points.
  • Extend the UI with a better set of filters, like TBPL's existing "unstarred view" but narrowed to a specific test type on a tree.
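
As one concrete illustration of the read/write requirement above, a client of such a web-accessible interface might look roughly like the sketch below. This is a minimal sketch only: the host, endpoint paths, and payload fields are hypothetical assumptions for discussion, not an existing TBPL API.

  # Hypothetical client for a web-accessible TBPL v2 interface.
  # The host, endpoint paths, and payload fields are assumptions.
  import requests

  BASE = "https://tbpl.example.mozilla.org/api"  # hypothetical host

  def get_results(branch, revision):
      """Read test results for one push (hypothetical endpoint)."""
      r = requests.get(f"{BASE}/branches/{branch}/revisions/{revision}/results")
      r.raise_for_status()
      return r.json()

  def submit_results(branch, revision, results):
      """Write results from any tool; no internal network or MySQL credentials needed."""
      r = requests.post(
          f"{BASE}/branches/{branch}/revisions/{revision}/results",
          json=results,  # e.g. {"os": "linux", "platform": "x86_64", "test": "...", "status": "..."}
      )
      r.raise_for_status()

The point of a sketch like this is that any tool, not just buildbot, could submit results, addressing the first bottleneck listed above.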

Re-evaluation of the Data Model

Some parts of the requirements are specific to the TBPL database, web service, and user interface; other parts are generic to any third-party application.

Even if TBPL stores this information in a database instead of source code, the original sources of much of that information are the build system (buildbot or any other automated build process) and the source code repository (mercurial or git). If we're going to have a single source of nomenclature mappings, do they belong in TBPL? Maybe TBPL should just be a UI that consumes web services that are available to all third-party applications. If so, we need to start defining that web service separately from TBPL.

It would be worth taking a long look at https://hg.mozilla.org/webtools/tbpl/file/8c09f46f3d7e/dataimport/import-buildbot-data.py (or a short look, since it's not very long) and at all of the individual sources of data that go into populating https://hg.mozilla.org/webtools/tbpl/file/8c09f46f3d7e/schema.sql. Do we have all of the data entities/attributes that we want to expose to third-party applications?

We need a shared, generic data model definition for all of the entities involved (product, build, log, push, branch, revision, os, platform, test, etc.), their attributes, and the relationships between them. With that in hand, we could start identifying where the correct root sources of truth reside, and at what point in the life cycle of a source code push and build that information becomes available. If we're going to use TBPL as a web service provider for lots of third-party applications, the data model will need to be significantly expanded.
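
As a starting point for discussion, the entities and relationships above could be sketched as plain data classes. This is a minimal illustration using the example attributes named later in this document (a branch's version, a build's log); none of these class or field names is an agreed schema.

  # Minimal sketch of the generic data model. Class and field names
  # are assumptions for discussion, not an agreed schema.
  from dataclasses import dataclass, field
  from typing import List

  @dataclass
  class Branch:
      name: str             # e.g. "mozilla-central"
      version: str          # a branch has a version attribute

  @dataclass
  class Revision:
      branch: Branch
      commit_id: str        # hg changeset or git sha, covering both VCSs

  @dataclass
  class Build:
      product: str
      revision: Revision
      os: str
      platform: str
      log_url: str          # a build has a log attribute

  @dataclass
  class Push:
      revision: Revision
      builds: List[Build] = field(default_factory=list)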

Some possible next steps to solidify the data model:

  • Define a generic data model and ontology that describes the entities, their attributes, and their relationships to one another.
    • Example entities: product, build, push, log, branch, revision, os, platform, test, etc.
    • Example attributes: a branch has a version attribute; a build has a log attribute
    • Do we have that represented correctly in the existing TBPL database schema + hard coded javascript?
    • Is this the right starting place https://hg.mozilla.org/webtools/tbpl/file/8c09f46f3d7e/schema.sql?
  • Is there a direct physical representation of the data model that could be implemented with the existing systems? In other words, could we output a JSON object from buildbot that would define the build object separately from the log (see the sketch after this list)? What would be the best way to store and access those objects? If not, can we retrieve the information from a single parse of the log and store and expose it through the web service for all third-party applications to use?
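
To make the JSON question in the last item concrete, a build object emitted separately from its log might take a shape like the following. Every field name here is an assumption for discussion, not an agreed format; the values echo examples found elsewhere in this document.

  # Hypothetical build object that buildbot could emit as JSON,
  # separate from the raw log. All field names are assumptions.
  import json

  build = {
      "product": "firefox",
      "branch": "mozilla-central",
      "revision": "8c09f46f3d7e",   # hg changeset or git sha
      "os": "linux",
      "platform": "x86_64",
      "state": "complete",
      "log_url": "https://ftp.example.mozilla.org/logs/build.log.gz",  # hypothetical location
      "downstream_jobs": ["mochitest-1", "reftest"],
  }

  print(json.dumps(build, indent=2))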

Single Source of Truth

  • The following excerpt was derived from an email exchange between ctalbert and catlee and describes a possible way forward for exposing build objects for downstream applications.
  • One way we could experiment with creating a ubiquitous data model for TBPL and all of the tools that exist downstream from a build (mozregression, OrangeFactor, datazilla, graphs.mozilla.org, etc.) would be to find a choke point in the buildbot automation system where the single source of truth for a set of values from the build can be written to a REST web service, which all the other downstream tools can then query. The write might include the build product, the build location, its state, the downstream jobs it will kick off, the locations where their logs will appear once they start, and any unique build IDs.
  • catlee's response to this was: "The best place to put this sort of thing would be in the 'postrun.py' utility: http://hg.mozilla.org/build/buildbotcustom/file/default/bin/postrun.py It gets run after every job finishes. Right now it does things like upload logs to ftp, send messages to pulse, and update the mysql status db. It would be pretty easy to add another step to submit data to TBPLv2. The tricky bit is *what* gets submitted."
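
To illustrate catlee's suggestion, an extra postrun step might look roughly like the sketch below. The endpoint URL and payload fields are assumptions, and *what* gets submitted remains the open question catlee raises; this only shows where such a submission could hook in.

  # Hypothetical extra step for postrun.py: submit a finished job's
  # data to a TBPLv2 web service. URL and fields are assumptions.
  import requests

  def submit_to_tbplv2(build_status):
      """Send one finished job's data to a hypothetical TBPLv2 endpoint."""
      payload = {
          "product": build_status["product"],
          "branch": build_status["branch"],
          "revision": build_status["revision"],
          "state": build_status["state"],      # e.g. "success", "failed"
          "log_url": build_status["log_url"],
      }
      r = requests.post("https://tbpl.example.mozilla.org/api/builds",  # hypothetical
                        json=payload)
      r.raise_for_status()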

WIP schema/object design