B2G/QA/Automation/UI/Strategy/Integration vs End to end

From MozillaWiki


Separate concerns for Integration vs. End-to-end tests so that test processes and automation harnesses can optimize for their specific purposes and minimize risk and dependency.

Challenges Addressed

  • Development and QA have different approaches and needs from UI testing
  • Teams have been blocked too often by cross-team dependencies
  • The product is too unstable for end-to-end testing during development phases
  • End-to-end coverage is insufficient to guarantee a quality build

The Problem

To date, FxOS UI automation has been treated as a single type of testing. The differences have been expressed largely in terms of whether it ran on TBPL, whether it was written in JS or Python, who sheriffed it, and so forth. As a result, much discussion, debate, and analysis paralysis has centered on whether the automation should be run, written, or treated in a particular way. Reaching an impasse over differing opinions has been all too common.

But not all automation is the same just because it targets the UI, just as not all testing is the same just because it targets the system under test (SUT).

In particular, integration automation run before landing a commit is subject to a series of restrictions meant to ensure that test results are returned quickly and unambiguously. Those restrictions reduce its usefulness for end-to-end testing, which must often accept more fragility and longer runtimes in exchange for more comprehensive coverage.

When viewed through the lens of the other purpose, each type of automation is inferior:

Commit integration automation is incomplete and shallow, trading coverage for expedience. It is generally written from a developer's perspective to verify the behavior of what they did write, which might not actually meet the requirements of what they should have written. Because the tests tend to be written against a particular section or application of the SUT, broad interactions may go uncovered. Large portions of the SUT may go untested because they depend on unreliable external conditions. As such, these tests frequently fall short of the needs of end-to-end testing.

Conversely, end-to-end automation is too slow, device-bound, and sometimes fails spuriously due to non-determinism inherent in the SUT and its external dependencies. Because the user scenarios it tests often span multiple parts of the system, it is more fragile and poorly isolated. The need to mimic user behavior and verify after every step slows the tests down. Its wider scope means it often cannot be written adequately until entire subsystems are in place. So these tests are often wholly unsuitable for the incremental growth, quick land-or-not decisions, and debugging assistance that a good continuous integration suite provides.

However, these weaknesses are a matter of perspective. When used for its own specific purpose, each type of automation provides great value. Further, the compromises each makes are compensated for by the other.

The Solution

The solution is to separate the concerns. And since each type of automation has a different primary stakeholder, ownership should be separated as well.

Two different suites of UI automation

  • Gaia Integration
    • Must be quick-running with no non-deterministic factors
    • Results must be absolutely unambiguous
    • Coverage is restricted because of these rules
    • Will be run prior to each commit and can prevent a bad code change from landing
    • Scope, depth, and maintenance are owned by functional teams as part of their code
    • Small isolated tests can be checked in with code changes immediately
    • Is sheriffed reliably and quickly through a long-standing, established process
    • Runs on B2G Desktop currently, will run on device as well soon
  • Gaia End-to-end
    • Can have longer-running tests and tolerate a reasonable amount of fragility due to non-determinism
    • Results can occasionally be ambiguous as a trade-off for higher coverage
    • Has far fewer restrictions than Gaia Integration, so long as the cost of triaging failures remains acceptable
    • Runs after each Tinderbox or candidate build to quickly find full-stack bugs
    • Scope, depth, and maintenance are owned by QA
    • Tests can be created once an area is delivered and stable enough to make the cost acceptable
    • Must be sheriffed by QA, via a combination of alerts and result reviews
    • Primarily runs on-device to test full stack

Different Contexts

The main differences are:

  • Gaia Integration tests run faster and are generally more isolated.
  • Gaia End-to-end tests have more coverage because they follow fewer rules.
  • Gaia Integration tests must never fail unless there is a defective code change. Their results must be unambiguous.
  • Gaia End-to-end tests may still fail due to system non-determinism, provided they are reliable enough to be a net gain.
  • Gaia Integration tests are maintained by functional teams to give themselves confidence in code changes.
  • Gaia End-to-end tests are maintained by QA to replace or extend manual end-to-end testing.
  • Gaia Integration tests frequently test fragments of UI behavior and may be based on mocks or other low-level objects.
  • Gaia End-to-end tests are written as complete user-like scenarios, and operate and verify as the user would.
  • It is more important that Gaia Integration tests be solid than complete. Any incremental gain is valuable.
  • It is more important that Gaia End-to-end tests be complete than solid. All end-to-end criteria must be tested to accept a build.

Ownership and Overlap

Ownership is separated because these differences lead each group to need a different breadth and depth of testing.

Tests can and should overlap between Gaia Integration and Gaia End-to-end. It would be an unacceptable process complication for QA to expect developers to consult QA's needs on every change, and vice versa. If test flows are shared, it is all too easy for one group to inadvertently make a change that undermines the purpose of the other. For end-to-end in particular, this is too risky, as proper end-to-end testing depends on maintaining a particular scope and depth.

Avoiding these communication issues and this risk requires separate tests, even to the point where there might be tests whose flow is entirely duplicated between the two suites.

While this seemingly violates the "single point of truth" principle, the different contexts in which the tests are specified, scoped, and maintained actually make them two different tests, much as two separate applications would never refer to each other's source unless that code could be pushed into an independent library.

Through skillful reuse of View and other code modules between suites, each test suite can be treated as an independent target without increasing maintenance unacceptably. Ideally, the test function expresses only abstract fixture setup and test flow, with all other maintainable aspects kept in reusable modules. So long as the groups agree to maintain the interfaces and promised behavior of shared module code, each can work freely regardless of how closely they communicate.
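As a minimal sketch of this kind of reuse, here is a shared View module consumed by both suites, written in Python (one of the two languages the page mentions). Everything in it is a hypothetical illustration and not part of Gaia, Marionette, or gaiatest: the `Driver` interface, the `ContactsView` class, the `FakeDriver` stand-in, and the element IDs. The View owns the UI mechanics. Each test expresses only fixture setup and flow. The integration test runs against a deterministic in-memory fake, in the spirit of the mocks mentioned above. The end-to-end test would receive a real device driver, which is assumed and not shown.

```python
from typing import Protocol


class Driver(Protocol):
    """Hypothetical minimal driver interface; each suite would wrap its own harness."""

    def tap(self, element_id: str) -> None: ...
    def type_text(self, element_id: str, text: str) -> None: ...
    def read_text(self, element_id: str) -> str: ...


class ContactsView:
    """Shared View module: the only place that knows element IDs and UI steps."""

    def __init__(self, driver: Driver) -> None:
        self.driver = driver

    def add_contact(self, name: str) -> None:
        self.driver.tap("add-contact")
        self.driver.type_text("name-field", name)
        self.driver.tap("save")

    def first_contact_name(self) -> str:
        return self.driver.read_text("contact-list-item-0")


class FakeDriver:
    """Hypothetical deterministic in-memory stand-in for a fast integration test."""

    ITEM_PREFIX = "contact-list-item-"

    def __init__(self) -> None:
        self.fields: dict[str, str] = {}
        self.contacts: list[str] = []

    def tap(self, element_id: str) -> None:
        # Only "save" has an effect: it moves the typed name into the contact list.
        if element_id == "save":
            self.contacts.append(self.fields.pop("name-field", ""))

    def type_text(self, element_id: str, text: str) -> None:
        self.fields[element_id] = text

    def read_text(self, element_id: str) -> str:
        if element_id.startswith(self.ITEM_PREFIX):
            return self.contacts[int(element_id[len(self.ITEM_PREFIX):])]
        return self.fields.get(element_id, "")


def integration_test_add_contact() -> None:
    # Owned by the functional team: no device and no network, so the result is unambiguous.
    view = ContactsView(FakeDriver())
    view.add_contact("Alice")
    assert view.first_contact_name() == "Alice"


def end_to_end_test_add_contact(device_driver: Driver) -> None:
    # Owned by QA: the same View, driven on a real device (driver assumed, not shown).
    # The flow may duplicate the integration test today and diverge later.
    view = ContactsView(device_driver)
    view.add_contact("Alice")
    assert view.first_contact_name() == "Alice"


if __name__ == "__main__":
    integration_test_add_contact()
```

Both tests reach the UI only through `ContactsView`. A change to the contact form's element IDs is therefore fixed once in the shared module, while each group stays free to change the scope and depth of its own test.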

Of course, each group should have an opinion on coverage for either suite, and can (and should) help expand both suites. A single point of ownership, however, allows each set of primary stakeholders to make decisions quickly and appropriately.


The time is now.

Unlike other aspects of our strategy, this is a shift in perspective that articulates a dual path toward the rest of our plans:

QA has defined and will own and expand Gaia End-to-end as an aid to its build end-to-end mandate, and functional teams will continue to own and expand Gaia Integration as an aid to stabilization and their own processes. QA will help whenever resources are available.