
From MozillaWiki

Web Regression Test Suite

Broadly, this is planned to be an automatable tool that takes "snapshots" of a page as rendered in a known-good build and compares them against how the page renders in a newer build, so that regressions can be detected.

This requires two kinds of snapshot: a bundle of all the resources needed to load the page offline as deterministically as possible (the "page snapshot"), and an analysis of the results of loading the page in that initial build (the "result snapshot"). Initially only the CSS layout will be compared between snapshots to determine a pass/fail result, but the comparison is meant to be extensible.
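To make the two kinds of snapshot concrete, here is a minimal sketch of how they might be modeled and serialized. The field names, the JSON encoding and the `dump_page_snapshot`/`parse_page_snapshot` helpers are all hypothetical, not an existing snapshot format; raw response bodies are assumed to be stored byte-for-byte, base64-encoded.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RecordedResponse:
    """One raw network response, kept exactly as received (hypothetical format)."""
    method: str
    url: str
    status: int
    headers: dict[str, str]
    body_b64: str  # raw body (possibly still compressed), base64-encoded


@dataclass
class PageSnapshot:
    """Everything needed to reload a page offline as deterministically as possible."""
    url: str
    inner_width: int
    inner_height: int
    initial_session: dict[str, str] = field(default_factory=dict)  # e.g. cookies
    final_session: dict[str, str] = field(default_factory=dict)
    fonts: list[str] = field(default_factory=list)  # system fonts that were used
    seeds: dict[str, int] = field(default_factory=dict)  # e.g. RNG seed, fixed clock
    responses: list[RecordedResponse] = field(default_factory=list)


@dataclass
class ResultSnapshot:
    """What one build produced when loading a PageSnapshot (hypothetical format)."""
    build_id: str
    frame_tree: dict  # nested {"type", "rect", "children"} dicts
    unexpected_requests: list[str] = field(default_factory=list)


def dump_page_snapshot(snap: PageSnapshot) -> str:
    # asdict() also converts the nested RecordedResponse dataclasses.
    return json.dumps(asdict(snap), indent=2)


def parse_page_snapshot(text: str) -> PageSnapshot:
    data = json.loads(text)
    # Rebuild the nested dataclasses that json.loads returned as plain dicts.
    data["responses"] = [RecordedResponse(**r) for r in data["responses"]]
    return PageSnapshot(**data)
```

Writing the returned string to a user-accessible file (for example with `pathlib.Path.write_text`) would cover the "snapshot file on the filesystem" step; a bundle format such as a zip archive might suit large binary bodies better.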

It is hoped that this tool can be created as a WebExtension so that it can be used by multiple browsers (e.g. Firefox and Servo), assuming that test builds can be made with the APIs required to gather the metrics needed for a useful comparison of snapshots.

Prototype Plan

Requirements analysis for initial prototype(s):

 - APIs needed for creating a page snapshot, which can be loaded offline by multiple builds.
   - browsing to a given URL over a live network:
     - with or without some initial session state
     - at a given inner window size
     - perhaps using a deterministic RNG, system date/time, etc.
   - recording the *raw* response for each network request (compressed, etc)
   - recording the final session state (cookies, etc)
   - recording and storing the system fonts which were used
   - recording seed values for other deterministic options
     - RNG, date/time, window size, detected plugin versions, etc.
   - detecting when a page has "loaded enough" for a snapshot to be worthwhile
     - based on DOM events, nominal CPU activity, nominal network activity, etc.
   - writing a user-accessible snapshot file to the filesystem
 - additional APIs needed for loading a page snapshot in a controlled manner, to produce a result snapshot.
   - limiting resource accesses to the list of URLs/requests/fonts in the page snapshot
   - detecting unexpected resource accesses (network, font, etc)
   - making it possible to link each frame to its source node
   - possibly allowing manual marking of certain frames/trees as non-deterministic (not worth comparing)
   - recording the frozen DOM state as a frame tree (for comparing result snapshots)
   - possibly recording the frozen DOM state as a loadable HTML document (including window dimensions)
   - possibly recording a screenshot of the final rendering
     - perhaps even compressed videos of the loading process?
   - recording debugging information, possibly in a neutral format instead of raw text
     - web console output
     - summary log of network activity
   - probably the ability to write summary output to stdout/CLI for CI tools
 - techniques needed for comparing result snapshots.
   - comparing the frame tree of two result snapshots, perhaps in a fuzzy manner.
   - detecting differences in frame output, to get a pass/fail result.
   - possibly being able to find the analogous frames in two snapshots (for manual marking, etc)
   - detecting possible failure reasons to target debugging:
     - missing resources being accessed
     - not having the same version of a plugin
     - scripts not running or being executed in different order
 - decisions on how to present output.
   - as an HTML-based browser UI with options to create/compare snapshots?
   - pass/fail, based on a fuzzy threshold for how similar the pages are?
   - a view summarizing the likely-to-be-major differences found in the frame trees.
   - able to show the screenshots for both versions (and easily compare them)
   - able to show a self-contained iframe with the frozen DOM outputs (and easily compare them)
 - decisions on how to integrate with CI.
   - can this be run easily in CI as a WebExtension (plus WebExtension Experiments)?
   - what needs to be logged to the console to be picked up as a pass/fail?
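The sandboxing requirements (serve only what the page snapshot recorded, and note anything else) could work roughly like the following sketch. It assumes requests are matched on exact (method, URL) pairs and that a miss is answered with a synthetic 404 instead of reaching the live network; both are illustrative choices, and `SnapshotReplayer` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayResponse:
    status: int
    headers: dict[str, str]
    body: bytes


@dataclass
class SnapshotReplayer:
    """Serves only responses recorded in a page snapshot; logs every miss."""
    recorded: dict[tuple[str, str], ReplayResponse]
    unexpected: list[tuple[str, str]] = field(default_factory=list)

    def fetch(self, method: str, url: str) -> ReplayResponse:
        key = (method.upper(), url)
        hit = self.recorded.get(key)
        if hit is not None:
            return hit
        # Never fall through to the live network: record the unexpected
        # access and answer with a synthetic 404 (an assumed policy).
        self.unexpected.append(key)
        return ReplayResponse(status=404, headers={}, body=b"")
```

Exact URL matching is the strictest policy; cache-busting query parameters from ads and similar content would all show up as unexpected requests, so a real tool may need a normalization step before the lookup.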
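For the question of what to log for CI, one option (an assumption, not a decision recorded on this page) is to print results in the Test Anything Protocol (TAP), a line-based plain-text format that a number of CI systems can ingest, and to set the process exit status from the overall result. `tap_report` and `exit_status` are hypothetical helpers.

```python
def tap_report(results: list[tuple[str, bool, str]]) -> str:
    """Format (site, passed, note) tuples as TAP version 13 output."""
    lines = ["TAP version 13", f"1..{len(results)}"]
    for number, (site, passed, note) in enumerate(results, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} - {site}")
        if note:
            # Lines starting with "#" are diagnostic comments, not test results.
            lines.append(f"# {note}")
    return "\n".join(lines)


def exit_status(results: list[tuple[str, bool, str]]) -> int:
    """Conventional process exit code: 0 if every site passed, 1 otherwise."""
    return 0 if all(passed for _, passed, _ in results) else 1
```

Using both signals means a CI job can fail on the exit code alone, while the TAP lines give a per-site breakdown for anyone reading the log.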


Minimum Viable Product for initial prototype:

 - able to take minimal page snapshots reasonably reliably
   - for at least a few select sites (TBD)
   - only as much determinism as necessary for those sites (window size?)
   - only storing the minimum necessary information
 - able to take a minimal result snapshot from the page snapshot
   - saving the frame tree
   - saving a screenshot
 - able to present a comparison of two *result snapshots*
   - a simple pass/fail output (or fuzzy pass/fail)
   - a comparison of differences between frame trees
   - ability to see dumps of the frame trees for both results
   - ability to show screenshots of both results
   - not necessarily in a CI-friendly manner
   - can simply be a set of output files
 - does not have to be based on WebExtensions
   - can be a custom patchset against mozilla-central, a XUL+XPCOM add-on, etc.
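A fuzzy pass/fail over two frame tree dumps might look like the sketch below. It assumes each frame is dumped as a dict with a `type`, a `rect` of [x, y, width, height] and optional `children`; it pairs children purely by index (finding truly analogous frames would need a proper tree diff), and fails when the fraction of differing frames exceeds a threshold. The function names, the 1px tolerance and the 1% threshold are placeholders, not tuned values.

```python
from dataclasses import dataclass


@dataclass
class FrameDiff:
    path: str    # child indices from the root, e.g. "0/2/1"
    reason: str


def compare_frames(a: dict, b: dict, tolerance: float = 1.0, path: str = "0") -> list[FrameDiff]:
    """Walk two frame trees in parallel, pairing children by index."""
    diffs = []
    if a["type"] != b["type"]:
        diffs.append(FrameDiff(path, f"type {a['type']} != {b['type']}"))
    elif any(abs(p - q) > tolerance for p, q in zip(a["rect"], b["rect"])):
        diffs.append(FrameDiff(path, f"rect {a['rect']} != {b['rect']}"))
    ca, cb = a.get("children", []), b.get("children", [])
    for i in range(max(len(ca), len(cb))):
        child_path = f"{path}/{i}"
        if i >= len(ca):
            diffs.append(FrameDiff(child_path, "extra frame in new result"))
        elif i >= len(cb):
            diffs.append(FrameDiff(child_path, "frame missing from new result"))
        else:
            diffs.extend(compare_frames(ca[i], cb[i], tolerance, child_path))
    return diffs


def count_frames(tree: dict) -> int:
    return 1 + sum(count_frames(c) for c in tree.get("children", []))


def fuzzy_verdict(old: dict, new: dict, tolerance: float = 1.0,
                  max_diff_ratio: float = 0.01) -> tuple[bool, list[FrameDiff]]:
    """Pass if the share of differing frames stays under max_diff_ratio."""
    diffs = compare_frames(old, new, tolerance)
    ratio = len(diffs) / max(count_frames(old), count_frames(new))
    return ratio <= max_diff_ratio, diffs
```

The list of `FrameDiff` entries doubles as the raw material for a side-by-side view, since each `path` identifies the same position in both dumps.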


Q1 2017 Key Results

 1. Page snapshot creation
   a. recording the raw network requests and responses (done)
   b. recording the initial and final session state (done)
   c. recording inner window size (done)
   d. recording and storing which system fonts were used
   e. writing a user-accessible snapshot file to the filesystem (done)
   f. recording plugin versions
 2. Result snapshot creation
   a. opening the given snapshot with initial state (window size, session, etc) and detecting when it has "loaded" (done [1])
   b. sandboxing networking requests to the list in the page snapshot (done)
   c. recording unexpected resource access (done [2])
   d. recording a screenshot of the final page (done)
   e. recording the final frame tree/display list (done)
   f. recording the final "frozen" DOM state as an HTML file
   g. writing a user-accessible snapshot file to the filesystem (done)
 3. Result snapshot comparison
   a. comparing the frame tree of two results and giving a pass/fail result (partly done [3])
   b. presenting frame trees of both results for comparison (partly done [4])
   c. presenting screenshots of both results for comparison (done)
   d. presenting the frozen DOM results, with links from the frame tree to the nodes they represent
   e. summarizing unexpected network fetches, missing system fonts, plugins, web console output, full network log
 [1] may still need better heuristics for detecting when a page has "loaded".
 [2] ads and "random" stuff on pages can still cause "unexpected" network requests and such.
 [3] there is a diff-like interface, but which types of diffs count as passes or fails is still to be decided.
 [4] showing side-by-side diffs, but the UI still needs a lot of work.
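Footnote [1] and the "loaded enough" requirement could be approached with a quiet-period heuristic, sketched below over recorded activity: starting from the load event, find the first moment after which nothing is in flight for `quiet_ms`, giving up after `max_wait_ms`. The intervals could be network requests or, by the same logic, busy periods on the main thread; the function name and both defaults are hypothetical.

```python
def settled_time(
    load_event_ms: float,
    activity: list[tuple[float, float]],
    quiet_ms: float = 500,
    max_wait_ms: float = 10_000,
) -> float:
    """Earliest time >= load_event_ms followed by quiet_ms with no activity.

    activity holds (start_ms, end_ms) intervals, e.g. network requests.
    Pages that never go quiet are cut off at load_event_ms + max_wait_ms.
    """
    t = load_event_ms
    for start, end in sorted(activity):  # sorted by start time
        if end <= t:
            continue  # finished before the current candidate time
        if start >= t + quiet_ms:
            break  # a full quiet window fits before this and every later interval
        t = end  # overlaps the candidate window: try again once it ends
    return min(t, load_event_ms + max_wait_ms)
```

Footnote [2] is why the cap matters: ads and other "random" content can keep the network busy indefinitely, so without it some pages would never be considered loaded.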

Estimate of overall progress toward meeting OKRs: ~70%