Tripwire Web Regression Test Suite
Broadly, this is planned as an automatable tool that takes "snapshots" of a page as rendered in a known-good build, and can compare those snapshots against how the page renders in a newer build, so that regressions can be detected.
A snapshot therefore has two parts: a bundle of all the resources needed to load the page offline as deterministically as possible (the "page snapshot"), and an analysis of the results of loading the page in that initial build (the "result snapshot"). Initially only the CSS layout will be compared between snapshots to determine a pass/fail result, but this will be extensible.
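To make the two-part snapshot concrete, here is a minimal sketch of a .webtest bundle: the page snapshot (recorded network traffic) and the result snapshot (layout analysis) stored as JSON entries in a zip archive. The file layout, entry names, and field names are illustrative assumptions, not the actual .webtest format.

```python
# Sketch of a hypothetical .webtest bundle: a zip archive holding the
# "page snapshot" and "result snapshot" as JSON. The names "network.json"
# and "layout.json" are assumptions made for illustration.
import io
import json
import zipfile

def freeze(network_log, layout):
    """Bundle both snapshots into a single .webtest blob."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("network.json", json.dumps(network_log))  # page snapshot
        zf.writestr("layout.json", json.dumps(layout))        # result snapshot
    return buf.getvalue()

def thaw(blob):
    """Unpack a .webtest blob back into its two snapshots."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return (json.loads(zf.read("network.json")),
                json.loads(zf.read("layout.json")))

blob = freeze([{"url": "https://example.com/", "status": 200}],
              {"#main": [0, 0, 800, 600]})
network_log, layout = thaw(blob)
```

Whatever the real container format is, the important property is that both halves round-trip losslessly, so a later build can replay the same network data and compare against the same recorded layout.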
It is hoped that this tool can be created as a WebExtension so that it may be used by multiple browsers (e.g., Firefox and Servo), assuming that test builds can be made with the APIs required to gather the metrics necessary for a useful comparison of snapshots.
What works now
- creating a .webtest file in a known-good version to compare against in the future;
  - the base addon gathers information on tabs as they load.
  - heuristically detects when the page has finished loading.
  - lets the user save a .webtest file ("freezing" the page to record it).
  - saves the recorded network requests and data, and layout information as rendered by the browser.
- allows comparing results "live" in-browser (presumably in a future build);
  - the base addon allows the user to load a previously-saved .webtest file.
  - it re-runs the test with the saved network/state data in the .webtest file.
  - it compares the results with the old result.
  - it presents the differences it found in a simple UI with screenshots.
- also allows comparing results in automation;
  - the base addon also works with automation via a marionette driver.
  - it similarly loads and re-runs .webtests, comparing results.
  - it presents a pass/fail to the marionette driver: the test passes only if *no* differences were found.
  - environment variables are used to load either a single .webtest or a "manifest" listing multiple tests:
    WEBTEST="file:///path/to/test.webtest" ./mach marionette-test toolkit/components/tripwire/tests/marionette/test_tripwire.py

    WEBTEST_MANIFEST="http://server.org/webtests.json" ./mach marionette-test toolkit/components/tripwire/tests/marionette/test_tripwire.py

  where webtests.json contains an array:

    ["domain1.webtest", "domain2.webtest", "https://another-server.org/another.webtest"]
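The environment-variable handling above could be sketched like this. `resolve_webtests` and the injected `fetch` callable are hypothetical names chosen for illustration (and so the sketch is testable without a network); they are not the harness's actual API. Relative manifest entries are assumed to resolve against the manifest's own URL.

```python
# Sketch of resolving the test list from WEBTEST / WEBTEST_MANIFEST.
# Function and parameter names are illustrative assumptions.
import json
from urllib.parse import urljoin

def resolve_webtests(env, fetch):
    """Return the list of .webtest URLs to run; fetch(url) -> body as str."""
    if "WEBTEST" in env:
        return [env["WEBTEST"]]                       # a single test file
    if "WEBTEST_MANIFEST" in env:
        manifest_url = env["WEBTEST_MANIFEST"]
        entries = json.loads(fetch(manifest_url))     # JSON array of tests
        # Relative entries like "domain1.webtest" resolve against the
        # manifest's URL; absolute URLs pass through unchanged.
        return [urljoin(manifest_url, entry) for entry in entries]
    return []

fake_fetch = lambda url: json.dumps(
    ["domain1.webtest", "https://another-server.org/another.webtest"])
tests = resolve_webtests(
    {"WEBTEST_MANIFEST": "http://server.org/webtests.json"}, fake_fetch)
```

Here `tests` would contain one resolved URL per manifest entry, ready to be loaded and re-run one at a time.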
What still needs to be done
- improving the heuristics that detect when the page has loaded (they are not working with some sites);
- making results deterministic and stable;
  - fuzzing: nightly builds vary more than expected, and layout differences aren't pixel-precise even over the short term. Fuzzing may be enough to mitigate this, as it seems related to sub-pixel layout differences accumulating.
  - determinism: still locking down the variables that cause a .webtest file to render differently not because the browser is at fault, but because the site is using RNGs, timestamps, or other inputs which can trigger different ads, A/B tests, or animation frames that end up appearing as major differences.
  - stability: there may still be network requests which aren't being logged due to CORS/etc. Work is underway to mitigate this, and seems likely to succeed.
- deciding where to host the .webtest files for automation, as they cannot be in-tree;
- adding artifacts to the marionette test results (screenshots and other differences) to aid in debugging;
- user-interface polish;
  - the "diff" tool is rudimentary and needs to better explain what the differences are.
  - irrelevant differences may need to be filtered out (boxes which lay out differently but do not visibly affect the final result).
  - WebExtension APIs are needed to open a file-input dialog via hotkey, or to reduce the number of clicks needed to load a .webtest file.
  - adding a mobile-friendly interface for running .webtests on Android.
- increasing the amount and type of data that's being considered by the tests;
  - the actual resulting markup, not just the layout box-model information.
  - RAM usage and other performance metrics.
  - JS console logs, uncaught promise rejections, CORS failures, etc.
  - which CSS rules were being applied/not applied.
  - sounds, canvases, videos, and other animation.
  - recording user interactions and/or taking multiple snapshots per test file.
- pulling out the network request recording sub-module so it can be used elsewhere in automation.
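The "fuzzing" idea from the list above can be sketched as a small per-coordinate tolerance that absorbs sub-pixel drift between nightly builds, so only larger box movements count as differences. The 2px tolerance and the (x, y, width, height) rect representation are assumptions for illustration, not values the tool actually uses.

```python
# Sketch of a fuzzed layout comparison: accumulated sub-pixel drift is
# absorbed by a tolerance so it doesn't register as a regression.
TOLERANCE_PX = 2  # assumed placeholder value, not the tool's real setting

def rects_match(a, b, tolerance=TOLERANCE_PX):
    """Compare two (x, y, width, height) rects, ignoring sub-tolerance drift."""
    return all(abs(av - bv) <= tolerance for av, bv in zip(a, b))

def fuzzy_diff(baseline, candidate, tolerance=TOLERANCE_PX):
    """Return selectors whose boxes are missing or moved beyond the tolerance."""
    diffs = []
    for selector, rect in baseline.items():
        other = candidate.get(selector)
        if other is None or not rects_match(rect, other, tolerance):
            diffs.append(selector)
    return diffs

good = {"#header": (0, 0, 800, 60), "#main": (0, 60, 800, 540)}
drifted = {"#header": (0, 0, 800, 61), "#main": (0, 61, 800, 539)}  # 1px drift
broken = {"#header": (0, 0, 800, 60), "#main": (0, 90, 800, 510)}   # 30px shift
```

Under this sketch, `fuzzy_diff(good, drifted)` reports nothing, while `fuzzy_diff(good, broken)` flags `#main`; the open question noted above is whether a fixed tolerance is enough once sub-pixel differences accumulate across many nested boxes.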