- 1 Feature HTML 5 Parser
- 2 Feature Release Readiness Assessment
- 3 Feature Documentation
- 4 Feature Bug Management
- 5 Feature Test Items
- 6 Feature Tests
- 7 Community Test Events
- 8 Feature Documentation Details
- 8.1 Project Wiki
- 8.2 Developer Links
- 8.3 Other Docs
- 8.4 Developer QA Review
- 8.4.1 Do we have automated tests for the feature?
- 8.4.2 What do they cover?
- 8.4.3 What do they not cover?
- 8.4.4 How well do they cover the feature?
- 8.4.5 What are the important areas we should focus on?
- 8.4.6 What are the dependencies?
- 8.4.7 What is our comfort level with this feature in its current state?
- 8.4.8 What feedback would you like from QA?
- 9 Feature Release Readiness Assessment Details
- 10 Feature Bug Management Details
- 11 Feature Test Items Details
- 12 Feature Tests Details
- 13 Community Test Events Details
Feature HTML 5 Parser
- Development Status: - In progress (date)
- Feature Testing: - In progress (date)
- Team: Developer Henri Sivonen (hsivonen), Matt Evans (mevans)
- Tracking Bugs: Bug 373864 - (html5-parsing) Replace HTML parser with an HTML5 parser
Update existing parser with the new HTML5 parser
The table below provides a top level go/no go assessment of whether the feature is release ready for the given milestone. Each milestone link references a section below that discusses the criteria and evaluation that went into the QA go/no-go decision.
|#Project_Wiki||Wiki Links to all feature related entries|
|#Developer_Links (blogs)||Developer links to feature related sites|
|#Other_Docs||Web links to feature related sites|
|#Developer_QA_Review||Details from developer and qa discussions regarding feature test strategies and issues.|
|#Bug_Tracking||Top level bugs tracking feature|
|#Bug_Verification||Feature bugs that need verification|
|#Bug_Triage||Links triage bug tasks|
The table below provides a breakdown of all feature items that should be covered and how they will be tested. Not all items will be covered by internal QA team members. It is important to list what should be covered. If it is not covered, list it as not covered.
Note: not all items listed below will apply for a given feature
|Test Item||Description||Covered By||Status|
|Item 1||Item 1 Description||Developer Tests|
|Item 2||Item 2 Description||Beta tester exposure|
|#Topsites||Top internet sites compatibilities|
|#Developer_Tests||Links to automated developer tests|
|#Mozmill_Tests||Links to automated mozmill feature test cases|
|#Smoke_Tests||link to smoke tests|
|#Regression_Tests||link to BFT and/or regression tests|
|#Functional_Tests||link to FFT and/or complete functional tests|
|#Testdays||Links to test day event results for feature|
|#Bugdays||Links to bug day event results for feature|
|#Meetups||Links to Meetup events for feature|
- Provide link to all project related wikis
- Provide links to all feature related developer links to blogs and other internet sites
- Provide web links to other feature related sites
Developer QA Review
The QA person responsible for the feature should hold a formal interview with the lead developer or feature champion. Below are questions that should be asked in the interview:
Do we have automated tests for the feature?
The Java version of the HTML5 parser is tested using the html5lib test suite (http://code.google.com/p/html5lib/source/browse/#hg/testdata/). At present, all tokenizer test failures are cases where the test suite assumes implementation details of html5lib itself, and the two tree builder failures are deliberately left unfixed because it's not clear that the spec is optimal on the point tested.
The encoding tests from html5lib aren't being run, and I'm not sure if those tests are up-to-date.
The Java to C++ translation is believed to preserve the properties of the parser that the html5lib tree builder and tokenizer tests test, but it is, of course, possible for this belief to be incorrect.
In Mochitest, there exists a test harness that makes it possible to run html5lib tree builder tests on the in-Gecko C++ version of the parser (http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/htmlparser/tests/mochitest/test_html5_tree_construction.html?force=1). However, currently only the first 3 of the 15 tree builder test data files have been imported from upstream into mozilla-central. Fixing this is tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=559023 .
There are also other HTML parser-relevant tests scattered around http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/htmlparser/tests/mochitest/ , http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/htmlparser/tests/reftest/ and http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/content/base/test/
What do they cover?
The html5lib tokenizer tests cover most of the different tokenizer state transitions.
The html5lib tree builder tests cover interesting cases in the HTML5 tree building algorithm.
The mochitests under content/base/test/ have "smoketest" level coverage for script execution, document.write() and doctype sniffing. These tests were written as part of older Gecko bug fixing and were not written systematically against the HTML5 spec.
What do they not cover?
Coverage for encoding sniffing is in theory (I believe bitrotted, but I'm not sure) in the html5lib repository. Those tests aren't run with Gecko, though. Thus, encoding sniffing is an area that lacks proper test coverage. (There's one notable known bug in this area: https://bugzilla.mozilla.org/show_bug.cgi?id=490916 )
The html5lib tests don't test script execution or document.write() at all.
There are no tests making sure that SVG and MathML features work on the higher level when the DOM has been built by the HTML5 parser. For example, there's no coverage for checking that SMIL animations actually start.
How well do they cover the feature?
The html5lib tokenizer tests are incomplete in their coverage of U+0000 and line breaks in various tokenizer states. Also, they only spot-test named character references. Otherwise, I believe the tokenizer tests are very complete.
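One way to go beyond spot-testing named character references would be to generate a test per reference. The following is a minimal sketch, assuming the html5lib tokenizer test layout (a JSON object with a "tests" array whose entries carry "description", "input" and "output") and using Python's standard html.entities.html5 table as the source of names. The function name is hypothetical, and the standard library table may not match the spec draft the parser was written against.

```python
import json
from html.entities import html5  # maps names such as "amp;" to their expansion


def named_reference_tests():
    """Build one tokenizer-style test per semicolon-terminated reference."""
    tests = []
    for name, expansion in sorted(html5.items()):
        if not name.endswith(";"):
            # Legacy forms without the semicolon involve parse errors;
            # they are left out of this sketch on purpose.
            continue
        tests.append({
            "description": "Named character reference: &" + name,
            "input": "&" + name,
            # Assumed expected output: a single Character token.
            "output": [["Character", expansion]],
        })
    return tests


if __name__ == "__main__":
    # Print the first few generated tests in the assumed file layout.
    print(json.dumps({"tests": named_reference_tests()[:3]}, indent=2))
```

Generated files of this kind would still need review before being contributed upstream, since the sketch does not cover references inside attribute values.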
The html5lib tree builder tests have very superficial coverage for the SVG and MathML features. The coverage for the HTML-only parts isn't fully systematic. It has happened that real-world testing has required a change to the parsing algorithm and the test suite hasn't had a test for the old behavior of the algorithm. That is, a test has only been added after the case has been found to be of interest as opposed to coverage having been built systematically for "everything".
If QA writes more tree construction or tokenization test cases, I would like to encourage you to use the html5lib test formats and to contribute the tests under the MIT license to the html5lib project and then pull them to mozilla-central from there. Having everyone contribute to the same test suite has been a huge productivity and compliance boost so far, so it would be good to continue that.
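For readers unfamiliar with the html5lib tree construction format, here is a minimal sketch of a reader for it. It assumes the commonly used section markers (#data, #errors, #document-fragment, #document) and that every test starts with a #data line. The function name and the sample test are illustrative only, and the error line in the sample is a placeholder rather than a real expected message.

```python
SECTION_MARKERS = ("#data", "#errors", "#document-fragment", "#document")

# Illustrative sample in the assumed format; the #errors text is a placeholder.
SAMPLE = """#data
<p>One<p>Two
#errors
placeholder: missing doctype
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "One"
|     <p>
|       "Two"
"""


def parse_tree_construction_tests(text):
    """Split a .dat style file into a list of {marker: [lines]} dicts."""
    tests = []
    section = None
    for line in text.split("\n"):
        if line == "#data":
            # A #data line opens a new test case.
            tests.append({marker: [] for marker in SECTION_MARKERS})
            section = line
        elif line in SECTION_MARKERS and tests:
            section = line
        elif section is not None:
            tests[-1][section].append(line)
    for test in tests:
        # Drop the blank separator lines that trail the expected tree.
        while test["#document"] and test["#document"][-1] == "":
            test["#document"].pop()
    return tests


if __name__ == "__main__":
    for case in parse_tree_construction_tests(SAMPLE):
        print(case["#data"], len(case["#document"]), "tree lines")
```

The sketch treats any line equal to a marker as a section header, so it would misread test input that itself contains such a line. A real harness needs to be stricter.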
I believe the script execution and document.write() tests cover the relevant area sufficiently, but they do not cover the whole spec systematically. For example, I discovered http://www.w3.org/Bugs/Public/show_bug.cgi?id=9843 by reading WebKit bugs instead of by running our tests.
What are the important areas we should focus on?
I think we should focus on real-world site compatibility on one hand (going through "top site" lists navigating deeper than the front page and seeing if stuff breaks) and on SVG features working above the parser layer (that is, checking that selectors match camelCase names right, DOM getters do the right thing, SMIL animations work). It may also be worthwhile to stress nested document.write() order some more.
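To illustrate the camelCase point: the HTML5 tree builder restores the mixed-case spelling of certain SVG element names after the tokenizer has lowercased them. The sketch below models that step with a small subset of the adjustment table. The function name and the reduced table are assumptions for illustration, not Gecko code.

```python
# Subset of the SVG element name adjustments made in foreign content.
SVG_TAG_ADJUSTMENTS = {
    "clippath": "clipPath",
    "foreignobject": "foreignObject",
    "lineargradient": "linearGradient",
    "radialgradient": "radialGradient",
    "textpath": "textPath",
}


def adjust_svg_tag_name(name_from_markup):
    """Return the element name the DOM should expose for an SVG start tag."""
    lowered = name_from_markup.lower()  # the tokenizer lowercases tag names
    return SVG_TAG_ADJUSTMENTS.get(lowered, lowered)


if __name__ == "__main__":
    for raw in ("FOREIGNOBJECT", "lineargradient", "circle"):
        print(raw, "->", adjust_svg_tag_name(raw))
```

A check above the parser layer would then confirm, for example, that a selector written as foreignObject matches the element built from such markup and that DOM getters report the adjusted name.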
About real-world site compat, there are two things I think the QA and triagers of incoming bugs should be particularly aware of:
- document.write() only writes to the stream if it is called from a parser-inserted script that is being executed by the parser synchronously with the parse. In other cases, document.write() implies a call to document.open(), which blows away the document. These "other cases" include calls from: 'defer' scripts, 'async' scripts, scripts created with createElement() and inserted to the DOM, timeouts, intervals and event handlers. Previously, Gecko only blew away the document if the parser was done and allowed document.write() to insert content into a timing-dependent point in the stream if the parser wasn't done. The HTML5 behavior is like IE's behavior but, it turns out, not exactly. So far, whenever I've seen problems related to document.write(), they have been ad or analytics scripts that do browser sniffing and serve different code to Firefox and IE. The problem manifests as the page going blank and not finishing loading. There is a pending spec bug about mitigating this problem: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9767 (See also the b.m.o evangelism bugs linked from that bug.)
- The HTML5 parser never reparses due to hitting end of file inside a comment, <title>, <script>, <style>, <xmp> or <textarea>. Previously, browsers have reparsed in that case. Old browsers, when hitting the end of file inside a comment, rewind to the start of the comment and reparse so that '>' ends the comment instead of '-->' ending the comment. Old browsers, when hitting the end of file after <script> ... <!-- ... </script> ... -->, reparse looking for </script> ignoring the <!-- escape. Reparsing is a potential XSS problem and involves complexity, so if at all feasible, it's desirable to get rid of reparsing.
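The two points above can be sketched as a toy model that triagers could use to reason about a report. It is not browser code: the enum members, function names and return strings are hypothetical, and the comment handling is reduced to the end-of-file case described above.

```python
from enum import Enum, auto


class WriteCaller(Enum):
    PARSER_SYNC_SCRIPT = auto()  # parser-inserted, run synchronously by the parser
    DEFER_SCRIPT = auto()
    ASYNC_SCRIPT = auto()
    CREATED_SCRIPT = auto()      # createElement() and inserted into the DOM
    TIMEOUT = auto()
    INTERVAL = auto()
    EVENT_HANDLER = auto()


def html5_write_effect(caller):
    """HTML5 rule: only the synchronous parser-inserted case writes to the stream."""
    if caller is WriteCaller.PARSER_SYNC_SCRIPT:
        return "insert into stream"
    return "implied document.open()"


def legacy_gecko_write_effect(parser_done):
    """Old Gecko rule: the document is blown away only once the parser is done."""
    return "implied document.open()" if parser_done else "insert into stream"


def comment_end(html, start, reparse_on_eof):
    """Return the index just past the comment that opens at html[start:start+4]."""
    body_start = start + 4  # skip "<!--"
    end = html.find("-->", body_start)
    if end != -1:
        return end + 3
    if reparse_on_eof:
        # Old behavior: rewind and let the first '>' end the comment.
        gt = html.find(">", body_start)
        if gt != -1:
            return gt + 1
    # HTML5 behavior: no reparse, the comment runs to end of file.
    return len(html)


if __name__ == "__main__":
    print(html5_write_effect(WriteCaller.TIMEOUT))
    print(comment_end("<!-- a > b", 0, reparse_on_eof=True))
    print(comment_end("<!-- a > b", 0, reparse_on_eof=False))
```

A page that goes blank when document.write() is called from a timeout, or content that disappears after an unterminated comment, would fit the differences this model shows.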
I'm aware of two failures on major sites caused by not reparsing:
Since the HTML5 parsing algorithm doesn't reparse, it currently closes comments a bit more eagerly than old parsers. This causes https://bugzilla.mozilla.org/show_bug.cgi?id=570309. I'm planning on landing a fix for that bug. Afterwards, it's very important to find out if this causes more breakage elsewhere than it solves on CNN.
In order to avoid reparsing scripts, the HTML5 parsing algorithm does some carefully researched (by Opera QA) trickery (http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-end-tag-open-state) to guess whether </script> after <!-- is meant to close the script or not. This is probably the riskiest change in the HTML5 parsing algorithm compared to traditional browser behavior. Currently, I am aware of this breaking one banking site: https://bugzilla.mozilla.org/show_bug.cgi?id=565689 . I *really* hope we don't need to change the parser here. However, in case this solution isn't good enough for real-world compatibility, I'd like to know sooner rather than later. That's why it would be good for QA to be aware of this issue so that it can be recognized if it shows up. (I can't say exactly how much would have to break to warrant redesigning this part of the parsing algorithm.)
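The guessing described above can be approximated with a three-state scanner. This is a simplified sketch modeled on the script data, escaped and double-escaped tokenizer states as they appear in the current HTML Standard, which may differ in detail from the draft linked above. The function and state names are hypothetical.

```python
import re

# "<script" or "</script" followed by whitespace, "/" or ">".
_SCRIPT_TAG = re.compile(r"<(/?)script(?=[\t\n\f />])", re.IGNORECASE)


def script_end(text):
    """Return the index of the '</script' that closes the script, or -1."""
    state = "data"  # one of: "data", "escaped", "double"
    i = 0
    while i < len(text):
        if state == "data":
            if text.startswith("<!--", i):
                state = "escaped"
                i += 2  # keep the "--" so that "<!-->" ends the escape at once
                continue
            m = _SCRIPT_TAG.match(text, i)
            if m and m.group(1):
                return i
        elif text.startswith("-->", i):
            state = "data"  # "-->" leaves both escaped states
            i += 3
            continue
        else:
            m = _SCRIPT_TAG.match(text, i)
            if m and state == "escaped":
                if m.group(1):
                    return i  # "</script" inside "<!--" closes the script
                state = "double"  # a nested "<script" was seen
                i = m.end()
                continue
            if m and m.group(1):
                state = "escaped"  # "</script" pairs with the nested "<script"
                i = m.end()
                continue
        i += 1
    return -1  # end of file reached without a closing tag, and no reparse


if __name__ == "__main__":
    sample = "<!-- <script> </script> --> x</script>"
    print(script_end(sample), sample.rindex("</script>"))
```

In the sample, the first </script> is treated as text because a nested <script was seen after <!--, and only the last one closes the element. Without the nested <script, the first </script> after <!-- would close it.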
What are the dependencies?
I'm not sure I understand the question, but the main dependency is round-tripping findings as spec feedback and implementing the spec changes, which in practice means comparing notes with Hixie, Opera QA and Chrome implementors.
What is our comfort level with this feature in its current state?
Very comfortable, except I'm not entirely comfortable with the level of evidence about the real-world compatibility of the new non-reparsing script and comment closing behavior.
What feedback would you like from QA?
I'm most interested in data showing whether late-breaking changes to the parsing algorithm cause more problems than they fix. Unfortunately, this isn't so much something QA can answer directly, but it's a problem QA can hopefully recognize from incoming beta tester reports that haven't made it to the right bugzilla component. For example, https://bugzilla.mozilla.org/show_bug.cgi?id=558302 caused https://bugzilla.mozilla.org/show_bug.cgi?id=569528 .
- Top level bugs tracking feature. Include any relevant bug queries that are helpful for tracking feature status.
|bugzilla query url link||query description|
- Feature bugs that need verification
- Bug triage information
- Details of feature localization test requirements
- Details of feature accessibility test requirements
- Details of plugins compatibility test requirements
- Details of addons compatibility
- Details of top internet sites test requirements
- Links to automated developer tests
If a particular feature needs manual tests that should also be covered by Mozmill tests, please add the "[mozmill-test-needed]" whiteboard entry to the feature implementation or regression bug.
List of Mozmill Tests:
- Links to automated mozmill feature test cases
- Links to litmus smoke tests or description
- Links to litmus BFT and/or regression tests description
- Links to litmus FFT and/or complete functional tests description
- Links to test day event results for feature
- Links to bug day event results for feature
- Links to Meetup events for feature