User:Mevans/Html5ParserTestplan: Difference between revisions

Line 272:
# document.write() only writes to the stream if it is called from a parser-inserted script that is being executed by the parser synchronously with the parse. In other cases, document.write() implies a call to document.open(), which blows away the document. These "other cases" include calls from: 'defer' scripts, 'async' scripts, scripts created with createElement() and inserted to the DOM, timeouts, intervals and event handlers. Previously, Gecko only blew away the document if the parser was done and allowed document.write() to insert content into a timing-dependent point in the stream if the parser wasn't done. The HTML5 behavior is like IE's behavior but, it turns out, not exactly. So far, whenever I've seen problems related to document.write(), they have been ad or analytics scripts that do browser sniffing and serve different code to Firefox and IE. The problem manifests as the page going blank and not finishing loading. There is a pending spec bug about mitigating this problem: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9767 (See also the b.m.o evangelism bugs linked from that bug.)


# The HTML5 parser never reparses due to hitting end of file inside a comment, <title>, <script>, <style>, <xmp> or <textarea>. Previously, browsers have reparsed in that case. Old browsers, when hitting the end of file inside a comment, rewind to the start of the comment and reparse so that '>' ends the comment instead of '-->' ending the comment. Old browsers, when hitting the end of file after <script> ... <!-- ... </script> ..., reparse looking for </script> ignoring the <!-- escape. Reparsing is a potential XSS problem and involves complexity, so if at all feasible, it's desirable to get rid of reparsing.
# The HTML5 parser never reparses due to hitting end of file inside a comment, <title>, <script>, <style>, <xmp> or <textarea>. Previously, browsers have reparsed in that case. Old browsers, when hitting the end of file inside a comment, rewind to the start of the comment and reparse so that '>' ends the comment instead of '-->' ending the comment. Old browsers, when hitting the end of file after <script> ... &lt;!-- ... </script> ... -->, reparse looking for </script> ignoring the &lt;!-- escape. Reparsing is a potential XSS problem and involves complexity, so if at all feasible, it's desirable to get rid of reparsing.
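The document.write() rule in the list above can be summarized as a small decision table. The following is only a sketch for QA triage, not browser code: the names CallContext, html5WriteEffect and legacyGeckoWriteEffect are hypothetical, and the legacy function encodes nothing more than the old Gecko behavior as described in the list item.

```javascript
// Hypothetical model (not a browser API) of what document.write() does
// depending on where it is called from, per the list item above.
const CallContext = Object.freeze({
  PARSER_SYNC_SCRIPT: "parser-sync-script", // parser-inserted, run synchronously with the parse
  DEFER_SCRIPT: "defer-script",
  ASYNC_SCRIPT: "async-script",
  CREATED_SCRIPT: "createElement-script",
  TIMEOUT: "timeout",
  INTERVAL: "interval",
  EVENT_HANDLER: "event-handler",
});

// HTML5 behavior as described above: only the synchronous parser-inserted
// case writes to the stream; everything else implies document.open().
function html5WriteEffect(context) {
  return context === CallContext.PARSER_SYNC_SCRIPT
    ? "insert-into-stream"
    : "implied-document-open";
}

// Old Gecko behavior as described above: the document was only blown away
// once the parser was done; otherwise content was inserted into the stream
// (at a timing-dependent point for the non-synchronous cases).
function legacyGeckoWriteEffect(context, parserDone) {
  return parserDone ? "implied-document-open" : "insert-into-stream";
}

// Contexts where the two models disagree while the parser is still running.
// These are the cases where a sniffing ad script could blank the page.
function riskyContexts() {
  return Object.values(CallContext).filter(
    (c) => html5WriteEffect(c) !== legacyGeckoWriteEffect(c, false)
  );
}

console.log(riskyContexts());
```

Every context except the synchronous parser-inserted one ends up in the risky set, which matches the symptom described above: the page goes blank instead of receiving the written content.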

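To make the end-of-file comment case from the list above concrete, here is a simplified sketch that contrasts the two behaviors. The function name splitComment and its option are hypothetical. It only handles input that starts with a comment opener and ignores edge cases in the real tokenizer, so treat it as an illustration of the difference rather than as the parsing algorithm.

```javascript
// Simplified illustration: input is assumed to start with "<!--".
// Returns the comment data and whatever is left to be parsed as markup.
function splitComment(input, { reparseAtEof }) {
  const start = 4; // skip "<!--"
  const end = input.indexOf("-->", start);
  if (end !== -1) {
    // Properly terminated comment: both behaviors agree.
    return { comment: input.slice(start, end), rest: input.slice(end + 3) };
  }
  if (reparseAtEof) {
    // Old behavior as described above: rewind to the start of the comment
    // and let the first ">" end it.
    const gt = input.indexOf(">", start);
    if (gt !== -1) {
      return { comment: input.slice(start, gt), rest: input.slice(gt + 1) };
    }
  }
  // No reparsing: the comment simply runs to the end of the file.
  return { comment: input.slice(start), rest: "" };
}

const unterminated = "<!-- note <b>visible?</b>";
console.log(splitComment(unterminated, { reparseAtEof: false }));
console.log(splitComment(unterminated, { reparseAtEof: true }));
```

With reparsing, text that followed the first ">" becomes live markup again, which is why reparsing is a potential XSS problem. Without reparsing, the same text stays inside the comment.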

I'm aware of two failures on major sites that involve not reparsing:
Line 278:
Since the HTML5 parsing algorithm doesn't reparse, it currently closes comments a bit more eagerly than old parsers. This causes https://bugzilla.mozilla.org/show_bug.cgi?id=570309. I'm planning on landing a fix for that bug. Afterwards, it's very important to find out if this causes more breakage elsewhere than it solves on CNN.


In order to avoid reparsing scripts, the HTML5 parsing algorithm does some carefully researched (by Opera QA) trickery (http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-end-tag-open-state) to guess whether </script> after <!-- means to close the script or not. This is probably the riskiest change in the HTML5 parsing algorithm compared to traditional browser behavior. Currently, I am aware of this breaking one banking site: https://bugzilla.mozilla.org/show_bug.cgi?id=565689 . I *really* hope we don't need to change the parser here. However, in case this solution isn't good enough for real-world compatibility, I'd like to know sooner rather than later. That's why it would be good for QA to be aware of this issue so that it can be recognized if it shows up. (I can't say exactly how much would have to break to warrant redesigning this part of the parsing algorithm.)
In order to avoid reparsing scripts, the HTML5 parsing algorithm does some carefully researched (by Opera QA) trickery (http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-end-tag-open-state) to guess whether </script> after &lt;!-- means to close the script or not. This is probably the riskiest change in the HTML5 parsing algorithm compared to traditional browser behavior. Currently, I am aware of this breaking one banking site: https://bugzilla.mozilla.org/show_bug.cgi?id=565689 . I *really* hope we don't need to change the parser here. However, in case this solution isn't good enough for real-world compatibility, I'd like to know sooner rather than later. That's why it would be good for QA to be aware of this issue so that it can be recognized if it shows up. (I can't say exactly how much would have to break to warrant redesigning this part of the parsing algorithm.)
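For QA trying to recognize this issue, the following sketch approximates how the tokenizer decides where a script ends without reparsing. It is my condensed reading of the script data "escaped" and "double escaped" states in the linked spec section, not the spec's character-by-character state machine. The names findScriptEnd and matchesTag are hypothetical. The input is the text that follows the opening script tag.

```javascript
// True if `text` at index i starts with `prefix` (case-insensitive)
// followed by whitespace, "/" or ">". End of input does not count.
function matchesTag(text, i, prefix) {
  if (text.slice(i, i + prefix.length).toLowerCase() !== prefix) return false;
  const next = text[i + prefix.length];
  return next !== undefined && /[\t\n\f \/>]/.test(next);
}

// Returns the index where the closing "</script" starts, or -1 if the
// script is never closed. There is no rewinding at end of input.
function findScriptEnd(text) {
  let state = "data"; // "data" | "escaped" | "double-escaped"
  let i = 0;
  while (i < text.length) {
    if (state === "data" && text.startsWith("<!--", i)) {
      state = "escaped";
      i += 2; // stay on the dashes so that "<!-->" ends the escape at once
      continue;
    }
    if (state !== "data" && text.startsWith("-->", i)) {
      state = "data";
      i += 3;
      continue;
    }
    if (state === "escaped" && matchesTag(text, i, "<script")) {
      state = "double-escaped"; // a nested "<script" inside "<!--"
      i += 7;
      continue;
    }
    if (matchesTag(text, i, "</script")) {
      if (state === "double-escaped") {
        state = "escaped"; // this end tag only closes the nested "<script"
        i += 8;
        continue;
      }
      return i; // in "data" or "escaped" the end tag closes the script
    }
    i += 1;
  }
  return -1;
}

const sample = '<!-- document.write("<script>x</script>"); --></script>';
console.log(findScriptEnd(sample) === sample.lastIndexOf("</script>"));
```

The guess only treats </script> as content when a nested <script has been seen after <!--. A page that relies on <!-- alone to hide </script> gets its script closed earlier than old parsers would have closed it after reparsing.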



