* Moving XUL/XBL1/SAX/RDF/XSLT off the main thread
==Background observations==
The HTML5 parser has a design that works. When document.write handling complexity is not considered, the HTML5 parser has these major parts:
* A parser object (nsHtml5Parser) that nsDocument sees and that holds the rest together.
* An IO driver (nsHtml5StreamParser) that can receive bytes from a network stream, manages the character encoding conversion and pushes UTF-16 code units to the portable parser core.
* The portable parser core (nsHtml5Tokenizer and nsHtml5TreeBuilder).
* Glue code that produces tree ops from what the portable core does (nsHtml5TreeBuilderCppSupplement).
* An executor for the tree ops (nsHtml5TreeOpExecutor).
The parser object also supports fragment parsing, but that functionality doesn't really benefit from being in the class that's oriented towards full page loading, so I think even on the HTML side, the fragment parsing functionality should be separated from nsHtml5Parser.
==Basic design for Web content loading on the XML side==
I propose making the XML Web content load path have the same structure as the HTML load path (with document.write simplified out). That is, it would have these major parts:
* A parser object (mozilla::parser::xml::Parser) that nsDocument sees and that holds the rest together.
* An IO driver (mozilla::parser::xml::StreamParser) that can receive bytes from a network stream, manages the character encoding conversion and pushes UTF-16 code units to expat.
* expat (portable parser core).
* An object that implements handler callbacks for expat and produces tree ops (mozilla::parser::xml::TreeOpGenerator).
* The same executor for the tree ops as on the HTML side (nsHtml5TreeOpExecutor, eventually to be named mozilla::parser::TreeOpExecutor).
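The split between a callback object that produces tree ops and an executor that applies them can be sketched roughly as follows. This is only an illustration of the idea: the <code>TreeOp</code> struct, the three op kinds, the method names and the string-building executor are all hypothetical stand-ins, not the actual Gecko classes or the expat handler signatures.

```cpp
#include <string>
#include <vector>

// Hypothetical op kinds; a real tree op queue would need many more.
enum class OpKind { StartElement, EndElement, Characters };

// One queued operation: what to do, plus an element name or text payload.
struct TreeOp {
  OpKind kind;
  std::u16string text;
};

// Stand-in for the proposed TreeOpGenerator: parser callbacks only append
// ops to a queue and never touch the document themselves.
class TreeOpGenerator {
 public:
  void OnStartElement(const std::u16string& name) {
    mOps.push_back(TreeOp{OpKind::StartElement, name});
  }
  void OnEndElement(const std::u16string& name) {
    mOps.push_back(TreeOp{OpKind::EndElement, name});
  }
  void OnCharacters(const std::u16string& data) {
    mOps.push_back(TreeOp{OpKind::Characters, data});
  }
  // Hands the queued ops over and leaves the internal queue empty.
  std::vector<TreeOp> TakeOps() {
    std::vector<TreeOp> out;
    out.swap(mOps);
    return out;
  }

 private:
  std::vector<TreeOp> mOps;
};

// Stand-in for the executor: replays ops in order. Here it just serializes
// them to a string instead of building a DOM.
class TreeOpExecutor {
 public:
  void Run(const std::vector<TreeOp>& ops) {
    for (const TreeOp& op : ops) {
      switch (op.kind) {
        case OpKind::StartElement:
          mSerialized += u'<';
          mSerialized += op.text;
          mSerialized += u'>';
          break;
        case OpKind::EndElement:
          mSerialized += u"</";
          mSerialized += op.text;
          mSerialized += u'>';
          break;
        case OpKind::Characters:
          mSerialized += op.text;
          break;
      }
    }
  }
  const std::u16string& Serialized() const { return mSerialized; }

 private:
  std::u16string mSerialized;
};
```

Because the generator only fills a queue, the ops can be handed to the executor in batches, which is what would let the same executor serve both the HTML and the XML side.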
===Details about Web content loading===
====Character encodings====
expat has built-in capability to decode US-ASCII, ISO-8859-1, UTF-8 and UTF-16 and has an API for plugging in support for other decoders. So why bother with putting the byte-to-UTF-16 conversion in mozilla::parser::xml::StreamParser, outside expat?
Unfortunately, expat has an unconventional API for encoding pluggability. Instead of having an API where byte buffers go in and UTF-16 or UTF-8 buffers come out, expat has an API for loading conversion tables into expat in the format that expat wants. Our pre-existing decoders don't expose their internals in that format. Therefore, to be able to use our pre-existing converters, we can't let expat manage the conversion.
Encoding sniffing should be handled the [https://bugzilla.mozilla.org/attachment.cgi?id=524615&action=diff same way nsHtml5StreamParser handles it in the XML View Source mode]: mozilla::parser::xml::StreamParser itself should handle UTF-8 and UTF-16 BOM sniffing. If there's no BOM, an instance of expat itself should be used for extracting the encoding name from the XML declaration.
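To make the mismatch between the two API shapes concrete, here is a rough sketch. <code>TableEncoding</code> is loosely modelled on the table-plus-callback shape that expat asks for (reduced here to just a 256-entry map), and <code>Latin1StreamingDecoder</code> is a hypothetical stand-in for a buffer-in, buffer-out decoder. Neither is the real expat or Gecko type, and ISO-8859-1 is used only because its mapping is trivial.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Table-shaped description of an encoding: one entry per possible byte.
// In this simplified sketch every entry is the code point for that byte.
struct TableEncoding {
  int map[256];
};

// Buffer-shaped decoder: bytes go in, UTF-16 code units come out.
// This is the shape the text says the pre-existing decoders have.
class Latin1StreamingDecoder {
 public:
  void Decode(const std::uint8_t* bytes, std::size_t length,
              std::u16string& out) {
    for (std::size_t i = 0; i < length; ++i) {
      // ISO-8859-1 bytes map one-to-one onto the first 256 code points.
      out.push_back(static_cast<char16_t>(bytes[i]));
    }
  }
};

// Feeding a table-based API means flattening the decoder into a table.
// That is easy for ISO-8859-1, but a decoder that keeps its mapping logic
// internal offers no way to produce such a table.
TableEncoding MakeLatin1Table() {
  TableEncoding table;
  for (int i = 0; i < 256; ++i) {
    table.map[i] = i;
  }
  return table;
}
```

With the conversion done in the stream parser, only the buffer-shaped interface is needed and the parser core always receives UTF-16.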
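The BOM check itself is small. A minimal sketch, assuming the first bytes of the stream are already available in one buffer (the names <code>SniffBom</code>, <code>BomEncoding</code> and <code>BomResult</code> are made up for this example and are not existing Gecko APIs):

```cpp
#include <cstddef>
#include <cstdint>

enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE };

struct BomResult {
  BomEncoding encoding;   // encoding indicated by the BOM, if any
  std::size_t bomLength;  // number of leading bytes to skip
};

// Looks at the start of the buffer for a UTF-8 or UTF-16 byte order mark.
// Assumption: the caller has buffered at least the first three bytes of the
// stream, or the stream is shorter than that.
BomResult SniffBom(const std::uint8_t* data, std::size_t length) {
  if (length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
    return {BomEncoding::Utf8, 3};
  }
  if (length >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
    return {BomEncoding::Utf16LE, 2};
  }
  if (length >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
    return {BomEncoding::Utf16BE, 2};
  }
  return {BomEncoding::None, 0};
}
```

When the result is <code>BomEncoding::None</code>, the stream parser would fall back to the XML declaration step described above, which this sketch does not cover.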