Platform/XML Rewrite: Difference between revisions

Platform/XML Rewrite (view source)

Revision as of 13:20, 27 April 2011

2,854 bytes added , 27 April 2011

More plan

Hsivonen


@@ Line 16: / Line 16: @@
 * Moving XUL/XBL1/SAX/RDF/XSLT off the main thread
-==Plan==
+==Background observations==
+The HTML5 parser has a design that works. Leaving aside the complexity of document.write handling, the HTML5 parser has these major parts:
+* A parser object (nsHtml5Parser) that nsDocument sees and that holds the rest together.
+* An IO driver (nsHtml5StreamParser) that receives bytes from a network stream, manages the character encoding conversion, and pushes UTF-16 code units to the portable parser core.
+* The portable parser core (nsHtml5Tokenizer and nsHtml5TreeBuilder).
+* Glue code that produces tree ops from what the portable core does (nsHtml5TreeBuilderCppSupplement).
+* An executor for the tree ops (nsHtml5TreeOpExecutor).
+The parser object also supports fragment parsing, but that functionality doesn't really benefit from being in the class that's oriented towards full page loading, so I think even on the HTML side, the fragment parsing functionality should be separated from nsHtml5Parser.
+==Basics of Web content loading on the XML side==
+I propose giving the XML Web content load path the same structure as the HTML load path (with document.write simplified out). That is, it would have these major parts:
+* A parser object (mozilla::parser::xml::Parser) that nsDocument sees and that holds the rest together.
+* An IO driver (mozilla::parser::xml::StreamParser) that receives bytes from a network stream, manages the character encoding conversion, and pushes UTF-16 code units to expat.
+* expat (portable parser core)
+* An object that implements the handler callbacks for expat and produces tree ops (mozilla::parser::xml::TreeOpGenerator).
+* The same executor for the tree ops as on the HTML side (nsHtml5TreeOpExecutor, eventually to be named mozilla::parser::TreeOpExecutor).
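The split between a tree op generator and a tree op executor can be sketched roughly as follows. This is a minimal illustration only: every type and method name in it (Node, TreeOp, TreeOpGenerator, TreeOpExecutor and their members) is a hypothetical stand-in rather than the real Gecko or expat API, and the parser callbacks are invoked by hand instead of by expat. The point is that the generator only records operations, while the executor is the only part that touches the tree, which is presumably what would allow the two halves to run on different threads.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical stand-ins, not the real Gecko classes.
struct Node {
  std::string name;  // element name; empty for a text node
  std::string text;  // character data; empty for an element
  std::vector<std::unique_ptr<Node>> children;
};

enum class OpKind { StartElement, EndElement, Characters };

struct TreeOp {
  OpKind kind;
  std::string data;  // element name or character data, depending on kind
};

// Plays the role of the parser handler object: it never touches the tree,
// it only records what should happen to it.
class TreeOpGenerator {
 public:
  void StartElement(const std::string& name) {
    mOps.push_back({OpKind::StartElement, name});
  }
  void EndElement() { mOps.push_back({OpKind::EndElement, ""}); }
  void Characters(const std::string& text) {
    mOps.push_back({OpKind::Characters, text});
  }
  // Hands the queued ops to the caller and leaves the internal queue empty.
  std::vector<TreeOp> TakeOps() {
    std::vector<TreeOp> ops;
    ops.swap(mOps);
    return ops;
  }

 private:
  std::vector<TreeOp> mOps;
};

// Applies queued ops to a tree; the only component that mutates nodes.
class TreeOpExecutor {
 public:
  TreeOpExecutor() : mRoot(new Node()) { mStack.push_back(mRoot.get()); }

  void Execute(const std::vector<TreeOp>& ops) {
    for (const TreeOp& op : ops) {
      switch (op.kind) {
        case OpKind::StartElement:
          mStack.push_back(Append(op.data, ""));
          break;
        case OpKind::EndElement:
          assert(mStack.size() > 1 && "unbalanced end tag");
          mStack.pop_back();
          break;
        case OpKind::Characters:
          Append("", op.data);
          break;
      }
    }
  }

  const Node& Root() const { return *mRoot; }

 private:
  // Appends a new child to the currently open element and returns it.
  Node* Append(const std::string& name, const std::string& text) {
    std::unique_ptr<Node> node(new Node());
    node->name = name;
    node->text = text;
    Node* raw = node.get();
    mStack.back()->children.push_back(std::move(node));
    return raw;
  }

  std::unique_ptr<Node> mRoot;
  std::vector<Node*> mStack;  // open elements, document root at the bottom
};
```

In this sketch the generator's methods would be called from the parser's start-element, end-element and character-data callbacks, and the result of TakeOps() would be handed to Execute() in batches.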
+===Details about Web content loading===
+====Character encodings====
+expat has built-in capability to decode US-ASCII, ISO-8859-1, UTF-8 and UTF-16 and has an API for plugging in support for other decoders. So why bother with putting byte-to-UTF-16 conversion in mozilla::parser::xml::StreamParser outside expat?
+Unfortunately, expat has an unconventional API for encoding pluggability. Instead of having an API where byte buffers go in and UTF-16 or UTF-8 buffers come out, expat has an API for loading conversion tables into expat in the format that expat wants. Our pre-existing decoders don't expose their internals in that format. Therefore, to be able to use our pre-existing converters, we can't let expat manage the conversion.
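The difference between the two API shapes can be illustrated with a small sketch. The types here are hypothetical: TableEncoding is a simplified stand-in for "a conversion table handed to the parser" and is not expat's actual structure, and BufferDecoder only illustrates the buffer-in, buffer-out shape that a conventional decoder exposes. The example table is made up for the illustration (identity mapping, except that byte 0xA4 maps to U+20AC).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified table-driven encoding description.
// map[b] >= 0 is the code point for byte b; -1 marks an invalid byte.
struct TableEncoding {
  int map[256];
};

// Builds an example single-byte table: identity for every byte, except that
// 0xA4 maps to U+20AC. Only BMP code points are used in this sketch.
inline TableEncoding MakeExampleTable() {
  TableEncoding enc;
  for (int b = 0; b < 256; ++b) {
    enc.map[b] = b;
  }
  enc.map[0xA4] = 0x20AC;
  return enc;
}

// What a parser could do internally once it owns such a table: the decoding
// loop lives inside the parser, not inside the decoder.
inline std::u16string DecodeWithTable(const TableEncoding& enc,
                                      const std::uint8_t* bytes,
                                      std::size_t length) {
  std::u16string out;
  for (std::size_t i = 0; i < length; ++i) {
    int cp = enc.map[bytes[i]];
    out.push_back(cp < 0 ? u'\uFFFD' : static_cast<char16_t>(cp));
  }
  return out;
}

// The conventional shape: bytes go in, UTF-16 comes out, and the decoder
// keeps its internals (tables, state machines) private.
class BufferDecoder {
 public:
  virtual ~BufferDecoder() = default;
  virtual std::u16string Decode(const std::uint8_t* bytes,
                                std::size_t length) = 0;
};
```

A decoder that only exposes something like Decode() cannot be turned into a table without knowing its internals, which is the mismatch the paragraph above describes.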
+Encoding sniffing should be handled the [https://bugzilla.mozilla.org/attachment.cgi?id=524615&action=diff same way nsHtml5StreamParser handles it in the XML View Source mode]: mozilla::parser::xml::StreamParser itself should handle UTF-8 and UTF-16 BOM sniffing. If there's no BOM, an instance of expat itself should be used for extracting the encoding name from the XML declaration.
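The BOM sniffing step could look roughly like the sketch below. The names BomEncoding, BomResult and SniffBom are assumptions made for this illustration and do not come from nsHtml5StreamParser. The sketch also assumes the caller has already buffered the first few bytes of the stream; a real stream parser would have to cope with the BOM arriving split across network buffers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result types; the names are illustrative, not Gecko's.
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE };

struct BomResult {
  BomEncoding encoding;
  std::size_t bomLength;  // number of leading bytes to skip before parsing
};

// Inspects the first bytes of a stream for a UTF-8 or UTF-16 byte order mark.
inline BomResult SniffBom(const std::uint8_t* bytes, std::size_t length) {
  if (length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB &&
      bytes[2] == 0xBF) {
    return {BomEncoding::Utf8, 3};
  }
  if (length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
    return {BomEncoding::Utf16LE, 2};
  }
  if (length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) {
    return {BomEncoding::Utf16BE, 2};
  }
  // No BOM: per the text above, the encoding name would then be extracted
  // from the XML declaration by an instance of expat.
  return {BomEncoding::None, 0};
}
```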
