Security/Reviews/Firefox4/HTML5 Parser Security Review
Revision as of 13:46, 27 September 2010
Overview
Describe the goals and objectives of the feature here.
- Make HTML parsing well-defined commodity functionality that everyone does the same way instead of having product-specific magic.
- Enable the use of SVG and MathML in text/html.
- Replace Gecko's HTML parser with something better understood and more maintainable.
- Move HTML parsing off the main thread in the hope of improving responsiveness.
- Background links
- bug html5-parsing Tracking bug
- Relevant bugs
- spec
Security and Privacy
- Is this feature a security feature? If it is, what security issues is it intended to resolve?
The HTML5 parser is not a security feature. However, the HTML5 parsing algorithm attempts to have these defense-in-depth security features:
- U+0000 is not ignored where script or style sheet data can occur; it is turned into U+FFFD instead. Suppose a security gatekeeper is blacklist-based (which it shouldn't be; everyone should use whitelist-based gatekeepers) and an attacker tries to fool it by injecting U+0000 into blacklisted identifiers. Because U+0000 has been turned into U+FFFD instead of getting dropped, the browser doesn't treat the parsed identifiers as those dangerous identifiers.
- Forcing a premature end of file doesn't change the executability of a given piece of the page compared to the situation where a premature end of file hasn't been forced. This is achieved by not retokenizing in a different mode if the EOF is seen inside [R]CDATA text or inside a comment.
- If the EOF occurs within a token, the incomplete token is discarded. This way, a premature EOF can't truncate the code in event handler attributes.
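The U+0000 point above can be illustrated with a minimal sketch. The gatekeeper, the blacklist contents, and both helper functions are hypothetical stand-ins for illustration only; they are not Gecko code and not the spec's tokenizer.

```python
# Hypothetical blacklist-based gatekeeper (the kind the text advises against).
BLACKLIST = {"script"}


def gatekeeper_allows(identifier: str) -> bool:
    """Naive check: rejects only exact (case-insensitive) blacklist matches."""
    return identifier.lower() not in BLACKLIST


def drop_nul(text: str) -> str:
    """Policy the HTML5 algorithm avoids: silently remove U+0000."""
    return text.replace("\u0000", "")


def replace_nul(text: str) -> str:
    """Policy described above: turn U+0000 into U+FFFD."""
    return text.replace("\u0000", "\ufffd")


# The attacker hides a blacklisted identifier behind an embedded U+0000.
attack = "scr\u0000ipt"

passes_gatekeeper = gatekeeper_allows(attack)  # the gatekeeper is fooled
seen_if_dropped = drop_nul(attack)             # becomes the dangerous identifier
seen_if_replaced = replace_nul(attack)         # stays a harmless, different string
```

In this model the gatekeeper is fooled either way, but only the dropping policy lets the browser end up with the blacklisted identifier.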
- What potential security issues in your feature have you already considered and addressed?
I have considered the security issues that the spec itself addresses (see immediately above).
Gecko's layout system runs algorithms that are recursive along the depth of the DOM tree. This means that deep trees lead to an overflow of the runtime stack, especially on Windows. The HTML5 parser limits the depth of the tree it creates to 200. This works against DoS-by-incompetence but not against DoS-by-malice (since deep trees can be created by other means).
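As a rough illustration of a depth cap, the sketch below nests every element inside the previous one until the limit is reached and then keeps appending further elements at the deepest permitted level. The `Node` class, `build_nested`, and the flattening strategy are assumptions made for this example; only the limit of 200 comes from the text above.

```python
MAX_DEPTH = 200  # the limit stated above


class Node:
    """Minimal hypothetical DOM node."""

    def __init__(self, name):
        self.name = name
        self.children = []


def build_nested(names, max_depth=MAX_DEPTH):
    """Nest each element in the previous one, but never deeper than max_depth."""
    root = Node("#document")
    parent, parent_depth = root, 0
    for name in names:
        node = Node(name)
        parent.children.append(node)
        # Descend only while the new node can still receive children
        # without exceeding the cap; otherwise keep adding siblings.
        if parent_depth + 1 < max_depth:
            parent, parent_depth = node, parent_depth + 1
    return root


def tree_depth(root):
    """Measure depth iteratively so the measurement itself cannot overflow the stack."""
    deepest = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in node.children)
    return deepest


shallow = tree_depth(build_nested(["div"] * 5))
capped = tree_depth(build_nested(["div"] * 1000))
```

A cap like this only constrains trees the parser builds; a script can still nest elements more deeply through DOM APIs, which is why the text says it does not help against malice.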
The Adoption Agency algorithm has two nested loops, which means that the work done by the parser can grow more than linearly as a function of the length of the input. A patch for limiting the number of iterations is in the queue. See bug 596180.
- Is system or subsystem security compromised in any way if your project's configuration files / prefs are corrupt or missing?
No.
- Include a thorough description of the security assumptions, capabilities and any potential risks (possible attack points) being introduced by your project.
When the tag <isindex> is parsed, a string that depends on the UI localization of the browser is inserted into the resulting DOM. An untrusted JavaScript program can use this string to obtain configuration-dependent entropy for fingerprinting or can infer the UI locale of the user. However, Gecko already leaks this data elsewhere.
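Reconstructing active formatting elements (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#reconstruct-the-active-formatting-elements) can make the DOM grow more than linearly as a function of the length of the input. This makes DoS attacks by resource exhaustion easy. Mitigations are being considered (http://lists.w3.org/Archives/Public/public-html/2010Sep/0163.html).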
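The superlinear growth can be seen with a simplified counting model. It assumes input of the shape `<p>` followed by some number of unclosed `<b>` start tags, followed by repeated `<p>x`: each later `<p>` closes the previous paragraph, and the text that follows causes every formatting element still on the active list to be recreated. The function names are hypothetical, and the model ignores any cap on reconstruction.

```python
def input_tags(formatting_tags: int, paragraphs: int) -> int:
    """Number of start tags in the input: one <p>, the <b> tags, the later <p> tags."""
    return 1 + formatting_tags + paragraphs


def dom_elements(formatting_tags: int, paragraphs: int) -> int:
    """Elements created under the simplified model described above."""
    first_paragraph = 1 + formatting_tags
    # Every later paragraph gets its own copy of each formatting element.
    later_paragraphs = paragraphs * (1 + formatting_tags)
    return first_paragraph + later_paragraphs


small = (input_tags(10, 10), dom_elements(10, 10))
large = (input_tags(100, 100), dom_elements(100, 100))
```

Under these assumptions, growing the input roughly tenfold (21 to 201 tags) grows the DOM by a factor of about 84 (121 to 10201 elements).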
RAM can be exhausted by sending a lot of data to the HTML5 parser. For example, element and attribute names and attribute values get buffered until they've been completely seen, so a server-side script can serve an infinitely long attribute value to exhaust RAM on the client. This is not an attack point _introduced_ by the HTML5 parser, since the old parser had this attack point as well. However, the HTML5 parser's reliance on "infallible" malloc might make RAM exhaustion lead to a crash more often. As a mitigating factor on desktops, the rate at which the network can feed the parser is low enough relative to the available memory that the user can stop the load of pages that consume excessive RAM due to being large for non-malicious reasons. However, a malicious attacker could use gzip bombs to work around the rate at which the network can feed the parser.
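The gzip concern is easy to demonstrate with Python's standard `gzip` module: highly repetitive data compresses so well that a small transfer expands into a large buffer on the client. The 10 MiB payload size is an arbitrary choice for this sketch.

```python
import gzip

# 10 MiB of a single repeated byte, standing in for a very long attribute value.
payload = b"A" * (10 * 1024 * 1024)

# What would travel over the network with Content-Encoding: gzip.
compressed = gzip.compress(payload, compresslevel=9)

# Expansion factor the client experiences after decompression.
ratio = len(payload) / len(compressed)

# Round trip to confirm the receiver really ends up with the full payload.
restored = gzip.decompress(compressed)
```

Because the bytes on the wire are a small fraction of the bytes handed to the parser, network speed stops being an effective limit on how fast memory fills up.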
- How are transitions in/out of Private Browsing mode handled?
They are not handled.
Exported APIs
- Please provide a table of exported interfaces (APIs, ABIs, protocols, UI, etc.)
- Does it interoperate with a web service? How will it do so?
- Explain the significant file formats, names, syntax, and semantics.
- Are the externally visible interfaces documented clearly enough for a non-Mozilla developer to use them successfully?
- Does it change any existing interfaces?
Module interactions
- What other modules are used (REQUIRES in the makefile, interfaces)?
Data
- What data is read or parsed by this feature?
- What is the output of this feature?
- What storage formats are used?
Reliability
- What failure modes or decision points are presented to the user?
- Can its files be corrupted by failures? Does it clean up any locks/files after crashes?
Configuration
- Can the end user configure settings, via a UI or about:config? Hidden prefs? Environment variables?
- Are there build options for developers? [#ifdefs, ac_add_options, etc.]
- What ranges for the tunable are appropriate? How are they determined?
- What are its on-going maintenance requirements (e.g. Web links, perishable data files)?
Relationships to other projects
Are there related projects in the community?
- If so, what is the proposal's relationship to their work? Do you depend on others' work, or vice-versa?
- Are you updating, copying or changing functional areas maintained by other groups? How are you coordinating and communicating with them? Do they "approve" of what you propose?