Platform/HTML5 sanitizer

From MozillaWiki

Gecko Requirements

  • Allow a setting for enabling styles.
  • Allow a setting for enabling comments. See bug 572642.
    • Or always enable comments? (What about "--" in comments?)
  • Have three element white lists: HTML, SVG and MathML.
    • This turns out to lead to a lot of complexity without clear benefit.
  • Have three attribute white lists: HTML, SVG and MathML. Whether an attribute is allowed depends only on the namespace of the element it is on, not on the element itself.
    • XXX: Figure out what the requirements are for attributes starting with data- or _.
  • Have three lists of attributes that take URLs. Drop the attributes when they have prohibited URLs (after trimming whitespace from the value).
    • Resolve relative URLs into absolute ones using a per-fragment base URL. (Is this correct for the Gecko requirements? The current code uses the node's base URI. Is that right?)
    • However, allow any URL in the src attribute on the img element, because img elements are safe. See bug 572637.
  • Have a list of SVG attributes that take different-document references.
  • Have a list of SVG attributes that are allowed to have same-document references only.
  • If styles are allowed, sanitize style attribute values. If styles aren't allowed, drop the style attribute.
  • Always drop script and title elements and their contents.
  • If styles are disabled, drop style elements and their contents.
  • If styles are enabled, sanitize the content of style elements.
  • Add the controls attribute to the video and audio elements (if it isn't there already).
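
The URL-attribute requirements above can be sketched roughly as follows. This is only an illustration in Python, not the Gecko code: the function name sanitize_url_attributes, the contents of URL_ATTRIBUTES and the contents of ALLOWED_SCHEMES are all assumptions, since this page does not say which attributes take URLs or which URLs are prohibited. The sketch also assumes that the resolved absolute URL is written back to the attribute, which the requirements leave open.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: this page does not list the permitted schemes.
ALLOWED_SCHEMES = {"http", "https", "mailto"}
# Assumption: a stand-in for the HTML list of attributes that take URLs.
URL_ATTRIBUTES = {"href", "src", "cite", "action"}


def sanitize_url_attributes(element_name, attributes, base_url):
    """Return a new dict of attributes with prohibited-URL attributes dropped.

    element_name: local name of the element, e.g. "a" or "img".
    attributes:   dict mapping attribute names to values.
    base_url:     the per-fragment base URL used to resolve relative URLs.
    """
    kept = {}
    for name, value in attributes.items():
        if name not in URL_ATTRIBUTES:
            # Not a URL-taking attribute: this sketch leaves it alone.
            kept[name] = value
            continue
        if element_name == "img" and name == "src":
            # Any URL is allowed in src on img.
            kept[name] = value
            continue
        # Trim whitespace from the value before inspecting it.
        trimmed = value.strip()
        # Resolve relative URLs against the per-fragment base URL.
        absolute = urljoin(base_url, trimmed)
        scheme = urlsplit(absolute).scheme.lower()
        if scheme in ALLOWED_SCHEMES:
            # Assumption: keep the resolved absolute URL in the output.
            kept[name] = absolute
        # Otherwise the attribute is dropped, not the element.
    return kept
```

For example, under these assumptions an href of " javascript:alert(1) " on an a element is dropped, while a relative href such as "page.html" is kept in resolved form.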

Open Questions

  • Can stylistic SVG attributes have values that need to be sanitized?
  • Should Semantic MathML be on the white list for clipboard round-tripping? (Mainly a footprint issue.)
  • Is it dangerous for SVG fragment id references to be able to refer to an id in the document the untrusted fragment gets inserted into?
  • What to do about microdata?

Non-Gecko Requirements

These are features for the HTML5 parser when it is used outside Gecko.

  • Allow form-related elements to be toggled on and off in the white list.
  • Allow using the sanitizer in non-fragment mode (in which case, the title element should be allowed).
    • Are there compelling use cases for non-fragment mode sanitization?
  • Have a configurable white list of permitted URL schemes in attributes that take URLs.
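
One possible shape for these options is a small configuration object. The sketch below is in Python purely for illustration and is not the parser's API: the class name SanitizerConfig, its field names, and the element names in BASE_ELEMENTS and FORM_ELEMENTS are hypothetical placeholders rather than the real white lists.

```python
from dataclasses import dataclass

# Assumption: tiny placeholder lists, not the real white lists.
BASE_ELEMENTS = frozenset({"p", "a", "em", "strong", "ul", "li"})
FORM_ELEMENTS = frozenset(
    {"form", "input", "button", "select", "option", "textarea", "label"}
)


@dataclass(frozen=True)
class SanitizerConfig:
    # Toggle form-related elements on and off in the white list.
    allow_forms: bool = False
    # Fragment mode is the default; non-fragment mode also allows title.
    fragment_mode: bool = True
    # Configurable white list of permitted URL schemes.
    allowed_schemes: frozenset = frozenset({"http", "https"})

    def allowed_elements(self):
        """Compute the effective element white list for this configuration."""
        elements = set(BASE_ELEMENTS)
        if self.allow_forms:
            elements |= FORM_ELEMENTS
        if not self.fragment_mode:
            elements.add("title")
        return frozenset(elements)

    def is_scheme_allowed(self, scheme):
        """Check a URL scheme against the white list, ignoring case."""
        return scheme.lower() in self.allowed_schemes
```

A caller that wants forms and ftp links could then construct SanitizerConfig(allow_forms=True, allowed_schemes=frozenset({"http", "https", "ftp"})). Keeping the configuration immutable means one sanitizer instance cannot have its white lists changed midway through a document.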