The autocomplete attribute and web documents using XHTML

From MozillaWiki

Executive summary

Many important web content producers, including banks and Google, insist on using the non-standard HTML autocomplete attribute for either security or usability reasons. When writing HTML, such authors should declare and validate against a custom doctype including the autocomplete attribute (example HTML document).

However, there is currently no way to trigger the same user agent functionality with an attribute in XHTML. This constitutes an unnecessary obstacle to the adoption of XML-based markup.

If the W3C adopts Web Forms 2.0 then this problem will be solved.

If that doesn't happen, by adopting my solution and supporting a small XHTML module and delimited class microformat, browsers could enable authors serving XHTML 1.0 as text/html to trigger the same behaviour with class="fh disable_form_history" (example XHTML 1.0 document) and authors serving XHTML properly as application/xhtml+xml to trigger the same behaviour with a namespaced attribute (fh:formhistory="off") (example modularized XHTML document). Those authors who wish to serve both XHTML and HTML properly using an XSL transformation could convert to the HTML autocomplete attribute with simple XSL rules.

Background

When Microsoft introduced form AutoComplete to the web with Internet Explorer 5 in 1999, they also "extended" the form and text and password input elements of HTML 3.2 with an autocomplete attribute which allowed site authors to disable this feature on a case-by-case basis. Gecko-based browsers gained an autocomplete feature in 2000, and by 2001 they too were forced to support the autocomplete attribute (Netscape Devedge: How to Turn Off Form Autocompletion). The primary motivation for the attribute at that time was that banks believed disabling autocomplete was a necessary security measure for the login information on their websites, and would bar from their services browsers that had autocomplete features without support for disabling them.

In practice, disabling autocomplete is not a particularly effective security measure. Even in the days of Internet Explorer 5, any machine could be compromised by keylogging that would of course be undeterred by a mere autocomplete attribute. Thanks to the rise of User-Agent spoofing, it became increasingly difficult to exclude browsers like Opera that support form history but ignore the autocomplete attribute by default (Opera's Settings File Explained), or browsers like ELinks and OmniWeb that have form history and ignore the attribute entirely. And now that web users increasingly edit the content delivered to them from the server to suit their own preferences, circumventing the autocomplete attribute is trivial (Remember Password Bookmarklet) even in Mozilla browsers that claim to support it "perfectly" (Mozilla Developer Center: How To Turn Off Form Autocompletion). Nonetheless, even technical opinion is divided over whether autocomplete offers valuable protection against casual attackers or merely lulls users and web authors into a false sense of security, leaving them vulnerable to more determined assailants.

I don't know of a formal statement of the current attitude of banks towards autocomplete, but I suspect many still believe it to be an important safeguard, notwithstanding the flaws I've just mentioned. As late as November 2004, financial standards bodies like APACS - Information on Cheque Clearing and Cheque Fraud were insisting that sensitive systems make use of the attribute and failure to use it could spell public relations disaster (BBC News: Bank moves to close web loophole). Security consultants like McAfee's Corey Benninger continue to recommend that site owners employ the autocomplete attribute (Developer: Browser Cache: Goodies For Hackers). The developers of the Web Forms 2.0 specification were forced to support the autocomplete attribute (Web Forms 2.0 Working Draft: The autocomplete attribute) even though they do not believe it offers any genuine security benefits (Lachlan Hunt kicking off an epic thread on the subject at the WHATWG mailing list).


Here's a non-exhaustive list of major banking groups I've found currently making use of autocomplete:

In 2004, Kevin Gibbs's Google Suggest found a new use for the autocomplete attribute: disabling the browser's autocompletion in order to allow a website's own JavaScript autocompletion to begin with a blank input field. Although it is possible to achieve a similar effect without the attribute, Google has set a precedent and now such non-standard code is churned out by Ajax developers, libraries, and toolsets everywhere. For example, Ruby on Rails's autocomplete helpers use it.

In summary, it seems that despite being non-standard, having known security flaws, being replicable with alternate techniques, and suffering from limited browser support, autocomplete is here to stay.

The Problem

1. Mainstream web development should be built on common markup that follows some sort of standard.

2. When serving content, it is important to be accurate about which standard you are following (hence the need for the Vary HTTP header, MIME types, document types, XML schemas, microformats, etc.); otherwise the standards themselves are devalued.

3. The web is gradually transitioning from HTML to XML-based markup, helped by minority browsers that can parse the application/xhtml+xml internet media type correctly.

4. When writing HTML, it is trivial to express the autocomplete attribute using the SGML standard by creating a custom DTD and using that as your doctype. You can create such a DTD by importing an HTML DTD and then simply adding the following line:

<!ATTLIST (form|input) autocomplete (on,off) #IMPLIED>

I've put up some example DTDs at my site:

To use such a custom doctype, simply declare it like so:

<!DOCTYPE html SYSTEM "http://www.benjaminhawkeslewis.com/legacymarkup/dtd/html-4.01-strict-plus-autocomplete.dtd">

Contrary to common misapprehensions, declaring a custom DTD in this way does not trigger Quirks mode, not even in IE 5/Mac. Unrecognized doctypes with a URI are interpreted in full Standards mode. See:

I've added an example document using the Strict version to my site.

5. The X in XHTML stands for "extensible". But because all elements and attributes are namespaced in XML and the W3C jealously guard their XHTML namespace from extension by others, a site author may not do the same with XHTML-based markup. While it is certainly possible to build a custom XML DTD that includes an autocomplete attribute, the act would be meaningless, as the W3C have made it clear they would not regard documents extending the XHTML namespace in this way as using XHTML at all (XHTML Modularization 1.1: Working Draft: Conformance Definition, also see A List Apart: More About Custom DTDs).

6. Therefore there is currently no way to use a simple markup attribute to turn off autocompletion when using XHTML.

A solution?

Whichever solution is adopted, it is at present only important for Mozilla, KHTML, and WebKit developers to recognize it, as (AFAIK) only browsers based on their engines both correctly parse application/xhtml+xml and claim to allow form history to be disabled by site authors.

The options

1. The very best solution would be to radically overhaul XHTML and JavaScript to make it easy to write secure web applications. Andrew van der Stock has some pointers over at the Web Application Security Mailing List. However, that will likely take years and will still leave the problem of legacy browsers. Would-be XHTML authors need a solution to this problem now.

2. If we agree that autocomplete is ultimately detrimental to security, the next-best solution would be to persuade the web and financial communities to employ other techniques. Good luck with that.

3. Failing that, a good solution would be to persuade the W3C to include the autocomplete attribute in an XHTML module similar to the Legacy module. When I originally drafted this proposal, I thought that would never happen, but it looks like there's a fighting chance Web Forms 2.0 (which comes in HTML and XML flavours) will be adopted by the W3C - and Web Forms 2.0 includes the "autocomplete" attribute. :) See my discussion with Opera's Anne van Kesteren and keep an eye out for Web Forms 2.0 news in September!

4. Still, let's not count our chickens until they've hatched. Failing that, we could add browser support for a formhistory attribute in a namespace identified by a vendor-neutral URI such as http://www.legacymarkup.org/xmlns/formhistory, and implement it in a simple XHTML module. When serving the same resource as HTML following content negotiation, the XSL transformation to HTML would be trivial:

<!-- The formhistory prefix must be bound to the namespace URI in the stylesheet. -->
<xsl:template match="@formhistory:formhistory[. = 'on']">
  <!-- Replace with autocomplete attribute -->
  <xsl:attribute name="autocomplete">on</xsl:attribute>
</xsl:template>

<xsl:template match="@formhistory:formhistory[. = 'off']">
  <!-- Replace with autocomplete attribute -->
  <xsl:attribute name="autocomplete">off</xsl:attribute>
</xsl:template>

(N.B. This should work, but needs testing anyhow.)

Remember, however, that HTML used in this way must (properly) use a custom doctype, as described above. Indeed, following the example of embedded XHTML accessibility roles (Embedding Accessibility Role and State Metadata in HTML Documents: Working Draft), we could create an associated microformat (Microformats Wiki: Introduction to Microformats) that would overload the class attribute. Accessibility uses "axs" as a delimiter to prevent namespace collision; we could use "fh". So we'd end up supporting:

class="some_css_class some_other_css_class fh disable_form_history"

and

class="fh enable_form_history"

The two variations are necessary because an input might have autocomplete enabled within a form that generally wants autocomplete disabled (hat tip to Richard Moore).
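
As a rough sketch of how a supporting user agent (or a backporting script) might read this microformat, the following treats "fh" as a delimiter and looks for an instruction among the tokens that follow it. The function names, the rule that only tokens after the delimiter count, and the rule that an input's own instruction overrides its form's are all assumptions made for illustration; the proposal above does not pin these details down.

```javascript
// Hypothetical helper names; none of this is specified by the proposal.
const FH_DELIMITER = "fh";
const FH_TOKENS = {
  disable_form_history: "off",
  enable_form_history: "on",
};

// Returns "on", "off", or null when the class value carries no instruction.
function formHistoryFromClass(classValue) {
  const tokens = (classValue || "").split(/\s+/).filter(Boolean);
  const start = tokens.indexOf(FH_DELIMITER);
  if (start === -1) return null; // no delimiter, so nothing is namespaced
  // Assumption: only tokens after the delimiter belong to the microformat.
  for (const token of tokens.slice(start + 1)) {
    if (Object.prototype.hasOwnProperty.call(FH_TOKENS, token)) {
      return FH_TOKENS[token];
    }
  }
  return null;
}

// Assumption: an input's own instruction wins over its form's,
// and form history stays enabled when neither says anything.
function effectiveFormHistory(inputClass, formClass) {
  return (
    formHistoryFromClass(inputClass) ||
    formHistoryFromClass(formClass) ||
    "on"
  );
}
```

For example, effectiveFormHistory("fh enable_form_history", "fh disable_form_history") yields "on", which is the case of one input re-enabling autocomplete inside a form that otherwise disables it.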

Microformatted (X)HTML could still be converted to custom HTML comprehensible to older, non-supporting user agents with an XSL transformation, something along the lines of:

<xsl:template match="xhtml:input[contains(@class,'disable_form_history')]">
  <xsl:element name="{local-name()}">
    <!-- Copy existing attributes -->
    <xsl:copy-of select="@*"/>
    <!-- Add autocomplete attribute -->
    <xsl:attribute name="autocomplete">off</xsl:attribute>
  </xsl:element>
</xsl:template>

(N.B. This XSL definitely needs testing.)

5. Other alternatives might involve using proprietary extensions, such as XUL/XBL. I suspect any implementation would be dependent on JavaScript, which would not satisfy those using autocomplete for putative security reasons. It would also be inferior to a cross-browser solution. However, such extensions might prove useful for backporting support for namespaced or microformatted autocompletion instructions to old versions of browsers, whether Mozilla (with extensions or XUL/XBL), Konqueror (with KParts), or even Internet Explorer (with HTC). For an inspiring example of what can be achieved with such extensions, have a look at Sjoerd Visscher's XHTML 2 page at W3Future, which uses proprietary Opera CSS, XBL, and HTC to mimic XHTML 2 on current user agents.

6. Use JavaScript to set the attribute.

document.getElementById( "MyInput" ).setAttribute( "autocomplete","off" )

Conclusion

At present the most practical option is 4, hopefully aided by 5.

(As a sidenote, if we went down the namespaced route, we might like to create a more general http://www.legacymarkup.org/xmlns/legacyml namespace as a retirement home for all sorts of legacy markup that the W3C would, admirably but somewhat impractically, ignore.)

How would Option 4 help web authors if implemented?

Option 4 would affect web authors differently depending on what sort of markup and internet media (MIME) type they are using.

Authors who want to use only HTML

If they declare that they are authoring according to a W3C HTML doctype, but use the autocomplete attribute then they are simply generating junk markup (as has always been the case). What they should do is use a custom DTD (as described above). Because of browser support, this is the best option for those who want to use autocomplete for security.

My suggestion would allow developers to author HTML according to W3C DTDs and disable autocompletion in supporting browsers, using a namespaced class like so:

<INPUT TYPE="text" CLASS="fh disable_form_history" NAME="sensitive" ID="sensitive">

This would be a great option for people creating Ajax applications like Google Suggest.

Authors who want to use only XHTML

I cannot emphasize enough that the use of the autocomplete attribute is currently utterly impossible in XHTML documents. With my proposal implemented, authors could at last use an attribute that does the same thing as the autocomplete attribute. Browser support would be less widespread, but browser support for XHTML as a whole is less common and browser support for the attribute is patchy in the first place (e.g. Opera, ELinks, etc.).

Using my attribute would require the import of a namespace (and ideally a suitable doctype). Such imports are common in the XHTML world, as they are required to mix in goodies like SVG, MathML, and XForms. Here's what such a document might look like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
    "-//LegacyMarkup//DTD W3C XHTML 1.1 plus FormHistory"
    "http://www.legacymarkup.org/2006/08/xhtml-formhistory/xhtml-formhistory.dtd">
<html xml:lang="en" 
	xmlns="http://www.w3.org/1999/xhtml" 
	xmlns:fh="http://www.legacymarkup.org/xmlns/formhistory">
<head>
<title>Example document</title>
</head>
	
<body>

<form action="http://www.example.com/someaction" method="post">
<div>

<label for="non_sensitive">Your non-sensitive data:</label>
<!-- UAs implementing the HTML autocomplete attribute
 SHOULD autocomplete this field (the default behaviour): -->
<input type="text" name="non_sensitive" id="non_sensitive"/>

<label for="sensitive">Your sensitive data:</label>
<!-- UAs implementing the HTML autocomplete attribute 
SHOULD NOT autocomplete this field: -->
<input type="text" fh:formhistory="off" name="sensitive" id="sensitive" />

</div>
</form>

</body>

</html>

Edit by FunkyRes - mpeters@mac.com

I'm sorry, but this wiki page is incorrect. It is easier in xhtml to add an attribute than it is in html.

xhtml makes it very easy to do this. I do it.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!ATTLIST form autocomplete CDATA #IMPLIED>
]>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

Note the magic in the DTD declaration -

[
<!ATTLIST form autocomplete CDATA #IMPLIED>
]

With that, my xhtml properly validates, no need to create a custom DTD. Only gotcha is that you must send the application/xhtml+xml header with the document, but if you are using xhtml, you should be sending that header anyway.

Someone with better writing skills than me should properly update this page so that it presents things in a factual manner.

Authors who want to write XHTML 1.0, but use content negotiation to serve XHTML 1.0 as text/html (i.e. tag soup) to user agents that don't support XHTML

This group *must not* use the existing autocomplete attribute when serving as text/html (because it's not XHTML), and (I think) *must not* use the proposed namespaced attribute (because it's not HTML). But it is trivial for them to use the proposed microformat in ordinary XHTML 1.0, just like this:

<input type="text" class="fh disable_form_history" name="sensitive" id="sensitive" />

Authors who want to write XHTML, but use content negotiation to serve HTML as text/html to user agents that don't support XHTML

This is the group to whom the XSL transformations are relevant. Serving both XHTML and HTML requires a transformation, usually using XSL. My suggestion would require one additional template rule. Given such authors will often be transforming not only vanilla XHTML, but XForms, SVG, etc. to HTML equivalents, such a template rule is but a drop in an ocean of XSL complexity. (I expect such complexity to be increasingly standardized as XHTML is more widely adopted.)

Another way to serve both HTML and XHTML is to use a libxml2-based tool, such as PHP's DOMDocument class, to construct your document.

First you look at the $_SERVER['HTTP_ACCEPT'] variable to see if the client accepts application/xhtml+xml. If it does, you construct (or import) your document in a DOMDocument object using an xhtml Doctype. If it doesn't, you construct (or import) your document in a DOMDocument object using an html Doctype.

Then to serve, you use print $doc->saveXML() for xhtml capable clients and $doc->saveHTML() for the rest. No need to use an XSL transform.

Most server side dynamic languages have bindings to libxml2 that allow for this kind of thing.
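
The Accept check itself is independent of the DOM library used. As a minimal sketch in JavaScript rather than PHP, the following parses an Accept header and decides which media type to serve. The function names are hypothetical, and two policy choices are assumptions of this sketch: XHTML is served only when application/xhtml+xml is listed explicitly (a bare */* is not taken as support), and it must be weighted at least as highly as HTML.

```javascript
// Parse an Accept header into a Map of media type -> q-value (default 1).
function parseAccept(header) {
  const prefs = new Map();
  for (const part of (header || "").split(",")) {
    const [type, ...params] = part.trim().split(";").map((s) => s.trim());
    if (!type) continue;
    let q = 1;
    for (const param of params) {
      const [name, value] = param.split("=").map((s) => s.trim());
      if (name.toLowerCase() === "q") {
        const parsed = Number.parseFloat(value);
        if (!Number.isNaN(parsed)) q = parsed;
      }
    }
    prefs.set(type.toLowerCase(), q);
  }
  return prefs;
}

// Hypothetical policy: serve XHTML only to clients that list it explicitly
// and do not prefer text/html over it.
function chooseMediaType(acceptHeader) {
  const prefs = parseAccept(acceptHeader);
  const xhtmlQ = prefs.get("application/xhtml+xml") ?? 0;
  const htmlQ =
    prefs.get("text/html") ?? prefs.get("text/*") ?? prefs.get("*/*") ?? 0;
  return xhtmlQ > 0 && xhtmlQ >= htmlQ
    ? "application/xhtml+xml"
    : "text/html";
}
```

A server would then pick the doctype and serializer (saveXML or saveHTML in the PHP case above) according to the returned media type, and should send a Vary: Accept header so that caches keep the two representations apart.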

Appendix A: What about WHATWG?

Update: If Web Forms 2.0 is adopted by the W3C, then the namespace issues I discuss below will disappear.

I welcome and support the work of WHATWG on Web Forms 2.0 and Web Applications 1.0. When Web Forms 2.0 is finalized, it seems set to become a welcome addition to the HTML armoury. Unfortunately, I do not believe it currently offers much to help XHTML authors. Unlike my microformat, Web Forms 2.0 offers nothing at all to authors of XHTML 1.0: there is no way to incorporate its autocomplete attribute in their markup. And unlike my namespaced attribute, its autocomplete currently cannot be legitimately included in a modular XHTML document, as far as I can tell. Although Web Forms 2.0 claims to be XHTML as well as HTML, it seems incompatible with the W3C's specifications for XML markup, according to which only the W3C is allowed to extend the "http://www.w3.org/1999/xhtml" namespace. The Web Forms 2.0 XHTML Module is at present only a curiosity. By extending this namespace, it violates the W3C's own specification for XHTML Family Modules: "The module definition's elements and attributes must be part of an XML namespace [XMLNAMES]. If the module is defined by an organization other than the W3C, this namespace must NOT be the same as the namespace in which other W3C modules are defined." Nor, judging by the Working Draft for the next version of the Modularization specification, is that requirement likely to change in the future. So Web Forms cannot be part of a conforming XHTML host language or XHTML integration set document - that is, it cannot claim XHTML conformance of any sort that would currently be recognized by the W3C.

The W3C Team Comment on Web Forms 2.0 Submission shows no sign of the W3C budging on this issue. Conversely, WHATWG have so far rejected pleas to place WHATWG elements and attributes in an XML namespace other than "http://www.w3.org/1999/xhtml". This perhaps isn't terribly surprising, given that WHATWG arose in the context of an Opera and Mozilla position paper openly critical of namespace overuse. See also the following threads on the WHATWG mailing list:

1. Is this introducing incompatibilities with future W3C work? (June 2004)

2. clear naming for WHAT work (July 2004)

It is not yet clear whether this divergence between the W3C and WHATWG on namespacing in XHTML modularization should matter. But clearly for those to whom it does matter, Web Forms offers no solution to the problem addressed by this proposal.

Having said that, I think there is a strong case for implementations of Option 4 to leave room to carry over to the XHTML world any changes Web Forms 2.0 makes to the autocomplete attribute. One question for discussion is whether Option 4 should mimic Web Forms's requirement that: "Support for the attribute should be enabled by default, and the ability to disable support should not be trivially accessible, as there are significant security implications for the user if support for this attribute is disabled." Notably, Opera currently disables support by default, and the ability to enable support is not trivially accessible.

Appendix B: Autocomplete alternatives

Some readers may be interested in what else is possible with present technology and markup.

Security

1. You could try educating your users in the dangers of public and unsecured machines, and in how to use their browser's autocompletion functionality selectively. In particular, warn them of the dangers even from acquaintances and family, who are the perpetrators in around 11% of UK identity fraud (CIFAS Research: Identity Fraud -- What About the Victim?). Unfortunately, your users are unlikely to have the same interest in the subject that you have -- until their identity is compromised. Even if they're interested, the majority of internet users are a poor match for more technically sophisticated crooks. This, ultimately, is a serious practical flaw in the argument advanced by Lachlan Hunt and others that site authors have no "right" to restrict web clients' use of autocompletion.

2. Instead of an autocomplete attribute, consider using a nonce (Wikipedia).

Let's say you have a field like:

<input type="text" name="my_sensitive_data" autocomplete="off" />

You could replace that with:

<input type="hidden" name="my_nonce" value="dLafr5aCo0pH7eyo" />
<input type="text" name="dLafr5aCo0pH7eyo" />

Here "dLafr5aCo0pH7eyo" is a value generated randomly server-side for each request of the form. On submission, the server simply reads the value of the field named "dLafr5aCo0pH7eyo" into "my_sensitive_data". While form data will be remembered by the browser, it will never be used for autocompletion because the name of the relevant field is different each time. This method has the advantage of being fully compatible with the W3C's HTML and XHTML specifications, and it does protect the user from their sensitive data simply appearing in fields when someone else uses their computer. However, it obviously offers no protection against someone able to read their stored personal data.
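
The server side of this technique can be sketched in a few lines of JavaScript for Node.js. The function names, the 12-byte nonce length, and the leading "f" (added so the generated name always starts with a letter) are illustrative assumptions, and the body argument stands for an already-parsed form submission. A production system would probably also want to check that the nonce was actually issued for the current session, which this sketch does not do.

```javascript
const { randomBytes } = require("crypto");

// Build the two fields for one rendering of the form.
// A fresh random field name is generated on every call.
function buildNonceFields() {
  const nonce = "f" + randomBytes(12).toString("hex");
  const html =
    `<input type="hidden" name="my_nonce" value="${nonce}" />\n` +
    `<input type="text" name="${nonce}" />`;
  return { nonce, html };
}

// Map a submitted body (an object of field name -> string value)
// back to the stable value the application cares about.
function readSensitiveField(body) {
  const nonce = body.my_nonce;
  // Accept only names of the shape this sketch generates.
  if (typeof nonce !== "string" || !/^f[0-9a-f]{24}$/.test(nonce)) {
    return null;
  }
  const value = body[nonce];
  return typeof value === "string" ? value : null;
}
```

Because the field name changes on every request, the browser's form history never finds a matching name to complete against, which is the effect described above.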

3. Consider using a two-factor system including a one-shot password, as with TAN (Wikipedia), which might be distributed via a list, SMS, or a custom hardware device like SafeWord. Be aware that to some extent this merely converts an electronic security problem into a physical one, and that some users will run a mile when faced with the need for yet another gadget.

JavaScript autocompletion

The nonce method described above is a good candidate here.

Although Google Suggest does use the autocomplete attribute, there are Ajax libraries that achieve similar effects without it.