The autocomplete attribute and web documents using XHTML

Background

When Microsoft introduced form AutoComplete to the web with Internet Explorer 5 in 1999, they also "extended" the form element and the text and password input elements of HTML 3.2 with an autocomplete attribute, which allowed site authors to disable the feature on a case-by-case basis. Gecko-based browsers gained an autocomplete feature in 2000, and by 2001 they too were forced to support the autocomplete attribute (Netscape Devedge: How to Turn Off Form Autocompletion). The primary motivation for the attribute at that time was that banks believed disabling autocomplete was a necessary security measure for the login information on their websites, and would bar from their services any browser whose autocomplete feature could not be disabled.

In practice, disabling autocomplete is not a particularly effective security measure. Even in the days of Internet Explorer 5, any machine could be compromised by a keylogger, which would of course be undeterred by a mere autocomplete attribute. Thanks to the rise of User-Agent spoofing, it became increasingly difficult to exclude browsers like Opera that support form history but ignore the autocomplete attribute by default (Opera's Settings File Explained). And now that web users increasingly edit the content delivered to them from the server to suit their own preferences, circumventing the autocomplete attribute is trivial (Remember Password Bookmarklet), even in Mozilla browsers that claim to support it "perfectly" (Mozilla Developer Center: How To Turn Off Form Autocompletion). Nonetheless, even technical opinion is divided over whether disabling autocomplete offers valuable protection against casual attackers or merely lulls users and web authors into a false sense of security, leaving them vulnerable to more determined assailants.

I don't know of a formal statement of the current attitude of banks towards autocomplete, but I suspect many still believe it to be an important safeguard, notwithstanding the flaws I've just mentioned. As late as November 2004, financial standards bodies like APACS were insisting that sensitive systems make use of the attribute, and failure to use it could spell public relations disaster (BBC News: Bank moves to close web loophole). Security consultants like McAfee's Corey Benninger continue to recommend that site owners employ the autocomplete attribute (Developer: Browser Cache: Goodies For Hackers). The developers of the Web Forms 2.0 specification were forced to support the autocomplete attribute (Web Forms 2.0 Working Draft: The autocomplete attribute) even though they do not believe it offers any genuine security benefits (Lachlan Hunt kicking off an epic thread on the subject at the WHATWG mailing list).

In 2004, Kevin Gibbs's Google Suggest found a new use for the autocomplete attribute: disabling the browser's autocompletion so that a website's own JavaScript autocompletion could start from a blank input field. Although it is possible to achieve a similar effect without the attribute, Google set a precedent, and such non-standard code is now churned out by Ajax developers, libraries, and toolsets everywhere. For example, Ruby on Rails's autocomplete helpers use it.

In summary, it seems that despite being non-standard, having known security flaws, being replicable with alternate techniques, and suffering from limited browser support (e.g. ELinks has form history but does not recognize the attribute), autocomplete is here to stay.

The Problem

1. Mainstream web development should rest on common markup that follows recognized standards. When serving content, it is important to be accurate about which standard you are following (hence the need for the Vary HTTP header, MIME types, document types, XML schemas, microformats, etc.), or the standards themselves become devalued.

2. The web is gradually transitioning from HTML to XML-based markup, helped by minority browsers that can parse the application/xhtml+xml internet media type correctly.

3. When writing HTML, it is trivial to express the autocomplete attribute using the SGML standard by creating a custom DTD and using that as your doctype. You can create such a DTD by importing an HTML DTD and then simply adding the following line:

<!ATTLIST (form|input) autocomplete (on|off) #IMPLIED>

I've put up some example DTDs at my site:

To use such a custom doctype, simply declare it like so:

<!DOCTYPE html SYSTEM "http://www.benjaminhawkeslewis.com/legacymarkup/dtd/html-4.01-strict-plus-autocomplete.dtd">

Contrary to common misapprehensions, declaring a custom DTD in this way does not trigger Quirks mode. (Except perhaps in IE 5/Mac -- can anyone confirm that?) Unrecognized doctypes with a URI are interpreted in full Standards mode. See:

I've added an example document using the Strict version to my site.


4. The X in XHTML stands for "extensible". But because all elements and attributes are namespaced in XML, and the W3C jealously guards its XHTML namespace from extension by others, a site author may not do the same with XHTML-based markup. While it is certainly possible to build a custom XML DTD that includes an autocomplete attribute, doing so would be meaningless, as the W3C has made it clear that it would not regard documents extending the XHTML namespace in this way as using XHTML at all (XHTML Modularization 1.1: Working Draft: Conformance Definition; also see A List Apart: More About Custom DTDs).

5. Therefore there is no current way to use a simple markup attribute to turn off autocompletion when using XHTML.

A solution?

Whichever solution is adopted, it is at present only important for Mozilla, KHTML, and WebKit developers to recognize it, as (AFAIK) only browsers based on their engines both correctly parse application/xhtml+xml and claim to allow form history to be disabled by site authors.

The options

1. The very best solution would be to radically overhaul XHTML and JavaScript to make it easy to write secure web applications. Andrew van der Stock has some pointers over at the Web Application Security Mailing List. However, that will likely take years and will still leave the problem of legacy browsers. Would-be XHTML authors need a solution to this problem now.

2. If we agree that autocomplete is ultimately detrimental to security, the next-best solution would be to persuade the web and financial communities to employ other techniques. Good luck with that.

3. Failing that, a good solution would be to persuade the W3C to include the autocomplete attribute in an XHTML module similar to the Legacy module. But I think that will never happen.

4. More plausibly, we could add browser support for a formhistory attribute in a vendor-neutral namespace such as http://www.legacymarkup.org/xmlns/formhistory, and implement it in a simple XHTML module. When the same resource is served as HTML following content negotiation, the XSL transformation to HTML would be trivial:

<!-- Assumes xmlns:formhistory="http://www.legacymarkup.org/xmlns/formhistory"
     is declared on the xsl:stylesheet element -->
<xsl:template match="@formhistory:formhistory[. = 'on']">
  <!-- Replace with autocomplete attribute -->
  <xsl:attribute name="autocomplete">on</xsl:attribute>
</xsl:template>

<xsl:template match="@formhistory:formhistory[. = 'off']">
  <!-- Replace with autocomplete attribute -->
  <xsl:attribute name="autocomplete">off</xsl:attribute>
</xsl:template>

(N.B. This should work, but needs testing anyhow.)
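For anyone without an XSLT processor to hand, the same attribute rewrite can be sketched in Python with the standard library's xml.etree.ElementTree. Treat this as an illustration under assumptions: the namespace URI is the hypothetical one suggested above, the function name formhistory_to_autocomplete is made up for this example, and it only rewrites attributes in a parsed XHTML tree. Serializing the result as HTML is a separate step not shown here. Unlike the XSL above, which would leave an unrecognized fh:formhistory value in place, this sketch simply drops it.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace from option 4; no browser recognizes it today.
FH_NS = "http://www.legacymarkup.org/xmlns/formhistory"
FH_ATTR = "{%s}formhistory" % FH_NS  # ElementTree's {uri}local attribute key


def formhistory_to_autocomplete(root):
    """Rewrite fh:formhistory="on|off" as autocomplete="on|off", in place.

    Any other fh:formhistory value is removed without replacement
    (a choice made for this sketch).
    """
    for el in root.iter():
        value = el.attrib.pop(FH_ATTR, None)
        if value in ("on", "off"):
            el.set("autocomplete", value)
    return root


doc = ET.fromstring(
    '<form xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:fh="http://www.legacymarkup.org/xmlns/formhistory">'
    '<input type="text" name="sensitive" fh:formhistory="off"/>'
    '</form>'
)
formhistory_to_autocomplete(doc)
print(doc.find("{http://www.w3.org/1999/xhtml}input").get("autocomplete"))  # off
```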

Remember, however, that HTML used in this way must, strictly speaking, use a custom doctype, as described above. Indeed, following the example of embedded XHTML accessibility roles (Embedding Accessibility Role and State Metadata in HTML Documents: Working Draft), we could create an associated microformat (Microformats Wiki: Introduction to Microformats) that would overload the class attribute. The accessibility draft uses "axs" as a marker token to prevent namespace collisions; we could use "fh". So we'd end up supporting:

class="some_css_class some_other_css_class fh disable_form_history"

and

class="fh enable_form_history"

The two variations are necessary because an input might have autocomplete enabled within a form that generally wants autocomplete disabled (hat tip to Richard Moore).

Microformatted (X)HTML could still be converted to custom HTML comprehensible to older, non-supporting user agents with an XSL transformation, something along the lines of:

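<!-- Assumes the "xhtml" prefix is bound to http://www.w3.org/1999/xhtml
     and that an identity template copies attributes -->
<xsl:template match="xhtml:input[contains(concat(' ', normalize-space(@class), ' '), ' disable_form_history ')]">
  <xsl:element name="{local-name()}">
    <!-- Process existing attributes -->
    <xsl:apply-templates select="@*"/>
    <!-- Add autocomplete attribute -->
    <xsl:attribute name="autocomplete">off</xsl:attribute>
  </xsl:element>
</xsl:template>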

(N.B. This XSL definitely needs testing.)
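The microformat variant can likewise be sketched in Python. Several assumptions are worth flagging: the function name is hypothetical; class values are treated as whitespace-separated tokens rather than tested by substring; the "fh" marker token must be present before anything happens; and, going beyond the XSL template above, form elements and the enable_form_history token are handled too. Per-input overrides are left to the browser, since an input may re-enable form history inside a form that disables it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
TARGET_TAGS = {"{%s}form" % XHTML_NS, "{%s}input" % XHTML_NS}
# Directive tokens from the proposed "fh" class microformat, in order of
# precedence: if an element somehow carries both, "disable" wins.
DIRECTIVES = (("disable_form_history", "off"), ("enable_form_history", "on"))


def microformat_to_autocomplete(root):
    """Add autocomplete="on|off" to form/input elements using the fh microformat."""
    for el in root.iter():  # iter() includes root itself
        if el.tag not in TARGET_TAGS:
            continue
        tokens = (el.get("class") or "").split()  # whole tokens, not substrings
        if "fh" not in tokens:
            continue  # without the "fh" marker the token is just a CSS class
        for token, value in DIRECTIVES:
            if token in tokens:
                el.set("autocomplete", value)
                break
    return root


doc = ET.fromstring(
    '<form xmlns="http://www.w3.org/1999/xhtml" class="fh disable_form_history">'
    '<input type="text" name="sensitive"/>'
    '<input type="text" name="search" class="fh enable_form_history"/>'
    '</form>'
)
microformat_to_autocomplete(doc)
print(doc.get("autocomplete"))  # off
print([i.get("autocomplete") for i in doc.iter("{%s}input" % XHTML_NS)])
# [None, 'on']
```

The input must be well-formed XHTML; tag-soup HTML would need an HTML parser instead of ElementTree.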

5. Other alternatives might involve using proprietary extensions, such as XUL/XBL. I suspect any implementation would be dependent on JavaScript, which would not satisfy those using autocomplete for putative security reasons. It would also be inferior to a cross-browser solution. However, such extensions might prove useful for backporting support for namespaced or microformatted autocompletion instructions to old versions of browsers, whether Mozilla (with extensions or XUL/XBL), Konqueror (with KParts), or even Internet Explorer (with HTC). For an inspiring example of what can be achieved with such extensions, have a look at Sjoerd Visscher's XHTML 2 page at W3Future, which uses proprietary Opera CSS, XBL, and HTC to mimic XHTML 2 on current user agents.

Conclusion

At present the most practical option is 4, hopefully aided by 5.

(As a sidenote, if we went down the namespaced route, we might like to create a more general http://www.legacymarkup.org/xmlns/legacyml namespace as a retirement home for all sorts of legacy markup that the W3C would, admirably but somewhat impractically, ignore.)

How would Option 4 help web authors if implemented?

Option 4 would affect web authors differently depending on what sort of markup and internet media (MIME) type they are using.

Authors who want to use only HTML

If they declare that they are authoring according to a W3C HTML doctype but use the autocomplete attribute, then they are simply generating junk markup (as has always been the case). What they should do is use a custom DTD (as described above). Because existing browsers already support the attribute, this is the best option for those who want to use autocomplete for security.

My suggestion would allow developers to author HTML according to W3C DTDs and disable autocompletion in supporting browsers, using a namespaced class like so:

<INPUT TYPE="text" CLASS="fh disable_form_history" NAME="sensitive" ID="sensitive">

This would be a great option for people creating Ajax applications like Google Suggest.

Authors who want to use only XHTML

I cannot emphasize enough that the autocomplete attribute currently cannot be used at all in conforming XHTML documents. With my proposal implemented, authors could at last use an attribute that does the same thing as the autocomplete attribute. Browser support would be less widespread, but browser support for XHTML as a whole is less common, and browser support for the autocomplete attribute is patchy in the first place (e.g. Opera, ELinks, etc.).

Using my attribute would require declaring a namespace (and ideally a suitable doctype). Such declarations are common in the XHTML world, as they are required to mix in goodies like SVG, MathML, and XForms. Here's what such a document might look like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
    "-//LegacyMarkup//DTD W3C XHTML 1.1 plus FormHistory//EN"
    "http://www.legacymarkup.org/2006/08/xhtml-formhistory/xhtml-formhistory.dtd">
<html xml:lang="en" 
	xmlns="http://www.w3.org/1999/xhtml" 
	xmlns:fh="http://www.legacymarkup.org/xmlns/formhistory">
<head>
<title>Example document</title>
</head>
	
<body>

<form action="http://www.example.com/someaction" method="post">
<div>

<label for="non_sensitive">Your non-sensitive data:</label>
<!-- UAs implementing the HTML autocomplete attribute
 SHOULD autocomplete this field (the default behaviour): -->
<input type="text" name="non_sensitive" id="non_sensitive"/>

<label for="sensitive">Your sensitive data:</label>
<!-- UAs implementing the HTML autocomplete attribute
 SHOULD NOT autocomplete this field: -->
<input type="text" fh:formhistory="off" name="sensitive" id="sensitive" />

</div>
</form>

</body>

</html>

Authors who want to write XHTML 1.0, but use content negotiation to serve XHTML 1.0 as text/html (i.e. tag soup) to user agents that don't support XHTML

This group *must not* use the existing autocomplete attribute when serving as text/html (because it's not XHTML), and (I think) *must not* use the proposed namespaced attribute (because it's not HTML). But it is trivial for them to use the proposed microformat in ordinary XHTML 1.0, just like this:

<input type="text" class="fh disable_form_history" name="sensitive" id="sensitive" />

Authors who want to write XHTML, but use content negotiation to serve HTML as text/html to user agents that don't support XHTML

This is the group to whom the XSL transformations are relevant. Serving both XHTML and HTML requires a transformation, usually using XSL. My suggestion would require one additional template rule. Given such authors will often be transforming not only vanilla XHTML, but XForms, SVG, etc. to HTML equivalents, such a template rule is but a drop in an ocean of XSL complexity. (I expect such complexity to be increasingly standardized as XHTML is more widely adopted.)
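The negotiation step itself is small. Below is a minimal Python sketch of choosing between application/xhtml+xml and text/html from a request's Accept header, assuming a hypothetical choose_media_type helper and a deliberately simplified Accept parser (no parameters other than q, no validation). One policy choice is baked in and should be read as an assumption: only an explicit application/xhtml+xml entry counts as XHTML support, because a browser that merely sends */* may not actually render XHTML. The Vary: Accept header tells caches that the response depends on the request's Accept header.

```python
def parse_accept(header):
    """Map each media range in an HTTP Accept header to its q-value.

    Deliberately simplified: no validation; later duplicates overwrite earlier ones.
    """
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_range] = q
    return prefs


def choose_media_type(accept_header):
    """Return (media type, response headers) for an XHTML/HTML resource."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    # Wildcards may stand in for text/html, but never for XHTML (sketch policy).
    html_q = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
    if xhtml_q > 0 and xhtml_q >= html_q:
        media_type = "application/xhtml+xml"
    else:
        media_type = "text/html"  # also the fallback when nothing matches
    headers = {
        "Content-Type": media_type + "; charset=utf-8",
        "Vary": "Accept",  # the body depends on Accept, so caches must key on it
    }
    return media_type, headers


print(choose_media_type("application/xhtml+xml,text/html;q=0.9,*/*;q=0.8")[0])
# application/xhtml+xml
print(choose_media_type("image/gif, image/jpeg, */*")[0])
# text/html
```

Which body is then sent (the XHTML original or its XSL-transformed HTML twin) is up to the application; the sketch only makes the decision and sets the headers.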

Appendix: Autocomplete alternatives

Some readers may be interested in what else is possible with present technology and markup.

Security

1. You could try educating your users about the dangers of public and unsecured machines, and in how to use their browser's autocompletion functionality selectively. In particular, warn them of the dangers even from acquaintances and family, who are the perpetrators in around 11% of UK identity fraud (CIFAS Research: Identity Fraud -- What About the Victim?). Unfortunately, your users are unlikely to have the same interest in the subject that you have -- until their identity is compromised. Even if they're interested, the majority of internet users are a poor match for more technically sophisticated crooks. This, ultimately, is a serious practical flaw in the argument advanced by Lachlan Hunt and others that site authors have no "right" to restrict web clients' use of autocompletion.

2. Instead of an autocomplete attribute, consider using a nonce (Wikipedia).

Let's say you have a field like:

<input type="text" name="my_sensitive_data" autocomplete="off" />

You could replace that with:

<input type="hidden" name="my_nonce" value="dLafr5aCo0pH7eyo" />
<input type="text" name="dLafr5aCo0pH7eyo" />

Here "dLafr5aCo0pH7eyo" is a value generated randomly server-side for each request of the form. On submission, the server simply reads the value of the field named "dLafr5aCo0pH7eyo" into "my_sensitive_data". Although the browser may still remember the form data, it will never use it for autocompletion, because the name of the relevant field is different each time. This method has the advantage of being fully compatible with the W3C's HTML and XHTML specifications, and it does prevent the user's sensitive data from simply appearing in fields when someone else uses their computer. However, it obviously offers no protection against someone able to read the user's stored form data.
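A minimal Python sketch of the nonce scheme follows. It assumes that form data arrives as a plain dictionary of field names to values (real frameworks differ), and the helper names render_sensitive_field and read_sensitive_field are hypothetical. As in the description above, the server trusts the hidden my_nonce field to tell it which field holds the sensitive value; the random name comes from the standard secrets module.

```python
import html
import secrets


def render_sensitive_field():
    """Return the two inputs from the example, with a fresh random field name."""
    nonce = secrets.token_urlsafe(12)  # 12 random bytes -> 16 URL-safe characters
    attr = html.escape(nonce, quote=True)  # defensive; these characters need no escaping
    return (
        '<input type="hidden" name="my_nonce" value="%s" />\n'
        '<input type="text" name="%s" />' % (attr, attr)
    )


def read_sensitive_field(form):
    """Recover my_sensitive_data from submitted form data (a name -> value dict)."""
    nonce = form.get("my_nonce")
    if not nonce:
        raise ValueError("submission is missing my_nonce")
    return form.get(nonce, "")


fragment = render_sensitive_field()
nonce = fragment.split('value="', 1)[1].split('"', 1)[0]
submitted = {"my_nonce": nonce, nonce: "secret"}
print(read_sensitive_field(submitted))  # secret
```

If forged or replayed field names are a concern, the server could additionally record each issued nonce in the session and reject submissions that don't match. That goes beyond what the method itself needs to defeat autocompletion.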

3. Consider using a two-factor system that includes a one-time password, as with TAN (Wikipedia), which might be distributed via a list, SMS, or a custom hardware device like SafeWord. Be aware that to some extent this merely converts an electronic security problem into a physical one, and that some users will run a mile when faced with the need for yet another gadget.
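For the TAN-list variant, the server-side bookkeeping can be sketched as follows. Everything here is an assumption for illustration: the helper names are hypothetical, TANs are six random digits, and the set of unused TANs is kept in memory, whereas a real system would keep and protect it in per-user persistent storage. The essential property is that each TAN is accepted at most once.

```python
import secrets


def issue_tan_list(count=10, digits=6):
    """Return (list of distinct TANs to give the user, set of unused TANs to keep)."""
    if count > 10 ** digits:
        raise ValueError("not enough distinct TANs of that length")
    tans = []
    unused = set()
    while len(tans) < count:
        tan = "".join(secrets.choice("0123456789") for _ in range(digits))
        if tan not in unused:  # skip the rare duplicate
            unused.add(tan)
            tans.append(tan)
    return tans, unused


def redeem_tan(unused, candidate):
    """Accept a TAN at most once: a successful redemption removes it."""
    if candidate in unused:
        unused.remove(candidate)
        return True
    return False


tans, unused = issue_tan_list()
print(redeem_tan(unused, tans[0]))  # True
print(redeem_tan(unused, tans[0]))  # False: each TAN works only once
```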

JavaScript autocompletion

The nonce method described above is a good candidate here.

Although Google Suggest does use the autocomplete attribute, there are Ajax libraries that achieve similar effects without it.