Fingerprinting
Overview
The EFF published an excellent study in May, detailing some of the various methods of fingerprinting a browser. See http://www.eff.org/deeplinks/2010/05/every-browser-unique-results-fom-panopticlick. They found that, over their study of around 1 million visits to their study website, 83.6% of the browsers seen had a unique fingerprint; among those with Flash or Java enabled, 94.2%. This does not include cookies! They ranked the various bits of information in order of importance (i.e. how useful they are in uniquely identifying a browser): things like UA string, what addons are installed, and the font list of the system. We need to go through these, one by one, and do what we can to reduce the number of bits of information (entropy) it provides. In their study, they placed a lower bound on the fingerprint distribution of 18.1 bits of entropy. (This means that, choosing a browser at random, at best one in 286,777 other browsers will share its fingerprint.)
Data
The following data is taken from the published paper, https://panopticlick.eff.org/browser-uniqueness.pdf:
| Variable | Entropy (bits) |
| plugins | 15.4 |
| fonts | 13.9 |
| user agent | 10.0 |
| http accept | 6.09 |
| screen resolution | 4.83 |
| timezone | 3.04 |
| supercookies | 2.12 |
| cookies enabled | 0.353 |
In all cases, data was either collected or inferred via HTTP, or collected by JS code and posted back to the server via AJAX.
Plugins
The PluginDetect JS library was used to check for 8 common plugins on that platform, plus extra code to estimate the Acrobat Reader version. Data sent by AJAX post.
IE does not allow enumeration via navigator.plugins. We could follow suit here. A short list could be brute-forced by simply including several plugin-specific objects in a page. I'm not sure what can be done about that, other than perhaps limiting the number of plugin types a page can display?
Fonts
System fonts collected by Flash or Java applet, if installed, and sent via AJAX post. Font list was not sorted, which provides a bit or two of additional entropy. We can ask them to either limit this list by default; or ask them to implement an API such that we can provide the list to them; or (made possible by OOPP) replace the OS API calls they use to get the font list, and give them our own. None of these things are easy, but given that this is #1, we should definitely do something here. The fastest option is probably to hack the OS API calls ourselves.
Font lists can also be determined by CSS introspection. At the very least we should sort the list; perhaps shorten it to a smaller set of common fonts; and back off (exponentially?) if script attempts to brute-force the list.
Web fonts (WOFF) -- provide a small, standard set, and require sites to simply provide their own WOFF if they want a nonstandard font?
User Agent
Detected from HTTP header. Pretty simple fix, but has the potential for breakage (as with any UA change!). For instance: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.7) Gecko/20100106
Ubuntu/9.10 (karmic) Firefox/3.5.7. Remedies: remove the last point digit in the Firefox and Gecko versions, and the Gecko build date; for Linux, remove distribution and version; possibly remove CPU. Windows is actually the best since the OS version string only identifies the major version (e.g. XP).
HTTP ACCEPT
Example: text/html, */* ISO-8859-1,utf-8;q=0.7,*;q=0.7 gzip,deflate en-
us,en;q=0.5. Not sure we can do much here?
Screen resolution
Example: 1280x800x24. Can't mess with this, except perhaps to always report "24" for the color depth -- of dubious value.
Timezone
Too useful to break.
Supercookies
The reported entropy includes only whether the following were enabled: DOM localStorage, DOM sessionStorage, and (for IE) userData. It did not test Flash LSOs, Silverlight cookies, HTML5 databases, or DOM globalStorage. We can't do anything to prevent testing whether these are enabled, but we can lock them down for third parties, as we will with cookies.
For Flash and Silverlight we need to pressure them to implement better APIs for controlling and clearing stored data. This is undoubtedly more important than anything else on this list, though it was ignored in this study since it does not fit within their definition of fingerprinting. We could be aggressive here by using the new Flash API for private browsing mode very liberally; or do something with the OS APIs as mentioned above.
Cookies enabled
Irrelevant due to low amount of entropy.
Extra credit
Other fingerprinting methods were mentioned, but not included, in the study. Examples:
Other data acquired via plugins
Undoubtedly Flash and Java provide other interesting tidbits. ActiveX and Silverlight, for example, allow querying the "CPU type and many other details". More study needed here.
Clock skew measurements
"41st Parameter looks at more than 100 parameters, and at the core of its algorithm is a time differential parameter that measures the time difference between a user’s PC (down to the millisecond) and a server’s PC." We can't break the millisecond resolution of Date.now, but we could try adding a small (< 100ms) offset to it. This would be generated per-origin, and would last for some relatively short time: life of session, life of tab, etc. Would have to be careful that it can't be reversed.
TCP stack
"ThreatMetrix claims that it can detect irregularities in the TCP/IP stack and can pierce through proxy servers". Not sure what this means yet.
JS behavioral tests
Can be used to gather information about whether certain addons are installed, exact browser version, etc. Probably nothing we can do here.
"TorButton has evolved to give considerable thought to fingerprint resistance [19] and may be receiving the levels of scrutiny necessary to succeed in that project [15]. NoScript is a useful privacy enhancing technology that seems to reduce fingerprintability."
"We identified only three groups of browser with comparatively good resistance to fingerprinting: those that block JavaScript, those that use TorButton, and certain types of smartphone."
We should study what TorButton does, and see if we can integrate some of its features. We can also recommend it, NoScript, and Flashblock to users. We could suggest improvements to relevant addons, such as providing options for blocking third party but not first party content. (This doesn't strictly solve anything, but makes gathering the data more difficult, since the third party now relies on the first party to collect it.)
User interface
Things like geolocation, database access and such require the user to grant permission for a given site. For geolocation, this is done with an infobar. We should do everything we can to make it clear to users what they're providing, and give them centralized control of those permissions in the privacy panel. This is what the UX privacy proposals seek to do.