Papers:Sending the Right Signals

From MozillaWiki
Your feedback and comments are welcomed on the discussion page.

Sending the Right Signals

This is Mozilla's submission for the upcoming W3C Workshop on Transparency and Usability of Web Authentication. The purpose of the Workshop is

"to identify steps W3C can take to improve Web Security from the user-facing end of the spectrum: Practical security online often fails because users can be induced to make decisions that jeopardize their security and privacy, based on a lack of working authentication of Web sites' identities (phishing). We will look at technologies that can support Web users to better assess the trustworthiness and identity of sites with which they deal."

There is also a pithy [1] presentation that goes along with it.

Jane, IRL

Jane is traveling, and finds herself in an unfamiliar area. She turns a corner and sees a bank, a corner store, and a taxi. She's hungry and wants to get back to her hotel, so she enters the bank, uses her ATM card to withdraw some money, walks to the corner store and gets a local snack and drink, and finally hops in the cab and heads off.

How did Jane know that the bank could be trusted? How could she be sure that the food she was about to buy wouldn't make her sick? What convinced her that the taxi driver was on the level?

In the physical world, there are a variety of signals that Jane can use to establish a sense of trust. Some of these signals are physical in form, such as the architecture of the buildings, the cleanliness of the taxi, and freshness seals on packages. Other signals are entirely conceptual, such as brand recognition. In all cases, however, Jane's assessment of trust is based on levels of familiarity. If Jane recognizes the name of the bank, she will likely trust it completely. Jane may also decide to trust the bank if she recognizes the pattern of the name of the bank (e.g., First National Bank of Whereverland) or if its physical characteristics match her mental image of a bank. There is a chance that Jane will be fooled, but we tend to be very effective at pattern matching, and even small inconsistencies would very likely raise suspicion.

Jane, Online

Jane returns home from traveling, and decides to go online and plan her next trip. After using a search engine to look for recommendations, she finds herself on an unfamiliar message board. She sees a link to a website that builds custom vacation packages. Jane likes this idea, and follows the link, submits her preferences and identification information, and charges her next trip to her credit card.

This time, when Jane had to make her assessment of trust, she had a similar set of signals to choose from. The name of the website may have been a recognizable brand, or may have closely matched a pattern that was familiar to Jane. The look and feel of the website may also have matched Jane's expectation of what a professional website looks like.

Sadly, however, it is entirely possible that Jane had stumbled upon a malicious website that was impersonating a legitimate travel business. Jane may have just provided the operator of that malicious website with her personal identification and credit information.

Signals, IRL vs. Online

The physical world is obviously different from the online world. What is less obvious is that we all carry a set of expectations and experiences -- a "default philosophy" -- based on our real world experiences, and we interpret everything through this philosophy, including our "virtual world" experiences online (for more on this idea, see Small Pieces, Loosely Joined by David Weinberger). There are some fundamental differences between signals available to an individual in the physical and online worlds, however, and these differences are what make internet users so vulnerable to attack.

  • Tangibility: Perhaps the most obvious difference is that the physical world is tangible whereas the virtual world is not. When an individual visits a location in the physical world, they can examine it directly and in many dimensions. In the virtual world we are limited to the dimensions presented to us by the software used to view the virtual locations. As a result, we experience locations in the physical world in many more dimensions than those of the virtual. The additional dimensions (such as weight, smell, depth, tactile sensation) all provide contextual signals which are absent from locations in the virtual world, and which can contribute to one's evaluation of trust.
  • Cost of Impersonation: Closely related to tangibility is the cost of impersonation. Impersonating physical world locations is both complex and costly because they must be convincing in so many dimensions and because the human brain is so adept at recognizing patterns and exceptions to patterns. Impersonating virtual world locations, however, is relatively easy as they exist in far fewer dimensions, and thus can be duplicated much more cheaply and easily.
  • Familiarity: As individuals, we have existed in the physical world for our entire lives. As a civilization, we have existed in the physical world for thousands of years. This familiarity yields expectations of how locations in the physical world will look, and how objects will feel and behave. The virtual world, on the other hand, is new and unfamiliar to many of its users. As a result, there is less of an expectation of how a location should appear in the virtual world. While it is true that many virtual locations such as online banks have patterned themselves after one another (e.g., similar features, navigation structure and use of a prominent client login area), these patterns are young and malleable. The physical world, on the other hand, has well-established patterns that result in an expectation of what a location such as a bank would look like (e.g., tellers, thick doors, slips of paper, a security guard).
  • Consistency: Signals from the physical world are consistently presented to us through our own senses. We cannot modify our senses, merely interpret the signals that we receive through them. In the virtual world, however, there is an intermediary between the location and our senses. The software used to present a virtual location presents signals about that location in an arbitrary fashion. As a result, signals from the virtual world are not necessarily consistently presented, but are instead dependent on the tool with which we are viewing the virtual location.

Evaluations of trust in the physical world are assisted by the fact that locations are tangible, costly to impersonate, familiar and consistently interpreted by our own senses. In the virtual world, however, we are hindered by the fact that locations are intangible, easily impersonated, unfamiliar and interpreted by clients that are not necessarily consistent.

Any solution that aims to simplify the task of evaluating trustworthiness in the virtual world needs to address these limitations on our abilities. The virtual world, however, is filled with locations that are by definition intangible, by design easily impersonated, and by immaturity unfamiliar. The only factor within our control is the consistency of how signals are presented to the user.

Available Online Signals for Trust

As established above, in the online world an individual must make a judgement of trustworthiness based on the signals available about a virtual location. In addition to signals such as name recognition or look and feel, the online world currently provides three additional signals that we can use to assist users in evaluating trustworthiness:

  • Encryption lets us comment on the likelihood that the information has been intercepted.
  • Certificates allow us to comment on the authenticity of a location's claim to its identity as asserted by a certificate authority (CA).
  • Recommendations about the trustworthiness of a location can be made by an organization or network.

Most web browsers available to users today provide some mechanism to indicate these signals to users. Unfortunately, each browser interprets and represents the signals slightly differently:

  • Inconsistent icons: Internet Explorer 7, Mozilla, Safari and Opera all use a lock icon to indicate when security signals are present, but Internet Explorer 7 also uses red and yellow shields to indicate when a location is thought to be suspicious or malicious.
  • Inconsistent colors: Mozilla and Opera use a yellow background to indicate when encryption and certificate signals are present. Internet Explorer 7 uses green to indicate a positive interpretation of trust, yellow to indicate suspicion and red to indicate a negative interpretation of trust. Safari doesn't use color to communicate either of these signals.
  • Inconsistent location: Internet Explorer 7 and Opera display the location's claimed identity in the URL bar, and allow an individual to investigate the name of the CA by clicking on this information. Mozilla displays the location's claimed identity in the bottom right corner of the browser, and the name of the CA is shown when the user hovers over the lock icon in the URL bar. Safari displays both the location's claimed identity and the name of the CA in a dialog presented when the user clicks on the lock icon in the upper right corner of the screen.
  • Inconsistent terminology: Internet Explorer 7 says that a location is "identified by" a CA. Opera calls the CA the "Certificate issuer" and Mozilla says that a location is "Signed by" a CA. Safari tells users that a given certificate has been "issued by" a CA. All browsers refer to "encryption", but present the encryption standards differently.
  • Inconsistent signals: Some add-on tools for popular browsers create a network of trust that results in a recommendation signal being presented to a user. These signals are not always present, and are not available in all web browsers.

Position on Usability of Website Authentication

The technologies and frameworks that exist in the virtual world for providing signals about website authentication are currently in flux. The next generation of web browsers will leverage whichever of these signals are workable at the time of their release. Perhaps that will be SSL/PKI, or SSL/PKI with multi-tiered certificates, or perhaps it will be a network of trust or some other heuristic measure based on meta-browsing habits. These technologies should continue to be allowed to grow in ways that address the hard questions of implementing security infrastructures.

It is our position, however, that tools which are used to connect individuals to locations in the virtual world should be consistent in the way they present the available signals to users. This consistency will help users develop an understanding of how to interpret signals when visiting an online location, which will in turn ease the task of making a judgement about the trustworthiness of that location.

Consistency both shapes user expectations and allows users to transfer their skills between tools used to visit locations in the virtual world. Given a single, clear set of signals, users will be able to focus on interpreting the trustworthiness of a site instead of having to focus on first interpreting the signals themselves.

Example using Existing SSL/PKI Signals

Organizations like the W3C often focus on ensuring that vendors consistently observe a technology standard. The resources and processes of these organizations should also be used to promote standards for expressing these signals to users. An example expression of our current technologies might be:

  • A connection to a location should be said to be secure when the connection is encrypted and it can be reasonably assured that communication is restricted to the user and that location.
  • If a connection is signed, then the location should be said to be identified with some name, by some signing authority.
  • If a signal exists that asserts a site to be trustworthy or untrustworthy, then the location should be said to be recommended or suspected by the source of that signal (FoaF networks, whitelists, preferred CA signatories, etc).

This example is limited, however, to expressions based on our current technology. Ideally the standards for the expression of security signals would be general in nature, allowing for the user to be insulated from the requirement to understand the underlying technology used to generate those signals.
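The three example expressions above can be sketched in code. The following is a minimal illustration in Python, not a proposed implementation: the names ConnectionSignals, Recommendation and describe, the field names, and the exact sentence wording are all assumptions made for this sketch. It shows how a browser-independent layer might turn whatever signals are available into the same "secure", "identified" and "recommended or suspected" vocabulary, without exposing the technology that produced them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Recommendation:
    """A trust signal from some source (hypothetical structure)."""
    source: str        # who issued the signal, e.g. a whitelist or FoaF network
    trustworthy: bool  # True means "recommended", False means "suspected"


@dataclass
class ConnectionSignals:
    """Signals available about one location (hypothetical structure)."""
    encrypted: bool = False
    # True when communication can be reasonably assured to be restricted
    # to the user and the location.
    restricted_to_location: bool = False
    identified_name: Optional[str] = None    # name asserted for the location
    signing_authority: Optional[str] = None  # authority that signed that name
    recommendations: List[Recommendation] = field(default_factory=list)


def describe(location: str, signals: ConnectionSignals) -> List[str]:
    """Express the available signals using one consistent vocabulary."""
    statements = []
    # "Secure" requires both encryption and restricted communication.
    if signals.encrypted and signals.restricted_to_location:
        statements.append(f"The connection to {location} is secure.")
    # "Identified" requires both a name and the authority that signed it.
    if signals.identified_name and signals.signing_authority:
        statements.append(
            f"{location} is identified as {signals.identified_name} "
            f"by {signals.signing_authority}."
        )
    # Each recommendation signal is attributed to its source.
    for rec in signals.recommendations:
        verb = "recommended" if rec.trustworthy else "suspected"
        statements.append(f"{location} is {verb} by {rec.source}.")
    return statements
```

Because describe depends only on the abstract signals, the underlying mechanism (SSL/PKI, a network of trust, or something else) could change without changing the words a user sees, which is the kind of insulation described above.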