Privacy/2010-10-27 Shorten-Referer Meeting Notes
Meeting notes from 2010-10-27
- Adam Barth, Google
- Nate Chapin, Google
- Dirk Pranke, Google
- David Recordon, Facebook
- Scott Renfro, Facebook
- Jonas Sicking, Mozilla
- Sid Stamm, Mozilla
- Paul Tarjan, Facebook
- Mike Beltzner, Mozilla
- Darin Fisher, Google
Discussion of the proposed "Shorten-Referer" HTTP header and related problems.
Recent experience at Facebook has led them to want to prevent the browser from automatically sending data from URLs to third parties through the Referer header.
While there are a number of ways to protect the Referer today (see "Existing alternatives for cloaking the referer"), they are all awkward and have various interoperability problems. FB has drafted a proposal for a "Shorten-Referer" header to be added to the HTTP response that would tell user agents to send only a subset (or none) of the Referer header on subsequent requests from the page (or contained iframes).
We established at the beginning that this was not an "urgent" problem and that we desired to look for the correct long term solution or solutions. We all agreed that the Referer was problematic and that we were interested in making things better.
1. There is the desire to remove the Referer header outright, possibly in favor of the Origin header. It can leak sensitive data accidentally and can be abused as a form of ambient authority. Unfortunately, we can't just stop sending it on requests because too many things on the web might break. We need a migration strategy that we don't currently have.
2. Facebook would like the ability to ensure that requests initiated from a page on facebook.com do not contain the full Referer header (e.g., a user clicks on an ad on facebook.com).
3. Facebook would like the ability to pass sensitive data in a URL to a partner site embedded in an iframe without the partner accidentally leaking the sensitive data through any requests initiated via the partner's page (e.g., a user clicks on an ad in an iframe from zynga.com embedded in facebook.com).
4. We would generally like a way to offer the protection of #2 and #3 without requiring the page content itself to be modified. There are two reasons for this:
   - It can be too difficult to ensure that every page on a site is modified correctly.
   - If the protection is provided in-page, it may be possible for in-page attacks to defeat the protection.
5. We would like to find a solution that fits into a more general security approach (or security policy framework) rather than pursue a one-off ad hoc solution.
Initial Solution Proposal
The draft proposed a "Shorten-Referer" header that would allow the server to tell the user agent that subsequent requests from the site should contain only the domain and path of the referer (omitting any request parameters), or just the domain, or no referer at all. The header would apply to a top level document and to any immediate iframes.
Issues to be considered
- Performance impact of the header being sent on every request (vs. configured in-page, perhaps by a cached script, or vs. a policy file at a "well known" hostmeta URL).
- Browser, server, and protocol complexity for a "one-off" header request.
- Is the DOM affected by this header?
- Does the Referer shortening need to cascade arbitrarily into descendant/sub-iframes (e.g., facebook embeds zynga embeds doubleclick, etc.)?
- Does the Referer shortening affect enclosing/ancestor frames (e.g., if facebook.com doesn't send the header, but zynga.com does)?
- Security implications (hopefully increased security from less data being sent).
- Potential effect on the "RESTful"ness of the web architecture.
Initial discussion and alternatives
We quickly agreed that we would shorten to the "origin" rather than the "domain", i.e., the shortened version would contain the full scheme, host, port triple. The shortened version would get a "/" tacked onto the end in order to remain a valid URL.
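The origin-based shortening described above can be sketched in a few lines. This is only an illustration of the agreed rule (scheme, host, port, plus a trailing "/"); the function name shortenRefererToOrigin is hypothetical, and a real user agent would apply the rule internally rather than through script.

```javascript
// Sketch only: shortenRefererToOrigin is a hypothetical helper, not a browser API.
// It reduces a full URL to its origin (scheme, host, port) plus a trailing "/".
function shortenRefererToOrigin(fullUrl) {
  const url = new URL(fullUrl);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    // Assumption: only http(s) documents produce a Referer worth shortening.
    return null;
  }
  // url.origin omits the port when it is the default for the scheme.
  return url.origin + "/";
}

console.log(shortenRefererToOrigin("https://www.facebook.com/profile.php?id=12345"));
// prints "https://www.facebook.com/"
console.log(shortenRefererToOrigin("http://apps.example.com:8080/game/play?user=abc#top"));
// prints "http://apps.example.com:8080/"
```

The path, query string, and fragment are all dropped, so nothing that identifies the user or the page survives in the shortened value.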
We fairly quickly agreed that we had no terribly compelling use cases for keeping the path but dropping the query parameters. The closest we got was a desire to keep "trackback" URLs in blog posts but strip all the other link-tracking cruft that often accompanies them.
We agreed that the HTML 5 History.replaceState() JS API (which is currently shipping in every current browser version) met requirement #2 (but not requirements #3 or #4). We hope to keep requirement #3 as well, so that third-party developers are not burdened with implementing requirement #2 themselves.
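As a rough sketch of how replaceState() can serve requirement #2, a page can rewrite its own URL to drop the query string before the user follows any outgoing link. The helper name scrubDocumentUrl is an assumption for illustration, and the location and history objects are passed in as parameters so that the logic can be exercised outside a browser.

```javascript
// Sketch: replace the document URL with its path only, so that later requests
// from this page carry a Referer without the query string or fragment.
// `loc` and `hist` stand in for window.location and window.history.
function scrubDocumentUrl(loc, hist) {
  const cleaned = loc.pathname; // drops "?..." and "#..."
  // replaceState rewrites the current history entry without reloading the page.
  hist.replaceState(null, "", cleaned);
  return cleaned;
}

// In a browser, run it once the page has read whatever it needs from the URL.
if (typeof window !== "undefined") {
  scrubDocumentUrl(window.location, window.history);
}
```

Because this runs in-page, it has the weaknesses noted under requirement #4: every page must include it, and it only takes effect once the script has executed.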
There was a lengthy discussion of the pros and cons of cascading the shortened Referer requirement into descendent iframes. Some were uncomfortable limiting iframes in this way without there being a way for the child iframe to "feature detect" the fact that the referer was shortened. We discussed whether cascading should be optional. We also discussed the need for it to cascade more than one level (for example, did the entire tab/window become "tainted"?) We generally agreed that it did not need to cascade more than one level (descendent iframes could use their own Shorten-Referer headers to that end), and that it should not affect a parent frame.
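The cascading rule we converged on (the header covers the document and its immediate child iframes, no deeper, and never the parent) can be written down as a small decision function. The frame model below, with its sendsShortenReferer and parent fields, is purely hypothetical and exists only to make the rule concrete.

```javascript
// Hypothetical frame model: { name, sendsShortenReferer, parent }.
// A frame's outgoing Referer is shortened if its own response carried the
// header, or if its immediate parent's response did. Grandparents do not count.
function shouldShortenReferer(frame) {
  if (frame.sendsShortenReferer) {
    return true;
  }
  const parent = frame.parent;
  return parent !== null && parent.sendsShortenReferer;
}

const topFrame = { name: "facebook.com", sendsShortenReferer: true, parent: null };
const childFrame = { name: "zynga.com", sendsShortenReferer: false, parent: topFrame };
const grandchildFrame = { name: "doubleclick.net", sendsShortenReferer: false, parent: childFrame };

console.log(shouldShortenReferer(topFrame));        // prints true
console.log(shouldShortenReferer(childFrame));      // prints true (one-level cascade)
console.log(shouldShortenReferer(grandchildFrame)); // prints false (no deeper cascade)
```

Under this rule zynga.com would need to send its own header if it wanted the shortening to reach doubleclick.net.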
There was also some concern about whether a malicious site could use a shortened referer to attack a site embedded in an iframe, although it was unclear if this was a practical attack.
Adam observed that at least one study has shown that 1-5% of web traffic has the Referer stripped by proxies or other means, and so sites should already be somewhat prepared for the idea that the Referer might not be present or be accurate.
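A server that wants to tolerate a missing, stripped, or shortened Referer might read it along the following lines. This is a minimal sketch, assuming a headers object with lower-cased names (as Node's http module provides); the function name refererOrigin is hypothetical.

```javascript
// Sketch: read the Referer defensively and never assume it is present or valid.
// Returns the origin of the referring URL, or null when it cannot be determined.
function refererOrigin(headers) {
  const raw = headers["referer"];
  if (typeof raw !== "string" || raw === "") {
    return null; // stripped by a proxy, suppressed by the browser, etc.
  }
  try {
    return new URL(raw).origin;
  } catch (err) {
    return null; // malformed value
  }
}

console.log(refererOrigin({}));                                          // prints null
console.log(refererOrigin({ referer: "https://www.facebook.com/" }));    // prints "https://www.facebook.com"
```

Code written this way behaves the same whether the Referer was full, shortened to the origin, or absent.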
We discussed the fairly-well-understood tradeoffs between an in-page restriction (via a meta tag or a script element), a HTTP header, and a policy file at a well-known URI (requirement #4). We agreed that we couldn't expect to solve that tradeoff here in a way that made everyone happy, but that we would like to follow whatever established conventions might exist for offering a range of solutions for different administrative/maintenance needs.
Scott comments: within the discussion of in-page enforcement, there's the one:many relationship of a big site (e.g., Facebook) and the many sites it may embed (e.g., apps on our site). In-page enforcement of the iframe URL/referrer is arguably tolerable because the policy is outside the enforced page (i.e., the iframe's DOM is affected by postdata, not the containing page's DOM), and we're more open to doing something in-page for this case (which really only affects a small number of pages) than for the general facebook.com ads case (which affects pretty much all of our pages).
Mozilla's Content Security Policy proposal
We discussed Mozilla's Content Security Policy proposal and how these requirements might fit in. We observed that (a) so far it was a very experimental API, although a good idea, (b) it was slowly heading towards a W3C working group in some form, and (c) it was so far concerned only with protecting content, and hence the idea of it controlling interactions in descendent iframes would be a new thing. We were uncomfortable with the implications of (c), but otherwise agreed that it seemed like the right place to address requirement #5.
It remained TBD whether or not we would want to push CSP in its current (experimental) form, or use an ad-hoc new header while we continue to evolve CSP towards general buy-in.
It was suggested that we add a "document.outgoingReferrer" method as an alternative to history.replaceState(). The belief was that (a) it would be slightly easier to author (with a clearer intent), and that (b) it could then be feature-detected.
This would also provide a cleaner solution to "trackback URL" use case and hence might be generally useful even if the URLs contained no sensitive data.
This was also nice because it did not involve any changes to the HTTP protocol or require servers to do anything.
However, this solution did not meet requirement #4 (keep things in a header). If it were made to affect requirement #3 (iframe cascade), the result would be somewhat weird, and it helped only somewhat with requirement #1 (kill Referer outright). Requirement #5 (security policy framework) was entirely outside the scope of this proposal.
No strong conclusion was reached here.
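To make the feature-detection argument concrete, here is a sketch of how a page might prefer document.outgoingReferrer and fall back to replaceState(). Note that document.outgoingReferrer is only the API proposed in this meeting; no browser implements it, and the call signature used here (a single URL string) is an assumption. The helper name setOutgoingReferrer is also hypothetical.

```javascript
// Sketch only. doc.outgoingReferrer is the *proposed* method from these notes;
// its signature (one URL string argument) is assumed for illustration.
// `doc` and `hist` stand in for document and window.history.
function setOutgoingReferrer(doc, hist, referrerUrl) {
  if (typeof doc.outgoingReferrer === "function") {
    // Feature detected: state the intent directly, address bar is untouched.
    doc.outgoingReferrer(referrerUrl);
    return "outgoingReferrer";
  }
  // Fallback: rewrite the document URL itself. replaceState only accepts
  // same-origin URLs, so this is less flexible than the proposed method.
  hist.replaceState(null, "", referrerUrl);
  return "replaceState";
}
```

The fallback branch also changes what the user sees in the address bar, which is one reason a dedicated method was thought to express the intent more clearly.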
It was also suggested that, since the problem was that there was data in the URL that we didn't want in the Referer, what if we found some other way to send data to the server that wasn't in the URL? The example of form POSTs was cited; perhaps we could give the page a way to specify additional data in a header or the request body?
<iframe src="http://zynga.com/farmville" postdata="encrypted_user_id">
where the data would either be sent in a new http header, or sent in the request body.
This meets Requirement #3 (protect iframes from accidentally leaking data) in a very clean manner (no worries about implicit cascading).
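The two transport options can be compared with a small sketch of the request a browser might issue for such an iframe. Everything here is hypothetical: the function name buildIframeRequest, the header name "X-Post-Data", and the choice to switch to POST when the data goes in the body are assumptions, since the notes leave encoding and transport undecided.

```javascript
// Sketch only: models the request for <iframe src=... postdata=...>.
// "X-Post-Data" is an invented header name; the real transport was left TBD.
function buildIframeRequest(src, postdata, transport) {
  const request = { method: "GET", url: src, headers: {}, body: null };
  if (transport === "header") {
    request.headers["X-Post-Data"] = postdata;
  } else if (transport === "body") {
    // Assumption: carrying a body turns the request into a POST.
    request.method = "POST";
    request.body = postdata;
  } else {
    throw new Error("unknown transport: " + transport);
  }
  return request;
}

const viaHeader = buildIframeRequest("http://zynga.com/farmville", "encrypted_user_id", "header");
const viaBody = buildIframeRequest("http://zynga.com/farmville", "encrypted_user_id", "body");
console.log(viaHeader.url); // prints "http://zynga.com/farmville"
console.log(viaBody.url);   // prints "http://zynga.com/farmville"
```

In both variants the iframe's URL never contains the sensitive value, so any Referer later derived from that URL cannot leak it.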
Of course, this is a rather unorthodox idea.
Pros:
- clean general solution to the problem of sensitive data in URLs. This makes the "web key" design technique of URLs as bearer tokens or capabilities potentially more compelling, since the full token is not leaked nearly as easily (see use of URI fragments for a similar approach).
- should be relatively easy to implement on the server side
- easy for authors to understand
- requires no change to the Referer header, no need to add a Shorten-Referer header to cascade in to protect iframes
- no weird cascading
Cons:
- doesn't meet requirement #1 (kill Referer)
- doesn't meet requirement #2 (still need replaceState or equivalent)
- doesn't meet #4 (out-of-page policy enforcement)
- doesn't meet #5 (general policy framework)
- potentially breaks the model of a URI as being sufficient to identify a resource (the RESTfulness of the web).
- may impact caching of requests (does it render existing sensitive-but-cacheable requests uncacheable?)
There was discussion of how general the "postdata" attribute should be: can it be used on <a>, <img>, <script>, etc.? It seems useful everywhere.
There was discussion of whether or not this turned the request into an actual POST (and thus perhaps mitigating some of the RESTful concerns).
The encoding and actual transmission of the data was TBD. Is the data pre-encoded? Does it get form-encoded by the browser? Do we need to specify a MIME type? Does the data go in the body of the request, or in a separate header? etc.
The RESTful impact was acknowledged as significant but the advantages of the approach are compelling enough to (we think) take this idea very seriously.
This may not work too well since the back button triggers a modal dialog on some browsers after post data has been submitted.
One idea is a variant on this "postdata" approach, which would be to add a mechanism to the browser to allow us to create form POSTs and explicitly indicate that they don't need user intervention/approval. We could then create a form with a target of the child iframe and POST to the server. This would have the advantage of being a purely client-side modification that required no changes to the protocol or the server implementations. Of course, it would work for iframes but not img, script, css, etc. (It probably works for <a> since that triggers a navigation).
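The form-based variant builds on a technique that already works: a form whose target names the child iframe, submitted from script. The sketch below shows only that existing part; the new browser mechanism for marking the POST as not needing user approval is not shown because no such API exists. The function name postIntoIframe and the example URL and field names are hypothetical, and the document object is passed in so the logic can be exercised without a browser.

```javascript
// Sketch: deliver data to a child iframe with a form POST, so that the
// iframe's URL (and any Referer derived from it) never contains the data.
// `doc` stands in for document; frameName must match the iframe's name attribute.
function postIntoIframe(doc, frameName, actionUrl, fields) {
  const form = doc.createElement("form");
  form.method = "post";
  form.action = actionUrl;
  form.target = frameName; // the response loads inside the named iframe
  form.style.display = "none";
  for (const [name, value] of Object.entries(fields)) {
    const input = doc.createElement("input");
    input.type = "hidden";
    input.name = name;
    input.value = value;
    form.appendChild(input);
  }
  doc.body.appendChild(form); // forms must be in the document to submit
  form.submit();
  return form;
}

// Browser usage, assuming <iframe name="partner-frame"></iframe> is on the page:
// postIntoIframe(document, "partner-frame", "http://zynga.com/farmville",
//                { user: "encrypted_user_id" });
```

As noted above, this covers navigations of iframes (and probably links) but not subresources such as img, script, or css.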
Some combination of replaceState() and postdata="" seems like the best short-term way forward. Of course, postdata="" has a roll-out problem for adoption.
We would supplement this with either a CSP directive or an ad-hoc header for shortening the Referer on the originating document, with the decision being made based on how comfortable we are with the maturity and stability of CSP. It was TBD whether or not this header would still cascade into child iframes.
We think that Shorten-Referer and/or a CSP-based equivalent is compelling enough on its own as a way to migrate people away from Referers to continue to pursue it in addition to the above ideas.
Dirk and/or David to write up and circulate notes (done, posted here ;).
All of us to continue thinking about the pros and cons of the various approaches.