Services/Sync/WEP/114

From MozillaWiki

WEP 114 - Favicon Proxying

Problem

On initial sync, mobile phones receive synced tabs from desktop machines but do not have the corresponding favicons, which makes for a less attractive interface. Because building up and tearing down network connections is costly on these devices, fetching each favicon individually is more expensive than a single call that retrieves all of them from a central server. It would also be cheaper, storage-wise, to store favicons centrally rather than in WBOs.


There are limitations to what a single server can offer and still scale at a reasonable level, both in terms of proxy time and local cache/storage. Therefore, some compromises will need to be made.

Proposed Solution

The favicon server will only fetch the favicon at the root of each domain (http://domain/favicon.ico), not page-specific icons. This is not the optimal approach for the interface, but it is expected to be acceptable in 90% of cases, and the device will eventually sync the different/correct favicon if applicable.

All transactions to the API to retrieve these favicons will be done over https, ensuring the anonymity of the favicons. The server will handle a JSON POST-body request (so that it will not be logged) of the following form:

["url1", "url2", ...]

Each URL will have everything but the domain name stripped, and the favicon corresponding to http://domain/favicon.ico will be returned. The response will be of the form:

{"url1": "ab432243..", "url2": "f3a21...", ...}
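The stripping step and the request body can be sketched as follows. This is a minimal illustration, not the server's actual code: the function names `favicon_url` and `build_request_body` are hypothetical, and the sketch assumes the request body is a JSON list of URL strings.

```python
import json
from urllib.parse import urlsplit


def favicon_url(url: str) -> str:
    """Strip everything but the domain and point at its root favicon."""
    host = urlsplit(url).hostname  # drops scheme, port, path, query, fragment
    if not host:
        raise ValueError(f"no domain found in URL: {url!r}")
    return f"http://{host}/favicon.ico"


def build_request_body(urls) -> str:
    """Serialize the URLs for the POST body (assumed here to be a JSON list)."""
    return json.dumps(list(urls))
```

For example, `favicon_url("https://www.example.com:8080/a/b?q=1")` gives `http://www.example.com/favicon.ico`, so two tabs on the same site map to the same favicon.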

There should be a limit (20?) to the number of favicons retrievable by a single call.
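A client holding more URLs than the limit would need to split them across several calls. A rough sketch, assuming the tentative limit of 20 and a hypothetical helper name `batch_urls`:

```python
def batch_urls(urls, limit=20):
    """Split URLs into request-sized batches, dropping exact duplicates.

    The default of 20 mirrors the tentative limit above; it is not a
    settled value.
    """
    unique = list(dict.fromkeys(urls))  # de-duplicate while keeping order
    return [unique[i:i + limit] for i in range(0, len(unique), limit)]
```

With 45 distinct URLs this yields three batches of 20, 20, and 5, so the connection cost is paid three times rather than 45.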

The server will hash the favicon domain and store the resulting favicon with that key, allowing popular favicons to be retrieved without a proxy call. Policies may be needed to determine how long to retain a favicon before a new version is retrieved.
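One way to picture the hash-keyed store with a retention policy is the in-memory sketch below. The class name `FaviconCache`, the choice of SHA-256, and the one-week default are all assumptions made for illustration; the proposal does not specify a hash function, a storage backend, or a retention period.

```python
import hashlib
import time


class FaviconCache:
    """In-memory favicon store keyed by a hash of the domain (illustrative only)."""

    def __init__(self, max_age_seconds=7 * 24 * 3600, clock=time.time):
        self._max_age = max_age_seconds  # assumed retention policy
        self._clock = clock              # injectable so expiry can be tested
        self._store = {}

    @staticmethod
    def key_for(domain: str) -> str:
        # SHA-256 is an assumption; any stable hash of the domain would do.
        return hashlib.sha256(domain.lower().encode("utf-8")).hexdigest()

    def get(self, domain: str):
        """Return cached favicon bytes, or None if missing or too old."""
        entry = self._store.get(self.key_for(domain))
        if entry is None:
            return None
        stored_at, data = entry
        if self._clock() - stored_at > self._max_age:
            return None  # stale: caller should fetch a new version
        return data

    def put(self, domain: str, data: bytes) -> None:
        self._store[self.key_for(domain)] = (self._clock(), data)
```

A `None` result from `get` is the signal to make the proxy call and then `put` the fresh favicon; popular domains are served from the store without contacting the origin site.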

Issues

1) Popularity: There's a good chance that non-Weave applications will want to use the service. As we can't scale up to handle all-internet popularity, we will likely need to require a Weave login (meaning that we'll need centralized login) and a username in the URL (so that we can detect abusive behavior).

2) Anonymity: It is important that we are not able to associate users with the URLs they request; otherwise, the service represents a privacy leak. This is why we will require https and send the requested URLs in a POST body that isn't logged. However, there is no way to verify this externally, and a compromised server would be able to access the data.

Discussion

> All transactions to the API to retrieve these favicons will be done over https, ensuring the anonymity of the favicons.

I wouldn't use phrases like that. The Weave philosophy (from my observations) is about protecting clients against the server. An API which gives a list of all your bookmarked/visited sites to the server (whether protected from eavesdroppers or not) is not anonymous.

Having the original uploading client fetch its own favicons and store the results in the encrypted table (so that mobile clients could grab them from the Weave server, rather than the original web sites) would be anonymous. We should collect some hard numbers on how much data this would represent. (I too suspect it would be untenably large, but we should measure it rather than speculate.) There are PIR (Private Information Retrieval) schemes that can provide truly anonymous lookups of data like this, but they aren't generally very efficient.

--Brian Warner 20:52, 9 June 2010 (UTC)