Contact Pool Proposal
Most web users keep one or more collections of data about people they know, in desktop software, in online services, or both. To enable more "people-centric" applications in the browser, we propose a Contact Pool that consolidates the information in all these collections behind a single, easy-to-use JavaScript API.
Recognizing that "people-centric" applications will sometimes simply want access to profile data, but sometimes want to perform operations on a set of people, the Contact Pool API supports an extension mechanism that allows applications or libraries to dynamically extend the attributes of a contact.
This proposal is a refinement of JEP 15 (https://wiki.mozilla.org/Labs/Jetpack/JEP/15).
Design Principles
1. Provides a single interface for information that exists elsewhere. It does not itself provide a contact database. It uses caching where appropriate to enhance responsiveness.
2. Treats desktop and online sources equivalently: a user who keeps all their addresses in (say) Windows Contacts should have as good an experience as one who keeps everything in Facebook.
3. Maintains the source of all data provided through the API, and allows dynamic inclusion and refresh of sources. For example, the user could remove Twitter from their "active set" of sources, and some subset of information and contacts would immediately be removed from the Contact Pool.
4. Recognizes that contact information is noisy and prone to error, and provides good heuristics and recovery mechanisms to enable useful consolidation. For example, if a user indicates that two contacts are in fact the same person, despite the failure of automated techniques to detect this, the system should remember their equivalence, and use it to inform further automated consolidation.
An Architecture Sketch
The library maintains a Contact Pool, and defines a Contact object. A Contact is a dictionary of fields; a field may be multi-valued, and each value has a list of collector instances (see below).
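The Contact structure described above could be sketched as follows. This is a minimal illustration, not a proposed API: the names `createContact`, `addValue` and `removeCollector`, and the use of string collector ids, are all assumptions. It shows a multi-valued field whose values each carry the list of collectors that supplied them, which is also what makes it possible to drop a source later (design principle 3).

```javascript
// Hypothetical shape: the proposal does not define concrete names.
// A contact maps a field name to a list of { value, collectors } entries.
function createContact() {
  return { fields: new Map() };
}

// Record that `collectorId` supplied `value` for `field`.
function addValue(contact, field, value, collectorId) {
  if (!contact.fields.has(field)) {
    contact.fields.set(field, []);
  }
  const entries = contact.fields.get(field);
  let entry = entries.find((e) => e.value === value);
  if (!entry) {
    entry = { value, collectors: [] };
    entries.push(entry);
  }
  if (!entry.collectors.includes(collectorId)) {
    entry.collectors.push(collectorId);
  }
}

// Drop every value that only `collectorId` supplied; values that other
// collectors also supplied survive with a shorter collector list.
function removeCollector(contact, collectorId) {
  for (const [field, entries] of [...contact.fields]) {
    const kept = entries
      .map((e) => ({
        value: e.value,
        collectors: e.collectors.filter((c) => c !== collectorId),
      }))
      .filter((e) => e.collectors.length > 0);
    if (kept.length > 0) {
      contact.fields.set(field, kept);
    } else {
      contact.fields.delete(field);
    }
  }
}

const contact = createContact();
addValue(contact, "email", "joebob@gmail.com", "gmail");
addValue(contact, "email", "joebob@gmail.com", "macos-addressbook");
addValue(contact, "url", "http://example.com/joebob", "twitter");
removeCollector(contact, "twitter");
```

After the last call the contact keeps its single email value, still attributed to two collectors, and loses the "url" field that only the Twitter collector had supplied.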
Accessor methods exist on the Pool to perform searches and filtered iterations. A method to add a contact is also provided: when a contact is added, the Contact Consolidator is invoked to determine whether this contact is equivalent to one already in the pool.
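A rough sketch of what such a Pool might look like is given below. The class name `ContactPool`, its methods `add`, `filter` and `search`, and the toy `sameEmail` consolidator are all hypothetical. Contacts are simplified to plain objects mapping a field name to a list of strings, with the per-value collector tagging left out for brevity.

```javascript
// Hypothetical names throughout; the proposal only says such methods exist.
class ContactPool {
  constructor(consolidator) {
    this.contacts = [];
    // consolidator(incoming, existingContacts) -> matching contact or null
    this.consolidator = consolidator;
  }

  // Adding a contact first asks the consolidator whether it is already known.
  add(incoming) {
    const match = this.consolidator(incoming, this.contacts);
    if (match) {
      for (const [field, values] of Object.entries(incoming)) {
        match[field] = [...new Set([...(match[field] || []), ...values])];
      }
      return match;
    }
    this.contacts.push(incoming);
    return incoming;
  }

  // Filtered iteration: lazily yield contacts that satisfy `predicate`.
  *filter(predicate) {
    for (const contact of this.contacts) {
      if (predicate(contact)) {
        yield contact;
      }
    }
  }

  // Search: contacts with a value of `field` starting with `prefix`.
  search(field, prefix) {
    const p = prefix.toLowerCase();
    const matches = (c) =>
      (c[field] || []).some((v) => v.toLowerCase().startsWith(p));
    return [...this.filter(matches)];
  }
}

// Toy consolidator: two contacts are the same person if they share an email.
function sameEmail(incoming, existing) {
  const emails = new Set(incoming.email || []);
  return existing.find((c) => (c.email || []).some((e) => emails.has(e))) || null;
}

const pool = new ContactPool(sameEmail);
pool.add({ name: ["Joe Bob"], email: ["joebob@gmail.com"] });
pool.add({ name: ["joebob"], email: ["joebob@gmail.com"], url: ["http://example.com/joebob"] });
pool.add({ name: ["Ann Lee"], email: ["ann@example.org"] });
```

Here the second `add` is folded into the first contact because the email matches, so the pool ends up with two contacts rather than three.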
The library defines an interface for a Contact Collector. A Collector is responsible for gathering contacts from some source (local or remote) and submitting them to the pool. Every value is tagged with a reference to the collector that provided it. Example sources include the Windows Contacts API, the MacOS Address Book API, Gmail contacts, a Twitter follower list, and a Facebook friend list.
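One possible shape for the Collector interface is sketched below. It assumes a collector is any object with an `id` and a `collect()` method. Both names are hypothetical, as are `StaticCollector` and `runCollector`. The example is synchronous to stay short; a collector for a remote source would more likely return its records asynchronously.

```javascript
// Hypothetical interface: a collector exposes an `id` and a `collect()`
// method that returns raw records of the form { field: [values] }.
class StaticCollector {
  constructor(id, records) {
    this.id = id;
    this.records = records; // stands in for a real address book or web API
  }

  collect() {
    return this.records;
  }
}

// Run one collector, tag every value with the collector that supplied it,
// and hand each tagged contact to `submit` (for example, the pool's add method).
function runCollector(collector, submit) {
  const records = collector.collect();
  for (const record of records) {
    const tagged = {};
    for (const [field, values] of Object.entries(record)) {
      tagged[field] = values.map((value) => ({ value, collector: collector.id }));
    }
    submit(tagged);
  }
  return records.length;
}

const submitted = [];
const twitter = new StaticCollector("twitter:@joebob", [
  { name: ["Ann Lee"], url: ["http://example.com/ann"] },
]);
const count = runCollector(twitter, (c) => submitted.push(c));
```

Keeping the tagging in `runCollector` rather than in each collector means a collector cannot forget or misreport where its values came from.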
The library defines an interface for a Contact Extender (working name). This interface allows an object to be registered for notification when new contacts are added, and to perform additional work on each one, possibly triggering the Consolidator to run again. An example would be a webpage-to-activity-stream scraper, which inspects a contact for a "url" field, retrieves the URL, and extracts the activity stream Link. The discovery of an identical Link URL could then cause two contacts to be recognized as co-referential.
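The Extender idea might be sketched like this. `ExtenderRegistry`, `notifyAdded` and `activityStreamExtender` are hypothetical names, and the `knownStreams` lookup table is a stand-in for actually fetching and scraping a page. An extender here is a function that returns true when it changed the contact, which tells the caller that the Consolidator should look at the contact again.

```javascript
// Hypothetical registry: extenders are callbacks notified of each new contact.
class ExtenderRegistry {
  constructor() {
    this.extenders = [];
  }

  register(extender) {
    this.extenders.push(extender);
  }

  // Returns true if any extender changed the contact, signalling that
  // the consolidator should run again for it.
  notifyAdded(contact) {
    let changed = false;
    for (const extender of this.extenders) {
      if (extender(contact)) {
        changed = true;
      }
    }
    return changed;
  }
}

// Stand-in for retrieving a URL and extracting its activity stream link.
const knownStreams = new Map([
  ["http://example.com/joebob", "http://example.com/joebob/feed"],
]);

function activityStreamExtender(contact) {
  let changed = false;
  for (const url of contact.url || []) {
    const stream = knownStreams.get(url);
    const existing = contact.activityStream || [];
    if (stream && !existing.includes(stream)) {
      contact.activityStream = [...existing, stream];
      changed = true;
    }
  }
  return changed;
}

const registry = new ExtenderRegistry();
registry.register(activityStreamExtender);
const added = { name: ["Joe Bob"], url: ["http://example.com/joebob"] };
const needsReconsolidation = registry.notifyAdded(added);
```

Running the same extender a second time on `added` reports no change, so repeated passes over the pool do not keep re-triggering consolidation.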
Rough control flow:
1. A set of collectors is saved in prefs or instantiated by default. Each is an instance, with behavior and optional data. For example, the user may have "MacOS addresses", "Twitter friends for account @joebob", and "Gmail contacts for joebob@gmail.com". Alternatively each instance could infer an appropriate set of data from the user's history or password database.
2. Each collector is invoked, producing a set of contacts. Each contact in the set is merged into the pool. As each merge is executed, consolidation candidates are produced, each with a strength. If the consolidation strength for some pair of contacts rises above a threshold, the contacts are merged. (Some UI would need to exist to let the user confirm or reject a merge, and that decision would need to be persisted; it could itself be modeled as a consolidation strength.)
3. When the collectors are done (or whenever), the Extenders are invoked, traversing all (or the new?) contacts and inspecting them. Each Extender may perform additional work, and reinvoke the Consolidator.
4. When the Extenders are finished, the Pool is in a quiescent state and awaits library calls.
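Step 2 of the control flow above, threshold-based consolidation, could look roughly like the following. The weights, the threshold value, the `id` field and the way user decisions are stored are all assumptions made for illustration; the proposal does not specify any of them. A stored user decision is modeled as a strength that overrides the automatic score, as suggested in the note to step 2.

```javascript
// Hypothetical scoring: each shared value adds evidence that two contacts match.
const THRESHOLD = 1.0; // assumed cut-off, not specified by the proposal
const WEIGHTS = { email: 1.0, url: 0.6, name: 0.5 }; // assumed weights

function strength(a, b, userVerdicts) {
  const key = [a.id, b.id].sort().join("|");
  // A persisted user decision overrides the automatic score.
  if (userVerdicts.has(key)) {
    return userVerdicts.get(key);
  }
  let total = 0;
  for (const [field, weight] of Object.entries(WEIGHTS)) {
    const theirs = new Set(b[field] || []);
    if ((a[field] || []).some((v) => theirs.has(v))) {
      total += weight;
    }
  }
  return total;
}

function mergeInto(target, source) {
  for (const [field, values] of Object.entries(source)) {
    if (field === "id") continue; // the merged contact keeps the target's id
    target[field] = [...new Set([...(target[field] || []), ...values])];
  }
}

// Merge each collected contact into the pool, or append it if nothing matches.
function consolidate(pool, incoming, userVerdicts) {
  for (const contact of incoming) {
    const match = pool.find((c) => strength(c, contact, userVerdicts) >= THRESHOLD);
    if (match) {
      mergeInto(match, contact);
    } else {
      pool.push(contact);
    }
  }
  return pool;
}

// The user has said that these two "Ann Lee" entries are different people.
const verdicts = new Map([["mac:2|tw:9", -Infinity]]);
const merged = consolidate([], [
  { id: "mac:1", name: ["Joe Bob"], email: ["joebob@gmail.com"] },
  { id: "gm:7", name: ["J. Bob"], email: ["joebob@gmail.com"] },
  { id: "tw:4", name: ["Joe Bob"], url: ["http://example.com/joebob"] },
  { id: "mac:2", name: ["Ann Lee"], url: ["http://example.com/ann"] },
  { id: "tw:9", name: ["Ann Lee"], url: ["http://example.com/ann"] },
], verdicts);
```

In this run the two entries sharing an email are merged, the Twitter entry that shares only a name stays separate because 0.5 is below the threshold, and the two "Ann Lee" entries stay separate only because of the stored user decision. A real implementation would also need to carry verdicts forward when contacts are merged, which this sketch does not attempt.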
Collector and Extender data flow examples
Twitter API (collector) -> people with URL property -> load URLs and scrape for RDF/RSS (extender) -> annotate contacts with activity streams/blogs
Facebook API (collector) -> list of friends annotated with activity streams
Windows Address Book (collector) -> list of contacts with email addresses -> webfinger resolution (extender) -> annotate contacts with activity streams and microblogs
Application Examples
1. The Contact Pool could easily populate an enhanced form-fill experience. An email address drop-down menu could contain prefix-matching addresses and small avatar photos.
2. The Contact Pool could provide a "what's new" experience across a user's entire social network, by resolving an activity stream across many networks and protocols.
3. The Contact Pool could provide a robust set of verbs on a single Person (or Contact?) object, by discovering available messaging and collaboration endpoints (which is a fancy way of saying email addresses, IM names, Twitter names, etc.).
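As an illustration of the first application, a form-fill helper built on the pool might look like the sketch below. `suggestEmails`, the "photo" field name and the simplified contact shape (field name to list of strings) are assumptions, not part of the proposal.

```javascript
// Hypothetical helper: return up to `limit` suggestions whose email starts
// with the typed prefix, each with a display name and an avatar URL (or null).
function suggestEmails(contacts, typed, limit = 5) {
  const prefix = typed.trim().toLowerCase();
  if (prefix === "") {
    return [];
  }
  const suggestions = [];
  for (const contact of contacts) {
    for (const email of contact.email || []) {
      if (email.toLowerCase().startsWith(prefix)) {
        suggestions.push({
          email,
          name: (contact.name || [email])[0],
          avatar: (contact.photo || [null])[0],
        });
      }
    }
  }
  // Plain code-unit ordering keeps the result independent of locale settings.
  suggestions.sort((a, b) => (a.email < b.email ? -1 : a.email > b.email ? 1 : 0));
  return suggestions.slice(0, limit);
}

const people = [
  { name: ["Joe Bob"], email: ["joebob@gmail.com"], photo: ["http://example.com/joebob.png"] },
  { name: ["Joan Doe"], email: ["joan@example.org"] },
  { name: ["Ann Lee"], email: ["ann@example.org"] },
];
const dropdown = suggestEmails(people, "jo");
```

Each suggestion carries enough data for the drop-down to render an address, a name and a small avatar without a second lookup.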
Interaction with Weave
If Weave becomes a primary contact data source itself, it would implement its own Collector. It would be necessary to retain all of the Collector, Consolidator and Extender logic, to allow Weave to continue to respond to changes in the user's contact setup.
Related Work
Windows Contacts: http://msdn.microsoft.com/en-us/library/ms735779%28VS.85%29.aspx
Gilbert's work on Social Tie strength: http://social.cs.uiuc.edu/people/gilbert/30