Browser Metrics

Please comment in the Talk page (use the Discussion tab above)

Goals & Objectives

To understand how people use the browser, and to evaluate our efforts to improve it, we need to collect and analyze usage data. Many types of data can be collected, such as:

  • Session history navigation (to understand navigation patterns and improve the tab/window UI as well as backend features like bfcache)
  • UI elements (which widgets are and aren't being used)
  • Cache effectiveness (hit rate, bloat, etc.)
  • Memory usage (to understand how memory usage changes during normal navigation)
  • Unsupported content (to understand how many people are affected and prioritize projects accordingly)
  • "Problems" (unhandled exceptions in browser chrome, assertion failures)

We think that, given the opportunity to opt in to this data collection ("Help us improve Firefox"), a statistically significant number of users would enable this functionality. In addition, for prereleases, it may be feasible to enable this collection by default. It should be possible to strike the right balance for users -- see the Privacy section below for details.

The current proposal is to implement the instrumentation as an extension. The advantages of this approach include the ability to update the extension independently of the browser and greater flexibility to promote/market the extension.

Technical Design

This project can be divided into several components:

Data Collection Service

The data collection service aggregates the data and uploads it to the collection server. Before uploading data, a manifest file will be fetched from the server to control which items will be uploaded. This allows us to tweak the volume of data collected as desired, without client-side changes.
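As a rough illustration of the manifest idea, the sketch below filters pending events against a manifest before upload. The manifest format, the `enabled_collectors` key, and the event fields are all assumptions made for this example; the actual format is not defined on this page.

```python
def select_for_upload(events, manifest):
    """Keep only the events whose collector is enabled in the manifest."""
    # "enabled_collectors" is a hypothetical manifest key, not a defined format.
    enabled = set(manifest.get("enabled_collectors", []))
    return [event for event in events if event["collector"] in enabled]


# A manifest as it might look after being fetched from the server (assumed shape).
manifest = {"enabled_collectors": ["cache", "memory"]}

# Events queued on the client; collector names and keys are illustrative only.
pending = [
    {"collector": "cache", "data": {"hit": "1"}},
    {"collector": "ui", "data": {"widget": "back-button"}},
    {"collector": "memory", "data": {"rss_kb": "204800"}},
]

to_upload = select_for_upload(pending, manifest)
```

Because the filtering is driven entirely by the fetched manifest, changing what gets uploaded only requires changing the file on the server.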

Collection Service Design

Data Collectors

Data collectors will hook in at a variety of locations in the backend and frontend. Each collector submits events to the collection service, where each event can contain collector-specific key/value pairs.
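The relationship between collectors and the collection service might look something like the sketch below. The class names, the `submit` method, and the cache collector are hypothetical and only show the shape of the interaction; the real extension would hook into browser code rather than plain Python objects.

```python
import time
from dataclasses import dataclass


@dataclass
class Event:
    collector: str    # name of the collector that produced the event
    timestamp: float  # seconds since the epoch, recorded at submission time
    data: dict        # collector-specific key/value pairs


class CollectionService:
    """Hypothetical in-memory stand-in for the data collection service."""

    def __init__(self):
        self.events = []

    def submit(self, collector, **data):
        # Each event carries whatever key/value pairs the collector supplies.
        self.events.append(Event(collector, time.time(), dict(data)))


class CacheCollector:
    """Hypothetical collector that reports cache lookups."""

    name = "cache"

    def __init__(self, service):
        self.service = service

    def on_lookup(self, hit):
        self.service.submit(self.name, hit=hit)


service = CollectionService()
cache = CacheCollector(service)
cache.on_lookup(True)
cache.on_lookup(False)
```

Keeping the event payload as free-form key/value pairs means a new collector can be added without changing the service itself.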

Data Collector Design

Server Infrastructure

Overview of the server-side interaction

Build Instructions

See Browser Metrics:Building

Privacy

We need to be up-front with users about what data is being collected and how we use it. Each user will be assigned a unique ID the first time they submit data, so that we can correlate data across sessions. In addition to information about usage of the browser, we'll collect some data about the user's hardware and software configuration to help isolate problems.
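One way the per-user ID could work is sketched below: a random identifier is generated on first submission and reused afterwards. The use of a random UUID, the JSON state file, and the `client_id` key are assumptions for illustration; this page does not specify how the ID is generated or stored.

```python
import json
import tempfile
import uuid
from pathlib import Path


def get_or_create_client_id(state_file):
    """Return the stored ID, creating a random one on first use."""
    path = Path(state_file)
    if path.exists():
        return json.loads(path.read_text())["client_id"]
    # A random UUID is not derived from any hardware or account information.
    client_id = str(uuid.uuid4())
    path.write_text(json.dumps({"client_id": client_id}))
    return client_id


# Usage: the second call returns the ID created by the first.
with tempfile.TemporaryDirectory() as profile_dir:
    state = Path(profile_dir) / "metrics-state.json"  # hypothetical file name
    first = get_or_create_client_id(state)
    second = get_or_create_client_id(state)
```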

Unresolved questions:

Some types of data contain personally identifiable, or potentially confidential, information. Unfortunately, these same types of data may be helpful in tracking down user problems. For example:

Installed Extensions

Many user complaints can be traced to misbehaving extensions. If we keep track of "unexpected" events in the browser, it would be quite useful to know which extensions are installed. However, this may be undesirable, for example if the user is running an unreleased extension that they consider confidential. We may be able to get a sanitized list by looking for extensions which have an update URL pointing to addons.mozilla.org.
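The sanitizing idea could be sketched as follows, assuming each installed extension is available as a record with an ID and an optional update URL. The record layout, field names, and sample extension IDs are made up for this example.

```python
from urllib.parse import urlparse

AMO_HOST = "addons.mozilla.org"


def sanitized_extension_ids(extensions):
    """Return IDs of extensions whose update URL points at addons.mozilla.org."""
    public = []
    for ext in extensions:
        # Compare the parsed host exactly, so a URL that merely contains the
        # string "addons.mozilla.org" somewhere else is not accepted.
        host = urlparse(ext.get("update_url") or "").hostname
        if host == AMO_HOST:
            public.append(ext["id"])
    return public


# Illustrative data only; these extension IDs and URLs are invented.
installed = [
    {"id": "public@example", "update_url": "https://addons.mozilla.org/update"},
    {"id": "internal@example", "update_url": "https://intranet.example/update"},
    {"id": "no-update@example", "update_url": None},
    {"id": "spoof@example", "update_url": "https://addons.mozilla.org.evil.example/u"},
]

reported = sanitized_extension_ids(installed)
```

Extensions without an update URL, or with one hosted elsewhere, are left out of the report entirely rather than being sent in some obscured form.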

Sites Visited

If users are experiencing unexpectedly high memory usage, or other problems, knowing the URLs in question would help us debug those problems. Clearly, sending this data to a third party has privacy implications: GET parameters may contain sensitive information, intranet URLs would be exposed, and some users simply do not want to be tracked as they browse the web. We need to decide whether the benefits are large enough to justify a separate opt-in for collecting this data.