Document Overview

Feature/Product: Telemetry
Projected Feature Freeze Date: (tbd)
Product Champions: Taras Glek
Privacy Champions: Sid Stamm, Asa Dotzler
Security Contact: Curtis Koenig
Document State: [ON TRACK]


Timeline:

Architectural Overview: [DONE] 28-April-2011
Recommendation Meeting: [DONE] 18-May-2011
Wrap-up Meeting: (if necessary)

Architecture

In this section, the product's architecture is described. Any individual components or actors are identified, their "knowledge" or what data they store is identified, and data flow between components and external entities is described.

The main objective of this feature/product is to allow Engineering to receive aggregate data on browser health in the field: for example, cache hit rates, page load times across all browser instances, or anything else we're interested in.

Design Documents: Link to any design or architectural documents here.

Components

Describe any major components in the system and how they interact. Also include any third-party APIs (those Mozilla does not control) and what type of data is sent or received via those APIs.

Client Component (Firefox)

This component gathers metrics and uploads counters and histograms to the Telemetry server.

The tables below simply summarize the data encountered by this component.

Stored Data:

What Where
There are 2 kinds of metrics:

1) Metrics that are recorded while performing an operation, e.g. startup decisions/timing and cycle collection timing.

2) Data that are polled every time the browser is idle for more than a minute. In Taras' experience, data are gathered a couple of times an hour. Currently we poll various about:memory fields. Other things to poll for: number of open tabs, sizes of key SQLite databases, sizes of caches, etc.

All telemetry data is stored in memory. If a telemetry ping never happens, the data is lost on shutdown.
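
As a rough illustration of the two kinds of metrics and the in-memory storage described above, here is a minimal Python sketch. It is not the Firefox implementation; the class name, method names, and probe names (TelemetryStore, CYCLE_COLLECTOR_MS, STARTUP_MS, OPEN_TABS) are hypothetical and chosen only for the example.

```python
import time
from collections import defaultdict


class TelemetryStore:
    """Hypothetical in-memory store; nothing is ever written to disk."""

    def __init__(self):
        self.histograms = defaultdict(list)  # probe name -> recorded samples
        self.polled = {}                     # probe name -> latest polled value

    def record(self, probe, value):
        """Kind 1: record a measurement taken while an operation runs."""
        self.histograms[probe].append(value)

    def timed(self, probe, operation):
        """Run `operation` and record its duration in milliseconds."""
        start = time.monotonic()
        result = operation()
        self.record(probe, (time.monotonic() - start) * 1000.0)
        return result

    def poll(self, pollers):
        """Kind 2: read each polled value, e.g. when the browser is idle."""
        for probe, read_value in pollers.items():
            self.polled[probe] = read_value()


store = TelemetryStore()
store.record("CYCLE_COLLECTOR_MS", 12)
store.timed("STARTUP_MS", lambda: sum(range(1000)))
store.poll({"OPEN_TABS": lambda: 7})
```

Because the store lives only in process memory, dropping the object (as happens on shutdown) discards everything that was not already sent.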

Communication with Server Component

Direction Message Data Notes
In: ACK HTTP 200/OK (no additional data)
Out: HTTP POST to /submit/telemetry/ text/plain JSON-encoded object containing histograms and counters. The types of data represented by histograms and counters will change over time, and the submission will contain a unique ID (a nonce to identify strange duplicate submissions). Pings are sent once per day.
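
The outgoing message could be assembled roughly as follows. This is a sketch under assumptions: the JSON field names ("id", "histograms", "counters") are hypothetical, since this document does not specify the payload layout, and a random UUID stands in for the nonce. Only the path and content type come from the table above.

```python
import json
import uuid


def build_ping(histograms, counters):
    """Serialize one submission; the field names are illustrative only."""
    ping = {
        "id": str(uuid.uuid4()),  # fresh nonce so duplicate submissions can be spotted
        "histograms": histograms,
        "counters": counters,
    }
    return json.dumps(ping, sort_keys=True)


def build_request(body):
    """Describe the HTTP request without sending it."""
    return {
        "method": "POST",
        "path": "/submit/telemetry/",
        "headers": {"Content-Type": "text/plain"},
        "body": body,
    }


body = build_ping({"STARTUP_MS": [3, 5]}, {"OPEN_TABS": 7})
request = build_request(body)
```

Generating the ID at submission time, rather than storing a long-lived one, keeps the nonce from doubling as a persistent user identifier.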

Server Component

This component receives metrics from the Client Component and creates visualizations and queries for Mozilla people.

The tables below simply summarize the data encountered by this component.

Stored Data:

What Where
data type Entire ping is stored in a JSON log. The client IP is added to the log. See security review in bug 655746.
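
A minimal sketch of what one log entry might look like, assuming one JSON object per line. The function name and the "client_ip" and "ping" keys are hypothetical; the actual server log format is not described here.

```python
import json


def format_log_entry(ping_body, client_ip):
    """Wrap the entire received ping, plus the client IP, into one log line."""
    entry = {
        "client_ip": client_ip,         # added by the server, not sent by the client
        "ping": json.loads(ping_body),  # the ping is stored whole, not summarized
    }
    return json.dumps(entry, sort_keys=True)


# 192.0.2.1 is a reserved documentation address, used here as a placeholder.
line = format_log_entry('{"id": "abc", "counters": {}}', "192.0.2.1")
```

The sketch makes the privacy-relevant point visible: the stored record ties each full ping to an IP address, which is why the security review matters.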

Communication with Client Component

Direction Message Data Notes
In: HTTP POST Telemetry Data (see above)
Out: ACK HTTP 200/OK (no additional data)

User Data Risk Minimization

In this section, the privacy champion will identify areas of user data risk and recommendations for minimizing the risk.

Fingerprinting

Based on metrics that are similar from day to day, an individual user might be fingerprinted and tracked across time. Someone with consistent day-to-day browsing habits may have the same memory usage, speed, etc. The machine's attributes are also likely to affect the measurements taken, so a combination of browsing habits and machine attributes could form a fairly detailed "fingerprint".

Required Action: To minimize fingerprinting risk, it is crucial to ensure that arbitrary web sites absolutely cannot access the telemetry data while it's stored on the client. Additionally, the data should be transmitted from the Client Component to the Server Component over a secured (and preferably authenticated) channel; this means SSL/HTTPS must be used.

Recommendation: If possible, the SSL certificate fingerprint should be hard coded into the client and verified before transmitting data so the client can be sure the server where it is sending data is indeed the Telemetry server (and not an attacker intercepting traffic).

Resolution:
[RESOLVED] Required action completed: SSL is used on the server, and invalid certificates cause the connection to drop. Recommended fingerprint hard-coding not implemented.
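
For reference, the recommended (but not implemented) fingerprint check could look roughly like the sketch below. It assumes a SHA-256 fingerprint over the DER-encoded certificate; this document does not say which hash would be used, and the function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded certificate (hash choice is an assumption)."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, pinned_hex: str) -> bool:
    """True only if the certificate matches the hard-coded fingerprint."""
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(fingerprint(der_cert), pinned_hex.lower())


# Placeholder bytes stand in for a real certificate in this example.
example_cert = b"not a real certificate"
pinned = fingerprint(example_cert)
```

In a Python client, the DER bytes would come from something like `ssl.SSLSocket.getpeercert(binary_form=True)`, and the ping would be sent only if `is_pinned` returns True. The trade-off is that a pinned fingerprint must be updated in the client whenever the server certificate changes.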

Conformity to Private Browsing Mode

Private browsing is intended to prevent someone who has local access to the browser from knowing what you did in private browsing mode. Since Telemetry collects data that is ultimately affected by how the user browses the web, any data collected in private browsing mode should not be retained persistently.

Recommendations: Telemetry should be disabled in private browsing mode. If nothing else, the measurements must not be stored on disk or other non-volatile storage devices while the client is in private browsing mode. Any measurements taken during private mode should be erased from memory when private mode is exited.

Resolution:
[RESOLVED] Telemetry data always kept only in volatile memory. Recommended erasure of measurements when exiting private mode not implemented.
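
The recommended erasure on exit, which the resolution notes was not implemented, might be sketched as follows. The class and method names are hypothetical, and the approach shown (keeping private-mode samples in a separate buffer) is only one possible design.

```python
class PrivateAwareStore:
    """Hypothetical in-memory store that forgets private-mode samples on exit."""

    def __init__(self):
        self.samples = []           # (probe, value) pairs from normal browsing
        self._private_samples = []  # kept apart so they can be erased as a unit
        self.private_mode = False

    def record(self, probe, value):
        target = self._private_samples if self.private_mode else self.samples
        target.append((probe, value))

    def enter_private_mode(self):
        self.private_mode = True

    def exit_private_mode(self):
        self._private_samples.clear()  # erase everything measured in private mode
        self.private_mode = False

    def pending(self):
        """Samples eligible for the next ping; never includes private-mode data."""
        return list(self.samples)


store = PrivateAwareStore()
store.record("PAGE_LOAD_MS", 120)
store.enter_private_mode()
store.record("PAGE_LOAD_MS", 950)
store.exit_private_mode()
store.record("PAGE_LOAD_MS", 130)
```

Disabling recording entirely while `private_mode` is set would satisfy the stronger first recommendation; the separate buffer is shown only to illustrate the erasure step.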

Alignment with Privacy Operating Principles

In this section, the privacy champion will identify how the feature lines up with Mozilla's privacy operating principles.

See Also: Privacy/Roadmap_2011#Operating_Principles:

Principle: Transparency / No Surprises: People should know that the metrics are being gathered and submitted. This feature is opt-in, though it's not clear whether or not people fully understand what type of data they're letting us collect.

Required Action: It should be clear in the UI what we collect and how we collect it. For example, Test Pilot asks the user to approve not only data collection but also data submission, providing information about what's being collected or submitted at the time it begins. Telemetry should do something similar to make it very explicit what is being collected as well as when it's being submitted.


Principle: Real Choice: Users of this system should not only understand what it does, but be able to choose whether or not to participate.

Required Action: It should be clear in the UI what we collect and how we collect it.

Taras says: The opt-in UI will link to the privacy policy and eventually to a fancy about:telemetry page which will list all of the probes being collected. https://bugzilla.mozilla.org/show_bug.cgi?id=652657


Principle: Sensible Defaults: Telemetry is off by default and is opt-in.

Recommendations: None.


Principle: Limited Data: Telemetry should only collect data that we will actually use for improvements to the product. All data that's collected should be backed up with clearly stated reasons.

Recommendations: Maintain a table of the counters and histograms that are gathered and the reason for collecting each one. The table should be revisited regularly to identify unnecessary metrics, which we should then stop collecting.
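
One way to enforce such a table is to refuse any probe that lacks a stated reason. The sketch below is illustrative only; the registry class, its methods, and the probe names are hypothetical.

```python
class ProbeRegistry:
    """Hypothetical table of collected probes, each with a stated reason."""

    def __init__(self):
        self._probes = {}

    def register(self, name, kind, reason):
        if kind not in ("counter", "histogram"):
            raise ValueError(f"unknown probe kind: {kind!r}")
        if not reason.strip():
            raise ValueError(f"probe {name!r} needs a reason for collection")
        self._probes[name] = {"kind": kind, "reason": reason}

    def retire(self, name):
        """Stop collecting a probe found unnecessary during a regular review."""
        self._probes.pop(name, None)

    def is_collected(self, name):
        return name in self._probes

    def table(self):
        """Rows of (name, kind, reason), sorted by probe name."""
        return [(n, p["kind"], p["reason"]) for n, p in sorted(self._probes.items())]


registry = ProbeRegistry()
registry.register("CACHE_HIT_RATE", "histogram", "Tune cache size defaults")
registry.register("OPEN_TABS", "counter", "Understand memory pressure")
registry.retire("OPEN_TABS")
```

The same table could feed a user-facing listing of probes, so the documented reasons and the collected data stay in sync.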

Resolution:
[AT RISK] Not Resolved. (UI has not been discussed or identified)

Follow-up Tasks and tracking

What Who Bug Details
[DONE] Initial Overview Discussion Sid and Taras Meeting 28-April-2011
[DONE] Discuss Missing Information Sid and Taras Via IRC 17-May-2011
[ON TRACK] Discuss risks and recommendations Sid and Taras Scheduled 18-May-2011