Privacy/Reviews/Telemetry

From MozillaWiki

Document Overview

Feature/Product: Telemetry
Projected Feature Freeze Date: On train for 7
Product Champions: Taras Glek
Privacy Champions: Sid Stamm, Asa Dotzler
Security Contact: Curtis Koenig
Document State: [DONE]


Timeline:

Architectural Overview: [DONE] 28-April-2011
Recommendation Meeting: [DONE] 18-May-2011
Wrap-up Meeting: (if necessary)

Architecture

In this section, the product's architecture is described. Any individual components or actors are identified, their "knowledge" or what data they store is identified, and data flow between components and external entities is described.

The main objective of this feature/product is to allow Engineering to receive aggregate data about browser health in the field: cache hit rates, page load times across all browser instances, or anything else we're interested in.

Design Documents: Link to any design or architectural documents here.

This document does not discuss individual measurements collected by the Telemetry infrastructure, but rather just the framework and feature scope itself. For discussion of the specific measurements, see the measurements list.

Components

Describe any major components in the system and how they interact. Also include any third-party APIs (those Mozilla does not control) and what type of data is sent or received via those APIs.

Client Component (Firefox)

This component gathers metrics and uploads counters and histograms to the Telemetry server.

The tables below simply summarize the data encountered by this component.

Stored Data:

What: There are two kinds of metrics:

1) Metrics that are recorded while performing an operation, e.g., startup decisions/timing and cycle collection timing.

2) Data that are polled every time the browser is idle for more than a minute. In Taras' experience, data are gathered a couple of times an hour. Currently we poll various about:memory fields. Other things to poll for: number of open tabs, sizes of key SQLite databases, sizes of cache, etc.

Where: All telemetry data is stored in memory. Upon shutdown, some measurements are recorded and persisted to disk for a short period of time. When Firefox starts again and a ping is sent, persisted measurements are transmitted and then erased.
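The two kinds of metrics can be illustrated with a small in-memory sketch. This is not the Firefox implementation: the class names, bucket boundaries, and probe names below are all assumptions made for illustration. Only the split between "recorded during an operation" and "polled after more than a minute of idle time" comes from the description above.

```python
import bisect


class Histogram:
    """Fixed-bucket histogram: counts[i] holds samples at or above bounds[i]
    and below the next bound."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)  # lower edge of each bucket
        self.counts = [0] * len(self.bounds)

    def accumulate(self, value):
        # Last bucket whose lower edge is <= value; values below the first
        # edge are clamped into bucket 0.
        index = max(bisect.bisect_right(self.bounds, value) - 1, 0)
        self.counts[index] += 1


class TelemetrySession:
    """Hypothetical in-memory store fed by the two kinds of metrics."""

    IDLE_THRESHOLD_SECONDS = 60  # "idle for more than a minute"

    def __init__(self):
        self.histograms = {}

    def _histogram(self, name, bounds):
        if name not in self.histograms:
            self.histograms[name] = Histogram(bounds)
        return self.histograms[name]

    def record_timing(self, name, started_at, finished_at):
        # Kind 1: recorded while performing an operation (e.g. startup timing).
        elapsed_ms = (finished_at - started_at) * 1000.0
        self._histogram(name, [0, 10, 100, 1000, 10000]).accumulate(elapsed_ms)

    def on_idle(self, idle_seconds, pollers):
        # Kind 2: polled only once the browser has been idle long enough.
        if idle_seconds <= self.IDLE_THRESHOLD_SECONDS:
            return False
        for name, (poll, bounds) in pollers.items():
            self._histogram(name, bounds).accumulate(poll())
        return True
```

A caller would pass `pollers` as a mapping from probe name to a `(callable, bucket_bounds)` pair, for example `{"OPEN_TABS": (count_open_tabs, [0, 5, 10, 50])}`, where `count_open_tabs` is whatever function reports the value being polled.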

Communication with Server Component

Direction | Message | Data | Notes
In: | ACK | HTTP 200/OK (no additional data) |
Out: | HTTP POST to /submit/telemetry/ | text/plain JSON-encoded object containing histograms and counters | The types of data represented by histograms and counters will change over time. Each submission contains a unique ID (a nonce used to identify strange duplicate submissions). Pings are sent once per day.
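As a rough illustration of the outgoing message, the sketch below builds a JSON body and prepares an HTTP POST for it without sending anything. The `/submit/telemetry/` path and the `text/plain` content type come from the table above. The field names (`id`, `histograms`, `counters`) and the server origin used in the usage note are assumptions, since this document does not specify the payload layout.

```python
import json
import urllib.request
import uuid

SUBMIT_PATH = "/submit/telemetry/"  # path taken from the table above


def build_ping(histograms, counters, ping_id=None):
    """Return the JSON text for one submission (field names are hypothetical)."""
    payload = {
        # Nonce used only to spot duplicate submissions.
        "id": ping_id or str(uuid.uuid4()),
        "histograms": histograms,
        "counters": counters,
    }
    return json.dumps(payload, sort_keys=True)


def build_request(server_origin, body):
    """Prepare, but do not send, the HTTP POST carrying the ping."""
    return urllib.request.Request(
        url=server_origin.rstrip("/") + SUBMIT_PATH,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would perform the submission; a 200 status on the response corresponds to the ACK row. An origin such as `https://telemetry.example` is a placeholder, not the real server.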

Server Component

This component receives metrics from the Client Component and creates visualizations and queries for Mozilla people.

The tables below simply summarize the data encountered by this component.

Stored Data:

What: data type
Where: The entire ping is stored in a JSON log. The client IP is added to the log. See security review in bug 655746.

Communication with Client Component

Direction | Message | Data | Notes
In: | HTTP POST | Telemetry Data (see above) |
Out: | ACK | HTTP 200/OK (no additional data) |
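The server-side handling described in this section (store the whole ping in a JSON log, add the client IP, answer with 200) might look roughly like the following. The function name, the `id` field, and the in-memory duplicate check are assumptions. This document does not say how the real server detects duplicate submissions, only that the submission ID exists for that purpose.

```python
import json


def handle_submission(body, client_ip, log_lines, seen_ids):
    """Append one ping to the JSON log and return the HTTP status to send back."""
    ping = json.loads(body)
    ping_id = ping.get("id")  # hypothetical field carrying the submission nonce
    if ping_id is not None and ping_id in seen_ids:
        # Duplicate submission: acknowledge it, but do not log it twice.
        return 200
    if ping_id is not None:
        seen_ids.add(ping_id)
    # The entire ping is stored, with the client IP added to the log entry.
    entry = {"client_ip": client_ip, "ping": ping}
    log_lines.append(json.dumps(entry, sort_keys=True))
    return 200
```

Here `log_lines` is a plain list standing in for the log file and `seen_ids` is a set of recently seen IDs; a real deployment would bound how long those IDs are remembered.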

User Data Risk Minimization

In this section, the privacy champion will identify areas of user data risk and recommendations for minimizing the risk.

Fingerprinting / Tracking

Based on metrics that are similar from day to day, an individual user might be fingerprinted and tracked across time. Someone with consistent day-to-day browsing habits may have the same memory usage, speed, etc. The machine's attributes are also likely to affect the measurements taken, so a combination of browsing habits and machine attributes could form a fairly detailed "fingerprint". It is important to identify and eliminate duplicate entries, however, so some unique ID must be maintained for a short window of time.

Required Action: To minimize fingerprinting risk, it is crucial to ensure that arbitrary web sites absolutely cannot access the telemetry data while it's stored on the client. Additionally, the data should be transmitted from the Client Component to the Server Component over a secured (and preferably authenticated) channel; this means SSL/HTTPS must be used. Any data that is no longer needed should be erased from our servers, and a unique ID used for duplicate elimination should be short-lived.

Recommendation: If possible, the SSL certificate fingerprint should be hard coded into the client and verified before transmitting data so the client can be sure the server where it is sending data is indeed the Telemetry server (and not an attacker intercepting traffic).

Resolution:
[RESOLVED] Required action completed: SSL is used on the server, and invalid certificates cause the connection to drop. The recommended fingerprint hard-coding was not implemented. The duplicate-removal unique ID is session-only and is reset for each new session.
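For reference, the fingerprint check that was recommended (and, per the resolution, not implemented) amounts to hashing the certificate the server presents and comparing it with a value shipped in the client. The sketch below shows that comparison only. The choice of SHA-256 and the function names are assumptions, not something this review specifies.

```python
import hashlib
import hmac


def fingerprint(der_certificate):
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_certificate).hexdigest()


def is_pinned_server(der_certificate, pinned_fingerprint):
    """True only if the presented certificate matches the hard-coded fingerprint."""
    presented = fingerprint(der_certificate)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented, pinned_fingerprint.lower())
```

In Python, the DER bytes of the peer certificate can be obtained from an established TLS socket with `getpeercert(binary_form=True)`; the client would refuse to transmit telemetry unless `is_pinned_server` returns true.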

Conformity to Private Browsing Mode

Private browsing is intended to prevent someone who has local access to the browser from learning what you did in private browsing mode. Since Telemetry collects data that is ultimately affected by how the user browses the web, any data collected in private browsing mode should not be retained persistently.

Some measurements need to be persisted to disk because they are only available during shutdown (e.g., measuring how long it takes to shut down plugins). Any measurements taken between the last ping and shutdown are persisted to disk upon application shutdown. When sending a ping, the Telemetry code checks for stored data; once that data has been sent successfully, it is erased and the telemetry "state" is reset. From Bug 707320 comment 2:

a) if there is no serialized telemetry data, send a ping same as we do now
b) if there is serialized data:
 b1) send serialized data
 b2) reset UID, wipe all histograms
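The steps above can be sketched as a small state machine. This is an illustration under assumptions, not the code from bug 707320: the class and attribute names are made up, the `persisted` attribute stands in for the on-disk file, and keeping the persisted data after a failed send is one reading of "after successfully sending it the data is erased".

```python
import uuid


class TelemetryState:
    """In-memory telemetry state plus an optional copy persisted at shutdown."""

    def __init__(self):
        self.uid = str(uuid.uuid4())
        self.histograms = {}
        self.persisted = None  # stands in for the on-disk file (hypothetical)

    def shutdown(self):
        # Measurements taken since the last ping are persisted at shutdown.
        self.persisted = {"id": self.uid, "histograms": dict(self.histograms)}

    def ping(self, send):
        """Follow steps a/b; `send` returns True on a successful submission."""
        if self.persisted is None:
            # a) nothing serialized: send the current in-memory data
            return send({"id": self.uid, "histograms": dict(self.histograms)})
        # b1) send the serialized data
        if not send(self.persisted):
            return False  # keep the data so a later ping can retry
        # b2) reset the UID, wipe all histograms, erase the persisted copy
        self.persisted = None
        self.uid = str(uuid.uuid4())
        self.histograms = {}
        return True
```

Resetting the UID in step b2 is what keeps the persisted data from tying two sessions together: the next ping carries a fresh identifier.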


Recommendations: Telemetry should be disabled in private browsing mode. If nothing else, new measurements must not be stored on disk or other non-volatile storage devices while the client is in private browsing mode. Any measurements taken during private mode should be erased from memory when private mode is exited.

Resolution:
[RESOLVED] Telemetry data is always kept only in volatile memory or temporarily persisted to disk (see above). Telemetry collection and reporting are entirely suspended when private mode is entered and resumed on private mode exit. See bug 661573.
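The suspend-and-resume behaviour in the resolution can be pictured as a recorder that ignores measurements between the enter and exit notifications. The class and method names here are hypothetical; the actual change is tracked in bug 661573.

```python
class TelemetryRecorder:
    """Drops measurements while private browsing is active (hypothetical API)."""

    def __init__(self):
        self.samples = {}
        self.suspended = False

    def on_enter_private_browsing(self):
        self.suspended = True

    def on_exit_private_browsing(self):
        self.suspended = False

    def record(self, name, value):
        # Nothing is collected while suspended, so nothing from private mode
        # can later be stored or sent.
        if self.suspended:
            return False
        self.samples.setdefault(name, []).append(value)
        return True

    def can_report(self):
        # Reporting is suspended along with collection.
        return not self.suspended
```

Because measurements are never taken in private mode, there is nothing to erase from memory when private mode is exited.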

Alignment with Privacy Operating Principles

In this section, the privacy champion will identify how the feature lines up with Mozilla's privacy operating principles.

See Also: Privacy/Roadmap_2011#Operating_Principles:

Principle: Transparency / No Surprises: People should know that the metrics are being gathered and submitted. This feature is opt-in, though it's not clear whether or not people fully understand what type of data they're letting us collect.

Required Action: It should be clear in the UI what we collect and how we collect it. For example, Test Pilot asks the user to approve not only data collection but also data submission, providing information about what's being collected or submitted at the time it begins. Telemetry should do something similar to make it very explicit what is being collected as well as when it's being submitted.

Resolution:
[RESOLVED] Opt-in string should directly describe types of data collected (link to privacy policy is not adequate). Current string is "Would you like to help improve Firefox by automatically reporting memory usage, performance, and responsiveness to Mozilla?" which seems adequate. Details of specific data collected need to be made available in about:telemetry or listed explicitly in a way accessible from the UI so users can find it.


Principle: Real Choice: Users of this system should not only understand what it does, but be able to choose whether or not to participate.

Taras says: The opt-in UI will link to the privacy policy and eventually to a fancy about:telemetry page which will list all of the probes being collected. https://bugzilla.mozilla.org/show_bug.cgi?id=652657

Required Action: It should be clear in the UI what we collect and how we collect it.

Resolution:
[RESOLVED] Opt-in string should directly describe types of data collected (link to privacy policy is not adequate). Current string is "Would you like to help improve Firefox by automatically reporting memory usage, performance, and responsiveness to Mozilla?" which seems adequate.


Principle: Sensible Defaults: Telemetry is off by default and is opt-in.

Recommendations: None.


Principle: Limited Data: Telemetry should only collect data that we will actually use for improvements to the product. All data that's collected should be backed up with clearly stated reasons.

Required Action: Maintain a table of the counters and histograms that are gathered and the reasons for collecting them. The table should be revisited regularly to identify unnecessary metrics, and we should stop collecting those.

Resolution:
[RESOLVED] The list of collected metrics is maintained in the code and on the wiki for developers; see Privacy/Reviews/Telemetry/Measurements.
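One way to picture the "table of metrics with reasons" requirement is a registry that refuses probes lacking a stated reason and supports retiring probes after a periodic review. This is only a sketch: the `Probe` fields and function names are assumptions and do not reflect how the list is actually kept in the code or on the wiki.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    kind: str    # "histogram" or "counter"
    reason: str  # why the measurement is collected


def register(table, probe):
    """Add a probe to the table, refusing entries without a stated reason."""
    if probe.kind not in ("histogram", "counter"):
        raise ValueError(f"unknown probe kind: {probe.kind!r}")
    if not probe.reason.strip():
        raise ValueError(f"probe {probe.name!r} needs a reason for collection")
    table[probe.name] = probe


def retire(table, unused_names):
    """Drop probes that a review found unnecessary; return those removed."""
    return [table.pop(name) for name in unused_names if name in table]
```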

Follow-up Tasks and tracking

What | Who | Bug | Details
[DONE] Initial Overview Discussion | Sid and Taras | | Meeting 28-April-2011
[DONE] Discuss Missing Information | Sid and Taras | | Via IRC 17-May-2011
[DONE] Discuss risks and recommendations | Sid and Taras | | Scheduled 18-May-2011
[DONE] Create list of gathered metrics that reflects current state of collected data | Taras | bug 661574 | Also see Performance/Telemetry, Telemetry Measurements
[DONE] Change string for opt-in UI to describe what type of data is collected | Taras | bug 652657 |
[DONE] Implement strict private browsing conformity (stop recording when on-enter-private-browsing). | Taras | bug 661573 |
[DONE] Create privacy discussion framework for adding new metrics without heavyweight review | Sid and Taras | | Rapid risk analysis template at Privacy/Reviews/Telemetry/Measurements, most review done in bugs and cataloged in the Measurements list.
[DONE] Document change to persist some Telemetry measurements to disk | Sid | bug 707320 | Some stuff has to be persisted to disk; it will be done in a way that prohibits tying together sessions easily and the stored data will be deleted from the client once it has been transmitted to the Telemetry server.
[DONE] Implement about:telemetry to show users what's being collected. | Taras | bug 661881 | Landed in Firefox 19.