Document Overview
| Feature/Product: | Telemetry |
| Projected Feature Freeze Date: | (tbd) |
| Product Champions: | Taras Glek |
| Privacy Champions: | Sid Stamm, Asa Dotzler |
| Security Contact: | Curtis Koenig |
| Document State: | [ON TRACK] |
Timeline:
| Architectural Overview: | [DONE] 28-April-2011 |
| Recommendation Meeting: | [DONE] 18-May-2011 |
| Wrap-up Meeting: | (if necessary) |
Architecture
In this section, the product's architecture is described. Any individual components or actors are identified, their "knowledge" or what data they store is identified, and data flow between components and external entities is described.
The main objective of this feature/product is to allow Engineering to receive aggregate data on browser health in the field: for example, cache hit rates, page load times across all browser instances, or anything else we are interested in.
Design Documents: Link to any design or architectural documents here.
- UI components bug
- EtherPad Design Doc
- Feature Page
- bug 585196: Infrastructure Implementation Bug for Telemetry
- Chromium code similar to our plan
Components
Describe any major components in the system and how they interact. Also include any third-party APIs (those Mozilla does not control) and what type of data is sent or received via those APIs.
Client Component (Firefox)
This component gathers metrics and uploads counters and histograms to the Telemetry server.
The tables below simply summarize the data encountered by this component.
Stored Data:
| What | Where |
|---|---|
| There are 2 kinds of metrics: 1) Metrics that are recorded while performing an operation, e.g. startup decisions/timing and cycle collection timing. 2) Data that are polled every time the browser has been idle for more than a minute. In Taras' experience, data are gathered a couple of times an hour. Currently we poll various about:memory fields. Other things to poll for: number of open tabs, sizes of key sqlite databases, sizes of cache, etc. | All telemetry data is stored in memory. If a telemetry ping never happens, the data is lost on shutdown. |
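As an illustration of metrics held only in memory, the following is a minimal sketch of a bucketed histogram. The class name, bucket bounds, and sample values are assumptions made for the example and are not taken from the Telemetry implementation.

```python
from bisect import bisect_right


class Histogram:
    """In-memory histogram with fixed bucket lower bounds (hypothetical layout)."""

    def __init__(self, lower_bounds):
        # lower_bounds must be sorted ascending.
        self.lower_bounds = list(lower_bounds)
        self.counts = [0] * len(self.lower_bounds)

    def add(self, value):
        # Pick the last bucket whose lower bound is <= value;
        # values below the first bound are clamped into bucket 0.
        index = max(bisect_right(self.lower_bounds, value) - 1, 0)
        self.counts[index] += 1


# Example: cycle collection timings in milliseconds (made-up bounds and samples).
cc_time_ms = Histogram([0, 10, 50, 100, 500])
for sample in (3, 12, 49, 75, 900):
    cc_time_ms.add(sample)
```

Because nothing here touches disk, the counts disappear when the process exits, which matches the "lost on shutdown" behaviour described in the table.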
Communication with Server Component
| Direction | Message | Data | Notes |
|---|---|---|---|
| In: | ACK | HTTP 200/OK (no additional data) | |
| Out: | HTTP POST to /submit/telemetry/ | text/plain JSON-encoded object containing histograms and counters | The types of data represented by histograms and counters will change over time. Each submission contains a unique ID (a nonce used to identify strange duplicate submissions). Pings are sent once per day. |
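The outgoing message in the table above could be assembled roughly as follows. This is a sketch only: the field names `id`, `histograms`, and `counters`, and the sample probe names, are assumptions, since the actual payload layout is not specified here.

```python
import json
import uuid


def build_ping(histograms, counters):
    """Return (body, headers) for one Telemetry submission (hypothetical layout)."""
    payload = {
        # Random nonce so the server can spot duplicate submissions.
        "id": str(uuid.uuid4()),
        "histograms": histograms,
        "counters": counters,
    }
    body = json.dumps(payload).encode("utf-8")
    # The table specifies a text/plain body carrying JSON.
    headers = {"Content-Type": "text/plain"}
    return body, headers


body, headers = build_ping({"CC_TIME_MS": [1, 2, 1, 0, 1]}, {"open_tabs": 7})
```

The body would then be sent with an HTTP POST to /submit/telemetry/ over HTTPS, and the client would expect a bare 200/OK in response.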
Server Component
This component receives metrics from the Client Component and creates visualizations and queries for Mozilla people.
The tables below simply summarize the data encountered by this component.
Stored Data:
| What | Where |
|---|---|
| data type | The entire ping is stored in a JSON log. The client IP is added to the log. See security review in bug 655746. |
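A minimal sketch of the server-side storage described above, assuming one JSON record per log line. The function name, the record keys, and the `id` field used for duplicate detection are hypothetical; the real server may structure its log differently.

```python
import io
import json


def log_ping(log_file, raw_body, client_ip, seen_ids):
    """Append one ping as a JSON line; return False if its ID was seen before."""
    ping = json.loads(raw_body)
    ping_id = ping.get("id")  # hypothetical name for the submission nonce
    duplicate = ping_id in seen_ids
    seen_ids.add(ping_id)
    # Store the whole ping and attach the client IP, as the table describes.
    record = {"client_ip": client_ip, "duplicate": duplicate, "ping": ping}
    log_file.write(json.dumps(record) + "\n")
    return not duplicate


# Usage with an in-memory file and a documentation-range IP address.
log = io.StringIO()
seen = set()
first = log_ping(log, b'{"id": "a", "counters": {}}', "192.0.2.1", seen)
second = log_ping(log, b'{"id": "a", "counters": {}}', "192.0.2.1", seen)
```

Flagging duplicates rather than dropping them keeps the "strange" submissions available for later inspection.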
Communication with Client Component
| Direction | Message | Data | Notes |
|---|---|---|---|
| In: | HTTP POST | Telemetry Data | (see above) |
| Out: | ACK | HTTP 200/OK | (no additional data) |
User Data Risk Minimization
In this section, the privacy champion will identify areas of user data risk and recommendations for minimizing the risk.
Fingerprinting
Based on metrics that are similar from day to day, an individual user might be fingerprinted and tracked across time. Someone with consistent day-to-day browsing habits may have the same memory usage, speed, etc. The machine's attributes will likely also affect the measurements taken, so a combination of browsing habits and machine attributes could form a fairly detailed "fingerprint".
Required Action: To minimize fingerprinting risk, it is crucial to ensure that arbitrary web sites absolutely cannot access the telemetry data while it's stored on the client. Additionally, the data should be transmitted from the Client Component to the Server Component over a secured (and preferably authenticated) channel; this means SSL/HTTPS must be used.
Recommendation: If possible, the SSL certificate fingerprint should be hard coded into the client and verified before transmitting data so the client can be sure the server where it is sending data is indeed the Telemetry server (and not an attacker intercepting traffic).
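The fingerprint check recommended above might look like the following sketch. It assumes a SHA-256 fingerprint over the DER-encoded certificate; the function name and the choice of hash are assumptions, not part of the recommendation.

```python
import hashlib
import hmac


def certificate_matches_pin(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Compare a certificate's SHA-256 fingerprint with a hard-coded pin."""
    actual = hashlib.sha256(der_cert).hexdigest()
    # Accept pins written as "AB:CD:..." or "abcd...".
    expected = pinned_sha256_hex.replace(":", "").lower()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected)


# Stand-in bytes for the example; a real client could obtain the DER bytes
# from ssl.SSLSocket.getpeercert(binary_form=True) after the TLS handshake.
fake_cert = b"not a real certificate"
pin = hashlib.sha256(fake_cert).hexdigest()
```

A client following the recommendation would refuse to transmit the ping whenever this check returns False.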
Conformity to Private Browsing Mode
Private browsing is intended to prevent someone who has local access to the browser from learning what you did in private browsing mode. Since Telemetry collects data that is ultimately affected by how the user browses the web, any data collected should not be retained persistently through private browsing mode.
Recommendations: Telemetry should be disabled in private browsing mode. If nothing else, the measurements must not be stored on disk or other non-volatile storage devices while the client is in private browsing mode. Any measurements taken during private mode should be erased from memory when private mode is exited.
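One way to read the fallback recommendation (keep private-mode measurements in memory only and erase them on exit) is sketched below. The class and method names are hypothetical. Simply ignoring `record` calls while in private mode would implement the stronger "disabled" recommendation instead.

```python
class TelemetryStore:
    """In-memory store that never retains private-mode measurements."""

    def __init__(self):
        self.measurements = []      # normal-mode data, eligible for the next ping
        self._private_buffer = []   # private-mode data, never submitted
        self.private_mode = False

    def record(self, name, value):
        target = self._private_buffer if self.private_mode else self.measurements
        target.append((name, value))

    def enter_private_mode(self):
        self.private_mode = True

    def exit_private_mode(self):
        # Erase everything measured during the private session.
        self._private_buffer.clear()
        self.private_mode = False


store = TelemetryStore()
store.record("open_tabs", 3)
store.enter_private_mode()
store.record("open_tabs", 9)
store.exit_private_mode()
```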
Alignment with Privacy Operating Principles
In this section, the privacy champion will identify how the feature lines up with Mozilla's privacy operating principles.
See Also: Privacy/Roadmap_2011#Operating_Principles:
Principle: Transparency / No Surprises: People should know that the metrics are being gathered and submitted. This feature is opt-in, though it's not clear whether or not people fully understand what type of data they're letting us collect.
Required Action: It should be clear in the UI what we collect and how we collect it. For example, Test Pilot asks the user to approve not only data collection but also data submission, providing information about what's being collected or submitted at the time it begins. Telemetry should do something similar to make it very explicit what is being collected as well as when it's being submitted.
Principle: Real Choice:
Users of this system should not only understand what it does, but be able to choose whether or not to participate.
Required Action: It should be clear in the UI what we collect and how we collect it.
Taras says: The opt-in UI will link to the privacy policy and eventually to a fancy about:telemetry page which will list all of the probes being collected. https://bugzilla.mozilla.org/show_bug.cgi?id=652657
Principle: Sensible Defaults:
Telemetry is off by default and is opt-in.
Recommendations: None.
Principle: Limited Data:
Telemetry should only collect data that we will actually use for improvements to the product. All data that's collected should be backed up with clearly stated reasons.
Recommendations: Maintain a table of the counters and histograms that are gathered, with the reason for collecting each one. The table should be revisited regularly to identify unnecessary metrics, and we should stop collecting those.
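The table of metrics and reasons could also be enforced in code, so that a probe without a stated reason cannot be registered. The sketch below is an assumption about how that might be done; the probe names and reasons are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    kind: str    # "counter" or "histogram"
    reason: str  # why this data is needed


class ProbeRegistry:
    def __init__(self):
        self._probes = {}

    def register(self, probe):
        # Refuse probes that do not come with a stated reason.
        if not probe.reason.strip():
            raise ValueError(f"probe {probe.name!r} needs a stated reason")
        self._probes[probe.name] = probe

    def retire(self, name):
        # Called during a periodic review when a metric is no longer needed.
        self._probes.pop(name, None)

    def is_collected(self, name):
        return name in self._probes

    def as_table(self):
        return [(p.name, p.kind, p.reason) for p in self._probes.values()]


registry = ProbeRegistry()
registry.register(Probe("CACHE_HIT_RATE", "histogram", "Tune cache size defaults"))
registry.register(Probe("OPEN_TABS", "counter", "Size memory tests realistically"))
registry.retire("OPEN_TABS")
```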
Follow-up Tasks and tracking
| What | Who | Bug | Details |
|---|---|---|---|
| [DONE] Initial Overview Discussion | Sid and Taras | Meeting 28-April-2011 | |
| [DONE] Discuss Missing Information | Sid and Taras | Via IRC 17-May-2011 | |
| [ON TRACK] Discuss risks and recommendations | Sid and Taras | Scheduled 18-May-2011 | |