DEPRECATED: This proposal has been updated, and the official project name is now "Firefox Health Report". Please see the following links for further discussion:
* [https://groups.google.com/d/topic/mozilla.dev.platform/rOO1HGpAb9Q/discussion Post on dev.platform]
* [https://blog.mozilla.org/metrics/2012/09/21/firefox-health-report/ Firefox Health Report blog post]
* [https://blog.mozilla.org/metrics/firefox-health-report/fhr-faq/ Firefox Health Report FAQ]
= Description =
This project is centered on adding to Firefox the ability to measure adoption, retention, stability, performance, and aggregated search counts by engine. It records possible explanatory dimensions using a statistically unbiased approach. A key feature of the project is enabling users to review, analyze, and, if they wish, remove the data collected about their browser.
'''Note''': The description below is the current proposal from the metrics team. Some employees and community members have raised concerns about potentially serious privacy problems. The metrics team is attempting to keep this page focused on the project status and specific technical implementation details. Discussion, opposing views, and possible alternatives are encouraged on the [[Talk:MetricsDataPing|discussion page]] (also linked at the top of the page). A large portion of the original discussion and views written by [[User:BenB]] is also included [[MetricsDataPing#Privacy|below]], per his request that we not relocate it. If an alternative emerges that meets the requirements of the project while reducing the perceived privacy concerns, this page will be updated to reflect it.
== Requirements ==
*;Enable retention analytics:Mozilla has a critical need to understand the factors that cause installations of Firefox to stop being used. The system must have some way to detect an abandoned installation. The current implementation handles this by generating a document ID for each submission and deleting the previous submission on the server when a new one is posted. With this method, an abandoned installation can be detected from the age of its last submitted document. Retention analysis includes being able to ask and answer questions such as:
** Are abandoned installations typically new or old? Were they created, used once, and then abandoned, or were they used actively for a period of time?
** What were the performance and stability characteristics of abandoned installations? Were they slow or crashy?
** Did the installations have add-ons that potentially affect stability or performance?
** Which operating systems and OS versions were abandoned installations running?
** How frequently were abandoned installations upgraded? Were they running the latest stable release or an old release?
*;Enable reliable unique installation counting:Mozilla currently uses Active Daily Installations (ADI) as a key metric for the health of the product. ADI is calculated from the number of AMO Blocklist requests made each day. However, this number cannot be summed over timespans longer than one day, so we cannot accurately measure how many unique active installations there were over a week or a month. Code was added to the Blocklist feature at the beginning of 2011 to enable this, but a technical limitation of the core Blocklist implementation (the metrics are not conditional on a successful response from the server) rendered this method unreliable. Further changes to the Blocklist system that were not relevant to the Blocklist feature itself were prohibited by the module owners.
*;Consolidate metric gathering and reduce or eliminate piggy-backing metrics on other systems:The metrics team agrees with the reasoning behind avoiding further changes to the Blocklist system. We feel that metrics gathering should have a clear place in the code and give the user reasonable and useful control, without forcing the user to turn off unrelated features to disable it. Eliminating piggy-backing also reduces unnecessary complexity in unrelated systems.
*;Ensure metrics are based on a statistically representative and unbiased majority of Firefox users:Any opt-in system is subject to self-selection bias. Controlling for this bias, and performing analysis aimed at optimizing for the majority of users, is extremely prone to failure unless there is some unbiased source of data to use as a control. Having MDP as this unbiased source will allow us to properly control for bias in other systems such as Test Pilot or Telemetry (without any linking of the datasets).
*;Provide end users who desire it with the ability to review the data being submitted and perform their own analysis locally:Tying this data to a concrete feature of Firefox through the about:metrics page is useful to the user, and it also enables us to operate the currently proposed implementation in compliance with various regulatory bodies. There are many questions we would like to enable the user or Mozilla to answer, but the initial implementation of this project was restricted to metrics that are already available through other systems such as Blocklist (with the exception of search counts). We are developing the about:metrics interface to allow the user to answer the following types of questions:
** What data is being collected about my installation by the MDP feature?
** How much am I using this browser?
** Has performance or stability improved since I installed this latest version?
** Has adding or removing specific add-ons caused a change in the browser's performance or stability?
*;Provide end users with the ability to remove the data collected about their installation from our servers:The goal is to demonstrate collecting metrics in a way that is transparent to users and gives them ownership and control. The current implementation requires some form of document ID so that the user can instruct the service to remove the data associated with their installation.
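The retention check described under the first requirement (an installation counts as abandoned once its last submitted document is old enough) can be sketched as follows. This is a minimal illustration, not the metrics team's implementation: the function name <code>abandoned_installations</code> and the 30-day cutoff are hypothetical, since this page does not say how old a document must be before its installation is considered abandoned.

```python
from datetime import date, timedelta

# Hypothetical cutoff; this page does not define one.
ABANDONMENT_THRESHOLD = timedelta(days=30)


def abandoned_installations(latest_submission: dict[str, date], today: date,
                            threshold: timedelta = ABANDONMENT_THRESHOLD) -> list[str]:
    """Return the document IDs whose last submission is older than `threshold`.

    `latest_submission` maps each live document ID to the date it was received.
    Because the server deletes the previous document whenever a new one arrives,
    each installation has at most one live document in this mapping.
    """
    return sorted(doc_id for doc_id, received in latest_submission.items()
                  if today - received > threshold)


# Usage: two of these three installations have not reported for over 30 days.
docs = {"a": date(2012, 1, 30), "b": date(2011, 12, 1), "c": date(2011, 11, 15)}
print(abandoned_installations(docs, today=date(2012, 2, 2)))  # ['b', 'c']
```

The same ages, joined with the dimensions in the last document, are what the retention questions above would be asked against.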
= Data Elements =
A directory of elements collected by the various data collection pings (Metrics Data Collection Ping, Blocklist, AUS Ping, Version Check Ping, Services AMO, Telemetry) can be found here: [https://metrics.etherpad.mozilla.org/ep/pad/view/ro.9e6LG/latest Data Collection Paths]<br>
<br>
The list and definitions of the data elements in the Metrics Ping can be found here: [https://docs.google.com/spreadsheet/ccc?key=0AtdL1GrYQUbldFBBUUNkbTBKNjZTd3dTeTZ0QUhaNXc MDP Data Point Descriptions]
== Submission ID ==
Sample JSON output as received on the Mozilla server side (format updated 2012/02/01):<br>
<pre>
{
  "ver": 2,
  "lastPingTime": "2012-01-31T16:57:26.000Z",
  "thisPingTime": "2012-02-02T14:18:30.507Z",
  "env": {
    "reason": "startup",
    "OS": "Linux",
    "appID": "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}",
    "appVersion": "12.0a1",
    "appVendor": "Mozilla",
    "appName": "Firefox",
    "appBuildID": "20120202101451",
    "appABI": "x86_64-gcc3",
    "appUpdateChannel": "default",
    "appDistribution": "default",
    "appDistributionVersion": "default",
    "appHotfixVersion": "",
    "platformBuildID": "20120126141109",
    "platformVersion": "12.0a1",
    "locale": "en-US",
    "name": "Linux",
    "version": "3.0.0-15-generic",
    "cpucount": 4,
    "memsize": 7889,
    "arch": "x86-64"
  },
  "addons": [
    {
      "id": "crashme@ted.mielczarek.org",
      "userDisabled": false,
      "appDisabled": false,
      "version": "0.3",
      "installDate": "2011-10-25",
      "updateDate": "2011-10-25",
      "type": "extension",
      "hasBinaryComponents": false
    },
    {
      "id": "ping.telemetry@mozilla.com",
      "userDisabled": false,
      "appDisabled": false,
      "version": "0.5",
      "installDate": "2011-11-16",
      "updateDate": "2011-12-20",
      "type": "extension",
      "hasBinaryComponents": false
    },
    {
      "id": "about.blank@mozilla.com",
      "userDisabled": false,
      "appDisabled": false,
      "version": "0.5",
      "installDate": "2012-01-27",
      "updateDate": "2012-02-01",
      "type": "extension",
      "hasBinaryComponents": false
    },
    {
      "id": "{e2c52c1c-5ee1-cc23-15fa-35945fd58806}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "1.0.0.0",
      "installDate": "2012-01-26",
      "updateDate": "2012-01-26",
      "type": "plugin"
    },
    {
      "id": "{18965679-bddd-de62-52b4-b56e6316d854}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "",
      "installDate": "2011-12-13",
      "updateDate": "2011-12-13",
      "type": "plugin"
    },
    {
      "id": "{5c0830c7-3003-fc43-0daf-d29b579f5f6b}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "",
      "installDate": "2011-12-13",
      "updateDate": "2011-12-13",
      "type": "plugin"
    },
    {
      "id": "{ed5c33eb-95c1-f1be-50ba-eb0ade42d912}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "",
      "installDate": "2011-12-02",
      "updateDate": "2011-12-02",
      "type": "plugin"
    },
    {
      "id": "{79eb71d7-19b2-ef97-2247-9a8960804972}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "",
      "installDate": "2011-11-08",
      "updateDate": "2011-11-08",
      "type": "plugin"
    },
    {
      "id": "{f2d261dc-c5c4-ca3c-ae02-ccb3ff227c7f}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "",
      "installDate": "2011-10-18",
      "updateDate": "2011-10-18",
      "type": "plugin"
    },
    {
      "id": "{84e372b2-f1a3-032a-0001-b725ee38d1ed}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "",
      "installDate": "2011-10-18",
      "updateDate": "2011-10-18",
      "type": "plugin"
    },
    {
      "id": "{fcdc99b5-45d4-daeb-9239-0a41e6c9b7ce}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "",
      "installDate": "2011-10-18",
      "updateDate": "2011-10-18",
      "type": "plugin"
    },
    {
      "id": "{24f3e033-1a7c-ae8b-3fc7-4ac494c18e91}",
      "userDisabled": false,
      "appDisabled": false,
      "version": "",
      "installDate": "2011-10-18",
      "updateDate": "2011-10-18",
      "type": "plugin"
    }
  ],
  "currentSessionTime": 60,
  "currentSessionActiveTime": 50,
  "dataPoints": {
    "2012-02-02": {
      "search": {
        "searchbar": {
          "Google": 1
        },
        "abouthome": {
          "Google": 1
        }
      },
      "sessions": {
        "completedSessions": 1,
        "completedSessionTime": 567,
        "completedSessionActiveTime": 115
      },
      "simpleMeasurements": {
        "uptime": 2,
        "main": 184,
        "firstPaint": 1039,
        "sessionRestored": 903,
        "isDefaultBrowser": false,
        "crashCountSubmitted": 0,
        "profileAge": 121,
        "placesPagesCount": 508,
        "placesBookmarksCount": 77,
        "addonCount": 14,
        "version": "12.0a1"
      }
    },
    "2012-02-01": {
      "simpleMeasurements": {
        "uptime": 2,
        "main": 17,
        "firstPaint": 582,
        "sessionRestored": 468,
        "isDefaultBrowser": false,
        "crashCountSubmitted": 0,
        "profileAge": 120,
        "placesPagesCount": 456,
        "placesBookmarksCount": 76,
        "addonCount": 14,
        "version": "12.0a1"
      },
      "sessions": {
        "completedSessions": 10,
        "completedSessionTime": 5852,
        "completedSessionActiveTime": 780,
        "abortedSessions": 1,
        "abortedSessionTime": 222,
        "abortedSessionActiveTime": 55
      }
    },
    "2012-01-31": {
      "search": {
        "abouthome": {
          "Google": 1
        }
      },
      "sessions": {
        "completedSessions": 3,
        "completedSessionTime": 73096,
        "completedSessionActiveTime": 310
      },
      "simpleMeasurements": {
        "uptime": 2,
        "main": 10,
        "firstPaint": 566,
        "sessionRestored": 446,
        "isDefaultBrowser": false,
        "crashCountSubmitted": 0,
        "profileAge": 119,
        "placesPagesCount": 452,
        "placesBookmarksCount": 77,
        "addonCount": 14,
        "version": "12.0a1"
      }
    },
    "2012-01-30": {
      "sessions": {
        "completedSessions": 10,
        "completedSessionTime": 2202,
        "completedSessionActiveTime": 640,
        "abortedSessions": 2,
        "abortedSessionTime": 60,
        "abortedSessionActiveTime": 50
      },
      "search": {
        "abouthome": {
          "Google": 1
        },
        "searchbar": {
          "Bing": 1,
          "Other": 1,
          "Google": 2
        }
      },
      "simpleMeasurements": {
        "uptime": 2,
        "main": 11,
        "firstPaint": 698,
        "sessionRestored": 564,
        "isDefaultBrowser": false,
        "crashCountSubmitted": 0,
        "profileAge": 118,
        "placesPagesCount": 479,
        "placesBookmarksCount": 74,
        "addonCount": 14,
        "version": "12.0a1"
      }
    }
  },
  "currentTime": "2012-02-02T14:19:30.522Z"
}
</pre>
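A payload in the format shown above already answers some of the about:metrics questions locally, such as how much the browser is used and which search engines receive searches. A minimal sketch, assuming the payload has been parsed into a Python dict (for example with <code>json.loads</code>) and uses the <code>dataPoints</code> layout shown above; the helper names are hypothetical, and the unit of <code>completedSessionTime</code> is not stated on this page.

```python
from collections import Counter


def search_totals(payload: dict) -> dict[str, int]:
    """Sum search counts per engine over all days and all search locations."""
    totals = Counter()
    for day in payload.get("dataPoints", {}).values():
        # Each location ("searchbar", "abouthome", ...) maps engine -> count.
        for per_engine in day.get("search", {}).values():
            totals.update(per_engine)
    return dict(totals)


def completed_session_time(payload: dict) -> int:
    """Sum completedSessionTime over all days (in whatever unit the client reports)."""
    return sum(day.get("sessions", {}).get("completedSessionTime", 0)
               for day in payload.get("dataPoints", {}).values())
```

Applied to the sample above, <code>search_totals</code> gives Google 6, Bing 1 and Other 1, and <code>completed_session_time</code> gives 81717.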
*Clients will POST data to the configured URL no more than once every 24 hours.
*The first timer check should happen one minute after startup.
*The POST data will consist of a JSON document containing a document ID and all the metrics that were collected since the last submission.
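The two client-side timing rules above can be expressed as a pair of small checks. This sketch assumes the client remembers the time of its last successful submission (the sample payload carries it as <code>lastPingTime</code>); the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FIRST_CHECK_DELAY = timedelta(minutes=1)   # first timer check after startup
MIN_SUBMIT_INTERVAL = timedelta(hours=24)  # at most one POST per 24 hours


def parse_ping_time(value: str) -> datetime:
    """Parse timestamps such as "2012-01-31T16:57:26.000Z" as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def first_check_time(startup: datetime) -> datetime:
    """When the timer should first check whether a submission is due."""
    return startup + FIRST_CHECK_DELAY


def should_submit(now: datetime, last_ping_time: Optional[datetime]) -> bool:
    """True if nothing has been submitted yet or at least 24 hours have elapsed."""
    return last_ping_time is None or now - last_ping_time >= MIN_SUBMIT_INTERVAL
```

With the timestamps from the sample payload, <code>should_submit(parse_ping_time("2012-02-02T14:18:30.507Z"), parse_ping_time("2012-01-31T16:57:26.000Z"))</code> is true, because more than 24 hours separate the two pings.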
== Server-side ==
*The server side will receive the POST request and perform a GeoIP lookup on the IP address. ''The raw IP will never be stored in the document.'' The GeoIP data and submission timestamp will be added to the JSON document.
*The server will store the JSON document in a daily staging collection with all other documents received on that date (UTC).
*If the POST request contains a header with the ID of a previously submitted document, the server will delete that old document in the same transaction that stores the new one.
*The server will return an HTTP response to the client indicating success of both the deletion of the old document and the storage of the new document.
*In the future this response might also include instructions to the client for things such as changing timing or MetricsDataPing configuration.
*Longitudinal data for 6 months (e.g. intensity of use) is stored cumulatively in the JSON objects indexed by document ID. ''Documents older than 6 months are deleted.''
*''At the end of the day, UTC, the server will aggregate all the documents submitted on that date and store the aggregate data (with no document IDs) in aggregate history tables in our data warehouse.'' (In subsequent releases of the MDP project, there will be a public API to retrieve information from these aggregate views to support additional analysis by users and the community.)
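The server-side steps above can be modelled with an in-memory store to make the order of operations concrete. This is a sketch, not Bagheera's implementation: the class <code>StagingStore</code>, the <code>submissionTime</code> field name, the (OS, appVersion) aggregation dimensions, and the 183-day approximation of "6 months" are all assumptions, and the GeoIP lookup is omitted. The name of the header carrying the previous document ID is not given on this page, so it is modelled as a plain <code>previous_id</code> argument.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=183)  # assumed approximation of "6 months"


class StagingStore:
    """In-memory model of the server-side submission flow (hypothetical)."""

    def __init__(self) -> None:
        self.docs: dict[str, tuple[datetime, dict]] = {}  # doc ID -> (received, doc)

    def submit(self, doc_id: str, document: dict, received: datetime,
               previous_id: Optional[str] = None) -> None:
        # Delete the old document and store the new one; a real server would
        # do both in a single transaction and report success of both.
        if previous_id is not None:
            self.docs.pop(previous_id, None)
        stored = dict(document, submissionTime=received.isoformat())  # copy, then stamp
        self.docs[doc_id] = (received, stored)

    def staging_for(self, day: date) -> list[dict]:
        """Documents received on the given UTC date."""
        return [doc for received, doc in self.docs.values()
                if received.astimezone(timezone.utc).date() == day]

    def aggregate_day(self, day: date) -> dict[tuple, int]:
        """End-of-day aggregate: counts per (OS, appVersion), with no document IDs."""
        counts: Counter = Counter()
        for doc in self.staging_for(day):
            env = doc.get("env", {})
            counts[(env.get("OS"), env.get("appVersion"))] += 1
        return dict(counts)

    def purge(self, now: datetime, retention: timedelta = RETENTION) -> None:
        """Delete documents received before the retention window."""
        cutoff = now - retention
        for doc_id in [d for d, (received, _) in self.docs.items() if received < cutoff]:
            del self.docs[doc_id]
```

Because the previous document is removed when its successor arrives, each installation contributes at most one live document at any time, which is what makes the age-based retention check possible.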
= Data Access Policies =
* Must be a member of the metrics team
* Must have an SSH account with LDAP integrated key
* Must have VPN access
= User Data =
= UI Implementation =
The Metrics Team is consulting with UX to determine the proper UI implementation of the preference that controls data submission. Given the opt-out requirement, UX proposes an opt-out check box in the preferences pane, with users notified through non-modal and non-chrome channels (blog posts, privacy policies, download pages).
See: https://bugzilla.mozilla.org/show_bug.cgi?id=707970
Below are some wireframes showing potential layouts of the about:metrics page, which users can use to review the data and analyze their own installation.
[[image:about-metrics-initial-gt800px.png|left|thumb|200px]]
[[image:about-metrics-initial-lt800px.png|left|thumb|200px]]
[[image:about-metrics-analysis-gt800px.png|left|thumb|200px]]
[[image:about-metrics-analysis-lte800px.png|left|thumb|200px]]
<br style="clear:both;" />
= Security Reviews =
Review for Bagheera, the back-end server that receives and stores user data: [https://bugzilla.mozilla.org/show_bug.cgi?id=655746 https://bugzilla.mozilla.org/show_bug.cgi?id=655746]
'''The feedback below was provided by [[User:BenB]] and is retained on this page at his request. Discussion of these points is available here: [[Talk:MetricsDataPing]]'''
= Privacy =
*We want to acquire representative data and analyze it for the ‘de-averaged’ benefit of multiple but still large sub-populations of users
*Each subpopulation requires insights and actions that are not of the ‘one size fits all’ variety
=== What difference does it make for the user? ===
* policy changes on the Mozilla side.
Having a UUID would allow, for example, tracking all my dynamic IP addresses over time. That would allow tracking my notebook or mobile device, and thus the places I go, based on IP geolocation / whois data. For example, you could see that I am normally in my little village with 3000 households, but suddenly I appear at IBM headquarters, so clearly I am working for or consulting them. Or you could see exactly who my friends are, because my device appears on the same IP address as theirs, and you could even see roughly how often I am at their place or they are at mine.
Even if you are not collecting the data right now, the user has no way to verify whether any of the above (break-in, interception, intended or not, lawful or not) is happening, and that alone is a privacy violation. So it's irrelevant what the intended usage was; only what is theoretically possible matters. The above must be impossible: not just "We won't do it, we promise!", but impossible.
=== Google Chrome ===
The current proposal replaced a stable per-profile UUID with a submission ID. However, the previous submission ID is also transferred, which allows the server to trivially match the two and still build a unique ID on the server. (Again, whether the server does that or not is immaterial.) So the submission ID proposal has the same privacy consequences discussed above.
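To make the point above concrete: if every submission carries its own ID and the ID of the one it replaces, the server can collapse each chain of IDs into a single key that stays stable for the life of the installation. The sketch below only illustrates that argument; the function <code>link_submissions</code> is hypothetical and does not describe anything the server is stated to do.

```python
from typing import Iterable, Optional


def link_submissions(submissions: Iterable[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Map every submission ID to the first ID in its chain.

    `submissions` yields (doc_id, previous_id) pairs in arrival order, with
    previous_id None for an installation's first submission. The first ID of
    a chain then behaves exactly like a stable per-installation UUID.
    """
    root: dict[str, str] = {}
    for doc_id, previous_id in submissions:
        if previous_id is None:
            root[doc_id] = doc_id
        else:
            # Inherit the chain's root; fall back to previous_id if it was never seen.
            root[doc_id] = root.get(previous_id, previous_id)
    return root
```

Two interleaved installations, a to b to c and x to y, end up with exactly two distinct keys, "a" and "x".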
= Anonymous alternative =
The following is an alternative approach, proposed by Ben Bucksch:
For simplicity, I will take the number of crashes (e.g. in the last week or overall) as the data point that you want to gather. The data itself is anonymous and cannot (apart from fingerprinting, more on that later) identify a single user.
== Avoiding UUID ==
You wanted to know which profiles are no longer used (dormant, the retention problem) and which characteristics they have. This is inherently difficult without tracking individual users (installations), but it is possible with the following algorithm:
The client submits:
* Date of last submission - e.g. 2012-01-18
* Current date (from the client's perspective), only the date, not the time - e.g. 2012-01-20
* Age of profile (Firefox installation) in days - e.g. 500
* Last submitted age, implied or explicit - e.g. 498
* Number of crashes - e.g. 15
* Number of crashes submitted last time - e.g. 10
Then, on the server, you write that information into a database, like this:
<pre>
Date of submission | Age of installation | Crash count | Number of users
2012-01-20         | 500                 | 15          | 100000
</pre>
Any additional user who also submits the same combination "age 500, crash count 15" today increases the "number of users" column by 1; the new value is 100001.
Also, you look up the row for the user's last submission, namely
<pre>
2012-01-18         | 498                 | 10          | 20000
</pre>
and decrease its number of users by 1; the new value is 19999.
If the user later that day decides that there were too many crashes and switches to Chrome, he will now be stranded on the row
<pre>
2012-01-20         | 500                 | 15          | 5000
</pre>
while other users who continued to use Firefox have been subtracted from it after a while. So you can say with certainty that 5000 users used Firefox for the last time on 2012-01-20, after having used it for 500 days, and that they had 15 crashes (per day/week/total, whatever you submit) when they stopped using Firefox.
That is exactly the information you are so desperately seeking. There, you have it. Without tracking any individual user: it's completely anonymous.
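The bookkeeping in this proposal amounts to an increment on the new row and a decrement on the row of the previous submission. A minimal sketch, assuming rows are keyed by (submission date, installation age, crash count) and the previous age is implied by the number of days between the two submissions; the class <code>DormancyTable</code> and its method names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DormancyTable:
    """Anonymous counts keyed by (submission date, age in days, crash count)."""

    def __init__(self) -> None:
        self.users: defaultdict[tuple, int] = defaultdict(int)

    def submit(self, today: date, age_days: int, crashes: int,
               last_date: Optional[date] = None,
               last_crashes: Optional[int] = None) -> None:
        # Count this installation on today's row ...
        self.users[(today, age_days, crashes)] += 1
        if last_date is not None:
            # ... and remove it from the row of its previous submission,
            # whose age is implied by the days elapsed since then.
            last_age = age_days - (today - last_date).days
            previous = (last_date, last_age, last_crashes)
            self.users[previous] -= 1
            if self.users[previous] == 0:
                del self.users[previous]

    def stranded_before(self, cutoff: date) -> dict[tuple, int]:
        """Rows last reported before `cutoff`: installations that went dormant."""
        return {row: n for row, n in self.users.items() if row[0] < cutoff}
```

Replaying the example with two users who both report on 2012-01-18 (age 498, 10 crashes), where one reports again on 2012-01-20 (age 500, 15 crashes) and the other on 2012-01-25, empties the 2012-01-18 row and leaves the first user stranded on the 2012-01-20 row. Note that the row key must match exactly, which is why the client resends its previous crash count.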
== Avoiding Fingerprinting ==
Now, what about all the other information that you need: startup times, add-ons, etc.? If we just added all that information to the same table and row, it would allow fingerprinting. But that is not necessary. You merely make one table per atomic piece of information, i.e.:
<pre>
Table A
Date of submission | Age of installation | Crash count  | Number of users

Table B
Date of submission | Age of installation | Startup time | Number of users
</pre>
or, of course, whatever other database schema you want, as long as each value is stored separately. That takes care of fingerprinting.
At least on the server side; not on the submission side, where I would have to trust you and anything between you and me. It would be possible to separate the calls and submit each value separately, but I think that would be overdoing it.
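Splitting the data into one table per atomic value only changes where each value is counted: every metric gets its own counter, so no stored row ever combines two values from the same installation. A sketch, assuming each table is keyed by (submission date, installation age, value) and reuses the increment/decrement scheme from the previous section; the class <code>AtomicTables</code> and the metric names in the test are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class AtomicTables:
    """One counting table per metric, so no row links two values together."""

    def __init__(self) -> None:
        # metric name -> {(submission date, age in days, value): number of users}
        self.tables: defaultdict[str, defaultdict[tuple, int]] = defaultdict(
            lambda: defaultdict(int))

    def submit(self, today: date, age_days: int, metrics: dict,
               last_date: Optional[date] = None,
               last_metrics: Optional[dict] = None) -> None:
        # Each value is counted in its own table only.
        for name, value in metrics.items():
            self.tables[name][(today, age_days, value)] += 1
        if last_date is not None and last_metrics:
            last_age = age_days - (today - last_date).days
            for name, value in last_metrics.items():
                table = self.tables[name]
                key = (last_date, last_age, value)
                table[key] -= 1
                if table[key] == 0:
                    del table[key]
```

As the proposal notes, this only prevents fingerprinting in what the server stores; a single submission still carries all the values together.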