MetricsDataPing: Difference between revisions

2,734 bytes removed, 3 February 2012
= Anonymous alternative =


Just before the 2012-02-01 security review meeting, Ben Bucksch proposed an alternative implementation that attempts to avoid using document identifiers. It is copied to [[Talk:MetricsDataPing#NonIdAlternative]] with a response from [[User:DEinspanjer|DEinspanjer]] 20:14, 2 February 2012 (PST)
 
For simplicity, I will take the number of crashes (e.g. in the last week or overall) as the data point that you want to gather. The data itself is anonymous and cannot (apart from fingerprinting, discussed later) identify a single user.
 
== Avoiding UUID ==
 
You wanted to know which profiles are no longer used (dormant profiles, the retention problem) and which characteristics they have. This is inherently difficult without tracking individual users (installations), but it is possible with the following algorithm:
 
The client submits:
 
* Date of last submission - e.g. 2012-01-18
* Current date (from the client's perspective) - only the date, not the time - e.g. 2012-01-20
* Age of profile (Firefox installation) in days - e.g. 500
* Age of profile at the last submission, either implied by the two dates or sent explicitly - e.g. 498
* Number of crashes - e.g. 15
* Number of crashes submitted last time - e.g. 10
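The submission above can be pictured as one small record. Below is a minimal Python sketch, assuming hypothetical field names and assuming that a profile's very first ping has no previous submission (a case the list does not cover). It shows how the last submitted age follows from the two dates, so it does not have to be sent:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Submission:
    """One anonymous ping (field names are hypothetical)."""
    current_date: date                            # client's date only, no time of day
    profile_age_days: int                         # age of the profile in days
    crash_count: int                              # the single data point being gathered
    last_submission_date: Optional[date] = None   # assumed None on a profile's first ping
    last_crash_count: Optional[int] = None

    @property
    def last_age_days(self) -> Optional[int]:
        """Profile age at the previous ping, implied by the two dates."""
        if self.last_submission_date is None:
            return None
        elapsed = (self.current_date - self.last_submission_date).days
        return self.profile_age_days - elapsed


ping = Submission(
    current_date=date(2012, 1, 20),
    profile_age_days=500,
    crash_count=15,
    last_submission_date=date(2012, 1, 18),
    last_crash_count=10,
)
print(ping.last_age_days)  # 498
```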
 
Then, on the server, you record that information in a database table like this:
 Date of submission | Age of installation | Crash count | Number of users
 2012-01-20         | 500                 | 15          | 100000
Every additional user who submits the same combination ("age 500, crash count 15") on the same day increases the "Number of users" column by 1, so the new value is 100001.
At the same time, you look up the row matching this user's previous submission, namely
 2012-01-18         | 498                 | 10          | 20000
and decrease its number of users by 1, so the new value is 19999.
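The server-side bookkeeping amounts to one increment and one decrement per ping. A minimal sketch, assuming an in-memory `Counter` stands in for the database table and that the hypothetical function `record_ping` receives the values the client submits. Nothing identifying the user is stored; only the client knows which row it sat on before:

```python
from collections import Counter
from datetime import date
from typing import Optional


def record_ping(table: Counter, today: date, age_days: int, crashes: int,
                last_date: Optional[date] = None,
                last_crashes: Optional[int] = None) -> None:
    """Move one anonymous user from the row of their previous ping to today's row.

    `table` maps (date of submission, age of installation, crash count)
    to the number of users currently counted on that row.
    """
    table[(today, age_days, crashes)] += 1
    if last_date is not None and last_crashes is not None:
        # The previous age is implied by the dates, e.g. 500 - 2 = 498.
        last_age = age_days - (today - last_date).days
        table[(last_date, last_age, last_crashes)] -= 1


# Starting counts taken from the example rows above.
table = Counter({
    (date(2012, 1, 20), 500, 15): 100000,
    (date(2012, 1, 18), 498, 10): 20000,
})
record_ping(table, date(2012, 1, 20), 500, 15,
            last_date=date(2012, 1, 18), last_crashes=10)
print(table[(date(2012, 1, 20), 500, 15)])  # 100001
print(table[(date(2012, 1, 18), 498, 10)])  # 19999
```

In a real database the two updates would presumably run in one transaction, so that a single ping is never counted on both rows at once.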
 
If the user decides later that day that there were too many crashes and switches to Chrome, he never submits again and stays stranded on the row
 2012-01-20         | 500                 | 15          | 5000
while users who kept using Firefox are subtracted from that row as soon as they submit again. So, once enough time has passed for active users to have submitted again, you can say that 5000 users used Firefox for the last time on 2012-01-20, after having used it for 500 days, and that they had 15 crashes (per day/week/total, whatever you submit) when they stopped using Firefox.
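To see why abandoned profiles stay visible, here is a tiny simulation: three profiles ping on 2012-01-20, two of them ping again two days later, and one stops. The helpers `send_ping` and `dormant_rows` are hypothetical, and the per-client `state` dict is an assumption standing in for whatever the client remembers locally about its last submission. Rows older than the cutoff that still hold users count those users as dormant:

```python
from collections import Counter
from datetime import date


def send_ping(table: Counter, state: dict, today: date, age: int, crashes: int) -> None:
    """Count the client on today's row and remove it from its previous row.

    `state` lives only on the client; the server never sees an identifier.
    """
    row = (today, age, crashes)
    table[row] += 1
    if state.get("last") is not None:
        table[state["last"]] -= 1
    state["last"] = row


def dormant_rows(table: Counter, cutoff: date) -> dict:
    """Rows older than `cutoff` that still hold users: profiles that stopped pinging."""
    return {row: n for row, n in table.items() if row[0] < cutoff and n > 0}


table = Counter()
clients = [{} for _ in range(3)]
for state in clients:
    send_ping(table, state, date(2012, 1, 20), age=500, crashes=15)
for state in clients[:2]:  # the third profile switched to another browser
    send_ping(table, state, date(2012, 1, 22), age=502, crashes=15)

print(dormant_rows(table, cutoff=date(2012, 1, 22)))
# {(datetime.date(2012, 1, 20), 500, 15): 1}
```

The cutoff has to lie further back than the usual interval between pings; otherwise a profile that simply has not submitted yet would be counted as dormant.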
 
That is exactly the information you are so desperately seeking. There, you have it, without tracking any individual user: it is completely anonymous.
 
== Avoiding Fingerprinting ==
 
Now, what about all the other information that you need: startup times, add-ons, etc.? If we just added all that information to the same table and row, the combination of values would allow fingerprinting. But that is not necessary. You merely make one table per atomic piece of information, i.e.:
Table A:
 Date of submission | Age of installation | Crash count | Number of users
Table B:
 Date of submission | Age of installation | Startup time | Number of users
Or, of course, whatever other database schema you want, as long as each value is stored separately. Because no row links two data points of the same user, the stored data cannot be combined into a fingerprint.
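Keeping each value in its own table could look like the sketch below. It assumes an in-memory `defaultdict` of `Counter`s in place of real database tables, a hypothetical `record` helper, and hypothetical metric names and values (`crash_count`, `startup_ms`). Each table gets the same increment/decrement bookkeeping as the crash-count example, but only ever sees a single metric:

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Optional


def record(tables: defaultdict, today: date, age: int, values: dict,
           last_date: Optional[date] = None,
           last_values: Optional[dict] = None) -> None:
    """Update one separate table per metric; no row ever holds two metrics."""
    for metric, value in values.items():
        tables[metric][(today, age, value)] += 1
    if last_date is not None and last_values:
        last_age = age - (today - last_date).days  # implied previous age
        for metric, value in last_values.items():
            tables[metric][(last_date, last_age, value)] -= 1


# metric name -> Counter of (date of submission, age of installation, value) -> users
tables = defaultdict(Counter)
record(tables, date(2012, 1, 18), 498, {"crash_count": 10, "startup_ms": 1200})
record(tables, date(2012, 1, 20), 500, {"crash_count": 15, "startup_ms": 1100},
       last_date=date(2012, 1, 18),
       last_values={"crash_count": 10, "startup_ms": 1200})

for metric, counts in sorted(tables.items()):
    print(metric, dict(counts))
```

A continuous value such as startup time would presumably need rounding into coarse buckets; otherwise nearly every row would hold a single user, which brings back the fingerprinting risk the separate tables are meant to avoid.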
 
At least on the server side, not on the submission side: there I would still have to trust you, and anything between you and me. It would be possible to split the calls and submit each value in a separate request, but I think that would be overdoing it.