
Data Publishing

71 bytes added, 12:45, 21 September 2020
Minor cleanup, including adding a link to the data blog and fixing the doc link.
<big>'''Dataset Publishing Process'''</big>
We want our data publishing review process, as well as our review decisions, to be public and understandable, similar to our [[Firefox/Data_Collection|Mozilla Data Collection]] program. To that end, our full dataset publishing policy, along with details about the considerations we weigh before determining what is safe to publish, can be found below, including a summary of the critical pieces of that process.
The goal of our data publishing process is to:
[Canada, Linux, “Other locales”, nightly] for rare locales
|-
| 3 || '''Dimension-level aggregation w/o minimum bucket sizes''' || Client ID count by country, OS, locale, and channel, where there could be a bucket such as [Canada, Linux, PL, nightly] that contains only one client.
|-
| 4 || '''Probabilistic Aggregates''' || [https://en.wikipedia.org/wiki/HyperLogLog HLL] for computing approximate unique client counts, [https://en.wikipedia.org/wiki/Bloom_filter Bloom filter] for testing membership in a set.
|-
| 5 || '''Anonymized individual-level data''' ||
* Schedule it to update at the desired frequency.
* Plumb it into the public-facing dataset infrastructure, including metadata that links the public data back to the above review bug.
* Once the dataset has been published, it will be announced on the new [https://blog.mozilla.org/data/ Data @ Mozilla blog]. Accessing the public data is described on the [https://docs.telemetry.mozilla.org/cookbooks/public_data.html data documentation page].
<big>'''Definitions'''</big>
'''Metric''' - A metric is anything we want to measure.
Examples: the number of clients that used the developer tools console, the number of active clients.
'''Dimension''' - A dimension is a qualitative value such as OS, channel, or date. In practice, a dimension often defines a sub-population on which we can calculate a metric, allowing us to segment the metric for further analysis.
Example: if we have an OS dimension, we can analyze the number of active clients by OS.
'''Aggregate''' - A combined value of many measurements (metric values), typically grouped by dimension or sets of dimensions. See also Aggregate Data.
'''Individual-level Data''' - Data containing a dimension which uniquely identifies a single profile, user, client, etc.
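The definitions above can be tied together in a small Python sketch: the metric is a count of distinct clients, the dimensions are country, OS, locale, and channel, and only the aggregate counts leave the function. It also illustrates the two protections mentioned in the aggregation levels table: rolling rare locales up into "Other locales", and dropping buckets below a minimum size, so a one-client bucket like [Canada, Linux, PL, nightly] is never published. This is a sketch under stated assumptions, not Mozilla's actual pipeline: the `Ping` record, `aggregate_active_clients`, and the threshold values `MIN_LOCALE_CLIENTS` and `MIN_BUCKET_SIZE` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

# Hypothetical thresholds for illustration only; not actual policy values.
MIN_LOCALE_CLIENTS = 3  # locales with fewer distinct clients become "Other locales"
MIN_BUCKET_SIZE = 2     # aggregate buckets with fewer clients are suppressed


class Ping(NamedTuple):
    """Hypothetical individual-level record: client_id identifies one client."""
    client_id: str
    country: str
    os: str
    locale: str
    channel: str


# A bucket is one combination of dimension values.
Bucket = Tuple[str, str, str, str]


def aggregate_active_clients(pings: Iterable[Ping]) -> Dict[Bucket, int]:
    """Count distinct clients (the metric) per (country, os, locale, channel) bucket."""
    pings = list(pings)

    # Step 1: find how many distinct clients use each locale.
    locale_clients: Dict[str, Set[str]] = defaultdict(set)
    for p in pings:
        locale_clients[p.locale].add(p.client_id)
    rare_locales = {
        loc for loc, ids in locale_clients.items() if len(ids) < MIN_LOCALE_CLIENTS
    }

    # Step 2: group distinct client IDs by dimension tuple, rolling rare locales up.
    buckets: Dict[Bucket, Set[str]] = defaultdict(set)
    for p in pings:
        locale = "Other locales" if p.locale in rare_locales else p.locale
        buckets[(p.country, p.os, locale, p.channel)].add(p.client_id)

    # Step 3: emit only counts (no client IDs), and only for large-enough buckets.
    return {
        bucket: len(ids)
        for bucket, ids in buckets.items()
        if len(ids) >= MIN_BUCKET_SIZE
    }
```

With one client on the PL locale and one on fr, both locales fall below the hypothetical locale threshold and are merged into a single [Canada, Linux, "Other locales", nightly] bucket of two clients, while any bucket that still holds a single client is dropped from the output entirely.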