Data Publishing

91 bytes removed, 22:02, 4 February 2021
Add table of contents, linkable headers
== Introduction ==
Mozilla’s history is steeped in openness and transparency - it’s simply core to what we do and how we see ourselves in the world. We are always looking for ways to bring our mission to life that help create a healthy internet and support the Mozilla Manifesto. One of our commitments says: “We are committed to an internet that elevates critical thinking, reasoned argument, shared knowledge, and verifiable facts”.
To this end, we have spent a good amount of time considering how we can publicly share our Mozilla telemetry datasets - it is one of the simplest and most effective ways we can enable collaboration and share knowledge, but only if it can be done safely and in a privacy-protecting, principled way. We believe we’ve designed a way to do this and we are excited to outline our approach here.
== Dataset Publishing Process ==
We want our data publishing review process, as well as our review decisions, to be public and understandable, similar to our [[Firefox/Data_Collection|Mozilla Data Collection]] program. To that end, our full dataset publishing policy, and details about the considerations we weigh before determining what is safe to publish, can be found below, including a summary of the critical pieces of that process.
* How we characterize the levels of aggregation
== How we characterize the levels of aggregation ==
The table below describes the various types of aggregation levels we are defining.
|}
== How we characterize the sensitivity of dimensions ==
Based on the [[Firefox/Data_Collection#Data_Collection_Categories|Data Collection Categories]], most Telemetry data naturally falls within category 1 (technical data) and category 2 (interaction data), which are not considered sensitive. A notable exception, however, is geolocation: we geocode IP addresses to extract City / Region / Country, but only include cities with a population > 15,000 (according to the GeoNames database).
| '''Category 4 (Highly Sensitive Data)''' || '''1, 2''' || Technically, category 4 often involves highly sensitive data, such as explicit identifiers, that will be removed in the process of aggregation. We include it here for the sake of completeness.
|}
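The city population threshold mentioned at the start of this section can be illustrated with a minimal sketch. The 15,000 threshold comes from the text; everything else is an assumption made for illustration: the <code>GeoRecord</code> type, the <code>coarsen_city</code> helper, and the choice to blank out the city while keeping region and country are hypothetical and do not describe the actual pipeline.

```python
from typing import NamedTuple, Optional

# Threshold stated in the text: only cities with a population > 15,000 are included.
MIN_CITY_POPULATION = 15_000


class GeoRecord(NamedTuple):
    """Hypothetical geocoded result for one IP address."""
    country: str
    region: Optional[str]
    city: Optional[str]


def coarsen_city(record: GeoRecord, city_population: Optional[int]) -> GeoRecord:
    """Keep the city only when its known population exceeds the threshold.

    Assumption: cities at or below the threshold (or with unknown population)
    are replaced by None, leaving region and country untouched.
    """
    if city_population is not None and city_population > MIN_CITY_POPULATION:
        return record
    return record._replace(city=None)
```

In this sketch the population figure would be looked up separately (for example from a GeoNames extract) and passed in by the caller.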
== How do we characterize the sensitivity of metrics? ==
Most metrics are not sensitive information, per se. That said, if a metric indicates or directly implies something about revenue, it is “sensitive”. Example: Search counts.
== Dataset Publishing Process ==
The goal of this process is to (1) make the “easy” (that is, safe) data publishing requests relatively frictionless, (2) have guard rails in place so we don’t publish something that exposes us or our users to risk in some way, and (3) ensure that the dataset publishing request process closely matches other processes that are familiar to the data stewards.
* Once the dataset has been published, it will be announced on the [https://blog.mozilla.org/data/ Data @ Mozilla blog]. Accessing the public data is described on the [https://docs.telemetry.mozilla.org/cookbooks/public_data.html data documentation page].
== Definitions ==
'''Metric''' - A metric is anything we want to measure.
'''Tabular Data''' - Data that consists of rows (or records) and columns (or fields). Each row has the same number of columns, and each column represents a dimension or metric for that row. Think of a spreadsheet or CSV file as examples of this type of data.
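As a small illustration of the tabular shape defined above, the sketch below parses a CSV string and checks that every row has the same number of columns as the header. The column names and values are invented for the example (two dimensions, <code>country</code> and <code>channel</code>, and one metric, <code>active_users</code>); they are not taken from any published dataset.

```python
import csv
import io

# Invented sample data: two dimension columns and one metric column.
CSV_TEXT = """country,channel,active_users
CA,release,1200
DE,release,3400
DE,beta,150
"""


def read_table(text):
    """Parse CSV text and verify that every row matches the header width."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]
    for line_number, record in enumerate(records, start=2):
        if len(record) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} columns, got {len(record)}"
            )
    return header, records
```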
== What's Been Published So Far? ==
Our publicly available datasets are [https://public-data.telemetry.mozilla.org/all-datasets.json here].
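The listing linked above is a JSON document, so it can be inspected programmatically. The following is a minimal sketch using only the Python standard library. It makes no assumption about the internal layout of the file beyond it being valid JSON; the helper names <code>fetch_catalog</code> and <code>top_level_entries</code> are hypothetical.

```python
import json
from urllib.request import urlopen

# URL taken from the link above.
CATALOG_URL = "https://public-data.telemetry.mozilla.org/all-datasets.json"


def fetch_catalog(url=CATALOG_URL, timeout=10.0):
    """Download and decode the JSON listing (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def top_level_entries(catalog):
    """Return the top-level keys (for an object) or indices (for an array)."""
    if isinstance(catalog, dict):
        return sorted(catalog)
    if isinstance(catalog, list):
        return list(range(len(catalog)))
    return []
```

Calling <code>top_level_entries(fetch_catalog())</code> gives a first look at how the listing is organized before drilling into individual entries.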