Changes

Data Publishing

779 bytes added, 18:03, 22 September 2020
added the descriptions for aggregation levels.
{|
! Level !! Aggregation !! Examples
|-
| 1 || '''Statistical / ML Models''' <br /> A model built/trained using real data. || TAAR, Federated learning models, Forecasting models
|-
| 2 || '''Dimension-level aggregation w/ minimum bucket sizes''' <br /> Aggregated by dimensions, with a minimum "bucket" size of 5,000 (population). || Total page loads by country, OS, locale, and channel, where any combination with a count below 5,000 is grouped into “Other”, e.g. [Canada, Linux, “Other locales”, nightly] for rare locales.
|-
| 3 || '''Dimension-level aggregation w/o minimum bucket sizes''' <br /> Aggregated by dimensions, no minimum bucket size. || Client ID count by country, OS, locale, and channel, where a bucket such as [Canada, Linux, PL, nightly] could contain a single client.
|-
| 4 || '''Probabilistic Aggregates''' <br /> Data structures for approximations. || [https://en.wikipedia.org/wiki/HyperLogLog HLL] for computing approximate unique client counts, [https://en.wikipedia.org/wiki/Bloom_filter Bloom filter] for testing presence in a set.
|-
| 5 || '''Anonymized individual-level data''' <br /> Covers “partial aggregates” like clients_daily, which is aggregated by day. The key feature is that it still has an individual-level identifier. Actual identifiers are anonymized using a one-to-one replacement value; in this example, the IDs are replaced with A, B, C, etc. ||
* anonymized_id, date, country, os, locale, channel
* A, 2019-08-08, Canada, Linux, PL, nightly
* B, 2019-08-10, Peru, Windows, EN, release
|-
| 6 || '''Non-anonymized individual-level data''' <br /> This data contains individual-level identifiers as they exist in the raw data. Unlike the anonymized data, it uses the original identifiers instead of A, B, and C. ||
* actual_id, date, country, os, locale, channel
* 859c8a32-0b73-b547-a5e7-8ef4ed9c4c2d, 2019-08-08, Canada, Linux, PL, nightly
* 4db8d07d-1935-9c45-93c9-6d97a790bb12, 2019-08-10, Peru, Windows, EN, release
|-
| 7 || '''High-resolution individual-level data''' <br /> The highest level of resolution: events released at per-second or sub-second granularity. || Raw telemetry event data, i.e. a sequence of actions in order of occurrence.
|-
|}
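
The level 2 rule (fold any dimension combination below the minimum bucket size into “Other”) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual publishing pipeline: the function name, the row layout, the choice to coarsen the locale first, and the catch-all bucket are all hypothetical. Only the 5,000 threshold and the dimensions come from the table above.

```python
from collections import Counter

MIN_BUCKET_SIZE = 5_000  # threshold given for level 2 in the table
OTHER = "Other"          # placeholder label (assumed spelling)


def aggregate_with_min_bucket(rows, min_size=MIN_BUCKET_SIZE):
    """Aggregate counts by (country, os, locale, channel).

    rows: iterable of (country, os, locale, channel, count) tuples.
    Hypothetical two-pass rule: first replace the locale of any
    too-small combination with "Other"; then merge whatever is still
    too small into a single catch-all bucket.
    """
    exact = Counter()
    for country, os_name, locale, channel, count in rows:
        exact[(country, os_name, locale, channel)] += count

    # Pass 1: coarsen the locale of combinations below the threshold.
    coarse = Counter()
    for (country, os_name, locale, channel), total in exact.items():
        if total < min_size:
            locale = OTHER
        coarse[(country, os_name, locale, channel)] += total

    # Pass 2: merge anything still below the threshold into one bucket.
    published = Counter()
    for key, total in coarse.items():
        if total < min_size:
            key = (OTHER, OTHER, OTHER, OTHER)
        published[key] += total
    return dict(published)


rows = [
    ("Canada", "Linux", "EN", "nightly", 6000),
    ("Canada", "Linux", "PL", "nightly", 3000),
    ("Canada", "Linux", "CS", "nightly", 2500),
    ("Peru", "Windows", "EN", "release", 10),
]
result = aggregate_with_min_bucket(rows)
```

Note that the catch-all bucket itself can end up below the threshold (as it does for the Peru row here), so a real pipeline would presumably need to suppress or further merge it before publishing.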