No edit summary |
(added the descriptions for aggregation levels.) |
||
| Line 30: | Line 30: | ||
! Level !! Aggregation !! Examples | ! Level !! Aggregation !! Examples | ||
|- | |- | ||
| 1 || '''Statistical / ML Models''' || TAAR, Federated learning models, Forecasting models | | 1 || '''Statistical / ML Models''' <br /> A model built/trained using real data. || TAAR, Federated learning models, Forecasting models | ||
|- | |- | ||
| 2 || '''Dimension-level aggregation w/ minimum bucket sizes''' || Total page loads by country, OS, locale, and channel, where any combination with a count below 5,000 is grouped into “Other” | | 2 || '''Dimension-level aggregation w/ minimum bucket sizes''' <br /> Aggregated by dimensions, with a minimum "bucket" population of 5,000. || Total page loads by country, OS, locale, and channel, where any combination with a count below 5,000 is grouped into “Other” | ||
[Canada, Linux, “Other locales”, nightly] for rare locales | [Canada, Linux, “Other locales”, nightly] for rare locales | ||
|- | |- | ||
| 3 || '''Dimension-level aggregation w/o minimum bucket sizes''' || Client ID count by country, OS, locale, and channel, where there could be a bucket such as [Canada, Linux, PL, nightly] that has only one client in it. | | 3 || '''Dimension-level aggregation w/o minimum bucket sizes''' <br /> Aggregated by dimensions, no minimum bucket size. || Client ID count by country, OS, locale, and channel, where there could be a bucket such as [Canada, Linux, PL, nightly] that has only one client in it. | ||
|- | |- | ||
| 4 || '''Probabilistic Aggregates''' || [https://en.wikipedia.org/wiki/HyperLogLog HLL] for computing approximate unique client counts, [https://en.wikipedia.org/wiki/Bloom_filter Bloom filter] for testing membership in a set. | | 4 || '''Probabilistic Aggregates''' <br /> Data structures for approximations. || [https://en.wikipedia.org/wiki/HyperLogLog HLL] for computing approximate unique client counts, [https://en.wikipedia.org/wiki/Bloom_filter Bloom filter] for testing membership in a set. | ||
|- | |- | ||
| 5 || '''Anonymized individual-level data''' || | | 5 || '''Anonymized individual-level data''' <br /> Covers “partial aggregates” like clients_daily, which is aggregated by day. The key feature is that it still has an individual-level identifier. Actual identifiers are anonymized using a one-to-one replacement value; in this example, the IDs are replaced with A, B, C, etc. || | ||
* Anonymized_id, date, country, os, locale, channel | * Anonymized_id, date, country, os, locale, channel | ||
* A, 2019-08-08, Canada, Linux, PL, nightly | * A, 2019-08-08, Canada, Linux, PL, nightly | ||
| Line 46: | Line 46: | ||
* B, 2019-08-10, Peru, Windows, EN, release | * B, 2019-08-10, Peru, Windows, EN, release | ||
|- | |- | ||
| 6 || '''Not-anonymized individual-level data''' || | | 6 || '''Not-anonymized individual-level data''' <br /> This data contains individual-level identifiers as they exist in the raw data. Compared with anonymized data, instead of A, B, and C we use the original identifiers. || | ||
* actual_id, date, country, os, locale, channel | * actual_id, date, country, os, locale, channel | ||
* 859c8a32-0b73-b547-a5e7-8ef4ed9c4c2d, 2019-08-08, Canada, Linux, PL, nightly | * 859c8a32-0b73-b547-a5e7-8ef4ed9c4c2d, 2019-08-08, Canada, Linux, PL, nightly | ||
| Line 53: | Line 53: | ||
* 4db8d07d-1935-9c45-93c9-6d97a790bb12, 2019-08-10, Peru, Windows, EN, release | * 4db8d07d-1935-9c45-93c9-6d97a790bb12, 2019-08-10, Peru, Windows, EN, release | ||
|- | |- | ||
| 7 || '''High resolution individual-level data''' || Raw telemetry events data, a sequence of actions in order of occurrence. | | 7 || '''High resolution individual-level data''' <br /> The highest level of resolution: events released at per-second or sub-second resolution. || Raw telemetry events data, a sequence of actions in order of occurrence. | ||
|- | |- | ||
|} | |} | ||