Telemetry/Available Telemetry Datasets and their Applications



==Client Count==
The Client Count dataset is a count of clients in a time period, broken down by a set of dimensions.
This is useful for questions of the form ''"How many users of type X were there during Y?"'', where X is a combination of dimensions and Y is a date range. Examples of X are E10s enabled, operating system type, or country. For a complete list of dimensions, see [https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/ClientCountView.scala#L22 ClientCountView.scala].
Client Count does not contain a traditional integer count column; instead, the counts are stored as HyperLogLog (HLL) sketches in the <code>hll</code> column. The count for a sketch is found using <code>cardinality(cast(hll AS HLL))</code>, and sketches from different rows (for example, different days or dimension values) can be merged using <code>merge(cast(hll AS HLL))</code>. An example can be found in the [https://sql.telemetry.mozilla.org/queries/81/source#129 Firefox ER Reporting] query.
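Storing sketches rather than integers is what lets counts be combined across days or dimension values without counting the same client twice: merging two HLL sketches takes the register-wise maximum, which gives the sketch of the union. The toy HyperLogLog below illustrates that idea. It is a simplified sketch under stated assumptions, not the Presto HLL implementation behind the <code>hll</code> column, and its format is not compatible with it; the class name, the precision <code>p=12</code>, the SHA-1 based hash, and the <code>client-N</code> IDs are all hypothetical choices for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog sketch (illustrative only; not Presto's HLL format)."""

    def __init__(self, p: int = 12):
        self.p = p                   # number of index bits (hypothetical choice)
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def _hash(self, value: str) -> int:
        # 64-bit hash taken from SHA-1; any well-mixed hash would do
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value: str) -> None:
        x = self._hash(value)
        idx = x >> (64 - self.p)                  # top p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Union of two sketches: register-wise maximum
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: fall back to linear counting
            return m * math.log(m / zeros)
        return raw


# Hypothetical per-day sketches for one dimension value (e.g. one OS)
day1 = HyperLogLog()
day2 = HyperLogLog()
for i in range(5000):
    day1.add(f"client-{i}")
for i in range(2500, 7500):          # clients 2500..4999 appear on both days
    day2.add(f"client-{i}")

week = day1.merge(day2)
print(round(week.cardinality()))
```

The merged estimate lands near 7,500 distinct clients rather than 10,000, because clients seen on both days set the same registers. Summing two integer counts would double-count them, which is why the dataset stores sketches instead.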
===Caveats===
Currently there is no Python wrapper for the HyperLogLog library, so the client count dataset is unavailable in Spark.


==Crash Aggregates==