(frank changes) |
(Added client count dataset) |
||
| Line 27: | Line 27: | ||
==Client Count== | ==Client Count== | ||
The Client Count dataset is a count of clients in a given time period, broken down by a set of dimensions. | |||
This is useful for questions such as ''"How many users of type X were there during Y?"'', where X is a combination of dimension values and Y is a date range. Example dimensions are E10s Enabled, Operating System Type, and Country. For a complete list of dimensions, see [https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/ClientCountView.scala#L22 here]. | |||
Client Count does not contain a traditional integer count column; instead, the counts are stored as HyperLogLogs in the <code>hll</code> column. The count for a single hll is found using <code>cardinality(cast(hll AS HLL))</code>, and multiple hlls can be merged using <code>merge(cast(hll AS HLL))</code>. An example can be found in the [https://sql.telemetry.mozilla.org/queries/81/source#129 Firefox ER Reporting] query. | |||
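As a rough sketch of how the two functions combine, the query below merges the hlls within each group and then takes the cardinality of the merged result. The table name <code>client_count</code>, the columns <code>activitydate</code> and <code>country</code>, and the date range are assumptions made for illustration; check the dimension list linked above for the actual names.

```sql
-- Sketch only: client_count, activitydate, and country are assumed names,
-- and the dates are placeholders. Only the hll column and the
-- cardinality/merge/cast calls come from the description above.
SELECT
  country,
  -- merge() combines every hll in the group into one sketch;
  -- cardinality() then gives the approximate distinct client count.
  cardinality(merge(cast(hll AS HLL))) AS approx_clients
FROM client_count
WHERE activitydate BETWEEN '2016-06-01' AND '2016-06-07'
GROUP BY country
ORDER BY approx_clients DESC
```

Merging before counting matters: a client seen on several days in the range is counted once, whereas summing the per-row cardinalities would count that client once per row.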
===Caveats=== | |||
Currently there is no Python wrapper for the HyperLogLog library, so the Client Count dataset is unavailable in Spark. | |||
==Crash Aggregates== | ==Crash Aggregates== | ||