Telemetry/LongitudinalExamples

< Telemetry
Revision as of 12:32, 20 July 2016 by Gfritzsche (talk | contribs) (→‎Table structure: Explain that this is a 1% sample)

Note: There is good background in the example notebook for the longitudinal data set.

Table structure

Get an overview of the longitudinal data table:

describe longitudinal

That table has a row for each client, with columns for the different parts of the ping. Some properties are directly available to query on:

SELECT count(*) AS count
FROM longitudinal
WHERE os = 'Linux'

Note that the longitudinal table is from a 1% sample of the clients. The sampling is based on the client id and stable over time.

Sampling

While composing queries, it can be helpful to work on small samples to reduce query runtimes:

SELECT * FROM longitudinal LIMIT 1000 ...

Or to look at a 1% sample of the clients:

SELECT * FROM longitudinal WHERE sample_id[1] = 5 ...

The sample_id partitions the clients into stable ~1% samples.

Arrays

Other properties are arrays, which contain one entry for each submission from that client (newest first):

SELECT reason[1] AS newest_reason
FROM longitudinal
WHERE os = 'Linux'

To expand arrays and maps and work on the data in them row-wise we can use UNNEST(array):

WITH lengths AS
  (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days
   FROM longitudinal
   CROSS JOIN UNNEST(session_length, reason) AS t(sl, r)
   WHERE r = 'shutdown' OR r = 'aborted-session')
SELECT os, days, count(*) AS count
FROM lengths
GROUP BY days, os ORDER BY days ASC

Links:

Maps

Some fields like active_addons or user_prefs are handled as maps, on which you can use the [] operator and special functions:

WITH adp AS
  (SELECT active_addons[1]['{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}']
            IS NOT null AS has_adblockplus
   FROM longitudinal)
SELECT has_adblockplus, count(*) AS count
FROM adp GROUP BY 1 ORDER BY 2 DESC

Links:

Examples

  • Blocklist URLs (extensions.blocklist.url):
SELECT bl, COUNT(bl)
FROM
  (SELECT element_at(settings, 1).user_prefs['extensions.blocklist.url'] AS bl
   FROM longitudinal)
GROUP BY bl
  • Blocklist enabled/disabled (extensions.blocklist.enabled) count:
SELECT bl, COUNT(bl)
FROM
  (SELECT element_at(settings, 1).blocklist_enabled AS bl
   FROM longitudinal)
GROUP BY bl