Telemetry/LongitudinalExamples: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
m (→‎Arrays: use subscript operator)
(Added UNNEST example)
Line 15: Line 15:
  FROM longitudinal_v20160229
  FROM longitudinal_v20160229
  WHERE os = 'Linux'
  WHERE os = 'Linux'
To expand arrays and maps and work on the data in them row-wise we can use <code>UNNEST(array)</code>:
WITH lengths AS
  (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days
    FROM longitudinal_v20160229
    CROSS JOIN UNNEST(session_length, reason) AS t(sl, r)
    WHERE r = 'shutdown' OR r = 'aborted-session')
SELECT os, days, count(*) AS count
FROM lengths
GROUP BY days, os ORDER BY days ASC


Links:
Links:
* [https://prestodb.io/docs/current/functions/array.html Documentation on array functions]
* [https://prestodb.io/docs/current/functions/array.html Documentation on array functions]
* [https://prestodb.io/docs/current/sql/select.html#unnest <code>UNNEST</code> documentation]


=== Maps ===
=== Maps ===

Revision as of 14:54, 8 March 2016

Note: There is good background in the example notebook for the longitudinal data set.

Get an overview of the longitudinal data table:

describe longitudinal_v20160229

That table has a row for each client, with columns for the different parts of the ping. Some properties are directly available to query on:

SELECT count(*) AS count
FROM longitudinal_v20160229
WHERE os = 'Linux'

Arrays

Other properties are arrays, which contain one entry for each submission from that client (newest first):

SELECT reason[1] AS newest_reason
FROM longitudinal_v20160229
WHERE os = 'Linux'

To expand arrays and maps and work on the data in them row-wise we can use UNNEST(array):

WITH lengths AS
  (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days
   FROM longitudinal_v20160229
   CROSS JOIN UNNEST(session_length, reason) AS t(sl, r)
   WHERE r = 'shutdown' OR r = 'aborted-session')
SELECT os, days, count(*) AS count
FROM lengths
GROUP BY days, os ORDER BY days ASC

Links:

Maps

Some fields like active_addons or user_prefs are handled as maps, on which you can use the [] operator and special functions:

WITH adp AS
  (SELECT active_addons[1]['{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}']
            IS NOT null AS has_adblockplus
   FROM longitudinal_v20160229)
SELECT has_adblockplus, count(*) AS count
FROM adp GROUP BY 1 ORDER BY 2 DESC

Links:

Examples

  • Blocklist URLs (extensions.blocklist.url):
SELECT bl, COUNT(bl)
FROM
  (SELECT element_at(settings, 1).user_prefs['extensions.blocklist.url'] AS bl
   FROM longitudinal_v20160229)
GROUP BY bl
  • Blocklist enabled/disabled (extensions.blocklist.enabled) count:
SELECT bl, COUNT(bl)
FROM
  (SELECT element_at(settings, 1).blocklist_enabled AS bl
   FROM longitudinal_v20160229)
GROUP BY bl