Telemetry/LongitudinalExamples: Difference between revisions
Jump to navigation
Jump to search
Gfritzsche (talk | contribs) m (→Arrays: use subscript operator) |
Gfritzsche (talk | contribs) (Added UNNEST example) |
||
| Line 15: | Line 15: | ||
FROM longitudinal_v20160229 | FROM longitudinal_v20160229 | ||
WHERE os = 'Linux' | WHERE os = 'Linux' | ||
To expand arrays and maps and work on the data in them row-wise we can use <code>UNNEST(array)</code>: | |||
WITH lengths AS | |||
(SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days | |||
FROM longitudinal_v20160229 | |||
CROSS JOIN UNNEST(session_length, reason) AS t(sl, r) | |||
WHERE r = 'shutdown' OR r = 'aborted-session') | |||
SELECT os, days, count(*) AS count | |||
FROM lengths | |||
GROUP BY days, os ORDER BY days ASC | |||
Links: | Links: | ||
* [https://prestodb.io/docs/current/functions/array.html Documentation on array functions] | * [https://prestodb.io/docs/current/functions/array.html Documentation on array functions] | ||
* [https://prestodb.io/docs/current/sql/select.html#unnest <code>UNNEST</code> documentation] | |||
=== Maps === | === Maps === | ||
Revision as of 14:54, 8 March 2016
Note: There is good background in the example notebook for the longitudinal data set.
Get an overview of the longitudinal data table:
describe longitudinal_v20160229
That table has a row for each client, with columns for the different parts of the ping. Some properties are directly available to query on:
SELECT count(*) AS count FROM longitudinal_v20160229 WHERE os = 'Linux'
Arrays
Other properties are arrays, which contain one entry for each submission from that client (newest first):
SELECT reason[1] AS newest_reason FROM longitudinal_v20160229 WHERE os = 'Linux'
To expand arrays and maps and work on the data in them row-wise we can use UNNEST(array):
WITH lengths AS (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days FROM longitudinal_v20160229 CROSS JOIN UNNEST(session_length, reason) AS t(sl, r) WHERE r = 'shutdown' OR r = 'aborted-session') SELECT os, days, count(*) AS count FROM lengths GROUP BY days, os ORDER BY days ASC
Links:
Maps
Some fields like active_addons or user_prefs are handled as maps, on which you can use the [] operator and special functions:
WITH adp AS
(SELECT active_addons[1]['{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}']
IS NOT null AS has_adblockplus
FROM longitudinal_v20160229)
SELECT has_adblockplus, count(*) AS count
FROM adp GROUP BY 1 ORDER BY 2 DESC
Links:
Examples
- Blocklist URLs (extensions.blocklist.url):
SELECT bl, COUNT(bl) FROM (SELECT element_at(settings, 1).user_prefs['extensions.blocklist.url'] AS bl FROM longitudinal_v20160229) GROUP BY bl
- Blocklist enabled/disabled (extensions.blocklist.enabled) count:
SELECT bl, COUNT(bl) FROM (SELECT element_at(settings, 1).blocklist_enabled AS bl FROM longitudinal_v20160229) GROUP BY bl