Telemetry/LongitudinalExamples: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
(→‎Table structure: Explain that this is a 1% sample)
(Deprecate wiki based d10n)
 
(14 intermediate revisions by 4 users not shown)
Line 1: Line 1:
Note: There is good background in the [https://gist.github.com/vitillo/627eab7e2b3f814725d2 example notebook] for the longitudinal data set.
This document is now hosted here:
https://github.com/mozilla/telemetry-batch-view/blob/master/docs/longitudinal_examples.md


=== Table structure ===
Wiki pages linking to this dead page:
 
https://wiki.mozilla.org/api.php?action=query&list=backlinks&bltitle=Telemetry/LongitudinalExamples
Get an overview of the longitudinal data table:
describe longitudinal
 
That table has a row for each client, with columns for the different parts of the ping.
Some properties are directly available to query on:
SELECT count(*) AS count
FROM longitudinal
WHERE os = 'Linux'
 
Note that the longitudinal table is from a 1% sample of the clients. The sampling is based on the client id and stable over time.
 
=== Sampling ===
 
While composing queries, it can be helpful to work on small samples to reduce query runtimes:
 
SELECT * FROM longitudinal LIMIT 1000 ...
 
Or to look at a 1% sample of the clients:
 
SELECT * FROM longitudinal WHERE sample_id[1] = 5 ...
 
The sample_id partitions the clients into stable ~1% samples.
 
=== Arrays ===
Other properties are arrays, which contain one entry for each submission from that client (newest first):
SELECT reason[1] AS newest_reason
FROM longitudinal
WHERE os = 'Linux'
 
To expand arrays and maps and work on the data in them row-wise we can use <code>UNNEST(array)</code>:
WITH lengths AS
  (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days
    FROM longitudinal
    CROSS JOIN UNNEST(session_length, reason) AS t(sl, r)
    WHERE r = 'shutdown' OR r = 'aborted-session')
SELECT os, days, count(*) AS count
FROM lengths
GROUP BY days, os ORDER BY days ASC
 
Links:
* [https://prestodb.io/docs/current/functions/array.html Documentation on array functions]
* [https://prestodb.io/docs/current/sql/select.html#unnest <code>UNNEST</code> documentation]
 
=== Maps ===
Some fields like <code>active_addons</code> or <code>user_prefs</code> are handled as maps, on which you can use the <code>[]</code> operator and special functions:
 
WITH adp AS
  (SELECT active_addons[1]['{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}']
            IS NOT null AS has_adblockplus
    FROM longitudinal)
SELECT has_adblockplus, count(*) AS count
FROM adp GROUP BY 1 ORDER BY 2 DESC
 
Links:
* [https://prestodb.io/docs/current/functions/map.html Documentation on map functions]
 
=== Examples ===
* Blocklist URLs (extensions.blocklist.url):
SELECT bl, COUNT(bl)
FROM
  (SELECT element_at(settings, 1).user_prefs['extensions.blocklist.url'] AS bl
    FROM longitudinal)
GROUP BY bl
 
* Blocklist enabled/disabled (extensions.blocklist.enabled) count:
SELECT bl, COUNT(bl)
FROM
  (SELECT element_at(settings, 1).blocklist_enabled AS bl
    FROM longitudinal)
GROUP BY bl

Latest revision as of 22:11, 7 November 2016