Telemetry/LongitudinalExamples: Difference between revisions

Jump to navigation Jump to search
Organize table structure section
(Adding notes on working with AWS)
(Organize table structure section)
Line 4: Line 4:
=== Table structure ===
=== Table structure ===


Get an overview of the longitudinal data table:
To get an overview of the longitudinal data table:
  DESCRIBE longitudinal
  DESCRIBE longitudinal


That table has a row for each client, with columns for the different parts of the ping.
That table has a row for each client, with columns for the different parts of the ping.
There are a lot of fields here, so I recommend downloading the results as a CSV if you want to search through these fields.
There are a lot of fields here, so I recommend downloading the results as a CSV if you want to search through these fields.
Unfortunately, there's no way to filter the outputof DESCRIBE in Presto.
Unfortunately, there's no way to filter the output of DESCRIBE in Presto.


Some properties are directly available to query on:
Because this table combines all rows for a given client id, most columns contain either Arrays or Maps (described below). A few properties are directly available to query on:
  SELECT count(*) AS count
  SELECT count(*) AS count
  FROM longitudinal
  FROM longitudinal
  WHERE os = 'Linux'
  WHERE os = 'Linux'


Note that the longitudinal table is from a 1% sample of the clients. The sampling is based on the client id and stable over time.
==== Arrays ====
 
Most properties are arrays, which contain one entry for each submission from a given client (newest first). Note that indexing starts at 1:
=== Sampling ===
 
While composing queries, it can be helpful to work on small samples to reduce query runtimes:
 
SELECT * FROM longitudinal LIMIT 1000 ...
 
There's no need to use other sampling methods, such as TABLESAMPLE, on the longitudinal set. Rows are randomly ordered, so a LIMIT sample is expected to be random.
 
=== Arrays ===
Some properties are arrays, which contain one entry for each submission from that client (newest first):
  SELECT reason[1] AS newest_reason
  SELECT reason[1] AS newest_reason
  FROM longitudinal
  FROM longitudinal
  WHERE os = 'Linux'
  WHERE os = 'Linux'


To expand arrays and maps and work on the data in them row-wise we can use <code>UNNEST(array)</code>:
To expand arrays and maps and work on the data row-wise we can use <code>UNNEST(array)</code>.
  WITH lengths AS
  WITH lengths AS
   (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days
   (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days
Line 41: Line 31:
  FROM lengths
  FROM lengths
  GROUP BY days, os ORDER BY days ASC
  GROUP BY days, os ORDER BY days ASC
However, it may be better to use a sample from the main_summary table instead.


Links:
Links:
Line 46: Line 38:
* [https://prestodb.io/docs/current/sql/select.html#unnest <code>UNNEST</code> documentation]
* [https://prestodb.io/docs/current/sql/select.html#unnest <code>UNNEST</code> documentation]


=== Maps ===
==== Maps ====
Some fields like <code>active_addons</code> or <code>user_prefs</code> are handled as maps, on which you can use the <code>[]</code> operator and special functions:
Some fields like <code>active_addons</code> or <code>user_prefs</code> are handled as maps, on which you can use the <code>[]</code> operator and special functions:


Line 58: Line 50:
Links:
Links:
* [https://prestodb.io/docs/current/functions/map.html Documentation on map functions]
* [https://prestodb.io/docs/current/functions/map.html Documentation on map functions]
=== Sampling ===
While composing queries, it can be helpful to work on small samples to reduce query runtimes:
SELECT * FROM longitudinal LIMIT 1000 ...
There's no need to use other sampling methods, such as TABLESAMPLE, on the longitudinal set. Rows are randomly ordered, so a LIMIT sample is expected to be random.


=== Example Queries ===
=== Example Queries ===
54

edits

Navigation menu