54
edits
(Adding notes on working with AWS) |
(Organize table structure section) |
||
| Line 4: | Line 4: | ||
=== Table structure === | === Table structure === | ||
To get an overview of the longitudinal data table: | |||
DESCRIBE longitudinal | DESCRIBE longitudinal | ||
That table has a row for each client, with columns for the different parts of the ping. | That table has a row for each client, with columns for the different parts of the ping. | ||
There are a lot of fields here, so I recommend downloading the results as a CSV if you want to search through these fields. | There are a lot of fields here, so I recommend downloading the results as a CSV if you want to search through these fields. | ||
Unfortunately, there's no way to filter the | Unfortunately, there's no way to filter the output of DESCRIBE in Presto. | ||
Because this table combines all rows for a given client id, most columns contain either Arrays or Maps (described below). A few properties are directly available to query on: | |||
SELECT count(*) AS count | SELECT count(*) AS count | ||
FROM longitudinal | FROM longitudinal | ||
WHERE os = 'Linux' | WHERE os = 'Linux' | ||
==== Arrays ==== | |||
Most properties are arrays, which contain one entry for each submission from a given client (newest first). Note that indexing starts at 1: | |||
=== | |||
= | |||
SELECT reason[1] AS newest_reason | SELECT reason[1] AS newest_reason | ||
FROM longitudinal | FROM longitudinal | ||
WHERE os = 'Linux' | WHERE os = 'Linux' | ||
To expand arrays and maps and work on the data | To expand arrays and maps and work on the data row-wise we can use <code>UNNEST(array)</code>. | ||
WITH lengths AS | WITH lengths AS | ||
(SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days | (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days | ||
| Line 41: | Line 31: | ||
FROM lengths | FROM lengths | ||
GROUP BY days, os ORDER BY days ASC | GROUP BY days, os ORDER BY days ASC | ||
However, it may be better to use a sample from the main_summary table instead. | |||
Links: | Links: | ||
| Line 46: | Line 38: | ||
* [https://prestodb.io/docs/current/sql/select.html#unnest <code>UNNEST</code> documentation] | * [https://prestodb.io/docs/current/sql/select.html#unnest <code>UNNEST</code> documentation] | ||
=== Maps === | ==== Maps ==== | ||
Some fields like <code>active_addons</code> or <code>user_prefs</code> are handled as maps, on which you can use the <code>[]</code> operator and special functions: | Some fields like <code>active_addons</code> or <code>user_prefs</code> are handled as maps, on which you can use the <code>[]</code> operator and special functions: | ||
| Line 58: | Line 50: | ||
Links: | Links: | ||
* [https://prestodb.io/docs/current/functions/map.html Documentation on map functions] | * [https://prestodb.io/docs/current/functions/map.html Documentation on map functions] | ||
=== Sampling === | |||
While composing queries, it can be helpful to work on small samples to reduce query runtimes: | |||
SELECT * FROM longitudinal LIMIT 1000 ... | |||
There's no need to use other sampling methods, such as TABLESAMPLE, on the longitudinal set. Rows are randomly ordered, so a LIMIT sample is expected to be random. | |||
=== Example Queries === | === Example Queries === | ||
edits