Telemetry/LongitudinalExamples: Difference between revisions

Telemetry/LongitudinalExamples (view source)

Revision as of 00:21, 1 September 2016

108 bytes added , 1 September 2016

Organize table structure section

Harter

54

edits

@@ Line 4: / Line 4: @@
 === Table structure ===
-Get an overview of the longitudinal data table:
+To get an overview of the longitudinal data table:
   DESCRIBE longitudinal
 That table has a row for each client, with columns for the different parts of the ping.
 There are a lot of fields here, so I recommend downloading the results as a CSV if you want to search through these fields.
-Unfortunately, there's no way to filter the outputof DESCRIBE in Presto.
+Unfortunately, there's no way to filter the output of DESCRIBE in Presto.
-Some properties are directly available to query on:
+Because this table combines all rows for a given client id, most columns contain either Arrays or Maps (described below). A few properties are directly available to query on:
   SELECT count(*) AS count
   FROM longitudinal
   WHERE os = 'Linux'
-Note that the longitudinal table is from a 1% sample of the clients. The sampling is based on the client id and stable over time.
+==== Arrays ====
+Most properties are arrays, which contain one entry for each submission from a given client (newest first). Note that indexing starts at 1:
-=== Sampling ===
-While composing queries, it can be helpful to work on small samples to reduce query runtimes:
- SELECT * FROM longitudinal LIMIT 1000 ...
-There's no need to use other sampling methods, such as TABLESAMPLE, on the longitudinal set. Rows are randomly ordered, so a LIMIT sample is expected to be random.
-=== Arrays ===
-Some properties are arrays, which contain one entry for each submission from that client (newest first):
   SELECT reason[1] AS newest_reason
   FROM longitudinal
   WHERE os = 'Linux'
-To expand arrays and maps and work on the data in them row-wise we can use <code>UNNEST(array)</code>:
+To expand arrays and maps and work on the data row-wise we can use <code>UNNEST(array)</code>.
   WITH lengths AS
     (SELECT os, greatest(-1, least(31, sl / (24*60*60))) AS days
@@ Line 41: / Line 31: @@
   FROM lengths
   GROUP BY days, os ORDER BY days ASC
+However, it may be better to use a sample from the main_summary table instead.
 Links:
@@ Line 46: / Line 38: @@
 * [https://prestodb.io/docs/current/sql/select.html#unnest <code>UNNEST</code> documentation]
-=== Maps ===
+==== Maps ====
 Some fields like <code>active_addons</code> or <code>user_prefs</code> are handled as maps, on which you can use the <code>[]</code> operator and special functions:
@@ Line 58: / Line 50: @@
 Links:
 * [https://prestodb.io/docs/current/functions/map.html Documentation on map functions]
+=== Sampling ===
+While composing queries, it can be helpful to work on small samples to reduce query runtimes:
+ SELECT * FROM longitudinal LIMIT 1000 ...
+There's no need to use other sampling methods, such as TABLESAMPLE, on the longitudinal set. Rows are randomly ordered, so a LIMIT sample is expected to be random.
 === Example Queries ===

Telemetry/LongitudinalExamples: Difference between revisions

Telemetry/LongitudinalExamples (view source)

Revision as of 00:21, 1 September 2016

Navigation menu

Search