Telemetry/LongitudinalExamples: Difference between revisions

→‎Sampling: removing reference to bernoulli sampling
(Fixing formatting)
(→‎Sampling: removing reference to bernoulli sampling)
Line 24: Line 24:
  SELECT * FROM longitudinal LIMIT 1000 ...


For a statistically sound sample, use TABLESAMPLE:
There's no need to use other sampling methods, such as TABLESAMPLE, on the longitudinal set. Rows are randomly ordered, so a LIMIT sample is expected to be random.
SELECT * FROM longitudinal TABLESAMPLE BERNOULLI(xx)
 
Where xx is an integer representing what percentage of data you want to include in your sample (e.g. 10% sample -> xx=10).
 
A couple of caveats:
* This sampling method only reduces query run time when the work done after sampling (joins, aggregations, and similar manipulation) dominates. Bernoulli sampling still reads the whole table before proceeding.
* The sample is not deterministic: you will not get the same sample on every run. This can cause problems when using Presto views or logical tables.
* Unlike LIMIT, this method does not guarantee a fixed number of results.
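
The caveats above can be illustrated with a small simulation. This is a minimal sketch in Python, not Presto's implementation: `bernoulli_sample` and `limit_sample` are hypothetical helpers, and the list `rows` is an assumed stand-in for the longitudinal table. It keeps each row independently with probability xx/100, which is why every row is visited, the result size varies, and two runs generally differ.

```python
import random


def bernoulli_sample(rows, percent, rng):
    """Keep each row independently with probability percent/100.

    Every row is visited once, mirroring the full read noted above.
    """
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]


def limit_sample(rows, n):
    """Mimic LIMIT n: take the first n rows in stored order."""
    return rows[:n]


# Hypothetical stand-in for table rows.
rows = list(range(10_000))

# Two runs with different random states model two executions of the query.
run_a = bernoulli_sample(rows, 10, random.Random(1))
run_b = bernoulli_sample(rows, 10, random.Random(2))

print(len(run_a), len(run_b))           # sizes near 1000, not guaranteed equal
print(run_a == run_b)                   # samples generally differ between runs
print(len(limit_sample(rows, 1000)))    # LIMIT returns a fixed number of rows
```

If the stored rows are already in random order, as the newer revision of this section states, the `limit_sample` result is itself a random sample with a fixed size.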


=== Arrays ===