|
|
| Line 24: | | Line 24: |
| SELECT * FROM longitudinal LIMIT 1000 ... | | SELECT * FROM longitudinal LIMIT 1000 ... |
|
| |
|
| For a statistically sound sample, use TABLESAMPLE:
| | There's no need to use other sampling methods, such as TABLESAMPLE, on the longitudinal set. Rows are randomly ordered, so a LIMIT sample is expected to be random. |
|
| |
| SELECT * FROM longitudinal TABLESAMPLE BERNOULLI(xx)
| |
| | |
| Where xx is an integer giving the percentage of rows to include in your sample (e.g. xx=10 for a 10% sample).
| |
| | |
| A couple of caveats:
| |
| * This sampling method only decreases your query run time if the processing after sampling is expensive. Bernoulli sampling still requires scanning the whole table before proceeding.
| |
| * This sample is not deterministic, i.e. you will not get the same sample on every run. This can cause problems when using Presto views or logical tables.
| |
| * Unlike LIMIT, this method does not guarantee a fixed number of rows in the result.
| |
|
| |
|
| === Arrays === | | === Arrays === |