CloudServices/Sync/ReDash: Difference between revisions
(→Engine Record: added q2 and a2) |
(→Query Examples: added query example) |
||
| Line 79: | Line 79: | ||
WHERE failurereason IS NOT NULL | WHERE failurereason IS NOT NULL | ||
) | ) | ||
SELECT | SELECT | ||
name, | name, | ||
| Line 85: | Line 84: | ||
FROM errors | FROM errors | ||
GROUP BY name | GROUP BY name | ||
</pre> | |||
The next example shows how to handle unix time stamps and how to use the Re:Dash date picker. It's important to either cast the date or to use the type constructor like below or you won't be able to use any operators on the date which is required for the date picker. | |||
<pre> | |||
WITH syncs AS ( | |||
SELECT | |||
/* convert date from unix timestamp */ | |||
date_trunc('day', from_unixtime("when"/1000)) AS day, | |||
status.service AS service | |||
FROM | |||
sync_summary | |||
) | |||
SELECT day, status, volume | |||
FROM ( | |||
SELECT | |||
day, | |||
'success' as status, | |||
SUM( | |||
CASE | |||
WHEN service IS NULL THEN 1 | |||
ELSE 0 | |||
END | |||
) AS volume | |||
FROM syncs | |||
GROUP BY day | |||
UNION ALL | |||
SELECT | |||
day, | |||
'failed' as status, | |||
SUM( | |||
CASE | |||
WHEN service IS NOT NULL THEN 1 | |||
ELSE 0 | |||
END | |||
) AS volume | |||
FROM syncs | |||
GROUP BY day | |||
) | |||
/* date picker */ | |||
WHERE day >= timestamp '{{start_date}}' AND day <= timestamp '{{end_date}}' | |||
GROUP BY 1,2,3 | |||
ORDER BY 1,2,3 | |||
</pre> | </pre> | ||
Revision as of 16:33, 19 October 2016
Table Described
The Sync team's telemetry goes through our data pipeline and lands in Presto to be explored in Re:Dash. Below is a description of the data available in the sync_summary table.
| Field Names | Data Type | Description |
|---|---|---|
| app_build_id | varchar | Firefox build ID (e.g. 20160801085519) |
| app_version | varchar | Firefox version (e.g. 50.0a2) - corresponds to the MOZ_APP_VERSION configure value |
| app_display_version | varchar | The application version as shown in the about dialog. Almost always identical to app_version. Corresponds to the MOZ_APP_VERSION_DISPLAY configure value. |
| app_name | varchar | Will always be the string "Firefox" in desktop submitted pings. |
| uid | varchar | Hashed Sync/FxA ID |
| deviceid | varchar | Hashed FxA device ID. |
| when | bigint | Unix timestamp of when sync occurred. Make sure to put in "quotes" since when is a reserved SQL word. Note that because this is taken from the client's clock, the time may be wildly inaccurate. |
| took | bigint | Number of milli-seconds it took to Sync. |
| failurereason | row(name varchar, value varchar) | Sync failure reason, or null if no failure. |
| status | row(sync varchar, service varchar) | The status of sync after completion, or null is both statuses record success. |
| why | varchar | Currently always null, but eventually should be the reason the sync was performed (eg, timer, button press, score update, etc) |
| engines | array(engine_record) | A record of the engines that synced. Each element of the array is in the format of an engine record. |
| submission_date_s3 | varchar | The date this ping was submitted to the telemetry servers. Because a ping will typically be sent for a previous session immediately after a browser restart, this submission date may be later than the date recorded in the ping. Note also that this is a timestamp supplied by the server so is likely to be more reliable than the dates recorded in the ping itself. |
Engine Record
An engine record is defined as:
| Field Name | Data Type | Description |
|---|---|---|
| name | varchar | The name of the engine. |
| took | bigint | How many milliseconds this engine took to sync. |
| status | varchar | The status of this engine after sync completed, or null is the status reflects success. |
| failureReason | row(name varchar, value varchar) | Details of any errors recorded during the sync of this engine, or null on success. |
| incoming | row(applied bigint, failed bigint, newFailed bigint, reconciled bigint) | Details of how many incoming records were applied, failed, failed for the first time in this sync, and reconciled. |
| outgoing | array(row(sent bigint, failed bigint))) | For each batch of records uploaded as part of a Sync, how many records were sent in that batch and how many failed. |
FAQ
Q1: In engines, we have status and failureReason, how are they different from the columns with the same names?
A1: These are the exceptions and status for the engine itself, whereas the columns at the top-level of the table are for the entire sync. The error handling should be cleaned up, but in general, failureReason will be reporting bugs, whereas "status" is reporting when we explicitly decided that we couldn't apply a record.
---
Q2: Do we only log the engines array when we see "service":"error.sync.failed_partial" in status?
A2: I don't think that's true - eg, "select * from sync_summary where engines is not null and status is null limit 10" shows records.
Query Examples
The example below demonstrates how to select data in JSON object.
WITH errors AS ( SELECT failurereason.name AS name, failurereason.value AS value FROM sync_summary WHERE failurereason IS NOT NULL ) SELECT name, COUNT(value) FROM errors GROUP BY name
The next example shows how to handle unix time stamps and how to use the Re:Dash date picker. It's important to either cast the date or to use the type constructor like below or you won't be able to use any operators on the date which is required for the date picker.
WITH syncs AS (
SELECT
/* convert date from unix timestamp */
date_trunc('day', from_unixtime("when"/1000)) AS day,
status.service AS service
FROM
sync_summary
)
SELECT day, status, volume
FROM (
SELECT
day,
'success' as status,
SUM(
CASE
WHEN service IS NULL THEN 1
ELSE 0
END
) AS volume
FROM syncs
GROUP BY day
UNION ALL
SELECT
day,
'failed' as status,
SUM(
CASE
WHEN service IS NOT NULL THEN 1
ELSE 0
END
) AS volume
FROM syncs
GROUP BY day
)
/* date picker */
WHERE day >= timestamp '{{start_date}}' AND day <= timestamp '{{end_date}}'
GROUP BY 1,2,3
ORDER BY 1,2,3