CloudServices/Sync/ReDash

Table Described

The Sync team's telemetry goes through our data pipeline and lands in Presto to be explored in Re:Dash. Below is a description of the data available in the sync_summary table. Further documentation on this data is available in the telemetry-batch-view repo, and in the general sync ping documentation.

Field Names	Data Type	Description
app_build_id	varchar	Firefox build ID (e.g. 20160801085519)
app_version	varchar	Firefox version (e.g. 50.0a2) - corresponds to the MOZ_APP_VERSION configure value
app_display_version	varchar	The application version as shown in the about dialog. Almost always identical to app_version. Corresponds to the MOZ_APP_VERSION_DISPLAY configure value.
app_name	varchar	Will always be the string "Firefox" in desktop submitted pings.
app_channel	varchar	The update channel (e.g. "release", "nightly", etc) - corresponds to the MOZ_UPDATE_CHANNEL config variable.
uid	varchar	Hashed Sync/FxA ID
deviceid	varchar	Hashed FxA device ID.
when	bigint	Unix timestamp of when sync occurred. Make sure to put in "quotes" since when is a reserved SQL word. Note that because this is taken from the client's clock, the time may be wildly inaccurate.
took	bigint	Number of milli-seconds it took to Sync.
failure_reason	row(name varchar, value varchar)	Sync failure reason, or null if no failure.
status	row(sync varchar, service varchar)	The status of sync after completion, or null is both statuses record success.
why	varchar	Currently always null, but eventually should be the reason the sync was performed (eg, timer, button press, score update, etc)
devices	array(row(id varchar, os varchar, version varchar))	Array of the other devices in this user's device constellation.
engines	array(engine_record)	A record of the engines that synced. Each element of the array is in the format of an engine record.
submission_date_s3	varchar	The date this ping was submitted to the telemetry servers. Because a ping will typically be sent for a previous session immediately after a browser restart, this submission date may be later than the date recorded in the ping. Note also that this is a timestamp supplied by the server so is likely to be more reliable than the dates recorded in the ping itself.

Engine Record

An engine record is defined as:

Field Name	Data Type	Description
name	varchar	The name of the engine.
took	bigint	How many milliseconds this engine took to sync.
status	varchar	The status of this engine after sync completed, or null is the status reflects success.
failure_reason	row(name varchar, value varchar)	Details of any errors recorded during the sync of this engine, or null on success.
incoming	row(applied bigint, failed bigint, newFailed bigint, reconciled bigint)	Details of how many incoming records were applied, failed, failed for the first time in this sync, and reconciled.
outgoing	array(row(sent bigint, failed bigint)))	For each batch of records uploaded as part of a Sync, how many records were sent in that batch and how many failed.
validation	validation_record	Validation information for this engine. Null if validation cannot or did not run during this sync (common). If present, it's format is of a validation record

Validation Record

A validation record is defined as:

Field Name	Data Type	Description
version	bigint	Version of the validator used to get this data.
checked	bigint	Number of records the validator checked in this engine.
took	bigint	How long validation took for this engine.
problems	array(row(name varchar, count varchar))	The problems identified. Problems with a count of 0 are excluded. Null on failure or if no problems occurred.
failure_reason	row(name varchar, value varchar)	Details of any errors recorded during the validation, or null if validation succeeded.

FAQ

Q1: In engines, we have status and failureReason, how are they different from the columns with the same names?

A1: These are the exceptions and status for the engine itself, whereas the columns at the top-level of the table are for the entire sync. The error handling should be cleaned up, but in general, failureReason will be reporting bugs, whereas "status" is reporting when we explicitly decided that we couldn't apply a record.

---

Q2: Do we only log the engines array when we see "service":"error.sync.failed_partial" in status?

A2: I don't think that's true - eg, "select * from sync_summary where engines is not null and status is null limit 10" shows records.

---

Q3: What values are valid in the validation record's name field?

A3: It's up to the engine and to the client. For desktop bookmarks, they're documented here (or here if that link breaks).

Event Telemetry

Sync will often write "events" to telemetry. These are made available via the sync_events_v1 table.

Table Description

Field Names	Data Type	Description
document_id	varchar	The document ID of the original ping this event is from.
app_build_id	varchar	From the original ping
app_display_version	varchar	From the original ping
app_name	varchar	From the original ping
app_version	varchar	From the original ping
app_channel	varchar	From the original ping
uid	varchar	From the original ping
why	varchar	From the original ping
deviceId	varchar	From the original ping
event_timestamp	bigint	Number of milliseconds since the process started when the event was created. (??)
event_category	varchar	Always the string "sync"
event_method	varchar	The type of action taken on the object
event_object	varchar	The type of object being acted on
event_string_value	varchar	Additional data
event_map_values	map(varchar, varchar)	Additional data specific to this event.
event_flow_id	varchar	An ID used to identify a pair of commands which span multiple devices (todo: explain this!)
event_device_id	varchar	If this event is being targeted at a different sync device, this is the ID of the target.
event_device_version	varchar	If this event is being targeted at a different sync device, this is the Firefox version it is running.
event_device_os	varchar	If this event is being targeted at a different sync device, this is the OS it is running
submission_date_s3	varchar	The date this ping was submitted.

Event Usage

Commands sent between clients

Sometimes Sync will send a "command" from one device to another. The most obvious example of this is "send tab". Event telemetry is recorded both by the sending device and the receiving device.

In all cases, the sending device will create a unique "flowID" which will appear in event_flow_id, and it will send this value as part of the command. The device receiving the command will record this flow ID in the event it creates - so both the sending and receiving device will record the same flowID.

In all cases, the sending device will record in the event the ID of the device that is to receive the command. However, the receiving device does not record the device that sent the command.

The sending device will record: event_object="sendcommand", event_method will be the specific command being sent (eg, "displayURI" for sendTab).

The receiving device will record: event_object="processcommand", event_method will be the command being sent.

Commands sent by the repair process

Bookmark repair is fairly chatty when it comes to events - both the device initiating the repair and any other devices participating in the repair process record events, and they all use the same flowID - so it should be possible to reconstruct the repair process by looking for all events with the same flow ID.

The general repair process is:

One device finds problems and starts a repair. It emits an event event_object="repair", event_method="started".
It then attempts to find other devices able to help with the repair - if it finds any, it will emit event_object="repair", event_method="request", event_string_value="upload", and includes the device ID of the device it wants to respond - and waits.
Hopefully the other device will see the repair request. If it finds any objects it can upload, it will emit an event event_object="repairResponse", event_method="uploading".
Once the upload is complete, it will write an event event_object="repairResponse", event_method="finished".
If the repair responder fails for some reason, it will emit event_object="repairResponse", event_method="failed" or "aborted"
If initial device that sent the repair request doesn't see a response from the other device it will emit event_object="repair", event_method="silent". If the other device disappears completely, it will write event_object="repair", event_method="missing".
The initial device will then check to see if it is still missing items, and if so, attempt to find a new client. If it can, it jumps back to (2) above.
The initial device will eventually run out of clients, or will find all of its IDs. At that time it will emit event_object="repair", event_method="finished" - or if it gives up due to some error it will emit event_object="repair", event_method="failed"

whew! So in summary:

every repair should emit event_object="repair", event_method="started" as it starts, and either event_object="repair", event_method="finished/failed" as the last event.
There may be any number of repairResponse events written (including zero events if there are no suitable devices). Each of these should result in a event_object="repairResponse", event_method="finished/aborted"

Bookmark Validation Data Roll-ups

telemetry.sync_bmk_total_per_day

This table is a roll-up of all bookmark validations (success and failures).

This table is useful to calculate rates in conjunction with the telemetry.sync_bmk_validation_problems table.

Column name	Data format	Description
sync_day	integer (e.g. 20170518)	The day we ran validation on the user
total_bookmark_validations	integer	Number of times we ran bookmark validation per day. This includes successful validations.
total_validated_users	integer	Number of users we ran bookmark validation on for each day.
total_bookmarks_checked	integer	Number of bookmarks we checked for validation problems.
run_start_date	integer	Date we imported the data to Presto.

telemetry.sync_bmk_validation_problems

This table was rolled-up using the following workbook: https://gist.github.com/kitcambridge/364f56182f3e96fb3131bf38ff648609

This table has a row for every validation problem detected during validation.

Example: If a user has 4 different validation problems, we will log each problem in their own row.

Logic used for format: This choice was made to avoid having endless amounts of columns representing each validation problem checked (which would mostly all be null) and to avoid having to add columns as we detect new types of validation problems.

Field name	Data type	Description
app_build_id	varchar	Firefox build ID (e.g. 20160801085519)
app_version	varchar	Firefox version (e.g. 50.0a2) - corresponds to the MOZ_APP_VERSION configure value
app_display_version	varchar	The application version as shown in the about dialog. Almost always identical to app_version. Corresponds to the MOZ_APP_VERSION_DISPLAY configure value.
app_name	varchar	Will always be the string "Firefox" in desktop submitted pings.
app_channel	varchar	The update channel (e.g. "release", "nightly", etc) - corresponds to the MOZ_UPDATE_CHANNEL config variable.
uid	varchar	Hashed Sync/FxA ID
deviceid	varchar	Hashed FxA device ID.
submission_day	integer	Date data was added to Presto (e.g. 20170324)
sync_day	integer	Date of sync. (e.g. 20170320)
when	bigint	Unix timestamp of when sync occurred. Make sure to put in "quotes" since when is a reserved SQL word. Note that because this is taken from the client's clock, the time may be wildly inaccurate.
status	row(sync varchar, service varchar)	The status of sync after completion, or null is both statuses record success.
engine_name	varchar	Should always be bookmarks in this table.
engine_status	varchar	The status of this engine after sync completed, or null if the status reflects success.
engine_failure_reason	row(name varchar, value varchar)	Details of any errors recorded during the sync of this engine, or null on success.
engine_has_problems	true/false	engine_has_problems is always going to be true in sync_bmk_validation_problems. Used to filter results out of all_engine_validation_results
engine_validation_version	integer	Version of the validator used to get this data.
engine_validation_checked	bigint	Number of records the validator checked in this engine.
engine_validation_took	bigint	How long validation took for this engine.
engine_validation_problem_name	varchar	Name of error recorded during the validation. Will never be null since this table only includes validation problems.
engine_validation_problem_count	integer	Number of bookmarks afflicted by the problem during the validation check.
run_start_date	integer	Date added to presto. (e.g. 20170331)

Query Examples

The example below demonstrates how to select data in JSON object.

WITH errors AS (
  SELECT 
  	failurereason.name AS name, 
  	failurereason.value AS value 
  FROM sync_summary 
  WHERE failurereason IS NOT NULL
)
SELECT 
	name, 
	COUNT(value) 
FROM errors 
GROUP BY name

The next example shows how to handle unix time stamps and how to use the Re:Dash date picker. It's important to either cast the date or to use the type constructor like below or you won't be able to use any operators on the date which is required for the date picker.

WITH syncs AS (
	SELECT
		/* convert date from unix timestamp */
		date_trunc('day', from_unixtime("when"/1000)) AS day,
  		status.service AS service
	FROM
		sync_summary
)
SELECT day, status, volume
FROM (
  SELECT
    day,
    'success' as status,
    SUM(
      CASE 
      WHEN service IS NULL THEN 1 
      ELSE 0 
      END
    ) AS volume
  FROM syncs
  GROUP BY day

  UNION ALL

  SELECT
    day,
    'failed' as status,
    SUM(
      CASE 
      WHEN service IS NOT NULL THEN 1 
      ELSE 0 
      END
    ) AS volume
  FROM syncs
  GROUP BY day
)
/* date picker */
WHERE day >= timestamp '{{start_date}}' AND day <= timestamp '{{end_date}}'

GROUP BY 1,2,3
ORDER BY 1,2,3

This example is how you would unpack the engines array into it's own table to then query:

WITH engine_errors AS (
  SELECT 
  	uid, 
  	date_trunc('day', from_unixtime("when"/1000)) AS date,
  	engine
  FROM sync_summary
/* The CROSS JOIN UNNEST will join the array to each row */
  CROSS JOIN UNNEST(engines) AS t (engine)
  --LIMIT 1000000
)

SELECT 
	engine.name AS engine_name,
	SUM(
      CASE
      WHEN engine.failureReason IS NOT NULL THEN 1
      ELSE 0
      END
    ) AS errors
FROM engine_errors
WHERE date >= timestamp '{{start_date}}' AND date <= timestamp '{{end_date}}'
GROUP BY engine.name

CloudServices/Sync/ReDash

Contents

Table Described

Engine Record

Validation Record

Event Telemetry

Table Description

Event Usage

Commands sent between clients

Commands sent by the repair process

Bookmark Validation Data Roll-ups

telemetry.sync_bmk_total_per_day

telemetry.sync_bmk_validation_problems

Query Examples

Navigation menu

CloudServices/Sync/ReDash

Table Described

Engine Record

Validation Record

Event Telemetry

Table Description

Event Usage

Commands sent between clients

Commands sent by the repair process

Bookmark Validation Data Roll-ups

telemetry.sync_bmk_total_per_day

telemetry.sync_bmk_validation_problems

Query Examples

Navigation menu

Search