Connected Devices/Projects/Metrics/Database and Visualization

From MozillaWiki
Jump to: navigation, search

Summary

We are using a combination of Google Analytics, Google BigQuery database, and Re:Dash to enable teams to view & analyze project data. This page describes what data we have and how to access it.

BigQuery

BigQuery is Google's cloud-based big data solution. We are using it as a store for all of the data sent to Google Analytics so that teams can view the full detail of all events sent by their application, rather than the aggregated views that are presented in the Google Analytics dashboards.

Data is imported in batches from Analytics several times per day. Note that there may be a delay of up to 8 hours before event data is exported and made available for you to report on, though it usually seems to arrive within 2 hours.

BigQuery supports a subset of SQL, covering most of the common clauses and operators. For a full reference, see https://cloud.google.com/bigquery/query-reference.

Table Schema

ga_sessions Tables

The Analytics data is exported into a series of tables, which are named as follows:

  • ga_sessions_YYYYMMDD - This table contains all event data recorded on the date in the table name. A new table is created for each day's data.
  • ga_sessions_intraday_YYYYMMDD - This table contains all event data that has been exported for the current day, and batches of data will be appended to it several times during the day.

If you are familiar with Google Analytics data and/or working with multi-dimensional data, you are welcome to query these tables directly. You can find the schema reference here: https://support.google.com/analytics/answer/3437719

Simplified Views

With the Connected Devices metrics solution, we are using a relatively small subset of the data available in Google Analytics. To simplify the picture, we have created aggregated views from the larger daily tables to allow easier access to the information. These are flattened representations of the data and can be queried with simple SQL.

There are currently two primary views available:

  • AllEvents_7Day - This view contains all events recorded over the past 7 days, including today's intraday events.
  • AllEvents_30Day - This view contains all events recorded over the past 30 days, including today's intraday events.

The schema for these tables is identical:

Column Data Type Description
HitTimestamp String The UTC datetime in which the event was received by Google Analytics. This is a timestamp with the format "YYYY-MM-DD 24:MM:SS", e.g. "2016-06-11 21:00:54".
HitTimestampUsec Integer The UTC time (in microseconds) in which the event was received by Google Analytics, e.g. 1465678854000000.
EventTimestamp String The UTC datetime in which the event was recorded by the Metrics library, if you are using that method to record your events. Because the library may batch up your events before sending them to GA, this can be a more accurate time of the event occurrence. If you are posting your events directly via URL, you can disregard this column and use HitTimestamp instead.
FullVisitorId String A unique ID generated for the user, based on (but not equal to) the 'clientId' parameter in the API. Repeated events from the same clientId will result in the same FullVisitorId value.
VisitId Integer A unique identifier for this visit/session. Sessions seem to time out when no new events have been seen for ~6 minutes.
VisitNumber Integer Incremented with each distinct visit/session for this clientID.
HitNumber Integer Incremented for each event recorded within this visit/session.
AppName String The 'app_name' parameter.
EventCategory String The 'event_category' parameter.
EventAction String The 'event_action' parameter.
EventLabel String The 'event_label' parameter.
EventValue Integer The 'event_value' parameter.
EventValueFP String The event value when recording a floating point value rather than an integer. This is stored as a string due to limitations in the Analytics data export, but it is generally interpreted correctly by charting tools and can be cast with FLOAT(eventValueFP) if necessary.
OS String The 'os' parameter.
OSVersion String The 'os_version' parameter.
Device String The 'device' parameter.
Arch String The 'arch' parameter.
AppPlatform String The 'app_platform' parameter.
AppBuildID String The 'app_build_id' parameter.

(For reference, the queries used to create these views can be found at: https://github.com/dylano/metrics/tree/master/sql)

If these views are insufficient for your project or you'd like to discuss other ways to view or use the data, feel free to contact the CD Metrics team. (#cd-metrics, :doliver)

Query & Visualization Tools

Google Analytics Dashboards

The GA Dashboard allows you to see real-time event traffic, which is a handy tool to verify that your data is hitting the server. However, much of the dashboard functionality is tailored to the more common use case of website tracking and is less useful for the type of events we are measuring in CD.

If you would like access to GA, please file a bug in the following component and request access to the Connected Devices project: https://bugzilla.mozilla.org/enter_bug.cgi?product=Websites&component=Web%20Analytics

Re:dash

https://sql.telemetry.mozilla.org

Re:dash is a nice open source tool which we have connected to the CD Metrics database. It is suitable for both general queries of the data and for creating charts & other visualizations of your data.

The documentation (http://docs.redash.io/en/latest/) is not great but the learning curve is not too bad and it's being used by several teams at Mozilla. We are happy to work with you on a couple of sample reports to help you learn the tool when you are ready.

Others

If you have another package or tool you are interested in using, we may be able to help you connect it to the CD database. Verify that it supports Google BigQuery and contact :doliver to see if we can get it wired up.