Event Telemetry: Difference between revisions

From MozillaWiki
(new page)
 
(Update in-tree docs link)
 
(30 intermediate revisions by 3 users not shown)
Line 1: Line 1:
The [[Telemetry]] wiki page has more information about using Telemetry -- this page describes the 2015 project.
The [[Telemetry]] wiki page has more information about using Telemetry -- this page describes the Event Telemetry project.


= Overview =
= Overview =
In 2015, we migrated [[Firefox Health Report]] data collection to the [[Telemetry]] system. At the same time, we made changes to Telemetry so that pings would be sent more frequently. We also updated the [[CloudServices/DataPipeline|Data Pipeline]] that ingests and processes the data.
Several teams (fx-team, mobile, test-pilot, heartbeat, …) share a need for a mechanism to record, store, send, and analyse application usage in an event-oriented format.
The Data Platform team wants to support this with a common API and mechanisms for handling the collected data, without owning the individual measurements.
The approach is to provide common client code and a standard data format, so that common processes and tooling can be built for data pipeline and analysis work.
We already send a form of UITelemetry data, but its current format is too complicated to work with and to maintain.


=== Dates ===
=== Goals for Event Telemetry ===
 
* Standardized event format for all pings collecting events (Main Ping, Test Pilot, Shield, Sync, etc.)
* '''Fx41''' (2015-09-22): Started sending opt-out telemetry (base set) for 5% of the release population
* Expand collection of UI Telemetry to support Product teams
* '''Fx42''' (2015-11-03): Started sending opt-out telemetry (base set) for 100% of the release population
* Enable top line metrics for 2017
* '''Fx43''' (2015-12-15): Stopped sending FHR v2 data
* Standardized support for analysis of events
 
* API support in client for collecting event telemetry  
=== Goals for Unified Telemetry ===
* On the client, unify the telemetry and FHR measurement systems so that measurements do not have to be implemented more than once in different systems.
* Reduce the latency from the time a measurement occurs until it can be analyzed on the server.
* Increase the accuracy of measurements so that they can be better correlated with factors in the user environment such as the specific build, enabled addons, and other hardware or software factors.
* Use a common data pipeline for client telemetry and service log data.


=== Documentation ===
=== Documentation ===
* [https://gecko.readthedocs.org/en/latest/toolkit/components/telemetry/telemetry/index.html Client pings (tree documentation)]
* [https://docs.google.com/document/d/1hNuS9lUJMvMqgntZXbFA6xZBU9zBpQgo7x73-sXKRpI/ Event Telemetry draft]
* [https://docs.google.com/spreadsheets/d/1bqamxVskDF7kQ6xL7S2BqY8TpngL-w41v6keiX_qByg/edit?usp=sharing V2 - V4 mappings]
* [https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/collection/events.html In-tree docs]
* [https://docs.google.com/document/d/1cFCymhLQE7qI-p_czzz9-KexCMMhnf9ezLTMkGAKj58/edit#heading=h.w4fgaxpswo Table of projects that will be collecting Event telemetry]


=== Analysis and Reporting ===
=== Analysis and Reporting ===
* Telemetry Dashboard (now using v4 unified telemetry data!): https://telemetry.mozilla.org/
* Raw data using a spark cluster (ATMO): https://analysis.telemetry.mozilla.org/
* Launch a spark cluster: https://telemetry-dash.mozilla.org/
* re:dash event data tables (STMO): https://sql.telemetry.mozilla.org/
* Stream processing, heka reporting: [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Exploring+with+the+Mozilla+Data+Pipeline+Demo Exploring with the Mozilla Data Pipeline Demo]


= Project =
= Project =
=== Deliverables ===
=== Deliverables ===
* Monitoring and alerting about pipeline health
* '''2016 Q3'''
* Basic tool support
** Project kickoff
** Telemetry Dashboard works against new pipeline data
** Common Event Format design
** Telemetry-dash (or new equivalent) can launch spark, heka reporting jobs
** Event registration mechanism design
* Derived data sets
** Dataset strategy ([https://docs.google.com/document/d/1FI-jvzE4nVdas3e0o3QauNXUf5aQg9BGuXyVz0cDC1I/edit Event Telemetry Data sets discussion])
** Executive dashboard rollup
* '''2016 Q4'''
** 1% sample of clientIds for longitudinal analysis
** Telemetry events on main pings implemented in Firefox
* v2-v4 Data Continuity
** Activity Stream submitting events (using common format, in main ping)
** Executive dashboard continues to work
** Event data accessible in Spark
** Search analysis continues to work
** Generic Event data table available in re:dash
** Test Pilot submitting events (using common format)
** Custom sync ping is submitting events (using common format)
* '''2017 Q1'''
** More custom pings submitting events (using common format)
** Non-specialized, analysis-oriented datasets available
** Mobile plan (Q1/TBD)
** Some project (TBD) is on the main ping


=== Client work ===
=== Client work ===
* Backlog as [https://docs.google.com/a/mozilla.com/spreadsheets/d/1yAJmgCGYyk1d7A41DZa653Z3u2AbH-kDWsO1vPSgbfE/edit?usp=sharing spreadsheet], with estimates
* [https://bugzilla.mozilla.org/showdependencytree.cgi?id=1286606&hide_resolved=1 client bug tree]
* Bug tree, phase 4: https://bugzilla.mozilla.org/show_bug.cgi?id=1122482
* [http://georgf.github.io/measurements-dash/ current sprint]
* Bug tree, phase 3: https://bugzilla.mozilla.org/show_bug.cgi?id=1120356 (Done)
* Bug tree, phase 2: https://bugzilla.mozilla.org/show_bug.cgi?id=1069869 (Done)
* Bug tree, phase 1: https://bugzilla.mozilla.org/show_bug.cgi?id=1040800 (Done)
 
=== Pipeline work ===
* Bugzilla: http://mzl.la/1KWiNST


=== Client Testing ===
=== Client Testing ===
* [https://docs.google.com/document/d/10sZICCbsfcSTF3RPyeVDskSI9-I2E4iApmShmIWSLfg/edit#heading=h.a6hfij6xookn Test cases document]
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1302670 Bugzilla tracking]
* [https://docs.google.com/a/mozilla.com/spreadsheets/d/1YxqvjRJuuIPRegNXAFCLHA7_56vhQ6leaZLaLeFqyxY/edit#gid=0 Spreadsheet to track testing]


= Communication =
= Communication =
* Conversation about unified telemetry on fhr-dev: https://mail.mozilla.org/listinfo/fhr-dev
* Conversation about Event telemetry on fhr-dev: https://mail.mozilla.org/listinfo/fhr-dev
* Data verification meeting notes: https://etherpad.mozilla.org/fhr-v4-status
* IRC: #telemetry
* IRC: #telemetry, #datapipeline, #metrics
* Slack: #fx-metrics
* [[Unified Telemetry/Status reports]]
* [https://docs.google.com/document/d/1P0BmMRLSglX9G53-j5udU5CnrwDaqcHKP5fFjU5hEwo/edit Weekly Meeting notes]
* [[Unified Telemetry/Data Continuity]]
* [[Unified_Telemetry/Status_reports|EPM reports]]
 
= Resources =
* [https://docs.google.com/document/d/1IGpzsYGi_sq3YFQDAPyKOkU_BKvXAC95fZYA2i4ceVs/edit?usp=sharing Kickoff document]
** "Query Requirements" section has list of sample queries/questions that get asked frequently of FHR data


= People and Roles =
= People and Roles =
* Georg Fritzsche (client data collection)
* Georg Fritzsche (Data Platform, lead)
* Alessio Placitelli, :Dexter (client data collection)
* Alessio Placitelli, :Dexter (Data Platform)
* Mark Reid (data pipeline, telemetry server)
* Mark Reid (Data Platform)
* Michael Trinkala, :trink (data pipeline, heka)
* Roberto Vitillo (Data Platform)
* Wesley Dawson, :whd (data pipeline operations)
* Sunah Suh (Data Platform)
* Daniel Thornton, :relud (data pipeline operations)
* Rebecca Weiss (PM)
* Stuart Philp (test automation)
* Ilana Segall (Analysis)
* Anthony Zhang (Telemetry dashboard)
* John Dorlus (Quality Engineering)
* Roberto Vitillo (Spark analysis tool, telemetry data validation)
* Brendan Colloran (metrics team, data validation)
* Sam Penrose (metrics team, data validation)
* Thomas Huelbert (project management)
* Thomas Huelbert (project management)
* Katie Parlante (eng manager)
* Benjamin Smedberg (project sponsor, data steward)

Latest revision as of 20:54, 19 May 2020

The Telemetry wiki page has more information about using Telemetry -- this page describes the Event Telemetry project.

Overview

Several teams (fx-team, mobile, test-pilot, heartbeat, …) share a need for a mechanism to record, store, send, and analyse application usage in an event-oriented format. The Data Platform team wants to support this with a common API and mechanisms for handling the collected data, without owning the individual measurements. The approach is to provide common client code and a standard data format, so that common processes and tooling can be built for data pipeline and analysis work. We already send a form of UITelemetry data, but its current format is too complicated to work with and to maintain.

Goals for Event Telemetry

  • Standardized event format for all pings collecting events (Main Ping, Test Pilot, Shield, Sync, etc.)
  • Expand collection of UI Telemetry to support Product teams
  • Enable top line metrics for 2017
  • Standardized support for analysis of events
  • API support in client for collecting event telemetry
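
To make the idea of a "standardized event format" concrete, here is a minimal sketch of what a shared event record and its serialization into a ping payload might look like. This is an illustration only, not the project's actual API or wire format: the `Event` class, its field names, and the `serialize_events` helper are all hypothetical, and the authoritative definition lives in the in-tree docs.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """One recorded event. Every field name here is an illustrative assumption."""
    timestamp_ms: int              # assumed: milliseconds since session start
    category: str                  # assumed: feature or team that owns the event
    method: str                    # assumed: what happened, e.g. "click"
    object: str                    # assumed: what it happened to
    value: Optional[str] = None    # optional free-form value
    extra: dict = field(default_factory=dict)  # optional key/value details

    def to_row(self) -> list:
        # A compact, fixed-order row keeps every ping's events uniform,
        # so one set of pipeline and analysis tools can read all of them.
        return [self.timestamp_ms, self.category, self.method,
                self.object, self.value, self.extra]


def serialize_events(events):
    """Hypothetical helper: order events by time and emit a JSON payload."""
    rows = [e.to_row() for e in sorted(events, key=lambda e: e.timestamp_ms)]
    return json.dumps({"events": rows})
```

Because every producer (main ping, custom pings, add-ons) would emit rows of the same shape, a single generic table or Spark job could consume events from all of them without per-team parsing code.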

Documentation

Analysis and Reporting

Project

Deliverables

  • 2016 Q3
  • 2016 Q4
    • Telemetry events on main pings implemented in Firefox
    • Activity Stream submitting events (using common format, in main ping)
    • Event data accessible in Spark
    • Generic Event data table available in re:dash
    • Test Pilot submitting events (using common format)
    • Custom sync ping is submitting events (using common format)
  • 2017 Q1
    • More custom pings submitting events (using common format)
    • Non-specialized, analysis-oriented datasets available
    • Mobile plan (Q1/TBD)
    • Some project (TBD) is on the main ping

Client work

Client Testing

Communication

People and Roles

  • Georg Fritzsche (Data Platform, lead)
  • Alessio Placitelli, :Dexter (Data Platform)
  • Mark Reid (Data Platform)
  • Roberto Vitillo (Data Platform)
  • Sunah Suh (Data Platform)
  • Rebecca Weiss (PM)
  • Ilana Segall (Analysis)
  • John Dorlus (Quality Engineering)
  • Thomas Huelbert (project management)