Unified Telemetry: Difference between revisions

From MozillaWiki
(Add milestones)
(→‎Overview: Added dates)
 
(8 intermediate revisions by 3 users not shown)
Line 1: Line 1:
The [[Telemetry]] wiki page has more information about using Telemetry -- this page describes the 2015 project.
= Overview =
We're unifying the [[Telemetry]] and [[Firefox Health Report]] collection systems on the client, and sending them through one [[CloudServices/DataPipeline|Data Pipeline]]. To accomplish this on the client, we're migrating all of the FHR data to the Telemetry system. The new data pipeline has some features of the old telemetry pipeline as well as the cloud services data pipeline that we use to ingest server log data from Firefox services.
In 2015, we migrated [[Firefox Health Report]] data collection to the [[Telemetry]] system. At the same time, we made changes to Telemetry so that pings would be sent more frequently. We also updated the [[CloudServices/DataPipeline|Data Pipeline]] that ingests and processes the data.
 
=== Dates ===
 
* '''Fx41''' (2015-09-22): Started sending opt-out telemetry (base set) for 5% of the release population
* '''Fx42''' (2015-11-03): Started sending opt-out telemetry (base set) for 100% of the release population
* '''Fx43''' (2015-12-15): Stopped sending FHR v2 data


=== Goals for Unified Telemetry ===
Line 8: Line 16:
* Use a common data pipeline for client telemetry and service log data.


= People and Roles =
=== Documentation ===
* Thomas Huelbert (project management)
* [https://gecko.readthedocs.org/en/latest/toolkit/components/telemetry/telemetry/index.html Client pings (tree documentation)]
* Katie Parlante (eng manager)
* [https://docs.google.com/spreadsheets/d/1bqamxVskDF7kQ6xL7S2BqY8TpngL-w41v6keiX_qByg/edit?usp=sharing V2 - V4 mappings]
* Benjamin Smedberg (budget, data steward)
* Alessio Placitelli, :Dexter (client data collection)
* Georg Fritzsche (client data collection)
* Mark Reid (data pipeline, telemetry server)
* Michael Trinkala, :trink (data pipeline, heka)
* Wesley Dawson, :whd (data pipeline operations)
* Daniel Thornton, :relud (data pipeline operations)
* Brendan Colloran (metrics team, data validation)
* Sam Penrose (data validation)
* Roberto Vitillo (Spark analysis tool, telemetry data validation)
* (Telemetry dashboard)
* Stuart Philp (test automation)


= Resources =
=== Analysis and Reporting ===
* [https://docs.google.com/document/d/1IGpzsYGi_sq3YFQDAPyKOkU_BKvXAC95fZYA2i4ceVs/edit?usp=sharing Kickoff document]
* Telemetry Dashboard (now using v4 unified telemetry data!): https://telemetry.mozilla.org/
** "Query Requirements" section has list of sample queries/questions that get asked frequently of FHR data
* Launch a Spark cluster: https://telemetry-dash.mozilla.org/
* Format documentation
* Stream processing, Heka reporting: [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Exploring+with+the+Mozilla+Data+Pipeline+Demo Exploring with the Mozilla Data Pipeline Demo]
** [https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/toolkit/components/telemetry/telemetry/index.html Client pings (tree documentation)]
** [https://docs.google.com/spreadsheets/d/1bqamxVskDF7kQ6xL7S2BqY8TpngL-w41v6keiX_qByg/edit?usp=sharing V2 - V4 mappings]
** [https://pipeline-prototype-cep.prod.mozaws.net/data/PrototypeSandbox-HekaMessageSchema.MessageSchema.txt Schema observed by pipeline]


= Milestones =
= Project =
Plan of record, subject to change if acceptance criteria are not met.
=== Deliverables ===
* Monitoring and alerting about pipeline health
Line 46: Line 38:
** Search analysis continues to work


=== Dates ===
=== Client work ===
* 2015-05-29: '''39 Beta''' (slipped due to 38.0.5)
** We start receiving Beta traffic on new pipeline
** FHR v2 data still sent to old pipeline
** saved-session pings to both old telemetry and new pipeline
** main pings go to new pipeline from beta, aurora, and nightly channels
* 2015-06-29: '''40 Beta''', 39 Release
** No change
** FHR v2 data still sent to old pipeline
** saved-session pings to both old telemetry and new pipeline
** main pings go to new pipeline from beta, aurora, and nightly channels
* 2015-07-15: '''40 Beta 5''', Client Complete
** Client work done
** Data validation work done
* 2015-08-04: 40 Release Candidate, Pipeline Complete
** Operations work done
* 2015-08-11: 40 Release
** FHR v2 data stops
** saved-sessions ping stops
** main pings sent to new pipeline from all channels
** base data sent from most of release population (unless they've opted out)
 
=== Acceptance Criteria (Beta -> Release) ===
* metrics team signoff
** metrics team analysis can proceed on new data streams
** longitudinal data has internal consistency and consistency with v2: [https://bugzilla.mozilla.org/show_bug.cgi?id=1169103 Tracking Bug 1169103]
** executive dashboard (in particular MAU)
** search analysis
* pipeline/ops team signoff
** pipeline is ready and can handle capacity
** monitoring and alerting set up
** no blocking issues:
*** [https://bugzilla.mozilla.org/show_bug.cgi?id=1140037 Telemetry submission rate spikes every hour]
* performance team signoff
** performance team analysis can proceed on new data streams
** <bug tree here>
* qa signoff
** <bug tree here>
* ua signoff
** Doesn't put any burden on the user (prefs are respected, no performance issues, etc.)
** <bug tree here>
 
= Client work =
* Backlog as [https://docs.google.com/a/mozilla.com/spreadsheets/d/1yAJmgCGYyk1d7A41DZa653Z3u2AbH-kDWsO1vPSgbfE/edit?usp=sharing spreadsheet], with estimates
* Bug tree, phase 3: https://bugzilla.mozilla.org/show_bug.cgi?id=1120356
* Bug tree, phase 4: https://bugzilla.mozilla.org/show_bug.cgi?id=1122482
* Bug tree, phase 3: https://bugzilla.mozilla.org/show_bug.cgi?id=1120356 (Done)
* Bug tree, phase 2: https://bugzilla.mozilla.org/show_bug.cgi?id=1069869 (Done)
* Bug tree, phase 1: https://bugzilla.mozilla.org/show_bug.cgi?id=1040800 (Done)


= Pipeline work =
=== Pipeline work ===
* Bugzilla: http://mzl.la/1KWiNST
= Data validation =
=== Metrics Team Validation ===
* https://bugzilla.mozilla.org/show_bug.cgi?id=1134661 (an automated script to compare FHR v2 and FHR v4 results for a sample of users)
* For beta period, rollup fields compare reasonably to v2
** # of sessions
** session lengths
** searches
** default browser status
** places counts


=== Client Testing ===
* [https://docs.google.com/document/d/10sZICCbsfcSTF3RPyeVDskSI9-I2E4iApmShmIWSLfg/edit#heading=h.a6hfij6xookn Test cases document]
* [https://docs.google.com/a/mozilla.com/spreadsheets/d/1YxqvjRJuuIPRegNXAFCLHA7_56vhQ6leaZLaLeFqyxY/edit#gid=0 Spreadsheet to track testing]
=== Monitoring Tasks ===
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1147395 Compare a few telemetry measurements between "saved-session" and "main" pings]
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1129185 Reporting to make sure we don't have broken or incomplete session fragment chains]
* [https://bugzilla.mozilla.org/show_bug.cgi?id=1134669 unified-FHR quality report: activity latency]
=== Monitors ===
* [https://pipeline-prototype-cep.prod.mozaws.net/#plugins/filters/PrototypeSandbox-mreid_CountRecentByDocType Count Recent By Doc Type]
* [https://pipeline-prototype-cep.prod.mozaws.net/#sandboxes/PrototypeSandbox-gfritzsche_ChannelDiffers app.channel vs. environment.settings.update.channel]
=== Investigations ===
* https://etherpad.mozilla.org/unified-telemetry-investigations
= Analysis and Reporting =
=== Tools ===
* Automated data dump for data validation exercise
* Spark
** [https://bugzilla.mozilla.org/show_bug.cgi?id=1152539 Make FHRv4 data available per client through Spark]
* Stream processing on real time data
** [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Exploring+with+the+Mozilla+Data+Pipeline+Demo Exploring with the Mozilla Data Pipeline Demo]
* Reporting using stream processing tools
** [https://bugzilla.mozilla.org/show_bug.cgi?id=1146699 Reprocessing and incremental processing architecture for reporting]


= Communication =
Line 136: Line 56:
* Data verification meeting notes: https://etherpad.mozilla.org/fhr-v4-status
* IRC: #telemetry, #datapipeline, #metrics
* [[Unified Telemetry/Status reports]]
* [[Unified Telemetry/Data Continuity]]
= Resources =
* [https://docs.google.com/document/d/1IGpzsYGi_sq3YFQDAPyKOkU_BKvXAC95fZYA2i4ceVs/edit?usp=sharing Kickoff document]
** "Query Requirements" section has list of sample queries/questions that get asked frequently of FHR data
= People and Roles =
* Georg Fritzsche (client data collection)
* Alessio Placitelli, :Dexter (client data collection)
* Mark Reid (data pipeline, telemetry server)
* Michael Trinkala, :trink (data pipeline, heka)
* Wesley Dawson, :whd (data pipeline operations)
* Daniel Thornton, :relud (data pipeline operations)
* Stuart Philp (test automation)
* Anthony Zhang (Telemetry dashboard)
* Roberto Vitillo (Spark analysis tool, telemetry data validation)
* Brendan Colloran (metrics team, data validation)
* Sam Penrose (metrics team, data validation)
* Thomas Huelbert (project management)
* Katie Parlante (eng manager)
* Benjamin Smedberg (project sponsor, data steward)

Latest revision as of 20:49, 27 April 2016

The Telemetry wiki page has more information about using Telemetry -- this page describes the 2015 project.

Overview

In 2015, we migrated Firefox Health Report data collection to the Telemetry system. At the same time, we made changes to Telemetry so that pings would be sent more frequently. We also updated the Data Pipeline that ingests and processes the data.

Dates

  • Fx41 (2015-09-22): Started sending opt-out telemetry (base set) for 5% of the release population
  • Fx42 (2015-11-03): Started sending opt-out telemetry (base set) for 100% of the release population
  • Fx43 (2015-12-15): Stopped sending FHR v2 data

Goals for Unified Telemetry

  • On the client, unify the telemetry and FHR measurement systems so that measurements do not have to be implemented more than once in different systems.
  • Reduce the latency from the time a measurement occurs until it can be analyzed on the server.
  • Increase the accuracy of measurements so that they can be better correlated with factors in the user environment, such as the specific build, enabled add-ons, and other hardware or software factors.
  • Use a common data pipeline for client telemetry and service log data.

Documentation

  • Client pings (tree documentation): https://gecko.readthedocs.org/en/latest/toolkit/components/telemetry/telemetry/index.html
  • V2 - V4 mappings: https://docs.google.com/spreadsheets/d/1bqamxVskDF7kQ6xL7S2BqY8TpngL-w41v6keiX_qByg/edit?usp=sharing

Analysis and Reporting

  • Telemetry Dashboard (now using v4 unified telemetry data!): https://telemetry.mozilla.org/
  • Launch a Spark cluster: https://telemetry-dash.mozilla.org/
  • Stream processing, Heka reporting: Exploring with the Mozilla Data Pipeline Demo (https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Exploring+with+the+Mozilla+Data+Pipeline+Demo)

Project

Deliverables

  • Monitoring and alerting about pipeline health
  • Basic tool support
    • Telemetry Dashboard works against new pipeline data
    • Telemetry-dash (or new equivalent) can launch Spark and Heka reporting jobs
  • Derived data sets
    • Executive dashboard rollup
    • 1% sample of clientIds for longitudinal analysis
  • v2-v4 Data Continuity
    • Executive dashboard continues to work
    • Search analysis continues to work
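
The "1% sample of clientIds" deliverable above could be produced in several ways; this page does not say which one the pipeline uses. The sketch below shows one common approach, assuming a deterministic hash of the client ID reduced to 100 buckets, so that a given client always falls in or out of the sample. The function names, the chosen bucket, and the "clientId" ping field are illustrative assumptions, not the pipeline's actual implementation.

```python
import zlib

# Assumption: 100 buckets, so that one bucket is roughly 1% of clients.
NUM_BUCKETS = 100


def sample_bucket(client_id: str) -> int:
    """Map a client ID to a stable bucket in the range [0, NUM_BUCKETS)."""
    return zlib.crc32(client_id.encode("utf-8")) % NUM_BUCKETS


def in_sample(client_id: str, chosen_bucket: int = 42) -> bool:
    """Return True if the client belongs to the chosen bucket (hypothetical default)."""
    return sample_bucket(client_id) == chosen_bucket


def filter_pings(pings, chosen_bucket: int = 42):
    """Keep only pings whose 'clientId' (assumed field name) is in the sample."""
    return [p for p in pings if in_sample(p["clientId"], chosen_bucket)]
```

Because the bucket depends only on the client ID, every ping from a sampled client is kept, which is what a longitudinal (per-client, over time) analysis needs.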

Client work

  • Backlog as spreadsheet, with estimates: https://docs.google.com/a/mozilla.com/spreadsheets/d/1yAJmgCGYyk1d7A41DZa653Z3u2AbH-kDWsO1vPSgbfE/edit?usp=sharing
  • Bug tree, phase 4: https://bugzilla.mozilla.org/show_bug.cgi?id=1122482
  • Bug tree, phase 3: https://bugzilla.mozilla.org/show_bug.cgi?id=1120356 (Done)
  • Bug tree, phase 2: https://bugzilla.mozilla.org/show_bug.cgi?id=1069869 (Done)
  • Bug tree, phase 1: https://bugzilla.mozilla.org/show_bug.cgi?id=1040800 (Done)

Pipeline work

  • Bugzilla: http://mzl.la/1KWiNST

Client Testing

  • Test cases document: https://docs.google.com/document/d/10sZICCbsfcSTF3RPyeVDskSI9-I2E4iApmShmIWSLfg/edit#heading=h.a6hfij6xookn
  • Spreadsheet to track testing: https://docs.google.com/a/mozilla.com/spreadsheets/d/1YxqvjRJuuIPRegNXAFCLHA7_56vhQ6leaZLaLeFqyxY/edit#gid=0

Communication

Resources

  • Kickoff document
    • "Query Requirements" section has list of sample queries/questions that get asked frequently of FHR data

People and Roles

  • Georg Fritzsche (client data collection)
  • Alessio Placitelli, :Dexter (client data collection)
  • Mark Reid (data pipeline, telemetry server)
  • Michael Trinkala, :trink (data pipeline, heka)
  • Wesley Dawson, :whd (data pipeline operations)
  • Daniel Thornton, :relud (data pipeline operations)
  • Stuart Philp (test automation)
  • Anthony Zhang (Telemetry dashboard)
  • Roberto Vitillo (Spark analysis tool, telemetry data validation)
  • Brendan Colloran (metrics team, data validation)
  • Sam Penrose (metrics team, data validation)
  • Thomas Huelbert (project management)
  • Katie Parlante (eng manager)
  • Benjamin Smedberg (project sponsor, data steward)