CloudServices/DataPipeline: Difference between revisions

From MozillaWiki
The cloud services data pipeline ingests data for analysis, monitoring and reporting. The pipeline is currently used for processing cloud services server logs. We're in the process of improving it to support desktop and device telemetry data. The data pipeline team also works on [https://docs.services.mozilla.com/heka/ Heka] (a major component of the pipeline implementation), custom dashboards for cloud services projects, and the [[Telemetry]] server.


= Communication =
* IRC channel: #datapipeline
* Standup meeting: https://etherpad.mozilla.org/data-pipeline-meeting-notes
* Cross team coordination meeting: https://etherpad.mozilla.org/data-pipeline-coordination
= Resources =
* [https://docs.google.com/a/mozilla.com/document/d/1tzPc9hIACNi07psaQEKfpYQho8wuObC_BkMg3QEDIwA/edit#heading=h.vbs9qotdifjb Pipeline technical proposal]
* [https://docs.google.com/a/mozilla.com/document/d/1QGiXfQ0AHCkJNXfMPArjab8Gq8zIdqDopCBr-1qD3sc/edit?usp=sharing Reporting and monitoring overview]

Revision as of 19:34, 23 January 2015

Overview

The cloud services data pipeline ingests data for analysis, monitoring and reporting. The pipeline is currently used for processing cloud services server logs. We're in the process of improving it to support desktop and device telemetry data. The data pipeline team also works on Heka (a major component of the pipeline implementation), custom dashboards for cloud services projects, and the Telemetry server.

Communication

Resources

Pipeline Milestones

  • Q4 2014: Telemetry data running through pipeline
    • Server stack deployment in GitHub ("opsified")
    • Re-implement monitoring dashboards
  • Q1 2015: Launch pipeline prototype
    • Architecture decisions completed; production stack up and running
    • Business Intelligence/Data Warehouse proof of concept implemented
    • Ingestion process completed for FHR+telemetry (start collecting on 2015-02-23)
    • Backprocessing from pipeline datastore implemented
    • Pipeline runs in parallel to existing infrastructure; not yet source of truth
  • Q2 2015: Pipeline officially supports business use cases
    • Complete set of use cases TBD (most likely centered on FHR+telemetry)
    • Complete set of monitoring and reporting outputs TBD: dashboards, data warehouse, monitoring, self-service access to data
    • FHR+telemetry reaches full release on 2015-05-19; the pipeline must handle full production load
  • Q3 2015: Fill out monitoring and reporting capabilities; add sources and use cases

Related Dates and Schedules

  • FHR+Telemetry client work
    • Current plan: FF39 Nightly, with an uplift to FF38. This schedule may slip, but the pipeline needs to be ready regardless
    • 2015-02-23 Nightly
    • 2015-05-19 Release

Work Queue

Tasks are tracked in Bugzilla: http://mzl.la/1DOOBZt

Risks and Open Questions

  • Should old-FHR data go through the pipeline? Yes/No: [telliot]
  • Deletes & legal policy [telliot]

Archive