CloudServices/DataPipeline
Latest revision as of 16:44, 21 December 2016
Overview
The cloud services data pipeline ingests data for analysis, monitoring and reporting. The pipeline is currently used for processing desktop and device Telemetry data and cloud services server logs. The ingestion pipeline is one component of the Fx Data Platform.
Pipeline specs/docs
* [https://docs.google.com/a/mozilla.com/document/d/1tzPc9hIACNi07psaQEKfpYQho8wuObC_BkMg3QEDIwA/edit#heading=h.vbs9qotdifjb Pipeline technical proposal]
* [[CloudServices/DataPipeline/Metadata|Pipeline Metadata]]
Data sets and other documentation
* [http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/index.html Telemetry Data]
* [https://wiki.mozilla.org/Mobile/Metrics/Redash Mobile Metrics]
* [https://github.com/mozilla/testpilot/blob/master/docs/README-METRICS.md Test Pilot]
* [https://
* [https://
Code
V2 Pipeline
Link | Description |
---|---|
https://github.com/mozilla-services/data-pipeline | Mozilla Services Data Pipeline |
https://github.com/mozilla-services/lua_sandbox | Generic Lua sandbox for dynamic data analysis |
https://github.com/mozilla-services/mozilla-pipeline-schemas | JSON Schema specifications of pipeline data |
https://github.com/mozilla/pipeline-monitoring-dashboard | Monitoring data quality issues for metrics pipeline |
https://github.com/mozilla-services/heka | Data collection and processing made easy |
https://github.com/mozilla-services/nginx_moz_ingest | HTTP Data Pipeline Ingestion |
https://github.com/trink/hindsight | Data collection and processing made light weight, fast, and more reliable |
Telemetry
Link | Description |
---|---|
https://github.com/vitillo/telemetry-onboarding | Slides / notebooks for Telemetry Onboarding |
https://github.com/mozilla/telemetry-server | Code for analysis.telemetry.mozilla.org among other things |
https://github.com/bsmedberg/telemetry-experiments-dashboard | A dashboard to track the deployment of Firefox Telemetry Experiments |
https://github.com/mozilla/telemetry-batch-view | A Scala framework to build derived datasets, aka batch views, of Telemetry data. |
https://github.com/mozilla/cerberus | Automatic alert system for telemetry histograms |
https://github.com/mozilla/emr-bootstrap-spark | AWS bootstrap scripts for Mozilla's flavoured Spark setup. |
https://github.com/mozilla/moz-crash-rate-aggregates | Crash Rate Aggregation code |
https://github.com/mozilla/jupyter-notebook-gist | Plugin to create, list, and load GitHub Gists from Jupyter notebooks |
https://github.com/mozilla/jupyter-spark | Jupyter Notebook extension for Apache Spark integration |
https://github.com/mozilla/python_mozaggregator | Aggregator job for telemetry.mozilla.org |
https://github.com/mozilla/python_moztelemetry | Spark bindings for Mozilla Telemetry |
https://github.com/mozilla/telemetry-analysis-service | Eventual home of the revamped a.t.m.o (per Bug 1248688) |
https://github.com/vitillo/telemetry-airflow | Scheduling / workflow management for Telemetry jobs |
https://github.com/vitillo/e10s_analyses | Data analysis relating to Electrolysis / E10s |
https://github.com/mozilla/telemetry-tools | Utility code to work with Mozilla Telemetry data |
Archive
* [https://docs.google.com/a/mozilla.com/document/d/1QGiXfQ0AHCkJNXfMPArjab8Gq8zIdqDopCBr-1qD3sc/edit?usp=sharing Q4 2014: Reporting and monitoring overview]
* [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Bespoke+Dashboards Bespoke Dashboards]
* [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Cloud+Services+Data Cloud Services Data Projects]
* [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Data+Sources List of Data Sources]
* [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/V1+Pipeline V1 Pipeline & Data Sources]
* [https://docs.google.com/a/mozilla.com/document/d/1CTazW99zBK5K40f-fgSyTPw9IXgmFYjQmNhzxTT9Tts/edit?usp=sharing Post-workweek roadmap]
* [https://id.etherpad.mozilla.org/data-team Old etherpad]