= Overview =
The cloud services data pipeline ingests data for analysis, monitoring and reporting. The pipeline is currently used for processing desktop and device [[Telemetry|Telemetry]] data and cloud services server logs. The ingestion pipeline is one component of the [[Data/Platform|Fx Data Platform]].
=== Pipeline specs/docs ===
* [https://docs.google.com/a/mozilla.com/document/d/1tzPc9hIACNi07psaQEKfpYQho8wuObC_BkMg3QEDIwA/edit#heading=h.vbs9qotdifjb Pipeline technical proposal]
* [[CloudServices/DataPipeline/HTTPEdgeServerSpecification|HTTP Edge Server Specification]]
* [[CloudServices/DataPipeline/Metadata|Pipeline Metadata]]
=== Data sets and other documentation ===
* [http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/index.html Telemetry Data]
* [https://wiki.mozilla.org/Mobile/Metrics/Redash Mobile Metrics]
* [https://github.com/mozilla/testpilot/blob/master/docs/README-METRICS.md Test Pilot]
= Code =
=== V2 Pipeline ===
{| class="wikitable"
|-
! Link !! Description
|-
| https://github.com/mozilla-services/data-pipeline || Mozilla Services Data Pipeline
|-
| https://github.com/mozilla-services/lua_sandbox || Generic Lua sandbox for dynamic data analysis
|-
| https://github.com/mozilla-services/mozilla-pipeline-schemas || JSON Schema specifications of pipeline data
|-
| https://github.com/mozilla/pipeline-monitoring-dashboard || Monitoring data quality issues for metrics pipeline
|-
| https://github.com/mozilla-services/heka || Data collection and processing made easy
|-
| https://github.com/mozilla-services/nginx_moz_ingest || HTTP Data Pipeline Ingestion
|-
| https://github.com/trink/hindsight || Data collection and processing made light weight, fast, and more reliable
|}
=== Telemetry ===
{| class="wikitable"
|-
! Link !! Description
|-
| https://github.com/vitillo/telemetry-onboarding || Slides / notebooks for Telemetry Onboarding
|-
| https://github.com/mozilla/telemetry-server || Code for analysis.telemetry.mozilla.org among other things
|-
| https://github.com/bsmedberg/telemetry-experiments-dashboard || A dashboard to track the deployment of Firefox Telemetry Experiments
|-
| https://github.com/mozilla/telemetry-batch-view || A Scala framework to build derived datasets, aka batch views, of Telemetry data.
|-
| https://github.com/mozilla/cerberus || Automatic alert system for telemetry histograms
|-
| https://github.com/mozilla/emr-bootstrap-spark || AWS bootstrap scripts for Mozilla's flavoured Spark setup.
|-
| https://github.com/mozilla/moz-crash-rate-aggregates || Crash Rate Aggregation code
|-
| https://github.com/mozilla/jupyter-notebook-gist || Plugin to create, list, and load GitHub Gists from Jupyter notebooks
|-
| https://github.com/mozilla/jupyter-spark || Jupyter Notebook extension for Apache Spark integration
|-
| https://github.com/mozilla/python_mozaggregator || Aggregator job for telemetry.mozilla.org
|-
| https://github.com/mozilla/python_moztelemetry || Spark bindings for Mozilla Telemetry
|-
| https://github.com/mozilla/telemetry-analysis-service || Eventual home of the revamped a.t.m.o (per Bug 1248688)
|-
| https://github.com/vitillo/telemetry-airflow || Scheduling / workflow management for Telemetry jobs
|-
| https://github.com/vitillo/e10s_analyses || Data analysis relating to Electrolysis / E10s
|-
| https://github.com/mozilla/telemetry-tools || Utility code to work with Mozilla Telemetry data
|}
= Archive =
* [https://docs.google.com/a/mozilla.com/document/d/1QGiXfQ0AHCkJNXfMPArjab8Gq8zIdqDopCBr-1qD3sc/edit?usp=sharing Q4 2014: Reporting and monitoring overview]
* [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Bespoke+Dashboards Bespoke Dashboards]
* [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Cloud+Services+Data Cloud Services Data Projects]
* [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Data+Sources List of Data Sources]
* [https://mana.mozilla.org/wiki/display/CLOUDSERVICES/V1+Pipeline V1 Pipeline & Data Sources]
* [https://docs.google.com/a/mozilla.com/document/d/1CTazW99zBK5K40f-fgSyTPw9IXgmFYjQmNhzxTT9Tts/edit?usp=sharing post workweek roadmap]
* [https://id.etherpad.mozilla.org/data-team old etherpad]