Data/Platform/Airflow Runbook

From MozillaWiki
Revision as of 20:00, 19 January 2017 by Sunahsuh

Airflow Runbook

Airflow (https://github.com/mozilla/telemetry-airflow) is our workflow management system for telemetry batch jobs. The main project docs are here. This document describes how to resolve issues when things go sideways.

A DAG is running that I don't want to run

If you accidentally start DAG runs for dates that have already been processed or that you're not interested in, the best course is often to mark the task(s) as `Success` from the web UI. To do this, click on the root task and, in the resulting modal dialog, click "Downstream" and then "Mark Success" to turn those task runs green.

Marking tasks as successful does not stop clusters that are already running, however, so find those clusters on EMR and terminate them.
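The cluster cleanup can also be scripted. The following is a minimal sketch, not the team's actual tooling: it assumes the boto3 library and AWS credentials are available, and the name substring used to pick out clusters is hypothetical, since this runbook does not describe how the clusters are named. The helpers take an EMR client as an argument and default to a dry run so that nothing is terminated by accident.

```python
# States in which an EMR cluster is still alive and accruing cost.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def find_active_clusters(emr, name_substring):
    """Return (cluster_id, name) pairs for active clusters whose name
    contains name_substring. `emr` is a boto3 EMR client (or a stand-in
    exposing the same list_clusters method)."""
    matches = []
    kwargs = {"ClusterStates": ACTIVE_STATES}
    while True:
        page = emr.list_clusters(**kwargs)
        for cluster in page.get("Clusters", []):
            if name_substring in cluster["Name"]:
                matches.append((cluster["Id"], cluster["Name"]))
        # list_clusters is paginated; follow the marker until it runs out.
        marker = page.get("Marker")
        if not marker:
            return matches
        kwargs["Marker"] = marker


def terminate_clusters(emr, cluster_ids, dry_run=True):
    """Terminate the given clusters unless dry_run is set.
    Returns the list of cluster ids that were (or would be) terminated."""
    ids = list(cluster_ids)
    if ids and not dry_run:
        emr.terminate_job_flows(JobFlowIds=ids)
    return ids


# Example usage (region and name substring are assumptions):
#   import boto3
#   emr = boto3.client("emr", region_name="us-west-2")
#   found = find_active_clusters(emr, "my_dag_name")
#   print(found)  # review the list before terminating anything
#   terminate_clusters(emr, [cid for cid, _ in found], dry_run=False)
```

Reviewing the output of `find_active_clusters` before calling `terminate_clusters` with `dry_run=False` is the safer order, since a loose name filter could match clusters that belong to unrelated jobs.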

I want to run a backfill

To run a backfill on a whole DAG, the easiest way is to click on the root task, select "Downstream", and click "Clear". Clearing the task instances causes the scheduler to run them again.

ToDo: running a backfill on many days
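Until that section is written, one possible approach for a range of days is the Airflow command line rather than the web UI. The sketch below is an assumption, not a documented procedure for this deployment: it builds an `airflow backfill -s START -e END DAG_ID` command as used by the Airflow 1.x CLI (later Airflow releases changed the CLI layout, so check `airflow --help` on the host first), and the DAG id shown in the usage comment is hypothetical. It only runs the command when `dry_run` is turned off.

```python
import subprocess
from datetime import date


def backfill_command(dag_id, start, end):
    """Build the Airflow 1.x style backfill command for an inclusive
    date range. `start` and `end` are datetime.date objects."""
    if end < start:
        raise ValueError("end date must not precede start date")
    return [
        "airflow", "backfill",
        "-s", start.isoformat(),
        "-e", end.isoformat(),
        dag_id,
    ]


def run_backfill(dag_id, start, end, dry_run=True):
    """Print the backfill command and, unless dry_run is set, execute it
    on the machine where the Airflow CLI is installed."""
    cmd = backfill_command(dag_id, start, end)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Example usage (the DAG id is hypothetical):
#   run_backfill("my_dag_name", date(2017, 1, 1), date(2017, 1, 7))
```

Keeping command construction separate from execution makes it easy to inspect the exact date range before anything is scheduled.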