CloudServices/DataPipeline: Difference between revisions

(Formatted work queue)
m (more formatting)
Line 39:
* Implement a specific flag to determine if data gets warehoused or not
* Integrate Roberto’s Spark data flow into new DWH
** Implies a similar db-backed table of DWH filenames for filtering (don’t want to list S3 every time - too slow)
* Elasticsearch (Kibana) output filter
* Complete list of outputs (and filters and any other support)
* Build a shim for debugging CEPs with local data
* Store the “raw raw” data for some period so we’re safe if our code and/or CEP code is badly broken. We can’t just lose data.
** Tee off to short-lived S3 before it goes through the main pipeline?
* BI query example that cross-references data sources
** Example: does FxA/Sync increase browser usage?
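
A rough sketch of the db-backed filename table mentioned above, so that filtering does not need to list S3 on every run. This is only an illustration: it uses SQLite from the Python standard library as a stand-in for whatever database the pipeline ends up using, and the table name <code>dwh_files</code>, the helper names, and the example filenames are all hypothetical.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the filename index. The dwh_files table is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dwh_files (filename TEXT PRIMARY KEY)"
    )
    return conn


def record_files(conn, filenames):
    """Remember filenames that have already been written to the DWH."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO dwh_files (filename) VALUES (?)",
            [(name,) for name in filenames],
        )


def filter_new(conn, candidates):
    """Return only the candidates not yet recorded, preserving their order."""
    new = []
    for name in candidates:
        row = conn.execute(
            "SELECT 1 FROM dwh_files WHERE filename = ?", (name,)
        ).fetchone()
        if row is None:
            new.append(name)
    return new


if __name__ == "__main__":
    conn = open_index()
    record_files(conn, ["2015-01-01/part-0", "2015-01-01/part-1"])
    print(filter_new(conn, ["2015-01-01/part-1", "2015-01-02/part-0"]))
```

The primary key on <code>filename</code> keeps repeated inserts idempotent, so a job that is retried does not create duplicate rows.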


== To Do ==