CloudServices/DataPipeline: Difference between revisions

3 bytes added, 8 January 2015

m

more formatting

@@ Line 39: / Line 39: @@
 * Implement a specific flag to determine if data gets warehoused or not
 * Integrate Roberto’s spark data flow into new DWH
-* Implies a similar db-backed table of DWH filenames for filtering (don’t want to list S3 every time - too slow)
+** Implies a similar db-backed table of DWH filenames for filtering (don’t want to list S3 every time - too slow)
 * Elasticsearch (Kibana) output filter
 * Complete list of outputs (and filters and any other support)
 * Build a shim for debugging CEPs with local data
 * Store the “raw raw” data for some period to ensure we’re safe if our code and/or CEP code is badly broken. Can’t just lose data.
-* Tee off to short-lived S3 before it goes through the main pipeline?
+** Tee off to short-lived S3 before it goes through the main pipeline?
 * BI query example that cross references data sources
-* example: does fxa/sync increase browser usage?
+** example: does fxa/sync increase browser usage?
 == To Do ==