CloudServices/DataPipeline: Difference between revisions

Add work queue
** 2015-02-23 Nightly
** 2015-05-19 Release
= Work Queue =
== Risks/Questions ==
* Send something to dev-planning? [kparlante, telliot]
* Old-FHR data through pipeline? Yes/No: [telliot]
* Deletes & legal policy [telliot, mreid to provide cost estimate]
== Stewing ==
* Maintaining a sample data set for faster queries
* Implement a specific flag to determine whether data gets warehoused
* Integrate Roberto’s Spark data flow into new DWH
** Implies a similar db-backed table of DWH filenames for filtering (don’t want to list S3 every time; too slow)
* Elasticsearch (Kibana) output filter
* Complete list of outputs (and filters and any other support)
* Build a shim for debugging CEPs with local data
* Store the “raw raw” data for some period to ensure we’re safe if our code and/or CEP code is badly broken. Can’t just lose data.
** Tee off to short-lived S3 before it goes through the main pipeline?
* BI query example that cross-references data sources
** Example: does FxA/Sync increase browser usage?
== Queueing ==
* Q4 telemetry: (re)implement telemetry monitoring dashboards [?]
* Q1 BI: define schema for data warehouse (talk to jjensen) [kparlante]
** Should use multiple data sources
* Q1 BI: write filter for data warehouse [trink]
* Q1 BI: signal & schedule loading of data warehouse [mreid]
* Q1 BI: Redshift output [trink]
* Q1 BI: set up Domo and/or Tableau to look at MySQL or CSV or whatever is easy [?]
* Q1: Data format spec [kparlante, trink]
** JSON schema, specifically for FHR+telemetry; also anticipate other sources
* Q1: implement best guess at per-user sampling [trink]
** Follow up with saptarshi for more complex algorithm
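For the per-user sampling item above, one common best-guess approach is to hash a stable client identifier into a fixed number of buckets and keep only the clients that land in the lowest buckets. The following is a minimal sketch, assuming each record carries a client ID string. The names <code>in_sample</code>, <code>sample_percent</code> and <code>buckets</code> are hypothetical, and this is not necessarily the algorithm the pipeline will use.

```python
import hashlib


def in_sample(client_id: str, sample_percent: int = 1, buckets: int = 100) -> bool:
    """Decide whether a client belongs to the sample (hypothetical helper).

    The same client_id always maps to the same bucket, so a client is
    either always in the sample or never in it.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return bucket < sample_percent


# Keep roughly 10% of clients (client IDs here are made up).
kept = [cid for cid in ("client-a", "client-b", "client-c") if in_sample(cid, 10)]
```

Because the decision depends only on the client ID, every record from a sampled client is kept, which preserves per-user history in the sample.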
== Doing ==
* Opsify stack [whd]
* Q4 telemetry: Send telemetry data through the pipeline [mreid]
* Q4 telemetry: Larger payloads (32MB) for telemetry [trink]
* Risk mitigation: Estimate cost of “full scan” DWH query [mreid]
* Risk mitigation: Estimate cost of single DWH delete [mreid]
== Done ==
* Parallelize sandbox filters (e.g. FHRSearch) [trink]
* Enable LuaJIT [trink]