Unified Telemetry/Status reports/July 31 2015
Revision as of 21:06, 31 July 2015
Unified Telemetry status report July 31, 2015
Overall Project Health
Last week: Yellow
This week: Yellow - r41 is the go-live for Unified Telemetry. Risks to the project revolve around missing subsession pings and getting the about:healthreport content work done prior to August 10. Final client changes are targeted for the same date.
Exec Summary
- Some outstanding issues from the July 30 milestone were missed, so not all data validation is complete yet; the remaining items moved to the August 10 milestone (the team has adopted Firefox iterations to align with the greater org)
- Final client changes needed by August 10
- Testing plan is up on the wiki: Telemetry/Testing
Risks/Issues
| Description of Risks/Issues | State | Owner | Plan to Resolve/Mitigation | Target Date |
|---|---|---|---|---|
| Investigate gaps in pings | Open | Stuart/Alessio | https://bugzilla.mozilla.org/show_bug.cgi?id=1185123, working doc | 8/10 |
| Data integrity between V2/V4 and V4 internal data consistency | Open | Brendan/Sam | Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation | 8/10 |
| Data continuity across V2/V4 | Open | Katie/Mark/Trink | Plan, Metabug | 8/10 |
| Legal review | Open | BDS/Legal | Meeting between groups | 8/10 |
| QA sign off (functional, load) | Open | Stuart | Telemetry/Testing | 8/10 |
| Operations - data retention requirements | Open | Travis/Katie | Eng team owes ops a doc defining ping types and data retention requirements | 8/10 |
| Operations - analysis tools & microservices | Open | Travis/Mark/Roberto | Architecture/Data flow diagram | 8/10 |
| Data loss incident | Fixed | mreid/whd/trink | https://bugzilla.mozilla.org/show_bug.cgi?id=1179128 (Tee server needs to return error status from old or new). Added Ops resources (Daniel Thornton). | 7/15 |
| Remote about:healthreport content | Open | Katie/BDS | Working on PR for fhr-jelly (https://github.com/mozilla/fhr-jelly), will deploy next week | 8/10 |
| Budget, size of UT pings | Open | Mark/BDS | https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 | 8/10 |
| Analysis difficulty | Open | Katie/tbd | Spark training; need comprehensive plan | 8/10 |
Accomplished for Last Period
Engineering & Ops
- Unexpected jump in traffic last Friday (beta and release); doubled instance size on Monday
- Client work: Spreadsheet
- Data validation
- Missing pings doc
- Generated v4 data set with complete set of pings from all clients seen on nightly: https://bugzilla.mozilla.org/show_bug.cgi?id=1171265#c24
- Work on missing subsessions analysis (hints at a client bug): https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
- Pipeline scaling work
- Back fill of executive summary pings (hindsight)
- Snappy support added to Spark and Heka infrastructure
- Telemetry tools and microservices
- Work on memory footprint of the Spark jobs: https://bugzilla.mozilla.org/show_bug.cgi?id=1182499
- Kickoff meeting for deployment plan for telemetry tools and microservices: Architecture flow diagram
QA
- Load testing
- Work with Softvision
Project management
- meetings, emails, hand waving
Planned for Upcoming Period
Engineering
- Client
- uplifts required to hit beta
- focus on work required for about:healthreport (use new apis and migrate content)
- Pipeline
- In talks with Databricks regarding Spark hosting
- Mechanism for Heka state preservation when it gets wedged
- UT specific monitoring and alerting
- data retention spec
- Data validation
- acceptance criteria
- missing subsessions ping investigation
- Many submissions for few clients issue
- Data continuity
- Document strategy for executive dashboards with v2 + v4 data
Ops
- building automated Jenkins deployments
- nginx load balancing
QA
- Look into prod T issue with Ops
- continue test suite creation
- finalizing long-term QA engagement (Softvision engagement, tooling asks for CI-loop-based testing)
Project Management
- Finish triage of bugs
- remainder of release tasks scheduled
Outstanding requests not yet roadmapped into a release
| Description | State | Owner | Plan to Resolve/Mitigation | Target Date |
|---|---|---|---|---|
| Firefox OS - app pings | Open | Katie | Need to schedule and understand impact on project | TBD |
| Histograms for Loop/Hello | Open | Katie | Need to schedule and understand impact on project | TBD |