Unified Telemetry/Status reports/July 17 2015
Previous week's report: https://wiki.mozilla.org/Status_reports/July_10_2015
Latest revision as of 20:53, 17 July 2015
Unified Telemetry status report July 17, 2015
Overall Project Health
Green - r41 is the go-live release for Unified Telemetry. All issues are triaged and assigned milestones. The dev team continues to focus on data validation.
Exec Summary
- Client work was delayed this week by sick time. Send logic and a few other changes are planned for uplift to Aurora & Beta next week. Remaining work for 41 is waiting for reviews.
- July 30 milestone: first complete pass of data validation and deployment of pipeline scaling work
- Testing plan is up on the wiki: Telemetry/Testing
- Ongoing planning on FHR V2/V3 historic pipeline migration; status: https://mana.mozilla.org/wiki/display/PM/FHR+historic+pipeline+update+July+6
Risks/Issues
| Description of Risks/Issues | State | Owner | Plan to Resolve/Mitigation | Target Date |
|---|---|---|---|---|
| Data integrity between V2/V4 and V4 internal data consistency | Open | Brendan/Sam | Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation | 7/30 |
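| Data continuity across V2/V4 | Open | Katie/Mark/Trink | Plan: https://docs.google.com/a/mozilla.com/document/d/1VzQHfzfA-S_lO2wpXDFjDzSJntJCMwP03TzefIj7RrE/edit?usp=sharing and Metabug: https://bugzilla.mozilla.org/show_bug.cgi?id=1182684 | 7/23 |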
| Legal review | Open | BDS/Legal | Meeting between groups | 8/04 |
| QA sign off (functional, load) | Open | Stuart | Telemetry/Testing | 8/04 |
| Operations - data retention requirements | Open | Travis/Katie | Eng team owes ops a doc defining ping types and data retention requirements | 8/04 |
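| Operations - analysis tools & microservices | Open | Travis/Mark/Roberto | Architecture/Data flow diagram: https://docs.google.com/a/mozilla.com/document/d/1KoLtIFV-aZtxruSVNmcc26F22MfqWjDynKgZ6adYk54/edit?usp=sharing | 8/04 |
| Data loss incident | Fixed | mreid/whd/trink | https://bugzilla.mozilla.org/show_bug.cgi?id=1179128 - Tee server needs to return error status from old or new. Added Ops resources (Daniel Thornton). | 7/15 |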
| Remote about:healthreport content | Open | Katie/Georg | Made a request to Laura Thomson for help | 8/04 |
| Budget, size of UT pings | Open | Mark/BDS | https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 | 8/04 |
| Analysis difficulty | Open | Katie/tbd | No plan yet, aside from ongoing work on tools | 8/04 |
Accomplished for Last Period
Engineering & Ops
- Heka 0.10.0 beta released
- Client work: Spreadsheet: https://docs.google.com/spreadsheets/d/1yAJmgCGYyk1d7A41DZa653Z3u2AbH-kDWsO1vPSgbfE/edit?usp=sharing
  - Not uplifting recent send logic changes to Beta (needs more bake time for confidence)
  - Uplifting a few patches around the send-logic ([uplift2], http://bit.ly/1Je45UA) to Aurora as soon as the send-logic impact is verified
  - Remaining client work ([uplift3], http://bit.ly/1TCl4r8) for 41 is manageable and either blocked by info requests or review
- Data validation
  - Generated v4 data set with complete set of pings from all clients seen on nightly: https://bugzilla.mozilla.org/show_bug.cgi?id=1171265#c24
  - Work on missing subsessions analysis (hints at a client bug): https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
- Pipeline scaling work
  - Finished distributed aggregation work started at workweek: https://github.com/mozilla-services/data-pipeline/pull/93
  - Deployed next round of changes
- Telemetry tools and microservices
  - Work on memory footprint of the Spark jobs: https://bugzilla.mozilla.org/show_bug.cgi?id=1182499
  - Kickoff meeting for deployment plan for telemetry tools and microservices: Architecture flow diagram: https://docs.google.com/a/mozilla.com/document/d/1KoLtIFV-aZtxruSVNmcc26F22MfqWjDynKgZ6adYk54/edit?usp=sharing
QA
- test cases, bug closing
Project management
- meeting, emails, hand waving
Planned for Upcoming Period
Engineering
- Client
  - Do code reviews for deletion pings and choices info bar
  - Pending ping cleanup
  - Investigate count discrepancies between "main" pings and "saved session" pings
- Pipeline
  - Continue with scaling work
  - Monitoring work for Telemetry data
  - Investigate executive stream discrepancies
  - Bug fixes
- Data validation
  - Join corresponding v2 data to v4 nightly clients data set
  - Continue writing callbacks that look at other measures
  - Breadth first, do a first pass at most validations and flag big issues
  - Deep dive on missing subsessions as it may indicate a client bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
- Data continuity
  - Document strategy for executive dashboards with v2 + v4 data
Ops
- Databricks investigation (big jobs on big clusters): cost, resourcing, etc.
QA
- closing bugs
- test suite creation
- finalizing long-term QA engagement (Softvision engagement, tooling asks for CI-loop-based testing)
Project Management
- Finish triage of bugs
- remainder of release tasks scheduled
Outstanding requests not yet road mapped into a release
| Description | State | Owner | Plan to Resolve/Mitigation | Target Date |
|---|---|---|---|---|
| Firefox OS - app pings | Open | Katie | Need to schedule and understand impact on project | TBD |
| Histograms for Loop/Hello | Open | Katie | Need to schedule and understand impact on project | TBD |