Revision as of 16:21, 17 July 2015
Unified Telemetry status report July 17, 2015
Overall Project Health
Green - r41 is the go-live release for Unified Telemetry. All issues are triaged and assigned to milestones. The dev team continues to focus on data validation.
Exec Summary
- Client work delayed this week by sick time.
- Working toward data validation milestone on July 30: http://mzl.la/1J2OdZA
- Pipeline scaling work to be completed by July 30(?)
- <tools>
- Ongoing planning on FHR V2/V3 historic pipeline migration (link to status here).
Risks/Issues
Description of Risks/Issues | State | Owner | Plan to Resolve/Mitigation | Target Date |
---|---|---|---|---|
Data integrity between V2/V4 and V4 internal data consistency | Open | Brendan/Sam | Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation | 7/30 |
Data continuity across V2/V4 | Open | Katie/Mark/Trink | Mark writing up plan from Whistler; metrics team specifying data sets and reviewing "executive" data set. https://bugzilla.mozilla.org/show_bug.cgi?id=1182684 | 7/23 |
Legal review | Open | BDS/Legal | Meeting between groups | 8/04 |
QA sign off (functional, load) | Open | Stuart | Working with QA on creating test cases/test plans | 8/04 |
Operations - data retention requirements | Open | Travis/Katie | Eng team owes ops a doc defining ping types and data retention requirements | 8/04 |
Operations - analysis tools & microservices | Open | Travis/Mark/Roberto | Architecture/Data flow diagram; meeting next Monday (7/13) | 8/04 |
Data loss incident | Fixed | mreid/whd/trink | Tee server needs to return error status from old or new. Added Ops resources (Daniel Thornton). | 7/15 |
Remote about:healthreport content | Open | Katie/Georg | Made a request to Laura Thomson for help | 8/04 |
Budget, size of UT pings | Open | Mark/BDS | https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 | 8/04 |
Analysis difficulty | Open | Katie/tbd | No plan yet, aside from ongoing work on tools | 8/04 |
Accomplished for Last Period
- Engineering
  - heka 0.10.0 beta pushed
  - Client work spreadsheet: https://docs.google.com/spreadsheets/d/1yAJmgCGYyk1d7A41DZa653Z3u2AbH-kDWsO1vPSgbfE/edit?usp=sharing
    - Not uplifting recent send-logic changes to Beta (needs more bake time for confidence)
    - Uplifting a few patches around the send logic ([uplift2], http://bit.ly/1Je45UA) to Aurora as soon as the send-logic impact is verified
    - Remaining client work ([uplift3], http://bit.ly/1TCl4r8) for 41 is manageable and is blocked only by info requests or review
  - Updates to the unified telemetry decoder and executive report
  - Architecture flow diagram in preparation for meeting with ops: https://docs.google.com/a/mozilla.com/document/d/1KoLtIFV-aZtxruSVNmcc26F22MfqWjDynKgZ6adYk54/edit?usp=sharing
  - Progress on data validation
    - Compare FHR v2 and FHR v4 search, crash, and other fields: https://bugzilla.mozilla.org/show_bug.cgi?id=1179376 -- close agreement for search counts
    - Saved-session vs main pings: https://bugzilla.mozilla.org/show_bug.cgi?id=1147395 -- mismatch in about 7% of sessions for one of the metrics investigated
  - Created new milestones for remainder of r40 cycle, triaged bugs into the buckets
- Ops
  - Alerts and status code trapping
- Performance
  - Spark out-of-memory jobs
- QA
  - Test cases, bug closing
- Project management
  - Meetings, emails, hand waving
Planned for Upcoming Period
Engineering
- Pipeline monitoring (tracking errors by channel and build id)
- Uplift final client changes for r40: spreadsheet
- Data validation: https://etherpad.mozilla.org/fhr-v4-validation
- Targeting 40r9 as next major milestone to have bulk of validation work completed.
Ops
- Meeting to go over Telemetry tools/microservices production deployment
- Continued work on scaling for release loads
- Databricks investigation (big jobs on big clusters): cost, resourcing, etc.
Performance
- Automate Spark
QA
- Closing bugs
- Test suite creation
- Finalizing long-term QA engagement (Softvision engagement, tooling asks for CI-loop-based testing)
Project Management
- Finish triage of bugs
- Schedule remainder of release tasks
Outstanding requests not yet roadmapped into a release
Description | State | Owner | Plan to Resolve/Mitigation | Target Date |
---|---|---|---|---|
Firefox OS - app pings | Open | Katie | Need to schedule and understand impact on project | TBD |
Histograms for Loop/Hello | Open | Katie | Need to schedule and understand impact on project | TBD |