Unified Telemetry/Status reports/July 24 2015
Latest revision as of 17:04, 24 July 2015

Previous week's report

Unified Telemetry status report July 24, 2015

Overall Project Health

Last week: Green

This week: Yellow - r41 is the go-live release for Unified Telemetry. All issues have been triaged and assigned milestones. The dev team continues to focus on data validation; client-side pings (https://bugzilla.mozilla.org/show_bug.cgi?id=1185123) are the current blocker under investigation.

Exec Summary

  • Validation work has hit ping bugs, which are the current blocking issue
  • July 30 milestone for the first complete pass of data validation and deployment of the pipeline scaling work
  • Testing plan is up on the wiki: Telemetry/Testing
  • Ongoing planning for the FHR V2/V3 historic pipeline migration (link to status here)

Risks/Issues

Description of Risks/Issues | State | Owner | Plan to Resolve/Mitigation | Target Date
Investigate gaps in pings | Open | Stuart/Alessio | https://bugzilla.mozilla.org/show_bug.cgi?id=1185123, working doc (https://etherpad.mozilla.org/u230yVoP9S) | 8/04
Data integrity between V2/V4 and V4 internal data consistency | Open | Brendan/Sam | Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation | 7/30
Data continuity across V2/V4 | Open | Katie/Mark/Trink | Plan, Metabug | 7/30
Legal review | Open | BDS/Legal | Meeting between groups | 8/04
QA sign-off (functional, load) | Open | Stuart | Telemetry/Testing | 8/04
Operations - data retention requirements | Open | Travis/Katie | Eng team owes Ops a doc defining ping types and data retention requirements | 8/04
Operations - analysis tools & microservices | Open | Travis/Mark/Roberto | Architecture/Data flow diagram | 8/04
Data loss incident | Fixed | mreid/whd/trink | Tee server needs to return error status from old or new. Added Ops resources (Daniel Thornton). | 7/15
Remote about:healthreport content | Open | Katie/Georg | Made a request to Laura Thomson for help | 8/04
Budget, size of UT pings | Open | Mark/BDS | https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 | 8/04
Analysis difficulty | Open | Katie/tbd | No plan yet, aside from ongoing work on tools | 8/04

Accomplished for Last Period

Engineering & Ops

  • Client work: Spreadsheet (https://docs.google.com/spreadsheets/d/1yAJmgCGYyk1d7A41DZa653Z3u2AbH-kDWsO1vPSgbfE/edit?usp=sharing)
    • Not uplifting recent send-logic changes to Beta (they need more bake time for confidence)
    • Uplifting a few patches around the send logic ([uplift2], http://bit.ly/1Je45UA) to Aurora as soon as the send-logic impact is verified
    • Remaining client work for 41 ([uplift3], http://bit.ly/1TCl4r8) is manageable; it is blocked either on information requests or on review
  • Data validation
    • Generated a v4 data set with the complete set of pings from all clients seen on Nightly: https://bugzilla.mozilla.org/show_bug.cgi?id=1171265#c24
    • Worked on the missing-subsessions analysis (hints at a client bug): https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
  • Kickoff meeting for the deployment plan for telemetry tools and microservices: Architecture flow diagram (https://docs.google.com/a/mozilla.com/document/d/1KoLtIFV-aZtxruSVNmcc26F22MfqWjDynKgZ6adYk54/edit?usp=sharing)

QA

  • Investigate client QA automated test scripts
  • Update test wiki
  • Work with Softvision to prepare for the RC pass

Project management

  • meetings, emails, hand waving

Planned for Upcoming Period

Engineering

  • Client
    • Do code reviews for deletion pings and choices info bar
    • Continue pending-ping cleanup
    • Continue investigating count discrepancies between "main" pings and "saved-session" pings
  • Pipeline
    • Continue with scaling work
    • Monitoring work for Telemetry data
    • Investigate executive stream discrepancies
    • Bug fixes
  • Data validation
    • Work on 100k-client paired v2/v4 pings from early June to early July
    • Validation efforts (main vs. saved-session pings, ending-subsession pings, broken chaining)
    • Deep dive on missing subsessions as it may indicate a client bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
  • Data continuity
    • Document strategy for executive dashboards with v2 + v4 data

Ops

  • Aggregate pipeline is available in staging and needs testing

QA

  • Closing bugs
  • Continue test suite creation
  • Finalize the long-term QA engagement (Softvision engagement, tooling asks for CI-loop-based testing)

Project Management

  • Finish triage of bugs
  • Schedule the remainder of release tasks

Outstanding requests not yet roadmapped into a release

Description | State | Owner | Plan to Resolve/Mitigation | Target Date
Firefox OS - app pings | Open | Katie | Need to schedule and understand impact on project | TBD
Histograms for Loop/Hello | Open | Katie | Need to schedule and understand impact on project | TBD

Important Links/References