Telemetry Meeting 2012-05-16
Attendance: Metrics and Perf teams
Note: I was late to this meeting and had Vidyo trouble in the middle. There may be missing content.
Data validation
- Data validation doesn’t need to be perfect
- We can prototype validation with demo data
- Validation is currently limited to looking at trunk versions of Firefox and isn't applicable to older data; we still have data from older versions coming in every day
- Data from older versions is still valuable to collect
- We're already making decisions based on Telemetry in its current form, but we can't make as many decisions as we'd like
- Historical data is used by MemShrink to track memory usage over time
- We have the ability to validate old data; we just have to do it per version
- How do we maintain continuity of data when it differs across versions?
- View data as different histograms
- We should see whether the trend is in any way comparable across versions (see the histogram-comparison sketch after this list)
- When do we stop paying attention to releases?
- Stakeholders have different requirements.
- Validate consistency of each payload
- How much data do we take in?
- 5% is 3 GB, 60 GB/day as of May 14 – this is data that we're storing
- might be as high as 120 GB/day
- Can get max of 2 pings in 24 hours
- 2 pings is the common (70%) case
- Telemetry persistence is enabled in the beta population
- This is a concern for Metrics, as it can potentially double the amount of traffic
- Persistence went in at the same time as compression, which only reduces over-the-wire transmission
- We can look at optimizing data storage
- The first piece of validation is the amount of traffic on a daily basis
- Gauge it based on how much traffic we expect to get
- Do we want to base this on the data that we were previously receiving?
- Have a volume check in the dashboard
- Want a poll check
- We don't have a production system yet
- Multiple ways to fail:
- Failure to send data at the browser level
- Failure to ETL data from HBase to ElasticSearch
- The metadata that says how much data made it through to ElasticSearch may also be incorrect
- Perf cares about whether submissions are coming in, as there is no way to recover from that failure
- Should implement fail-safes that notify us if volume changes significantly (see the volume-monitoring sketch after this list)
- Should push the validation work through, as the specific validations would be useful in the new system
- Raw push notification of volume – count, size – of submissions going into HBase
- Needs an owner to monitor
- For as long as we have multiple tiers we can do the same thing for each tier
- Would like the Perf team to be responsible for the validation checks
- Metrics will take extracts of the data you want – versions, other variations – and deliver a corpus of JSON payloads; Perf devises the appropriate checks to flag each submission as valid or invalid
- Metrics proposed a method of transforming invalid data
- Suggestion that Metrics either give Perf the data or give access so Perf can do the validation – agreed
- The production deliverable is a harness that pulls down the validation script on a regular basis and tests the data (see the validation-harness sketch after this list)
- Eventually the whole data set needs to be validated; a 5% sample should catch most of the issues
- The only thing not handled is start-up histograms
- Perf wants to run the script on the entire data set
- Metrics doesn't have enough domain experience to look at data in ElasticSearch for inconsistencies
- We think we need to do this the hard, manual way, with domain experts who understand the data
- Perf thinks that testing is not feasible for this data
- The Metrics team will not be responsible for this type of data
- Enable push notifications of data volume for the 3 storage repositories
- Once we have a validation script, it will push out counts of valid/invalid
- Want a check on the percentage of Firefox users reporting to Telemetry
- Metrics will create a data-validation proposal and share it with Perf for review
- Lawrence and Taras to set the statement of work
- Perf will sign off on validation
- Metrics (Daniel) commits to Perf being able to run validation scripts on the server
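The cross-version comparison idea above – viewing the data as different histograms per version and checking whether the trends are comparable – could start with something as simple as normalizing each version's histogram and computing the overlap. A minimal sketch in Python, assuming made-up bucket labels and counts rather than the actual Telemetry schema:

 # Hypothetical sketch: compare the "same" histogram across two Firefox
 # versions by normalizing bucket counts and computing their overlap.
 # Bucket labels and numbers below are made up for illustration.

 def normalize(hist):
     """Convert raw bucket counts into a probability distribution."""
     total = float(sum(hist.values()))
     return dict((b, c / total) for b, c in hist.items()) if total else {}

 def overlap(p, q):
     """Overlap coefficient in [0, 1]; 1.0 means identical shapes."""
     buckets = set(p) | set(q)
     return sum(min(p.get(b, 0.0), q.get(b, 0.0)) for b in buckets)

 v13 = {"0-10ms": 900, "10-50ms": 80, "50-100ms": 15, "100ms+": 5}
 v14 = {"0-10ms": 850, "10-50ms": 120, "50-100ms": 20, "100ms+": 10}
 print("overlap: %.3f" % overlap(normalize(v13), normalize(v14)))

A low overlap would flag a probe whose distribution changed enough between versions to warrant manual review by a domain expert.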
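The volume fail-safe could likewise begin as a comparison of each day's submission count and byte size against a rolling baseline. A minimal sketch, assuming a hypothetical 50% deviation threshold; the data source and alert hook here are placeholders, not the production system:

 # Hypothetical volume fail-safe: flag days whose submission volume
 # deviates sharply from the recent average. Thresholds and numbers
 # are assumptions for illustration.

 def mean(xs):
     return sum(xs) / float(len(xs))

 def check_volume(history, today_count, today_bytes, threshold=0.5):
     """history: list of (count, bytes) tuples for recent days.
     Returns alert strings; an empty list means volume looks normal."""
     alerts = []
     base_count = mean([c for c, _ in history])
     base_bytes = mean([b for _, b in history])
     if abs(today_count - base_count) > threshold * base_count:
         alerts.append("count %d vs baseline %.0f" % (today_count, base_count))
     if abs(today_bytes - base_bytes) > threshold * base_bytes:
         alerts.append("bytes %d vs baseline %.0f" % (today_bytes, base_bytes))
     return alerts

 # Made-up example: ~1M submissions/day at ~60 GB/day, then a sudden drop.
 history = [(1000000, 60e9), (980000, 59e9), (1020000, 61e9)]
 for alert in check_volume(history, today_count=400000, today_bytes=25e9):
     print("ALERT: " + alert)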
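For the per-payload consistency checks the harness would run over a corpus of JSON submissions, the sketch below counts valid/invalid. The required keys and histogram layout are assumptions about the payload format, not the real schema; the actual checks are Perf's to devise:

 # Hypothetical validation pass over a corpus of JSON submissions,
 # one per line. Field names ("ver", "info", "histograms") are
 # illustrative assumptions about the payload format.
 import json

 REQUIRED_KEYS = ("ver", "info", "histograms")

 def validate(payload):
     """Return None if the payload looks consistent, else a reason."""
     for key in REQUIRED_KEYS:
         if key not in payload:
             return "missing key: " + key
     for name, hist in payload["histograms"].items():
         if any(c < 0 for c in hist.get("values", {}).values()):
             return "negative bucket count in " + name
         if hist.get("sum", 0) < 0:
             return "negative sum in " + name
     return None

 def run(lines):
     """Tally valid/invalid over an iterable of JSON strings."""
     valid = invalid = 0
     for line in lines:
         try:
             reason = validate(json.loads(line))
         except ValueError:
             reason = "unparseable JSON"
         if reason is None:
             valid += 1
         else:
             invalid += 1
     print("valid=%d invalid=%d" % (valid, invalid))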
Team interaction
- Metrics feels that they are being told what to do by Perf
- Metrics has suggestions that we need to listen to for better collaboration
- Two problems from Webdetails' perspective:
- 1. The analyst team feels that the Metrics team should be able to give more input into how we are collecting Telemetry data
- They don't understand why we are aggregating and collecting the data in the way that we are
- 2. The tools we have are the only way Telemetry data can be viewed; the two dashboards provide the only visibility into the data, which is why it's so important for Perf to communicate deliverables
- Metrics hasn't been able to prioritize features properly – we're being inefficient
- Webdetails can’t keep working in the way that they have
- We had a problem with IT removing a server
- We need well-defined statements of work from the Perf team, with documented requirements, so that we can prioritize work
- Answers that are expected immediately put us in a bad position
- Returning to problem 1: the analyst team feels the Metrics team should be able to give more input into how we are collecting Telemetry data
- There is a difference between a technical-detail requirement and a business-detail requirement
- Here is what my devs want to do
- The first set of requirements will include some that are infeasible, too expensive, etc.
- Lots of discussion is needed on how to get from the first requirements to implementation
- Need to try to isolate requests so that we can act on them and provide a good deliverable
- More isolated chunks of work, with specific requirements about what you're trying to analyze
- Telemetry evolution is an example of this
- Never had reasonable checkpoints that both sides agreed on
- Taras isn’t the Telemetry person
- Can have conversations but there are more people who need to be involved
- There is a comprehensive set of business questions coming from various sources
- We're getting these from the interviews that Daniel and Lawrence conducted
- Once we have them, analysts can work on proposals for how we can meet these requirements
Other
- Should we start Telemetry over from the business questions to see what it would look like and then scope potential modifications to the existing system?
- Who decided on histograms?
- We used Chrome as an example because they had an implementation that took multiple years to develop
- Lots of opposition to the idea of getting rid of histograms
- In response, we dropped the idea
- Taras understood that Justin, Joy, and he had agreed on this
- It might be a better investment of time to build an A/B testing framework instead of building out Telemetry further
- Could be used by other teams
- Telemetry needs to sample the world
- The question is how does this feature behave in the real world?
- We're not performing experiments
- JS does want A/B testing
- MemShrink may want it as well
- Other projects need further discussion for prioritization and scope
- Telemetry UX
- Notifications
- Production quality offering