Glean/Adding or changing Glean metric types

From MozillaWiki

Background

Glean is Mozilla’s modern product analytics and telemetry solution that provides data for our new products. It aims to be easy to integrate, reliable and transparent by providing an SDK and integrated tools.

One of the Glean principles is to provide higher-level metric types that map semantically to what users want to measure: for example, it is helpful for both validation and analysis to know that something is a counter rather than just a more general "integer", as this implies that its value cannot be less than or equal to 0.
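To make the counter example concrete, here is a minimal Python sketch of how a semantic type lets the SDK reject values that a generic integer would silently accept. `CounterMetric`, `add` and `errors` are hypothetical names chosen for illustration. This is not the Glean SDK's actual API.

```python
from typing import Optional


class CounterMetric:
    """Hypothetical counter type that accumulates strictly positive increments.

    Illustrative sketch only; not the Glean SDK API.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[int] = None  # None means "nothing recorded yet"
        self.errors = 0  # number of rejected (invalid) increments

    def add(self, amount: int = 1) -> None:
        # A counter only ever goes up: zero or negative increments are
        # counted as errors instead of corrupting the stored value.
        if amount <= 0:
            self.errors += 1
            return
        self.value = (self.value or 0) + amount


clicks = CounterMetric("button_clicks")
clicks.add()
clicks.add(3)
clicks.add(-2)  # rejected and recorded as an error
print(clicks.value, clicks.errors)  # prints "4 1"
```

Because the stored value is either absent or a positive integer, validation and analysis downstream can rely on that invariant. A generic integer type offers no such guarantee.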

The currently offered metric types were designed to cover the majority of Mozilla use-cases, but we know that new use-cases will come up. Some already have: UrlMetricType, StringList vs StringSet, dropping labelled booleans, coarse timing distributions, error stacks, changes to quantity/counters, enumerations and ratios.

Motivation

The base set of metric types offered by Glean, from our initial design document, was designed by going through the pings sent by our mobile products and identifying the higher-level metric types required to reach feature parity with the existing telemetry system. The design document was reviewed by data engineering, and that process helped smooth out some of the rough edges of the existing legacy system. However, no metric type was fundamentally new, so we did not have to answer questions such as:

  • can the type answer the business questions we’re adding instrumentation for?
  • can the type be used to leak user data?
  • does the metric type require custom processing when ingested?

While the Glean/Telemetry team has experience and historical knowledge that could inform the answers to these questions, other teams have much deeper expertise on these topics. Their opinions and recommendations are vital for the process of adding new metric types.

For this reason, the Glean/Telemetry team alone cannot make a call about whether or not a request for a new metric type is reasonable. The Glean end-to-end tech lead must be responsible for that and must base their decision on the feedback from the consulted stakeholders.

The committee

As mentioned, adding a new metric type to the Glean ecosystem does not only have implications for the Glean SDK. All the teams involved in the Glean ecosystem need to be consulted, in addition to the team or individuals requesting the new metric type or changes to existing ones. (Note: teams or individuals who are part of the committee can file requests as well, if needed.)

This process structure attempts to bring together all the points of view of the different stakeholders of the Glean ecosystem. The volume of the incoming requests is expected to decrease over time, as the Glean offering becomes more complete and comprehensive.

The following sections describe the roles identified to move the process along.

The requester

This is the team or individuals asking for the change or the addition of a new metric type in Glean.

Team name: Requester
Member name(s): (depends on the request)
Domain of expertise/angle:
  • Why is the new metric type required?
  • What is the data that needs to be collected?
  • What is the specific question that the data needs to answer?

The triage owner

This is the team or individuals responsible for triaging incoming requests and making sure that all the requests are acted on.

Team name: Triage owner
Member name(s): The owner of the Bugzilla component
Domain of expertise/angle:
  • Is not required to be involved in the decision making process.
  • Guarantees that each request is acted on within 6 business days.

The Glean end-to-end tech lead

The individual who has an end-to-end understanding of the Glean ecosystem and oversees its strategy and long-term goals.

Team name: Glean end-to-end tech lead
Member name(s): Michael Droettboom
Domain of expertise/angle:
  • Does the requested type fit into the product strategy?
  • Does the cost for implementing this outweigh its benefits?
  • If the requested change can be technically achieved, should it still be done?

The consulted stakeholders

This table attempts to capture the stakeholders that need to be consulted, as they work on or with the Glean ecosystem.

Team name: Data Science
Member name(s): Marissa Gorlick
Domain of expertise/angle:
  • How would the new type impact analyses?
  • Can the requested type be used to answer meaningful questions in a scalable way?
  • Can the use-case be satisfied by using any existing metric type?
  • Will the data be easy to misinterpret, and are there ways to minimize that?
  • Help vet definitions posted to the organization.

Team name: Data Stewards
Member name(s): Any available steward
Domain of expertise/angle:
  • Does the new type pose privacy challenges?
  • Should the data-review process be changed to address this new type?

Team name: Data Tools
Member name(s): Marina Samuel, Rob Hudson
Domain of expertise/angle:
  • Can this metric type be unambiguously aggregated?
  • Will this metric pose problems when trying to plot it?
  • Can this metric type be accessed?
  • Will this metric type introduce previously unseen complexity to our aggregation process?
  • Will this metric type's data (or its aggregate data) introduce new complexity when we import it into the low-latency databases used by our web-based data tools?

Team name: Pipeline
Member name(s): Frank Bertsch, Mark Reid
Domain of expertise/angle:
  • Would this create problems with the payload?
  • How would the new type translate to BigQuery (BQ) types?
  • Can the new type be represented at all in a convenient way?
  • Can the use-case be satisfied by using any existing metric type?

Team name: SDK
Member name(s): Alessio Placitelli, Beatriz Rizental
Domain of expertise/angle:
  • Would the new type violate SDK principles?
  • Would the API be reasonably ergonomic?
  • Would the API work on all the supported platforms?
  • Can the use-case be satisfied by using any existing metric type?
  • Are there any performance concerns? (speed, size on disk, bandwidth, etc.)

The processes

This section outlines the two processes involved in changing or adding metric types. The workflow starts with the user making the request. After that, the requested changes are discussed among the different stakeholders listed in the previous sections.

Requirements

Before either of the processes below can take place, these requirements need to be satisfied:

  1. Representatives for each team of the committee must be nominated and added to this document in the committee section.
  2. Managers or representatives for each team of the committee must sign-off on this proposal, at the top of the document.
  3. A new Bugzilla component, "Data platforms & tools::Glean Metric Types", must be created.
  4. All the members of the committee must subscribe to the Bugzilla component.
  5. Representatives for each team must nominate the triage owner for the Bugzilla component.
  6. A Bugzilla form for submitting requests must be created (see "The custom Bugzilla form" below).
  7. A discussion document template must be available to be forked by the triage owner.
  8. Documentation for requesting new metric types or changing existing ones must be available on the Book of Glean.

The proposal process

This section describes how users should file a request for either changing or adding a new metric type.

  1. The user files a bug using a custom form in the "Data platforms & tools::Glean Metric Types" component in Bugzilla.
  2. The triage owner of the Bugzilla component prioritizes this within 6 business days and kicks off the decision making process.
  3. Once the decision process is completed, the bug is closed with a comment outlining the decision that was made.

The custom Bugzilla form

The form contains the following information:

  • A description of the data that needs to be recorded.
  • A raw sample of the data that needs to be recorded. This should describe the data in the abstract, without implementation details about its representation in the payload or the database.
  • The business question/use-case that requires the data to be recorded.
  • How the data would be consumed.
  • Why existing metric types are not enough.
  • The timeline by which the data needs to be collected.

The decision making process

This section outlines the process with which a decision is made when a new request comes in.

  1. The triage owner of the Bugzilla component triages the request.

  2. The triage owner copies all the information from the bug into a document (for allowing easier async communication through comments).

  3. The triage owner attaches the document to the bug and flags all the members of the committee with a Bugzilla review request on the document attachment.

  4. Members of the committee discuss the content of the document using the Google Docs commenting system.

    1. Reviewers are expected to review the document within at most 6 business days.
  5. (optional) The Glean end-to-end tech lead can call for a meeting to be organized to further discuss the document, if needed.

  6. If more information is required from the requester, they are flagged on the document with a comment.

  7. The teams that will need to do the implementation work must provide an estimate of the required work. The effort has to be weighed in the final decision.

  8. All the members of the committee sign off on the document (in their related sections, e.g. "consulted" or "own") or leave a comment explaining why the change/new metric is not needed.

  9. The Glean end-to-end tech lead makes a decision on the request, based on the feedback from all the consulted stakeholders.

  10. The triage owner closes the bug and leaves a comment with the decision outcome.

    1. If needed, the triage owner makes the document publicly accessible by attaching it to the bug as a Markdown document.

    2. If the decision is to proceed with the change, the triage owner files the required bugs in the relevant components (e.g. for new metric types, a bug for SDK changes is likely required).

    3. If the decision is to not make any change to metric types, the committee must identify and recommend an existing metric type to use instead.

Important: the triage owner is responsible for driving the conversations and making sure that a decision is reached within the expected timeframe.

The discussion document

In addition to a copy of all the data present in the Bugzilla form, the discussion document must contain the following sections:

  • a traceability matrix at the top of the document, listing all the committee members so that the sign-offs are easy to find;
    • this should have at least two sections: Glean end-to-end tech lead and consulted.
  • a section for each team represented in the committee to add their considerations about the request; ideally each section should summarise the team's position on the proposal and highlight any critical issues.

The template is available here.

Q&A

Q: What are the acceptance criteria for changes/new metrics?

A: In addition to the questions listed for each role in the committee section, the committee must consider the following aspects:

  • the proposed changes/new metric must not break any of the Glean principles;
  • if changing an existing metric type, the change must not break any existing usage;
    • an email must be sent to fx-data-dev@mozilla.org to announce the intent to change the metric.
    • if it breaks existing uses, the requester must commit to fixing the breakage (or partner with the appropriate stakeholders, if identifiable) as agreed with the committee.

Q: What if the committee disagrees on the decision for the changes/new metrics?

A: The Glean end-to-end tech lead has veto power: they can either decide to proceed with the change (e.g. because of strategic product needs) or reject the request (e.g. because of the cost/benefit ratio).