Security/Reviews/MetricsDataPing


Item Reviewed

Metrics Data ping
Target: https://wiki.mozilla.org/MetricsDataPing

ID      Summary                                   Priority  Status
718066  Initial landing of Firefox Health Report  --        RESOLVED

1 Total; 0 Open (0%); 1 Resolved (100%); 0 Verified (0%);

Introduce the Feature

Goal of Feature, what is trying to be achieved (problem solved, use cases, etc)

  • MetricsDataPing: collect important product metrics (see wiki)
    • this data is a critical need for Mozilla for a variety of reasons
  • original plans focused on sending a collection of client-side metrics to Mozilla servers once per day
    • effective retention data and longitudinal study need a cumulative view (over time)
    • initial proposal: a UUID associated with the installation's profile, submitted each time so new data could be merged with past data
    • the data set is opt-out rather than opt-in, to avoid self-selection bias
  • changes
    • UUID removed and replaced with a document identifier, generated per request (per profile)
    • data is accumulated on the client side rather than the server side
    • each submission carries the new ID and the previous ID, which allows the server to remove the older document stored under the old ID
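
The ID rotation described under "changes" can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ClientState`, `build_submission`, and `mark_sent`, and the field names in the submission, are hypothetical and do not come from the actual implementation. Python's `uuid.uuid4()` stands in for whatever UUID mechanism the client really uses.

```python
import uuid


class ClientState:
    """Hypothetical per-profile state kept on the client."""

    def __init__(self):
        self.last_document_id = None  # ID of the last document the server accepted
        self.accumulated = {}         # metrics accumulated client side over time


def build_submission(state, new_metrics):
    """Fold new metrics into the cumulative view and wrap it in a fresh document ID."""
    state.accumulated.update(new_metrics)
    return {
        "document_id": str(uuid.uuid4()),                # new ID for every request
        "previous_document_id": state.last_document_id,  # lets the server drop the old doc
        "payload": dict(state.accumulated),              # cumulative, not a daily delta
    }


def mark_sent(state, submission):
    """Remember the new ID only once the upload has succeeded."""
    state.last_document_id = submission["document_id"]


state = ClientState()
first = build_submission(state, {"sessions": 1})
mark_sent(state, first)
second = build_submission(state, {"crashes": 0})
mark_sent(state, second)
```

In this sketch the second submission names the first one as its predecessor and carries the full cumulative payload, so the server never needs a long-lived identifier to merge documents.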

What solutions/approaches were considered other than the proposed solution?

  • UUID vs. Document ID (above)
  • blocklist ping: provides ADI and is the current metrics system; lots of attributes, but only point-in-time, so no analysis over time and no retention analysis; owners don't want other data collection on top
  • telemetry: opt-out by default on Nightly/Aurora but opt-in on other channels; focused on performance data; not designed for time analysis or retention
  • Test Pilot: double opt-in, large self-selection bias, skewed towards power users and early adopters rather than typical users
  • opt-in vs. opt-out: based on research on self-selection bias
  • funnelcake: designed for adoption/retention; the blocklist ping was the last part, but we were losing this data
  • an actual representative "sample" rather than the full population
    • problem of keeping the sample stable and representative over time

Why was this solution chosen?

  • need for longitudinal analysis and retention analysis
  • we can look at the submissions and see whether there were problems if data stops coming

Any security threats already considered in the design and why?

  • if disclosed, the UUID could be used to find information about the user on the server system
    • the UUID would persist across a backup, so the design was changed
  • Server side:
    • public, unauthenticated system; write-only, plus requests to delete
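
A write-only store with delete requests, as described for the server side, might look like the sketch below. The class `DocumentStore` and its method names are hypothetical and chosen only for illustration; the real collection system is not described in these notes beyond the points above.

```python
class DocumentStore:
    """Hypothetical write-only store: clients may submit or delete, never read."""

    def __init__(self):
        self._docs = {}  # document_id -> payload

    def submit(self, document_id, payload, previous_document_id=None):
        """Store the new document and drop the one it supersedes, if any."""
        if previous_document_id is not None:
            self._docs.pop(previous_document_id, None)
        self._docs[document_id] = payload

    def delete(self, document_id):
        """Honor a deletion request; returns True if a document was removed."""
        if document_id in self._docs:
            del self._docs[document_id]
            return True
        return False

    def count(self):
        """Number of stored documents (for operators, not exposed to clients)."""
        return len(self._docs)


store = DocumentStore()
store.submit("id-1", {"sessions": 1})
store.submit("id-2", {"sessions": 2}, previous_document_id="id-1")
```

Because there is deliberately no method for fetching a document by ID, a disclosed ID cannot be used through this interface to read back a user's data; it could only be used to delete or supersede that one document.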

Threat Brainstorming

  • Obvious Privacy stuff
  • why does the system have retrieval?
    • current system with document identifiers does not, maybe in a future version to allow client to get aggregate info so a user can compare things themselves
    • so user can see the data and remove if they want
  • Does the about:metrics / user data retrieval feature have to go out at the same time as the metrics collection on our servers?
  • What are the compliance issues mentioned on the wiki in regards to the data retrieval?
    • EU / Germany: privacy compliance regulations apply even to data about the functioning of the product when there is no user-facing feature to support it
  • Where is the UUID/document identifier stored? Do web pages have access to the UUID/Document ID?
    • Stored as a preference in about:config, which is accessible to "chrome" code but not to regular web pages. Hence website fingerprinting is not an issue.
    • a user could corrupt the data by editing the preference in about:config, which could produce bogus data
  • How often will data be sent?
    • not more than once per 24 hours
  • What API is used?
    • simple POST request to the data collection system, same as telemetry (data.mozilla.com)
  • Are there signatures on the requests/responses? Is it over SSL?
    • yes, it is over SSL; a signature does not matter
  • Is the certificate checked, or is it basic SSL auth?
    • basic SSL auth. Perhaps we could extend this.
  • When the server receives a new Document ID, it deletes the previous ID and data associated with it. Do we no longer need that data, or do we just delete the previous ID and retain the data?
    • each submission is a cumulative view from the client, there is only one doc at any time that represents that installation
    • allows for expiration of documents
  • What's the risk of other add-ons grabbing and using the Document ID as a unique identifier, much as iOS apps have been caught doing?
    • the document ID changes every day, so it is not likely to be useful to other chrome-privileged processes unless they check all the time
      • but an add-on can just grab the document ID every day and chain them.
        • if it had chrome privileges, it could just create a UUID itself and use that anyway.
  • How random are the document IDs?
    • uses the UUID mechanism, same as crash stats
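
The "not more than once per 24 hours" limit discussed above could be enforced on the client with a check along these lines. This is a minimal sketch; `MIN_INTERVAL` and `may_submit` are hypothetical names, and how the real client stores the time of its last submission is an assumption not covered in these notes.

```python
from datetime import datetime, timedelta, timezone

# Minimum spacing between two submissions, per the notes above.
MIN_INTERVAL = timedelta(hours=24)


def may_submit(last_sent, now):
    """Return True if no ping was sent yet, or the last one is at least 24 hours old."""
    if last_sent is None:
        return True
    return now - last_sent >= MIN_INTERVAL


last = datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc)
allowed_early = may_submit(last, last + timedelta(hours=23, minutes=59))
allowed_later = may_submit(last, last + timedelta(hours=24))
```

Note that if the stored timestamp lies in the future, for example after a user edits the preference, this check keeps refusing until the clock catches up; a real client would need to decide how to handle that case.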

Action Items

Action Item Status: In Progress
Release Target: Firefox 12

Who: (not recorded)
Action: code review (about:metrics), bug 718066
By When: before landing on Aurora
Completed date: [NEW] in progress

ID      Summary                                                  Priority  Status
764645  SecReview: Firefox Health Report - Security Code Review  --        NEW

1 Total; 1 Open (100%); 0 Resolved (0%); 0 Verified (0%);
