Data Collection

Data Stewards:  
* [https://people.mozilla.org/p/chutten/ :chutten]
* [https://people.mozilla.org/p/kennylong/ Kenny Long]
* [https://people.mozilla.org/p/p--bmntfp44vjd5goalkoeyqy Megan McCorquodale]
* [https://people.mozilla.org/p/jhirsch Jared Hirsch]
* [https://people.mozilla.org/p/daniela Daniela Arcese]
* [https://people.mozilla.org/p/TheOne Andreas Wagner]
* [https://people.mozilla.org/p/tlong/ Travis Long]
* [https://people.mozilla.org/p/willkg Will Kahn-Greene]
* [https://people.mozilla.org/p/p--n8wmyowcldls6pvp6ab1pj Roger Yang]
* [https://people.mozilla.org/p/elise Elise Richards]
* [https://people.mozilla.org/p/sancus :sancus]
* [https://people.mozilla.org/p/charlie-humphreys Charlie Humphreys]
* [https://people.mozilla.org/p/cboozarjomehri Cameron Boozarjomehri]
* [https://people.mozilla.org/p/sergiosonline Sergio Betancourt]
* [https://people.mozilla.org/p/aminomancer Shane Hughes]
* [https://people.mozilla.org/p/roux Roux Buciu]
* [https://people.mozilla.org/p/groovecoder Luke Crouch]

Data stewards come from a variety of teams within Mozilla, including data science, Firefox engineering, mobile products, AMO, and Thunderbird. You are welcome to tag any steward on any collection request, regardless of the nature of your collection.

Contact Us on Matrix https://chat.mozilla.org/#/room/#data-stewards:mozilla.org
* Data steward - the person who ensures the data collection process is followed and that requested data complies with Mozilla policies  

In some cases a data steward may escalate concerns to the Trust and Legal teams. These teams are responsible for defining data collection policies and can answer questions about internal policy and the laws governing user privacy.

Mozilla always strives to make data reviews public. However, in a limited set of circumstances we may conduct a review in a private bug; for example, when a service is part of a partnership agreement that is not yet public. These reviews will be made public once the actual data collection begins.

= Adding or Modifying Data Collection =
The process is slightly different for collections in [https://hg.mozilla.org/mozilla-central/ mozilla-central] code (Firefox Desktop, Firefox & Focus for Android, and Gecko) than it is elsewhere. Please consult the relevant section below.
 
== Firefox Desktop, Firefox and Focus for Android, Gecko (from May 7, 2024) ==
 
When a developer uploads a change to Phabricator that adds or modifies any data collection, Phabricator will automatically add the <tt>needs-data-classification</tt> tag, and explain what happens next.
 
If you’re adding or modifying data collection in your Phabricator revision and this doesn’t happen automatically, please manually add this tag and then follow the same procedure.
 
Once this tag is in place, Herald will ask the patch author and reviewer to assess the [[#Data_Collection_Categories|correct category for the data collection]]:
 
* If the data being collected fits in the “technical data” or “interaction data” categories described there, use the <tt>data-classification-low</tt> tag.
* If it’s any other category, or the patch author and reviewer disagree about the right category, use the <tt>data-classification-high</tt> tag and go through [[#Step_3:_Sensitive_Data_Collection_Review_Process|the sensitive data collection review process]].
* If you think the data in question fits the “technical” or “interaction” category but would benefit from additional review, you can explicitly choose the <tt>data-classification-high</tt> tag and thereby opt in to the sensitive data collection review process.
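The tagging rule in the list above can be expressed as a small decision function. This is a hypothetical sketch for illustration only: <tt>choose_classification_tag</tt> is not part of Phabricator, Herald, or any Mozilla tool, and the labels <tt>technical</tt> and <tt>interaction</tt> are assumed shorthand for the two low-sensitivity categories.

```python
# Hypothetical sketch of the tagging rule; not real Phabricator/Herald code.
# "technical" and "interaction" stand in for the two low-sensitivity categories.
LOW_SENSITIVITY_CATEGORIES = {"technical", "interaction"}

LOW_TAG = "data-classification-low"
HIGH_TAG = "data-classification-high"


def choose_classification_tag(
    author_category: str,
    reviewer_category: str,
    opt_in_to_sensitive_review: bool = False,
) -> str:
    """Pick the tag that replaces needs-data-classification."""
    # Author and reviewer disagree about the category: escalate.
    if author_category != reviewer_category:
        return HIGH_TAG
    # Anything outside technical/interaction data is high sensitivity.
    if author_category not in LOW_SENSITIVITY_CATEGORIES:
        return HIGH_TAG
    # Low-sensitivity data may still opt in to the sensitive review process.
    if opt_in_to_sensitive_review:
        return HIGH_TAG
    return LOW_TAG


if __name__ == "__main__":
    print(choose_classification_tag("technical", "technical"))    # data-classification-low
    print(choose_classification_tag("technical", "interaction"))  # data-classification-high
```

The function only encodes the decision; the comment explaining the choice, and the reviewer's responsibility for the final tag, still apply.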
 
When Glean is used for the data collection, the data classification of the new or expanded collections should match the <tt>data_sensitivity</tt> property in the metric definitions, and the entry in the <tt>data_reviews</tt> list should be the bug URL.
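A self-check along these lines could catch a mismatch between the Phabricator tag and a Glean metric definition before review. This is a hypothetical sketch: it assumes the metric entry from <tt>metrics.yaml</tt> has already been parsed into a Python dict, the helper name <tt>check_glean_metric</tt> is invented, and the bug URL is a placeholder. It only flags a low tag on a metric whose <tt>data_sensitivity</tt> lists something other than <tt>technical</tt> or <tt>interaction</tt>, because a high tag on low-sensitivity data may be a deliberate opt-in.

```python
# Hypothetical self-check; not part of glean_parser or Phabricator.
# The metric is assumed to be an already-parsed metrics.yaml entry.
LOW_SENSITIVITY = {"technical", "interaction"}

# Placeholder bug URL for illustration only.
EXAMPLE_BUG_URL = "https://bugzilla.mozilla.org/show_bug.cgi?id=1234567"


def check_glean_metric(metric: dict, tag: str, bug_url: str) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    sensitivity = set(metric.get("data_sensitivity") or [])
    if not sensitivity:
        problems.append("data_sensitivity is missing or empty")
    elif tag == "data-classification-low" and not sensitivity <= LOW_SENSITIVITY:
        extra = ", ".join(sorted(sensitivity - LOW_SENSITIVITY))
        problems.append(f"low tag used, but data_sensitivity also lists: {extra}")
    # data_reviews should reference the bug for this change.
    if bug_url not in (metric.get("data_reviews") or []):
        problems.append("data_reviews does not include the bug URL")
    return problems


if __name__ == "__main__":
    metric = {
        "type": "counter",
        "data_sensitivity": ["technical"],
        "data_reviews": [EXAMPLE_BUG_URL],
    }
    print(check_glean_metric(metric, "data-classification-low", EXAMPLE_BUG_URL))  # []
```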
 
If the reviewer is unsure or uncomfortable making this assessment themselves, they can [mailto:data-stewards@mozilla.com email the data stewards group] or [https://chat.mozilla.org/#/room/#data-stewards:mozilla.org contact them on Matrix] for help.
 
Whichever tag you choose, please '''leave a comment explaining your choice'''. You will not be able to land the revision until it has one of these tags and you have removed the <tt>needs-data-classification</tt> tag. For low-sensitivity data collection, the patch can land once the tag is set and <tt>needs-data-classification</tt> has been removed. For high-sensitivity data collection, the [https://phabricator.services.mozilla.com/project/view/209/ <tt>data-stewards</tt>] group will be added as a blocking reviewer on the patch, and will approve or request changes based on the [[#Step_3:_Sensitive_Data_Collection_Review_Process|sensitive data collection review process]].
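The landing condition above, for a revision that touches data collection, can be modelled as follows. This is a simplified, hypothetical sketch: <tt>Revision</tt> and <tt>can_land</tt> are invented names, and the model ignores ordinary code review and every other landing check.

```python
# Hypothetical model of the data-classification landing gate only.
from dataclasses import dataclass, field


@dataclass
class Revision:
    tags: set = field(default_factory=set)
    # Reviewers or groups that have accepted the revision.
    accepted_by: set = field(default_factory=set)


def can_land(rev: Revision) -> bool:
    """True if the data-classification requirements are satisfied."""
    # The placeholder tag must be removed before landing.
    if "needs-data-classification" in rev.tags:
        return False
    # High sensitivity: the data-stewards group is a blocking reviewer.
    if "data-classification-high" in rev.tags:
        return "data-stewards" in rev.accepted_by
    # Low sensitivity: the tag itself satisfies this gate.
    return "data-classification-low" in rev.tags
```

Checking the high tag first keeps the sketch conservative if a revision somehow carries both tags.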
 
Patch authors are encouraged to add these tags themselves, but '''reviewers are responsible for making sure the right tag is used'''.
 
If you do not yet have a code change but are in the planning stages of a change and want to proactively discuss data collection options, reach out to [mailto:data-stewards@mozilla.com the data stewards group].
 
== Other Products ==
 
== Step 1: Submit Request ==
To request a review for new or changed Data Collection in a Mozilla product, Data Review requesters are required to provide the following:
=== Determine if you need to follow this process ===

For any data collection that is classified as category 3 or 4 (described below), including in pre-release channels and experiments, we require an additional review and an announcement to a mailing list. While our privacy policies describe what we can do without additional user notice, this is an upper bound: even for collection that fits within the policy, we need to determine whether that collection is appropriate and conforms to our overall commitment to privacy and data minimization. A Data Steward may help escalate a request or submit it through the sensitive data review process, but Data Stewards are not part of the review of escalations itself; that review is handled by a separate cross-functional team.
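The rule above can be summarised as a lookup from category to extra obligations. This is a hypothetical sketch: the function name and the step descriptions are illustrative, not an official checklist. The <tt>channel</tt> argument is accepted only to make the point that pre-release channels and experiments follow the same rule as release.

```python
# Hypothetical summary of the extra obligations for categories 3 and 4.
def extra_requirements(category: int, channel: str = "release") -> list:
    """Return the additional steps required for a data collection category."""
    if category not in (1, 2, 3, 4):
        raise ValueError(f"unknown data collection category: {category}")
    # channel is intentionally unused: pre-release channels and
    # experiments are held to the same rule as release.
    if category >= 3:
        return ["additional (sensitive data) review", "announcement to a mailing list"]
    return []
```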

=== Create documentation and request review ===

As a first step, the details of the implementation, intended use, privacy analysis, and value to users should be clearly documented for future reference and efficient review. As soon as this documentation is ready (we recommend doing this as early as possible, before you move forward with the implementation), send an email to the [https://groups.google.com/a/mozilla.com/g/data-review data-review@mozilla.com] mailing list.

The initial documentation from engineering/data stewardship and the privacy/technical review should be completed before the legal and security reviews. Please ensure that your documentation includes a privacy analysis that explains which privacy mitigations are in place (e.g. data minimization, OHTTP) and how they reduce any potential risk from the additional data collection. The Sensitive Data Review team can help elaborate on or clarify parts of the privacy analysis, but your review will go more quickly if you first explain how the data and the privacy-preserving methods you chose fit into the specific context.
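Before emailing the list, a requester might keep the documentation in a structured form and check that no required part is empty. This is a hypothetical sketch: the field names mirror the items named above (implementation details, intended use, privacy analysis, value to users), but the structure itself is not a Mozilla template, and the example values are invented.

```python
# Hypothetical checklist for the review documentation; not an official template.
from dataclasses import dataclass, fields


@dataclass
class ReviewDocumentation:
    implementation_details: str = ""
    intended_use: str = ""
    # Should cover the mitigations (e.g. data minimization, OHTTP)
    # and how they reduce risk from the new collection.
    privacy_analysis: str = ""
    value_to_users: str = ""


def missing_sections(doc: ReviewDocumentation) -> list:
    """Names of sections that are still empty or whitespace-only, in field order."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


if __name__ == "__main__":
    # Invented example values for illustration.
    draft = ReviewDocumentation(
        implementation_details="New counter in the update service.",
        intended_use="Measure update failures by error code.",
    )
    print(missing_sections(draft))  # ['privacy_analysis', 'value_to_users']
```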

{| class="wikitable"
! Risk Assessment !! Owner !! Facilitator
|-
| Privacy/Technical Review || Office of the Firefox CTO || Martin Thomson
|-
| Legal/Trust Review || Legal || Nneka Soyinka
|-
| Security Review || Office of the CSO || Alex Heartsfield
|-
| Data Review || Data || Arkadiusz Komarzewski
|}

Facilitators (named above) are expected to judge how much risk is involved and to bring in the appropriate reviewers.

If the level of risk is determined to be low enough and/or there is clear precedent, further discussion may not be necessary and each reviewer may sign off immediately; otherwise, mitigations should be incorporated and the documentation updated once the concerns have been addressed. Live discussion is often very helpful, and should be planned for, when there is significant risk involved. After consulting with the full group, one reviewer is permitted to approve on the group's behalf.

Data collection may not be shipped to users until final sign-offs have been obtained.
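Tracking sign-offs against the table above can be sketched as follows. This is a hypothetical sketch that assumes every risk assessment in the table needs a sign-off before shipping, which may not hold for every request since facilitators decide which reviewers to involve; the clause allowing one reviewer to approve on the group's behalf is modelled as a single flag.

```python
# Hypothetical sign-off tracker; assumes all four assessments are required.
REQUIRED_ASSESSMENTS = (
    "Privacy/Technical Review",
    "Legal/Trust Review",
    "Security Review",
    "Data Review",
)


def outstanding_signoffs(signed_off: set) -> list:
    """Assessments from the table that have not signed off yet, in table order."""
    return [a for a in REQUIRED_ASSESSMENTS if a not in signed_off]


def may_ship(signed_off: set, approved_for_group: bool = False) -> bool:
    """Data collection may ship only after final sign-off."""
    # One reviewer, after consulting the full group, may approve on its behalf.
    if approved_for_group:
        return True
    return not outstanding_signoffs(signed_off)
```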
= Data Collection Categories =

There are four "categories" of data collection:

; '''Category 1 “Technical data”'''
:Examples include the OS, crashes and errors, the outcome of automated processes like updates, activation, version numbers, etc. This also includes aggregated compatibility information about features and API usage by websites, add-ons, and other third-party software that interact with the application during usage.

:It also includes information about the user's settings that is necessary to provide functionality; for example, which applications users have connected to a service, or which services users have logged into using a Mozilla account.

; '''Category 2 “Interaction data”'''