User Data Principles

From MozillaWiki

Mozilla may, in the course of providing services for the user, need to retain user data on its servers.

When Mozilla retains user data, it will always be guided by the Mozilla Privacy and Data Operating Principles.

XXX Final text of the Data Operating Principles here - edits from blog post?

System Principles

In concrete terms, what this means for a Mozilla-operated service is:

  1. The principle of least privilege is observed throughout the design. Systems should collect only the data they need to provide a service, and should make data available only to systems that need to operate on it. (See "On Encryption" below for more.)
  2. Data persistence systems are designed with defense in depth. Individual services must be resilient against cross-site request forgery and script injection attacks. Services must be isolated, so that even a fully compromised service cannot escalate to an attack on the entire database.
  3. In addition to inter-service resilience, Mozilla-operated services have inter-user attack resilience. A compromise of a single user's data should not escalate into an attack on any other user.
  4. Users must always be able to determine what data is retained, and how that data is being used. They should be able to remove any piece of data at will, and revoke any authorization that has been granted on a particular piece of data. Authorizations should be granted on the smallest possible unit of data, to the practical extent of available technology.
  5. Any Mozilla-operated service should protect users from administrative access as well. This means that if data is readable by administrative users or batch processes, there is a well-defined, auditable, revocable process for granting access to it. That is, there is no super-user level at which an administrative user or process can read all data without leaving a trace.
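
Principle 5 asks that no administrative read happen without leaving a trace. One common way to make such a trace tamper-evident is a hash chain, in which every audit entry commits to the hash of the entry before it. The sketch below is illustrative only: `AuditLog` and `admin_read` are hypothetical names, the in-memory dictionary stands in for real data systems, and a hash chain only detects edits if its latest hash is also anchored somewhere the audited administrator cannot write.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only, hash-chained audit log (in memory, for illustration)."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, actor: str, subject: str, action: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "subject": subject, "action": action,
                "ts": time.time(), "prev": prev}
        entry = dict(body, hash=_entry_hash(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every link; an edited or reordered entry breaks the chain.
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def admin_read(log: AuditLog, store: dict, admin: str, user: str, field: str):
    # The audit entry is written before any data is returned, so a read
    # cannot succeed without leaving a trace.
    log.append(actor=admin, subject=user, action=f"read:{field}")
    return store[user][field]
```

Truncating the tail of the log is not detectable from the chain alone, which is one reason to replicate or publish the latest hash outside the administrator's reach.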

Engineering Implications

The engineering principles derived from this approach are:

  • User credentials should stay "close to the data". This means that, when a service does not have a user credential, it should not be able to read or write user data.
  • Implementation strategies for this are two-fold:
    • Data can be stored encrypted, with client-side keys; in this case, the decryption key is the ultimate user credential (see On Encryption).
    • Or, user credentials are relayed from the client-facing service all the way down to the data persistence system, which uses them to make authentication and authorization decisions at that level.
  • When data is available to administrators or batch processes, there should be an authentication/authorization process that grants a "single user's worth" of authorization at a time. For example, a superuser may obtain a user credential that represents "Superuser Alice, accessing data for user Bob", and data systems should recognize this credential as having Bob's access level. The generation of any such credential should be audited in a way that Alice cannot tamper with.
  • Data should be subjected to access control rules that enforce both user-level and application-level controls, and these rules should be visible to the user. If a user stores a piece of data with application A, it should not automatically be available to application B.
  • There should always be a simple process that the user can use to find out what has been stored on Mozilla's servers, and a simple process to remove, deauthorize, and/or export the data.
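
The delegated "Superuser Alice, accessing data for user Bob" credential, the per-application separation, and the export/delete process described above might be sketched as follows. Everything here is an assumption for illustration: the token format (HMAC-signed JSON), the names `issue_credential`, `verify_credential`, and `UserDataStore`, and the shared `SIGNING_KEY`. A production system would more likely use an established token format and record every issuance in a tamper-evident audit log.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass, field

# Hypothetical secret known only to the authorization service and data stores.
SIGNING_KEY = b"demo-only-signing-key"


def issue_credential(actor: str, user: str, app: str, ttl: int = 300) -> str:
    """Grant one user's worth of access, for one application, for `ttl` seconds."""
    claims = {"actor": actor, "user": user, "app": app, "exp": int(time.time()) + ttl}
    payload = json.dumps(claims, sort_keys=True)
    sig = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # In a real deployment this issuance would also be written to an audit log.
    return f"{payload}.{sig}"


def verify_credential(token: str) -> dict:
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid signature")
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        raise PermissionError("credential expired")
    return claims


@dataclass
class UserDataStore:
    # Records are partitioned by (user, app): data stored through application A
    # is invisible to a credential issued for application B.
    records: dict = field(default_factory=dict)

    def _bucket(self, token: str) -> dict:
        claims = verify_credential(token)  # no valid credential, no data access
        return self.records.setdefault((claims["user"], claims["app"]), {})

    def put(self, token: str, key: str, value: str) -> None:
        self._bucket(token)[key] = value

    def get(self, token: str, key: str) -> str:
        return self._bucket(token)[key]

    def export(self, token: str) -> dict:
        return dict(self._bucket(token))

    def delete_all(self, token: str) -> None:
        self._bucket(token).clear()
```

Because the credential carries Bob's user ID rather than a superuser flag, the data store needs no special administrative code path: Alice's delegated token is checked exactly like Bob's own.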

On Encryption

Encryption, on the client or on the server, can provide some of these properties. An encrypted record can be thought of as a mobile object that carries its own user credential test: only a holder of the right key can read it. This has a number of good properties, but it also constrains the design in some cases.
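
To make the "encrypted record as a credential test" idea concrete: in the sketch below, a record sealed under a key can be opened only by a caller holding that key, and any other key, or any modification of the record, is rejected. Python's standard library has no block cipher, so the sketch assumes a toy construction: HMAC-SHA256 used as a pseudorandom function in counter mode for the keystream, followed by an HMAC tag over nonce and ciphertext (encrypt-then-MAC). It shows the shape of the idea and is not meant for production; a real system would use a vetted AEAD such as AES-GCM from a maintained library. `seal` and `open_record` are hypothetical names.

```python
import hashlib
import hmac
import os

NONCE_LEN, TAG_LEN = 16, 32


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Separate encryption and MAC keys derived from one record key.
    enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc_key, mac_key


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 as a PRF in counter mode (toy construction).
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(enc_key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_record(key: bytes, record: bytes) -> bytes:
    """Return the plaintext, or raise if the key is wrong or the record was altered."""
    if len(record) < NONCE_LEN + TAG_LEN:
        raise PermissionError("record too short")
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = record[:NONCE_LEN], record[NONCE_LEN:-TAG_LEN], record[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("wrong key or tampered record")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

The tag check is what turns the record into a credential test: a caller without the key cannot produce plaintext, and cannot quietly alter the record either.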

The Firefox Sync system uses client-side keys. These provide a strong guarantee of user authentication, which makes the design a good match for conveying user credentials (such as passwords). However, the requirement that a client-side key never leave the client makes the design harder to use for other applications (such as server-side analysis of data), and conveying keys between clients during the setup phase is a user experience challenge. Ironically, the opacity of encrypted records can make some of the system principles harder to achieve: for example, a web application cannot provide transparency into stored data, since only the client knows how to read it. Careful use of client-side technology helps here, but cross-browser implementations can be difficult to achieve.
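
The client-side-key arrangement, and the transparency limit it creates, can be sketched as follows. This is not Firefox Sync's actual protocol; the assumptions are: `Client` and `BlobServer` are hypothetical, the key is derived locally from a passphrase with PBKDF2 (the iteration count is illustrative), and the record cipher is a toy SHAKE256 keystream plus HMAC tag rather than a production scheme. The point is the division of knowledge: the server holds only opaque blobs, so a server-rendered "what do you store about me?" page can list record IDs and sizes, but not contents.

```python
import hashlib
import hmac
import os


class Client:
    """Derives and holds the only copy of the key; the key is never uploaded."""

    def __init__(self, passphrase: str, salt: bytes, iterations: int = 600_000):
        # 64 bytes of key material, split into separate encryption and MAC keys.
        material = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt,
                                       iterations, dklen=64)
        self._enc_key, self._mac_key = material[:32], material[32:]

    def _xor(self, nonce: bytes, data: bytes) -> bytes:
        # Toy keystream: SHAKE256 over key || nonce, XORed with the data.
        stream = hashlib.shake_256(self._enc_key + nonce).digest(len(data))
        return bytes(a ^ b for a, b in zip(data, stream))

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = os.urandom(16)
        ct = self._xor(nonce, plaintext)
        tag = hmac.new(self._mac_key, nonce + ct, hashlib.sha256).digest()
        return nonce + ct + tag

    def decrypt(self, blob: bytes) -> bytes:
        nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
        expected = hmac.new(self._mac_key, nonce + ct, hashlib.sha256).digest()
        if len(blob) < 48 or not hmac.compare_digest(tag, expected):
            raise ValueError("wrong key or tampered record")
        return self._xor(nonce, ct)


class BlobServer:
    """Stores opaque blobs and never sees a key, so it cannot read them."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, record_id: str, blob: bytes) -> None:
        self.blobs[record_id] = blob

    def inventory(self) -> list[tuple[str, int]]:
        # The most a server-side transparency page can report: IDs and sizes.
        return sorted((rid, len(blob)) for rid, blob in self.blobs.items())
```

A readable view of the data has to be produced by code running next to the key, which is why the text points to client-side technology for transparency.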

The Sauropod system (more below) proposes to use server-side keys, generated on a per-user basis and managed by a high-trust Key Server (read more about the trusted computing base). This would enable server-side analysis of data, and could reduce the setup phase of a client interaction to a simple authentication request. In exchange, it raises the additional difficulty of securing and managing access to the Key Server.
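
A per-user key server of the kind described here might look roughly like the sketch below. Only the general idea comes from the text; the class, its methods, and the choice to derive each user's key as HMAC-SHA256 of the user ID under a master secret are assumptions, and `authenticate` is a stand-in for a real login flow.

```python
import hashlib
import hmac
import secrets


class KeyServer:
    """Hypothetical high-trust key server: per-user keys, released only after authentication."""

    def __init__(self) -> None:
        self._master = secrets.token_bytes(32)  # never leaves the key server
        self._sessions: dict[str, str] = {}     # session token -> user ID

    def authenticate(self, user_id: str) -> str:
        # Stand-in for a real login; returns a session token bound to one user.
        token = secrets.token_urlsafe(16)
        self._sessions[token] = user_id
        return token

    def user_key(self, session_token: str, user_id: str) -> bytes:
        # A session may only fetch the key of the user it was issued for.
        if self._sessions.get(session_token) != user_id:
            raise PermissionError("session does not belong to this user")
        # Deterministic per-user key: HMAC-SHA256 of the user ID under the master secret.
        return hmac.new(self._master, b"user-key:" + user_id.encode(), hashlib.sha256).digest()
```

Deriving keys from one master secret keeps the Key Server from having to store a key per user, but anyone who obtains that master secret can derive every user's key. Storing random per-user keys, each wrapped under the master secret, is a common alternative with different trade-offs.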

Implementors should also take care that attackers cannot manipulate the system with encrypted data, for example with replay attacks or unsigned record deletion requests.
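
As a minimal sketch of these two defenses, a store can require that deletion requests be signed and carry a per-user counter that must strictly increase, so that neither a forged nor a replayed request deletes anything. The names are hypothetical, and the sketch assumes a per-user request-signing key shared between client and server, distinct from any encryption key; an asymmetric signature would avoid giving the server the ability to sign on the user's behalf.

```python
import hashlib
import hmac
import json


def _mac(key: bytes, body: dict) -> str:
    return hmac.new(key, json.dumps(body, sort_keys=True).encode(), hashlib.sha256).hexdigest()


def sign_delete(key: bytes, user: str, record_id: str, counter: int) -> dict:
    """Client side: build a delete request that commits to user, record, and counter."""
    body = {"action": "delete", "user": user, "record": record_id, "counter": counter}
    return dict(body, sig=_mac(key, body))


class RecordStore:
    def __init__(self, signing_keys: dict[str, bytes]) -> None:
        self.signing_keys = signing_keys              # user -> request-signing key
        self.records: dict[tuple[str, str], bytes] = {}
        self.last_counter: dict[str, int] = {}        # highest counter accepted per user

    def handle_delete(self, request: dict) -> None:
        body = {k: v for k, v in request.items() if k != "sig"}
        key = self.signing_keys.get(body.get("user"))
        if key is None or not hmac.compare_digest(request.get("sig", ""), _mac(key, body)):
            raise PermissionError("unsigned or forged request")
        if body["action"] != "delete":
            raise ValueError("not a delete request")
        if body["counter"] <= self.last_counter.get(body["user"], 0):
            raise PermissionError("replayed or stale request")
        self.last_counter[body["user"]] = body["counter"]
        self.records.pop((body["user"], body["record"]), None)
```

A single counter per user assumes one device; with several devices, each would need its own counter, or the server could issue one-time nonces instead.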

On Shared Persistence

A Labs project, termed Sauropod, is exploring the design of a data persistence system that has service-visible plaintext but still provides defense in depth and least privilege. More details are on the Sauropod project page.