Identity/CryptoIdeas/01-PBKDF-scrypt: Difference between revisions

← Older edit

Identity/CryptoIdeas/01-PBKDF-scrypt (view source)

Revision as of 17:05, 29 October 2013

6,789 bytes added , 29 October 2013

no edit summary

Toxborrow

Confirmed users

1,042

edits

@@ Line 1: / Line 1: @@
+{{LastUpdated}}
 == Key-Wrapping Techniques ==
@@ Line 36: / Line 37: @@
 minimizing the cost of each attempt.
-'''Remember:''' the cost figures described below are talking about an attacker who has already surmounted the difficult hurdle of breaking into a professionally-managed highly secure serer, either by force, coersion, or subpoena. These attacks are not available to a passive eavesdropper or a network-side man-in-the-middle: those folks don't have any feasible attack.
+'''Remember:''' the cost figures described below are talking about an attacker who has already surmounted the difficult hurdle of breaking into a professionally-managed highly secure server, either by force, coersion, or subpoena. These attacks are not available to a passive eavesdropper or a network-side man-in-the-middle: those folks don't have any feasible attack.
 === PBKDF User/Attacker Costs ===
@@ Line 120: / Line 121: @@
 === scrypt-helper Costs/Attacks ===
+Note: http://keywrapping.appspot.com hosts an interactive cost/latency
+calculator application to evaluate the parameters described here.
 [[File:04-scrypt-usertime.png|User cost for scrypt-helper]]
@@ Line 269: / Line 273: @@
 above (to establish the Verifier) must be applied here: checking an SSL
 certificate, or encrypting to/from a pre-established public key.
+== Server-Side Anonymous Data ==
+It would be nice if the server's database were anonymous: while each row is
+definitely associated with a particular user (email address), if the server
+cannot figure out which is which, then many attackers won't either.
+The benefit of this property would be that attackers who perform a
+brute-force search would be obligated to process more data, making their
+attack more expensive. Each row of the server's data includes a
+computationally-difficult function of an email address and a password. An
+attacker who is after a specific user's password knows the email address. If
+they also know which row holds that user's data, their task is to loop over
+all likely passwords, execute the function for each one, then compare the
+output value against that row.
+An attacker who doesn't know which row is which must compare their output
+value against all rows. If you have the whole dataset, this is bulky but not
+computationally hard (O(logN)). The most likely attack against the scrypt
+step is a botnet, in which the controller would send each node the target
+email address and a subset of the password list. To detect success, either
+the controller must also send the whole dataset to each node, or the node can
+stream back the computed values to the controller for local membership
+checks, or something in between. The optimum attack probably has the
+controller distribute a Bloom Filter (9.6 bits per dataset row for a 1%
+false-positive rate, so about 120MB for a 100M-row table) and the nodes
+sending back potential matches for the controller to evaluate.
+Both PBKDF steps require the target's email address, so if this is performed
+on the botnet node (and either the second PBKDF step must be run there, or
+the nodes must stream every intermediate "B" value back to the controller),
+then forensics may reveal which email address is being attacked. If detected,
+this may give the victim time to take action before the password is found.
+The biggest benefit of anonymous data probably lies in the exfiltration step.
+Attackers who manage to run code on the storage server (i.e. a SQL
+injection), but who can't do some kind of traffic-analysis (correlating the
+target's login time with database access patterns), won't know which rows to
+extract. Extracting the important information from all rows (perhaps 6.4GB of
+data for a 100M-user table) would be a massive download, easily detectable
+and preventable by basic monitoring tools. This helps against a rogue-code
+attacker, but not against one who gets a full copy of a backup tape, or
+coerces the server operator into providing full access.
+To accomplish server-side anonymous data, the KDF and Setup steps need to
+wind up with an identifier that points to a specific row of the database, but
+which is hard to correlate with an email address. Anyone (the user) who knows
+both the email address and password should be able to get this identifier
+(and, since this is a "cold start" protocol, it must be able to work with
+nothing but an email+password), which does limit our options somewhat.
+== Protecting the storage selection index ==
+I think we can tolerate a "TOFU" (Trust On First Use) setup step, especially
+if it is protected by a pinned-SSL implementation (i.e. implemented in a
+browser that overcomes the limitations of the XHR API, where you might prefer
+to tell XHR what SSL certificate you expected to see, by using and enforcing
+an embedded table of name-to-CA or name-to-cert mappings). But it'd be nice
+to only rely on this once: TOFU, not TOE-U (Trust On Every Use). With TOFU it
+should be safe to run the later communications in plaintext (with SSL
+providing an extra layer of safety).
+To accomplish this, nothing in the post-setup messages should give an
+attacker a way to brute-force the password (reserving that "power" for the
+storage server), including the field that tells the server which database row
+to use. This constraint precludes the use of anything that is derived from
+the password, leaving us with two choices:
+* the non-secret email addres
+* a random nonce generated by the server, retrieved in the TOFU step, remembered on the client
+The first choice loses the "anonymous data" property on the server. The
+second choice loses the server's ability to do per-user rate-limiting of
+online guessing attacks (but retaining other tools, like per-IP-address
+limiting of non-distributed attacks). I'm still trying to decide which is
+more important, or if there is some clever way to achieve both.
 == Protocol Diagram ==
-This four-page PDF diagram contains complete details on the protocol,
+The following PDF diagrams contains complete details on several variants of
-including how the salts are generated, and the client/server messages that
+the protocol, including how the salts are generated, and the client/server
-are exchanged. The first page describes the basic KDF operation and the
+messages that are exchanged.
-"init" message which establishes the user's SRP verifier. The second page
-shows how SRP is used to perform protected requests. The third page shows the
+* [[media:Keywrapping-details-anon-noSRP.pdf|anonymous data, no SRP]]
-"get" and "set" protected requests. The final page shows the "change
+* [[media:Keywrapping-details-anon-SRP.pdf|anonymous data, yes SRP]]
-password" request (which combines a "get" request, to fetch the UK, then uses
+* [[media:Keywrapping-details-email-SRP.pdf|email-indexed data, yes SRP]]
-a new "change" message to replace the stored data with a re-encrypted row).
+In each document, the first page describes the basic KDF operation and the
+"init" message which establishes or retrieves the user's SRP verifier or (for
+the non-SRP case) their "S2" selection index and "S3" shared secret. The
+second page shows how SRP or S2+S3 are used to perform protected requests.
+The third page shows the "get" and "set" protected requests. The final page
+shows the "change password" request (which combines a "get" request, to fetch
+the UK, then uses a new "change" message to replace the stored data with a
+re-encrypted row). There is also a "delete data" request that is similar for
+all variants.
+The anonymous-data forms never reveal the user's email address to the server,
+which somewhat increases an attacker's costs, but prevents account-based
+rate-limiting of online attacks. In all cases the "TOFU" window exists only
+during the init phase: once that setup has been performed, all subsequent
+messages could safely be sent in the clear.
+== Implementation-Driven Changes ==
+NSS, the cryptographic library used in Firefox, is enthusiastic about keeping key material behind a notional barrier (retaining the ability to implement primitives on a smart card or HSM). So its KDF functions don't want to reveal the derived keys, since they're "keys" and not "data". To overcome this (and make it easier to take advantage of NSS, instead of reimplementing many primitives), the protocol needs to sneakily derive values from the "keys".
+I'm still working through what changes are necessary to support this, but at first glance, we'll need to define an EXTRACT function that encrypts a block of zeros (equal in length to the key size), using the derived key as the encryption key, and use the resulting "ciphertext" as the output of the KDF step. For example, instead of A=PBKDF(args), we'd use A0=PBKDF(args) and A=AES(key=A0).encrypt(plaintext=000). That lets us get "A" outside the NSS barrier and either deliver it to the scrypt-server or into the next KDF step (probably passing it back *into* the barrier).
-[[media:Key-wrapping-details.pdf|Protocol Diagram]]
+We'll need this EXTRACT step after the first round of PBKDF (on "A"), after the scrypt step on "B", and after the second round of PBKDF (on "C", giving us the option of storing "C" in the password manager). If HKDF is provided by NSS too, we'll have to do it again on each of the outputs of the HKDF step (PWK, MAC, and S1/SRPpw).
 == Updates / Discussion ==