Identity/CryptoIdeas/01-PBKDF-scrypt: Difference between revisions

no edit summary
No edit summary
 
(7 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{LastUpdated}}
== Key-Wrapping Techniques ==
== Key-Wrapping Techniques ==


Line 36: Line 37:
minimizing the cost of each attempt.
minimizing the cost of each attempt.


'''Remember:''' the cost figures described below are talking about an attacker who has already surmounted the difficult hurdle of breaking into a professionally-managed highly secure serer, either by force, coersion, or subpoena. These attacks are not available to a passive eavesdropper or a network-side man-in-the-middle: those folks don't have any feasible attack.
'''Remember:''' the cost figures described below are talking about an attacker who has already surmounted the difficult hurdle of breaking into a professionally-managed highly secure server, either by force, coersion, or subpoena. These attacks are not available to a passive eavesdropper or a network-side man-in-the-middle: those folks don't have any feasible attack.


=== PBKDF User/Attacker Costs ===
=== PBKDF User/Attacker Costs ===
Line 120: Line 121:


=== scrypt-helper Costs/Attacks ===
=== scrypt-helper Costs/Attacks ===
Note: http://keywrapping.appspot.com hosts an interactive cost/latency
calculator application to evaluate the parameters described here.


[[File:04-scrypt-usertime.png|User cost for scrypt-helper]]
[[File:04-scrypt-usertime.png|User cost for scrypt-helper]]
Line 269: Line 273:
above (to establish the Verifier) must be applied here: checking an SSL
above (to establish the Verifier) must be applied here: checking an SSL
certificate, or encrypting to/from a pre-established public key.
certificate, or encrypting to/from a pre-established public key.
== Server-Side Anonymous Data ==
It would be nice if the server's database were anonymous: while each row is
definitely associated with a particular user (email address), if the server
cannot figure out which is which, then many attackers won't either.
The benefit of this property would be that attackers who perform a
brute-force search would be obligated to process more data, making their
attack more expensive. Each row of the server's data includes a
computationally-difficult function of an email address and a password. An
attacker who is after a specific user's password knows the email address. If
they also know which row holds that user's data, their task is to loop over
all likely passwords, execute the function for each one, then compare the
output value against that row.
An attacker who doesn't know which row is which must compare their output
value against all rows. If you have the whole dataset, this is bulky but not
computationally hard (O(logN)). The most likely attack against the scrypt
step is a botnet, in which the controller would send each node the target
email address and a subset of the password list. To detect success, either
the controller must also send the whole dataset to each node, or the node can
stream back the computed values to the controller for local membership
checks, or something in between. The optimum attack probably has the
controller distribute a Bloom Filter (9.6 bits per dataset row for a 1%
false-positive rate, so about 120MB for a 100M-row table) and the nodes
sending back potential matches for the controller to evaluate.
Both PBKDF steps require the target's email address, so if this is performed
on the botnet node (and either the second PBKDF step must be run there, or
the nodes must stream every intermediate "B" value back to the controller),
then forensics may reveal which email address is being attacked. If detected,
this may give the victim time to take action before the password is found.
The biggest benefit of anonymous data probably lies in the exfiltration step.
Attackers who manage to run code on the storage server (i.e. a SQL
injection), but who can't do some kind of traffic-analysis (correlating the
target's login time with database access patterns), won't know which rows to
extract. Extracting the important information from all rows (perhaps 6.4GB of
data for a 100M-user table) would be a massive download, easily detectable
and preventable by basic monitoring tools. This helps against a rogue-code
attacker, but not against one who gets a full copy of a backup tape, or
coerces the server operator into providing full access.
To accomplish server-side anonymous data, the KDF and Setup steps need to
wind up with an identifier that points to a specific row of the database, but
which is hard to correlate with an email address. Anyone (the user) who knows
both the email address and password should be able to get this identifier
(and, since this is a "cold start" protocol, it must be able to work with
nothing but an email+password), which does limit our options somewhat.
== Protecting the storage selection index ==
I think we can tolerate a "TOFU" (Trust On First Use) setup step, especially
if it is protected by a pinned-SSL implementation (i.e. implemented in a
browser that overcomes the limitations of the XHR API, where you might prefer
to tell XHR what SSL certificate you expected to see, by using and enforcing
an embedded table of name-to-CA or name-to-cert mappings). But it'd be nice
to only rely on this once: TOFU, not TOE-U (Trust On Every Use). With TOFU it
should be safe to run the later communications in plaintext (with SSL
providing an extra layer of safety).
To accomplish this, nothing in the post-setup messages should give an
attacker a way to brute-force the password (reserving that "power" for the
storage server), including the field that tells the server which database row
to use. This constraint precludes the use of anything that is derived from
the password, leaving us with two choices:
* the non-secret email addres
* a random nonce generated by the server, retrieved in the TOFU step, remembered on the client
The first choice loses the "anonymous data" property on the server. The
second choice loses the server's ability to do per-user rate-limiting of
online guessing attacks (but retaining other tools, like per-IP-address
limiting of non-distributed attacks). I'm still trying to decide which is
more important, or if there is some clever way to achieve both.


== Protocol Diagram ==
== Protocol Diagram ==


This four-page PDF diagram contains complete details on the protocol,
The following PDF diagrams contains complete details on several variants of
including how the salts are generated, and the client/server messages that
the protocol, including how the salts are generated, and the client/server
are exchanged. The first page describes the basic KDF operation and the
messages that are exchanged.
"init" message which establishes the user's SRP verifier. The second page
 
shows how SRP is used to perform protected requests. The third page shows the
* [[media:Keywrapping-details-anon-noSRP.pdf|anonymous data, no SRP]]
"get" and "set" protected requests. The final page shows the "change
* [[media:Keywrapping-details-anon-SRP.pdf|anonymous data, yes SRP]]
password" request (which combines a "get" request, to fetch the UK, then uses
* [[media:Keywrapping-details-email-SRP.pdf|email-indexed data, yes SRP]]
a new "change" message to replace the stored data with a re-encrypted row).
 
In each document, the first page describes the basic KDF operation and the
"init" message which establishes or retrieves the user's SRP verifier or (for
the non-SRP case) their "S2" selection index and "S3" shared secret. The
second page shows how SRP or S2+S3 are used to perform protected requests.
The third page shows the "get" and "set" protected requests. The final page
shows the "change password" request (which combines a "get" request, to fetch
the UK, then uses a new "change" message to replace the stored data with a
re-encrypted row). There is also a "delete data" request that is similar for
all variants.
 
The anonymous-data forms never reveal the user's email address to the server,
which somewhat increases an attacker's costs, but prevents account-based
rate-limiting of online attacks. In all cases the "TOFU" window exists only
during the init phase: once that setup has been performed, all subsequent
messages could safely be sent in the clear.
 
== Implementation-Driven Changes ==
NSS, the cryptographic library used in Firefox, is enthusiastic about keeping key material behind a notional barrier (retaining the ability to implement primitives on a smart card or HSM). So its KDF functions don't want to reveal the derived keys, since they're "keys" and not "data". To overcome this (and make it easier to take advantage of NSS, instead of reimplementing many primitives), the protocol needs to sneakily derive values from the "keys".
 
I'm still working through what changes are necessary to support this, but at first glance, we'll need to define an EXTRACT function that encrypts a block of zeros (equal in length to the key size), using the derived key as the encryption key, and use the resulting "ciphertext" as the output of the KDF step. For example, instead of A=PBKDF(args), we'd use A0=PBKDF(args) and A=AES(key=A0).encrypt(plaintext=000). That lets us get "A" outside the NSS barrier and either deliver it to the scrypt-server or into the next KDF step (probably passing it back *into* the barrier).


[[media:Key-wrapping-details.pdf|Protocol Diagram]]
We'll need this EXTRACT step after the first round of PBKDF (on "A"), after the scrypt step on "B", and after the second round of PBKDF (on "C", giving us the option of storing "C" in the password manager). If HKDF is provided by NSS too, we'll have to do it again on each of the outputs of the HKDF step (PWK, MAC, and S1/SRPpw).


== Updates / Discussion ==
== Updates / Discussion ==
Confirmed users
1,042

edits