Phishing Protection: Design Documentation

257 bytes added, 00:59, 11 January 2007
URL Canonicalization
We solve the encoding problem, but not the full canonicalization problem. We repeatedly URL-unescape a URL until no hex-encoded sequences remain, then escape it exactly once. This can map several distinct URLs onto the same string, but such cases are rare and occur primarily in query parameters. In exchange, the approach avoids a multitude of other potential problems.
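The unescape-until-stable, escape-once procedure can be sketched roughly as follows. This is a minimal illustration in Python, not the actual implementation: the function name `canonicalize_encoding` and the set of characters left unescaped (`SAFE_CHARS`) are assumptions, since this page does not specify which characters the final escape step preserves.

```python
import re
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Matches one hex-encoded byte such as %2F or %7e.
_HEX_ESCAPE = re.compile(rb"%[0-9A-Fa-f]{2}")

# ASSUMPTION: URL delimiter characters are left as-is by the final escape.
# The real set of preserved characters is not given in this document.
SAFE_CHARS = ":/?#[]@!$&'()*+,;="


def canonicalize_encoding(url: str) -> str:
    """Unescape repeatedly until no %XX sequences remain, then escape once."""
    raw = url.encode("utf-8")
    # Each pass shortens the string, so the loop always terminates.
    while _HEX_ESCAPE.search(raw):
        raw = unquote_to_bytes(raw)
    return quote_from_bytes(raw, safe=SAFE_CHARS)


if __name__ == "__main__":
    # Different levels of escaping collapse to one form.
    print(canonicalize_encoding("http://example.com/a%2525b"))
    print(canonicalize_encoding("http://example.com/a%25b"))
    # Two distinct URLs that end up as the same string (query-param case).
    print(canonicalize_encoding("http://example.com/?q=a%26b"))
    print(canonicalize_encoding("http://example.com/?q=a&b"))
```

The last two calls show the collision noted above: an escaped `&` inside a query value becomes indistinguishable from a literal parameter separator.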
 
Additionally, we canonicalize the hostname as [[Phishing_Protection:_Server_Spec#Canonical_Hostname_Creation|described in the server spec]]. Enchash lookups truncate the hostname at 5 dots. URL and domain table lookups do not truncate the hostname.
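
As a rough illustration of the enchash truncation, the sketch below assumes that "truncating at 5 dots" means dropping leading (leftmost) labels until at most 5 dots remain. That reading is an assumption; the server spec is the authority on the exact rule. The function name `truncate_hostname` and the `max_dots` parameter are hypothetical.

```python
def truncate_hostname(host: str, max_dots: int = 5) -> str:
    """Keep only the rightmost labels so that at most max_dots dots remain.

    ASSUMPTION: truncation drops the leftmost labels; see the server spec
    for the authoritative definition.
    """
    labels = host.split(".")
    # max_dots dots separate max_dots + 1 labels.
    return ".".join(labels[-(max_dots + 1):])


if __name__ == "__main__":
    print(truncate_hostname("a.b.c.d.e.f.g.com"))  # 7 dots in, 5 dots out
    print(truncate_hostname("www.example.com"))    # short hosts are unchanged
```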
=== Relationship to Existing Products ===
