Phishing Protection: Server Spec

Extract the hostname from the URL (if it's an internationalized domain, we use its ASCII Punycode representation) and then follow these steps:
* Remove all characters that match the following regular expressions:
** "[\x00-\x1f\x7f-\xff]+"
** "^\\.+|\\.+$"
* Replace consecutive dots with a single dot.
* Try to parse the resulting hostname as an IP address. If it can be parsed, replace the current canonical hostname with that address, normalized to four dot-separated decimal values. The client should handle any legal IP address encoding, including octal, hex, and fewer than four components.
* Escape every character that is not alphanumeric, '.', or '-'.
* Lowercase the whole string.
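The hostname steps above can be sketched in Python as follows. This is a minimal sketch, not the reference client: it assumes the caller has already converted internationalized names to Punycode, it assumes "escape" means percent-encoding (%XX), and the helper names (`canonicalize_host`, `_normalize_ipv4`, `_parse_ipv4_part`) are hypothetical. The IPv4 parser follows the classic inet_aton conventions (a leading 0x for hex, a leading 0 for octal, and a last component that fills the remaining bytes) and is written out by hand so its behavior does not depend on the platform's C library.

```python
import re
import string

# Characters left unescaped: ASCII letters, digits, '.' and '-'.
_SAFE = frozenset(string.ascii_letters + string.digits + ".-")

# Digits allowed in an IPv4 component for each base.
_DIGITS = {
    16: frozenset("0123456789abcdefABCDEF"),
    8: frozenset("01234567"),
    10: frozenset("0123456789"),
}


def _parse_ipv4_part(part):
    """Parse one component written in hex (0x...), octal (0...) or decimal."""
    if part[:2] in ("0x", "0X"):
        digits, base = part[2:], 16
    elif len(part) > 1 and part.startswith("0"):
        digits, base = part[1:], 8
    else:
        digits, base = part, 10
    if not digits or not set(digits) <= _DIGITS[base]:
        return None
    return int(digits, base)


def _normalize_ipv4(host):
    """Return the dotted-quad form of host if it is a legal IPv4 encoding, else None."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    values = [_parse_ipv4_part(p) for p in parts]
    if None in values:
        return None
    *leading, last = values
    # Leading components are single bytes; the last one fills the remaining bytes.
    if any(v > 255 for v in leading) or last >= 1 << (8 * (5 - len(parts))):
        return None
    addr = last
    for i, v in enumerate(leading):
        addr |= v << (24 - 8 * i)
    return ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def canonicalize_host(host):
    """Apply the hostname canonicalization steps, in the listed order."""
    host = re.sub(r"[\x00-\x1f\x7f-\xff]+", "", host)  # control and high-byte chars
    host = re.sub(r"^\.+|\.+$", "", host)                # leading/trailing dots
    host = re.sub(r"\.{2,}", ".", host)                  # collapse runs of dots
    ip = _normalize_ipv4(host)
    if ip is not None:
        host = ip
    # Assumption: "escape" means percent-encoding each UTF-8 byte as %XX.
    host = "".join(
        c if c in _SAFE else "".join("%%%02X" % b for b in c.encode("utf-8"))
        for c in host
    )
    return host.lower()
```

For example, "3279880203" and "0x7f.1" normalize to "195.127.0.11" and "127.0.0.1". Because lowercasing is the last step, escaped bytes end up with lowercase hex digits: "Foo*Bar.com" becomes "foo%2abar.com".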


Then, to get the hostname for the encrypted hash lookup, we also apply this rule:
* Strip all leading components so that the resulting hostname has at most 5 dots.
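The suffix rule can be sketched as below. This is a minimal sketch: the function name `lookup_host` and the `max_dots` parameter are hypothetical, the input is assumed to be an already canonicalized hostname, and the limit is kept as a parameter so the same code works whether the limit is counted in dots or in components.

```python
def lookup_host(canonical_host, max_dots=5):
    """Strip leading components so the hostname has at most max_dots dots.

    Assumes canonical_host is already canonicalized (no empty
    components, no leading or trailing dots).
    """
    components = canonical_host.split(".")
    # Keeping the last max_dots + 1 components leaves at most max_dots dots.
    return ".".join(components[-(max_dots + 1):])
```

A short hostname, or an IP address such as 195.127.0.11 with only three dots, passes through unchanged.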
 
To canonicalize the remainder of the URL:
* The sequences "/../" and "/./" in the path should be resolved, by replacing "/./" with "/", and removing "/../" along with the preceding path component.
* The fragment identifier ("#") and everything after it should be removed.
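These two rules can be sketched in Python as follows, assuming `rest` is everything after the hostname (path, optional query, optional fragment). Two behaviors here are assumptions not stated in the rules: only the path before any "?" is resolved, so the query string is left untouched, and a "/../" with no preceding component (at the start of the path) is simply collapsed to "/". The names `canonicalize_remainder` and `_resolve_dot_segments` are hypothetical.

```python
import re

# Matches "/X/../" where X is a non-empty path component other than "..".
_PARENT_SEGMENT = re.compile(r"/(?!\.\./)[^/]+/\.\./")


def _resolve_dot_segments(path):
    """Repeatedly apply the "/./" and "/../" rules until the path stops changing."""
    while True:
        if "/./" in path:
            path = path.replace("/./", "/")
        elif _PARENT_SEGMENT.search(path):
            # Remove "/../" together with the component in front of it.
            path = _PARENT_SEGMENT.sub("/", path, count=1)
        elif path.startswith("/../"):
            # Assumption: a "/../" with nothing before it collapses to "/".
            path = path[3:]
        else:
            return path


def canonicalize_remainder(rest):
    """Canonicalize the part of a URL that follows the hostname."""
    rest = rest.split("#", 1)[0]            # drop the fragment and everything after it
    path, sep, query = rest.partition("?")  # assumption: the query is not resolved
    return _resolve_dot_segments(path) + sep + query
```

For example, "/a/b/../c/./d#frag" becomes "/a/c/d". The loop is needed because a single `str.replace` pass leaves overlapping sequences such as "/././" only partly resolved.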


== Report Requests ==