53
edits
Line 225: | Line 225: | ||
Extract the hostname from the URL (if it's an international domain, we use the ascii punycode representation) and then follow these steps: | Extract the hostname from the URL (if it's an international domain, we use the ascii punycode representation) and then follow these steps: | ||
* Remove all characters that match the following regular expressions: | * Remove all characters that match the following regular expressions: | ||
** "[\ | ** "[\x00-\x1f\x7f-\xff]+" | ||
** "^\\.+|\\.+$" | ** "^\\.+|\\.+$" | ||
* Replace consecutive dots with a single dot. | * Replace consecutive dots with a single dot. | ||
* | * If the hostname can be parsed as an IP address, it should be normalized to 4 dot-separated decimal values. The client should handle any legal IP address encoding, including octal, hex, and fewer than 4 components. | ||
* Escape all characters that are not alphanumeric or '.' or '-'. | * Escape all characters that are not alphanumeric or '.' or '-'. | ||
* Lowercase the whole string. | * Lowercase the whole string. | ||
Then to get the hostname for the encrypted hash lookup, we also apply this rule: | Then to get the hostname for the encrypted hash lookup, we also apply this rule: | ||
* Strip all leading components so that the resulting hostname has at most 5 | * Strip all leading components so that the resulting hostname has at most 5 dots. | ||
To canonicalize the remainder of the URL: | |||
* The sequences "/../" and "/./" in the path should be resolved, by replacing "/./" with "/", and removing "/../" along with the preceding path component. | |||
* The fragment identifier ("#") and everything after it should be removed | |||
== Report Requests == | == Report Requests == |
edits