IDN Display Algorithm

900 bytes removed, 13:46, 17 April 2017
Update to show this is now what we do
This page explains how Firefox decides whether to display a given IDN label (a domain name is made up of one or more labels, separated by dots) in its Unicode or Punycode form.
Implementing this plan was covered by {{bug|722299}}.
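
To make the two display forms concrete, here is a minimal sketch using Python's built-in <code>idna</code> codec (which implements the older IDNA 2003 rules). It is an illustration only, not Firefox code; the helper names <code>to_punycode</code> and <code>to_unicode</code> are made up for this example.

```python
# Illustration only: Python's built-in "idna" codec, not Firefox's implementation.
# The helper names below are hypothetical.

def to_punycode(domain: str) -> str:
    """Return the ASCII (Punycode) form of a domain, converted label by label."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of a domain given in its ASCII (Punycode) form."""
    return domain.encode("ascii").decode("idna")


print(to_punycode("bücher.example"))         # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))   # bücher.example
```

Only the label containing non-ASCII characters changes; the plain ASCII label <code>example</code> is the same in both forms.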
==Background==
If we just display any possible IDN label, we open ourselves up to [http://en.wikipedia.org/wiki/IDN_homograph_attack IDN homograph attacks], where one identical-looking domain can spoof another. So we have to have some mechanism to decide which ones to display and which ones to not display, which does not involve comparing the domain in question against every other single domain which exists (which is impossible).
===Previous Algorithm===
Our previous algorithm was to display as Unicode all IDN labels within TLDs on our [http://www.mozilla.org/projects/security/tld-idn-policy-list.html whitelist], and display as Punycode otherwise. We checked the anti-spoofing policies of a registry before adding their TLD to the whitelist. The TLD operator had to apply directly (they could not be nominated by another person), and on several occasions we required policy updates or implementation as a condition of getting in.
We also had a character blacklist - characters we will never display under any circumstances. This includes those which could be used to spoof the separators "/" and ".", and invisible characters. This still exists.
===Why We Changed===
The old strategy provided pretty good user protection, and it provided consistency - every Firefox everywhere worked the same. However, it meant that IDNs did not work at all in many TLDs, because the registry (for whatever reason) had not applied for inclusion, or because we didn't think they had sufficiently strong protections in place. In addition, ICANN was about to open a [http://newgtlds.icann.org/en/program-status/application-results/strings-1200utc-13jun12-en large number of new TLDs]. So either maintaining a whitelist was going to become burdensome, or the list was going to become wildly out of date and we would not be serving our users.
===Other Browsers===
The Chromium IDN page has a [http://www.chromium.org/developers/design-documents/idn-in-google-chrome good summary] of the policies of Chrome/Chromium and the other browsers. Unfortunately, no consensus has emerged on how to do this. Those other mechanisms were considered, but many of them depend on the configuration of the user's computer (e.g. installed languages), and this does not give site owners any confidence that their IDN will be correctly displayed for all their visitors (and no way of telling if it's not).
==The New Idea==
Instead, we now augment our whitelist with something based on ascertaining whether the characters in a label all come from the same script, or are from one of a limited and defined number of allowable combinations. The hope is that any intra-script near-homographs will be recognisable to people who understand that script.
We retain the whitelist as well, because a) removing it might break some domains which worked previously, and b) if a registry submits a good policy, we have the ability to give them more freedom than the default restrictions do.
So an IDN is shown as Unicode if the TLD is on the whitelist or, if not, if it meets the criteria above.
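
The decision described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Firefox's implementation: the whitelist entry is hypothetical, the script of a character is approximated by the first word of its Unicode character name (Python's standard library has no script property), and the "limited and defined number of allowable combinations" is left out entirely.

```python
import unicodedata

# Hypothetical whitelist entry, for illustration only.
TLD_WHITELIST = {"example"}


def rough_script(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name.

    Assumption: this is a crude stand-in for the real Unicode Script property.
    """
    if ch in "0123456789-":
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_single_script(label: str) -> bool:
    """True if all characters (ignoring digits and hyphens) share one script."""
    scripts = {rough_script(ch) for ch in label} - {"COMMON"}
    return len(scripts) <= 1


def display_form(label: str, tld: str) -> str:
    """Return the label as Unicode if allowed, otherwise as Punycode."""
    if tld in TLD_WHITELIST or is_single_script(label):
        return label
    return label.encode("idna").decode("ascii")


mixed = "p\u0430ypal"  # Latin letters plus CYRILLIC SMALL LETTER A
print(display_form("bücher", "test"))   # single script: stays Unicode
print(display_form(mixed, "test"))      # mixed script: shown as Punycode
print(display_form(mixed, "example"))   # whitelisted TLD: stays Unicode
```

The whitelist check comes first, so a whitelisted TLD bypasses the script check altogether, as described above.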
==Algorithm==
If a TLD is in the whitelist, we unconditionally display Unicode. If it is not, the following algorithm applies.
[http://www.unicode.org/reports/tr39/#Restriction_Level_Detection Unicode Technical Report 39]
===Possible Issues and Open Questions===
The following issues are still open, but were not considered important enough to block initial implementation.
Further suggestions from TR #39:
===Downsides===
This system permits whole-script confusables
(All-Latin "scope.tld" vs all-Cyrillic "ѕсоре.tld"). However, so do the
solutions of the other browsers, and it has not proved to be a
significant problem so far. If there is a problem, every browser is equally affected.
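
A short sketch of why a single-script check cannot catch this case. It assumes the Cyrillic label in the example is spelled with the code points U+0455, U+0441, U+043E, U+0440, U+0435, and it approximates a character's script by the first word of its Unicode name; the function name <code>scripts_by_name</code> is made up for this illustration.

```python
import unicodedata

latin = "scope"
# Assumed spelling of the all-Cyrillic lookalike label from the example.
cyrillic = "\u0455\u0441\u043e\u0440\u0435"


def scripts_by_name(label: str) -> set:
    """Approximate the set of scripts in a label from Unicode character names."""
    return {unicodedata.name(ch).split()[0] for ch in label}


print(latin == cyrillic)          # False
print(scripts_by_name(latin))     # {'LATIN'}
print(scripts_by_name(cyrillic))  # {'CYRILLIC'}
```

Each label is internally consistent, so both pass a same-script test even though they are different strings that look alike.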
Our response to this issue is that in the end, it is up to registries
to make sure that their customers
cannot rip each other off. Browsers can put some technical restrictions in place,
but registries are the only people in a position to implement the proper checking here. For our part,
we want to make sure we don't treat non-Latin scripts as second-class citizens.
 
==Transition==
 
In between adopting this plan and shipping a Firefox with
the restrictions implemented, we will admit into the whitelist any
TLD whose anti-spoofing policies <i>at registration time</i> were at least as strong as
those outlined above.