Changes

IDN Display Algorithm

900 bytes removed, 13:46, 17 April 2017

Update to show this is now what we do

This page explains ~~the plan for changing the mechanism by which~~ how Firefox decides whether to display a given IDN label (a domain name is made up of one or more labels, separated by dots) in its Unicode or Punycode form.
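As a small illustration of the two forms a label can take, the sketch below converts a domain between its Unicode and Punycode forms using Python's built-in `idna` codec. That codec implements the older IDNA 2003 rules, so treat it as an approximation for demonstration; the helper names `to_punycode` and `to_unicode` are hypothetical and not part of Firefox.

```python
def to_punycode(domain: str) -> str:
    """Return the ASCII (Punycode) form of a domain, converted label by label."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of a domain given in its ASCII form."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    unicode_form = "b\u00fccher.example"  # "bücher.example"
    ascii_form = to_punycode(unicode_form)
    print(ascii_form)               # xn--bcher-kva.example
    print(to_unicode(ascii_form))   # bücher.example
```

Only the label containing non-ASCII characters gains the `xn--` prefix; the all-ASCII label `example` is left unchanged.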

Implementing this plan ~~is~~ was covered by {{bug|722299}}.

==Background==

If we just display any possible IDN label, we open ourselves up to [http://en.wikipedia.org/wiki/IDN_homograph_attack IDN homograph attacks], where one identical-looking domain can spoof another. So we have to have some mechanism to decide which ones to display and which ones to not display, which does not involve comparing the domain in question against every other single domain which exists (which is impossible).

===~~Current~~ Previous Algorithm===

Our ~~current~~ previous algorithm ~~is~~ was to display as Unicode all IDN labels within TLDs on our [http://www.mozilla.org/projects/security/tld-idn-policy-list.html whitelist], and display as Punycode otherwise. We ~~check~~ checked the anti-spoofing policies of a registry before adding their TLD to the whitelist. The TLD operator ~~must~~ had to apply directly (they cannot be nominated by another person), and on several occasions we ~~have~~ required policy updates or implementation as a condition of getting in.

We also ~~have~~ had a character blacklist - characters we will never display under any circumstances. This includes those which could be used to spoof the separators "/" and ".", and invisible characters. This still exists.
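A character blacklist check can be sketched as a simple set-membership test. The set below is illustrative only and is an assumption, not Firefox's actual list; it holds a few characters of the kinds described above (look-alikes for "/" and ".", and invisible characters). The name `has_blacklisted_char` is hypothetical.

```python
# Illustrative sample only: NOT Firefox's actual blacklist.
BLACKLISTED_CHARS = {
    "\u2044",  # FRACTION SLASH, resembles "/"
    "\u2215",  # DIVISION SLASH, resembles "/"
    "\u3002",  # IDEOGRAPHIC FULL STOP, resembles "."
    "\u200b",  # ZERO WIDTH SPACE, invisible
    "\u2060",  # WORD JOINER, invisible
}


def has_blacklisted_char(label: str) -> bool:
    """Return True if the label contains any character from the sample blacklist."""
    return any(ch in BLACKLISTED_CHARS for ch in label)


if __name__ == "__main__":
    print(has_blacklisted_char("example"))          # False
    print(has_blacklisted_char("exam\u2044ple"))    # True
```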

===~~Need For Change~~ Why We Changed===

~~This~~ The old strategy ~~provides~~ provided pretty good user protection, and it ~~provides~~ provided consistency - every Firefox everywhere works the same. However, it ~~does mean~~ meant that IDNs ~~do~~ did not work at all in many TLDs, because the registry (for whatever reason) ~~has~~ had not applied for inclusion, or because we ~~do not~~ didn't think they ~~have~~ had sufficiently strong protections in place. In addition, ICANN ~~is~~ was about to open a [http://newgtlds.icann.org/en/program-status/application-results/strings-1200utc-13jun12-en large number of new TLDs]. So either maintaining a whitelist ~~is~~ was going to become burdensome, or the list ~~will~~ was going to become wildly out of date and we ~~will~~ would not be serving our users.

==~~Other Browsers~~ The New Idea==

~~The Chromium IDN page has a [http://www.chromium.org/developers/design-documents/idn-in-google-chrome good summary] of the policies of Chrome/Chromium and the other browsers. Unfortunately, no consensus has emerged on how to do this. Those other mechanisms were considered, but many of them depend on the configuration of the user's computer (e.g. installed languages), and this does not give site owners any confidence that their IDN will be correctly displayed for all their visitors (and no way of telling if it's not).~~ ~~==Proposal==~~ ~~The plan is to~~ Instead, we now augment our whitelist with something based on ascertaining whether the characters in a label all come from the same script, or are from one of a limited and defined number of allowable combinations. The hope is that any intra-script near-homographs will be recognisable to people who understand that script.
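The script check described above might be sketched as follows. Python's standard library does not expose the Unicode Script property, so this sketch approximates a character's script by the first word of its Unicode name (for example `LATIN` or `CYRILLIC`); that is a rough stand-in, not the TR39 procedure. The names `label_scripts` and `is_acceptable_label`, and the single entry in `ALLOWED_COMBINATIONS`, are assumptions for illustration.

```python
import unicodedata

# Characters treated as script-neutral in this sketch (ASCII digits and hyphen).
COMMON_CHARS = set("0123456789-")

# Hypothetical example of one allowable combination, using name-prefix "scripts".
ALLOWED_COMBINATIONS = [
    frozenset({"LATIN", "CJK", "HIRAGANA", "KATAKANA"}),
]


def label_scripts(label: str) -> set:
    """Approximate the set of scripts in a label via Unicode character names."""
    scripts = set()
    for ch in label:
        if ch in COMMON_CHARS:
            continue
        # First word of the character name, e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_acceptable_label(label: str) -> bool:
    """True if the label is single-script or fits one allowed combination."""
    scripts = label_scripts(label)
    if len(scripts) <= 1:
        return True
    return any(scripts <= combo for combo in ALLOWED_COMBINATIONS)


if __name__ == "__main__":
    print(is_acceptable_label("example"))        # single script: Latin
    print(is_acceptable_label("p\u0430ypal"))    # Latin mixed with Cyrillic "а"
```

A label mixing Latin with a single Cyrillic letter is rejected here because the pair is not listed as an allowed combination, while an all-Cyrillic or all-Latin label passes.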

We ~~will~~ retain the whitelist as well, because a) removing it might break some domains which worked previously, and b) if a registry submits a good policy, we have the ability to give them more freedom than the default restrictions do.

So an IDN ~~would be~~ is shown as Unicode if the TLD is on the whitelist or, if not, if it meets the criteria above.
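The overall decision can be sketched as a whitelist lookup followed by a per-label check. This is a minimal sketch, not Firefox's code: the function name `display_form` is hypothetical, the script criteria are passed in as a caller-supplied predicate, and Punycode conversion uses Python's built-in `idna` codec (IDNA 2003) as an approximation.

```python
from typing import Callable, Set


def display_form(domain: str,
                 tld_whitelist: Set[str],
                 label_is_acceptable: Callable[[str], bool]) -> str:
    """Return the domain as it would be displayed under the rule above.

    Every label is shown as Unicode if the TLD is whitelisted; otherwise a
    label is shown as Unicode only if it satisfies the supplied criteria.
    """
    labels = domain.split(".")
    tld_on_whitelist = labels[-1] in tld_whitelist
    shown = []
    for label in labels:
        if tld_on_whitelist or label_is_acceptable(label):
            shown.append(label)
        else:
            # Fall back to the Punycode (ASCII) form of this label.
            shown.append(label.encode("idna").decode("ascii"))
    return ".".join(shown)


if __name__ == "__main__":
    name = "b\u00fccher.test"
    # Whitelisted TLD: Unicode regardless of the per-label criteria.
    print(display_form(name, {"test"}, lambda label: False))
    # Not whitelisted and criteria not met: Punycode.
    print(display_form(name, set(), lambda label: False))
```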

==Algorithm==

If a TLD is in the whitelist, we ~~will~~ unconditionally display Unicode. If it is not, the following algorithm ~~will apply~~ applies.

[http://www.unicode.org/reports/tr39/#Restriction_Level_Detection Unicode Technical Report 39]

===Possible Issues and Open Questions===

The following issues are still open, but ~~should~~ were not considered important enough to block initial implementation.

Further suggestions from TR #39:

===Downsides===

This system ~~would permit~~ permits whole-script confusables (all-Latin "scope.tld" vs all-Cyrillic "ѕсоре.tld"). However, so do the solutions of the other browsers, and it has not proved to be a significant problem so far. If there is a problem, every browser is equally affected.
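To see why a single-script rule cannot catch this case, the sketch below compares the two labels from the example. It uses the first word of each Unicode character name as a rough stand-in for the script (an approximation, not the Unicode Script property); `name_prefixes` is a hypothetical helper.

```python
import unicodedata

latin = "scope"
# The all-Cyrillic look-alike, written with escapes so the code points are explicit.
cyrillic = "\u0455\u0441\u043e\u0440\u0435"


def name_prefixes(text: str) -> set:
    """Return the first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text}


if __name__ == "__main__":
    print(latin == cyrillic)          # False: different code points
    print(name_prefixes(latin))       # {'LATIN'}
    print(name_prefixes(cyrillic))    # {'CYRILLIC'}
    for ch in cyrillic:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Each string is internally consistent, so both pass a same-script test even though they render almost identically.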

~~If problems arose in the future (e.g. whole-script, or homographs between a particular single script and Latin), our~~ Our response ~~would be~~ to this issue is that in the end, it is up to registries to make sure that their customers cannot rip each other off. Browsers can put some technical restrictions in place, but registries are the only people in a position to implement the proper checking here. For our part, we want to make sure we don't treat non-Latin scripts as second-class citizens.

~~==Transition==~~

~~In between adopting this plan and shipping a Firefox with the restrictions implemented, we will admit into the whitelist any TLD whose anti-spoofing policies <i>at registration time</i> were at least as strong as those outlined above.~~

Gerv
