Public Suffix List/platform.sh Problem


This page explains the "platform.sh problem", named after the entry whose proposed inclusion triggered the discussion.

The rules which were requested (in bug 1124625) were:

   // Already-existing
   sh
   ...
   // Newly-added
   *.platform.sh

Recall that the Public Suffix is the registry-controlled suffix, and the "registerable domain" is the Public Suffix + 1 additional label. (This wiki page does _not_ use the confusing notation from test_psl.txt, which has a function called checkPublicSuffix() which returns the registerable domain rather than the Public Suffix.) If we follow the defined PSL algorithm, the above rules should result in the following determinations:

   get_public_suffix(foo.bar.platform.sh) == "bar.platform.sh"
   get_public_suffix(bar.platform.sh) == "bar.platform.sh"
   get_public_suffix(platform.sh) == "sh"
   get_public_suffix(sh) == "sh"
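
The determinations above can be reproduced with a minimal sketch of the matching step. This is an illustration only, not any browser's code: the helper name _matches and the RULES list are assumptions made for this page, and exception ("!") rules are left out because the rules under discussion do not use them.

```python
def _matches(rule_labels, domain_labels):
    """A rule matches when each of its labels, read right to left, equals
    the corresponding domain label or is the wildcard "*"."""
    if len(rule_labels) > len(domain_labels):
        return False
    pairs = zip(reversed(rule_labels), reversed(domain_labels))
    return all(r == "*" or r == d for r, d in pairs)


def get_public_suffix(domain, rules):
    """Return the public suffix of `domain` under `rules`.

    The matching rule with the most labels prevails; if nothing matches,
    the default rule "*" applies (one label). Exception rules are not
    handled in this sketch.
    """
    labels = domain.lower().split(".")
    best_len = 1  # length of the default rule "*"
    for rule in rules:
        rule_labels = rule.split(".")
        if _matches(rule_labels, labels):
            best_len = max(best_len, len(rule_labels))
    return ".".join(labels[-best_len:])


# The two rules from the request, as plain strings.
RULES = ["sh", "*.platform.sh"]

for name in ["foo.bar.platform.sh", "bar.platform.sh", "platform.sh", "sh"]:
    print(name, "->", get_public_suffix(name, RULES))
```

Note that "*.platform.sh" has three labels, so it cannot match the two-label name "platform.sh"; only "sh" matches, which is why case 3 yields "sh".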

The problem is that Chrome (and possibly other implementations too) currently implements the algorithm differently, returning the following for case 3:

   get_public_suffix(platform.sh) == "platform.sh"

and so it would (if it used a PSL with the above entries) refuse to let the website at http://platform.sh/ set any cookies. (Here we use cookie setting as a canonical example of PSL use.) This is because Chrome incorrectly assumes that a rule "*.platform.sh" means that "platform.sh" is also a public suffix, and it's not permitted to set cookies on a public suffix. This divergence in behaviour only occurs for precisely one name/site per instance of this "split rule" pair - platform.sh in this case.
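
The assumption described above can be sketched as follows. This is a hypothetical model of the behaviour, not Chrome's actual code: the function names are invented for this page, and the "implicit parent" expansion is simply one way to express "a rule *.X means X is also a public suffix".

```python
def _matches(rule_labels, domain_labels):
    """Right-to-left label comparison; "*" matches any single label."""
    if len(rule_labels) > len(domain_labels):
        return False
    pairs = zip(reversed(rule_labels), reversed(domain_labels))
    return all(r == "*" or r == d for r, d in pairs)


def public_suffix_implicit_parent(domain, rules):
    """Variant lookup: every wildcard rule "*.X" is treated as if the
    rule "X" were also present (the assumption under discussion)."""
    expanded = set(rules)
    for rule in rules:
        if rule.startswith("*."):
            expanded.add(rule[2:])  # implied parent entry
    labels = domain.lower().split(".")
    best_len = 1  # default rule "*"
    for rule in expanded:
        rule_labels = rule.split(".")
        if _matches(rule_labels, labels):
            best_len = max(best_len, len(rule_labels))
    return ".".join(labels[-best_len:])


def may_set_cookies(host, rules):
    """A host that is itself a public suffix may not set cookies."""
    return host.lower() != public_suffix_implicit_parent(host, rules)


RULES = ["sh", "*.platform.sh"]
print(public_suffix_implicit_parent("platform.sh", RULES))
print(may_set_cookies("platform.sh", RULES))
print(may_set_cookies("foo.bar.platform.sh", RULES))
```

Under this variant only the name "platform.sh" itself changes result; names below it are classified exactly as in the defined algorithm.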


Question: Does the definition of Public Suffix advocated here actually align with the list's historical purpose (see TLD_list), or with the policies of including the entire IANA Root Zone Database or of adding new gTLDs before identifying the domain policies?

The reason Chrome made this assumption, perhaps, is that there are other instances of this pattern in the PSL and in those instances, the assumption is arguably correct. For example:

   jp
   *.kobe.jp

In this case, kobe.jp is arguably a public suffix, in the sense that it's registry-controlled. However, it's arguably not a public suffix in the sense that the public can't register names directly under it, as you might expect of a public suffix. There is no website there, and no need for it to set cookies, so that particular issue does not apply.


Question: why does Ryan think that the current Chrome behaviour for kobe.jp is correct? Is there perhaps a non-cookie example which shows the usefulness of the current Chrome behaviour more clearly? Is the problem that instead of checking for 'boundary-passing' when setting cookies, Chrome just checks for equality to any public suffix so, if it did not have the current behaviour, foo.bar.kobe.jp could erroneously set cookies for kobe.jp?
Answer: The current behaviour dates from when *.ccTLD was a common occurrence in the PSL and each such 'ccTLD' was a valid (IANA-delegated) TLD, which is the purpose originally and partially reflected in TLD_List. In the same way that http://com/ should be non-navigable, or that http://co.uk/ is non-navigable, this is the natural result of combining "*.kobe.jp says something about kobe.jp the same way that *.il says something about .il" (that is, that they're both IANA-recognized domain boundaries) with "Thou shalt not navigate to an effective TLD".
Further, the definition being used is "effective TLD" (which is what Mozilla calls it in code, cf. nsEffectiveTLDService.cpp), in which "il" is an effective (and actual) TLD, and "co.uk" is an effective (but not actual) TLD.
From a security perspective, allowing kobe.jp to set cookies, particularly cookies that would affect sub.foo.kobe.jp and sub.bar.kobe.jp, would be bad for security, much in the same way that sub.foo.kobe.jp setting cookies for kobe.jp (thus affecting sub.bar.kobe.jp) would also be bad. If these weren't issues, we wouldn't have the notion of public suffixes to begin with, because the notion exists precisely to prevent this sort of cookie pinning. Further, note that in this discussion, different domain boundaries are presumed to exist, such that foo and bar are organizationally disjoint. This assumption is reflected in the sheer number of hosting providers adding themselves to the public suffix list precisely to express this fact, and has historically been at the core of the notion of registerable domains.

There is a Chrome bug on this behaviour. Chrome's current behaviour gets all existing instances of this pattern right, but would get platform.sh wrong. Changing Chrome's assumption to be the opposite would lead it to get platform.sh right, but kobe.jp and friends wrong. So one proposed idea is to fix the bug, and then explicitly add "kobe.jp" and friends to the PSL. That is proposed solution 1, below.

However, Firefox (and possibly other implementations too) has a different bug - bug 1163015. Firefox's internal data model uses a hash table which indexes public suffixes without taking account of the "*" or "!" characters. Therefore, it cannot distinguish the following two rules:

   *.kobe.jp
   kobe.jp

So explicitly adding "kobe.jp" and its other equivalents to the Public Suffix List would make Firefox unable to properly read the PSL.
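
The collision can be illustrated with a toy model. This is not Firefox's actual data structure; build_table and the flag names are hypothetical, and the only property modelled is the one described above: the table key ignores the "*" and "!" characters.

```python
def build_table(rules):
    """Index rules by their text with any leading "!" and "*." removed.

    Because the key drops those markers, "*.kobe.jp" and "kobe.jp" land
    on the same key and the later rule overwrites the earlier one.
    """
    table = {}
    for rule in rules:
        key = rule
        is_exception = key.startswith("!")
        if is_exception:
            key = key[1:]
        is_wildcard = key.startswith("*.")
        if is_wildcard:
            key = key[2:]
        table[key] = {"wildcard": is_wildcard, "exception": is_exception}
    return table


table = build_table(["jp", "*.kobe.jp", "kobe.jp"])
print(sorted(table))           # only two keys survive from three rules
print(table["kobe.jp"])        # the wildcard flag has been overwritten
```

In this model the outcome also depends on the order in which the two rules are read, which is one way of seeing why such a table cannot represent both rules at once.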

Further Information

It may be useful to note that the PSL algorithm has a "default rule" of "*", which is used when a more specific rule is not matched. So the Public Suffix of "foo.example" is "example", as "example" itself is not in the PSL.

However, while this rule exists, many implementations have also come to rely on the PSL as an expression of a TLD_list that is a wholly-containing superset of the IANA Root Zone database. That is, they assume that the PSL contains all of the IANA first-level TLDs, as well as expressions of all of the policies set by the registries (e.g. .uk setting rules on .co.uk, and then eventually relaxing them) as well as policies set by the domain holder (e.g. amazonaws.com setting policies for the domains they give to their customers). This use has become prevalent throughout clients and libraries, such as for the purpose of spam mitigation or email validation.

Because of this, the "*" rule is insufficient to express whether or not a domain is "IANA delegated" or "registerable", and so applications decide whether or not to apply the "*" rule depending on the usage. For cookies, for example, the "*" rule always applies. For determining whether a domain is valid (and thus navigable), applications may inhibit the "*" rule such that only explicitly listed entries are considered. This can help distinguish, for example, whether "good.cookies" is an attempt by a user to search for "good.cookies" (presumably typo-corrected to "good cookies") or an attempt to navigate to the address http://good.cookies (in which the user omitted the scheme, as users are wont to do).
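
A minimal sketch of the two usages, assuming a lookup that takes a flag to inhibit the default rule. The parameter use_default_rule, the helper looks_navigable and the three-entry rule list are hypothetical and exist only for this illustration; exception rules are not handled.

```python
def _matches(rule_labels, domain_labels):
    """Right-to-left label comparison; "*" matches any single label."""
    if len(rule_labels) > len(domain_labels):
        return False
    pairs = zip(reversed(rule_labels), reversed(domain_labels))
    return all(r == "*" or r == d for r, d in pairs)


def public_suffix(domain, rules, use_default_rule=True):
    """Return the public suffix, or None when no explicit rule matches
    and the default "*" rule has been inhibited by the caller."""
    labels = domain.lower().split(".")
    best_len = 0  # 0 means "no explicit rule matched"
    for rule in rules:
        rule_labels = rule.split(".")
        if _matches(rule_labels, labels):
            best_len = max(best_len, len(rule_labels))
    if best_len == 0:
        if not use_default_rule:
            return None
        best_len = 1  # fall back to the default rule "*"
    return ".".join(labels[-best_len:])


def looks_navigable(text, rules):
    """Treat input as a hostname only if an explicit rule covers it."""
    return public_suffix(text, rules, use_default_rule=False) is not None


RULES = ["com", "uk", "co.uk"]  # tiny stand-in for the real list
print(public_suffix("foo.example", RULES))     # cookie-style lookup
print(looks_navigable("good.cookies", RULES))  # navigation-style check
print(looks_navigable("example.co.uk", RULES))
```

The cookie-style call keeps the default rule, so "foo.example" still gets the suffix "example"; the navigation-style check declines to treat "good.cookies" as a hostname because nothing in the list covers "cookies".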

These sorts of distinctions are tracked more broadly at Public_Suffix_List/Uses as a small survey of the use space.

Proposed Solutions

1. Add the intermediate domains for the existing examples

  • Do not change the PSL algorithm
  • Fix implementations to distinguish between "*.foo.bar" and "foo.bar" everywhere
  • Add "*.platform.sh" to the PSL
  • Add "kobe.jp" and all its friends to the PSL
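
A sketch of how this solution would behave, assuming an implementation whose table keeps the full rule text (including the "*." prefix) as the key, so that "*.kobe.jp" and "kobe.jp" are distinct entries. The function name and the rule set are assumptions for illustration; exception rules are omitted, and wildcards are assumed to appear only as the leftmost label.

```python
def public_suffix(domain, rule_set):
    """Probe successively longer suffixes of `domain` against a set of
    rules keyed by their full text, so exact and wildcard rules for the
    same name never collide. The longest matching suffix wins."""
    labels = domain.lower().split(".")
    best_len = 1  # default rule "*"
    for n in range(1, len(labels) + 1):
        candidate = labels[-n:]
        exact = ".".join(candidate)
        # Same suffix with its leftmost label replaced by the wildcard.
        wildcard = ".".join(["*"] + candidate[1:])
        if exact in rule_set or wildcard in rule_set:
            best_len = n
    return ".".join(labels[-best_len:])


# Existing rules plus the explicit intermediate entry for kobe.jp.
WITH_PARENT = {"sh", "*.platform.sh", "jp", "*.kobe.jp", "kobe.jp"}
WITHOUT_PARENT = WITH_PARENT - {"kobe.jp"}

for name in ["platform.sh", "kobe.jp", "foo.kobe.jp", "bar.foo.kobe.jp"]:
    print(name, "->", public_suffix(name, WITH_PARENT))
print("kobe.jp without explicit entry ->", public_suffix("kobe.jp", WITHOUT_PARENT))
```

With the explicit entry present, kobe.jp is a public suffix while platform.sh remains a registerable domain under "sh", which is the outcome this solution aims for without changing the algorithm.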

2. Change algorithm, and add a "!" rule for platform.sh

  • Update the PSL algorithm description to describe what some implementations have so far assumed - i.e. that "*.foo.bar" means that "foo.bar" is also a Public Suffix.
  • This gets the kobe.jp case right
  • Fix implementations to distinguish between "*.foo.bar" and "foo.bar" everywhere
  • Add "*.platform.sh" and "!platform.sh" to the PSL, to opt platform.sh itself out

3. Change algorithm, and tell platform.sh to change their practice

  • Update the PSL algorithm description to describe what implementations have so far assumed - i.e. that "*.foo.bar" means that "foo.bar" is also a Public Suffix.
  • Tell platform.sh that it's deeply unwise of them to give subdomains of their main website domain to their customers, and suggest that instead they offer subdomains of another domain (as Amazon does with amazonaws.com, and other cloud providers do), and that once they do that, we can add them.