Public Suffix Lookups Without Parsing the PSL
Tackling a New Challenge with the DNS
During my work at SSE (I was working on the security-first DNS provider deSEC), I needed a quick way to look up the so-called Public Suffix for a given domain name. If the domain name is, say, amazon.co.uk, then the Public Suffix would be co.uk. Such lookups are usually done by loading and parsing the Public Suffix List (PSL) and then matching the last part(s) of the domain name against the list, eventually settling on the longest match.
As it turns out, this approach requires application awareness of the PSL and comes with significant maintenance overhead (more on that later). Let’s first understand both benefits and quirks of the PSL, and then see why it’s more complicated than it looks at first sight. Employing the DNS as a key-value store will lead to an elegant way out of the mess.
The Idea behind the PSL
The PSL has a wide range of applications, as there are several (usually security-related) scenarios in which applications or service providers need to make a policy decision based on the public suffix of the domain name at hand. A special case is when the domain name is itself a public suffix. For example, Certificate Authorities most likely would not want to issue a wildcard TLS certificate for the name *.co.uk, although a wildcard certificate for *.t.co (Twitter’s URL shortener service) may be perfectly fine.
Here are some common applications in which knowledge of a domain name’s public suffix is required:
- In order to decide how exactly cookie scoping should be restricted across domains, browsers determine public suffixes. Some browsers also highlight a domain’s public suffix in the address bar to aid visual inspection and hamper phishing.
- Certificate Authorities (CAs) checking for wildcard misissuance should not allow a wildcard certificate for a public suffix name such as *.co.uk.
- Some CAs such as Let’s Encrypt limit the number of certificates that can be requested for a given domain, including subdomains (within a given interval, such as per week). CAs need to be aware of each domain’s public suffix in order to decide on which part of the domain the limit should be applied. For example, a certificate request for console.cloud.google.com should be counted towards the google.com name (2nd-to-last label) while a certificate request for www.google.co.uk would be counted towards google.co.uk (3rd-to-last label!).
- The DMARC email authentication protocol, intended for fighting spam by validating the message origin, is configured via DNS records on the registrable domain (organizational domain in DMARC speak), i.e. the domain name whose direct parent is the domain name’s public suffix. For example, DMARC for the address admin@services.staff.example.org is configured as a DNS record under example.org (to be precise: as a TXT record at _dmarc.example.org). DMARC validators need to know where to look for that record: Should they use _dmarc.staff.example.org or _dmarc.example.org? The PSL answers this question.
- As a consequence of a little-known subtlety in the specification of the DNS, DNS providers too need to be aware of what’s a public suffix.
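
The rate-limit and DMARC examples above both come down to the same step: once the public suffix is known, the registrable domain is the suffix plus one more label. Here is a minimal Python sketch of that step, assuming the public suffix has already been determined by some other means. The function name registrable_domain is made up for illustration and is not taken from any particular library.

```python
def registrable_domain(name, public_suffix):
    """Return the public suffix plus one label, or None if `name` is the suffix itself."""
    labels = name.lower().rstrip(".").split(".")
    suffix_labels = public_suffix.lower().rstrip(".").split(".")
    n = len(suffix_labels)
    if labels[-n:] != suffix_labels:
        raise ValueError(f"{name!r} does not end in {public_suffix!r}")
    if len(labels) == n:
        return None  # the name is a public suffix; nothing is registrable here
    return ".".join(labels[-(n + 1):])


# Rate limiting: which name does a certificate request count towards?
print(registrable_domain("console.cloud.google.com", "com"))  # google.com
print(registrable_domain("www.google.co.uk", "co.uk"))        # google.co.uk

# DMARC: where does the policy record for this mail domain live?
print("_dmarc." + registrable_domain("services.staff.example.org", "org"))  # _dmarc.example.org
```
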
I first encountered the need for ad-hoc PSL lookups while working on the deSEC DNS hosting platform, and so I would like to devote a separate section to this last use case. If you are not interested in the intricacies of setting up a DNS platform, you can skip the next section.
Deep Dive: The Relevance of Public Suffixes for a DNS Provider
Here’s why: In DNS, it is possible to store conflicting information on the same name server. For example, one customer could register the zone example.com and create a DNS record for the subdomain www with IP address 1.2.3.4, while another can register the zone www.example.com and create a record there, with IP address 6.6.6.6. Each customer "owns" a part of the global DNS tree, but the two overlap. If this situation occurs, RFC 1034 Sec. 4.3.2 prescribes that the most specific zone (subtree) wins.
In other words, DNS queries for www.example.com are answered with 6.6.6.6, which might not be what the owner of example.com was expecting. (Such subzone takeovers may include dangerous names such as _acme-challenge.www.example.com, allowing the attacker to obtain a TLS certificate for the parent name. You can find some real world cases in a paper I wrote in 2018.)
To avoid this problem, we introduced a check at deSEC to reject zone registrations for domain names if any parent domain name is owned by another user. — Great, we’re safe then, aren’t we?
Unfortunately, no. With this check in place, a malicious (or imprudent) user may register co.uk and, as a consequence, cause all other users to be blocked from registering their legitimate <something>.co.uk names: the security check would interpret such registrations as a hijacking attempt. We find ourselves in a catch-22: it seems that we have no choice but to either allow a malicious user to hijack another customer’s subdomains, or (with the security check in place) to allow them to occupy a large chunk of the DNS tree, such as co.uk!
This is where the PSL comes in: unless you’re a registry and managing a public suffix (such as a top level domain), there is no legitimate use case for any user to register a domain that is a public suffix itself. After all, domain owners purchase a domain name under a public suffix, and then need DNS services for that registrable domain, not for the public suffix. Thus, barring a few very special cases, it is safe to reject registrations of domain names which are public suffixes themselves.
By combining the two security checks (rejecting both subzone registration and public suffix registration), safe operation of the DNS service is ensured. But it comes at a price: At deSEC, we need to perform a public suffix lookup for each domain name for which registration is attempted.
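
To make the combination of the two checks concrete, here is a small Python sketch. It is not deSEC's actual code: the function names, the zone_owners dictionary, and the toy set of suffixes are all assumptions made for illustration, and the public suffix test is passed in as a callable so that any lookup method can be plugged in.

```python
def parent_names(name):
    """Yield all proper parents of `name`, e.g. a.b.c -> b.c, c."""
    labels = name.split(".")
    for i in range(1, len(labels)):
        yield ".".join(labels[i:])


def may_register(name, user, zone_owners, is_public_suffix):
    """Apply both checks; returns (allowed, reason)."""
    name = name.lower().rstrip(".")
    # Check 1: nobody gets to register a public suffix itself.
    if is_public_suffix(name):
        return False, "name is a public suffix"
    # Check 2: no parent zone may belong to a different user.
    for parent in parent_names(name):
        owner = zone_owners.get(parent)
        if owner is not None and owner != user:
            return False, f"parent zone {parent} belongs to another user"
    return True, "ok"


# Toy data standing in for the real user database and PSL lookup.
zone_owners = {"example.com": "alice"}
toy_suffixes = {"com", "uk", "co.uk"}
is_suffix = toy_suffixes.__contains__

print(may_register("co.uk", "mallory", zone_owners, is_suffix))            # rejected: public suffix
print(may_register("www.example.com", "mallory", zone_owners, is_suffix))  # rejected: subzone takeover
print(may_register("www.example.com", "alice", zone_owners, is_suffix))    # allowed: same owner
print(may_register("shop.co.uk", "bob", zone_owners, is_suffix))           # allowed
```
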
Practical Complications
All of the above types of applications, and possibly even more, need to perform lookups in the PSL. The list, ever changing, is now over 200 KB large (there are thousands of public suffixes), and only loosely structured. The usual approach is to distribute the official PSL file in text format along with the application itself, and update it once in a while.
This poses several problems:
- Parsing the PSL is not trivial. For example, it supports wildcards as well as exceptions: all direct children of kawasaki.jp are declared as public suffixes with the PSL entry *.kawasaki.jp, but city.kawasaki.jp is exempt (!city.kawasaki.jp).
- When a given domain name is matched by several entries, only the longest one is the public suffix: the public suffix of some-bucket.s3.amazonaws.com is s3.amazonaws.com (and not com, although both are on the list).
- In light of these parsing issues, applications need to transform the PSL text file into a suitable representation in which lookups are efficient (typically a tree structure; a good choice requires some expertise).
- Also, a storage solution is required. It’s easiest to simply store the PSL file in the file system, but with the above considerations in mind, it may not be the best solution.
- Developers have to come up with some sort of update mechanism. Do random updates at deployment time suffice?
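
To give a feel for the parsing work involved, here is a deliberately small Python sketch of the matching logic, following the algorithm published on publicsuffix.org (exception rules win and drop their leftmost label; otherwise the longest matching rule wins). The rule excerpt is hand-picked for illustration, the function names are my own, and a plain set stands in for the tree structure a real implementation would likely use.

```python
RULES_TEXT = """
// comment lines start with two slashes
com
s3.amazonaws.com
jp
*.kawasaki.jp
!city.kawasaki.jp
uk
co.uk
"""


def parse_rules(text):
    """Collect rules from PSL-formatted text, skipping blank and comment lines."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("//"):
            rules.add(line.split()[0])  # a rule ends at the first whitespace
    return rules


def public_suffix(name, rules):
    """Return the public suffix of `name` according to `rules`."""
    labels = name.lower().rstrip(".").split(".")
    # Exception rules take priority: the suffix is the rule minus its leftmost label.
    for i in range(len(labels)):
        if "!" + ".".join(labels[i:]) in rules:
            return ".".join(labels[i + 1:])
    # Otherwise try candidates from longest to shortest, so the longest match wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        wildcard = ".".join(["*"] + labels[i + 1:])
        if candidate in rules or wildcard in rules:
            return candidate
    return labels[-1]  # no rule matched: fall back to the implicit "*" rule


rules = parse_rules(RULES_TEXT)
print(public_suffix("some-bucket.s3.amazonaws.com", rules))  # s3.amazonaws.com
print(public_suffix("www.foo.kawasaki.jp", rules))           # foo.kawasaki.jp
print(public_suffix("amazon.co.uk", rules))                  # co.uk
```

Even this toy version has three distinct code paths, and it does not yet deal with internationalized names, storage, or updates.
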
PSL users have to worry about all of this. While the above problems have been solved several times in various implementations for different browsers, programming languages, etc., there is no one-stop solution so far.
That leads to the obvious question: Is there a better, more generic approach?
Solution: Mapping the PSL onto the DNS
The DNS, being a domain-based key-value store, is well suited to serve as a directory for domain-related lookups. Both keys and values are rather arbitrary: the constraints are only that the key must be a valid (sub-)domain name, and the value must fit one of the established DNS record types (e.g. A for IP addresses, TXT for strings, or PTR for mapping a key onto another domain name). In recent years, a few interesting DNS-based applications have evolved, such as storage of TLS public keys using TLSA records (DANE).
The suffixes listed in the PSL qualify as proper DNS names (keys), with the exception of wildcard exception rules (let’s worry about that later). It is thus possible to map the PSL onto the DNS structure itself, forming a tree structure, and then "mount the PSL" as a subtree somewhere in the DNS. We chose to use the domain publicsuffix.zone as our home, and use query.publicsuffix.zone as the PSL mount point.
To represent PSL information, we use PTR records and encode all public suffixes as the values of such records. Keys are set up such that one can simply take any domain name and query a PTR record for that name, with .query.publicsuffix.zone appended. The zone contains crafty CNAME redirects and nifty wildcard configurations, such that the DNS lookup eventually arrives at a PTR record which does indeed point to the public suffix of the domain of interest. To ensure authenticity, we use DNSSEC (as is the case for all deSEC-managed domains).
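
From the client's perspective, a lookup therefore takes very little code. The following Python sketch shows one possible way to do it. It assumes the third-party dnspython package (version 2 or later) for the actual query; the helper names are made up for illustration. Note that this sketch does not validate DNSSEC signatures itself; that is left to the resolver in use.

```python
MOUNT_POINT = "query.publicsuffix.zone"


def query_name(domain):
    """Build the fully qualified DNS name to query for `domain`."""
    return domain.rstrip(".") + "." + MOUNT_POINT + "."


def lookup_public_suffix(domain):
    """Send a PTR query and return the record value(s) as text."""
    import dns.resolver  # third-party: pip install dnspython

    answer = dns.resolver.resolve(query_name(domain), "PTR")
    return [rdata.target.to_text(omit_final_dot=True) for rdata in answer]


print(query_name("www.google.co.uk"))
# www.google.co.uk.query.publicsuffix.zone.
```

Calling lookup_public_suffix("www.google.co.uk") then performs the network query; the import sits inside the function so that the name-building part works without dnspython installed.
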
As an example, let’s figure out the public suffix of www.google.co.uk. Here’s what we have configured under query.publicsuffix.zone:
- There is a PTR record at co.uk.query.publicsuffix.zone with a value of co.uk. This is the record that the query reply will be expected to contain.
- We also have configured a CNAME redirect record for all domains under this suffix, pointing one level up in the DNS hierarchy: *.co.uk.query.publicsuffix.zone CNAME co.uk.query.publicsuffix.zone.
The lookup then proceeds as follows:
- We send a PTR query for www.google.co.uk.query.publicsuffix.zone.
- The DNS resolution process will encounter the CNAME record at *.co.uk.query.publicsuffix.zone and redirect the question to co.uk.query.publicsuffix.zone.
- The PTR record at co.uk.query.publicsuffix.zone will be returned, yielding the answer co.uk.
Voilà! 🎉
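
The resolution steps above can be replayed offline with a toy model of the zone. The Python sketch below is only a simulation for illustration: it holds the two records from the example in a dictionary and mimics, in simplified form, how a name server picks a wildcard (the wildcard directly below the closest existing ancestor of the query name) and follows CNAMEs. None of the names in it come from the real service.

```python
MOUNT = "query.publicsuffix.zone"

# Owner name -> (record type, value); only the two records from the example.
ZONE = {
    f"co.uk.{MOUNT}": ("PTR", "co.uk"),
    f"*.co.uk.{MOUNT}": ("CNAME", f"co.uk.{MOUNT}"),
}


def name_exists(name, zone):
    """A name exists if it owns a record or is an ancestor of a record owner."""
    return any(owner == name or owner.endswith("." + name) for owner in zone)


def find_record(qname, zone):
    """Exact match first, otherwise the wildcard below the closest existing ancestor."""
    if qname in zone:
        return zone[qname]
    labels = qname.split(".")
    for i in range(1, len(labels)):
        ancestor = ".".join(labels[i:])
        if name_exists(ancestor, zone):
            return zone.get("*." + ancestor)
    return None


def resolve_ptr(qname, zone, max_hops=8):
    """Follow CNAMEs until a PTR record is found; None if there is no answer."""
    for _ in range(max_hops):
        record = find_record(qname, zone)
        if record is None:
            return None
        rtype, value = record
        if rtype == "PTR":
            return value
        qname = value  # CNAME: restart the lookup at the target name
    return None


print(resolve_ptr(f"www.google.co.uk.{MOUNT}", ZONE))  # co.uk
```
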
PSL wildcard rules are accommodated effortlessly with this approach: we simply prepend the wildcard label *. to the value of the PTR record. Wildcard exceptions are taken care of by adding explicit records at the domain name representing the exception rule, cutting the corresponding DNS subtree out of the wildcard’s sphere of influence. Consider the following example:
- *.kawasaki.jp is a wildcard public suffix, and city.kawasaki.jp is an exception.
- As usual, we define a PTR record at *.kawasaki.jp.query.publicsuffix.zone for the wildcard public suffix, with value *.kawasaki.jp.
- To take care of the exception, we set an explicit PTR record at city.kawasaki.jp.query.publicsuffix.zone, overriding the wildcard rule with an explicit PTR value of jp.
- Finally, we configure a CNAME record at *.city.kawasaki.jp.query.publicsuffix.zone, pointing one level up.
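
Since a wildcard rule is returned with its *. label intact, a client that wants the concrete public suffix of the name it asked about has to substitute the matching label itself. Here is one plausible way to do that in Python. This is a sketch under the assumption that the answer is either a plain suffix or a rule with a single leading wildcard label; the function name is made up, and it is not meant to describe how any existing library does it.

```python
def concrete_suffix(name, ptr_value):
    """Turn a PTR answer such as '*.kawasaki.jp' into the public suffix of `name`."""
    if not ptr_value.startswith("*."):
        return ptr_value  # ordinary rule: the answer already is the suffix
    labels = name.lower().rstrip(".").split(".")
    rule_labels = ptr_value.split(".")
    n = len(rule_labels)  # number of labels in the rule, wildcard included
    # The name must be at least as long as the rule and share its non-wildcard part.
    if len(labels) < n or labels[-(n - 1):] != rule_labels[1:]:
        raise ValueError(f"{name!r} is not covered by {ptr_value!r}")
    return ".".join(labels[-n:])


print(concrete_suffix("www.foo.kawasaki.jp", "*.kawasaki.jp"))  # foo.kawasaki.jp
print(concrete_suffix("www.google.co.uk", "co.uk"))             # co.uk
```
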
Things Worth Considering
Generic and platform-independent. The DNS-based PSL lookup solution solves all of the problems mentioned above. Applications only need the ability to perform DNS queries (which should almost always be the case). Parsing of the PSL is not required on the application layer; no decisions on storage or internal representation need to be made. Also, PSL information in the DNS is always up to date: we propagate changes from the official list on a daily basis.
Library support. For convenience, there is a Python library (psl-dns) that comes with a handy interface to "simply answer your question", such as psl.is_public_suffix('city.kawasaki.jp') or psl.get_public_suffix('some-bucket.s3.amazonaws.com'). The library provides some extra convenience: For internationalized domain names, it will preserve the Unicode vs. Punycode format choice between question and answer. For those who can’t get enough, it also allows listing all PSL rules pertinent to a domain name using psl.get_rules() (there may be several override rules in certain wildcard configurations).
For information on library support for other languages, check out the documentation at publicsuffix.zone. In any case, the use of a library is by no means necessary: if you like to stick to the KISS principle or if your language is not supported, simple DNS lookups will get you there as well.
PTR record support. Some DNS resolvers (especially those run by consumer Internet access providers) do not support PTR queries. The problem can be worked around by using another resolver (such as 8.8.8.8), or by querying the ANY record type instead (which will return all records for the given name, including PTR).
When asking for ANY records, you may find that the response will sometimes contain additional TXT records. Those are purely informational and represent PSL rules that cover the domain name, but were overridden by some other rule (again, this can happen in case of wildcard exceptions). Unless you are interested in these no-op rules, you can ignore them completely.
Edge cases. Domain names are limited to 255 octets in total, with each label limited to 63 octets. Appending .query.publicsuffix.zone to the name of interest costs 3 labels (24 octets), so extremely long domain names cannot be looked up. This will not usually be an issue.
[Update 02/2022: The remainder of this paragraph no longer applies, as the PSL specification has been updated to allow wildcards only at the first label.] Furthermore, the PSL specification allows rules with inline wildcards, such as inline.*.wildcard.test. Such constructs cannot be mapped onto the DNS, as DNS requires wildcards to be in the leftmost position. However, the PSL does not currently contain any actual rules of this kind, and PSL maintainers are planning to drop inline wildcard support entirely. Consequently, this edge case is currently not a practical issue, and it most likely never will be.
Privacy. It is clear that when querying a DNS service, the involved DNS service provider(s) will learn about the names queried. It would thus be a bad idea if a browser vendor decided to query our PSL service for cookie policing, as this would expose important aspects of their users’ browsing activities to SSE and deSEC (the operators of publicsuffix.zone) as well as other parties (when using a resolver). In other contexts, such as during certificate issuance checks, the concern is lessened and a DNS query may be acceptable: the subject names of newly issued TLS certificates are publicly logged anyway, and a DNS query leaks the same information. Privacy concerns thus depend on the specific use case and must be evaluated on a case-by-case basis. In any case, we do not operate the service to collect any data (and in fact we do not keep query logs). We are open to providing full copies of the PSL zone to parties who would like to run a private on-site deployment of the service. Just get in touch!
Conclusion
Knowing the public suffix of a given domain name is an important piece of information in several widespread applications. However, keeping an up-to-date copy of the list and parsing it correctly is a challenging task from which many issues arise.
Looking up public suffixes by utilizing a DNS-based representation of the PSL solves these issues, at the same time demonstrating that there are still novel applications to the DNS. While it may be surprising to some, I would like to make the case that this is a great illustration of how DNS is exactly the right technology to tackle the issue of storing information that is closely related to domain names: decentralized, ubiquitous, and, with DNSSEC, authentic.
This service has been made possible through the continued contribution of SSE to the security-first, free, and open-source DNS hosting service deSEC. It was while working on deSEC that I first became aware of the need for ad-hoc PSL lookups. We are glad to have found a low-maintenance solution for our own needs that at the same time we can provide as a free, public service.