With DNSSEC’s Red Letter Day, July 1, approaching, it’d be easy to neglect another DNS milestone, passed on May 5. For the first time, the root zone contains delegation to non-ASCII domain names. Gone are the days of just A to Z and 0 to 9, with dash added for spice. Today, if you look closely at a copy of the root zone, you’ll find delegation to XN--WGBH1C, XN--MGBERP4A5D4AR and XN--MGBAAM7A8H.
Wait a minute: those are ASCII domain names too, albeit cryptic ones, aren’t they?
Yes, but they’re specially encoded domain names. Using a technique called IDNA, for Internationalized Domain Names in Applications, software can now encode characters from the whole world’s scripts into ASCII. The characters are taken from Unicode, a standard that encodes characters from 90 of the world’s scripts, totaling more than 107,000 characters, from Arabic to Yi. (Yi? Yeah, I’d never heard of it either.) The resulting domain names look weird (for example, they all start with “XN--,” as you can see above), but IDN software has no trouble decoding them into appropriate-looking characters.
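If you’d like to see the encoding at work, here’s a minimal sketch in Python using the standard library’s built-in "idna" codec. Bear in mind that this codec implements the original IDNA specification (RFC 3490), and that the helper names to_unicode and to_ascii are mine, chosen just for illustration. It decodes the three root zone labels into their native form and then encodes them back again.

```python
# The three IDN labels from the root zone, lowercased
# (DNS names are case-insensitive).
ROOT_IDN_LABELS = ["xn--wgbh1c", "xn--mgberp4a5d4ar", "xn--mgbaam7a8h"]


def to_unicode(ace_label: str) -> str:
    """Decode an ASCII-encoded ("xn--") label into its Unicode form."""
    return ace_label.encode("ascii").decode("idna")


def to_ascii(unicode_label: str) -> str:
    """Encode a Unicode label into its ASCII ("xn--") form."""
    return unicode_label.encode("idna").decode("ascii")


if __name__ == "__main__":
    for ace in ROOT_IDN_LABELS:
        native = to_unicode(ace)
        # Print the encoded label, its native form, and the round trip.
        print(ace, "->", native, "->", to_ascii(native))
```

Each label should survive the round trip unchanged, which is the whole point: the ASCII form and the native form are two spellings of the same name.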
This enables native speakers and writers of languages such as Hebrew and Chinese to type domain names in their far-from-ASCII scripts into IDN-capable web browsers and other software. The web browser encodes each domain name in ASCII and passes the encoded name to a stub resolver, which queries a recursive name server for the odd-looking domain name just as it would for any other. Now that the root zone contains delegation to these three IDN top-level domains, a few domain names can even consist entirely of non-ASCII characters. Ignorance of ASCII (or the Roman alphabet) is no longer an excuse for not using the web!
Perhaps the best news of all is that we don’t need to upgrade any DNS infrastructure to make this work. Since the encoded domain names are still ASCII, standard resolvers and name servers can handle them. It’s up to applications to encode and decode the funny domain names. And most modern web browsers already support IDNA (though a much smaller number of other applications do).
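To make that division of labor concrete, here’s a rough sketch of the application’s side of the bargain, again using Python’s built-in "idna" codec (which follows the original IDNA specification, RFC 3490). The function names to_query_name and is_plain_ascii_hostname are hypothetical, and a real browser does considerably more checking than this. The idea is simply that whatever the user types, the name handed to the resolver contains nothing but letters, digits, hyphens and dots.

```python
import string

# Characters a traditional hostname may contain, plus the label separator.
ALLOWED = set(string.ascii_letters + string.digits + "-.")


def to_query_name(hostname: str) -> str:
    """Convert a user-typed hostname into the ASCII form sent to DNS.

    Labels that are already ASCII pass through untouched; labels with
    non-ASCII characters are converted to their "xn--" form.
    """
    return hostname.encode("idna").decode("ascii")


def is_plain_ascii_hostname(name: str) -> bool:
    """Return True if the name uses only letters, digits, hyphens and dots."""
    return all(ch in ALLOWED for ch in name)


if __name__ == "__main__":
    for typed in ["example.com", "bücher.example"]:
        query = to_query_name(typed)
        print(typed, "->", query, is_plain_ascii_hostname(query))
```

The resolver and name servers only ever see the output of to_query_name, which is why none of them needed an upgrade.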
But what do the first three IDNs represent? Turns out all three are Arabic, and represent, respectively, what we call Egypt, Saudi Arabia and the United Arab Emirates. (The traditional ASCII TLDs for the countries are .eg, .sa and .ae.)
We’re likely to see more of these soon: ICANN says it has received 21 applications for new IDNA country-code top-level domains in 11 languages, with ten more awaiting only the country’s request to delegate the new ccTLD.
Boy, between DNSSEC and IDNA, this is the most excitement this neighborhood has seen in years!