At one time, when my job actually entailed managing nameservers and namespace, I was pretty good at troubleshooting DNS problems. Often just a few queries with dig would tell me what was wrong, especially if the issue was a common one. Lame delegation? Easy. Forgotten trailing dot? Piece of cake.
Last week, though, one of our excellent support engineers here at Infoblox asked for help with a DNSSEC validation failure. A customer was trying to look up the A records for atmos.pds.nasa.gov and was getting a SERVFAIL error in reply. I could reproduce the error, which is sometimes half the battle, but couldn’t immediately determine its cause. Here’s what appeared in my name server’s syslog output:
Feb 11 21:33:42 bigmo named[67012]: lame-servers: info: error (no valid RRSIG) resolving 'atmos.pds.nasa.gov/DS/IN': 198.116.4.181#53
Feb 11 21:33:42 bigmo named[67012]: lame-servers: info: error (no valid RRSIG) resolving 'atmos.pds.nasa.gov/DS/IN': 198.116.4.189#53
Feb 11 21:33:42 bigmo named[67012]: lame-servers: info: error (no valid RRSIG) resolving 'atmos.pds.nasa.gov/DS/IN': 198.116.4.185#53
Feb 11 21:33:42 bigmo named[67012]: lame-servers: info: error (no valid DS) resolving 'atmos.pds.nasa.gov/A/IN': 198.116.4.181#53
Feb 11 21:33:42 bigmo named[67012]: dnssec: info: validating @0x8373000: atmos.pds.nasa.gov A: bad cache hit (atmos.pds.nasa.gov/DS)
Feb 11 21:33:42 bigmo named[67012]: lame-servers: info: error (broken trust chain) res
Two excellent online tools for checking DNSSEC setup, DNSViz and the VeriSign Labs DNSSEC Debugger, didn’t uncover any obvious problems with the chain of trust. But my discrete queries showed some interesting results:
- The authoritative name servers for nasa.gov and pds.nasa.gov were the same.
- nasa.gov was signed.
- A query for ANY records for pds.nasa.gov returned an SOA record, NS records, and an A record, but no DNSSEC records. So pds.nasa.gov existed, but didn’t appear to be signed. That’s okay, of course. In that case, the parent zone indicates that the child zone isn’t signed by omitting a DS record for the child zone.
- An explicit query for a DNSKEY record for pds.nasa.gov returned NOERROR, indicating that pds.nasa.gov existed (of course), but that it owned no DNSKEY records. This confirmed that pds.nasa.gov wasn’t signed.
- An explicit query for DS records for pds.nasa.gov returned NXDOMAIN, meaning pds.nasa.gov didn’t exist! That really threw me. Of course pds.nasa.gov existed. We had just seen that it did in the results from the previous queries!
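The discrete queries above can be reproduced with dig, but a small script makes the comparison repeatable. What follows is only a minimal sketch using the Python standard library, not the tooling I actually used: the function names (encode_name, build_query, parse_rcode, query_rcode) are hypothetical, and the query is deliberately non-recursive so that the answer comes from the authoritative server itself.

```python
import socket
import struct

# Record type and response codes from the IANA DNS parameters registry.
QTYPES = {"A": 1, "NS": 2, "SOA": 6, "DS": 43, "DNSKEY": 48, "ANY": 255}
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype, query_id=0x1234, dnssec_ok=False):
    """Build a non-recursive query, optionally with an EDNS0 OPT record that sets DO."""
    arcount = 1 if dnssec_ok else 0
    # Header: ID, flags (RD clear), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", QTYPES[qtype], 1)  # class IN
    opt = b""
    if dnssec_ok:
        # OPT pseudo-record: root name, type 41, UDP payload size 4096,
        # extended rcode 0, EDNS version 0, flags with DO (0x8000), empty RDATA.
        opt = b"\x00" + struct.pack("!HHBBHH", 41, 4096, 0, 0, 0x8000, 0)
    return header + question + opt


def parse_rcode(response):
    """Return the response code name from the low four bits of the flags field."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, str(code))


def query_rcode(server, name, qtype, dnssec_ok=False, timeout=3.0):
    """Send one UDP query to the given server address and return its rcode."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype, dnssec_ok=dnssec_ok), (server, 53))
        response, _ = sock.recvfrom(4096)
    return parse_rcode(response)
```

Calling query_rcode once each for "ANY", "DNSKEY" and "DS" against the same authoritative server address, with "pds.nasa.gov" as the name, would put the three response codes side by side, which is exactly the comparison that exposed the oddity here.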
Luckily, the smart folks on the DNS Operations mailing list (dns-operations@lists.dns-oarc.net) figured out what was going on. The parent zone, nasa.gov, was missing the delegation for pds.nasa.gov. When I queried the name server for ANY records, the name server responded in its role as an authoritative pds.nasa.gov name server, giving me the SOA, NS and A records it found. Ditto when I asked for DNSKEY records. But when I asked the same name server for pds.nasa.gov’s DS records (which could only be stored in the nasa.gov zone), the name server responded in its role as an authoritative name server for nasa.gov, saying that there was no such domain name as pds.nasa.gov (because there was no delegation in that zone). If I’d thought to set the DNSSEC OK bit in the query, I also would have seen the NSEC records that proved that the domain name didn’t exist. In that context, anyway.
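The reasoning in that explanation can be written down as a small rule of thumb. This is a hedged sketch rather than a general DNSSEC diagnostic: the function name diagnose and its messages are my own invention, and the rule only makes sense when a single name server is authoritative for both the parent and the child zone, as was the case here.

```python
def diagnose(rcodes):
    """Interpret response codes gathered from one server that is authoritative
    for both the parent and the child zone.

    rcodes maps a query type ("ANY", "DNSKEY", "DS") to the rcode name that
    the server returned for the child zone's domain name.
    """
    # ANY and DNSKEY are answered from the child zone's data.
    child_exists = "NOERROR" in (rcodes.get("ANY"), rcodes.get("DNSKEY"))
    # DS is answered from the parent zone's data.
    ds = rcodes.get("DS")

    if child_exists and ds == "NXDOMAIN":
        return "parent denies the name exists: the delegation is probably missing"
    if child_exists and ds == "NOERROR":
        return "parent knows the name: look at the DS records or the proof that none exist"
    return "inconclusive: gather more queries"


# The pattern from this incident.
print(diagnose({"ANY": "NOERROR", "DNSKEY": "NOERROR", "DS": "NXDOMAIN"}))
```

The point of the split is that the same server answers from two different zones depending on the record type, so a disagreement between the DS answer and the others points at the parent zone's contents rather than at the child's.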
The administrator of nasa.gov fixed this just as soon as it was reported to her, and now everybody’s happy.
Except me. I’m left feeling uneasy. First, I’m facing the stark realization that my troubleshooting skills have atrophied alarmingly, and that I need to review my DNSSEC theory. But I’m also beginning to see how tricky troubleshooting DNSSEC validation issues can be. Of course, this particular problem likely won’t occur often (it’s a corner case), but there will be plenty of others. How often will folks who encounter these problems throw up their hands in frustration and tear out their DNSSEC configurations? Who will they turn to when they need help?
For my part, I’m going to start following the DNS Operations mailing list more closely and put more time in at the command line. And I’d suggest that those of you starting out with DNSSEC subscribe to the list, too.