Good hygiene for Securing DNS (Part 2)

Picking up where we left off in part 1, we are going to cover some of the more advanced DNS security topics. If you haven’t read part 1 yet, please do.

As with part 1, the info is pretty generic, but the screenshots are from the Infoblox Grid Manager GUI.

Let’s take a look at the authoritative side first.

A lot of organizations have what is called split view (or split horizon) DNS. This just means that the same Domain Name exists inside the company as outside the company but has a different dataset. If you do use split view DNS, then you will want to make sure that only those IPs that should get a specific view of data actually get the data. One reason this is important is that if you use a secondary service, you don’t want your external answers being used internally and causing breakage (think internal only apps or sites). Another is internal answers leaking out to the internet and giving a potential attacker a leg up on reconnaissance. I can’t tell you how many times I’ve seen misconfigurations of the type above cause havoc and inconsistent results. Use match destination in addition to match client to help mitigate the detrimental effects.
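To make the match client plus match destination idea concrete, here is a minimal sketch of how a server might pick a view. The view names, networks, and listener addresses are made up for illustration, and this is not how any particular product implements it.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class View:
    name: str
    match_clients: list        # networks the query may come from
    match_destinations: list   # server addresses the query may arrive on


# Hypothetical views, evaluated in order; first match wins.
VIEWS = [
    View("internal", [ip_network("10.0.0.0/8")], [ip_network("10.0.0.53/32")]),
    View("external", [ip_network("0.0.0.0/0")], [ip_network("203.0.113.53/32")]),
]


def select_view(client, destination):
    """Return the first view whose client AND destination lists both match."""
    src, dst = ip_address(client), ip_address(destination)
    for view in VIEWS:
        client_ok = any(src in net for net in view.match_clients)
        dest_ok = any(dst in net for net in view.match_destinations)
        if client_ok and dest_ok:
            return view.name
    return None  # no view matched, so no data is served
```

With both checks in place, an outside address that somehow reaches the internal listener matches no view at all instead of falling through to internal data.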

DNSSEC Signing your zones

If you decide to be a good netizen and sign your zones, then you have some decisions to make. The first is NSEC or NSEC3. NSEC was the first of the two specs but had the unfortunate side effect of allowing anyone to enumerate all the data in the zone. This effectively amounts to a zone transfer of the data, so NSEC3 was created to resolve this issue. Some organizations dislike the idea of their zone data being enumerated, while others consider hiding it to be security through obscurity. Personally, I think it is security through obscurity, but I’d rather make it as difficult on an attacker as I can (maybe that is just me). There are performance implications between NSEC and NSEC3, but if you stick with the Infoblox defaults the difference in burden between the two should be negligible. In general, you want to set the bar high enough that the hashed names aren’t easily reversed (or, more accurately, a collision found), but not so high that answers take too long to verify. The Infoblox defaults are a salt length between 1 and 15 octets and 10 hash iterations.
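To show where the salt and iteration count actually land, here is a small sketch of the NSEC3 owner-name hash as I understand it from RFC 5155: SHA-1 over the wire-format name plus salt, re-hashed once per iteration, then encoded as base32hex. The function names are my own, and the sketch assumes plain ASCII names.

```python
import base64
import hashlib

# Map the standard base32 alphabet onto the "extended hex" alphabet NSEC3 uses.
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def name_to_wire(name):
    """Lowercased DNS wire format: length-prefixed labels plus the root label."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return wire + b"\x00"


def nsec3_hash(name, salt_hex, iterations):
    """Hash a name the way NSEC3 does; 'iterations' counts the extra rounds."""
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(name_to_wire(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    encoded = base64.b32encode(digest).translate(_B32_TO_B32HEX)
    return encoded.decode("ascii").lower()
```

Every extra iteration is one more SHA-1 round for the signer, for each validating resolver, and for anyone trying to guess names offline, which is why the count is a trade-off rather than a case of bigger is better.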

Now let’s talk about the Key Signing Key (KSK) and Zone Signing Key (ZSK). Current NIST guidance on key lifetime is to roll the KSK every 1-2 years and the ZSK every 1-3 months. Additional guidance from NIST on the use of RSA keys in general is that the key size should be 2048 bits or more and that SHA-1 should not be used for new hashing. This leaves us with RSA/SHA-256 with a key size of 2048 bits or better for both KSK and ZSK. The rest of the defaults are fine, unless you upgraded from a NIOS release prior to 6.11. Prior to 6.11, pre-publish was not an option. In that case, make sure to switch your ZSK rollover method to “Pre-publish”. This will save you some reply size, and some additional headache if you use an HSM to sign your zone.
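If you want to audit existing keys against that guidance, something like the sketch below would do. The thresholds are simply the upper ends of the ranges quoted above, and the function name and algorithm strings are my own illustration, not anything NIOS exposes.

```python
from datetime import date, timedelta

# Upper ends of the lifetimes quoted above; adjust to your own policy.
MAX_LIFETIME = {"KSK": timedelta(days=2 * 365), "ZSK": timedelta(days=90)}
MIN_RSA_BITS = 2048


def key_findings(role, algorithm, bits, created, today):
    """Return a list of human-readable policy problems for one key."""
    findings = []
    normalized = algorithm.upper().replace("-", "")
    if "SHA1" in normalized:
        findings.append("uses SHA-1")
    if normalized.startswith("RSA") and bits < MIN_RSA_BITS:
        findings.append(f"RSA key is {bits} bits, below {MIN_RSA_BITS}")
    if today - created > MAX_LIFETIME[role]:
        findings.append(f"{role} is past its rollover date")
    return findings
```

An empty list means the key passes all three checks; anything else is a to-do item for the next maintenance window.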

On Infoblox, enabling DNSSEC signing of the zone data will also increase your object count because of the additional records added. So, talk to your SE to make sure you have the right-sized solution.

Next, we will cover Response Rate Limiting (RRL).

This feature is intended to be used on authoritative servers. We will cover the recursive counterparts a little later on in the article. RRL does exactly what the name says: it limits the responses sent, but only over UDP, and it does so by putting clients into buckets based on network masks. Using this feature will keep you from being part of a DDoS without dropping real queries in a significant way. For IPv4 it uses a /24 bucket, and for IPv6 it uses a /56 bucket, for the credit/debit accounting of responses per second. The exact limit you want to set will be highly dependent on your configuration and what your normal traffic patterns are, but the Infoblox default is 100. Once a given bucket reaches the limit, the “SLIP” comes into play. The slip indicates which of the over-the-limit queries are to be answered with a truncated message, which tells a real client to retry the query over TCP. The default value on Infoblox is 2, meaning that every other over-the-limit query gets the TC message telling the client to retry over TCP.
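Here is a rough sketch of the bucket and slip mechanics. It is deliberately simplified: it uses a fixed one-second counter per client bucket, whereas real implementations keep a rolling credit balance and also account per response. The class and method names are mine.

```python
import ipaddress
from collections import defaultdict


class ResponseRateLimiter:
    """Toy RRL: per-bucket counter per second, with slip for over-limit queries."""

    def __init__(self, responses_per_second=100, slip=2):
        self.limit = responses_per_second
        self.slip = slip
        self._window = {}              # bucket -> (second, responses counted)
        self._over = defaultdict(int)  # bucket -> over-limit queries this second

    @staticmethod
    def bucket(client_ip):
        """Group clients by /24 for IPv4 and /56 for IPv6."""
        addr = ipaddress.ip_address(client_ip)
        prefix = 24 if addr.version == 4 else 56
        return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

    def decide(self, client_ip, now):
        """Return 'answer', 'truncate' (TC bit, retry over TCP) or 'drop'."""
        key = self.bucket(client_ip)
        second = int(now)
        start, count = self._window.get(key, (second, 0))
        if start != second:            # new second: reset the bucket
            start, count = second, 0
            self._over[key] = 0
        count += 1
        self._window[key] = (start, count)
        if count <= self.limit:
            return "answer"
        self._over[key] += 1
        if self.slip and self._over[key] % self.slip == 0:
            return "truncate"
        return "drop"
```

With a slip of 2, every second over-limit query gets the truncated reply and the rest are dropped; with a slip of 1, every over-limit query gets it.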

There is some disagreement over what the value should be, and it really depends on your environment and goals. Some default configurations go with 1 while others use 2. The reason for going with 1 rather than 2 is to slow down the Kaminsky attack. Using a value of 2 brings the time to execute a successful Kaminsky-style attack down significantly. Fortunately, we have a different solution for that called DNSSEC (see above; we will ignore the adoption rates for the purposes of this post). Using a slip value of 1 means that there will be more TCP conversations for DNS, which are much more expensive in terms of resources used on the authoritative name server. So, if your customers are doing validation (like Google’s public DNS and Comcast), then your better option for the slip value is 2. If, on the other hand, your customers are not likely to have DNSSEC validation enabled, then you may want to set your slip value to 1. Unfortunately, there isn’t an Infoblox GUI widget that you can use for this. Instead, you will need to break out the CLI (show dns_rrl and set dns_rrl).

If you are self-hosting zones you may benefit from the PT Platform which has a smart NIC that can further protect the host appliance from external attack using a ruleset that Infoblox produces and publishes.

Now we will take a look at the recursive side.

The first thing we want is to make sure we don’t make another DNS admin’s life miserable. There are a few ways we can handle this. The first way, which we covered in part 1, is to use a third-party resolver service for recursion. If you aren’t going to use a third-party resolver service, you will definitely need to consider the recursive counterparts to RRL. Even if you are going to use a third-party recursion service, you will want to consider these measures.

Recursive hold-down and limits

The first item we need to consider is the hold-down for non-responsive servers. This will actually protect your recursive server from bad upstream servers; bad here could mean intentionally malicious, under attack, or unintentionally misconfigured. After that, we want to set some limits on recursion per server and then per zone. This is so we don’t put unnecessary stress on the upstream name servers. If you are using a third-party recursion service, then you will want to tweak these numbers or uncheck the box. Specifically, you will probably want to bump the hold-down timeouts up some and lower the duration to 30. You will also want to uncheck the “Limit recursive queries per server” box if you are using a third party for recursion. If you are not using a third-party recursive service, then the Infoblox defaults as shown below should be pretty good for all but the very largest ISP installations.
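For anyone who prefers to see the hold-down idea as logic rather than checkboxes, here is a minimal sketch. The threshold and duration values are placeholders I picked for illustration, not the Infoblox defaults, and the class is hypothetical.

```python
class HoldDown:
    """Stop querying an upstream server for a while after repeated timeouts."""

    def __init__(self, max_timeouts=5, duration=60.0):
        self.max_timeouts = max_timeouts  # consecutive timeouts before hold-down
        self.duration = duration          # seconds the server stays held down
        self._timeouts = {}
        self._held_until = {}

    def is_held(self, server, now):
        """True while the server should be skipped."""
        return now < self._held_until.get(server, 0.0)

    def record_timeout(self, server, now):
        count = self._timeouts.get(server, 0) + 1
        if count >= self.max_timeouts:
            self._held_until[server] = now + self.duration
            count = 0
        self._timeouts[server] = count

    def record_response(self, server):
        """Any answer resets the consecutive-timeout counter."""
        self._timeouts[server] = 0
```

Raising the timeout threshold and shortening the duration, as suggested for third-party recursion, makes the resolver slower to give up on the forwarder and quicker to try it again.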

Now that we’ve taken care of not making other DNS admins’ lives harder than necessary, let’s talk about Response Policy Zones (RPZ). RPZ was created to solve the problem of malicious domains. It is a zone just like any other zone if you were to look at the contents. We only know it is an RPZ because we tell the name server that it is to be used for policy decisions. Those policy decisions are encoded using the existing DNS language. Before RPZ, in order to block a C&C or other malicious domain, you would have to create the zone on your infrastructure, which could be problematic and potentially expensive in terms of resources. With RPZ there is no need to do that, and you can just return a substitution (or any of the other policy actions, like NXDOMAIN) for a single host in the domain in question. There are a few RPZ feed services out there, including several Infoblox feed options, that will help keep you safe from current threats. Typically, you want your RPZs as close to the clients as possible. This is so you can identify the specific client that has the issue right away without chasing down logs from other systems. The RPZ “hits” are logged, if configured to do so, in the standard CEF format that your security people will be able to ingest using their tools. There is one more thing about RPZ, but we will talk about that in the DNSSEC Validation section.
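To illustrate what "encoded using the existing DNS language" means, here is a sketch of how the common QNAME-trigger policies can be read back out of ordinary records. The CNAME targets are the conventional RPZ encodings; the sample policy and function names are made up for the example.

```python
# Conventional RPZ CNAME targets and the policy action each one stands for.
SPECIAL_TARGETS = {
    ".": "NXDOMAIN",
    "*.": "NODATA",
    "rpz-passthru.": "PASSTHRU",
    "rpz-drop.": "DROP",
    "rpz-tcp-only.": "TCP-ONLY",
}

# Hypothetical policy zone contents: trigger name -> (record type, record data).
SAMPLE_POLICY = {
    "bad.example.net": ("CNAME", "."),
    "*.bad.example.net": ("CNAME", "."),
    "ads.example.org": ("A", "192.0.2.1"),
    "ok.example.org": ("CNAME", "rpz-passthru."),
}


def rpz_action(rtype, rdata):
    """Translate one RPZ record into the policy action it encodes."""
    if rtype.upper() == "CNAME" and rdata.lower() in SPECIAL_TARGETS:
        return SPECIAL_TARGETS[rdata.lower()]
    return "SUBSTITUTE"  # any other record is returned as local data


def lookup(qname, policy):
    """Exact match first, then wildcards from the closest parent outward."""
    name = qname.lower().rstrip(".")
    if name in policy:
        return rpz_action(*policy[name])
    labels = name.split(".")
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in policy:
            return rpz_action(*policy[wildcard])
    return None  # no policy applies; resolve normally
```

Note that blocking a domain and everything under it takes two entries, the name itself and the wildcard, because the wildcard alone does not match the parent name.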

DNSSEC Validation

Validation takes advantage of the zones that have been signed and “chained” to their parent zones, which at some point are chained to the root zone. Thankfully it is easier to configure than to explain how it works. There are these things called trust anchors (sometimes called TAs). As it turns out, a TA is just the RRData from a DNSKEY Resource Record (RR) that has the “Secure Entry Point” (SEP) bit set (that key is called the Key Signing Key or KSK). Currently, the root zone has 2 KSKs published. These are the 2010 key, which is what originally signed the root zone, and the 2017 key, which was scheduled to go into production back in October of 2017 but was put off due to concerns from ICANN and the community. Once you have installed these keys into your recursive resolver configuration, your server will validate answers for zones that are signed and chained to their parent all the way up to the root zone. If any part of the chain is missing, your server will not validate that zone even though it is signed. That is, unless you add a specific TA that would allow validation of that zone. That is not a common configuration; more on that in a bit.

And now to explain how the chain of trust works*.

*TLDR: Complicated, but it works kind of like the certificate authority model, except there is only one root. And the parent doesn’t need to give you the signed version because it is in DNS. The chain is established by using a special record called a Delegation Signer (DS). This record has the label of the delegation just like an NS record. But it is different because it only exists in the parent zone and not the delegated (or child) zone. The part that makes all this work is that the RRData of the DS RR is the fingerprint of the KSK of the delegated zone. As with most any other RRSet, it gets an RRSIG (Resource Record Signature) from the ZSK for the zone. The DNSKEY RRSet can be a special case, by which I mean that there are several methods that can be used in the signing operation. One method signs the DNSKEY RRSet with the zone KSK only; the other method (which Infoblox uses) is to sign the DNSKEY RRSet with both the KSK and the ZSK. Further to that, each of the above could use pre-publish or double signature. Using the DNSKEY RRSet and the configured TAs, we should be able to follow the KSK signature to the ZSK DNSKEY(s), which would sign a DS RRSet that leads us down to another KSK, which signs as described above.
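The "fingerprint" part is easier to see in code. The sketch below builds DNSKEY RRData, checks the SEP bit, and computes the key tag and a SHA-256 DS digest following my reading of RFC 4034 and RFC 4509. The helper names are mine, and it does no signature checking at all.

```python
import hashlib
import struct


def name_to_wire(name):
    """Lowercased DNS wire format: length-prefixed labels plus the root label."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return wire + b"\x00"


def dnskey_rdata(flags, protocol, algorithm, public_key):
    """DNSKEY RRData: 16-bit flags, 8-bit protocol, 8-bit algorithm, then the key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def is_sep(flags):
    """The Secure Entry Point bit is the lowest flag bit (257 = KSK, 256 = ZSK)."""
    return bool(flags & 0x0001)


def key_tag(rdata):
    """Checksum-style key tag carried in DS and RRSIG records."""
    total = 0
    for i, byte in enumerate(rdata):
        total += byte << 8 if i % 2 == 0 else byte
    total += (total >> 16) & 0xFFFF
    return total & 0xFFFF


def ds_digest_sha256(owner, rdata):
    """DS digest type 2: SHA-256 over the owner name followed by the DNSKEY RRData."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
```

A validator does the same calculation on the child's KSK and compares the result with the DS record it fetched, and verified, from the parent. If the two match, the chain continues down one more level.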

If you have split view DNS, then you will also probably want to install a negative trust anchor for those zones that are signed externally. Alternatively, sign them and install a TA before enabling validation, though this is not common in my experience. If you don’t do one or the other, then your internal zones will show as bogus and your domain will disappear internally, which would not be good.
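Conceptually, a negative trust anchor is just a list of names at or below which the resolver skips validation. A minimal sketch of that check, with a hypothetical function name and example zones:

```python
def covered_by_nta(qname, negative_trust_anchors):
    """True if qname is at or below any configured negative trust anchor."""
    labels = qname.lower().rstrip(".").split(".")
    # The name itself plus every parent domain, e.g. a.b.c -> {a.b.c, b.c, c}
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return any(nta.lower().rstrip(".") in candidates for nta in negative_trust_anchors)
```

Matching on whole labels matters here: an anchor for corp.example.com should cover app.corp.example.com but not notcorp.example.com.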

Remember how I said there was another consideration for RPZ? RPZ is a way of intentionally lying in DNS. This is diametrically opposed to why DNSSEC exists, so if we want to be able to tell some lies to clients downstream for a “good reason” then we need to tell our recursive resolver to apply the RPZ anyway.

Below is a screenshot of where you configure the trust anchors, negative trust anchors, and policy overrides for DNSSEC in Infoblox at the grid level.

Thanks for your time, and I hope you find it useful.

Donald Rudder

Manager of TAM Accounts at Infoblox