DNS Response Rate Limiting with Bind


My monitoring setup at my day job recently noticed one of the external DNS servers responding slower than usual. I happened to have a few spare cycles, so I took a look and found someone sending us a jillion or so DNS queries for "sgi.co.kr" (ok, that's one of our domains) and "." (eh? what?). These servers are correctly set up to NOT answer recursive queries, so the query for "." was rightly rejected, but we were sending answers for the other queries.

But was this a lame denial of service attempt on us or an almost as lame attempt to use us as an amplifier in a DDoS attack? I sent an email to the owner of the IPs I was seeing the most queries from (every IP in 4 /24 subnets) and confirmed that they were seeing tons of inbound DNS responses but no outbound DNS requests. In other words, the source addresses on those queries were spoofed and our answers were landing on the real victims. So, yup, it's a DDoS and we were the amplifier. (sigh)

So, at first glance there didn't seem to be much I could do about it short of blocking traffic from those subnets, breaking legit DNS lookups (if any). But then I thought, it's been ages since I read all the bind docs cover to cover (the last time was when bind 9 was so shiny nobody was upgrading to it yet - grin). So I searched the web for bind and "rate limit" and whaddya know, there's a new feature in bind 9.x for doing just that!

Here's the first place I looked. This link was agreeably short and to the point too. :-)

So... One quick compile later on a dev system, a one-line tweak to the options section of my named.conf and bingo-bango, bind is doing rate limits on responses! Nice. The one thing I'd note in addition to the redbarn page above is that setting the rate limits to 10 may be too high. The DDOS'ing bozo I was setting this up in response to was only hitting me at around 4 or 5 duplicate queries per second per client IP. So I pulled my rate limits clear down to 3, at least for now. I may bump that up a bit in time.

I also added a line to our snmpd.conf file:
logmatch rate-limits /data/log/named.log 300 : rate limit

This way, I can graph (in cacti) the number of times bind mentions that it's rate limiting some IP/network. And now when I glance through the DNS related graphs, if I see a big spike in that graph I know there's something "interesting" I might want to investigate.
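If you want to sanity-check what that logmatch counter should be reporting, here's a rough Python sketch that counts matching lines the same way. The sample log lines and the function name are made up for illustration; they are not copied from a real named.log, so check them against what your own bind actually writes.

```python
import re


def count_rate_limit_lines(lines, pattern=r"rate limit"):
    """Count the lines that match the regex, like the logmatch entry does."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


# Hypothetical sample lines, for illustration only.
sample_log = [
    "01-Jan-2013 10:00:01 rate limit drop response to 198.51.100.0/24",
    "01-Jan-2013 10:00:01 client 192.0.2.10#5353: query: example.com IN A",
    "01-Jan-2013 10:00:02 rate limit drop response to 203.0.113.0/24",
]

rate_limit_hits = count_rate_limit_lines(sample_log)
print(rate_limit_hits)
```

To run it against the real thing, pass an open file instead of the list, e.g. `count_rate_limit_lines(open("/data/log/named.log"))`.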


UPDATE:
I also recommend that you exempt some of your own IP space from the rate limits. I had some folks using our guest net whose winblows systems were doing a jillion repeated queries for the usual MS'ish _some_microsoft_junk.guest.my.domain type of hostnames. You know how MS clients are. Always trying to do their own dynamic DNS updates and querying for a pile of stuff starting with underscores. (rolls eyes)

acl my-ip-space {
    1.2.3.4; 1.2.3.5; 1.2.3.6;
};

options {
    [...]
    rate-limit {
        responses-per-second 5;
        exempt-clients { my-ip-space; };
    };
};


You might not want to include your entire IP space but only the IPs in your networks that might be sending queries to your non-recursive servers. For instance, though I threw all of our IP space in the acl as a temporary band-aid, I will likely end up exempting only our recursive servers' IPs and the IPs of our monitoring systems. I gotta do an audit of several days' worth of query logs just to be sure I know which inside IPs are querying our authoritative servers.
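For that audit, something like the Python sketch below would probably do. It tallies queries per client IP, keeping only clients that fall inside networks you name. Big assumptions here: that each query log line contains "client <ip>#<port>" (newer bind versions may stick an "@0x..." token before the IP, which the regex tolerates), and that the sample lines and networks, which are made up from documentation address ranges, look roughly like yours. The function name is mine too.

```python
import ipaddress
import re
from collections import Counter

# Assumed log shape: "... client [@0xADDR ]<ip>#<port> ..."
CLIENT_RE = re.compile(r"client (?:@0x[0-9a-fA-F]+ )?([0-9a-fA-F:.]+)#\d+")


def internal_clients(lines, networks):
    """Count queries per client IP, keeping only IPs inside `networks`."""
    nets = [ipaddress.ip_network(n) for n in networks]
    counts = Counter()
    for line in lines:
        match = CLIENT_RE.search(line)
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # matched something that is not a valid IP
        if any(addr in net for net in nets):
            counts[str(addr)] += 1
    return counts


# Hypothetical query log lines, for illustration only.
sample_queries = [
    "client 192.0.2.10#5353 (example.com): query: example.com IN A + (192.0.2.1)",
    "client 192.0.2.10#5354 (example.com): query: example.com IN MX + (192.0.2.1)",
    "client 192.0.2.53#40000 (example.com): query: example.com IN A + (192.0.2.1)",
    "client 198.51.100.7#1234 (example.com): query: example.com IN A + (192.0.2.1)",
]

inside = internal_clients(sample_queries, ["192.0.2.0/24"])
for ip, hits in inside.most_common():
    print(ip, hits)
```

Whatever IPs come out on top are the candidates for the exempt-clients acl, once you've confirmed they really are your recursive servers or monitoring boxes.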

Also, one of the ISC handlers reminded me of this link too. We really should be running DNSSEC. That'll have to be next on my DNS to-do list.