DNS Response Rate Limiting with Bind
My monitoring setup at my day job recently noticed one of the external
DNS servers responding slower than usual. I happened to have a few spare
cycles, so I took a look and found someone sending us a jillion or so DNS
queries for "sgi.co.kr" (ok, that's one of our domains) and "." (eh? what?).
These servers are correctly set up to NOT answer recursive queries, so the
query for "." was rightly rejected, but we were sending answers for the
other queries.
But was this a lame denial of service attempt on us or an almost as
lame attempt to use us as an amplifier in a DDOS attack? I sent an
email to the owner of the IPs I was seeing the most queries for (every
IP in 4 /24 subnets) and confirmed that they were seeing tons of inbound
DNS responses but no outbound DNS requests. So, yup, the source addresses
were spoofed and we were being used as an amplifier in a DDOS. (sigh)
So, at first glance there didn't seem to be much I could do about it short
of blocking traffic from those subnets, breaking legit DNS lookups (if any).
But then I thought, it's been ages since I read all the bind docs cover
to cover (the last time was when bind 9 was so shiny nobody was upgrading
to it yet - grin). So I searched the web for bind and "rate limit" and
whaddya know, there's a new feature in bind 9.x for doing just that!
Here's the first place I looked.
This link was agreeably short and to the point too. :-)
So... One quick compile later on a dev system, a one-line tweak to the
options section of my named.conf and bingo-bango, bind is doing rate limits
on responses! Nice. The one thing I'd note in addition to the redbarn
page above is that setting the rate limits to 10 may be too high. The
DDOS'ing bozo I was setting this up in response to was only hitting me at
around 4 or 5 duplicate queries per second per client IP. So I pulled my
rate limits clear down to 3, at least for now. I may bump that up a bit
in time.
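If you'd rather pick that number from data than from eyeballing the logs, here's a rough sketch of how I'd measure it. It assumes query logging is turned on and that each log line looks roughly like "14-Mar-2013 10:15:01.123 client 192.0.2.10#53124: query: example.com IN A + (203.0.113.1)". That format is an assumption (it varies between bind versions), and the function name is made up for illustration, so adjust the regex to whatever your named actually writes.

```python
import re
from collections import Counter

# Assumed query-log line format (varies by bind version, adjust as needed):
#   14-Mar-2013 10:15:01.123 client 192.0.2.10#53124: query: example.com IN A + (...)
QUERY_RE = re.compile(
    r"^(?P<ts>\S+ \d{2}:\d{2}:\d{2})\.\d+ .*?client (?:@\S+ )?"
    r"(?P<ip>\d+\.\d+\.\d+\.\d+)#\d+.*?query: (?P<qname>\S+) IN (?P<qtype>\S+)"
)


def peak_duplicates_per_second(lines):
    """Return {client_ip: most identical queries seen within any single second}."""
    per_second = Counter()
    for line in lines:
        m = QUERY_RE.search(line)
        if m is None:
            continue  # not a query line, or a format this regex doesn't know
        # Bucket by (second, client, name, type); names are case-insensitive.
        key = (m["ts"], m["ip"], m["qname"].lower(), m["qtype"])
        per_second[key] += 1

    peaks = {}
    for (_ts, ip, _qname, _qtype), count in per_second.items():
        peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks
```

Feed it the lines of a query log from a quiet day and again from a noisy one; the gap between the two sets of peaks is a reasonable place to start looking for a limit. It only approximates what bind does internally, since it counts queries per client IP rather than responses.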
I also added a line to our snmpd.conf file:
    logmatch rate-limits /data/log/named.log 300 : rate limit
This way, I can graph (in cacti) the number of times bind mentions that it's
rate limiting some IP/network. And now when I glance through the DNS
related graphs, if I see a big spike in that graph I know there's something
"interesting" I might want to investigate.
UPDATE:
I also recommend that you exempt some of your own IP space from the rate
limits. I had some folks using our guest net whose winblows systems were
doing a jillion repeated queries for the usual MS'ish
_some_microsoft_junk.guest.my.domain type of hostnames. You know how MS
clients are. Always trying to do their own dynamic DNS updates and
querying for a pile of stuff starting with underscores. (rolls eyes)
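    // Addresses that should never be rate limited.
    acl my-ip-space {
        1.2.3.4; 1.2.3.5; 1.2.3.6;
    };
    options {
        [...]
        rate-limit {
            // Max identical responses per second before limiting kicks in.
            responses-per-second 5;
            exempt-clients { my-ip-space; };
        };
    };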
You might not want to include your entire IP space but only the IPs in
your networks that might be sending queries to your non-recursive servers.
For instance, though I threw all of our IP space in the acl as a temporary
band-aid, I will likely be exempting only our recursive servers' IPs and
the IPs of our monitoring systems. I gotta do an audit of several days'
worth of query logs just to be sure I know which inside IPs are querying
our authoritative servers.
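For that audit, something along these lines would probably do. It's a minimal sketch, not what I actually ran: it assumes query-log lines contain "client <ip>#<port>" and the word "query:", which is how bind's query log usually looks but does vary by version, and the function name and network list are placeholders.

```python
import ipaddress
import re
from collections import Counter

# Assumed: each query-log line has "client <ipv4>#<port>" somewhere in it.
CLIENT_RE = re.compile(r"client (?:@\S+ )?(\d+\.\d+\.\d+\.\d+)#\d+")


def internal_query_counts(lines, internal_networks):
    """Count queries per client IP, keeping only clients inside our own networks."""
    nets = [ipaddress.ip_network(n) for n in internal_networks]
    counts = Counter()
    for line in lines:
        if "query:" not in line:
            continue  # skip rate-limit notices and other non-query messages
        m = CLIENT_RE.search(line)
        if m is None:
            continue
        try:
            ip = ipaddress.ip_address(m.group(1))
        except ValueError:
            continue  # digits that aren't a valid address
        if any(ip in net for net in nets):
            counts[str(ip)] += 1
    return counts
```

Call it with the lines of each day's log and your own prefixes, then look at counts.most_common(20). Whatever keeps showing up at the top (recursive servers, monitoring boxes) is the short list for exempt-clients.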
Also, one of the ISC handlers reminded me of this link too. We really
should be running DNSSEC. That'll have to be next on my DNS to-do list.