DNS Filtering with Bind Response Policy Zones

UPDATED 08/30/2013

I stumbled across a bug in RPZ in BIND 9.9.4rc1 and earlier. Apparently, if your RPZ (response policy zone) is a slave zone, the rpz-ip records in it stop working whenever the zone changes, and they stay broken until you kill and restart named. Running kill -HUP won't fix it; you've gotta kill and restart the daemon completely. Doh.

The good news is there's already a patch for it. I just tested the patch and everything seems to work fine again, so expect the fix in the official 9.9.4 release.

I recently set up DNS filtering at work with BIND using Response Policy Zones (RPZ). It's a really cool new feature of BIND. Basically, I can tell BIND "if you see a query for a particular hostname or domain, rewrite the answer to be something else" or "if a query resolves to a particular IP address or subnet, rewrite the answer to be something else." You can also tell it to rewrite the answer when the authoritative DNS server providing it is a certain host or IP (the rpz-nsdname and rpz-nsip triggers).

Basically, what this means is that you can rewrite the answer to be "." (so the query fails to resolve) or rewrite it to something else, say, a web server that responds to all URLs with "Sorry, you can't get there from here" or some other policy message. (grin) Pretty cool.
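To make the two kinds of rewrite concrete, here is a minimal Python sketch that turns a list of domains into RPZ zone-file lines. It is not one of the scripts from my tarball; the function name and the example hostnames are made up for illustration. A CNAME to "." makes the name fail to resolve, and a CNAME to any other host redirects the query there.

```python
def domain_rules(domains, target="."):
    """Return RPZ zone-file lines that rewrite each domain and its subdomains.

    target "." makes the name fail to resolve; any other value is treated
    as a hostname to redirect to (hypothetical, e.g. a policy web server).
    """
    if target != "." and not target.endswith("."):
        target += "."  # absolute name, so the RPZ zone origin is not appended
    lines = []
    for domain in domains:
        # Owner names are left relative so they land under the RPZ zone.
        name = domain.strip().rstrip(".").lower()
        if not name:
            continue
        lines.append(f"{name} CNAME {target}")    # the domain itself
        lines.append(f"*.{name} CNAME {target}")  # everything beneath it
    return lines
```

Calling domain_rules(["badsite.example"]) gives two lines, one for the domain and one wildcard for its subdomains. Passing target="blocked.example" instead points both at that (hypothetical) host.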

Why is this cool? Well, given a list of domains known to host hostile content (viruses, malware, spyware, whatever), you can create a zone of filtering rules to prevent your users from being able to resolve these domains to IPs. Or if you want to block something for policy reasons, like, say, thepiratebay.com you can. Or given a list of IP addresses known to be hostile or compromised and likely to be hosting malware, spyware, porn, or whatever you can prevent DNS from resolving any hostname that points to them. Slick.

Also, it's another spam-filtering tool. Say you see a spammer who's registered a few hundred domains but only has a few DNS servers providing DNS service for them all, or there's only a limited number of IPs their MX records resolve to. You can put in a rule to prevent anything that resolves to those IPs from resolving properly, and now your mail server won't be able to validate MX records for any of those domains, no matter how often the spammer switches domains or registers new ones. I've sometimes found spammers using a DNS provider that provides service for (as near as I can tell) nobody but spammers, malware authors and other no-goodniks. Well, set up RPZ to refuse to resolve anything hosted by this DNS provider's nameservers. :-) Slick.
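IP-based rules use a slightly odd owner name: the prefix length, then the address octets in reverse order, then rpz-ip (or rpz-nsip to match on a nameserver's address). Here's a rough Python sketch of that encoding, IPv4 only. The function names are mine and the addresses are documentation ranges, not real offenders.

```python
import ipaddress

def ip_trigger(cidr, trigger="rpz-ip"):
    """Encode an IPv4 address or subnet as an RPZ owner name.

    Format: prefix length, reversed octets, trigger label,
    e.g. 192.0.2.0/24 becomes 24.0.2.0.192.rpz-ip
    """
    net = ipaddress.ip_network(cidr, strict=False)  # strict=False masks host bits
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4")
    octets = str(net.network_address).split(".")
    return ".".join([str(net.prefixlen)] + octets[::-1] + [trigger])

def ip_rules(cidrs, trigger="rpz-ip"):
    """Return zone-file lines that make matching answers fail to resolve."""
    return [f"{ip_trigger(c, trigger)} CNAME ." for c in cidrs]
```

A bare address is treated as a /32, so ip_rules(["203.0.113.7"], "rpz-nsip") yields a rule for anything served by a nameserver at that one address.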

At work, I've got one or two DNS secondary servers/proxies in bunches of different geographies. So at the time of this writing, I've got eight secondary servers. Before RPZ, if I wanted to prevent a domain from resolving, I had to go touch named.conf on all 8 servers (for reasons I won't bore you with, named.conf is just different enough on several of the servers I can't easily manage one named.conf centrally). But now that I've got an RPZ zone, I just make one change to the zone on the master DNS server, run rndc reload rpz.sgi.com and all my DNS secondaries see the change and start enforcing any change in policy immediately.

So enough about how cool RPZ is. Here's how I set it up. First off, you can find some documentation on BIND RPZ stuff here (search for Response Policy Zone). In my case, I wanted to build an RPZ zone based on a few sources of info. First, I wanted to be able to provide my own RPZ rules. But I also wanted to automatically include rules for known hostile domains (fetched from malwaredomains.com) and some rules for known compromised IPs (from www.emergingthreats.net). Lastly, I knew there might be some IPs in the "known compromised" list that I'd want to exempt, say, a partner's website that might have been compromised but that the company must be able to access.
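One simple way to handle the exemptions is to drop exempt entries from the bad-IP list before any rules get generated. The Python below is only a sketch of that idea, assuming both lists are plain addresses or CIDR subnets; the function name is hypothetical and this is not necessarily how the scripts in my tarball do it.

```python
import ipaddress

def drop_exempt(bad, exempt):
    """Return the entries of bad that do not overlap any exempt network."""
    exempt_nets = [ipaddress.ip_network(e, strict=False) for e in exempt]
    kept = []
    for entry in bad:
        net = ipaddress.ip_network(entry, strict=False)
        # Only compare networks of the same IP version.
        if any(net.version == e.version and net.overlaps(e) for e in exempt_nets):
            continue  # exempted, so no rule is generated for it
        kept.append(entry)
    return kept
```

Note this is coarse: a whole bad subnet is dropped if even one exempt address falls inside it, which errs on the side of keeping the partner's site reachable.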

So, I made a script to download newer (if any) sources of domains and IPs, a few scripts to convert them to RPZ zone rules, and a makefile to glue everything together and nudge BIND into re-reading the zone. Right now, I'm editing a file containing my own rules and the header of the zone file with the SOA record, and running make manually. Someday in the not too distant future, I'll have a cron job that will automagically update the serial number in the SOA record, fetch updated domain/IP data and re-generate the zone.

I'll include a tarball of the makefile and scripts I use to generate a zone.rpz.sgi.com file. There's also a README file with some basic description of how it all works, though it's pretty darn simple. Edit static.rpz.sgi.com, increment/update the serial number in the SOA record for the zone, make any other custom changes you want, then do make update-all to download any new malware domains and bad IPs, then do make to generate the fresh zone and run rndc reload. Note that the providers of the IPs and domains would rather you don't fetch updates more than once a day at the time of this writing, so you probably shouldn't run make update-all very often. If you're just adding your own rule to static.rpz.sgi.com, just edit the file, increment the serial number, and run make. Easy-peasy.
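Forgetting to bump the serial is the classic way to end up with secondaries that never see a change, so that step is worth automating. Here's a small Python sketch for picking the next serial. It assumes the common YYYYMMDDNN date-based convention, which is an assumption on my part; use whatever scheme your zone already follows. The function name is hypothetical.

```python
import datetime

def next_serial(current, today=None):
    """Return a serial greater than current, date-based when possible.

    Assumes the YYYYMMDDNN convention: today's date followed by a
    two-digit revision counter.
    """
    today = today or datetime.date.today()
    base = int(today.strftime("%Y%m%d")) * 100  # first serial for today
    # Never go backwards: if today's base is not ahead, just add one.
    return max(current + 1, base)
```

The new value still has to be written into the SOA record in static.rpz.sgi.com before running make.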

In your named.conf file, in the options section, you should have these lines:
response-policy {
    zone "rpz.sgi.com";
};

And somewhere in named.conf you'll want to define the RPZ zone as either a type master or a type slave zone (slave is BIND's keyword for a secondary). The master server, obviously, should be set up to use the generated zone.rpz.sgi.com file, and the secondary servers set up to pull updates from the master for this zone. Each DNS proxy configured to use the RPZ zone must be either the master or a secondary server for the zone or it won't work.

At the time of this writing, I'm doing some minimal monitoring of RPZ rewrites. All of my DNS servers have a logging section that looks like this:
logging {
    channel "queries" {
        file "query.log" versions 10 size 120m;
        print-time yes;
    };
    channel default_syslog {
        syslog local4;
        severity debug;
    };
    category lame-servers { null; };
    category default { default_syslog; };
    category general { default_syslog; };
    category security { default_syslog; };
    category config { default_syslog; };
    category queries { "queries"; };
};

Then I have syslogd configured to have local4.* written to /var/log/named. Then if I want to see any RPZ rewrites that happened because a hostname resolved to a bad IP or subnet, I do:
grep "rpz-ip" /var/log/named |more

If I want to see all the hostnames that got rewritten also, I do:
grep ": rpz Q" /var/log/named |more

Lastly, I have this in my snmpd.conf file:
logmatch rpz-matches /var/log/named 300 : rpz QNAME

And then I have Cacti set up to query for these logmatch rules and to graph each one I select. So, in theory, I should be able to graph every single query that gets rewritten.

All in all, this is working quite well so far...