Last weekend one of our authoritative name servers
(authdns1.csx.cam.ac.uk) suffered a series of DoS attacks which made
it rather unhappy. Over the last week I have developed a patch for
BIND to make it handle these attacks better.
The attack traffic
On authdns1 we provide off-site secondary name service to a number
of other universities and academic institutions; the attack targeted
one of them.
For years we have had a number of defence mechanisms on our DNS
servers. The main one is response rate limiting (RRL), which is designed to
reduce the damage done by DNS reflection / amplification attacks.
However, our recent attacks were different. Like most reflection /
amplification attacks, we were getting a lot of QTYPE=ANY queries, but
unlike reflection / amplification attacks these were not spoofed, but
rather were coming to us from a lot of recursive DNS servers. (A large
part of the volume came from Google Public DNS; I suspect that is just
because of their size and popularity.)
My guess is that it was a reflection / amplification attack, but we
were not being used as the amplifier; instead, a lot of open
resolvers were being used to amplify,
and they in turn were making queries upstream to us. (Consumer routers
are often open resolvers, but usually forward to their ISP's resolvers
or to public resolvers such as Google's, and those query us in turn.)
What made it worse
Because from our point of view the queries were coming from real
resolvers, RRL was completely ineffective. But some other
configuration settings made the attacks cause more damage than they
might otherwise have done.
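To illustrate why RRL was no help, here is a toy model in Python. It is a deliberately simplified sketch, not BIND's actual algorithm: real RRL also takes the response content into account, and the function name, the limit of 5 responses per window and the /24 grouping are assumptions chosen for illustration.

```python
from collections import Counter
import ipaddress


def rrl_dropped(sources, limit_per_window=5, prefix_len=24):
    """Count the responses a toy rate limiter would drop in one time window.

    Queries are grouped by client network prefix; once a prefix has had
    `limit_per_window` responses, further responses to it are dropped.
    """
    seen = Counter()
    dropped = 0
    for src in sources:
        net = ipaddress.ip_network(f"{src}/{prefix_len}", strict=False)
        seen[net] += 1
        if seen[net] > limit_per_window:
            dropped += 1
    return dropped


# Classic reflection attack: every query carries the victim's spoofed address.
spoofed = ["192.0.2.7"] * 1000

# Our case: the same volume, but spread over 1000 real resolvers,
# each in a different /24 (addresses are made up for the example).
distributed = [f"10.{i // 250}.{i % 250}.1" for i in range(1000)]

print("spoofed, dropped:", rrl_dropped(spoofed))
print("distributed, dropped:", rrl_dropped(distributed))
```

In this model the spoofed traffic is almost entirely throttled, whereas the same volume arriving from many genuine resolvers stays under the limit for every prefix, so nothing is dropped.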
I have configured our authoritative servers to avoid sending large UDP
packets which get fragmented at the IP layer. IP fragments often get
dropped and this can cause problems with DNS. So I have set two options
in the server configuration.
The first setting limits the size of outgoing UDP responses to an MTU
which is very likely to work. (The ethernet MTU minus some slop for
tunnels.) The second setting reduces the amount of information that
the server tries to put in the packet, so that it is less likely to be
truncated because of the small UDP size limit, so that clients do not
have to retry over TCP.
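For illustration, settings of this kind look roughly like the following in named.conf. The option names are BIND's max-udp-size and minimal-responses, which match the two behaviours described above; the value 1420 is an assumption standing in for "ethernet MTU minus some slop", not necessarily the exact figure in use.

```
options {
    // Cap outgoing UDP responses below the ethernet MTU so that they
    // are not fragmented. 1420 is an illustrative value (assumption).
    max-udp-size 1420;

    // Leave optional authority and additional records out of responses
    // where possible, so that they are less likely to be truncated.
    minimal-responses yes;
};
```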
This works OK for normal queries; for instance a
cam.ac.uk IN MX
query gets a svelte 216 byte response from our authoritative servers
but a chubby 2047 byte response from our recursive servers which do
not have these settings.
But ANY queries blow straight past the UDP size limit: the attack
queries for imperial.ac.uk IN ANY got obese 3930 byte responses.
The effect was that the recursive clients retried their queries over
TCP, and consumed the server's entire TCP connection quota. (Sadly
BIND's TCP handling is not up to the standard of good web servers, so
it's quite easy to nadger it in this way.)
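The fallback to TCP can be sketched as follows. This is a simplified model rather than BIND's code: the function name is hypothetical, the 1420 byte server limit and 4096 byte client buffer are assumed values, and the response sizes are the ones quoted above.

```python
def reply_transport(response_size, server_max_udp, client_edns_size=4096):
    """Say how a response of `response_size` bytes reaches the client.

    A UDP response must fit within both the server's own UDP size limit
    and the buffer size the client advertised via EDNS. If it does not
    fit, the server sends a truncated reply and the client retries the
    query over TCP.
    """
    limit = min(server_max_udp, client_edns_size)
    if response_size <= limit:
        return "udp"
    return "tcp"


SERVER_MAX_UDP = 1420  # assumed value for the server's UDP size limit

print("MX  (216 bytes): ", reply_transport(216, SERVER_MAX_UDP))
print("ANY (3930 bytes):", reply_transport(3930, SERVER_MAX_UDP))
```

Every attack query that lands in the second case costs the server a TCP connection, which is how the connection quota got used up.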
We might have coped a lot better if we could have served all the
attack traffic over UDP. Fortunately there was some pertinent
discussion in the IETF DNSOP working group in March last year, which
resulted in draft-ietf-dnsop-refuse-any,
"providing minimal-sized responses to DNS queries with QTYPE=ANY".
This document was instigated by
Cloudflare, who have a DNS server
architecture which makes it unusually difficult to produce traditional
comprehensive responses to ANY queries. Their approach is instead to
send just one synthetic record in response, like
cloudflare.net. HINFO ( "Please stop asking for ANY"
"See draft-jabley-dnsop-refuse-any" )
In the discussion, Evan Hunt (one of the BIND developers) suggested an
alternative approach suitable for traditional name servers. They can
reply to an ANY query by picking one arbitrary RRset to put in the
answer, instead of all of the RRsets they have to hand.
The draft says you can use either of these approaches. They both allow
an authoritative server to make the recursive server go away happy
that it got an answer, and without breaking odd applications like
qmail that foolishly rely on ANY queries.
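The two approaches can be sketched in a few lines of Python. This is only a model of the idea, not the code in any real name server: the function name, the dictionary representation of RRsets and the example records are all assumptions made for illustration.

```python
def minimal_any(rrsets, synthesize=False):
    """Reduce the answer to an ANY query to a single RRset.

    `rrsets` maps a record type (e.g. "MX") to the list of records of
    that type held at the queried name.

    With synthesize=True, ignore the real data and return one synthetic
    HINFO record (the Cloudflare-style approach). Otherwise return one
    arbitrary RRset out of those available (the approach suggested for
    traditional name servers); here "arbitrary" is simply the first one.
    """
    if synthesize:
        text = '"Please stop asking for ANY" "See draft-jabley-dnsop-refuse-any"'
        return {"HINFO": [text]}
    if not rrsets:
        return {}
    rrtype = next(iter(rrsets))
    return {rrtype: rrsets[rrtype]}


# Made-up data for a name with several RRsets.
example = {
    "MX": ["10 mail.example.org."],
    "NS": ["ns1.example.org.", "ns2.example.org."],
    "TXT": ['"v=spf1 mx -all"'],
}

print(minimal_any(example))
print(minimal_any(example, synthesize=True))
```

Either way the recursive server receives a small, valid answer that it can cache, instead of every RRset at the name.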
I did a few small experiments at the time to demonstrate that it
really would work OK in the real world (unlike some of the earlier
proposals) and they are both pretty neat solutions (unlike some of the
earlier proposals).
So draft-ietf-dnsop-refuse-any is an excellent way to reduce the
damage caused by the attacks, since it allows us to return small UDP
responses which reduce the downstream amplification and avoid pushing
the intermediate recursive servers on to TCP. But BIND did not have
this feature.
I did a very quick hack on Tuesday to strip down ANY responses, and I
deployed it to our authoritative DNS servers on Wednesday morning for
swift mitigation. But it was immediately clear that I had put my patch
in completely the wrong part of BIND, so it would need substantial
re-working before it could be more widely useful.
I managed to get back to the patch on Thursday. The right place to put
the logic was in the fearsome top-level query handling function, which
is nearly 2400 lines
long! I finished the first draft of the revised patch that afternoon
(using none of the code I wrote on Tuesday), and I spent Friday
afternoon debugging and improving it.
The result is this patch, which adds minimal ANY responses to BIND.
I'm currently running it on my toy nameserver, and I plan to deploy it
to our production servers next week to replace the rough hack.
I have submitted the patch to the ISC; hopefully something like it will
be included in a future version of BIND. And it prompted a couple of
questions about draft-ietf-dnsop-refuse-any that I posted to the DNSOP
working group mailing list.