DNS behind L4
July 23, 2006 10:33 PM

Authoritative DNS behind L4?

It has been quite a while since I looked at DNS, and I don't recall any particularly difficult parts, but my Manager asked some questions I need to answer with authority.

Assume we host a bunch of authoritative-only DNS servers, say 4. Customer queries and caching/recursive lookups are handled by a different system, on different servers.

It was suggested previously that we could start with a system with an L4 on top, holding the external IP, and 4 DNS servers on the inside. The content will be identical on all servers in the cluster, running BIND, probably BIND with DLZ.

Now there should be no zone transfers required in this setup.

However, if a response is over 512 bytes, the querier (should) retry over TCP. Would it matter if the UDP and subsequent TCP queries go to different servers in the cluster? I.e., are there any "session keys" or similar content that require the same server both times? (IIRC there are not, and it should just work if the L4 has both UDP and TCP settings.)
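On whether the UDP query and the TCP retry need to reach the same server, here is a minimal sketch of what a client does at the packet level. The function names and the fixed query ID are illustrative assumptions, not taken from any resolver's code. It builds a plain query, checks the truncation (TC) bit in a reply header, and frames the same query for TCP with the two-byte length prefix that DNS over TCP uses.

```python
import struct

TC_FLAG = 0x0200  # truncation (TC) bit in the 16-bit DNS header flags word


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: one question, class IN, recursion not requested."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def is_truncated(message: bytes) -> bool:
    """Return True if the TC bit is set in the message header."""
    flags = struct.unpack("!H", message[2:4])[0]
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its two-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message
```

For a plain query like this, the TCP retry is simply the same question sent again with a length prefix. Nothing from the truncated UDP reply is carried over, which is consistent with the expectation that any server in the cluster holding identical data could answer it.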

Will SPF influence this as well? It was mentioned.
posted by lundman to Computers & Internet (9 answers total)
 
DNS is a very lightweight protocol. I'm not sure why you're wanting to load-balance it this way... it's sort of self load-balancing. And load-balancing on DNS at all is very, very unusual. You would have to have a site the size of Amazon's to want to do something like that.

If you really think you need four DNS servers, make four DNS servers. Publish them in WHOIS. Put a master server behind them and configure them all as slaves. If you make a change, the real master server notifies all the slaves, which then transfer the new zone file in. Everything just works.

Again, you would need a bloody ENORMOUS amount of DNS traffic to need four servers, much less a load balancer. A pair of P3/1GHz machines will serve most busy domains more than adequately. DNS packets are very small and cached remotely for some time. The entire server runs very comfortably out of RAM and never hits the disk at all.

If you're worried about keeping the service up 24x7, I'd suggest spreading your servers around geographically, rather than load-balancing them in a single spot.

That said, I don't see any reason why you COULDN'T do what you're talking about. You'd run the risk of not updating all your servers in perfect sync, which could cause a little trouble, but it would otherwise work. It's an _extremely_ strange design, butchering the DNS protocol's built-in failover, and spending money you almost certainly don't need to spend, but it would work.

SPF is just a TXT record. It works like any other DNS query.
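To illustrate that point, here is a small sketch of how the data in a TXT record is laid out and how an SPF policy is recognised inside it. The helper names are made up for this example. TXT RDATA is a sequence of length-prefixed strings, and an SPF policy is the concatenation of those strings beginning with the version tag "v=spf1".

```python
def txt_rdata_to_strings(rdata: bytes) -> list[str]:
    """Split TXT RDATA into its length-prefixed character-strings."""
    strings = []
    pos = 0
    while pos < len(rdata):
        length = rdata[pos]  # one length byte, then that many bytes of text
        strings.append(rdata[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return strings


def is_spf(rdata: bytes) -> bool:
    """Return True if the joined TXT strings form an SPF version 1 policy."""
    text = "".join(txt_rdata_to_strings(rdata))
    return text == "v=spf1" or text.startswith("v=spf1 ")
```

Nothing here is special to the server: it stores and returns the TXT data like any other record, and only the mail software on the querying side interprets it.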
posted by Malor at 6:02 AM on July 24, 2006


What is an L4? Why do you need DLZ?

If the L4 thingy means TCP-aware load balancer, then I would say you should have no problems. You can easily prove this to yourself with dig and ethereal.
posted by popechunk at 6:12 AM on July 24, 2006


On lack of preview, Malor is 100% correct.
posted by popechunk at 6:14 AM on July 24, 2006


Again, you would need a bloody ENORMOUS amount of DNS traffic to need four servers

Or a business need not to put all your publicly-facing services at the mercy of one wanker with a backhoe. Geodiversity is a good thing when designing highly-available systems.

Of course, that could be accomplished with 3 servers, or even 2 - you don't really need 4 authoritative machines.
posted by deadmessenger at 8:57 AM on July 24, 2006


What they said. You really need to put your DNS service in geodiverse locations.

Or farm it out to someone who does it for a living.
posted by baylink at 2:10 PM on July 24, 2006


Response by poster: DNS is a very lightweight protocol. I'm not sure why you're wanting to load-balance it this way... it's sort of self load-balancing. And load-balancing on DNS at all is very, very unusual. You would have to have a site the size of Amazon's to want to do something like that.

Don't know Amazon's size, but we are the 2nd largest ISP in Japan. So we do have a metric crapload of zones, and multi-lingual zones.

If you really think you need four DNS servers, make four DNS servers. Publish them in WHOIS. Put a master server behind them and configure them all as slaves. If you make a change, the real master server notifies all the slaves, which then transfer the new zone file in. Everything just works.

That is the current system, and it is no longer scalable / fast-enough.

Adding a (new) zone to named.conf and then reloading the master takes anything up to 15 minutes (on dual 3.8GHz servers). We could keep throwing hardware at that, this is true, but it also means no zones can be added _while_ it is reloading.

So we are exploring and engineering a new system that is real-time. This is why we looked at DLZ, which appears to work rather well. You do make a valid point in that it could just be 4 servers. I think the L4 idea stemmed from a Legacy requirement from when they were also resolvers, and having all customers experience a timeout (yes, just a few seconds) is not acceptable according to "upstairs".

And it was a chance to try something new.

..also thought I'd check how technical questions would go down on Ask.Mefi, and I must say it is nice to get intelligent replies.
posted by lundman at 8:12 PM on July 24, 2006


Wow, that's on a scale I've never worked with. 15 MINUTES for a restart? Are you memory starved?

If you're saying that your workhorse servers, the ones all your customers use, are the same ones you're authoritative with.. yeah, I'd definitely split those two functions apart, just on general principles.

You're dealing at a scale I've never worked with, and for me to give you a really intelligent recommendation would take a few days, likely, of research and thought. Offhand, I'd say you probably want a database backend instead of flat text files, and a server that will load and publish new domains on the fly, without requiring a restart. I'm not familiar with DLZ, but if it will do that, it might be a good solution. The true root nameservers are able to handle this kind of size without a problem, so it obviously be DOABLE. Expensive, I'm sure, but doable.
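As a rough illustration of the database-backend idea, the sketch below keeps records in an in-memory SQLite table and answers lookups straight from it, so a newly added zone is visible to the very next query with no reload step. The table layout and function names are assumptions made up for this example. This is not DLZ's actual schema or API.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory record store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "zone TEXT, name TEXT, rtype TEXT, ttl INTEGER, rdata TEXT)"
    )
    return conn


def add_record(conn, zone, name, rtype, ttl, rdata):
    """Insert one record; it becomes queryable as soon as this commits."""
    conn.execute(
        "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
        (zone, name, rtype, ttl, rdata),
    )
    conn.commit()


def lookup(conn, zone, name, rtype):
    """Return (ttl, rdata) rows for a name, read from the table at query time."""
    rows = conn.execute(
        "SELECT ttl, rdata FROM records "
        "WHERE zone = ? AND name = ? AND rtype = ? ORDER BY rdata",
        (zone, name, rtype),
    )
    return rows.fetchall()
```

The trade-off in a design like this is that every query costs a database lookup instead of a read from memory, in exchange for changes taking effect immediately.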

15 minute restarts is crazy stuff, man. Ouch. You must be hosting tens of thousands of domains.
posted by Malor at 12:32 PM on July 26, 2006


sigh. "obviously must be".
posted by Malor at 12:33 PM on July 26, 2006


Response by poster: Wow, that's on a scale I've never worked with. 15 MINUTES for a restart? Are you memory starved?

Memory does fine once everything is loaded; we actually tend to run out of stack first, but that is easy to fix. It's just that some of the juniors tend to start named manually instead of via the init.d scripts, so it doesn't get the stack size directives and dumps core soon enough.

I think named is only using about 200 MB resident; it's the loading, parsing, and zone transfers at startup that seem to take so long.

If you're saying that your workhorse servers, the ones all your customers use, are the same ones you're authoritative with.. yeah, I'd definitely split those two functions apart, just on general principles.

Yeah, Legacy. Before my time. But in their defense, when they started it was small. But I agree, I want it separate.

The root servers can do it as they don't care about interactive/real-time changes. I just changed my Zones over and it took 3 days. Customers want changes in 3 seconds :)

Anyway, I think L4 is over-engineering now. It would have been fun to try, but maybe not great long term. Since it is authoritative only, it is OK if one of the servers is down for a bit; customers will not really notice.
posted by lundman at 7:49 PM on July 27, 2006

