Screwy DHCP server problem
October 13, 2010 5:23 PM

Help me diagnose and fix my Ubuntu DHCP server problem. I've tried asking a few Linux gurus and either gotten snarled at or ignored, and my google-fu is not getting it (perhaps there is no such problem other than myself). Hope me Metafilter! (detailed more inside)

Background: I have an Ubuntu 10.04 server that is set up as my DHCP, DNS, and Samba server for our Windows XP and 7 computers. It has been running fine for the past three or four months, but starting last week, none of the Windows 7 computers are able to get or re-get a DHCP lease. The Windows XP computers (and the wifi Squeezebox radio and the Blackberry's wifi) are able to get and use their leases just fine.

The wifi router is NOT set up to provide any sort of dhcp or other services. I have unplugged it to make sure.

The firewall router "brick" is also NOT set up to provide dhcp services; the only thing it does is provide a NAT route to the internet connection. I have also disconnected this and confirmed it is not the problem. Basically just the server, a dumb 5-port router, and a test Windows 7 system isolated from everything else.

The Windows 7 computers all get the usual "I can't find a dhcp lease, so here's a squirrel" ip address of 169.254.x.x as if they are unplugged from the network.

I do know that I had applied whatever updates were available to Ubuntu 10.04 Wednesday night; they were not dhcp-related as far as I remember, but there was some ssl and crypto stuff. I do not know how to check for historical updates to the system, nor how to back them out.

Current workarounds:
1) Hard-assigning the ip address on the Windows 7 systems and plugging in the proper values for the gateway and dns servers works fine. But this sucks because two of the systems are laptops and they go out of the office.
2) Turning OFF the Ubuntu dhcp server and turning ON the firewall brick's dhcp server works. Happily, I can override the dns address on it, so I'm able to point it back to our DNS system, but I'm losing the other dhcp stuff. This is my current keep-it-working-until-fixed solution.

/var/log/syslog shows stuff like this:
Oct 7 16:54:01 LINUXSERVER dhcpd: DHCPDISCOVER from 90:fb:a6:26:ca:64 (WIN7-A1) via eth0
Oct 7 16:54:01 LINUXSERVER dhcpd: DHCPOFFER on 192.168.0.74 to 90:fb:a6:26:ca:64 (WIN7-A1) via eth0
Oct 7 16:54:17 LINUXSERVER dhcpd: DHCPDISCOVER from 90:fb:a6:26:ca:64 (WIN7-A1) via eth0
Oct 7 16:54:17 LINUXSERVER dhcpd: DHCPOFFER on 192.168.0.74 to 90:fb:a6:26:ca:64 (WIN7-A1) via eth0
Oct 7 16:57:37 LINUXSERVER dhcpd: DHCPDISCOVER from 00:23:7a:9e:08:a7 (BLACKBERRY-0D1B) via eth0
Oct 7 16:57:38 LINUXSERVER dhcpd: DHCPOFFER on 192.168.0.75 to 00:23:7a:9e:08:a7 (BLACKBERRY-0D1B) via eth0
Oct 7 16:57:38 LINUXSERVER dhcpd: Wrote 0 deleted host decls to leases file.
Oct 7 16:57:38 LINUXSERVER dhcpd: Wrote 0 new dynamic host decls to leases file.
Oct 7 16:57:38 LINUXSERVER dhcpd: Wrote 8 leases to leases file.
Oct 7 16:57:38 LINUXSERVER dhcpd: DHCPREQUEST for 192.168.0.75 (127.0.0.1) from 00:23:7a:9e:08:a7 (BLACKBERRY-0D1B) via eth0
Oct 7 16:57:38 LINUXSERVER dhcpd: DHCPACK on 192.168.0.75 to 00:23:7a:9e:08:a7 (BLACKBERRY-0D1B) via eth0
Oct 7 16:58:37 LINUXSERVER dhcpd: DHCPREQUEST for 192.168.0.71 from 00:1c:7e:25:88:ac (WIN7-B2) via eth0
Oct 7 16:58:37 LINUXSERVER dhcpd: DHCPACK on 192.168.0.71 to 00:1c:7e:25:88:ac (WIN7-B2) via eth0
Oct 7 16:58:42 LINUXSERVER dhcpd: DHCPREQUEST for 192.168.0.71 from 00:1c:7e:25:88:ac (WIN7-B2) via eth0
Oct 7 16:58:42 LINUXSERVER dhcpd: DHCPACK on 192.168.0.71 to 00:1c:7e:25:88:ac (WIN7-B2) via eth0
Oct 7 16:58:43 LINUXSERVER dhcpd: DHCPREQUEST for 192.168.0.73 from 00:21:5c:30:d3:17 (WIN7-B2) via eth0
Oct 7 16:58:43 LINUXSERVER dhcpd: DHCPACK on 192.168.0.73 to 00:21:5c:30:d3:17 (WIN7-B2) via eth0

over and over. WIN7-A1 and WIN7-B2 (-b2 is a laptop with wired and wireless) are two of the Windows 7 machines. BLACKBERRY-0D1B is my cell phone.
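
For anyone reading along, one way to pick the stuck clients out of a longer syslog is to look for MAC addresses that were sent a DHCPOFFER but never came back with a DHCPREQUEST or received a DHCPACK. This is only a rough Python sketch, assuming the dhcpd log line layout shown in the excerpt above; the function name stalled_clients and the regular expression are made up for illustration and are not part of dhcpd.

```python
import re
from collections import defaultdict

# Matches dhcpd syslog lines such as:
#   "... dhcpd: DHCPOFFER on 192.168.0.74 to 90:fb:a6:26:ca:64 (WIN7-A1) via eth0"
#   "... dhcpd: DHCPREQUEST for 192.168.0.71 from 00:1c:7e:25:88:ac (WIN7-B2) via eth0"
# Group 1 is the message type, group 2 the MAC, group 3 the optional hostname.
LINE_RE = re.compile(
    r"dhcpd: (DHCP[A-Z]+)\b.*?\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b(?: \(([^)]*)\))?",
    re.IGNORECASE,
)


def stalled_clients(lines):
    """Return {mac: hostname} for clients that were offered an address
    but never sent a DHCPREQUEST or received a DHCPACK."""
    seen = defaultdict(set)  # mac -> set of DHCP message types logged for it
    names = {}               # mac -> last hostname logged for it
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a per-client dhcpd line (e.g. "Wrote 8 leases ...")
        msg = match.group(1).upper()
        mac = match.group(2).lower()
        host = match.group(3)
        seen[mac].add(msg)
        if host:
            names[mac] = host
    return {
        mac: names.get(mac, "?")
        for mac, msgs in seen.items()
        if "DHCPOFFER" in msgs and not msgs & {"DHCPREQUEST", "DHCPACK"}
    }
```

To run it against the real log, pass an open file, e.g. stalled_clients(open("/var/log/syslog")). A client that shows up here is receiving offers on the wire (or at least the server thinks it is sending them) and then not acting on them, which is the WIN7-A1 pattern in the excerpt.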

Here is my /etc/dhcp3/dhcpd.conf file:

max-lease-time 86400;
default-lease-time 86400;
option routers 192.168.0.1;
option domain-name-servers 192.168.0.5;
option dhcp-server-identifier LINUXSERVER;
option netbios-name-servers 192.168.0.5;
server-name "LINUXSERVER";
ddns-update-style interim;
# have also tried ad-hoc here
authoritative;
log-facility local7;
subnet 192.168.0.0 netmask 255.255.255.0 {
# ping-check off;
# deny client-updates;
# ddns-updates on;
# tried these to see if there was some strange conflict
authoritative;
option dhcp-server-identifier LINUXSERVER;
option domain-name "local";
server-name "LINUXSERVER";
range 192.168.0.40 192.168.0.249;
option broadcast-address 192.168.0.255;
}
host Confroom {
hardware ethernet 00:15:58:1e:20:ec;
fixed-address 192.168.0.30;
}
host Helpdesk {
hardware ethernet 08:00:27:21:70:f3;
fixed-address 192.168.0.147;
}
host WIN7-B2 {
hardware ethernet 00:21:5c:30:d3:17;
fixed-address 192.168.0.175;
}

Tried wiping the leases file, no effect.
posted by Old'n'Busted to Computers & Internet (14 answers total) 2 users marked this as a favorite
 
Have you tried livebooting *Nix on the Win7 systems? A different OS won't change your MAC address so your config should still work, and dhclient should provide some useful information about what's going on. If they can get addresses under Linux, then it's likely something to do with Windows 7. Did Win 7 update recently?
posted by thesmophoron at 5:35 PM on October 13, 2010


"I can't find a dhcp lease, so here's a squirrel"

Perfect explanation of this effect!

The only thing I see that jumps out to me is the .0.xxx network. It should work, but maybe it doesn't.

Check in that /etc/dhcp3 directory and see if there are any .conf.old style files that you can use to confirm that your conf is correct. I've had Fedora screw up some conf files before.

WIN7-A1 seems to not be acking, but win7-b2 is asking for and acking two different IP addresses. That doesn't seem right.
posted by gjc at 5:35 PM on October 13, 2010


gjc: WIN7-B2 is asking for and acking two different IP addresses from different MAC addresses -- as the OP said, WIN7-B2 has both wired and wireless interfaces.

Old'n'Busted: Would you post the full logs at pastebin?
posted by thesmophoron at 5:50 PM on October 13, 2010


One further troubleshooting step would be to get a sniffer (like wireshark) so you can see if the Win 7 box does see the dhcp offer and what it replies.
posted by splice at 5:59 PM on October 13, 2010


FWIW 192.168.0.x subnet will work, this is what I use at home (although I have a Smoothwall DHCP server and not an Ubuntu).

A bit of Googling gave me this link. Although the solution listed inside didn't seem to work, it is pretty descriptive of your problem (ie not being able to get an IP address from a "non-Microsoft" DHCP server), and worth a try in your situation.

On the surface of it I would think that the problem was with Windows 7 and not Ubuntu, since all the other systems work fine. If I were you I would concentrate my troubleshooting efforts on the Windows 7 boxen first.

Best of luck!
posted by humpy at 6:49 PM on October 13, 2010


Response by poster: humpy: yes, I tried that first off on a test box, but it didn't help, so I left it alone on the other systems.

splice: I'll try wireshark tomorrowish; I do have it installed on one of the systems.

thesmophoron: which full logs are you referring to? syslog? Also, the win 7 systems were not updated in that time frame. I do have one in storage that has been offline for a month and I will plug that in and see what it does. I will also try a Unix cd rom, but I'm dead certain that it will work and get a lease with no problems.
posted by Old'n'Busted at 7:10 PM on October 13, 2010


I don't know anything about Windows or the answer to your larger question, but you should be able to look in /var/log/dpkg.log to see exactly which packages were upgraded and whether any of them seem like they might be related to DHCP.
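
If the log is long, something like the following Python sketch could narrow it down. It assumes the usual dpkg.log record layout of date, time, action, package, old version, new version on each line; the function name upgraded_packages and the since parameter are hypothetical helpers, not anything dpkg provides.

```python
def upgraded_packages(lines, since=None):
    """Yield (timestamp, package, old_version, new_version) for each
    'upgrade' record found in dpkg.log-style lines.

    since is an optional 'YYYY-MM-DD' string; earlier records are skipped.
    ISO dates sort correctly as plain strings, so no date parsing is needed.
    """
    for line in lines:
        parts = line.split()
        # Assumed layout: DATE TIME upgrade PACKAGE OLD_VERSION NEW_VERSION
        if len(parts) >= 6 and parts[2] == "upgrade":
            if since is None or parts[0] >= since:
                yield f"{parts[0]} {parts[1]}", parts[3], parts[4], parts[5]
```

For example, list(upgraded_packages(open("/var/log/dpkg.log"), since="2010-10-01")) would list everything upgraded from the start of October onward, which is easier to eyeball for dhcp3, kernel, or network-related packages than the raw file.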
posted by enn at 7:20 PM on October 13, 2010


See: The HOWTO for DHCP Server setup. Not sure if it's still the case, but one more thing to check:
Next step is to add route for 255.255.255.255. Quoted from DHCPd README:

"In order for dhcpd to work correctly with picky DHCP clients (e.g., Windows 95), it must be able to send packets with an IP destination address of 255.255.255.255. Unfortunately, Linux insists on changing 255.255.255.255 into the local subnet broadcast address (here, that's 192.5.5.223). This results in a DHCP protocol violation, and while many DHCP clients don't notice the problem, some (e.g., all Microsoft DHCP clients) do. Clients that have this problem will appear not to see DHCPOFFER messages from the server."
Windows is the most fragile IP stack around.
posted by zengargoyle at 8:06 PM on October 13, 2010


Is the clock on the DHCP server correct?
posted by sharding at 11:13 PM on October 13, 2010


If that were my problem, I'd be running Wireshark simultaneously on the Windows 7 box and the Ubuntu server (or using tcpdump on the server if it doesn't have X) so that I could verify that all packets emitted from the DHCP server actually make it to the Windows client and didn't get eaten somewhere by a rogue firewall. Assuming that to be the case, I'd then be switching to a DHCP server that Windows does work with, and looking at Wireshark traces on Windows to find out what's in the packets from the working DHCP server that's different from what the Ubuntu box is handing out. And I'd be swearing a lot and drinking coffee the whole time. But Wireshark is definitely the right tool for this job.
posted by flabdablet at 4:39 AM on October 14, 2010


Response by poster: sharding: the clock is correct, yes. Why do you think if there was a difference it would cause this?

zengargoyle: tried that and still does it.

I have installed a fresh copy of Win7 (sans key) on a test box and the problem still occurs, so I'm certain that it's not an update of Windows that caused the problem.

Looking at the dpkg.log doesn't show any dhcp-related updates, unless there is something kernel-wise I'm not clued in on.

I will not be able to run Wireshark on the Ubuntu box since it's a headless server (console only). I have not gotten the time to use it on the Win7 box yet. When I do so, what should I be looking for (I've only ever used it to diagnose soap and php api stuff)?
posted by Old'n'Busted at 7:24 AM on October 14, 2010


tshark is wireshark for console, available in Lucid.
posted by Zed at 9:36 AM on October 14, 2010


Response by poster: Wireshark didn't tell me squat. The dhcp packets were all valid; Windows was just failing. Microsoft's utterly useless forums were no help, and continually referring to (a) the broadcast flag reg hack and (b) how f*cking dare you use Linux.

After everyone bailed for the day, I was able to start over with a blank dhcpd.conf file and built it up one line at a time, cycling the server and test box until I hit the item that broke Windows 7:

option dhcp-server-identifier LINUXSERVER;

Taking this out makes the Win7 boxes work all over again. Note that I had it in there twice, so the second instance was not needed.

These lines have been in the config file from day one, and only last week did things start to fall over. Experimenting with this showed that the problem is that Windows 7 does not properly use the domain-name "local" value. If I fully qualify the name - "LINUXSERVER.local" - or put the ip address in there, then all is good and fine. But the line itself is rather useless, because by default DHCPD will use the machine's address. The simple solution - ignoring what O'Reilly has to say about it - is to remove the offending line and whistle on past it.

My current theory is that somehow the ip address of LINUXSERVER had gotten cached on the Win7 boxes and they all resolved them locally (thus making it appear that all was good). Then for some reason on about the same day, they all cached out (ha!) and things went to pot.

Interestingly enough, non-Win7 systems do not have this problem, meaning they either properly concat the domain-name value, or else just ignore it and use the ip address of the dhcp server.

I went and cleaned up the redundant lines and fully-qualified the names so that hopefully this problem doesn't reoccur, but it goes into my Panic Binder for future reference.
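
For the Panic Binder, here is a rough Python sketch of a check that flags the pattern that bit me: a server identifier given as a bare short hostname instead of an IP address or a dotted name. It is only an illustration under assumptions; the function name risky_server_identifiers is made up, the "has a dot" test is a crude stand-in for "fully qualified", and it only looks at one-statement-per-line config like mine.

```python
import ipaddress
import re

# Matches "option dhcp-server-identifier VALUE;" and the related
# "server-identifier VALUE;" statement, with or without quotes around VALUE.
# Commented-out lines do not match because of the leading anchor.
IDENT_RE = re.compile(
    r'^\s*(?:option\s+dhcp-server-identifier|server-identifier)\s+"?([^\s";]+)"?\s*;'
)


def risky_server_identifiers(conf_lines):
    """Return (line_number, value) pairs where the server identifier is a
    bare hostname rather than an IP address or a dotted name."""
    hits = []
    for number, line in enumerate(conf_lines, start=1):
        match = IDENT_RE.match(line)
        if not match:
            continue
        value = match.group(1)
        try:
            ipaddress.ip_address(value)
            continue  # literal IP address, nothing for the client to resolve
        except ValueError:
            pass
        if "." not in value:  # no dot, so treat it as an unqualified name
            hits.append((number, value))
    return hits
```

Calling risky_server_identifiers(open("/etc/dhcp3/dhcpd.conf")) on my original file would have pointed at both LINUXSERVER lines; after switching to "LINUXSERVER.local" or the ip address it should come back empty.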

Thanks to everyone that tried to help.
posted by Old'n'Busted at 5:24 PM on October 14, 2010 [1 favorite]


And thank you for coming back with the solution once found.

If you have time, it might be interesting to find out the consequences of using something other than .local for your TLD. Reason I say that is because I've noticed that .local is the domain that all those pesky automated make-it-work-and-we-don't-care-how squirrels play in.

I've been using .lan on my LANs since noticing that, and I'd be interested to find out whether that particular piece of superstition is in any way justified.
posted by flabdablet at 6:45 PM on October 14, 2010

