Mysql... why won't you accept more connections?
November 16, 2006 4:20 PM   Subscribe

Mysql5... why won't you accept more connections from a set of IP addresses?

Here's kind of a weird one -- we've got a mysql 5 server in a cluster that does a fair amount of i/o. After awhile, it stops accepting connections (or slows way down) from the other machines in the cluster, but if I try to connect from a totally different box, it's lightning fast.

The server itself isn't running a load, nor is it coming close to the number of max_connections/max_user_connection limit. (It is, however, using a single account from all those connections.)

The code connecting to the db is PHP and we do close() the connection.

The mysql itself recognizes that there are "Failed attempts," but it's not clear to us at all what's causing them.

The box is ubuntu, if that helps. The hardware is pretty substantial....
posted by ph00dz to Computers & Internet (12 answers total)
What does mysql> show processlist; look like? What about your \s ... what's the stats look like for slow queries?
posted by SpecialK at 4:47 PM on November 16, 2006

sounds like the issue explained here in the documentation.
posted by jimw at 5:01 PM on November 16, 2006

So... it's not the obvious stuff:

Our max_connection_errors is set to 10,000,000

I also trapped all the relevant code and am having it email me if it errors out. It isn't throwing a normal sql error, at any rate...

Along those same lines, flush hosts seems to have no effect.

In the process list, almost nothing is listed. 2 entries... something like that.

Also, the queries aren't particularly complicated. Simple reads/inserts... nothing too fancy.

Weird, eh? I'm kind of at a loss as to how to debug this...
posted by ph00dz at 5:57 PM on November 16, 2006

I'm kind of at a loss as to how to debug this...

I've always found that a packet trace from a network analysis tool is indispensable in tracking down these kinds of problems. If you aren't already familiar with these tools, two free utilities are Wireshark (formerly Ethereal) and tcpflow. My favorite commercial package is EtherPeek.
posted by RichardP at 6:18 PM on November 16, 2006

Just as a followup -- it appears to constantly failing to connect to the machine. Connectivity isn't a problem -- it pings alright.... something else is going on.

The strange part is that this worked just fine all morning under heavy loads...
posted by ph00dz at 7:39 PM on November 16, 2006

Ok, phoodz, you've got to try to isolate the issue. you can't just flail around and try random stuff.

A few suggestions for starting points:
RichardP had a good point. Run wireshark or tcpdump on the mysql box, and see if you can figure out where packets are bouncing. You have ping, but that's not the end-all-be-all of connectivity; could there be an issue with a firewall rule or something else that's discarding a certain percentage of TCP packets but not ICMP packets? At this point you don't know. Note that you might have to take most of the cluster down to do it.

One other thing that came to mind is network hardware. Do you have something turned on in the switch that would be limiting the amount of traffic on specific ports? What kind of distance is between the machines in the cluster?

Sure, max_connection_errors is set high, and so is max_connections in all likelihood. Have you checked your max_queries_per_hour on your user account? (Hint: it's a grant priviledge.) ARE you testing (at lightning speed, from a different host) with the same user account that is used by the cluster? If you try to connect to the mysql server from a server that IS slow using a separate user account, does the speed issue happen? Have you tried assigning different user names or other contexts to the slow cluster machines?

What about trying to switch to persistent connections? They really are a lot faster. You might test and see if you can get better performance out of the persistent connection, because there could be some as-yet undiscovered bug that you're just now running across.
posted by SpecialK at 9:06 PM on November 16, 2006

Cool, specialk... that gives me some stuff to eliminate.

To answer your questions -- we were testing the connection to mysql from a different host and ran queries with no trouble. We also connected locally and ran queries. It was all good.

Also, the only other thing running on that machine, a pretty simple php script that dumps data into the db, continued running fine.

It's the ability of the other machines in the cluster to interact with that box that's so puzzling.

We went through all the php code and started seeing connect errors all over the place. It was random -- but as soon as we turned the full blast of data loose on that box, connections started getting more and more delayed from the machines involved.

This is a logging box... so right now, we're using a db outside of the cluster all the way out at another isp. So, it's not like the amount of data moving through the entire network is diminished at all. The network switch is nothing fancy -- all the machines involved are connected up through a simple switch.

I see how to set max_queries_per_hour, but I don't see how to check it. I'll have to poke around a bit.
posted by ph00dz at 9:36 PM on November 16, 2006

max_queries_per_hour should be somewhere in the User database.

My guess is that it's something either with a network part in between the two ISPs limiting the connection rate (Quality of Service degredation for high rate of connection, or something similar... ISPs play funky games like that all the time because some PHB thinks it's a good idea). Best way to test this would be to log into one of the cluster machines, and see if you can't hit the mysql box at a faster rate with a different user account. If you get a faster rate with a different user account, good, otherwise it's an ISP traffic shaping issue (in my book) and this means that you should move the MySQL db to be local to the cluster, even if you just install it on one of the cluster machines.

It's all about ruling things out.
posted by SpecialK at 9:47 PM on November 16, 2006

Let me ask you this -- at what point does mysql start to get crazy, table-wise? The tables involved here are large... 3+ million rows.
posted by ph00dz at 9:56 PM on November 16, 2006

ph00dz writes "Let me ask you this -- at what point does mysql start to get crazy, table-wise? The tables involved here are large... 3+ million rows."

Depends what you're doing with the table, the type of table, and the indices on it. 3M isn't that big, and it shouldn't be refusing connections.
posted by orthogonality at 7:32 AM on November 17, 2006

So... it would appear that specialK was right!!!

We switched over to persistent connections and we're rokken' like dokken...

Thank you so much, guys. Maybe now my corporate masters will let me out of the programming dungeon for more mountain dew...
posted by ph00dz at 8:09 AM on November 17, 2006

By the way, just to follow up for anyone else experiencing this problem, this was invaluable to figuring out what was going on:

mysqladmin -m -v -i10 processlist status;

It's kinda like top, but for mysql. Basically, our stuff was opening a billion connections to the server, more than it could handle opening in a small period of time.

So... to get around it, we used sqlrelay to absorb a lot of this load -- it's like a proxy server for mysql (and other things), so you end up opening a billion connections on that thing instead of your main server, which could die. It's a little goofy and weird to configure, but it really seems to be working great.
posted by ph00dz at 8:31 PM on November 21, 2006

« Older Please help me find a new direction   |   Good free/cheap XML-and-XSL application? Newer »
This thread is closed to new comments.