What other strategies are there for seeing CPU bottlenecks in a large PHP application?
December 5, 2011 1:56 PM

How can I track down the cause of periodic spikes in CPU usage caused by a plugin-bloated WordPress install on a LAMP server?

Just a few notes about the setup: about 18k pieces of content, 80k users, 30ish plugins, with the Apache and MySQL servers running on a Large (7 GB RAM) AWS instance. I have WordPress-level caching implemented, which uses the XCache PHP opcode cache. I've set fairly conservative limits on the Apache worker process settings (each process takes up a bunch of memory). I've also run some tuning scripts on the MySQL server and made sure that a 'long query' wasn't the culprit. I've even run an Xdebug profile of a typical page load and couldn't see any obvious bottlenecks. Everything is pretty stable for the most part while I work on reimplementing features that will let me get rid of some of the plugin bloat.

However, once or twice a day, the CPU usage goes nuts for five minutes and causes load spikes of 40-50. top doesn't show me much except that Apache and MySQL are getting hammered. Traffic patterns don't seem to affect when it will happen, either, and it happens too irregularly for me to think it's a scheduled task. Apache's error logs don't turn up anything obvious either. I can't recreate the spikes on an identical development server (with zero traffic).

How can I find what URL requests or scripts are directly related to the CPU spike? Is there some way to correlate processor resource consumption with access logs? What other strategies are there for seeing CPU bottlenecks in a large PHP application?
posted by cowbellemoo to Computers & Internet (5 answers total)
Best answer: It sounds like either something triggered by wp-cron.php, or a rude search engine spider, or a combination of both. If you can enable mod_status on Apache, you'll have a better sense of the requests being served during the spikes, and there's a plugin to show the wp-cron tasks on your Dashboard.
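To illustrate the mod_status suggestion: when mod_status is enabled, appending ?auto to the status URL returns a plain "key: value" report that a script can poll. The following is only a minimal Python sketch. The URL in the comment, the threshold of 20 busy workers, and the function names are assumptions made for the example, not details from this server.

```python
from urllib.request import urlopen


def fetch_status(url):
    """Fetch the machine-readable mod_status report.

    The URL is an assumption, e.g. "http://localhost/server-status?auto";
    it depends on how mod_status is configured on the server.
    """
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_status_auto(text):
    """Turn the 'key: value' lines of a ?auto report into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            status[key.strip()] = value.strip()
    return status


def is_busy(status, max_busy_workers=20):
    """Return True when BusyWorkers exceeds a threshold (hypothetical value)."""
    return int(status.get("BusyWorkers", 0)) > max_busy_workers


# Canned sample so the sketch runs without a live server.
sample = (
    "Total Accesses: 1200\n"
    "BusyWorkers: 35\n"
    "IdleWorkers: 5\n"
    "Scoreboard: WWWW__..\n"
)
status = parse_status_auto(sample)
print(status["BusyWorkers"], is_busy(status))
```

Run from cron every minute or so, a check like is_busy could save the full status page whenever the worker count jumps, so the requests in flight during a spike are on record afterwards.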
posted by holgate at 2:22 PM on December 5, 2011

Best answer: If you know the times that the spikes have happened, the raw access logs will include detailed information about every request that was happening at the time.
Also, if you're watching the server while it's happening, you can run SHOW PROCESSLIST on MySQL to see any queries that are currently being run; that may give you some clues about what is going on.
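As a rough sketch of the access-log approach, the Python below counts which paths were requested inside a given time window. It assumes Apache's common/combined log format and an English locale for month names. The sample lines, addresses, and the function name requests_in_window are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the start of a common/combined format line (assumed log format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def requests_in_window(lines, start, end):
    """Count requests per path whose timestamp falls between start and end."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        when = datetime.strptime(match.group("time"), TIME_FORMAT)
        if start <= when <= end:
            counts[match.group("path")] += 1
    return counts


# Invented sample data; in practice, pass an open access-log file instead.
sample_lines = [
    '192.0.2.1 - - [05/Dec/2011:13:55:10 +0000] "GET /?s=foo HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [05/Dec/2011:13:56:02 +0000] "GET /?s=foo HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [05/Dec/2011:13:57:30 +0000] "GET /wp-cron.php HTTP/1.1" 200 0 "-" "WordPress"',
    '198.51.100.7 - - [05/Dec/2011:14:30:00 +0000] "GET /about/ HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]
start = datetime.strptime("05/Dec/2011:13:55:00 +0000", TIME_FORMAT)
end = datetime.strptime("05/Dec/2011:14:00:00 +0000", TIME_FORMAT)
spike_counts = requests_in_window(sample_lines, start, end)
print(spike_counts.most_common(5))
```

Comparing the counts for a spike window against a quiet window of the same length should make any unusual URLs stand out.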

Of course, Apache and MySQL getting hammered could be a symptom, not the direct cause. Something else could be going on server-wise that causes Apache and MySQL to slow down, so it only looks like they're getting hammered.

Do you have logs set to rotate when they reach a certain size? That would account for the irregular timing.
posted by missmagenta at 2:22 PM on December 5, 2011

Response by poster: Thanks!

wp-cron checks out as normal (only the built-in WP update checks). I did figure out how to get a full status listing (including resources per request) from mod_status, so that might help the next time there's a spike. Poring over the access logs does turn up lots of bots and scrapers, but I don't have a log-crunching tool that will show me abusive patterns. Any recommendations there?

It could very well be an automatic log rotation as well, but it hasn't shown up in top as a red flag. I'm not really familiar with that aspect of server administration and have just let Ubuntu's default settings take care of it. I'll see if I can set up logrotate to do its stuff late at night.

There is a plugin which generates a big XML sitemap "when content changes." I'll also try to make that a cron job instead.
posted by cowbellemoo at 7:46 AM on December 6, 2011

cowbellemoo: "but I don't have a log-crunching tool that will show me abusive patterns. Any recommendations there?"

I use awstats and webalizer. Especially if load isn't correlated with Google Analytics, it's probably a spider. There was one I used in a grad level web course, but I cannot recall what it was.
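Short of a full tool like awstats, a few lines of Python can give a first look at which clients are the heaviest hitters. This is only a sketch: it assumes Apache's combined log format (with referer and user agent fields), the sample lines and addresses are invented, and top_clients is a name made up for the example.

```python
import re
from collections import Counter

# Combined log format: ip, identity, user, [time], "request", status, bytes,
# "referer", "user agent" (assumed; adjust if the server logs differently).
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def top_clients(lines, limit=10):
    """Return the most frequent (ip, user agent) pairs with request counts."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(limit)


# Invented sample data; in practice, iterate over the real access log.
sample_lines = [
    '192.0.2.1 - - [06/Dec/2011:07:40:01 +0000] "GET /page/1 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [06/Dec/2011:07:40:01 +0000] "GET /page/2 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [06/Dec/2011:07:40:02 +0000] "GET /page/3 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [06/Dec/2011:07:41:15 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]
heaviest = top_clients(sample_lines, limit=2)
for (ip, agent), hits in heaviest:
    print(hits, ip, agent)
```

A client that makes far more requests than ordinary visitors, especially with a bot-like user agent, is a candidate for a robots.txt crawl delay or a block. User agents containing escaped quotes would not match this simple pattern.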

Ubuntu's log rotation happens regularly, but shouldn't cause dramatic CPU load; I'm guessing you've got a plugin that doesn't like concurrent requests, or some other interaction at the database level. You can actually take Apache logs and feed them to JMeter to load test against your test server. If that triggers the spike, you can probably just tell the scraper bot to slow down.

I haven't used AWS, but I suppose it's also possible your server is being attacked on the SSH port. Or being used as part of a zombie botnet.
posted by pwnguin at 1:59 PM on December 6, 2011

Response by poster: Just a follow-up:

After totally replacing the old developers' code (theme+plugins) and cutting down the total number of plugins (to around 20, which are mostly tiny), the load troubles have stopped. The normal idle load, which was usually around 0.5, is now consistently 0.01 with no spikes in the logs. Load has totally ceased to be a problem.
posted by cowbellemoo at 8:58 AM on May 19, 2012
