LAMP scaling and NFS. Good Idea or Not?
March 30, 2007 2:41 PM Subscribe
So, I've got myself a web application. LAMP. I need to scale it up. Can I run the PHP application across multiple machines, with the code loaded off an NFS share mounted from another machine?
I'm running gig-E with jumbo frames on a private LAN. Our application is fairly high load, hence the need to scale it up. We can't afford a real fiber SAN, so can we get the benefits of having all of our application scripts live in one place by having each of the web servers mount the NFS share and serve the application off of it?
We've thought about other possibilities, such as rsync, but we're concerned about race conditions, particularly two servers talking to clients and to the DB while running different codebases.
Other datapoints:
We use a bytecode cache called eAccelerator; those caches would live on the local web server, not on the NFS share.
We use memcache for stuff like sessions across machines.
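For illustration (not necessarily exactly how our boxes are configured), pointing PHP sessions at memcached with the PECL memcache extension comes down to a couple of php.ini lines like these; the server addresses here are made up:

    session.save_handler = memcache
    session.save_path = "tcp://10.0.0.21:11211,tcp://10.0.0.22:11211"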
Would setting up our application this way eat up significant network throughput? What are the downsides, and would they outweigh the benefits of having one codebase for our application?
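To make the question concrete, the layout I have in mind is roughly the following; hostnames, paths and options are illustrative, not our actual config:

    # /etc/exports on the storage box -- export the code tree read-only
    /srv/app  10.0.0.0/24(ro,async,no_subtree_check)

    # /etc/fstab on each web server -- mount it where Apache's docroot points
    storage01:/srv/app  /var/www/app  nfs  ro,noatime,actimeo=60,rsize=32768,wsize=32768  0 0

Mounting it read-only on the web servers should at least keep file locking on the code share out of the picture.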
What kinds of "race conditions" are you concerned about? Just make sure all of your servers contain the same version of the app and you should be good to go. Why would you need a SAN for loading scripts? There must be something I'm missing.
posted by aeighty at 3:15 PM on March 30, 2007
Oh, and FWIW, I view a single codebase as a disadvantage, not an advantage.
If you set up multiple codebases carefully (generally just a few deployment scripts), then you can do interesting things.
An obvious example: You're an e-tailer, and have a new design. You can set your load balancer to have 10% of your visitors get the server with the new design on it, and then see how it affects sales. If it's adverse, you didn't expose your whole clientèle to it, and can easily back it out.
Or if your site generates advertising money, you can easily play with different advertisers, to see who really gives you the best returns.
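For what it's worth, that 10% split is just a weighting rule on the balancer. In software terms it looks something like this HAProxy snippet (purely illustrative; the addresses and names are made up, and a hardware box has an equivalent knob):

    listen app 0.0.0.0:80
        balance roundrobin
        # pin each visitor to the variant they first saw
        cookie SERVERID insert indirect
        server current1   10.0.0.11:80 cookie cur weight 9
        server newdesign1 10.0.0.12:80 cookie new weight 1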
posted by Tacos Are Pretty Great at 3:20 PM on March 30, 2007 [1 favorite]
Response by poster: aeighty: We do a lot of logging, with heavy dependencies on time, format and uniqueness. I'm concerned that if we make a change to how we store logs in the application, and we have two different systems running simultaneously, there will be conflicts.
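To illustrate the worry (and one possible mitigation): if every log line carried the code revision and hostname that wrote it, entries from servers running different code could at least be told apart. A rough PHP sketch, with hypothetical names, not our actual logging code:

    <?php
    // Sketch only: tag each entry with timestamp, host and code revision.
    // APP_REVISION is a hypothetical constant set at deploy time.
    define('APP_REVISION', 'r1432');

    function app_log($message) {
        $line = sprintf("%s\t%s\t%s\t%s\n",
            date('c'),        // ISO-8601 timestamp
            php_uname('n'),   // which web server wrote the entry
            APP_REVISION,     // which codebase produced it
            $message);
        file_put_contents('/var/log/app/app.log', $line, FILE_APPEND | LOCK_EX);
    }

    app_log('order 12345 accepted');
    ?>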
posted by Freen at 3:51 PM on March 30, 2007
There's nothing wrong with NFS. I run an online service that serves and processes tens of gigs of audio/video files per day on a vanilla NFS setup that gives me no trouble at all. Another datapoint: Yahoo Mail runs over NFS.
posted by rhizome at 3:52 PM on March 30, 2007
Response by poster: Tacos: We run a pretty monolithic B2B system. There are no great benefits to running different code on different machines, with the exception of perhaps benchmarking, but that shouldn't be done in production.
posted by Freen at 3:53 PM on March 30, 2007
Response by poster: Tacos: we are using a hardware load balancer in the front.
I'm interested in iSCSI. Any pointers?
posted by Freen at 4:13 PM on March 30, 2007
Response by poster: I'm really wondering whether there are problems running a PHP application off of an NFS share. Is a file going to get locked if one server opens it for reading? Etc.
posted by Freen at 4:15 PM on March 30, 2007
Best answer: Actually, it's unnecessary to have that big share. It adds more latency and headaches than it's worth, in my opinion... your latency will still be significant, because all of the processors end up in IOWAIT while they confer with the storage layer.
You can run a Layer 7 routing device, like a Coyote Point Systems load balancer, so users always get sent back to the same server their session started out on, which solves your race conditions and session management all at once. See below for a typical deployment cycle.
To keep from having different codebases across your servers, maintain your code in a repository, and then push it out to each server via an update or export; a rough sketch of scripting that push follows the list below.
So a deployment cycle would look like this:
* Announce a day ahead of time that you'll be doing a deployment during a low-load window, and that things might still be slow while you're working on it.
* X hours ahead of time, drop your session times on the server down to one hour each, where X is the number of hours that your sessions currently last.
* At the beginning of the period, quiesce a server. When it's no longer receiving traffic and all sessions have expired, push an update to it. Then un-quiesce it.
* Repeat the previous point for each server in your cluster.
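The push itself can be a tiny per-server script, roughly like this (repository URL, tag and paths are made up; adapt it to whatever version control you actually use):

    #!/bin/sh
    # Run on a server only after it has been quiesced on the load balancer.
    set -e
    REL=/var/www/releases/$(date +%Y%m%d%H%M)
    svn export -q http://svn.example.com/app/tags/1.4.2 "$REL"
    ln -sfn "$REL" /var/www/current   # flip the docroot symlink to the new tree
    apachectl graceful                # restart workers so the bytecode cache reloads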
So -- no issues with sessions, since routing at layer 7 keeps your users hitting the same server their sessions were established on; no issues with race conditions; and no issues with multiple codebases. Does that solve everything? What's even easier is if you can arrange to have everyone logged out at, say, midnight and take the system down temporarily while you update everything. We just check quickly to make sure no one's using it, turn it off for a minute, check out the new code and update the database server, and flip it back on.
posted by SpecialK at 4:19 PM on March 30, 2007 [2 favorites]
Why are you using a hardware load balancer if you've only got one machine behind it ... ?
posted by SpecialK at 4:20 PM on March 30, 2007
Response by poster: SpecialK: No, we got a load balancer for the new, multi-machine setup.
And, I believe it can do what you are talking about.
posted by Freen at 4:24 PM on March 30, 2007
OK, awesome. There are a couple of articles in my blog, which is linked in my profile, that talk about load balancing and/or Coyote Point Systems, which is frickin' awesome as far as I'm concerned for load balancing -- their load balancers are cheap, easy to use and configure, blindingly fast, and have pretty much every feature you could want except caching, at half the price of the cheaper of F5's BigIP load balancers.
posted by SpecialK at 4:51 PM on March 30, 2007
Probably already been said, but with an L4 load balancer you'd be OK, since you can turn sticky sessions on to handle PHP's session keys, or have the sessions live on NFS too; the latter would mean making sure locking is done right.
We run large clusters here, some with PHP, but our NFS is NetApp to Solaris. Linux's NFS implementation is not great, but it should work OK.
posted by lundman at 4:59 PM on March 30, 2007
Can the application not be taken down even to push new versions of the software?
If you can, I strongly suggest packaging any production software up in the native package format for the Linux distribution you are using (RPM, deb, etc.). Then just push the latest version of the packages to the servers while they are down for the outage.
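Roughly like this, with made-up package and host names:

    # build once on a build box
    rpmbuild -bb exampleapp.spec

    # then push and upgrade each web server during the outage window
    for host in web1 web2 web3; do
        scp exampleapp-1.4.2-1.noarch.rpm "$host":/tmp/
        ssh "$host" 'rpm -Uvh /tmp/exampleapp-1.4.2-1.noarch.rpm && apachectl graceful'
    done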
What other data is shared across the apps? If sessions are already shared via memcache, you can probably get away with a dumb load balancing setup (RRDNS in the simplest case...)
If different code versions talking to the db are a real concern, it might be wise to implement a version/capability check in the code/db itself to prevent it.
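A rough sketch of that kind of guard in PHP, with hypothetical table, constant and connection details:

    <?php
    // Refuse to serve requests if this codebase expects a different schema
    // version than the database reports.
    define('EXPECTED_SCHEMA_VERSION', 42);

    $db = mysql_connect('db.example.com', 'app', 'secret');
    mysql_select_db('app', $db);
    $row = mysql_fetch_row(mysql_query('SELECT version FROM schema_version', $db));
    if ((int)$row[0] !== EXPECTED_SCHEMA_VERSION) {
        header('HTTP/1.1 503 Service Unavailable');
        exit('This server is running a code version that does not match the database.');
    }
    ?>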
That said, NFS would probably work. You shouldn't really be doing that much i/o on the app code itself, even less so with a bytecode cache or accelerator.
posted by alikins at 5:19 PM on March 30, 2007
Tacos: We run a pretty monolithic B2B system. There are no great benefits to running different code on different machines, with the exception of perhaps benchmarking, but that shouldn't be done in production.
Even for things like this, it can be useful to roll out the code to a very small percentage of users at first, just as an extra verification that it isn't failing horribly.
I know that we have a business-critical site that we treat that way, even for things like minor version updates of the OS or associated applications. After all, it's happened more than once that moving from x.y.z to x.y.z+1 has exploded something in a way that wasn't caught by our existing unit tests.
I'm not sure where to read more about iSCSI, but if you look at SANs, you'll find that more and more of them support it. Basically it's the same as an FC-based SAN, except it uses ethernet as a transport, so you get some cost savings from the fact that it's built on commodity parts.
As such, we set up our iSCSI devices with 2x Gigabit channelled connections, then run 1x Gigabit from each server to the storage-area Gigabit switch. Works fantastically, and it's performance-competitive with our older FC shit.
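On the Linux side, attaching to a target with open-iscsi is basically two commands (the target IP and IQN below are made up); the LUN then shows up as an ordinary block device to partition, format and mount:

    iscsiadm -m discovery -t sendtargets -p 10.0.0.50
    iscsiadm -m node -T iqn.2007-03.com.example:storage.app -p 10.0.0.50 --login

One caveat versus NFS: a plain filesystem on an iSCSI LUN can't safely be mounted read-write by more than one host at once, so you either give each server its own LUN or run a cluster filesystem on top.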
posted by Tacos Are Pretty Great at 11:58 AM on March 31, 2007
If you're using the L part of LAMP, you could consider deploying GFS, also. It's somewhat terrifying, but I know people who use it in production.
NFS is such an awful shitty protocol, and the Linux implementation has been so uneven over the years that my gut reaction is to run away from it, even though it will probably work reasonably okay.
That said, I think you're missing the big problem, which is what to use as a load balancer in the first place. If you just do RRDNS, you're going to hate life, as any single failure will end up bringing down the whole thing, and if one machine starts getting a little too loaded at random, it'll snowball.
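(For clarity, RRDNS is nothing more than several A records on one name, e.g. in a BIND zone, with made-up addresses below; there's no health checking, so a dead box keeps getting its share of traffic, and note the short TTL that some providers will cheerfully ignore:)

    www   60   IN   A   192.0.2.11
    www   60   IN   A   192.0.2.12
    www   60   IN   A   192.0.2.13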
If you use a decent load balancer, then you can map users to particular servers, and you can easily knock a server off-line, do maintenance/upgrades/whatever, then bring it back, without any silly DNS trickery or concerns about stupid providers that ignore your short TTL and rewrite it to 3600 seconds.
posted by Tacos Are Pretty Great at 3:12 PM on March 30, 2007
This thread is closed to new comments.