Improve performance of file-based-database web calendar app
May 23, 2006 12:01 PM Subscribe
I am in charge of a custom installation of a web calendar application (ORS) that is feeling the strain of too many users and the consequences of some unusual policies. I need some advice on how to get better performance out of this application.
I installed ORS about 3 years ago. Some quick facts about ORS: it is written in PHP, and uses CSV files to store all its data - no MySQL, no Berkeley DB, just straight CSV files. All the CSV reading/writing routines are handwritten - the author makes no use of PHP's libraries. Same for the file locking - the author uses his own routines, creating, reading, and deleting his own lockfiles.
I customized ORS heavily in a number of ways for our location, the most important being an institution of a weekly time release. In a normal ORS install, blocks of time become available on a rolling basis - at every moment, a block of time some x number of weeks in the future is becoming available. Administrators here felt this was unfair to people who didn't have the freedom (or assistants) to hover over a keyboard at the precise time x weeks before the appointment time they needed. So, instead, every Tuesday at 9:30 AM, an entire week's worth of time 3 weeks in the future all of a sudden becomes available.
Result: every Tuesday at 9:30 AM, a hundred people simultaneously start hammering the server, which immediately stops responding to anyone at all.
What can I do?
The program is currently on a shared machine with the rest of the department's pages. I've vaguely heard mention of things like lighttpd as a possible solution, but I'd love some specific details.
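To make the locking concern concrete, here is a minimal Python sketch of the kind of hand-rolled lockfile scheme described above. It is not ORS's actual PHP code, which does not appear in this thread; the names acquire_lock, release_lock, and append_booking and the ".lock" file layout are hypothetical. A process creates the lockfile atomically, polls while someone else holds it, and deletes it when done.

```python
import csv
import os
import time


def acquire_lock(lock_path, timeout=5.0, poll_interval=0.05):
    """Create lock_path atomically; poll until it is free or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process at a time can create the lockfile.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)


def release_lock(lock_path):
    """Delete the lockfile so the next waiting process can proceed."""
    os.remove(lock_path)


def append_booking(csv_path, row, timeout=5.0):
    """Append one row to the CSV file while holding its lockfile."""
    lock_path = csv_path + ".lock"
    if not acquire_lock(lock_path, timeout=timeout):
        return False
    try:
        with open(csv_path, "a", newline="") as f:
            csv.writer(f).writerow(row)
    finally:
        release_lock(lock_path)
    return True
```

With a scheme like this, a burst of simultaneous requests leaves every waiting process sitting in the polling loop, which is one plausible (but unconfirmed) reason the server stops responding at 9:30.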
Well, you could still do the release on a rolling basis, just staggered by name: anyone with a last name A-D at 9:30, E-K at 10:00, and so forth.
Or just schedule it for 1am.
posted by empath at 12:24 PM on May 23, 2006
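A minimal Python sketch of the staggered release suggested in the comment above. The A-D and E-K groups and the 9:30/10:00 times come from that comment; the final L-Z group, the 30-minute step, and the function name release_time are assumptions for illustration.

```python
from datetime import datetime, timedelta


def release_time(last_name, first_slot, group_ends=("D", "K", "Z"), step_minutes=30):
    """Return when bookings open for last_name.

    group_ends lists the last initial of each group in order. A-D and E-K come
    from the suggestion above; the final L-Z group is an assumption.
    """
    initial = last_name.strip()[:1].upper()
    for index, end in enumerate(group_ends):
        if "A" <= initial <= end:
            return first_slot + timedelta(minutes=step_minutes * index)
    raise ValueError(f"no release group for {last_name!r}")
```

Called with first_slot set to Tuesday 9:30 AM, this returns 9:30 for "Adams" and 10:00 for "Evans". It spreads the load across groups but does not remove the rush within each group.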
Response by poster: As for (4) - there aren't many. I spent two solid weeks searching for an application like this, and I found exactly one.
posted by dmd at 12:56 PM on May 23, 2006
Response by poster: See screenshots of ORS - we use every one of the features shown in those screenshots, and many many more. ORS is much more than a web calendar - it deals heavily with implementing business rules about who can sign up when, who can cancel when, what the consequences of canceling are (and those consequences can depend on when you cancel, and that can interact with who you are...) ... resources can be double-booked, and second-chance bookings can be promoted to first position, and those promotions don't get treated in the same ways as first-bookings...
So, moving to a totally new piece of software is out.
A complete rewrite is also out, as is trying to change it to not use CSVs at all - the CSV-ness is unfortunately heavily blended into the (very, very unmaintainable) code.
I'm really looking for solutions that involve replacing the back end with something that might give me a bit of a boost - the code itself isn't going anywhere.
posted by dmd at 1:05 PM on May 23, 2006
If you're unwilling to change the code, then you're going to be driving up a very steep road.
You could put the app on its own machine and scale the performance of the machine up to match the user load, but that could get expensive.
I'm not sure what sort of answer you expect if new software is out and changing code is out.
posted by yellowbkpk at 1:45 PM on May 23, 2006
Have you identified the bottleneck at the system level? CPU? Memory? Disk I/O? Network I/O? Since the application itself is untouchable, it sounds like system configuration and hardware are your only options. It's hard to say where to start on either of them until you have a better idea of what is holding you back.
A few things you might be able to do without upgrading the hardware that might help:
1) Set up the webserver to limit the number of concurrent requests being processed and place subsequent requests in a queue until a slot frees up. This will allow more RAM, CPU and disk I/O to go to actually getting work done, rather than being lost to overhead. In addition, limiting the number of concurrent requests could reduce the strain on the application's locking mechanism.
2) Experiment with turning HTTP keep-alives on and off. Not sure, but this could make a difference in how server-side resources are allocated and released.
3) Use one of the available PHP optimizers to decrease the overhead of handling each request.
4) Move the data to a less used disk, tune the filesystem parameters, and/or experiment with different filesystems to see if you can improve the performance of the file-based locking system.
posted by Good Brain at 2:40 PM on May 23, 2006
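The idea in point 1 above, capping concurrent work and queueing the rest, can be sketched in Python as follows. This only illustrates the principle and is not a webserver configuration; run_with_cap and its parameters are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_cap(handler, requests, max_concurrent):
    """Process requests with at most max_concurrent handlers running at once.

    Extra requests wait in the executor's internal queue. Returns the results
    in request order and the peak number of handlers seen running together.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(request):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return handler(request)
        finally:
            with lock:
                running -= 1

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(tracked, requests))
    return results, peak
```

In Apache's prefork setup the rough equivalent of max_concurrent is the MaxClients directive. What value suits this server is something only measurement can tell you.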
Good Brain has covered much of what I was going to say. Your first hope is to change the policy which forces everyone to log on at once. If that won't work, you need to find out what is halting your app. This will be harder on a shared machine so move over to a dedicated server if possible.
Unless you're getting thousands of simultaneous sessions, I doubt that apache is the bottleneck.
posted by blag at 3:27 PM on May 23, 2006
re: optimisers. Zend has worked well for me in the past - are you running that or similar?
posted by blag at 3:28 PM on May 23, 2006
Response by poster: Changing the web server and filesystem are the sorts of solutions I'm looking for, but I'd really like some details - I don't really know what I'm doing there. Would something like lighttpd help? Would something like Zend optimizer help?
posted by dmd at 7:21 PM on May 23, 2006
We don't know why the app gets bogged down, other than the fact that you get huge spikes of traffic, so we can't tell you what will help.
There are all sorts of information on tuning webservers out there. I suggest you start looking for Apache and PHP tuning guides. If those don't get you where you need to go, then you should at least have a better idea of where your bottlenecks lie (if only because you can rule out RAM or CPU). From there, the next step is probably the filesystem and disk subsystem, which is a bit more esoteric these days, but I know the information is out there.
Lighttpd is a great little webserver, but even if you get it set up perfectly for your situation, I doubt it's going to buy you more than 100MB of extra available RAM and 10-12% extra available CPU over a properly tuned Apache, so just focus on getting Apache and PHP dialed in as best you can.
posted by Good Brain at 12:29 AM on May 24, 2006
This thread is like the poster child of why half-assed PHP programmers who do really moronic stuff (like this "screw conventional wisdom, I'm using these stupid CSV files and I'm reinventing it all") really ruin the whole field for competent programmers.
posted by Rhomboid at 5:29 AM on May 24, 2006
Response by poster: Oh, agreed Rhomboid. If I had taken a closer look at ORS, I'd probably have bitten the bullet and written the damn software myself.
The thing was, I had been looking around for some software to do the job, and there's literally nothing else out there that does anything even remotely like this... and then I came across ORS, which checked off every single one of the about 25 must-have features we needed. (The other things I looked at checked off less than 10.)
So, yeah - if you ever want to spend an hour in utter misery and despair, go download ORS and browse through, say, functions.php and mainfunctions.php ...
posted by dmd at 6:54 AM on May 24, 2006
1) Make the application fast enough to handle the people. In this case, it means using a real database, and rewriting every part of the program to forget about csv files and instead use MySQL or some other database.
2) Make the people slow enough for the application to handle. This is a little difficult, because ideally you'd like one person to be able to sign on, access a number of pages, then sign off, sequentially, without letting too many others compete. You could, for example, only have Apache spawn a max of 2 or 3 child processes. So even if five hundred people try to use it at the same time, most of them will get connection timeout errors rather than actual pages, and the application itself won't be overloaded. It will still be hard for users to complete their scheduling. You could run a reverse proxy to rate-limit the requests... it'll still be a little hard. If your app can handle two users/minute and you have a hundred users competing to use it, nothing is going to make that wait magically go away.
3) Change the department policies. No doubt the developer of ORS put in the rolling access times for exactly this reason - he knew his app couldn't handle vast numbers of simultaneous accesses, it wasn't designed to. So you've undone something the developer did for exactly this reason, and now you're encountering exactly the problem that he solved. At the *least* you could open a day's worth of scheduling every day, rather than a week's worth once a week.
4) Pick a new calendar app. There are many. Ones designed to use a real database should handle many more simultaneous accesses.
posted by jellicle at 12:22 PM on May 23, 2006
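The arithmetic behind point 2 above can be worked through with a small Python sketch. It assumes users are served strictly one after another at a fixed rate, which is a simplification; the function name queue_wait_minutes is hypothetical, and the figures of 100 users and two users per minute are the illustrative numbers from the comment, not measurements.

```python
def queue_wait_minutes(users, users_per_minute):
    """Return (average, worst-case) minutes until a user has been served.

    Assumes a first-come, first-served queue drained at a constant rate, so
    the user in position i finishes after i / users_per_minute minutes.
    """
    if users <= 0 or users_per_minute <= 0:
        raise ValueError("users and users_per_minute must be positive")
    worst = users / users_per_minute
    average = (users + 1) / (2 * users_per_minute)
    return average, worst
```

For 100 users at two users per minute this gives an average of 25.25 minutes and a worst case of 50 minutes, however the queue is implemented. If the same users were spread evenly over five daily releases (an assumption), each day's worst case would drop to 10 minutes.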