A little bigger than a thumbnail
February 13, 2008 1:36 PM

Resizing & storing many images on a webserver: limits, options, etc?

I'm working on a web app that currently follows the following method for image uploads:

1. upload via PHP
2. generate new unique name for the file & its thumbnails
3. generate thumbnails via GD2
4. save image files all in one directory.
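
Step 2 above could be sketched roughly as follows. This is a minimal Python illustration rather than the poster's PHP code; the size labels in `THUMB_SIZES` and the function name `unique_names` are assumptions made up for the example.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical thumbnail sizes; the post does not say which sizes are generated.
THUMB_SIZES = {"small": (100, 100), "medium": (400, 400)}

def unique_names(original_filename: str) -> dict:
    """Return a new unique name for the upload plus one name per thumbnail size."""
    ext = PurePosixPath(original_filename).suffix.lower()  # keep only the extension
    stem = uuid.uuid4().hex  # 32 random hex characters
    names = {"original": f"{stem}{ext}"}
    for label in THUMB_SIZES:
        names[label] = f"{stem}_{label}{ext}"
    return names
```

Sharing one random stem between the original and its thumbnails keeps the files easy to associate without a lookup table.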

My concern is that this may need to scale, and a flat directory, GD2, and 128MB of VPS memory just don't sound convincing in that respect. I'm wondering if there's a smarter option that I'm missing. I was earlier considering storing the images in S3 and generating the thumbnails on EC2. Does anyone have experience doing this? I'm really wondering how difficult EC2 is to use, since it appears to be optimized for big, big scenarios.

Will A3 essentially serve these static files? Also, does anyone have knowledge about GD2 vs. ImageMagick vs. any other library that might be better optimized for memory use?

Thanks for any help!
posted by tmcw to Computers & Internet (8 answers total)
 
You might get better responses if you expand those acronyms. I've been in computers for ages and photography for a few years and I have no idea what you're talking about, except for ImageMagick.
posted by chairface at 1:55 PM on February 13, 2008


One thing to consider is how often the resizing will happen.

If you resize once and store the result on your server for later delivery, you just need to consider how many images will be uploaded at any one time. If, at peak load, you're only really going to be uploading and resizing a couple of images at a time, I'd just go with GD2 and call it a day.

I wouldn't recommend using it to dynamically resize things every time someone wanted to view a page. That wouldn't be a good thing on a shared server of any type.
posted by advicepig at 2:06 PM on February 13, 2008


chairface: GD is an image-processing library; Simple Storage Service (S3) and Elastic Compute Cloud (EC2) are Amazon's Web services for storage and processing respectively. A3 is a typo (I think).

At Foneshow we serve some series icons from S3. That part's easy: just alias a hostname like static.yoursite.com to s3.amazonaws.com and create a publicly readable bucket with the same name. And EC2 works just fine with just one small instance running for $2.40 per day. Applications running on EC2 can now poll SQS for free, so the simple scalable architecture there is to have your hosted Web app upload the files to S3 and queue a message on SQS with the key; then your EC2 instance can poll for the message, get the file from S3, make the thumbnail, and stick it back in S3. If it gets busy, you just start more instances from that AMI and let them compete for SQS messages.
posted by nicwolff at 2:24 PM on February 13, 2008
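
The upload, queue, and worker flow described in the comment above can be sketched as below. This is only an illustration of the competing-workers pattern, not AWS code: a plain dict stands in for the S3 bucket, `queue.Queue` stands in for SQS, threads stand in for EC2 instances, and `make_thumbnail` is a hypothetical placeholder for the real resize step.

```python
import queue
import threading

# In-memory stand-ins (assumptions for illustration, not the AWS APIs):
# a dict plays the role of the S3 bucket, queue.Queue plays the role of SQS.
bucket = {}
jobs = queue.Queue()

def make_thumbnail(data: bytes) -> bytes:
    """Placeholder for the real resize step (ImageMagick, GD, etc.)."""
    return data[:16]

def web_app_upload(key: str, data: bytes) -> None:
    """What the hosted web app does: store the file, then queue its key."""
    bucket[key] = data
    jobs.put(key)

def worker() -> None:
    """What each worker instance does: take a key, fetch, resize, store the result."""
    while True:
        key = jobs.get()
        if key is None:  # sentinel: shut this worker down
            jobs.task_done()
            return
        bucket[f"thumbs/{key}"] = make_thumbnail(bucket[key])
        jobs.task_done()

def run(uploads: dict, worker_count: int = 2) -> None:
    """Start competing workers, feed them uploads, and wait until all are done."""
    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for key, data in uploads.items():
        web_app_upload(key, data)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for t in threads:
        t.join()
```

Raising `worker_count` here corresponds to starting more instances when things get busy: each message is handed to exactly one worker, so the workers never need to coordinate with each other.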


If possible, do yourself a favor and switch to ImageMagick as soon as you can. GD2 runs within the PHP child, so resizing any image over 640x480 will pretty much take down the process you're using. On our server, we'd have children allocated 64 MB of memory die trying to resize a 1280x1024 image.

While I'm not an expert on server tuning or Linux, my understanding is that in your typical usage scenario, you'd run ImageMagick as a niced shell process, meaning that 1) the PHP child won't crash, because it just launches the exec and continues, and 2) the process shouldn't significantly slow down your server, because it'll only use 'free' resources. This CAN mean that your user may experience a delay between uploading a file and being able to see a thumbnail, but in my experience with image resizing this is a non-issue (transcoding videos, on the other hand ....)

Also, if you haven't already and it fits your purposes, consider using a Flash library for uploading (with a fallback to standard form uploading). The benefits include being able to select multiple files at once, which is a HUGE win for most users.

I don't have any experience with EC2, unfortunately. Will probably take a closer look at it later this year.
posted by fishfucker at 2:24 PM on February 13, 2008
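
For illustration, launching ImageMagick as a niced background process might look something like this in Python. It is a sketch under assumptions: it presumes the `nice` and `convert` commands are installed and on the PATH, and the function names and the 200x200 default are invented for the example.

```python
import subprocess

def resize_command(src: str, dst: str, max_size: str = "200x200") -> list:
    """Build a lowest-priority ImageMagick command line (assumes nice and convert exist)."""
    return ["nice", "-n", "19", "convert", src, "-resize", max_size, dst]

def start_resize(src: str, dst: str) -> subprocess.Popen:
    """Launch the resize in a separate process and return without waiting for it."""
    return subprocess.Popen(resize_command(src, dst))
```

Because the resize runs in its own process, its memory use is not counted against the web process that launched it, and the caller can return to the user while the thumbnail is still being written.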


I should probably say that AFAIK, GD2 is either compiled into PHP or loaded as a module, and thus runs within the memory limit of the child. I'm more than happy to be corrected on this point if my understanding of the situation is wrong.
posted by fishfucker at 2:27 PM on February 13, 2008


You likely wouldn't be disappointed if you check out S3 + EC2. They talk wicked fast to each other. I helped with an app where I set up Apache on the EC2 instance to proxy requests for images to an S3 bucket -- very slick and responsive, nice URLs, and no DNS jiggery-pokery.

For your webapp, the main advantage of running it on EC2, of course (other than the strong connection with S3), is that you can scale it rapidly, if it's designed with that in mind.
posted by so at 7:57 PM on February 13, 2008


Response by poster: @advicepig: I'm resizing images once, on upload. I've seen the other style (as well as the terrible 'html method') and their consequences.

I'm going to check out EC2 and S3... S3 seems like a much simpler thing than EC2, and when SQS gets involved.... I'm guessing that my server load wouldn't be all that bad.

Does anyone know if there are dire consequences to storing possibly thousands of images in one Linux dir?
posted by tmcw at 8:32 PM on February 13, 2008


Linux (with the ext3 filesystem) can handle thousands of files in one dir easily, but it's good practice to subdivide the directory by the first couple of letters of the filename.
posted by nicwolff at 9:10 PM on February 13, 2008
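
A minimal sketch of the subdivision suggested above, assuming a two-character prefix; the root directory in the usage note and the function name `sharded_path` are made up for the example.

```python
from pathlib import PurePosixPath

def sharded_path(root: str, filename: str, prefix_len: int = 2) -> str:
    """Place a file under a subdirectory named after the first letters of its name."""
    prefix = filename[:prefix_len].lower()
    return str(PurePosixPath(root) / prefix / filename)
```

With these assumptions, `sharded_path("/var/www/images", "3fa9c1.jpg")` gives `/var/www/images/3f/3fa9c1.jpg`. This spreads files evenly only when the names themselves are evenly distributed, as randomly generated names would be.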


This thread is closed to new comments.