Scanning - how much?
May 16, 2011 11:02 AM

I have been asked to provide a quote for digitizing a bunch of documents. What is the going rate for this type of work? More details inside.

About 1500 pages, organized as newsletters in 5-page groups. Basically, I'll have to manually scan each one, as the auto scanner I have access to will only accept individual 8.5 x 11 pages. They want PDFs.

Straightforward request, right? I simply do a test run, time it, and figure out how long it will take to do the entire job. But how do I calculate what to charge? Hourly? How much per hour? Or base fee?

They are also interested in searchability. Is it worth buying OCR software? In my experience, it is not very reliable, although my exposure has been limited. How do we tackle searchability, & how much would I expect to pay someone to create a database that could handle search requests across the entire lot?
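
For a sense of scale on the "database" part: once OCR has produced plain text for each page, a basic keyword search across the whole lot is a small job. Below is a minimal sketch, not a recommendation of any product. The page ids and sample text are made up for illustration, and it assumes the OCR text is already available as strings.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the sorted page ids that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical OCR output, keyed by an invented "issue-page" id.
pages = {
    "1987-03-p1": "Spring meeting minutes and budget report",
    "1987-03-p2": "Budget approved for the new library wing",
    "1987-04-p1": "Library wing construction begins",
}
index = build_index(pages)
print(search(index, "library wing"))
```

Searchable PDFs already carry this text layer, so many PDF readers can search a single file without any of this; the sketch only matters for searching across all the files at once.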

Is anyone doing this type of work and can provide some insight?

This is not the usual type of archiving with which I am involved :)

I do see quite a few "how much do I charge?" AskMe's, so sorry in advance if I missed a similar question. I did already read this one.

Let me know if I can provide any additional detail and thanks!
posted by archivist to Technology (6 answers total) 3 users marked this as a favorite
You can get yourself a Fujitsu ScanSnap; it will save you a lot of time. As for the OCR, I think the ScanSnap comes bundled with OCR software. Also, what I find convenient for OCR'ing is Adobe Acrobat. It will straighten the image automatically and create indexable/searchable PDFs. Charging hourly is best :-)
posted by bbxx at 11:16 AM on May 16, 2011 [1 favorite]

How much is your time worth? $10/hr, $15/hr, $20/hr??

Figure out how much you can scan and OCR in an hour, then do the math. A bulk feed scanner should be able to handle paper copies quickly unless they have to be handled specially. Also add in the time for your scanner or specialized software such as Adobe Acrobat Pro. It sounds like the organization of the documents will be the most time-consuming.
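
A rough sketch of that math, for anyone who wants it spelled out. The test-run numbers (50 pages in 25 minutes) are invented for illustration, the $20/hr is just one of the rates floated above, and the `overhead` parameter is a hypothetical place to put software or equipment costs.

```python
def hourly_quote(total_pages, sample_pages, sample_minutes, hourly_rate, overhead=0.0):
    """Extrapolate a timed test run to the whole job and price it by the hour."""
    minutes_per_page = sample_minutes / sample_pages
    hours = total_pages * minutes_per_page / 60
    return hours, hours * hourly_rate + overhead

# Assumed test run: 50 pages scanned and OCR'd in 25 minutes.
hours, price = hourly_quote(1500, 50, 25, 20.0)
print(f"{hours:.1f} hours, ${price:.2f}")
```

The estimate is only as good as the test run, so the sample should include the slow parts too (unstapling, sorting, naming files), not just feeding pages.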

Depending on the quality of the documents, OCR can do a pretty good job. I use ABBYY and have good success but still have to go through and proofread for accuracy and make minor corrections.
posted by JJ86 at 12:10 PM on May 16, 2011

We do this type of work, but on high-volume machinery and software. Our main scanner does 60 sheets per minute, both sides scanned in one pass, far faster than any retail auto-feed scanner.

Going rate for straight scanning-to-image for only 1,500 pages would probably be around $0.20 per delivered image; OCR would probably add a few cents per page -- but if we have to verify or correct for OCR's failings it's much more expensive (like more than double). We bill by the image mostly because it's easy for the customer to verify that they're being billed properly, since they can count the images, too. Billing by the page doesn't work so well if there's a wide variance of time involved for things like bad originals or difficult-to-read images that need tweaking or corrections.
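
As a sketch of how those per-image numbers add up: the $0.20 is the figure above, while the 3 cents for OCR stands in for "a few cents" and the 2.0 multiplier is the low end of "more than double", both assumptions for illustration. It also assumes one delivered image per page.

```python
def per_image_quote(images, scan_cents=20, ocr_cents=0, verify_multiplier=1.0):
    """Price a job by delivered image and return the total in dollars."""
    per_image_cents = (scan_cents + ocr_cents) * verify_multiplier
    return images * per_image_cents / 100

scan_only = per_image_quote(1500)
with_ocr = per_image_quote(1500, ocr_cents=3)  # assumed OCR surcharge
verified = per_image_quote(1500, ocr_cents=3, verify_multiplier=2.0)  # assumed floor
print(scan_only, with_ocr, verified)
```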
posted by AzraelBrown at 12:24 PM on May 16, 2011

Send your stuff to AzraelBrown, pay him or her the $300, charge the client $750, and go home.
posted by DarlingBri at 1:00 PM on May 16, 2011 [5 favorites]

Best answer: Definitely agree that you should outsource the actual doc capture step, if possible, to someone with the right equipment and experience, who does this all day. Get a quote from a few sources and build your quote to the client up from there, using a fair hourly rate for yourself plus a decent profit margin.
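
A minimal sketch of building the quote up that way. All the numbers here are placeholders: the vendor quotes, the hours of your own time, and the 25% margin are assumptions, and picking the cheapest vendor is just one possible choice.

```python
def client_quote(vendor_quotes, own_hours, hourly_rate, margin=0.25):
    """Build a client price from the lowest vendor quote, your own time, and a margin."""
    cost = min(vendor_quotes) + own_hours * hourly_rate
    return round(cost * (1 + margin), 2)

# Hypothetical vendor quotes, plus 6 hours of your own time at $20/hr.
quote = client_quote([300.0, 360.0, 420.0], own_hours=6, hourly_rate=20.0)
print(quote)
```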

Are these 8.5"x11" sheets stapled together (or bound in some other way) into 5-page groups, or are they bigger sheets, like 11x17s, that have been folded?

If it's the latter, and you're not allowed to cut the sheets down, things might get a lot more interesting. While I'm sure that 11x17 high-speed sheet-fed scanners exist, I've never seen one. Most of the true production scanners (1 sheet/sec or faster), which are really what you want, are letter or A4 machines. If you can't cut the sheets, I think your cost to scan 11x17 is going to be much higher than letter. Hopefully that is not a requirement. Also, you'd then have to deal with cutting and reassembling the pages into the correct order digitally. To be honest, if you can't cut the sheets then I might think about treating them like books and doing them on a copystand-type device, but I think there you're talking about something like $1/page because it's much more time-consuming than using an ADF scanner.

Anyway, even if they are just letter sheets you're going to have to "prep" them before scanning. Maybe the person you contract with to do the actual scanning will do this, or maybe it'll fall on you; that's something to negotiate. All staples will need to be removed, you'll need to somehow break up the stack into documents (e.g. with barcoded index pages) so that the scanner will know how to divide up the output, make sure that pages aren't stuck together or misoriented, etc. The time to do this is frequently longer than the time to do the actual scanning, and it's not a lot of fun. (At least it doesn't look like a lot of fun to me; YMMV.)

It's definitely a tractable project though; 1500 pages is not that much once you move up from consumer equipment and software. You just don't want to buy it for one project.
posted by Kadin2048 at 11:51 PM on May 16, 2011

Best answer: Thanks for the vote of confidence, DarlingBri, but the question was how much people charge, not who's available to do it :)

I had a thought after the fact, which I almost just sent to archivist directly, but felt I should share here for everybody's benefit:

If you're going to go the outsourced route, looking at what's being asked for, it's essentially converting a hard copy to a searchable PDF - pretty much doing what Project Gutenberg and Google Books do. This falls into the category of e-book conversion, and there are a lot of companies in that line of business these days. Most advertise that they'll work off a writer's manuscript, so while their regular business model might not be exactly the same, digitizing the originals and returning an accurate text file of the contents would seem to be right up their alley.
posted by AzraelBrown at 4:32 AM on May 17, 2011
