Looking to go paperless but afraid to shred everything.
April 10, 2009 1:24 AM

I'm looking to digitise all of my documents, or as many as possible, but I'm slightly wary about shredding everything. Which sorts of documents should I keep original paper copies of, and which can I digitise? I ask because I don't want to end up in a sticky legal situation where I would need the original paper copy.

I believe it should be fully legal to use a digital copy of a once-physical document, but I could be wrong. I need some advice on this because I really want to buy a ScanSnap scanner from Fujitsu, but I'm just not sure yet.

Also, on a side note: these scanners convert the scans directly into PDFs. Is there a good search tool for Linux to help find text in documents, or am I going to have to settle for my winblows box?
posted by Chamunks to Technology (6 answers total) 6 users marked this as a favorite
 
Documents: What to Keep - What to Store - What to Shred
posted by sharkfu at 1:36 AM on April 10, 2009 [8 favorites]


This is a project I've been trying to do myself. One thing I noticed is that scan-only devices with automatic document feeders are more expensive than multi-function devices with automatic document feeders. My experience with technology says that neither will be more reliable than the other.

Legalities: you are probably right in most cases. The physical document is usually just a reflection of an agreement made. If both parties continue to agree, then you have no issue with using a copy. But if that's the case, you really wouldn't ever need the paper. You need an original document when you have to prove that some agreement was made and the other party is giving you grief.

Obviously, you need to keep real copies of the Big Important Documents like those listed in sharkfu's link. And silly stuff like old phone bills, you don't. But for the marginal stuff, like the receipt for the garage door opener with the 10-year warranty, it's a crapshoot. Those marginal cases I think you need to look at on a case-by-case basis and envision what the worst-case scenario would be. Is the potential loss worth the effort?

What I plan to do is scan the marginal stuff, but also keep the paper. But I plan to keep that paper in the most convenient way: in a box, in the basement. Or out in the garage. Or at my parents' house. I could get to it if I absolutely had to, but otherwise, I don't have to deal with it at all.

I look at it like this: If you were in court, having to prove when you bought the garage door opener, which would be more credible: "I scanned the receipt and threw away the original" or "I lost the original in a basement flood, but I have a copy"?
posted by gjc at 5:31 AM on April 10, 2009


If you're scanning documents, and they're going directly to PDF, you may not be able to search for text inside them. Scanners record pixels, not words, and the contents of the PDF are probably just JPEG images of your pages. OCR is required to actually interpret text and save it in a way that can be parsed or searched. OCR is good, but not great, so it takes work to verify it was correct.

As for scanners: as price goes up, speed is the biggest improvement. Consumer-grade, off-the-shelf scanners maybe do one or two pages a minute; high-volume production scanners can do a couple a second. If you've got thousands of pages, take that into consideration: you'll be spending a lot of time watching the scanner feed.

As far as keeping documents: the link above is a good one, but you definitely need to keep hard copies of certain documents. The company I work for digitizes documents, and we do sometimes destroy stuff at the customer's behest after it is scanned, but most of the time the customer (bank/county/hospital) gets the documents back with the intention of putting the originals in cold storage. Digital documents can be altered or deleted in their entirety, so you'll always need a hard copy of certain things for undeniable proof. (One exception is faxes: for some strange reason, a fax stands up as closer to an original than other scanned documents. Not sure why.)
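
To put rough numbers on the scanner-speed point, here is a back-of-the-envelope sketch in Python. The 5,000-page pile is a made-up figure for illustration, and the rates are just the ballpark ones mentioned above (about two pages a minute for a consumer scanner, about two pages a second for a production one), not measured specs.

```python
def scan_hours(pages: int, pages_per_minute: float) -> float:
    """Hours of pure feed time, ignoring jams, reloading, and rescans."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages / pages_per_minute / 60


PAGES = 5000  # hypothetical pile size, not a figure from this thread

consumer = scan_hours(PAGES, 2)         # roughly 2 pages per minute
production = scan_hours(PAGES, 2 * 60)  # roughly 2 pages per second

print(f"consumer:   {consumer:.1f} hours")
print(f"production: {production * 60:.0f} minutes")
```

Under those assumptions the consumer scanner needs roughly 42 hours of feed time and the production one about 42 minutes, before counting any time spent checking OCR.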
posted by AzraelBrown at 7:37 AM on April 10, 2009


If it is important, don't shred it. It is important to remember that paper documents have an archival quality that electronic documents don't. Although electronic documents can sometimes be easier to store, catalog, and access, for posterity's sake paper is the only medium. Scanned documents are the equivalent of copies, not of originals.
posted by JJ86 at 8:07 AM on April 10, 2009


A lot of scanning software will do OCR and store the OCRed text along with the bitmap scan in the PDF. Even imperfect OCR can help a lot with search. I don't know how to search that in Linux, but I wouldn't be surprised if something like Solr could handle it (though there is probably something lighter weight).
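
As one possible lightweight route on Linux, here is a minimal Python sketch that searches the OCR text layer of every PDF under a folder. It assumes the `pdftotext` command from poppler-utils is installed and on the PATH; the function names and the "scans" folder are made up for illustration. A PDF that holds only page images, with no OCR layer, will simply produce no hits.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterator


def pdftotext_extract(pdf: Path) -> str:
    """Return a PDF's text layer using poppler's pdftotext ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def search_pdfs(
    root: Path,
    term: str,
    extract: Callable[[Path], str] = pdftotext_extract,
) -> Iterator[tuple[Path, str]]:
    """Yield (file, matching line) for each case-insensitive hit under root."""
    needle = term.lower()
    for pdf in sorted(root.rglob("*.pdf")):
        for line in extract(pdf).splitlines():
            if needle in line.lower():
                yield pdf, line.strip()


# Example (assumes a folder named "scans" and pdftotext on the PATH):
# for path, line in search_pdfs(Path("scans"), "warranty"):
#     print(f"{path}: {line}")
```

The extractor is passed in as a parameter so a different text-extraction tool could be swapped in without touching the search loop. This re-reads every file on each search, which is fine for a personal archive but is the part a real indexer would replace.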
posted by Good Brain at 10:03 AM on April 10, 2009


I'm in the process of doing this myself, too, using the high-end MFP at work. Make sure the scanner you use can do automatic OCR; that'll save you a step. I have an automated Acrobat script that OCRs the docs in a batch, but it requires the full version of Acrobat (not Reader). I'm sure there are tools for Linux that can do the same. Beyond that, the biggest time investment is going to be naming and sorting your digital files; it's at least as much of a hassle as dealing with the paper originals. You can't just grab a stack of paper and sort on the fly; you have to view the files, check that they scanned correctly, name them, edit any metadata for searching later, and put them in the correct folder. It's really kind of a major project.
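
The naming-and-filing step can be partly scripted. Here is a rough Python sketch, assuming a date_sender_description naming scheme and a category/year folder layout; both conventions, the function names, and the "Acme Doors" example are hypothetical choices for illustration, not anything a scanner or Acrobat imposes. You still have to look at each scan to know what to call it.

```python
import re
import shutil
from datetime import date
from pathlib import Path


def scan_filename(doc_date: date, sender: str, description: str) -> str:
    """Build a sortable name like 2009-04-10_acme-doors_opener-receipt.pdf."""
    def slug(text: str) -> str:
        # Lowercase, then collapse every run of non-alphanumerics to one "-".
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{doc_date.isoformat()}_{slug(sender)}_{slug(description)}.pdf"


def file_scan(scan: Path, archive: Path, category: str,
              doc_date: date, sender: str, description: str) -> Path:
    """Move a scan into archive/<category>/<year>/ under its new name."""
    folder = archive / category / str(doc_date.year)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / scan_filename(doc_date, sender, description)
    if target.exists():
        # Never silently replace an earlier scan with the same name.
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.move(str(scan), str(target))
    return target
```

Putting the ISO date first means a plain alphabetical listing is also a chronological one, which saves some of the metadata editing.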

As far as originals not to discard, the list linked by sharkfu is a good guide. Anything that could conceivably be entered as evidence in a legal proceeding (of real financial or personal consequence) should be kept as an original document with a digital copy. (Actually, anything with a foil seal or watermark should be kept; unless they have anti-copy marks, scanned copies can be pretty difficult to distinguish from originals.)

You might look at document organizing software such as DEVONThink or similar. I believe the pro version of DT has integrated PDF OCR and sorting features.
posted by Chris4d at 12:03 PM on April 10, 2009

