I work for a small non-profit organization that is drowning in old paperwork, and we’d like to scan it into a searchable electronic archive. Our budget is limited.
I’m looking for an inexpensive, no-frills way to create such an electronic archive. We have a fairly fast, modern multi-function device that can scan the documents into PDF files and put them right on the file server. The challenge, though, is to implement some efficient system to retrieve any given document from the archive.
I was excited to find an open-source (free) program called Mayan EDMS. However, I had two different people test it independently, and both found it to be buggy and difficult to work with. The support forum wasn’t much help. In any case, Mayan is probably overkill for our needs. I’m not worried about creating different users, with different levels of security permissions. I’m also not overly concerned about seeing little thumbnails of the documents.
To provide some additional background, our most important document trove consists of roughly 20,000 packets of about 10-20 pages each, some of which have hand-written notes on them. Each packet has these critical pieces of information that would have to be searchable: two ID numbers, and a person’s name and contact information. It wouldn’t be too difficult for me to create an automated way to print a cover sheet containing that information in a fixed location on the page. The cover sheet could be added as the first page of the packet. The EDMS would then need to perform reasonably reliable OCR (optical character recognition) on the cover sheet to capture that data and associate it with the PDF file. Finally, there would have to be a way to search the archive for those pieces of information.
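To make the cover-sheet idea concrete, here is a minimal Python sketch of the "capture and search" part, using only the standard library. It assumes the OCR step has already been done by some separate tool and has produced plain text for the cover page; the field labels (`ID A:`, `ID B:`, `Name:`, `Contact:`), the table layout, and all function names are hypothetical and only illustrate one way this might work.

```python
import re
import sqlite3

# Hypothetical cover-sheet layout: one labelled field per line.
FIELD_PATTERNS = {
    "id_a": re.compile(r"^ID A:[ \t]*(\S+)", re.MULTILINE),
    "id_b": re.compile(r"^ID B:[ \t]*(\S+)", re.MULTILINE),
    "name": re.compile(r"^Name:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
    "contact": re.compile(r"^Contact:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
}


def parse_cover_sheet(text):
    """Pull the labelled fields out of the OCR text of a cover sheet.

    Fields that cannot be found are returned as None so that a person
    can review those packets by hand.
    """
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[key] = match.group(1) if match else None
    return fields


def open_index(path=":memory:"):
    """Open (or create) the SQLite index that maps fields to PDF paths."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packets ("
        "pdf_path TEXT PRIMARY KEY, id_a TEXT, id_b TEXT, "
        "name TEXT, contact TEXT)"
    )
    return conn


def add_packet(conn, pdf_path, cover_text):
    """Parse one cover sheet and store its fields next to the PDF path."""
    f = parse_cover_sheet(cover_text)
    conn.execute(
        "INSERT OR REPLACE INTO packets VALUES (?, ?, ?, ?, ?)",
        (pdf_path, f["id_a"], f["id_b"], f["name"], f["contact"]),
    )
    conn.commit()


def find_packets(conn, term):
    """Return PDF paths whose ID equals term, or whose name or contact
    information contains term (case-insensitive for ASCII text)."""
    like = f"%{term}%"
    rows = conn.execute(
        "SELECT pdf_path FROM packets "
        "WHERE id_a = ? OR id_b = ? OR name LIKE ? OR contact LIKE ? "
        "ORDER BY pdf_path",
        (term, term, like, like),
    )
    return [row[0] for row in rows]
```

Something like `find_packets(conn, "Doe")` would then return the paths of the matching PDF files on the file server. Passing a file name instead of `":memory:"` to `open_index` would keep the index on disk, and a single SQLite file should cope comfortably with 20,000 rows.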
I solicited bids from two IT consulting companies, but their prices were outrageously high. I have access to a few very tech-savvy volunteers who could devote some time to this project, so I’m not necessarily looking for a pre-packaged solution that would work right out of the box. I do have a little bit of money to spend on the project (around $1,000), and I have an unused, modern server that’s already available to me. I envision the data as being hosted on-site rather than in the cloud.
Any thoughts on this?