Help me Tag and Index a Pile of Office Documents
May 31, 2007 5:01 PM
Networked repository for Word, Excel and PowerPoint documents, full-text indexed and tagged. Does it exist?
I'm interning at a small consulting company to earn some cash for grad school. My boss has asked me to research ways to store, full-text-search and retrieve about a thousand MS Office (Word, Excel and PowerPoint) documents that're currently stored in a well-organized (but non-searchable) hierarchy on a Windows server's shared folder. The initial plan was to use a Lotus Notes database (see my last AskMe), but for a variety of reasons that isn't moving forward as fast as he'd like.
The documents each relate to a project the company has worked on. Ideally, I'd like to be able to index the projects such that I can find all documents attached to the "foo" project, or all documents attached to projects for client "bar," as well as full-text search for documents containing "baz." If that's not achievable, I'm looking for customizable metadata fields, but just a general "tags" field or even straight full-text indexing would work if that's all that's out there.
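To make those three lookups concrete, here is a toy in-memory sketch in Python. Everything in it is hypothetical: the `DocIndex` class, its method names, and the file paths are made up for illustration, and plain strings stand in for text that would have to be extracted from the actual Office files (the sketch does not read .doc, .xls or .ppt at all). It only shows the shape of the queries, not a real solution.

```python
import re
from collections import defaultdict


class DocIndex:
    """Toy index: metadata lookups by project/client plus a word-level inverted index."""

    def __init__(self):
        self.by_project = defaultdict(set)  # project name -> document paths
        self.by_client = defaultdict(set)   # client name  -> document paths
        self.by_word = defaultdict(set)     # lowercased word -> document paths

    def add(self, path, project, client, text):
        # Record the metadata fields, then index every word of the (pre-extracted) text.
        self.by_project[project].add(path)
        self.by_client[client].add(path)
        for word in re.findall(r"\w+", text.lower()):
            self.by_word[word].add(path)

    def for_project(self, project):
        return sorted(self.by_project.get(project, ()))

    def for_client(self, client):
        return sorted(self.by_client.get(client, ()))

    def containing(self, word):
        return sorted(self.by_word.get(word.lower(), ()))


# Hypothetical sample data; the text argument stands in for extracted document content.
index = DocIndex()
index.add("foo/proposal.doc", project="foo", client="bar", text="Baz rollout plan")
index.add("foo/budget.xls", project="foo", client="bar", text="Quarterly budget")
index.add("qux/report.ppt", project="qux", client="other", text="Summary of baz findings")

print(index.for_project("foo"))
print(index.for_client("bar"))
print(index.containing("baz"))
```

Custom metadata fields or a general "tags" field would just be more dictionaries of the same kind; the hard part in practice is extracting text from the Office formats and keeping the index in sync with the shared folder, which is what the off-the-shelf tools handle.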
The boss has suggested Google Desktop Search and either copying the archive to each machine or indexing the shared folder on each machine, but I'm convinced there has to be a more elegant solution. Unfortunately, whatever the solution is, it needs to work in a Windows-only software ecosystem and be relatively inexpensive. I'd love to roll my own PHP/SQL or Ruby-based solution, but he wants rapid turnaround and I don't have the skills to do this fast enough, sadly.
Any ideas what's out there, MeFites?
I concur. dtSearch is a $200 package, very powerful but probably too pricey for a small lot of 1,000 documents.
Archivarius is another very capable indexed search program, and if I recall correctly the cost is about thirty bucks. It does not handle the wide variety of files that dtSearch does, but it should handle the MS files you mention.
posted by yclipse at 7:32 PM on May 31, 2007
In order of price AFAIK (dtSearch aside as I don't know much about it):
Google Mini - 5K and up
SharePoint - licensing costs vary, requires Windows Server 2003 I believe
Swish-E - free as in speech
posted by datacenter refugee at 9:33 PM on May 31, 2007
Seconding (strongly seconding) SharePoint. WSS (Windows SharePoint Services v3.0, the core platform/product) is free, and yes, it requires Windows Server 2003. The "enterprise" product, Microsoft Office SharePoint Server 2007, is overkill for your needs.
Then, you can use custom fields in your document libraries to "tag" or categorize content, set permissions, even trigger workflow actions when a file is uploaded or modified, such as "send to John for approval, then if it's approved, copy the file to this other site."
posted by Merdryn at 8:35 AM on June 1, 2007
posted by trip and a half at 6:19 PM on May 31, 2007