Online folder of searchable pdfs?
December 7, 2021 1:28 PM

I would like people to be able to search the contents of PDFs in an online folder. It is proving surprisingly difficult to find a way to do this.

There are 40 PDFs; they have been OCRed and average 10-15MB each. I don't want people to have to sign into a service to access or search the PDFs. Google/Dropbox/ownCloud have publicly available folders, but there's no search function. There are enterprise-level options like dtSearch, but they are crazy expensive. Expertrec has a $9/month service that does what I'm looking for, but surely there's a free way to do this?

Google's Programmable Search Engine seems like the perfect tool, but I can't get it to search PDFs. I've tried restricting the search to direct links to the PDFs (including direct links to the files on Dropbox, Google Drive and Google Cloud), restricting it to a Google Cloud directory that contains the PDFs, and adding "filetype:pdf" as a default query addition. Nothing seems to work.

I host my own websites, so I guess I could take a roll-your-own approach with something like Algolia/Firebase/PDFTron or, god forbid, something that uses a database, but it just seems like there should be a simple tool that does this?
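
For scale: with only 40 OCRed PDFs, a roll-your-own search may not need a database at all. Below is a minimal sketch, assuming the text of each PDF has already been extracted to a .txt file in one directory. The function names and the directory layout are hypothetical and do not come from any of the tools mentioned above.

```python
from pathlib import Path


def load_pages(text_dir):
    """Read every .txt file in text_dir into a {filename: text} dict."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(text_dir).glob("*.txt"))
    }


def search(pages, query, context=40):
    """Return (filename, snippet) pairs for files containing every query word.

    Matching is a case-insensitive substring test, which is fine for a
    few dozen documents held in memory but is not a ranked search.
    """
    words = query.lower().split()
    if not words:
        return []
    hits = []
    for name, text in pages.items():
        lowered = text.lower()
        if all(word in lowered for word in words):
            # Build a short snippet around the first query word.
            start = max(lowered.find(words[0]) - context, 0)
            end = start + 2 * context + len(words[0])
            snippet = " ".join(text[start:end].split())
            hits.append((name, snippet))
    return hits
```

A small web form on a self-hosted site could call `search(load_pages("extracted_text"), user_query)` and link each hit back to the matching PDF; the directory name there is likewise an assumption.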
posted by not_the_water to Technology (5 answers total) 3 users marked this as a favorite
 
I think you can do this in Zotero.
posted by kbuxton at 3:02 PM on December 7, 2021


Well, my PDFs were on Google Drive, and I've been migrating away from the evil Google to Dropbox. In both situations I have for years been able to search native or OCRed PDFs using Agent Ransack. It supports regex for the file name and/or the file contents, shows previews of hits, is very fast, and I still use the free version (I guess I'm cheap).

To be clear, I am searching the landing zone of the cloud service on my Windows machine (I didn't notice you mention an operating system).
posted by forthright at 3:32 PM on December 7, 2021


Response by poster: My question is about an online search tool that multiple people could use. Zotero and Agent Ransack are good ideas but they're local options used by a single user on their own machine.
posted by not_the_water at 4:43 PM on December 7, 2021 [1 favorite]


Programmable Search Engine should work if you convert the PDFs to another file format, such as text or HTML.
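
A minimal sketch of that conversion, assuming poppler's `pdftotext` command-line tool is installed on the machine doing the conversion. The function names, the folder layout, and the `base_url` parameter are hypothetical, and whether Programmable Search Engine then indexes the resulting pages is not something this sketch can guarantee.

```python
import html
import subprocess
from pathlib import Path
from urllib.parse import quote


def pdf_to_text(pdf_path):
    """Extract the text layer with pdftotext; '-' sends the output to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def text_to_html(title, body_text, pdf_url):
    """Wrap extracted text in a minimal HTML page that links back to the PDF."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en"><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n"
        f'<p><a href="{html.escape(pdf_url, quote=True)}">Original PDF</a></p>\n'
        f"<pre>{html.escape(body_text)}</pre></body></html>\n"
    )


def convert_folder(pdf_dir, out_dir, base_url):
    """Write one HTML page per PDF; base_url is where the PDFs are hosted."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        pdf_url = f"{base_url.rstrip('/')}/{quote(pdf.name)}"
        page = text_to_html(pdf.stem, pdf_to_text(pdf), pdf_url)
        (out / f"{pdf.stem}.html").write_text(page, encoding="utf-8")
```

Each generated page carries the full OCR text plus a link to the original file, so a search hit on the HTML page leads the reader back to the PDF.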
posted by soelo at 5:22 PM on December 7, 2021


Not sure if you already have access or what the cost is if you do not, but Microsoft Teams can do this. I just verified on our system, and its search function was entirely accurate and fast.
posted by wile e at 6:18 AM on December 8, 2021

