index.html: the internet's best directory browsing resources and how to find them.
February 7, 2012 12:45 PM

Open Directories: Occasionally I'll come across an index.html treasure-trove. What are the best and most interesting directory indexes of public-facing (by design or by oversight) public-domain archival materials? How does one find more of them?

My chief interest with this question is finding open directories of large, archival-quality historical maps (such as this one from U of Wisconsin-Milwaukee); but I'm always excited when I come across fantastic directories of other materials (images/pdfs/shapefiles) as well.

Reddit's /r/OpenDirectories is a good resource, but it's a little too warez/porn heavy, and my only interest is in public domain archival materials.

How does one find these? I've had luck with just finding where the image lives, and then looking for an "index.html". Otherwise, I've had mixed results by kludging together some tailored Google searches, such as: ...but I'm sure that these can be improved upon. I arrived at them by trial and error, and it's frustrating to try to discern the effect of minor changes to the formula.
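The "find where the image lives, then look one level up" approach can be sketched in a few lines of Python. This is only an illustration: the function name `parent_directories` and the example.org URL are hypothetical, and it assumes the URL points at a file rather than a directory. It only lists candidate directory URLs; whether any of them actually serves an index is up to the server.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directories(file_url):
    """Return the directory URLs above a file URL, deepest first.

    Assumes file_url points at a file; the last path segment is dropped.
    """
    parts = urlsplit(file_url)
    # Keep only the directory segments (drop empty pieces and the file name).
    segments = [s for s in parts.path.split("/") if s][:-1]
    candidates = []
    while segments:
        path = "/" + "/".join(segments) + "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments.pop()
    # Finally, the site root itself.
    candidates.append(urlunsplit((parts.scheme, parts.netloc, "/", "", "")))
    return candidates


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    for url in parent_directories("http://example.org/maps/scans/sheet1.tif"):
        print(url)
```

Each printed URL is a place to try in the browser, starting with the deepest directory.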

Common file types I am looking for are MrSID [.sid]; JPEG 2000 [.jp2, .j2k, .jpf, .jpx, .jpm, .mj2]; TIFF [.tif, .tiff]; and to a much lesser extent DjVu [.djvu, .djv]. If you know of any other types associated with large images, I'd welcome the input.

Feel free to MeMail me if you don't want to post publicly. I will respect your wishes regarding disclosure or non-disclosure of any resource.
posted by jjjjjjjijjjjjjj to Computers & Internet (8 answers total) 48 users marked this as a favorite
 
Obvious suggestion: don't limit yourself to English. Try the 2nd search but instead of "map OR maps" try "karte OR karten" (German) or "mapa OR mapas" (Spanish).
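A rough sketch of generating the per-language variants, in Python. The asker's actual searches are not shown in the thread, so the `intitle:"index of"` prefix here is an assumed stand-in, and the names `MAP_TERMS` and `localized_queries` are made up for illustration. Only the German and Spanish words come from the suggestion above.

```python
# Map-related search words per language; extend with other languages as needed.
MAP_TERMS = {
    "English": ["map", "maps"],
    "German": ["karte", "karten"],
    "Spanish": ["mapa", "mapas"],
}


def localized_queries(prefix='intitle:"index of"'):
    """Build one query string per language.

    The prefix is an assumption, not the asker's original search.
    """
    return {
        language: f"{prefix} {' OR '.join(words)}"
        for language, words in MAP_TERMS.items()
    }


if __name__ == "__main__":
    for language, query in localized_queries().items():
        print(f"{language}: {query}")
```

Pasting each resulting string into the search box gives the same search with the map words swapped out.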
posted by fings at 1:06 PM on February 7, 2012


Some of the stuff on Fravia's site may still be relevant, such as this.
posted by wilko at 1:43 PM on February 7, 2012




Do you know about the filetype: operator in Google?

For example, entering site:lib.umich.edu filetype:sid gets you 1400+ MrSID images, mostly plants, from their archives. Since Google doesn't always index every single page, you can sometimes find more, similar pages by following links.
posted by rockindata at 3:56 PM on February 7, 2012


There's the intitle: operator too.
posted by holloway at 4:25 PM on February 7, 2012


Let's try that again, with encoding
posted by holloway at 4:27 PM on February 7, 2012


The way I did this very successfully a few years ago: I used the localized files on an Apache web server to extract the text at the top of a directory index page in many languages ("index of", "directory index"; I forget what it was exactly). Then I ran a bunch of Google searches. I was looking for images, so I used the "linked images" bookmarklet a lot.
posted by Infernarl at 5:47 PM on February 7, 2012


If you combine a site search with intitle: and add in tif, sid, etc., you can get some excellent results.

site:berkeley.edu intitle:"index of" tif

is excellent. 100 MB original scans of archival photos, anyone?

Oooh, fun fact: if you start being too clever with the operators, Google will challenge you as to your humanness! I just had to fill out captchas to see my search results.
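For anyone who wants to work through several file types, here is a small Python sketch that writes out the site/intitle combination above, plus the filetype: variant, for a list of extensions. The function names and the extension list are illustrative choices. It only builds the query strings and does not send anything to Google, so it won't trigger the captchas by itself.

```python
# Extensions taken from the file types named in the question; trim or extend as needed.
EXTENSIONS = ["sid", "jp2", "tif", "tiff", "djvu"]


def directory_queries(site, extensions=EXTENSIONS):
    """Queries aimed at directory listings that mention a given extension."""
    return [f'site:{site} intitle:"index of" {ext}' for ext in extensions]


def filetype_queries(site, extensions=EXTENSIONS):
    """Queries using the filetype: operator for the same extensions."""
    return [f"site:{site} filetype:{ext}" for ext in extensions]


if __name__ == "__main__":
    for query in directory_queries("berkeley.edu"):
        print(query)
    for query in filetype_queries("lib.umich.edu"):
        print(query)
```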
posted by rockindata at 6:35 PM on February 7, 2012 [1 favorite]

