index.html: the internet's best directory browsing resources and how to find them.
February 7, 2012 12:45 PM
Open Directories: Occasionally I'll come across an index.html treasure-trove. What are the best and most interesting directory indexes of public-facing (by design or by oversight) public-domain archival materials? How does one find more of them?
My chief interest with this question is finding open directories of large, archival-quality historical maps (such as this one from U of Wisconsin-Milwaukee); but I'm always excited when I come across fantastic directories of other materials (images/pdfs/shapefiles) as well.
Reddit's /r/OpenDirectories is a good resource, but it's a little too warez/porn heavy, and my only interest is in public domain archival materials.
How does one find these? I've had luck with just finding where the image lives, and then looking for an "index.html" in that directory. Otherwise, I've had mixed results by kludging together some tailored Google searches, such as:
- “index of /map OR maps” jp2 OR j2k OR tiff OR tif -html -htm -download -links
- “index of /” sid OR jp2 OR j2k OR tiff OR tif map OR maps -html -htm -download -links
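The "find where the image lives, then look for an index" approach can be sketched roughly as below. This is only an illustration: the function name `parent_directories` and the example.edu URL are made up, and the code just lists the directory URLs above a file so you can try each one in a browser; it does not fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit

def parent_directories(file_url):
    """Return the directory URLs above a file, deepest first."""
    parts = urlsplit(file_url)
    # Drop the file name and keep only the directory segments.
    segments = [s for s in parts.path.split("/") if s][:-1]
    candidates = []
    while segments:
        path = "/" + "/".join(segments) + "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments.pop()
    # Finish with the site root.
    candidates.append(urlunsplit((parts.scheme, parts.netloc, "/", "", "")))
    return candidates

# Hypothetical URL, for illustration only.
for url in parent_directories("http://example.edu/collections/maps/sheet01.jp2"):
    print(url)
```

Any of the printed URLs may or may not serve a directory listing; that depends entirely on how the server is configured.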
Common file types I am looking for are MrSID [.sid]; JPEG 2000 [.jp2, .j2k, .jpf, .jpx, .jpm, .mj2]; TIFF [.tif, .tiff]; and, to a much lesser extent, DjVu [.djvu, .djv]. If you know of any other types associated with large images, I'd welcome the input.
Feel free to MeMail me if you don't want to post publicly. I will respect your wishes regarding disclosure or non-disclosure of any resource.
Some of the stuff on Fravia's site may still be relevant, such as this.
posted by wilko at 1:43 PM on February 7, 2012
Would directories of digital libraries point you in the right direction?
Something like http://babel.hathitrust.org/cgi/mb?a=listcs;colltype=pub#all
or http://en.wikipedia.org/wiki/List_of_digital_library_projects
or http://archiveshub.ac.uk/index.html
or http://www.oclc.org/contentdm/collections/default.htm
posted by woodman at 3:05 PM on February 7, 2012
Do you know about the filetype: operator in Google?
For example, entering site:lib.umich.edu filetype:sid gets you 1400+ MrSID images, mostly plants, from their archives. Since Google doesn't always index every single page, you can sometimes find more, similar pages by following links from the results.
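If you want to run that kind of search over several extensions, a small helper along these lines might save some typing. It is a sketch only: `filetype_query` and `search_url` are names invented here, and the code simply builds the query string and an ordinary Google search URL for you to open by hand.

```python
from urllib.parse import urlencode

def filetype_query(site, extension):
    """Build a query that combines the site: and filetype: operators."""
    return f"site:{site} filetype:{extension}"

def search_url(query):
    """Turn a query string into a URL-encoded Google search link."""
    return "https://www.google.com/search?" + urlencode({"q": query})

# Extensions taken from the question; swap in whichever site you like.
for ext in ["sid", "jp2", "tif"]:
    query = filetype_query("lib.umich.edu", ext)
    print(query, "->", search_url(query))
```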
posted by rockindata at 3:56 PM on February 7, 2012
The way I did this a few years ago, very successfully: I used the localized files on an Apache web server to extract the text that appears at the top of a directory index page in many languages (something like "index of" or "directory index"; I forget exactly what it was). Then I did a bunch of Google searches with those phrases. I was looking for images, so I used the "linked images" bookmarklet a lot.
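A rough sketch of the "bunch of searches" step, assuming you have already collected the heading phrases yourself. Only the two phrases recalled above are listed; any localized variants would have to be added by hand, and `index_queries` is just an illustrative name.

```python
from itertools import product

# The two phrases mentioned above; add localized variants as you find them.
title_phrases = ["index of", "directory index"]
# File extensions from the original question.
extensions = ["sid", "jp2", "tif"]

def index_queries(phrases, exts):
    """Pair every quoted heading phrase with every extension."""
    return [f'"{phrase}" {ext}' for phrase, ext in product(phrases, exts)]

for query in index_queries(title_phrases, extensions):
    print(query)
```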
posted by Infernarl at 5:47 PM on February 7, 2012
If you combine a site: search with the intitle: operator, and add in tif or sid, etc., you can get some excellent results.
site:berkeley.edu intitle:"index of" tif
is excellent. 100 MB original scans of archival photos, anyone?
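Once a search like that turns up a listing, something like the following could pull out just the big image files. It is a minimal sketch under a few assumptions: the page uses the usual "Index of ..." title, the sample HTML is invented for illustration, and `IndexLinkParser` and `large_image_links` are hypothetical names. It works on HTML you have already saved and does no downloading.

```python
from html.parser import HTMLParser

class IndexLinkParser(HTMLParser):
    """Collect the page title and every href from a directory listing."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def large_image_links(html, extensions=(".tif", ".tiff", ".sid", ".jp2")):
    """Return links with the given extensions, but only for 'Index of' pages."""
    parser = IndexLinkParser()
    parser.feed(html)
    if not parser.title.strip().lower().startswith("index of"):
        return []
    return [href for href in parser.links if href.lower().endswith(extensions)]

# Invented sample listing, for illustration only.
sample = """<html><head><title>Index of /scans</title></head><body>
<a href="../">Parent Directory</a>
<a href="photo_001.tif">photo_001.tif</a>
<a href="notes.txt">notes.txt</a>
<a href="MAP_02.SID">MAP_02.SID</a>
</body></html>"""

print(large_image_links(sample))
```

The extension check is case-insensitive, since file names in these directories are often upper case.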
Oooh, fun fact: if you start being too clever with the operators, Google will challenge you as to your humanness! I just had to fill out captchas to see my search results.
posted by rockindata at 6:35 PM on February 7, 2012 [1 favorite]
This thread is closed to new comments.
posted by fings at 1:06 PM on February 7, 2012