What are some ways I can improve a digital library?
June 30, 2008 2:42 PM
I'm in the process of revamping a digital library website and I have a few questions about how to link my resources into both Google Scholar and the academic/archival community at large.
- I know that it's possible to let Google Scholar index the full text of the PDF documents while still keeping them restricted for normal visitors. I want to do this so that Google Scholar can search our documents more thoroughly, while still allowing for a subscription model. I suppose I could offer either IP-based or user-agent-based access to the website, but I know that Google often doesn't look kindly upon websites that serve it different content. Is there a sanctioned way to do this?*
- Assume I want documents on this site to get "linked in" to the rest of the academic world. What are things I can do to make this easier and better? I've already implemented OpenURL, kind of (is it really just as simple as making a page like /resolver?issn=blah&volume=blah&issue=blah&spage=blah ?). What other standards would be good to support/implement?
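To make the resolver question above concrete, here is a minimal sketch of reading and building a key/value link of the kind quoted in the question. The key names (issn, volume, issue, spage) and the /resolver path come from the question; the class name, the method names, and the sample ISSN are hypothetical, and this is not a full implementation of the OpenURL standard.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; the names here are illustrative, not part of any standard.
class OpenUrlSketch {
    // Keys mentioned in the question; a real resolver would likely accept more.
    static final String[] REQUIRED = {"issn", "volume", "issue", "spage"};

    // Split a raw query string into decoded key/value pairs, keeping their order.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // List the citation keys that are absent or blank in a request.
    static List<String> missingKeys(Map<String, String> params) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = params.get(key);
            if (value == null || value.isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Build a link such as /resolver?issn=...&volume=... from a parameter map.
    static String buildResolverLink(String base, Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + joined;
    }

    public static void main(String[] args) {
        // "1234-5678" is a placeholder ISSN, not a real journal.
        Map<String, String> p = parseQuery("issn=1234-5678&volume=12&issue=3&spage=45");
        System.out.println(missingKeys(p));
        System.out.println(buildResolverLink("/resolver", p));
    }
}
```

A resolver page would then look up the document matching those four values and either redirect to it or report which keys were missing.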
If you're a frequent digital library user, I'd also be interested in hearing about features that would make you revisit a digital library on a regular basis. Similarly, if anyone out there has developed a digital library in the past, are there any tools or programs (preferably Java-based) that you might recommend to speed up the document handling process?
More details: this is for a non-profit educational organization that has around 10k (and growing) scholarly (peer-reviewed and published) papers. They're imported in standard PDF format, so thankfully OCR and conversion are not an issue, although I would be really interested in ways to pull out metadata or even things like references and citations.
* Yes, I realize there is a contact page for this. When I submitted a request, I received a response something along the lines of "Currently due to a huge number of requests you won't hear from us, like, ever."
posted by Deathalicious to education (3 comments total)
Barring that, my experience with Google Scholar was that only large academic publishers tend to get their attention. If you drop me a MeMail, I'll see what I can dig up from talking to colleagues who've waded through this before. (I know in at least one case we just allowed the Googlebot user-agent in with no repercussions, although note that all similar methods are easily spoofed.)
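To illustrate why a bare user-agent check is easy to spoof, here is a rough sketch that pairs it with a reverse-then-forward DNS check on the requesting IP. It assumes the crawler's hostnames end in googlebot.com or google.com, which is how Google describes verifying Googlebot. Whether the Google Scholar crawler follows the same pattern is an assumption, and the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical helper; treat the hostname suffixes as an assumption to verify.
class CrawlerCheckSketch {
    // Cheap first filter: anyone can send this header, so it proves nothing alone.
    static boolean claimsGooglebot(String userAgent) {
        return userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains("googlebot");
    }

    // True when a reverse-DNS name sits under one of the expected domains.
    static boolean isGoogleHostname(String host) {
        if (host == null) {
            return false;
        }
        String h = host.toLowerCase(Locale.ROOT);
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1); // drop a trailing root dot
        }
        return h.endsWith(".googlebot.com") || h.endsWith(".google.com");
    }

    // Reverse lookup on the IP, then a forward lookup to confirm it maps back.
    static boolean verifyByDns(String remoteIp) {
        try {
            InetAddress addr = InetAddress.getByName(remoteIp);
            String host = addr.getCanonicalHostName();
            if (!isGoogleHostname(host)) {
                return false;
            }
            for (InetAddress forward : InetAddress.getAllByName(host)) {
                if (forward.equals(addr)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        System.out.println(claimsGooglebot(ua));
        System.out.println(isGoogleHostname("googlebot.com.evil.example"));
    }
}
```

The DNS step needs network access and adds latency, so a site would probably cache the result per IP rather than repeat it on every request.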
As far as metadata goes, it's a hard problem to extract anything structured from PDFs. I could probably help, but I'd have to see your content.
Good luck!
posted by nev at 8:43 PM on June 30, 2008