What are some ways I can improve a digital library?
June 30, 2008 2:42 PM

I'm in the process of revamping a digital library website and I have a few questions about how to link my resources into both Google Scholar and the academic/archival community at large.

  1. I know that it's possible to let Google Scholar index the full text of the PDF documents while keeping them restricted for normal visitors. I want to do this so that Google Scholar can search our documents more thoroughly while we keep a subscription model. I suppose I could grant the crawler access by IP or user-agent, the same way I would for subscribers, but I know that Google often doesn't look kindly upon websites that serve it different content. Is there a sanctioned way to do this?*
  2. Assume I want documents on this site to get "linked in" to the rest of the academic world. What are things I can do to make this easier and better? I've already implemented OpenURL, kind of (is it really just as simple as making a page like /resolver?issn=blah&volume=blah&issue=blah&spage=blah ?). What other standards would be good to support/implement?
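On the OpenURL question: for inbound links in the older 0.1 style, a resolver page that reads keys such as issn, volume, issue and spage is roughly what is involved. OpenURL 1.0 (Z39.88-2004) key/value links carry the same fields with an rft. prefix, plus url_ver and rft_val_fmt. Below is a minimal Java sketch of the parsing side only, assuming Java 10 or later. The class name, method names, and the lookup-key format are hypothetical, not part of any standard.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses an OpenURL-style query string for a /resolver page.
final class OpenUrlQuery {

    /** Splits a raw query string into decoded key/value pairs; the first value per key wins. */
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.putIfAbsent(decode(key), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    /** Reads a field from either the 1.0 key ("rft.issn") or the 0.1 key ("issn"). */
    static String field(Map<String, String> params, String name) {
        String value = params.get("rft." + name);
        return value != null ? value : params.get(name);
    }

    /**
     * Builds an internal lookup key (format is an assumption for illustration),
     * or returns null when issn, volume or spage is missing.
     */
    static String articleKey(Map<String, String> params) {
        String issn = field(params, "issn");
        String volume = field(params, "volume");
        String issue = field(params, "issue");
        String spage = field(params, "spage");
        if (issn == null || volume == null || spage == null) {
            return null;
        }
        return issn + "/" + volume + "/" + (issue == null ? "-" : issue) + "/" + spage;
    }
}
```

A servlet or controller behind /resolver could pass its raw query string to parse, look the resulting key up in the article database, and redirect to the article page or show a "not found" page when articleKey returns null.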
If you're a frequent digital library user, I'd also be interested in hearing about features that would make you revisit a digital library on a regular basis. Similarly, if anyone out there has developed a digital library in the past, are there any tools or programs (preferably Java-based) you would recommend for speeding up the document handling process?

More details: this is for a non-profit educational organization that has around 10k (and growing) scholarly (peer-reviewed and published) papers. They're imported in standard PDF format, so thankfully OCR and conversion are not a problem, although I would be really interested in ways to pull out metadata or even things like references and citations.

* Yes, I realize there is a contact page for this. When I submitted a request, I received a response something along the lines of "Currently due to a huge number of requests you won't hear from us, like, ever"
posted by Deathalicious to Education (3 answers total)
I build sites like this for a living, so one possibility is to hire me as a consultant!

Barring that, my experience with Google Scholar was that only large academic publishers tend to get their attention. If you drop me a MeMail, I'll see what I can dig up from talking to colleagues who've waded through this before. (I know in at least one case we just allowed the Googlebot user-agent in with no repercussions, although be aware that all similar methods are easily spoofed.)
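Since the user-agent header alone is easily spoofed, one common hardening step is to check the requesting IP with a reverse DNS lookup and then confirm it with a forward lookup. Here is a minimal Java sketch of that idea. The class and method names are hypothetical, and the accepted domain suffixes are an assumption that should be checked against Google's current crawler documentation before relying on them.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical helper: checks whether an IP claiming to be Googlebot resolves to a Google host.
final class CrawlerCheck {

    /** Pure string check; the two suffixes are assumptions, not an authoritative list. */
    static boolean hasGoogleSuffix(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1); // drop a trailing root dot
        }
        return h.endsWith(".googlebot.com") || h.endsWith(".google.com");
    }

    /** Reverse lookup of the IP, suffix check, then forward lookup to confirm the same address. */
    static boolean isVerifiedGooglebot(String ip) {
        try {
            InetAddress addr = InetAddress.getByName(ip);
            // Returns the textual IP when no host name can be resolved.
            String host = addr.getCanonicalHostName();
            if (host.equals(addr.getHostAddress()) || !hasGoogleSuffix(host)) {
                return false;
            }
            for (InetAddress candidate : InetAddress.getAllByName(host)) {
                if (candidate.equals(addr)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

DNS lookups are slow, so in practice the result would probably be cached per IP, and the check would only run for requests whose user-agent claims to be Googlebot.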

As far as metadata goes, it's a hard problem to extract anything structured from PDFs. I could probably help, but I'd have to see your content.
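To give a feel for why this is hard, here is a deliberately naive Java sketch that pulls numbered references out of text that has already been extracted from a PDF (the extraction step itself is not shown and would need a PDF library). It assumes a "References" or "Bibliography" heading on its own line and entries numbered like [1], [2]; real papers break both assumptions often. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a plain-text reference section into individual entries.
final class ReferenceSplitter {

    // A heading line containing only "References" or "Bibliography" (case-insensitive).
    private static final Pattern HEADING =
            Pattern.compile("(?im)^\\s*(references|bibliography)\\s*$");

    // The start of a numbered entry such as "[12] " at the beginning of a line.
    private static final Pattern ENTRY_START =
            Pattern.compile("(?m)^\\s*\\[(\\d+)\\]\\s*");

    /** Returns the entries found after the last heading, or an empty list if there is none. */
    static List<String> split(String text) {
        List<String> entries = new ArrayList<>();
        Matcher heading = HEADING.matcher(text);
        int sectionStart = -1;
        while (heading.find()) {
            sectionStart = heading.end(); // keep the last heading in the document
        }
        if (sectionStart < 0) {
            return entries;
        }
        String section = text.substring(sectionStart);
        Matcher entry = ENTRY_START.matcher(section);
        int bodyStart = -1;
        while (entry.find()) {
            if (bodyStart >= 0) {
                entries.add(clean(section.substring(bodyStart, entry.start())));
            }
            bodyStart = entry.end();
        }
        if (bodyStart >= 0) {
            entries.add(clean(section.substring(bodyStart)));
        }
        return entries;
    }

    // Joins wrapped lines into a single line of text.
    private static String clean(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }
}
```

Author-year reference styles, two-column layouts, and headers or footers mixed into the extracted text would all defeat this, which is why seeing the actual content matters.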

Good luck!
posted by nev at 8:43 PM on June 30, 2008

I know there are a lot of people in academic publishing on FriendFeed, so you could try asking there. Maybe Bill or this guy could steer you in the right direction.
posted by Mr. Gunn at 1:20 AM on July 1, 2008

If you're a frequent digital library user I'd also be interested in hearing about features that would make you revisit a digital library on a regular basis,

1. Having quality papers relevant to what I'm researching.
2. Me being able to find them.
3. Me being able to access them.
4. Copies of documents being complete.

By (1) I mean the obvious; if I'm researching physics and your library is about psychology, we're unlikely to interact. If I have read papers and found them relevant, informative, clear, readable, and factually accurate, I am more likely to look at the other papers in that journal/issue/library.

By (2) I mean the papers showing up in my searches (I like Google, Web of Knowledge, and ScienceDirect, but that's just me) and having a clear title and abstract.

(3) is because putting in a document supply request can take two weeks. If my institution has access to your library, that's great. If I'm going to have to wait two weeks and pay for that paper, that's two weeks in which to find another paper in a journal I *can* access instantly. Also, the whole point of giving references is that people can follow them; from this perspective a reference that leads to an $80 paywall isn't a very good reference.

By (4) I mean if you're offering a PDF copy of a book, and that book comes with a CD of example programs or suchlike, you should offer access to the CD alongside the PDF of the pages of the book.

In summary, you're going in the right direction by getting indexed by Google. You also need good papers with clear titles and abstracts, and ideally you need major institutions to subscribe to your library so users can access it easily.

Of course, now I summarise it, I probably haven't told you anything you don't already know!
posted by Mike1024 at 2:26 AM on July 1, 2008
