Getting started on small-scale bibliometrics and EndNote wrangling
March 28, 2013 12:02 PM

As part of my dissertation, I want to look at the research output of a small research center. I'd like to do some bibliometric analysis but am not sure where to start. Where can I look for tools and techniques? In particular, I'm interested in tools that help me wrangle their EndNote database into something easy to analyse.

Most resources I've found by googling are for large-scale studies over the web: trawling the big online citation indexes and so forth. I'm more interested in making the most of my small dataset, their "bibliography" of items published by their researchers over the years.

We're talking maybe 2000 items over forty years here, so I can probably afford to dig a little deeper into the keywords, abstracts, etc. that some of their bibliography entries contain.

To complicate matters, I don't have the dataset I need in a good form. They have an EndNote file, and they don't have much technical expertise with databases (even less than me!). They've been willing to send me their bibliography and have managed to send me a tab-delimited version. Unfortunately, it's a mess: lots and lots of rows of keywords and abstracts sit in between records. I guess that's what happens when you export a database to a spreadsheet without taking a good deal of care.
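If those stray rows come from fields that contain line breaks (for example, keywords entered one per line), a short script can often glue them back onto their records. Here is a minimal Python sketch under stated assumptions: the first line is a header row, a complete record has as many tab-separated fields as the header, and the first column (say, a record number) never spans lines. The name reassemble_records and the "; " joiner are hypothetical, illustrative choices, not part of EndNote.

```python
def reassemble_records(text, joiner="; "):
    """Rebuild one dict per record from a tab-delimited export whose
    multi-line fields (keywords, abstracts) spilled onto extra rows.

    Assumptions (check them against the real file):
      * the first line is a header naming every field;
      * a complete record has as many tab-separated fields as the header;
      * the first column never contains a line break.
    """
    lines = text.lstrip("\ufeff").splitlines()  # drop a byte-order mark if present
    header = lines[0].split("\t")
    n_tabs = len(header) - 1
    rows = []       # finished records, still as raw tab-separated strings
    pending = None  # a record that has not yet reached its full field count
    for line in lines[1:]:
        if pending is None and not line.strip():
            continue  # ignore blank lines between records
        if pending is None and rows and "\t" not in line:
            # A tab-free line right after a complete record is assumed to
            # continue that record's last field.
            rows[-1] += joiner + line
            continue
        pending = line if pending is None else pending + joiner + line
        if pending.count("\t") >= n_tabs:
            rows.append(pending)
            pending = None
    if pending is not None:
        raise ValueError("export ended in the middle of a record: %r" % pending)
    # Extra fields beyond the header (if any) are dropped by zip().
    return [dict(zip(header, row.split("\t"))) for row in rows]
```

You might call it as reassemble_records(open("export.txt", encoding="utf-8").read()); the file name and encoding are guesses, since they depend on how the export was saved.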

(I sent them these instructions on exporting the data: as tab-delimited or as XML. I'm not sure I can or should expect them to do anything much more complicated. They're already being very accommodating as it is.)
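If the XML route works out, Python's standard library can read it directly. This sketch assumes element names along the lines of EndNote's XML export (record, titles/title, dates/year, contributors/authors/author, keywords/keyword, abstract), with values wrapped in style tags; treat those names as assumptions to check against the actual file. parse_endnote_xml and text_of are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

def text_of(element):
    """Return all text under an element, or None if it is missing.
    EndNote-style XML wraps values in <style> tags, so join every text node."""
    if element is None:
        return None
    return "".join(element.itertext()).strip()

def parse_endnote_xml(xml_bytes):
    """Turn an EndNote-style XML export into a list of plain dicts.
    The element paths below are assumptions; adjust them to the real file."""
    root = ET.fromstring(xml_bytes)
    records = []
    for rec in root.iter("record"):
        records.append({
            "title": text_of(rec.find("titles/title")),
            "year": text_of(rec.find("dates/year")),
            "authors": [text_of(a) for a in rec.findall("contributors/authors/author")],
            "keywords": [text_of(k) for k in rec.findall("keywords/keyword")],
            "abstract": text_of(rec.find("abstract")),
        })
    return records
```

Reading the file in binary mode (with open("export.xml", "rb") as f: records = parse_endnote_xml(f.read())) lets the XML parser honour whatever encoding the file declares.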

Ideally, I'm looking for:
1) tools or a forum that can help me clean up the tab-delimited or XML files they send me;
2) tools or a forum that can help me figure out how to get them to send me data I can analyse more easily;
3) fun, meaningful ways to analyse that data.
posted by col_pogo to Science & Nature (2 answers total) 3 users marked this as a favorite
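For item 3, two simple starting points are publication counts per year and keyword co-occurrence (which keywords tend to appear together on the same item). This sketch assumes each record is a dict with a "year" string and a "keywords" list, such as you might build from the XML or tab-delimited export; papers_per_year and keyword_pairs are illustrative names.

```python
from collections import Counter
from itertools import combinations

def papers_per_year(records):
    """Count items per publication year, skipping records with no year."""
    counts = Counter(r["year"] for r in records if r.get("year"))
    return dict(sorted(counts.items()))

def keyword_pairs(records, top=10):
    """Return the `top` most frequent keyword pairs appearing on the same item.

    Keywords are lowercased and de-duplicated per item, so "Soil" and
    "soil" count as the same term."""
    pairs = Counter()
    for r in records:
        kws = sorted({k.strip().lower() for k in (r.get("keywords") or []) if k and k.strip()})
        pairs.update(combinations(kws, 2))  # every unordered pair on this item
    return pairs.most_common(top)
```

The pair counts can be read as edge weights in a keyword network, which is one way to see how the center's research themes cluster and shift over the decades.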
 
Can you get them to send you a copy of the EndNote database itself? You can update the references that are incomplete, and you can probably get some information out of just looking at the keywords term list.
posted by zoetrope at 1:05 PM on March 28, 2013


Well, one suggestion, once you have their EndNote data in a workable form, is to cross-reference it with a larger citation index like PubMed (http://www.healthdata.gov/data/dataset/pubmed). That way you can pull in additional metadata, as well as information about papers that cite their papers.
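PubMed can be queried programmatically through NCBI's E-utilities. This sketch uses the esearch endpoint to look up PubMed IDs by exact title and publication year via the standard [ti] and [dp] search tags. Exact-title matching is a hedged heuristic that will miss titles recorded slightly differently, and PubMed indexes only biomedical and life-sciences literature, so how much of this center's output it covers is an open question. pubmed_search_url and find_pmids are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(title, year=None):
    """Build an esearch URL matching an exact title phrase (and optional year)."""
    term = '"%s"[ti]' % title.replace('"', "")  # stray quotes would break the phrase
    if year:
        term += " AND %s[dp]" % year
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)

def find_pmids(title, year=None):
    """Return the PubMed IDs (as strings) that esearch reports for this title."""
    with urllib.request.urlopen(pubmed_search_url(title, year), timeout=30) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]
```

NCBI asks clients without an API key to stay under about three requests per second, so when looping over ~2000 titles a time.sleep(0.4) between calls keeps within that limit.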

As for wrangling the data into submission, it looks like there are open-source tools for working with EndNote data in OpenOffice/LibreOffice, which might be a starting point. There are also tools like Google Refine that can speed up the process of massaging data into a better-normalized, structured form.
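Google Refine's clustering groups variant spellings of the same value, which suits messy keyword lists. This Python sketch roughly imitates its "fingerprint" key-collision method (lowercase, strip accents and punctuation, sort the unique words); it is an approximation for illustration, not Refine's own code, and fingerprint and cluster are hypothetical names.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Normalise a string into a clustering key, roughly in the style of
    Google Refine's 'fingerprint' method."""
    # Decompose accented characters and drop the accents ("é" -> "e").
    value = unicodedata.normalize("NFKD", value)
    value = "".join(c for c in value if not unicodedata.combining(c))
    # Lowercase and remove punctuation, then sort the unique words.
    value = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group values that share a fingerprint; singletons are left out."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Each returned group is a candidate for merging into one canonical keyword; reviewing the groups by hand before merging is safer than applying them blindly.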
posted by Good Brain at 1:13 PM on March 28, 2013

