APA --> EndNote (ugh!)
November 21, 2006 12:25 PM

I have a bunch of APA-style refs in a huge (580+ page) MS Word document. How do I get them all into EndNote?

This recent post made me decide to go ahead and ask this question, even though I have little hope.

Over the past ten years or so my boss has built up a hefty collection of journal / book references, and they have all been typed into MS Word by folks like me, over the years, and occasionally by work-study undergrads. I want to import them into EndNote, but before I do things the Really Hard Way, I want to know if there's an easier way.
Smith, J., Johnson, D., L., & Bikkle, R. (1998). What I did on my summer vacation. Cognitive Science, 9, 231-292.

Even if all of my entries were perfectly formatted (and they are not), I'm still not sure how to convert them.

My best idea so far is to write a PHP script that parses them, somehow, and determines if they are books or journals or unpublished manuscripts. Then it would spit out a tab-delimited list of refs, along with a list of references that were poorly formatted. Then someone would have to go through the first list and make sure that nothing got mangled (which it will) and go through the second list and fix each missing parenthesis by hand. Did I mention that I have around five thousand of these?

It's also possible that this PHP program will just screw things up from the get-go, and I should just have someone enter all of them by hand. I'm hoping, however, that someone out there has a Better Way. Maybe?
posted by Squid Voltaire to Computers & Internet (3 answers total)
 
"go through the second list and fix each missing parenthesis by hand"

Ugh. If parens are missing, they're missing around the date, so put them back in automatically.
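
A minimal Perl sketch of that automatic repair, under some loud assumptions: the date is a four-digit 19xx/20xx year, it is the first such year that follows a period and a space (that is, it comes right after the author list), and a line that already contains a parenthesised year is left alone. The name restore_year_parens is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: put parentheses back around the year, e.g.
# "Bikkle, R. 1998. Title" becomes "Bikkle, R. (1998). Title".
# Assumption: the first bare 19xx or 20xx year that follows ". " is the date.
sub restore_year_parens {
    my ($ref) = @_;

    # Leave the line alone if a parenthesised year is already present.
    return $ref if $ref =~ /\(\d{4}[a-z]?\)/;

    # Accept a year with no parentheses, or with only one of the two,
    # and rewrite it in the "(1998). " form.
    $ref =~ s/\.\s+\(?((?:19|20)\d{2}[a-z]?)\)?\.?\s+/. ($1). /;
    return $ref;
}

print restore_year_parens(
    "Smith, J., & Bikkle, R. 1998. What I did. Cognitive Science, 9, 231-292."
), "\n";
```

Only the first match is rewritten, so a year that appears later in a title is not touched; a reference with no recognisable year comes back unchanged and would still need a human.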

Here's what you'll find, based on my doing something similar recently with precinct names and polling place locations: the failures will cluster in classes, such that one transformation will fix every failure in that class.

OK, let's build a BNF

APACITE ::= NAMELIST DATE ARTICLEORBOOK
NAMELIST ::= NAME | NAME [COMMA] AMPERSAND NAME | NAME COMMA NAMELIST
NAME ::= LASTNAME COMMA INITIALS
LASTNAME ::= LETTER | LETTER LASTNAME //forget init Caps so we match van den Bergs and MacIntoshes
LETTER ::= [A-Za-z] // Jennifer 8 Lee we don't match, sorry
COMMA ::= ","
AMPERSAND ::= "&"
INITIALS ::= LETTER PERIOD | LETTER PERIOD INITIALS
PERIOD ::= "."
DATE ::= [OPENPAREN] DIGITLIST [CLOSEPAREN] [PERIOD]
OPENPAREN ::= "("
CLOSEPAREN ::= ")"
DIGITLIST ::= DIGIT | DIGIT DIGITLIST
DIGIT ::= [0-9]

And so forth. If you export from Word such that italics are preserved, it's all the easier to separate article titles from journal names.
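
For illustration, here is roughly how the NAME and INITIALS rules might look as Perl regexes. This is a sketch, not a full parser: the helper name split_authors, the characters allowed in a last name (letters, apostrophe, space, hyphen), and the decision to tolerate a stray comma between initials are all assumptions made for the example.

```perl
use strict;
use warnings;

# INITIALS: one or more "X." groups; a comma or space between them is
# consumed only when another initial follows.
my $INITIALS = qr/(?:[A-Z]\.(?:,?\s*(?=[A-Z]\.))?)+/;

# LASTNAME: starts with a letter, may contain spaces, apostrophes, hyphens.
# The character set is an assumption for this sketch.
my $LASTNAME = qr/[A-Za-z][A-Za-z' -]*?/;

# Hypothetical helper: split an APA author list into "Last, I. I." strings.
sub split_authors {
    my ($authors) = @_;
    my @names;
    while ($authors =~ /($LASTNAME),\s*($INITIALS)/g) {
        my ($last, $initials) = ($1, $2);
        $initials =~ s/[,\s]+/ /g;    # "D., L." becomes "D. L."
        push @names, "$last, $initials";
    }
    return @names;
}

print "$_\n" for split_authors("Smith, J., Johnson, D., L., & Bikkle, R.");
```

Anything the pattern cannot match (a spelled-out first name, say) is silently skipped here, so comparing the number of names found against the number of commas or ampersands is one cheap way to flag suspicious lines.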

Throw the BNF into Yacc or GOLD or JavaCC, or just write a regex in Perl.

Or, use an existing CPAN Perl module: Biblio::Citation::Parser

Or, use this one, that the author claims is superior: http://sunir.org/monkey/AcademicCitation/
posted by orthogonality at 1:57 PM on November 21, 2006


Orthogonality has it, at least as far as the parsing goes -- use someone else's module to do the dirty work, as they've already done most of the hassle work. Code around the exceptions (and ach -- avoid PHP if you can. What a hassle that language can be).
posted by Ogre Lawless at 2:33 PM on November 21, 2006


Perl, not PHP, is my recommendation. It's built to solve these kinds of problems. You'll probably find you need a combination of the approach below, some manual tweaking, and final edits for a satisfactory solution.

You probably want to convert the references to RIS format and then import that into EndNote, maybe by first placing a delimiter between authors/year/title/journal, etc.
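
To make the RIS step concrete, here is a hedged sketch of writing one journal record. The helper name to_ris and its field names (authors as an array reference, year, title, journal, vol, pps) are assumptions for this example; the two-letter tags are standard RIS ones, but whether EndNote's import filter prefers JO or JF for the journal name is worth checking against a record exported from EndNote itself.

```perl
use strict;
use warnings;

# Hypothetical helper: turn already-parsed fields into one RIS record.
# Assumes a journal article (TY  - JOUR) and pages written as "start-end".
sub to_ris {
    my (%ref) = @_;
    my @lines = ("TY  - JOUR");
    push @lines, "AU  - $_" for @{ $ref{authors} };    # one AU line per author
    push @lines, "PY  - $ref{year}",
                 "TI  - $ref{title}",
                 "JO  - $ref{journal}",
                 "VL  - $ref{vol}";
    my ($start, $end) = split /-/, $ref{pps}, 2;
    push @lines, "SP  - $start";
    push @lines, "EP  - $end" if defined $end;
    push @lines, "ER  - ";                             # end-of-record marker
    return join("\n", @lines) . "\n";
}

print to_ris(
    authors => ["Smith, J.", "Bikkle, R."],
    year    => 1998,
    title   => "What I did on my summer vacation",
    journal => "Cognitive Science",
    vol     => 9,
    pps     => "231-292",
);
```

Books and unpublished manuscripts would need a different TY value and a different set of tags, so they are better handled as separate cases than squeezed into this one.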

Here's a Perl regex/script that works with your sample reference (tested!).

With one reference per line, issue the command like this:

perl myscript.pl refs_file.txt

Here's the script:

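#!/usr/bin/perl
use strict;
use warnings;

# read one reference per line from the file(s) named on the command line
while (my $line = <>) {
    chomp $line;
    # note the use of //x so that we can have whitespace and comments
    # in the regex for clarity
    my @fields = $line =~ /(.*?)\s+      # authors
                           \((\d+)\).*?  # year
                           (\w.*?)\.\s*  # title
                           (.*?),\s*     # journal
                           (\d+),\s*     # volume
                           ([\w-]+)      # pages, e.g. 231-292
                          /x;
    unless (@fields) {
        # report the lines that will need fixing by hand
        warn "no match: $line\n";
        next;
    }
    my %ref;
    @ref{qw(authors year title journal vol pps)} = @fields;
    # output the proof of concept result: ( you do the work
    # converting to ris here ;)
    print "$_ => $ref{$_}\n" for qw(authors year title journal vol pps);
}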


As I said, it's tested with your sample reference. It will need tweaking, or you will need to fix up the results by hand. You will also need to make the RIS-formatted file yourself :)
posted by singingfish at 5:31 PM on November 21, 2006


This thread is closed to new comments.