Solution for matching citations to bibliography
July 31, 2016 6:21 PM
I need to be able to match in-text citations to reference list(s) and back again in very large documents (1000+ pages) WITHOUT the use of referencing software like EndNote*. I'd like to automate this as much as possible.
I format books to publisher standard for academics using Microsoft Word. Part of my task is to make sure that the references in text match those in the list. (*Referencing software is not an option with up to 100 different authors who struggle to use a simple Word template or stick to one referencing system. Yes, I know, but no. No referencing software. Doing it myself from scratch is not efficient either.)
My current method is to highlight brackets and numbers in the text using a Find & Replace macro, and go page by page matching against the list, and highlighting each reference in the bibliography to catch the books not cited in the text.
It's not enough to know that the in-text citation is in the list; I need to know where the first cite appears for any reference with 3-5 authors, because subsequent cites should be written First Author et al. I also need to know where authors with multiple works in the same year are placed in the text so I can use Author (2016a) if necessary.
APA if it counts. Publishers vary: Palgrave, Allen & Unwin, Springer, Sense.
Response by poster: I thought I made it clear. I'll try again. I can't use EndNote. I receive the document in text format without any embedded links from up to 100 academics who I do not have contact with (just the editors of the book), who do not provide their EndNote libraries, who often don't use EndNote, and who struggle to use Word in more than a basic way. I cannot use referencing software unless I manually add every single reference (more than 1000) into EndNote, and then find every citation within the book and insert the EndNote cite. I can't use EndNote or other referencing software. Did I mention? I can't use EndNote in any practical sense unless I start from scratch and that's going to take longer than what I'm already doing. By the way, I can't use EndNote because referencing software is not an option with up to 100 different authors who struggle to use a simple Word template or stick to one referencing system. Yes, I know, but no. No referencing software. Doing it myself from scratch is not efficient either.
posted by b33j at 7:15 PM on July 31, 2016 [4 favorites]
Best answer: I have not used this personally, but it sounds like you want a solution like ReCite. Not sure if it will handle the 3-5 authors question, so that may still be a manual process.
(Also, I may be misremembering APA style but I think that authors with multiple works in the same year have letters assigned according to the order they appear in the reference list rather than the order they appear in-text.)
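If the letters do follow reference-list order, assigning them could be sketched roughly as below. This is only an illustration in Python: it assumes the reference list has already been reduced to (authors, year) pairs in list order, and the function name `assign_year_suffixes` and the sample data are invented for the example.

```python
from collections import defaultdict


def assign_year_suffixes(references):
    """Label each (authors, year) pair, adding a, b, c... where needed.

    `references` must already be in reference-list order; letters are
    handed out in that order, not in order of appearance in the text.
    """
    groups = defaultdict(list)
    for index, (authors, year) in enumerate(references):
        groups[(authors, year)].append(index)

    labels = [None] * len(references)
    for (authors, year), indexes in groups.items():
        for position, index in enumerate(indexes):
            # Only add a letter when the same authors share a year.
            suffix = chr(ord("a") + position) if len(indexes) > 1 else ""
            labels[index] = f"{authors} ({year}{suffix})"
    return labels


# Hypothetical reference list, already sorted as it appears in the book.
reference_list = [("Brown", 2016), ("Smith", 2016), ("Smith", 2016), ("White", 2015)]
labels = assign_year_suffixes(reference_list)
for label in labels:
    print(label)
```

The two Smith entries get "a" and "b" in list order; entries that are unique for their year are left without a letter.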
posted by Paragon at 7:27 PM on July 31, 2016 [3 favorites]
Response by poster: You're probably right about the letters for APA multiple works. I'll check when I work on these tasks. Thanks for the tip.
posted by b33j at 8:15 PM on July 31, 2016
Yeah, letters go by reference list order.
I wrote a macro to do this in WordPerfect based on another I'd seen.
The basic algorithm is
0. Place cursor at beginning of document
1. Find open bracket and place cursor before (WP lets you choose what action to take on find).
2. Find close bracket and select from current position until after the bracket (another option for what to do on find).
3. Append to clipboard (like copy but adds to clipboard instead of replacing)
4. Repeat 1-3 until step 1 yields a not found.
5. Create a new document. Paste clipboard into new document.
6. Find close bracket. Replace with close bracket hard return.
7. Select all.
8. Alphabetize paragraphs. (This obviously doesn't order citations where there are multiple works cited in one set of brackets, but it's close enough.)
My macro doesn't do this, but since you need page numbers I would alter as follows.
1. Start at the end not beginning and do all searches backwards (find previous).
2. Search for CLOSE bracket. Place cursor after.
3. Insert page number and a symbol, say $
4. Search previous for open bracket and select everything between current cursor position (ie after page number) and there. It doesn't matter if bracket gets selected.
5. Append.
6. Repeat, new document, paste as before.
7. Now find your symbol, say $, and replace with hard return. Then alphabetize paragraphs.
Obviously you print out the list of citations to check against the reference list.
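The steps above could be sketched outside the word processor as follows. This is only an illustration in Python, not the WordPerfect macro itself. It assumes the document has been exported to plain text as a list of paragraphs, uses the paragraph number as a stand-in for the page number (plain text carries no pagination), and treats every parenthetical as a possible citation, so non-citation brackets will show up too. The function name `collect_bracketed` and the sample text are made up for the example.

```python
import re


def collect_bracketed(paragraphs):
    """Return (citation_text, paragraph_number) pairs, sorted alphabetically."""
    found = []
    for number, paragraph in enumerate(paragraphs, start=1):
        # Grab everything between an open and a close bracket.
        for match in re.finditer(r"\(([^()]*)\)", paragraph):
            # Split multi-work brackets such as "(Brown, 2016; White, 2016)"
            # so each work is alphabetized on its own line.
            for part in match.group(1).split(";"):
                part = part.strip()
                if part:
                    found.append((part, number))
    # Alphabetize, keeping earlier paragraphs first for identical citations.
    return sorted(found, key=lambda item: (item[0].lower(), item[1]))


sample = [
    "These theories (Smith & White, 2016; Brown, 2016) are old.",
    "Other studies (Brown, 2016) agree.",
]
citations = collect_bracketed(sample)
for text, number in citations:
    print(f"{text}\tparagraph {number}")
```

Because the location is recorded at extraction time rather than inserted into the document, nothing shifts while searching, so there is no need to work backwards.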
posted by If only I had a penguin... at 9:13 PM on July 31, 2016
Oh, and I struggle to use Word when I have to. What a shit piece of software that is. When it reformats my paragraphs for no reason or refuses to remove a hard page return that I didn't even create and has no block protect function....argh... I just remind myself that I'm helping to cure malaria by using that crap software.
It's probably above your pay grade to make the call and lots of people would still use Word because they don't know any better, but giving people the option to use other software might get you files that are easier to work with. When I have to use Word I have literally sent things to editors with comments saying I can't format a paragraph and can they please fix it.
posted by If only I had a penguin... at 9:25 PM on July 31, 2016 [1 favorite]
3. Insert page number and a symbol
I'm sure it's perfectly clear to the asker but what page number? Do you mean part of the citation? That you have to insert manually?
Thanks for the question BTW.
posted by M. at 9:36 PM on July 31, 2016
Current page number. It basically puts in a code that is read as the page number.
But now that you point this out and make me think about it a little harder, I realize this won't work. Once you paste all this into the new document the code will read and show the page number in the new document (i.e. everything pasted on the first page will say 1, second page will say 2, etc.). Even pasting it into Notepad won't work. It pastes a code. I can't remember why I was doing this recently, but I couldn't get it to work. Maybe Word handles this differently and it will work for you somehow.
I don't have a solution for your needing page numbers, but if you don't bother alphabetizing the references, you'll know which came first even if you don't know where.
Alternatively, if you do want to alphabetize, and order is an acceptable substitute for page number: After you put each reference on a new line, select all and create a table. It will be one column with a row for each line. Now add a column and quickfill that column with consecutive numbers (1, 2, 3, etc.). Now sort on the references, not the numbers. The references are in alphabetical order but you can still see what order they came in.
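The number-then-sort idea can be shown in a few lines. This is a small Python sketch of the same trick rather than of the table feature itself; the list `citations_in_order` is hypothetical sample data standing in for the one-citation-per-line document.

```python
# Citations in the order they appear in the text (hypothetical sample data).
citations_in_order = [
    "Smith et al., 2016",
    "Brown, 2016",
    "Smith et al., 2016",
    "Adams, 2014",
]

# The "quickfill" step: attach consecutive numbers 1, 2, 3...
numbered = list(enumerate(citations_in_order, start=1))

# Sort on the reference text; the sequence number breaks ties, so the
# first appearance of each reference is listed first within its group.
by_reference = sorted(numbered, key=lambda row: (row[1].lower(), row[0]))

for sequence, citation in by_reference:
    print(f"{sequence}\t{citation}")
```

The first row for each reference then carries the sequence number of its earliest appearance.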
posted by If only I had a penguin... at 10:00 PM on July 31, 2016
Oh and if you're not inserting page numbers there's no need to search backwards. The point of searching backwards was to avoid having those inserted page numbers move subsequent references onto the next page.
posted by If only I had a penguin... at 10:07 PM on July 31, 2016
Response by poster: Great answers, thanks! There is no alternative to Word, but luckily I'm an ace Word wrangler.
Regarding the algorithm steps 1 & 2, I'm not sure I can convince Word to select text between brackets. If I could, that would be pretty good. Any ideas, anyone?
Also, a reference to a work by Smith, Jones & Brown can appear variously like this:
* These theories (Other-Author, 2014; Smith, Jones, & Brown, 2016; Smith & White, 2016)
* As described by Smith, Jones, & Brown (2016), blah blah blah
* Other studies (Brown, 2016; Smith et al., 2016; White, 2016)
So your algorithm wouldn't collect the second example there, just the year.
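For the second form, where the names sit outside the brackets, a regular expression can look for capitalized names immediately before a bracketed year. The following is a rough Python sketch under stated assumptions: surnames start with a capital letter and contain only letters, apostrophes, or hyphens, and authors are joined by commas, "&", or "and". The pattern names and the function `narrative_citations` are made up for this example, and a capitalized sentence opener such as "However, Smith (2016)" would be swallowed as an extra author, so expect false positives.

```python
import re

# One surname, e.g. "Smith" or "Other-Author" (assumed shape of a name).
NAME = r"[A-Z][A-Za-z'\-]+"
# One or more surnames joined by ", ", ", & ", " & " or " and ",
# optionally followed by "et al."
AUTHORS = rf"{NAME}(?:(?:, (?:& |and )?| & | and ){NAME})*(?: et al\.)?"
# Authors followed by a bracketed year such as (2016) or (2016a).
NARRATIVE = re.compile(rf"\b({AUTHORS}) \((\d{{4}}[a-z]?)\)")


def narrative_citations(text):
    """Return (authors, year) for citations written as Authors (Year)."""
    return [(m.group(1), m.group(2)) for m in NARRATIVE.finditer(text)]


examples = [
    "These theories (Other-Author, 2014; Smith, Jones, & Brown, 2016; Smith & White, 2016)",
    "As described by Smith, Jones, & Brown (2016), blah blah blah",
    "Other studies (Brown, 2016; Smith et al., 2016; White, 2016)",
]
for line in examples:
    print(narrative_citations(line))
```

Only the second example yields a hit here; the fully bracketed forms are left for a bracket-based pass.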
I could convert the entire text of the book to a single sentence per paragraph and then dump into Excel. And I think there's a VLOOKUP that I could use to match stuff from the reference list (that I split into author names and initials by using text to columns). But I still haven't figured out how to isolate the references automatically from within the text.
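The lookup step could also be done in a script instead of a spreadsheet. Below is a hedged Python sketch, not a finished tool. It assumes the citations have already been pulled out in document order as (text, location) pairs, and that the reference list has been split into (surnames, year) pairs. It also assumes an "et al." cite stands for a reference with three or more authors starting with the named surname. All function names and sample data are invented for the example.

```python
import re


def parse_citation(text):
    """Split 'Smith, Jones, & Brown, 2016' into (surnames, et_al, year)."""
    year = re.search(r"\d{4}[a-z]?", text)
    if not year:
        return None
    names = re.findall(r"[A-Z][A-Za-z'\-]+", text[:year.start()])
    return (names, "et al." in text, year.group(0))


def matches(citation, reference):
    """True if a parsed citation refers to a (surnames, year) reference."""
    cite_names, et_al, cite_year = citation
    ref_names, ref_year = reference
    if cite_year != ref_year:
        return False
    if et_al:
        # Assumption: et al. is only used for works with 3+ authors.
        return len(ref_names) >= 3 and ref_names[:len(cite_names)] == cite_names
    return cite_names == ref_names


def cross_check(citations, references):
    """Return (first_seen, unmatched, uncited).

    first_seen maps a reference index to the location of its first cite;
    citations must be supplied in document order for that to hold.
    """
    first_seen, unmatched = {}, []
    for text, location in citations:
        parsed = parse_citation(text)
        hits = [i for i, ref in enumerate(references)
                if parsed and matches(parsed, ref)]
        if not hits:
            unmatched.append((text, location))
        for i in hits:
            first_seen.setdefault(i, location)
    uncited = [ref for i, ref in enumerate(references) if i not in first_seen]
    return first_seen, unmatched, uncited


references = [
    (["Brown"], "2016"),
    (["Smith", "Jones", "Brown"], "2016"),
    (["Smith", "White"], "2016"),
    (["White"], "2016"),
]
citations = [
    ("Other-Author, 2014", 1),
    ("Smith, Jones, & Brown, 2016", 1),
    ("Smith & White, 2016", 1),
    ("Brown, 2016", 3),
    ("Smith et al., 2016", 3),
]
first_seen, unmatched, uncited = cross_check(citations, references)
print("First cites:", first_seen)
print("Cited but not in list:", unmatched)
print("Listed but never cited:", uncited)
```

Matching on the full surname list rather than the first author alone keeps Smith, Jones, & Brown (2016) apart from Smith & White (2016), and the recorded first location shows where the full author list belongs before later cites switch to et al.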
The solution to your problem of the impossible formatting is to take the text of your entire document, and strip it of formatting by putting into something like notepad, and then to paste into a clean document with only your styles. Do not allow the master to be changed by anyone else, that is, compare the version they return to you with the one you sent using the compare documents function, and then input the changes yourself. Otherwise things WILL go wrong.
posted by b33j at 10:30 PM on July 31, 2016
Response by poster: By the way, I just tested ReCite with the text of a 500-page book and it's quite amazing. False positives, but still! Amazing.
posted by b33j at 10:47 PM on July 31, 2016
It sounds like ReCite fits your needs, but on selecting: In WordPerfect, I think the options when you're doing a find and replace while actually using the software are only put cursor before, put cursor after, or select (the actual thing you searched for). I think that "select everything from current cursor to there" is only an option within the macro programming language. So even if Word doesn't do this when you're actually using Word (which I believe, because it is a crap piece of software that doesn't even have a block protect function), it's possible that you can still do it when you're programming the macro.
posted by If only I had a penguin... at 6:58 AM on August 1, 2016
Never mind...even in the software itself WP has "position before", "position after", "select", or "extend selection" as the options. Though there ARE things available in the programming language that aren't really doable when you're just using the software, so who knows if Word would have something like that in its macro programming.
posted by If only I had a penguin... at 7:14 AM on August 1, 2016
(ioihap, Word does too have block protect: Keep Lines Together and Keep With Next. WP's approach sounds like continuous impending doom for properly flowed text, and a potential throwback to typewriter days.)
posted by scruss at 2:41 AM on December 25, 2016
WP has those, too, but also lets you choose any block you'd like to keep together. E.g. I've seen people using Word start a new page to get a table all on one page, but then if you edit your paper and this no longer requires a page break, you either have this messy random page break or you have to go in and manually find and fix all the places that happened.
posted by If only I had a penguin... at 6:39 AM on December 25, 2016
This thread is closed to new comments.
posted by Kalmya at 6:56 PM on July 31, 2016