Remove pieces of a bibtex file?
June 1, 2011 8:50 AM

How can I remove some fields and the label from all the entries in a bibtex file?

I would like to automate the stripping of pieces of a bibtex file. There are two parts to this. First, there are some fields I would like to get rid of altogether (abstract, doi, url) from every entry in my bibtex file. Second, I also have some bibtex entries from a colleague who uses a different labeling convention than I do, so I would like to strip all the labels from the file. In case I'm not using the right terms: the entry starts with something like

@article{smith10,

and I want to remove the 'smith10'.

I'm using XEmacs as an editor; if there were an easy way to do this within the editor, that would be amazing. I'm open to using other programs to do the processing if necessary.
posted by medusa to Computers & Internet (7 answers total)
How many entries are we talking about, and how regular/repeated is the pattern?

In emacs, you could do a find/replace, or run a regex to strip out the parts you don't want. It just depends on the specifics of the formatting and what you want to do (strike an entry, strike the line/field, etc.).
posted by k5.user at 8:57 AM on June 1, 2011

Response by poster: We're talking about hundreds of entries.

For the label, the pattern is always like in my example above: there's a line that reads

@article{smith10,

and I want to remove the label (the text between the { and the comma), which is different for each entry.

For the other fields, the structure is always the same, but the text within the field is different. For example, each entry has an abstract field that has the form

abstract = {bunch of text that is different for every entry},

and I want to remove the entire line. Does that make sense?
posted by medusa at 9:43 AM on June 1, 2011

I think so.

I can think of a way to do it with sed, which I think means you can regex it (I kicked the emacs habit ;).

For the label, I think you can do a global replace of the label. The only quirk might be whether or not it's the ONLY thing in the "@article{" clause, i.e. do a search/replace of "@article{label," with "@article{" (the comma is key if you don't want to go back by hand to fix any errors).

For the other fields, you want to remove the entire abstract entry? Is it all on one line or does it span multiple lines? If it's one line, you can use a regex to replace "^abstract.*" with "" (blank). ("^" means 'line starts with', and ".*" takes the rest of the line with it, so this only works if nothing, not even indentation, comes before "abstract".)

Does that make sense?
posted by k5.user at 10:03 AM on June 1, 2011

The replace-regexp command (M-x replace-regexp) in emacs, and presumably in xemacs, will do what you need. It's also incredibly useful in general and worth learning.

Start by making a backup copy of the file.
posted by eotvos at 10:08 AM on June 1, 2011

Best answer: Superfast regex tutorial: . matches any char; * means any number (inc 0) of whatever precedes it; .* is thus any number of any characters; search & replace syntax is "s/regex_to_match/replacement_text/". So in sed/vi the regex replacement for the article label would most simply be:
More explicitly in vi (can't help you w. emacs, unfortunately), opening the file & typing this:
(where the % means "on all lines" and the g means "as many times on a line as match") should get you a copy of the bib file w/o the labels saved to FILENAME_FOR_UNLABELED_BIB
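
For anyone who would rather script it than do it in an editor, here is a minimal Python sketch of the same label-stripping idea. It is not the sed/vi command described above, just one way to express it. It assumes each label sits on the same line as its "@type{" opener and contains no commas, braces, or spaces; the names ENTRY_START and strip_labels are made up for this example.

```python
import re

# Start of an entry: "@type{" followed by the label, up to the first comma.
# Assumption: the label has no commas, braces, or whitespace in it, so
# things like "@string{jan = ...}" are left alone.
ENTRY_START = re.compile(r"^(\s*@\w+\s*\{)[^,{}\s]*,", re.MULTILINE)


def strip_labels(bibtex_text: str) -> str:
    """Remove the label from every entry, keeping "@type{" and the comma."""
    return ENTRY_START.sub(r"\1,", bibtex_text)


if __name__ == "__main__":
    sample = "@article{smith10,\n  author = {Smith, J.},\n  year = {2010},\n}\n"
    print(strip_labels(sample))
```

The comma is kept on purpose so that each entry still has a (now empty) label slot for a tool to fill in later. As with the editor approach, try it on a backup copy first.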

Ripping out the abstracts is trickier, esp if they have linebreaks. If they don't, a simple
 s/abstract *=.*// 
will replace the abstract line with a whole lotta nothing (& will match any spaces around the =).
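
If the abstracts do have linebreaks, a one-line regex is not enough, since the field ends wherever its braces close. Here is a rough Python sketch that counts braces instead. It assumes every unwanted field has the form "name = {...}," starting on its own line, as described in the question; quoted values and escaped braces are not handled. The names FIELDS_TO_DROP and strip_fields are invented for this example.

```python
import re

# Fields named in the question; adjust to taste.
FIELDS_TO_DROP = ("abstract", "doi", "url")


def strip_fields(bibtex_text: str, fields=FIELDS_TO_DROP) -> str:
    """Drop whole `name = {...},` fields, even when the value spans lines."""
    # A field starts on its own line: optional indent, name, "=", "{".
    start = re.compile(
        r"^[ \t]*(?:%s)[ \t]*=[ \t]*\{" % "|".join(map(re.escape, fields)),
        re.IGNORECASE | re.MULTILINE,
    )
    # After the closing brace: optional comma, then the rest of the line.
    tail = re.compile(r"[ \t]*,?[ \t]*\n?")
    out = []
    pos = 0
    while True:
        m = start.search(bibtex_text, pos)
        if m is None:
            out.append(bibtex_text[pos:])
            break
        out.append(bibtex_text[pos:m.start()])
        # Walk forward until the opening brace is balanced again.
        depth = 1
        i = m.end()
        while i < len(bibtex_text) and depth > 0:
            if bibtex_text[i] == "{":
                depth += 1
            elif bibtex_text[i] == "}":
                depth -= 1
            i += 1
        pos = tail.match(bibtex_text, i).end()
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "@article{smith10,\n"
        "  author = {Smith, J.},\n"
        "  abstract = {Line one\n"
        "    with {nested} braces},\n"
        "  doi = {10.0000/example},\n"  # placeholder value, not a real DOI
        "  year = {2010},\n"
        "}\n"
    )
    print(strip_fields(sample))
```

Unlike the s/abstract *=.*// version, this removes the line itself rather than leaving a blank one behind.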
posted by Westringia F. at 11:12 AM on June 1, 2011 [1 favorite]

Best answer: Bibtexformat should be able to do everything you need and more.
posted by joeyo at 11:05 PM on June 1, 2011

Response by poster: The combination of regexp and bibtexformat worked great. Thanks!
posted by medusa at 10:48 AM on June 29, 2011
