Remove pieces of a bibtex file?
June 1, 2011 8:50 AM
How can I remove some fields and the label from all the entries in a bibtex file?
I would like to automate the stripping of pieces of a bibtex file. There are two parts to this: first, there are some fields I would like to get rid of altogether (abstract, doi, url), from every entry in my bibtex file. Second, I also have some bibtex entries from a colleague who uses a different labeling convention than I do, so I would like to strip all the labels from the file. In case I'm not using the right terms: the entry starts with something like
@article{smith10,
and I want to remove the 'smith10'.
I'm using XEmacs as an editor; if there were an easy way to do this within the editor, that would be amazing. I'm open to using other programs to do the processing if necessary.
Response by poster: We're talking about hundreds of entries.
For the label, the pattern is always like in my example above: there's a line that reads
@article{label,
and I want to remove the label (the text between the { and the comma, which is different for each entry).
For the other fields, the structure is always the same, but the text within the field is different. For example, each entry has an abstract field that has the form
abstract = {bunch of text that is different for every entry},
and I want to remove the entire line. Does that make sense?
posted by medusa at 9:43 AM on June 1, 2011
I can think of a way to do it with sed, which I think means you can regex it. (I kicked the emacs habit ;)
For the label, I think you can do a global replace. The only quirk might be whether the label is the ONLY thing in the "@article{" clause or not, i.e. search for "@article{label," and replace it with "@article{" (the comma is key if you don't want to go back and fix errors by hand).
For the other fields, you want to remove the entire abstract entry? Is it all on one line or does it span multiple lines? If it's one line, you can replace the regex "^abstract.*" with "" (blank). ("^" means "line starts with", and ".*" takes in the rest of the line.)
Does that make sense?
posted by k5.user at 10:03 AM on June 1, 2011
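For anyone who would rather script the line-based approach than run it in an editor, here is a minimal Python sketch. It assumes what the poster described: every entry opens on its own line as "@type{label," and each unwanted field fits on a single line. The function names and the choice to match any entry type (not only @article) are illustrative assumptions, not something from the thread.

```python
import re

# Assumption: entries open on their own line as "@type{label,".
# Group 1 keeps the "@type{" part; the label itself is discarded.
LABEL_RE = re.compile(r"^([ \t]*@\w+[ \t]*\{)[^,\s]*,", re.MULTILINE)

# Assumption: each unwanted field sits entirely on one line.
FIELD_RE = re.compile(
    r"^[ \t]*(?:abstract|doi|url)[ \t]*=.*\n?",
    re.MULTILINE | re.IGNORECASE,
)


def strip_labels(text: str) -> str:
    """Blank out the citation label, keeping '@type{' and the comma."""
    return LABEL_RE.sub(r"\1,", text)


def drop_single_line_fields(text: str) -> str:
    """Delete whole lines holding an abstract, doi, or url field."""
    return FIELD_RE.sub("", text)


if __name__ == "__main__":
    sample = (
        "@article{smith10,\n"
        "  author = {Smith, J.},\n"
        "  abstract = {Some text},\n"
        "  year = {2010}\n"
        "}\n"
    )
    print(drop_single_line_fields(strip_labels(sample)))
```

Unlike a bare "^abstract" pattern, the field regex tolerates leading indentation and removes the line's newline too, so no blank lines are left behind.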
The replace-regexp command (M-x replace-regexp) in emacs, and presumably in xemacs, will do what you need. It's also incredibly useful in general and worth learning.
Start by making a backup copy of the file.
posted by eotvos at 10:08 AM on June 1, 2011
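If the cleanup is scripted rather than done in the editor, the backup step can be part of the script. The following is only a sketch: the function name, the ".bak" suffix, and the UTF-8 encoding are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path


def backup_then_rewrite(bib_path, transform):
    """Copy the file to '<name>.bak', then overwrite it with transform(text).

    `transform` is any function taking the file's text and returning new text.
    Returns the path of the backup copy.
    """
    src = Path(bib_path)
    backup = src.with_name(src.name + ".bak")  # hypothetical naming scheme
    shutil.copy2(src, backup)                  # untouched copy is made first
    text = src.read_text(encoding="utf-8")     # assumes a UTF-8 .bib file
    src.write_text(transform(text), encoding="utf-8")
    return backup
```

The original is only overwritten after the copy has succeeded, so a bad regex can be undone by restoring the ".bak" file.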
Best answer: Superfast regex tutorial: . matches any char; * means any number (inc 0); .* is thus any number of any characters; search & replace syntax is "s/regex_to_match/regex_to_replace/". So in sed/vi the regex replacement for the article label would most simply be:
s/@article{.*,/@article{,/
More explicitly in vi (can't help you w. emacs, unfortunately), opening the file & typing this:
:%s/@article{.*,/@article{,/g
:w FILENAME_FOR_UNLABELED_BIB
:q!
(where the % means "on all lines" and the g means "as many times on a line as match") should get you a copy of the bib file w/o the labels saved to FILENAME_FOR_UNLABELED_BIB
Ripping out the abstracts is trickier, esp if they have linebreaks. If they don't, a simple
s/abstract *=.*//
will replace the abstract line with a whole lotta nothing (& will match any spaces around the =).
posted by Westringia F. at 11:12 AM on June 1, 2011 [1 favorite]
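For the trickier case where a field's value runs over several lines, one option is to stop relying on line-based regexes and count braces instead. The sketch below is a hedged illustration, not a full BibTeX parser: it assumes field values are delimited with { } (not quotes), that braces inside the value are balanced, and that no escaped braces such as \{ appear. The function name is made up for this example.

```python
import re

# Start of an unwanted field: "abstract = {", "doi = {", or "url = {".
FIELD_START = re.compile(
    r"^[ \t]*(?:abstract|doi|url)[ \t]*=[ \t]*\{",
    re.MULTILINE | re.IGNORECASE,
)
# Optional comma and the rest of the line after the closing brace.
FIELD_TAIL = re.compile(r"[ \t]*,?[ \t]*\n?")


def drop_braced_fields(text: str) -> str:
    """Remove abstract/doi/url fields, even when the value spans lines."""
    kept = []
    pos = 0
    while True:
        match = FIELD_START.search(text, pos)
        if match is None:
            kept.append(text[pos:])
            break
        kept.append(text[pos:match.start()])
        depth = 1                      # we are just past the opening brace
        i = match.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        # Skip the trailing comma and newline so no blank line remains.
        pos = FIELD_TAIL.match(text, i).end()
    return "".join(kept)
```

Because it tracks nesting depth, a value like {text with {nested} braces} is removed as a unit, which a pattern such as "abstract *=.*" cannot do once the value wraps onto another line.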
Best answer: Bibtexformat should be able to do everything you need and more.
posted by joeyo at 11:05 PM on June 1, 2011
Response by poster: The combination of regexp and bibtexformat worked great. Thanks!
posted by medusa at 10:48 AM on June 29, 2011
In emacs, you could do a find/replace, or run a regex to strip out the parts you don't want. It just depends on the specifics of the formatting and what you want to do (strike an entry, strike the line/field, etc).
posted by k5.user at 8:57 AM on June 1, 2011