What text-manipulation language to teach myself some of, and what book to do it with?
So I've got legislator voting data for different committees, but it's inconveniently formatted. It looks like
this, or, simpler, like this:
BLAH blah blah
AYES
****
Alpha Echo
Charlie
NOES
****
Bravo
ABSENT, ABSTAINING, OR NOT VOTING
******************************
Delta
What I want to do is read in a whole bunch of these files (or the whole bunch catted together if that's easier) and output a matrix of votes:
Alpha***61119
Bravo***16191
Charlie*69169
Delta***11661
Echo****11116
And so on.
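To make the task concrete, here is a rough sketch of that transformation in Python. It rests on several assumptions that the description above does not settle: each file has already been reduced to plain text like the simplified sample, a section heading is always followed by a line of asterisks, names are single words separated by whitespace, and the codes 1, 6, and 9 stand for aye, no, and absent/abstaining/not voting. The function names `parse_votes` and `vote_matrix`, and the code 0 for a legislator who does not appear in a given file, are made up for illustration.

```python
import re

# Assumed meaning of the numeric codes; the post does not define them.
CODES = {
    "AYES": "1",
    "NOES": "6",
    "ABSENT, ABSTAINING, OR NOT VOTING": "9",
}
NOT_LISTED = "0"  # hypothetical code: legislator does not appear in this record


def parse_votes(text):
    """Return {name: code} for one vote record in the simplified format."""
    votes = {}
    current = None  # code for the section being read; None before the first heading
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line in CODES and re.fullmatch(r"\*+", next_line):
            # A known heading underlined with asterisks starts a new section.
            current = CODES[line]
        elif not line or re.fullmatch(r"\*+", line):
            # Skip blank lines and the asterisk underlines themselves.
            continue
        elif current is not None:
            # Assumes single-word names, possibly several per line.
            for name in line.split():
                votes[name] = current
    return votes


def vote_matrix(records):
    """Return one row per legislator: padded name, then one code per record."""
    parsed = [parse_votes(record) for record in records]
    names = sorted({name for votes in parsed for name in votes})
    width = max(map(len, names), default=0) + 1
    return [
        name.ljust(width, "*") + "".join(v.get(name, NOT_LISTED) for v in parsed)
        for name in names
    ]
```

Calling `vote_matrix` with a list of file contents, in the order the columns should appear, returns the rows as strings ready to print or write to a file. Multi-word names or real HTML input would need extra handling that this sketch leaves out.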
So my questions are:
(1) What language should I use to do this, knowing that apart from little bits of coding for SAS or R, I haven't really programmed anything since BASIC in 1986? My sense from googling is that the prime candidates are Perl or Python, and that this is not going to be a difficult task to program.
(2) What easily obtainable book is good for teaching oneself the basics of the language? Just enough for me to figure out how to do this, not do it efficiently or elegantly. I don't mind if the machine chews on something for a minute instead of a millisecond, as the realistic alternative is entering them by hand, and like hell am I doing that again if they're already in HTML.
posted by jacquilynne at 8:01 PM on June 21, 2004