echo "I AM SMART" | sed 's/SMART/DUMB/'
April 1, 2008 10:05 PM

Unix folk: I'd like to reformat a long text file in a very specific manner using command line tools.

What I currently have is a file that alternates between lines of "Introductory Text" and lines that are sequentially numbered:

Introductory Text
001 text text text text text text text text text text text text text text
Introductory Text
002 text text text text text text text text text text text text
Introductory Text
003 text text text

(Those example lines are pretty short, but in the file, some lines are hundreds of characters long.) What I would like to do is reformat it so that it looks like this:

Introductory Text
001 text text text text text text text text text
    text text text text text
Introductory Text
002 text text text text text text text text text
    text text text
Introductory Text
003 text text text

Specifically, I want to leave the "Introductory Text" lines untouched in all cases, while changing the numbered lines so that they are (at most) 50 characters wide, with no words being split between lines and each new line created having 4 spaces prefixed to it so that it lines up.

All responses are appreciated deeply.
posted by aiko to computers & internet (13 comments total) 1 user marked this as a favorite
assuming that no header line will begin with a digit, run your text through

perl -pe 'next unless /^\d/; s/(.{0,50}) /\1\n    /g'
posted by russm at 10:22 PM on April 1, 2008


gah... f'n HTML collapsing multiple spaces... there are 4 space characters between the \n and the /g...
posted by russm at 10:26 PM on April 1, 2008


Here's a few-line Ruby script that will do it. (This could be condensed into a one-liner, but I broke it out for readability.)
#!/usr/bin/ruby

infile = File.new(ARGV[0])
infile.each{ |line|
  if ((line=~/Introductory/) || (line=~/^\d+/))
    print line
  else
    print "    "
    print line
  end
}
save it as scriptName.rb, then call it from the command line with
ruby scriptName.rb inputFile.txt >outputFile.txt

posted by chrisamiller at 10:32 PM on April 1, 2008


grr... stupid spacing...
posted by chrisamiller at 10:33 PM on April 1, 2008


hang about... that gives the second and subsequent lines from a wrapped line a top length of 54 characters... grrr...
posted by russm at 10:38 PM on April 1, 2008


chrisamiller - ummm... does that wrap lines at 50 characters?

I ought to be shot for this, but still... and with any luck the spacing will be right this time...
perl -pe 'next unless /^\d/; s/(\d.{0,49}) /\1\n/; while (s/\n([^ ].{0,46}) /\n    \1\n/) {}; s/\n([^ ])/ \1/'
explanation: loop over the file, printing each line once we're done with it. unless it starts with a digit, leave it as is. change (a digit followed by 0 to 49 characters) and a space into that stuff and a newline. then, until there's nothing left to change, change a newline then (a non-space followed by 0 to 46 characters) and a space into a newline, 4 spaces, that stuff, and a newline. this leaves the final word of the original line on a line of its own, so turn a newline and (a non-space) into a space and that stuff. ugghhh...
posted by russm at 10:56 PM on April 1, 2008
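
For anyone who finds the chained substitutions above hard to follow, here is a rough Ruby sketch of the same idea: the first output line gets up to 50 characters, every continuation line gets up to 46 plus the 4-space indent, and breaks only happen at spaces. It is written as a loop that peels off one piece at a time rather than as russm's substitutions, and the name wrap_numbered and its parameters are made up for illustration, not taken from the thread.

```ruby
# Hypothetical helper, not code from the thread.
def wrap_numbered(line, width = 50, indent = "    ")
  text = line.chomp
  # Lines that do not start with a digit ("Introductory Text") pass through.
  return text unless text =~ /\A\d/

  pieces = []
  limit = width                      # the first output line may use the full width
  until text.empty?
    # Longest run of up to `limit` characters that is followed by a space or
    # the end of the text; if one word is longer than `limit`, take it whole.
    chunk = text[/\A.{1,#{limit}}(?= |\z)/] || text[/\A[^ ]+/]
    pieces << chunk.rstrip
    text = text[chunk.length..-1].sub(/\A +/, "")
    limit = width - indent.length    # continuation lines lose room to the indent
  end

  # Indent every piece except the first, then rejoin with newlines.
  pieces.each_with_index.map { |p, i| i.zero? ? p : indent + p }.join("\n")
end
```

To run it over a file, something along the lines of `ARGF.each_line { |l| puts wrap_numbered(l) }` should work. A single word longer than the limit is left on a line of its own rather than split, which is one possible reading of "no words being split".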


perl -MText::Wrap -pe 's/^(\d{3}.*)/wrap("","    ",$1)/e'
posted by nicwolff at 11:04 PM on April 1, 2008


Oh, Text::Wrap defaults to 76 columns. So, not as short, but:

perl -MText::Wrap -pe '$Text::Wrap::columns=50; s/^(\d+.*)/wrap("","    ",$1)/e;'
posted by nicwolff at 11:12 PM on April 1, 2008


Thanks russm, I read too quickly, misunderstood the problem and missed the point. My bad.

Here's a ruby script that does what the poster asks for (I believe).

Yeah, it can be done in a one liner, as russm did, but just looking at that regex makes me squirm. I find that when problems reach a certain level of complexity, it's usually quicker to write out a little script that I can understand, rather than taking the time to debug ugly regexes on the command line.
#!/usr/bin/ruby

infile = File.new(ARGV[0])
infile.each{ |line|
  if !(line=~/^\d+/)
    print line
  else
    arr = line.chomp.split(/ /)
    sum = 0
    arr.each{ |word|
      if (sum + word.length > 50)
        print "\n    "
        sum = 4
      end
      print "#{word} "
      sum += (word.length + 1)
    }
    print "\n"
  end
}

posted by chrisamiller at 11:35 PM on April 1, 2008
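
The word-by-word approach in the script above can also be packaged as a function that returns the wrapped text instead of printing it, which makes it easier to check against the examples in the question. This is only a sketch under my own assumptions: wrap_by_words is a made-up name, and unlike the script it builds each output line in full before emitting it, so no trailing space is left after the last word of a line.

```ruby
# Hypothetical helper, not code from the thread.
def wrap_by_words(line, width = 50, indent = "    ")
  text = line.chomp
  # Only lines that start with a digit are wrapped; headers are returned as-is.
  return text unless text =~ /\A\d/

  lines = []
  current = ""
  text.split.each do |word|
    if current.empty?
      current = word                       # first word of the numbered line
    elsif current.length + 1 + word.length <= width
      current += " " + word                # word still fits on the current line
    else
      lines << current                     # line is full: start an indented one
      current = indent + word
    end
  end
  lines << current
  lines.join("\n")
end
```

Fed the "002" line from the question (twelve words of "text"), this should produce the two-line layout the poster asked for, with the continuation line starting four spaces in.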


I know you asked for a command line answer (I like nicwolff's answer the best), but I'll mention in passing that (of course) emacs will mechanize this as well. Set your fill column to 50 [C-5 C-0 M-x set-fill-column], then record a macro [C-x (] that:
* sets mark at the start of a line beginning with digits,
* offsets the /second/ word to the next line,
* goes to end of line,
* 'fills' (reformats) the region
C-x ( 
  M-x search-forward-regexp ^[0-9] RET 
  C-a C-SPC 
  M-F M-F RET space space space space 
  C-e 
  M-x fill-region
C-x )

posted by mrflip at 11:35 PM on April 1, 2008


nicwolff's Text::Wrap answer is the right one...

chris - tell me about it... I feel dirty for having barfed that out onto the keyboard... :)
posted by russm at 11:52 PM on April 1, 2008


Wow, thank you so much to all four of you, I really appreciate it. The three solutions I could get to work properly are marked as best answers.
posted by aiko at 11:53 PM on April 1, 2008


fmt -t -w 50 <>
should be pretty close as well.
posted by alikins at 8:51 AM on April 2, 2008 [1 favorite]


This thread is closed to new comments.