Powerful search and replace software?
April 8, 2008 7:57 PM

[Search/replace filter] Is there free (or really cheap) software that would let me do a batch search and replace where part of each file's name is inserted into the replacement text?

I have about 300 html files.

I want to add a string of text in the file that would include the first 3 characters of the name of the file.

For example :
in the file 100_xxx.xxx I want to add a string that would go :
a href="zzzzzzzzzz?100" blabla /a

I found a lot of programs that do complex search and replace, but none that let you use the name of the file.

I could start by adding the link with my current search/replace software, and use some variable in place of the numbers, so I'd need something like : find variable, replace with (first 3 char of file name)

If the (first 3 characters) part is too hard, I could also settle for the full name, and I'd search/delete the last part of the name, since there are only about 10 variations of the ending part.

Additional difficulty: these files are in Unix format and they need to stay that way (else I have to manually open each one and convert it), so I'd need a program that won't save them in DOS format. And I only have access to a Windows machine.

Does anyone know about such a software?

I'm guessing that there might be a way to do it with regular expressions, but I only know the basic concept of it, and I have no idea 'where' you do the regex. I'd have to install PHP? Perl? I have minimal knowledge of PHP, none of Perl.

I'm starting to wonder which would be faster : learning a new language and regex or just manually editing these 300 files. Ok, to be honest, I think I already spent more time trying to find that software than what it would have taken me to edit them manually...

Thanks for your help!

And you might have guessed, English isn't my main language, so please forgive mistakes
posted by domi_p to Computers & Internet (14 answers total) 1 user marked this as a favorite
You're looking for cygwin. It'll give you a working unix shell where you can script this together in bash in a jiffy.
posted by mullingitover at 8:14 PM on April 8, 2008

This command, executed from a unix shell (or cygwin with find, bash, basename, cut, and sed installed) might do the trick:

find . -name "*.html" | while read i; do cp "$i" "$i.bak"; prefix=`basename "$i" | cut -c 1-3`; sed -e "s/VARIABLE/$prefix/g" "$i" > tmpfile; mv tmpfile "$i"; done

That should find all .html files in the current directory and below, make backup copies of them ending in .bak (just in case), and then replace all occurrences of "VARIABLE" with the first three letters of the file's name. There are a bunch of ways to do the same thing using other tools, including perl; this is just the first that came to mind.
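
The same steps are easier to follow (and to rerun) when spread over several lines. This is only a sketch under a few assumptions: the placeholder really is the literal word VARIABLE, all the files sit in one directory, and the prefix contains nothing special to sed (digits such as 100 are fine). The function name fill_prefix is made up for illustration.

```shell
# fill_prefix DIR: in every .html file directly inside DIR, replace the
# literal text VARIABLE with the first three characters of the file name.
# (fill_prefix and VARIABLE are illustrative names, not part of any tool.)
fill_prefix() {
    local dir="$1" file name prefix
    for file in "$dir"/*.html; do
        [ -e "$file" ] || continue                  # no .html files at all
        name=$(basename "$file")                    # e.g. 100_xxx.html
        prefix=$(printf '%s' "$name" | cut -c 1-3)  # e.g. 100
        cp "$file" "$file.bak"                      # keep a backup, just in case
        # Read from the backup and write over the original, so sed never
        # reads and writes the same file at once.
        sed -e "s/VARIABLE/$prefix/g" "$file.bak" > "$file"
    done
}
```

Calling fill_prefix . from inside the folder that holds the files would process that folder. It is worth trying it on a copy of one file first and confirming the line endings are still in Unix format afterwards.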
posted by hades at 8:35 PM on April 8, 2008

Are all these files in the same folder?
posted by AmbroseChapel at 9:11 PM on April 8, 2008

I tried typing the command from cygwin, but I don't think it worked, either because I didn't install cygwin correctly or because I'm not using the right characters.
In prefix=`basename, is the ` an accent?

After the command is entered, I only get another line starting with $ and nothing seems to happen.

I'll try to understand cygwin a bit more and i'll try hades' command again

AmbroseChapel : Yes they're all in the same folder
posted by domi_p at 9:19 PM on April 8, 2008

Do you know if there is a similar product for Vista PCs? That would be really handy.
posted by wildpetals at 9:25 PM on April 8, 2008

The ` is a backtick - it's on the same key as the ~ tilde, at least on my keyboard. It means "treat the output of the command inside the backticks as input to the command outside."

so here's an example:

$ echo ls
ls
$ echo `ls`
16pf.txt 5pf.xls IPIP_instrument.xls allscales.txt complete_instrument.csv ipip-items.txt neo.txt personality.doc wholers.csv wholers.xls
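
To see the same mechanism without depending on what happens to be in your folder, here is a small sketch that mirrors the prefix step from the command earlier in the thread. The file name 100_xxx.html is just the example from the question, and $(...) is an equivalent spelling of the backticks that is harder to mistype as a single quote.

```shell
# Without substitution, echo just prints the words it was given.
echo basename 100_xxx.html          # prints: basename 100_xxx.html

# With backticks, the inner command runs first and its output takes its place.
echo `basename 100_xxx.html`        # prints: 100_xxx.html

# $(...) means the same thing and avoids the backtick/quote confusion.
prefix=$(basename 100_xxx.html | cut -c 1-3)
echo "$prefix"                      # prints: 100
```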

Get used to this stuff and you start to realise how appallingly crippled your average windows pc is - it's like they sell you an operating system for a computer and take away an awful lot of the most useful bits of the machine.

A perl script for what you want would be simpler but what you've got is good enough.
posted by singingfish at 9:52 PM on April 8, 2008

If you get another $ prompt and nothing else, then it might have worked. Do you have a bunch of .bak files now? If so, check some of the .html files to see if the desired substitution took place. That command would only produce output if it failed. Of course, it could also fail without producing any output (if, for example, you used a ' [single quote] instead of a ` [backtick]).
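
Rather than opening files by hand, a loop along these lines can report which files actually differ from their backups. It is a sketch that assumes the .bak files sit next to the originals, as the earlier command leaves them; report_changes is a name invented here.

```shell
# report_changes DIR: for each .html file in DIR that has a .bak copy,
# say whether the substitution changed anything.
report_changes() {
    local dir="$1" file
    for file in "$dir"/*.html; do
        [ -e "$file.bak" ] || continue      # skip files with no backup
        # cmp -s compares silently; exit status 0 means identical.
        if cmp -s "$file" "$file.bak"; then
            echo "unchanged: $(basename "$file")"
        else
            echo "changed:   $(basename "$file")"
        fi
    done
}
```

Running report_changes . in the folder lists every file; any that show up as unchanged never contained the search text.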
posted by hades at 10:13 PM on April 8, 2008

I'm conscious that we haven't actually heard where in the files we want this to happen.

In the example, it would replace the word "VARIABLE" with the first three letters of the file name. But of course that's not actually what you want. Where in your HTML files do you want it to appear?
posted by AmbroseChapel at 10:17 PM on April 8, 2008

Hah, yes, good point. If there's no VARIABLE to search for and replace, then my command wouldn't do a thing (besides create a bunch of duplicates as .bak files). So a better definition of the problem is probably in order. Is there something you can search for that you want to replace? Or do you want to just add a line to the end of every file, or what?
posted by hades at 10:27 PM on April 8, 2008

I finally bought Search Replace (you can dl a fully functional trial)

It was only 50 bucks and has served me well.
posted by mattoxic at 10:30 PM on April 8, 2008

On the off chance that you just want to add a line to the end of every .html file in the current directory, here's a quick way to do it using only the shell (if the shell is bash or bash-ish):

for i in *.html; do echo "<a href=\"zzzzz?${i:0:3}\">blabla</a>" >> "$i"; done

That would add the line:

<a href="zzzzz?XYZ">blabla</a>

to the end of each file, where "XYZ" is the first three letters of the filename.
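
Because >> appends every time it runs, running that loop twice adds the link twice. A slightly more careful sketch checks for the link first. The href value zzzzz and the text blabla are the placeholders from the question, and append_link is a hypothetical name. It uses printf '%.3s' for the first three characters so that it also runs in shells that lack the ${i:0:3} form.

```shell
# append_link DIR: add the link line to the end of each .html file in DIR,
# unless that exact line is already present.
append_link() {
    local dir="$1" file name prefix link
    for file in "$dir"/*.html; do
        [ -e "$file" ] || continue
        name=$(basename "$file")
        prefix=$(printf '%.3s' "$name")             # first 3 characters
        link="<a href=\"zzzzz?$prefix\">blabla</a>"
        # -F: fixed string, -x: match a whole line, -q: quiet.
        # Append only when the line is not already in the file.
        grep -Fxq "$link" "$file" || echo "$link" >> "$file"
    done
}
```

With this version, append_link . can be rerun safely after adding new files to the folder.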
posted by hades at 10:38 PM on April 8, 2008

Here's a talkative perl script which does the kind of thing you want:
use strict;
use warnings;
use File::Copy "cp";
while (<*.html>) {
    my $filename = $_;
    my $first_three_letters = substr( $filename, 0, 3 );
    cp( $filename, "$filename.bak" ) or die "couldn't back up $filename: $!\n";
    print "copied $filename to $filename.bak for backup\n";
    open( INPUT, '<', $filename ) or die "couldn't open $filename: $!\n";
    binmode(INPUT);    # read bytes as-is, no line-ending translation
    my $content = join( '', <INPUT> );
    close(INPUT);
    if ( $content =~
        s|</body>|<a href="x?$first_three_letters">link</a></body>|i )
    {
        print "added link to x?$first_three_letters before the close body tag\n";
        open( OUTPUT, '>', $filename )
          or die "Couldn't open $filename for writing: $!";
        binmode(OUTPUT);    # keep Unix line endings even under a Windows perl
        print OUTPUT $content;
        close(OUTPUT);
    }
    else {
        print "couldn't do the replacement for some reason!\n";
        print "$filename unchanged\n";
    }
}

You'd have to put it into the directory with the files, then run it from the command line.
posted by AmbroseChapel at 10:43 PM on April 8, 2008 [1 favorite]

Please also insert into every page "posting code to MetaFilter is still broken after all these years".
posted by AmbroseChapel at 10:45 PM on April 8, 2008

hades' command worked. The problem on my first try was on my part, as I expected : I wasn't in the right folder...

I created 2 test files with 'VARIABLE' in it to try out the command. For the real thing, I will pre-populate my files with my link, actually using the word variable where i want my numbers to be and it should work.

Thanks a lot!
posted by domi_p at 9:52 AM on April 9, 2008
