Shell Scripting and Regex Voodoo.
April 1, 2005 9:22 AM

I've got a problem. There is this ASP website that has gone down for repairs. I've got a wget of the whole site and I need to put the mirror back up on the internet. I've got OS X with the Developer Tools installed, etc. I need a shell script to help me get this done.

The files are saved in a directory format.

\site.com\page.asp
\site.com\page.asp?random_info
\site.com\foo\otherpage.asp?random_bar

Etc.

I need to recursively go through the folder and all subfolders, find any file whose name contains *.asp*, and append ".html" to the end of the filename.

so \site.com\foo\otherpage.asp?bar
becomes
\site.com\foo\otherpage.asp?bar.html

That's part one.

Part two involves searching through the files themselves, and looking for links that contain *.asp*

i.e. [a href="\otherpage.asp?foobar"]link to foobar[/a]

and change it to:
[a href="otherpage.asp?foobar.html"] link to foobar[/a]

Thanks for any help you can provide.
posted by Freen to Computers & Internet (4 answers total)
 
I think you need to replace the question marks too or you'll have trouble serving them.

There are plenty of old threads about batch renaming files. For the text editing, I'd recommend you download TextWrangler and use batch grep.

Replace "(\S+\.asp[^"]*)" with "\1.html" (include the quote marks).
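For anyone who would rather stay in the terminal, here is a rough sed equivalent of that search and replace. It is only a sketch: the function name rewrite_links is made up for illustration, and because \S is not part of POSIX regex syntax, the bracket expression [^[:space:]"] is swapped in as an assumed stand-in for it.

```shell
# Read HTML on stdin and append ".html" inside every quoted value
# that contains ".asp". rewrite_links is a hypothetical helper name.
rewrite_links() {
  # -E selects extended regular expressions (BSD and GNU sed).
  # Group 1 is everything between the quotes: a run of characters that
  # are neither whitespace nor quotes, then ".asp", then anything up
  # to the closing quote.
  sed -E 's/"([^[:space:]"]+\.asp[^"]*)"/"\1.html"/g'
}

# Example:
#   rewrite_links < page.asp > page.asp.new
```

Note that running it twice over the same text would append ".html" a second time, so it is best applied once to a fresh copy of the mirror.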
posted by cillit bang at 9:33 AM on April 1, 2005


Thanks, I'll check it out.

Also, the "[" and "]" are actually angle brackets: '<' and '>'.
posted by Freen at 9:51 AM on April 1, 2005


If you want to mass rename files, here's a way to do it from your OS X terminal in one string of commands (it's all on one line, btw).

\site.com\foo\otherpage.asp?bar
becomes
\site.com\foo\otherpage.asp?bar.html


ls -1 /site.com/foo/ | grep '\.asp' | while IFS= read -r var; do mv "/site.com/foo/$var" "/site.com/foo/$var.html"; done

but I would recommend you run the following first just to make sure it will display all the filenames in question (better safe than sorry)

ls -1 /site.com/foo/ | grep '\.asp' | while IFS= read -r var; do echo "$var"; done
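Since the question asks for all subfolders too, here is a sketch of a recursive version with the same "look before you rename" idea built in. The function name rename_asp and its "dry" argument are invented for this example, and it assumes no filename contains a newline.

```shell
# Append ".html" to every regular file under ROOT whose name contains ".asp".
# Usage: rename_asp ROOT        (really rename)
#        rename_asp ROOT dry    (only print what would happen)
# rename_asp is a hypothetical helper name.
rename_asp() {
  root=$1
  mode=${2:-}
  # -type f skips directories; ! -name '*.html' skips files that
  # already end in ".html", so a second run changes nothing.
  find "$root" -type f -name '*.asp*' ! -name '*.html' -print |
  while IFS= read -r f; do
    if [ "$mode" = dry ]; then
      printf '%s -> %s.html\n' "$f" "$f"
    else
      mv -- "$f" "$f.html"
    fi
  done
}
```

Quoting "$f" everywhere matters here, because the "?" in names like otherpage.asp?bar is a wildcard character to the shell when left unquoted.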

Hope this helps.
posted by grahamux at 2:37 AM on April 2, 2005


"\site.com" ? why backslashes? What kind of crazy shell does OSX have, anyway? I just know bash.

so \site.com\foo\otherpage.asp?bar
becomes
\site.com\foo\otherpage.asp?bar.html


Are you sure that's going to work? I dunno what webserver you're using, but most will probably not look for a static file with a ? in it there.... maybe you want to change it to "foo/otherpage.asp!bar.html" or something instead?
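The reason this matters is that a web server treats everything after the "?" in a requested URL as a query string, not as part of the file name. Here is a sketch of how the "!" naming could be computed with plain shell parameter expansion; bang_name is a made-up helper name, and it assumes the first "?" in the path is the one that starts the query string.

```shell
# Print the new name for one path: the first "?" becomes "!" and
# ".html" is appended. bang_name is a hypothetical helper name.
bang_name() {
  case $1 in
    *.asp[?]*)
      # ${1%%[?]*} is the part before the first "?",
      # ${1#*[?]}  is the part after it.
      printf '%s!%s.html\n' "${1%%[?]*}" "${1#*[?]}" ;;
    *.asp)
      printf '%s.html\n' "$1" ;;
    *)
      # Anything else is left alone.
      printf '%s\n' "$1" ;;
  esac
}

# Dry run over the mirror directory from the question:
#   find site.com -type f -name '*.asp*' | while IFS= read -r f; do
#     echo mv "$f" "$(bang_name "$f")"
#   done
```

Dropping the echo in the commented loop would perform the renames for real.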

To do what you said: for i in $(find site.com -name '*.asp*'); do mv "$i" "$i.html"; done
To eliminate the ? instead: for i in $(find site.com -name '*.asp\?*'); do mv "$i" "$(echo "$i" | sed 's/\.asp?\(.*\)/.asp!\1.html/')"; done

The text replace would be something like:

for i in $(find site.com -name '*.asp*'); do sed 's/\.asp?\([^"]*\)/.asp?\1.html/g' "$i" > tmp && mv tmp "$i"; done

Or replace that last ? with a ! if you're trying to do what I think you're trying to do. Which will not work too well if there's any actual dynamic content on the site...
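A sketch of the "!" variant of the text replace, applied in place to every mirrored page. The name rewrite_site is invented for this example. It assumes links are wrapped in double quotes, that the files have been (or will be) renamed with "!" in place of "?", and that no filename contains a newline.

```shell
# Rewrite .asp links inside every mirrored page under ROOT, in place.
# rewrite_site is a hypothetical helper name.
rewrite_site() {
  root=$1
  find "$root" -type f -name '*.asp*' -print |
  while IFS= read -r f; do
    tmp=$(mktemp) || continue
    # 1st expression: page.asp?query"  ->  page.asp!query.html"
    # 2nd expression: page.asp"        ->  page.asp.html"
    if sed -E -e 's/\.asp\?([^"[:space:]]*)"/.asp!\1.html"/g' \
              -e 's/\.asp"/.asp.html"/g' "$f" > "$tmp"; then
      # Overwrite the contents rather than moving the temp file over
      # the original, so the page keeps its existing permissions.
      cat "$tmp" > "$f"
    fi
    rm -f "$tmp"
  done
}
```

Because an already rewritten link no longer contains ".asp?" or ends in .asp", running this a second time leaves the files unchanged.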
posted by sfenders at 7:09 AM on April 2, 2005

