Shell Scripting and Regex Voodoo.
April 1, 2005 9:22 AM
I've got a problem. There is this ASP website that has gone down for repairs. I've got a wget mirror of the whole site and I need to put it back up on the internet. I've got OS X and the Developer Tools installed, etc. I need a shell script to help me get this done.
The files are saved in a directory structure like this:
\site.com\page.asp
\site.com\page.asp?random_info
\site.com\foo\otherpage.asp?random_bar
Etc.
I need to recursively go through the folder and all subfolders, find any filename that contains *.asp*, and append ".html" to the end of the filename.
So \site.com\foo\otherpage.asp?bar
becomes
\site.com\foo\otherpage.asp?bar.html
That's part one.
Part two involves searching through the files themselves for links that contain *.asp*,
e.g. [a href="\otherpage.asp?foobar"]link to foobar[/a]
and changing them to:
[a href="otherpage.asp?foobar.html"]link to foobar[/a]
Thanks for any help you can provide.
Response by poster: Thanks, I'll check it out.
Also, the "[" and "]" are actually angle brackets: '<' and '>'.
posted by Freen at 9:51 AM on April 1, 2005
If you want to mass rename files, here's a way to do it from your OS X terminal in one string of commands (it's all on one line, btw):
\site.com\foo\otherpage.asp?bar
becomes
\site.com\foo\otherpage.asp?bar.html
for var in /site.com/foo/*.asp*; do mv "$var" "$var.html"; done
but I would recommend running the following first, just to make sure it lists all the filenames in question (better safe than sorry):
for var in /site.com/foo/*.asp*; do echo "$var"; done
Hope this helps.
posted by grahamux at 2:37 AM on April 2, 2005
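The loop above only covers the one folder /site.com/foo/, while the question asks for all subfolders too. Here is a minimal sketch of the same preview-then-rename idea using find, assuming the mirror sits in a directory you pass as the first argument (e.g. site.com). The function name rename_asp and the DRY_RUN variable are hypothetical, made up for illustration. It assumes no filenames contain newlines.

```shell
# rename_asp ROOT (hypothetical helper): append ".html" to every regular file
# under ROOT, including subfolders, whose name contains ".asp".
# Names already ending in ".html" are skipped, so a second run changes nothing.
# DRY_RUN=1 (hypothetical switch) only prints the planned renames.
rename_asp() {
    find "$1" -type f -name '*.asp*' ! -name '*.html' | while IFS= read -r f; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s -> %s\n' "$f" "$f.html"
        else
            mv "$f" "$f.html"
        fi
    done
}
```

Usage: run DRY_RUN=1 rename_asp site.com to check the list, then rename_asp site.com to do it. The "! -name '*.html'" test matters because find is still walking the tree while files are being renamed, and without it a freshly renamed page.asp.html could be picked up and renamed again.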
"\site.com"? Why backslashes? What kind of crazy shell does OS X have, anyway? I just know bash.
so \site.com\foo\otherpage.asp?bar
becomes
\site.com\foo\otherpage.asp?bar.html
Are you sure that's going to work? I dunno what webserver you're using, but most will treat everything after the ? in a URL as the query string and look for a file named otherpage.asp, not a static file with a ? in its name. Maybe you want to change it to "foo/otherpage.asp!bar.html" or something instead?
To do what you said: find site.com -type f -name '*.asp*' ! -name '*.html' | while IFS= read -r i; do mv "$i" "$i.html"; done
To eliminate the ? instead: find site.com -type f -name '*.asp[?]*' | while IFS= read -r i; do new=$(echo "$i" | sed 's/\.asp?\(.*\)/.asp!\1.html/'); mv "$i" "$new"; done
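To preview the ".asp?" to ".asp!" rename described above before touching any files, here is a small sketch that wraps the substitution in a helper you can try on single names first. bang_name, rename_bang and DRY_RUN are hypothetical names for illustration; the root directory is whatever you pass in (e.g. site.com), and filenames are assumed not to contain newlines.

```shell
# bang_name PATH (hypothetical helper): print PATH with ".asp?query" rewritten
# to ".asp!query.html". Names without ".asp?" are printed unchanged.
bang_name() {
    printf '%s\n' "$1" | sed 's/\.asp?\(.*\)/.asp!\1.html/'
}

# rename_bang ROOT (hypothetical helper): apply bang_name to every file under
# ROOT whose name contains a literal ".asp?". In find's -name pattern, "[?]"
# matches a real question mark, while a bare "?" would match any character.
# DRY_RUN=1 only prints the planned renames.
rename_bang() {
    find "$1" -type f -name '*.asp[?]*' | while IFS= read -r i; do
        new=$(bang_name "$i")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s -> %s\n' "$i" "$new"
        else
            mv "$i" "$new"
        fi
    done
}
```

For example, bang_name 'site.com/foo/otherpage.asp?bar' prints site.com/foo/otherpage.asp!bar.html, and bang_name 'site.com/page.asp' prints the name unchanged, since only pages with a query string get the "!" treatment.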
The text replace would be something like:
find site.com -type f -name '*.asp*' | while IFS= read -r i; do sed 's/\.asp?\([^"]*\)/.asp?\1.html/g' "$i" > /tmp/asp.tmp && cat /tmp/asp.tmp > "$i"; done
Or replace that last ? with a ! if you're trying to do what I think you're trying to do, though that won't work too well if there's any actual dynamic content on the site...
posted by sfenders at 7:09 AM on April 2, 2005
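If the files were renamed with the "!" scheme, the links inside them need the matching change (the "replace that last ? with a !" variant of the text replace). A minimal sketch, assuming every link target is double-quoted, as in href="otherpage.asp?foobar"; fix_links_bang and fix_links_bang_tree are hypothetical helper names, and the /tmp/aspfix.XXXXXX template is an arbitrary choice.

```shell
# fix_links_bang (hypothetical helper): read HTML on stdin and write it to stdout
# with double-quoted targets like "otherpage.asp?foobar" rewritten to
# "otherpage.asp!foobar.html". Targets without ".asp?" are left alone.
# The "g" flag handles several links on one line.
fix_links_bang() {
    sed 's/\.asp?\([^"]*\)"/.asp!\1.html"/g'
}

# fix_links_bang_tree ROOT (hypothetical helper): apply fix_links_bang in place
# to every file under ROOT whose name contains ".asp". The temporary file lives
# in /tmp, outside ROOT, so find never picks it up, and "cat >" rewrites each
# file without changing its permissions.
fix_links_bang_tree() {
    tmp=$(mktemp /tmp/aspfix.XXXXXX) || return 1
    find "$1" -type f -name '*.asp*' | while IFS= read -r f; do
        fix_links_bang < "$f" > "$tmp" && cat "$tmp" > "$f"
    done
    rm -f "$tmp"
}
```

For example, the line <a href="otherpage.asp?foobar">link to foobar</a> comes out as <a href="otherpage.asp!foobar.html">link to foobar</a>. Run it only once per mirror, since it rewrites files in place.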
There are plenty of old threads about batch renaming files. For the text editing, I'd recommend you download TextWrangler and use batch grep.
Replace "([^" ]+\.asp[^"]*)" with "\1.html" (include the quote marks).
posted by cillit bang at 9:33 AM on April 1, 2005
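For anyone who would rather stay in the terminal, here is roughly the same kind of search-and-replace as a sed filter, with the link target restricted to characters other than quotes and spaces so a match cannot run past a closing quote. A sketch only: fix_links_ere is a hypothetical name, and it assumes link targets are double-quoted. The -E flag (extended regular expressions) is accepted by both the BSD sed on OS X and GNU sed.

```shell
# fix_links_ere (hypothetical helper): read HTML on stdin and write it to stdout,
# appending ".html" to every double-quoted value containing ".asp", e.g.
# "otherpage.asp?foobar" -> "otherpage.asp?foobar.html" and
# "page.asp" -> "page.asp.html".
fix_links_ere() {
    sed -E 's/"([^" ]+\.asp[^"]*)"/"\1.html"/g'
}
```

To fix one file, something like fix_links_ere < page.asp.html > /tmp/out && cat /tmp/out > page.asp.html would do. Unlike the "?"-only replace, this also catches plain links such as "page.asp", which matches the append-".html"-to-everything rename.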