Search and Replace across entire website
January 25, 2007 10:20 AM
I need to do a massive replace of Flash object HTML code throughout an entire site. I really don't want to download each page and fix it by hand.
I have the task of fixing the "Eolas bug" in a small-to-medium sized web site. The fix involves replacing every single instance of 6 lines of embed/object code with the SWFObject code. I do other site changes in Dreamweaver 8, but I don't think it's up to doing a search and replace in 50 to 100 pages (at least not without downloading and reuploading every page).
I have SSH access to the server, so I believe I can do some sort of regex replace magic - but I only know enough to be dangerous. Any instructions for replacing HTML code, only in .html files, with other specific code via regular expressions? If there's a better way to do this, I'd love to hear that too.
I really, really don't need this to turn into a "oh, regular expressions really do need the brackets escaped" experience.
Best answer: find /path/to/files -iname \*.html | xargs perl -i.bak -pe 's,your_regexp_here,replace,ig'
That will change all *.html files in-place, making a backup copy with the extension .bak.
posted by Rhomboid at 10:59 AM on January 25, 2007
If it's well-formed XHTML you might analyse the exact transformation that has to be done for every page and write a little bit of XSL that expresses that transformation.
Then use a bit of programming code to scan all the files and apply the XSL.
posted by jouke at 11:01 AM on January 25, 2007
Oh, and if any of your filenames will have spaces in them, then you need:
find /path/to/files -iname \*.html -print0 | xargs -0 perl -i.bak -pe 's,your_regexp_here,replace,ig'
And, if you want to do multiline search/replace, then you can do something like this:
find /path/to/files -iname \*.html -print0 | xargs -0 perl -i.bak -e 'local $/; while(<>) { s,search line 1\nsearch line2\n,replace line1\nreplace line2\n,sig; print; }'
posted by Rhomboid at 11:05 AM on January 25, 2007
Response by poster: condour: No dice on the automatic changes. It'll also take me longer than a half-hour - our network speed here isn't winning any races.
It is the same code on each page (I have that much going for me).
Rhomboid: That's about what I'm looking for, and the backup thing is nice. Now that I have actual command names, I should be able to pull this off.
jouke: That scares me slightly more than regular expressions.
----
on preview:
Yeah, that's the one problem: as this is HTML, this will be multi-line stuff. Is that bad?
I might just need to take 2 hours and do this in Dreamweaver before I make a datacenter explode with my almost-but-not-quite knowledge...
posted by niles at 11:13 AM on January 25, 2007
If you are worried, make a quick tar archive (cd /path/to/top/directory && tar cjvf ~/backup.tar.bz2 *) of the whole tree as backup, so that worst case you can just untar everything.
Regarding multiline search/replace, it's certainly doable with the "local $/" idiom. This tells perl to snarf the whole file, so your s//g applies to the entire file. With the /s modifier (the "s" in "sig"), "." also matches newlines, so a pattern can span multiple lines easily. A useful tip is to use "\s+" anywhere you would have whitespace: this will match spaces, tabs, newlines, etc. So if you use this everywhere there is whitespace in your RE it will match with some flexibility.
For example, "foo\s+bar\s+baz\s+" matches
foo bar baz
as well as
foo
bar
baz
as well as
foo
    bar
        baz
..and so on. The linebreaks/indentations are all eaten equally by \s+.
posted by Rhomboid at 11:25 AM on January 25, 2007
You have Dreamweaver? Check out its great Find/Replace (on the Edit menu). You can batch replace text (and code) in multiple files.
posted by grumblebee at 11:51 AM on January 25, 2007
Best answer: Don't apply changes to the live site, dude. You're just asking for trouble. Just use DW's find/replace, which is pretty powerful (and yet also quite intuitive) on the local "site" and then reupload (but not using DW's painful FTP dealie), preferably with a smart FTP app that only syncs changed files.
Or make a few local copies of the site and try your hand at the regex.
posted by misterbrandt at 7:04 PM on January 25, 2007
Best answer: seconding misterbrandt.
In most cases I'd recommend "working smarter, not harder", but when you're looking at potentially corrupting up to 100 (presumably static) pages, I'd go the long and careful route. Download the site to your local disk, use Dreamweaver's Search And Replace (the batch search and replace only works locally anyways, if I recall correctly), and then upload during off hours (if possible).
While you're at it I would *seriously* consider using some sort of include, if that option is available to you -- that way next time these sorts of changes will be *MUCH* less painful.
There is a plugin for DW8 that supposedly "fixes" the Eolas problem, but I cannot confirm if it does so automatically. I've always used SWFObject to fix the issue.
posted by fishfucker at 7:32 PM on January 25, 2007
Response by poster: So, here's what I'm doing then: I downloaded the entire site last night, and I'm going to be running Dreamweaver's S&R across it today, and then upload later today, which I guess I'll use FileZilla for.
The main reason I thought of fixing this over SSH is because I read an article on the powers of perl in situations like this. Of course, the article (never to be found again) was concerned with changing simple things like a company name throughout a site, not nasty HTML. It turns out just because I can use wget in Ubuntu and regular expressions in Visual Studio doesn't mean I can do this!
fishfucker: Yes, I know includes would be amazing, but I have inherited this site, and changes like that would be more trouble than it's worth in the long run. This site is a fun mix of good design, with a couple issues that I (expert I am not) have known to avoid for ages.
So, thanks for all the answers everyone, and convincing me I'm not quite a *n*x guru yet :)
Long and careful it is.
posted by niles at 9:14 AM on January 26, 2007
Response by poster: Followup:
Ironically enough, my place of work is changing its name, and I have to do this all over again. This time, at least, it's only domain names, which regex seems to be happy with.
posted by niles at 11:17 AM on April 13, 2007
This thread is closed to new comments.
Is it always the same flash piece, or will the code differ on a page by page basis?
posted by condour75 at 10:39 AM on January 25, 2007