Substitute broken site using redirect to WayBack Machine
March 30, 2018 5:17 PM

What .htaccess lines will redirect a request for an arbitrary URL on a former WordPress site to the index page of WayBack Machine snapshots for that page?

The good people of MeFi have helped me once before with a redirect issue, when I moved from Blogger to WordPress.

Now, an error in a WordPress update (possibly connected with quirks in my antiquated theme) has manifested some kind of malignant mutation in my MySQL database, making me unable to fix the site despite several weeks of effort.

Could someone suggest .htaccess lines that would redirect anyone seeking any page on sindark.com to the index page for what the WayBack Machine currently has archived for that URL?

P.S. If anyone knows about copying a MySQL database with foreign characters and "smart" punctuation intact, you might be able to help me fix the WP installation.
posted by sindark to Computers & Internet (5 answers total)
 
This would be tricky because you'd need to know the date in the Wayback Machine. But assuming you knew the date string, maybe something like this (note 302 for a temporary redirect):


# Send every request to one fixed Wayback Machine snapshot (302 = temporary redirect)
RewriteEngine On
RewriteRule ^(.*)$ https://web.archive.org/web/20180316022001/http://www.sindark.com%{REQUEST_URI} [R=302]
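
If the goal is the index page listing every snapshot rather than one dated capture, a variant along these lines might work. It is only a sketch: it assumes the Wayback Machine's `/web/*/` URL form for its snapshot listing, and it assumes the literal `*` passes through the redirect unescaped, so it is worth testing before relying on it.

```apache
RewriteEngine On
# Redirect every path to the Wayback Machine's listing of all captures for
# that URL. The literal * sits where a 14-digit timestamp would otherwise go.
# L stops further rule processing once the redirect is chosen.
RewriteRule ^ https://web.archive.org/web/*/http://www.sindark.com%{REQUEST_URI} [R=302,L]
```

With this in place, a request for /2009/07/03/pumped-and-multi-lagoon-tidal-systems/ should be sent to the listing for that same path, with no date string to maintain.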

posted by czytm at 10:43 PM on March 30, 2018 [1 favorite]


Tested that on https://htaccess.mwl.be
With the URL of https://www.sindark.com/2009/07/03/pumped-and-multi-lagoon-tidal-systems/#comment-1472236

Debugging info
1 RewriteRule ^(.*)$ https://web.archive.org/web/20180316022001/http://www.sindark.com%{REQUEST_URI} [R=302]

The new url is https://web.archive.org/web/20180316022001/http://www.sindark.com/2009/07/03/pumped-and-multi-lagoon-tidal-systems/#comment-1472236
Test are stopped, a redirect will be made with status code 302
posted by czytm at 10:48 PM on March 30, 2018 [1 favorite]


I'm going to suggest that this is a bad idea, because anyone using archive.org to find the most recent version of a page will be redirected back to archive.org in an infinite loop.
You could set a no-archive tag for ia_archiver in robots.txt to disable future archiving, but that would also blank out all the existing pages on archive.org.
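
One possible way to sidestep the loop without touching robots.txt is to exempt the Archive's own crawler from the redirect. This is a sketch, not something tested here: the `archive.org_bot` User-Agent substring is an assumption to check against your access logs, and the crawler would then capture whatever the broken site currently serves.

```apache
RewriteEngine On
# Skip the redirect when the User-Agent looks like the Internet Archive's
# crawler (the substring is an assumption; verify it in your logs).
RewriteCond %{HTTP_USER_AGENT} !archive\.org_bot [NC]
# Everyone else goes to the fixed snapshot.
RewriteRule ^ https://web.archive.org/web/20180316022001/http://www.sindark.com%{REQUEST_URI} [R=302,L]
```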

I think the best way to handle this is to set up a simple home page with an explanation and a link to the archive.org version of your site, then let all the other pages return a 404.
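
A minimal .htaccess sketch of that setup, assuming the explanation lives in a static `index.html` at the site root (a hypothetical file name) and that nothing else on the site needs to stay reachable:

```apache
# Serve the static notice page for the site root.
DirectoryIndex index.html

RewriteEngine On
# Let the root and the notice page through...
RewriteCond %{REQUEST_URI} !^/(index\.html)?$
# ...and answer every other URL with a 404, without invoking WordPress.
RewriteRule ^ - [R=404,L]
```

Any images or stylesheets the notice page uses would need to be added to the exception as well.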
posted by Lanark at 2:29 AM on March 31, 2018 [1 favorite]


Response by poster: Thank you for all of these suggestions.

I'm hoping to repair the site, but a half dozen different DreamHost customer service reps have failed to make a working copy of the MySQL database.
posted by sindark at 12:47 PM on March 31, 2018


You need to export the file as UTF-8 and at every point along the line, for every system that touches that file, it needs to be treated as UTF-8. Otherwise some characters will get misunderstood.

Related: https://www.whitesmith.co/blog/latin1-to-utf8/
https://makandracards.com/makandra/595-dumping-and-importing-from-to-mysql-in-an-utf-8-safe-way

I handle old links kind of one at a time and point to the web archive as needed. I also think automating this is a bad idea. I believe there are browser plugins that do similar things, though; you might try them out and recommend them to your readers as needed.

I'm a big fan of using the Search Regex plugin to do replacements that point links to web.archive versions of a site.
posted by artlung at 8:14 PM on April 1, 2018 [1 favorite]

