I need to redirect about 10,000 URLs
June 6, 2012 4:41 PM
I've got to 301 redirect URLs for roughly 10,000 pages on a pair of Drupal sites. That seems like a lot to stuff into .htaccess. Are there better alternatives?
I'm merging the content for a couple of small sites into their larger sister site. All told, we have about 10,000 pages (user-contributed content) to redirect. Because of the nature of the migration and the peculiarities of our URL scheme, there's no way to do this with an elegant regexp: We need ~10,000 301 redirects.
I read this question from a couple of years ago, but it was framed in terms of a few dozen redirects, not thousands and thousands.
Two ideas I've had so far are:
1. Biting the bullet and dropping 10,000 301s into the .htaccess for that virtual host.
2. Stuffing the redirect mappings into a database and doing the redirects with a bit of PHP.
My gut tells me option 2 is the better option over time: Search comprises 60 percent of the traffic — that will decrease quickly as the 301s are indexed — and our own campaigns are generating another 20 percent. The remaining 20 percent is split between direct traffic (which will slack off because our communities know they're being moved) and referred traffic (which we can't do anything about).
pretty easy to turn your 10,000 or so redirects into something you could upload into that table
Oops, even simpler: Path redirect import module.
posted by flug at 5:04 PM on June 6, 2012
You'll be very unhappy with an .htaccess of that size. Apache will have to process it for every request. If you stuffed it in the virtual host config, it would be in memory and go faster, but each Apache process would get pretty huge, and the rules would still be evaluated on every request.
Flug's answers sound much more promising.
posted by advicepig at 7:35 PM on June 6, 2012
I've done your option 2 before. We rewrite traffic to a small PHP file that looks up the URL in a database and passes it along as needed. It's quick and painless, and far more efficient than putting it all in .htaccess. As advicepig said, the entirety of .htaccess is parsed on every request.
In that table we also keep track of the last time a redirect was actually "hit", so we have some data around which old URLs remain in use and which ones are falling away.
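For illustration only, here is a minimal sketch of that lookup-and-record-the-hit logic. It is written in Python with SQLite rather than the PHP we actually use, and the table name, column names, and function names are all hypothetical; treat it as one way the idea could look, not as our production script.

```python
import sqlite3
import time


def open_redirect_db(path=":memory:"):
    """Open the database and create the (hypothetical) redirects table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS redirects ("
        " old_path TEXT PRIMARY KEY,"
        " new_url TEXT NOT NULL,"
        " last_hit INTEGER)"  # Unix timestamp of the most recent hit, NULL if never hit
    )
    return conn


def lookup_redirect(conn, old_path, now=None):
    """Return the new URL for old_path, or None if no redirect is stored.

    On a match, record the hit time so rarely used redirects can be found later.
    """
    row = conn.execute(
        "SELECT new_url FROM redirects WHERE old_path = ?", (old_path,)
    ).fetchone()
    if row is None:
        return None  # caller falls through to normal handling (e.g. a 404)
    stamp = int(time.time()) if now is None else now
    conn.execute(
        "UPDATE redirects SET last_hit = ? WHERE old_path = ?", (stamp, old_path)
    )
    conn.commit()
    return row[0]  # caller sends this as the Location of a 301 response
```

The primary key on the old path keeps each lookup to a single indexed query, which is the main reason this scales better than a long list of rules scanned on every request.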
posted by unionsquarepark at 9:39 PM on June 6, 2012
Putting a gazillion entries into a .htaccess will likely be pretty slow. You could instead use mod_rewrite's RewriteMap feature, keeping the list of redirects in a simple dbm database.
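As a rough sketch of the preparation step: a plain-text rewrite map is one "key value" pair per line, separated by whitespace, and Apache's httxt2dbm utility can convert such a file to dbm. The helper below (Python; the function name and the example URLs are made up for illustration) renders a list of old-path/new-URL pairs in that format and rejects entries that would break the map.

```python
def format_rewrite_map(pairs):
    """Render (old_path, new_url) pairs as plain-text map lines: 'key value'."""
    seen = set()
    lines = []
    for old_path, new_url in pairs:
        if not old_path or not new_url:
            raise ValueError("empty key or value in redirect pair")
        # Whitespace separates key from value in a text map,
        # so neither field may contain any.
        if any(ch.isspace() for ch in old_path + new_url):
            raise ValueError(f"whitespace in redirect pair: {old_path!r}")
        # A repeated key would make the mapping ambiguous.
        if old_path in seen:
            raise ValueError(f"duplicate key: {old_path!r}")
        seen.add(old_path)
        lines.append(f"{old_path} {new_url}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical sample data; a real run would read the ~10,000 pairs from a file.
    sample = [
        ("/node/101", "https://example.com/community/post-101"),
        ("/node/102", "https://example.com/community/post-102"),
    ]
    print(format_rewrite_map(sample), end="")
```

Note that RewriteMap itself has to be declared in the server or virtual host config, not in .htaccess, so this route needs access to the Apache configuration.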
posted by vasi at 2:29 AM on June 7, 2012
Thanks for the answers.
Having done a little research on vasi's suggestion, I think RewriteMap sits the best with me of all my choices: I have to work with an outside team on some of our hosting, so sticking to established features in the core stack is appealing.
If RewriteMap had not existed, I would have taken some comfort from unionsquarepark noting that a small PHP script could do the trick, too.
posted by mph at 1:54 PM on June 7, 2012
It turns out that the team from our hosting provider preferred we keep the redirects *out* of Apache and wanted us to do it all inside Drupal. So we ended up doing it using flug's suggestions, after all.
Thanks again for all the answers. I've got good tools for a bunch of situations now.
posted by mph at 8:59 PM on June 20, 2012
This thread is closed to new comments.
I don't know if that approach or the .htaccess is more efficient, faster, or whatever.
posted by flug at 5:02 PM on June 6, 2012