Can I use htaccess to deny certain non-existent directories to avoid going through my Drupal site?
January 2, 2009 12:12 PM

Can I use htaccess to deny certain non-existent directories in order to avoid going through my Drupal site (which requires connecting to my database)?

My Drupal site was hacked, though my content was not touched (which is why it went unnoticed for a while). I eventually noticed and cleaned up several extra directories that had contained thousands of subdirectories full of spammer link content. The whole site is now fresh and fixed, but I am getting a huge number of 404 errors from all over the world as people try to access these old spam directories. In Drupal, each time this happens, the 404 page is generated and the 404 error is logged. Both steps access the database, so my database is straining just to issue all of these 404 responses.

I just want a simple apache 404 page instead (but only for these spammer urls!).

The former pages and subdirectories were all contained within three base directories (I'll call them spam1, spam2, spam3), and so I would like to use htaccess to simply deny any request for (e.g.):

mysite.com/spam1/
mysite.com/spam1/item34/spam.php
mysite.com/spam2/item23/item5/anotherspam.php
...

And any other permutation.

I don't know how to do this when the directories don't actually exist. That is, I can recreate an empty folder called "spam1" and deny mysite.com/spam1/ requests with Apache, but this wouldn't deny any of the thousands of subdirectories -- as I said, Drupal steps in and takes over the 404 duties when the directory does not exist.

Is there some way to do the kind of denial I want, to pre-empt Drupal and the database connections? I do not control the server so htaccess may be my most powerful option.

(Otherwise, maybe I have to reconfigure Drupal in some way?)
posted by kosmonaut to Computers & Internet (7 answers total) 1 user marked this as a favorite
 
Create your empty top-level directories and add something like this to the .htaccess:

ErrorDocument 404 /spam1/404.html

Or whatever you want. My Gentoo install has its system default at /error/HTTP_NOT_FOUND.html.var
posted by sbutler at 12:33 PM on January 2, 2009


That file should be spam1/.htaccess, spam2/.htaccess, etc., in case that wasn't clear.
posted by sbutler at 12:36 PM on January 2, 2009


Response by poster: Thanks for the good suggestion. I tried it and it didn't work, but its failure made me notice something that is probably very important. In the root .htaccess file (the default one for Drupal), there is also this:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

I never really processed these lines whenever I looked at them before. But, this is sending all requests for non-existent items to Drupal's index.php page, is it not? So, I would need to write some form of RewriteCond and RewriteRule that will tell all requests starting with spam1/spam2/spam3 to go to a different 404 page, right?
posted by kosmonaut at 12:55 PM on January 2, 2009


Best answer: Ugh... I hate PHP for relying on mod_rewrite so much.

I haven't tested this, but try putting RewriteEngine off in your spam .htaccess files.
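
A minimal sketch of what spam1/.htaccess might contain (repeated for spam2 and spam3), combining this with the ErrorDocument line suggested earlier. It is untested, and the 404.html file name is only an assumption; any small static file in that directory would do:

```apache
# spam1/.htaccess (the spam1 directory must exist, even if otherwise empty)

# Turn off rewriting beneath this directory, so the rules in the root
# .htaccess no longer hand requests for missing paths to index.php.
RewriteEngine Off

# Serve a static page for 404s here instead of whatever the root
# .htaccess configures. 404.html is a hypothetical static file.
ErrorDocument 404 /spam1/404.html
```

With rewriting off under spam1/, Apache should answer those requests itself without involving PHP or the database.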
posted by sbutler at 12:59 PM on January 2, 2009


I would create a hidden linkfarm page encrusted with ads and set up 301 redirects to it for the spam urls. Get the spammers to pay you!
posted by rhizome at 1:16 PM on January 2, 2009


Best answer:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]


What that's doing is enabling your clean URLs. But you're right: if you add a separate rewrite rule ahead of that last RewriteRule, you can send the spam URLs to a custom 404 page and prevent the database hits that come from going through index.php.
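
For what it's worth, here is a rough sketch of such a rule, placed in the root .htaccess above the three lines quoted here. It is untested; spam1, spam2 and spam3 are the placeholder names from the question, and it assumes RewriteEngine on already appears earlier in the file. It answers with 410 Gone via the [G] flag rather than a 404, on the assumption that a 404 might still be routed to index.php if the root .htaccess sets ErrorDocument 404 that way:

```apache
# Anything at or below spam1/, spam2/ or spam3/ gets "410 Gone" straight
# from Apache. "-" means no substitution, and [G] ends rewriting for the
# request, so the index.php rule below is never reached for these paths.
RewriteRule ^(spam1|spam2|spam3)(/|$) - [G]

# Existing clean-URL rules, unchanged
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
```

With this approach the spam directories do not need to exist on disk.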
posted by bricoleur at 1:27 PM on January 2, 2009


Response by poster: sbutler:
You were absolutely correct about the RewriteEngine line. Thanks!

rhizome:
That would be really hilarious, but my main goal is not to waste any more of my time on this thing.

bricoleur:
That's right, clean URLs! Anyway, since sbutler's approach of turning off those rules worked, I don't need to mess with RewriteCond and RewriteRule.

Thanks all. I've managed to stop all the 404 requests with your help!

(It seems my database is still swamped though, so I guess writing to the logs wasn't the main culprit. *sigh*)
posted by kosmonaut at 2:26 PM on January 2, 2009

