Separating vhosts in combined apache logfile?
August 17, 2005 5:17 PM

I am in a shared hosting environment, but I host multiple domains using mod_rewrite. Of course, this means all the domains get lumped together in one logfile. I need to be able to separate the individual domains for log analysis, without using a CustomLog statement.

Let's take care of the obvious: I can't afford individual accounts for each domain, or upgrading to a dedicated server. I don't have access to httpd.conf, so I can't use <VirtualHost> blocks with accompanying CustomLog statements.

It would seem that one could infer which request is for which domain by looking at requests in aggregate (e.g. the first request is for "/" with the referring page as the referrer, and the second request is for the page's style sheet, graphics, etc., with the requested domain's docroot as the referrer).
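A rough sketch of that referrer idea in Python, assuming the log is in Apache's "combined" format (the one that includes the Referer and User-Agent fields) and that you supply the set of your own domain names. The names `domain_from_referer` and `my_domains` are made up for illustration. It can only label requests whose referrer is one of your own pages; the first hit of each visit, arriving from an outside referrer, stays unlabeled.

```python
import re
from urllib.parse import urlsplit

# Tail of a "combined" log line: "request" status bytes "referer" "user-agent"
COMBINED_TAIL = re.compile(
    r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$'
)

def domain_from_referer(line, my_domains):
    """Return the hosted domain named in the Referer field, or None.

    my_domains is a set of bare domain names that you provide.
    """
    match = COMBINED_TAIL.search(line.rstrip())
    if match is None:
        return None  # not a combined-format line
    host = urlsplit(match.group("referer")).hostname
    if host is None:
        return None  # referer was "-" or not a URL
    if host.startswith("www."):
        host = host[4:]
    return host if host in my_domains else None
```

Lines that come back as None would still need some other signal (client IP plus timestamp proximity, say) to be assigned to a domain, which is where this heuristic gets shaky.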

I suppose some Perl is the answer, but that's a frightening prospect for me. There's got to be a way to either tag or otherwise indicate in the log entry which domain is being requested, somewhere in .htaccess (where the mod_rewrite rules live, and which is the only Apache config to which I have access).

Possible ideas: Can RewriteLog be hacked to do this? Or I could do an AddHandler to a shell script that created the appropriate logs, but that would create serious performance issues.

Or perhaps there's a way to do this in Sawmill, my logfile analyser.

I can't be the only one in this situation, but Google has yielded zilch.
posted by joshwa to Computers & Internet (6 answers total)
The second code block on this gentoo specific vhost guide might be able to help you.
posted by cmm at 5:38 PM on August 17, 2005

Response by poster: The above-referenced script depends on a CustomLog in the previous code block, which includes a %V indicating the virtual host. The CustomLog (and the rewrite-based vhosts) all live in httpd.conf/apache.conf, which I can't get at. :(
posted by joshwa at 6:40 PM on August 17, 2005

If you can't edit the conf files, then you can't use RewriteLog either, can you? Seems to me that if you can't change the log format, all you can do is push the HTTP_HOST variable into something that does get logged, like the username or URI.

Oooh, this is a vicious hack and will double your incoming traffic but how about:

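RewriteEngine On

# .htaccess form: no leading slash in patterns; skip internal re-runs to avoid a redirect loop
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond $1 !^HOST:
RewriteRule ^(.*)$ /HOST:%{HTTP_HOST}/$1 [R,L]

RewriteRule ^HOST:[^/]+/(.*)$ /$1 [L]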

Get it? Let's say a request comes in for /bar/baz.html on one of your domains. It gets logged as "/bar/baz.html". Then the first rule does an external redirect, which means the browser will request the same path again with /HOST: and the hostname prepended, which will be logged as "/HOST:<hostname>/bar/baz.html" so you can grep it later for analysis. On that second request the first rule is skipped because the RewriteCond sees the "HOST", and the second rule cleans up the URI so the real file is delivered.

I didn't test this, but it should work...
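For the "grep it later" part, here is a minimal Python sketch, assuming the tagged requests show up in the access log with a request field like "GET /HOST:<hostname>/bar/baz.html HTTP/1.1". The function name `split_by_host` is made up for illustration. It groups the tagged lines by hostname and puts the original path back so a log analyser sees normal URLs; the untagged lines (the first requests that only got redirected) are dropped.

```python
import re
from collections import defaultdict

# Request field carrying the /HOST:<hostname> tag added by the redirect
TAGGED_REQUEST = re.compile(
    r'"(?P<method>[A-Z]+) /HOST:(?P<host>[^/\s"]+)(?P<path>/\S*) (?P<proto>HTTP/[\d.]+)"'
)

def split_by_host(lines):
    """Group tagged log lines by hostname, restoring the untagged path."""
    per_host = defaultdict(list)
    for line in lines:
        match = TAGGED_REQUEST.search(line)
        if match is None:
            continue  # untagged request: just the redirect, skip it
        # Normalise the hostname: drop any :port suffix, lowercase it
        host = match.group("host").split(":")[0].lower()
        request = '"{} {} {}"'.format(
            match.group("method"), match.group("path"), match.group("proto")
        )
        per_host[host].append(line[:match.start()] + request + line[match.end():])
    return dict(per_host)
```

Writing each list out to its own file would then give one logfile per domain to feed to the analyser.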
posted by nicwolff at 7:04 PM on August 17, 2005

One thing I can think of would be to rewrite all requests through a script that does the requisite logging.

Something like
RewriteCond %{HTTP_HOST} .*myvhost\.tld$
RewriteRule ^/(.*)$ <URL of your logging script>/$1 [P,L]
RewriteCond %{HTTP_HOST} .*myothervhost\.tld$
RewriteRule ^/(.*)$ <URL of your logging script>/$1 [P,L]

Note that your script (CGI, PHP, whatevs) needs to accept the arguments in the path, otherwise you could run into some problems with query string arguments.
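To give an idea of the logging half of such a script, here is a minimal Python sketch. It assumes a layout that the rules above do not spell out: the script receives the vhost name as the first path component, followed by the original path, in the CGI variable PATH_INFO. The function names and the log directory are hypothetical, and actually serving the requested file is left out.

```python
import os
import re
import time

def split_path_info(path_info):
    """'/myvhost.tld/bar/baz.html' -> ('myvhost.tld', '/bar/baz.html')."""
    vhost, _, rest = path_info.lstrip("/").partition("/")
    return vhost.lower(), "/" + rest

def log_request(environ, log_dir, now=None):
    """Append one line to <log_dir>/<vhost>.log and return that file's path."""
    vhost, path = split_path_info(environ.get("PATH_INFO", "/"))
    if not re.fullmatch(r"[a-z0-9.-]+", vhost):
        vhost = "unknown"  # never build a filename from odd input
    query = environ.get("QUERY_STRING", "")
    if query:
        path += "?" + query  # keep query string arguments with the path
    stamp = time.strftime("%d/%b/%Y:%H:%M:%S %z", time.localtime(now))
    line = '{} - - [{}] "{} {}"\n'.format(
        environ.get("REMOTE_ADDR", "-"),
        stamp,
        environ.get("REQUEST_METHOD", "GET"),
        path,
    )
    log_path = os.path.join(log_dir, vhost + ".log")
    with open(log_path, "a") as handle:
        handle.write(line)
    return log_path
```

In a CGI script this would be called as log_request(os.environ, "/path/to/logs"). Bear in mind that when the request arrives through a proxy, the client address the script sees may be the proxy's rather than the visitor's.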
posted by revgeorge at 7:13 PM on August 17, 2005

Response by poster: Oh I'm sure my webhost will love that... clever, though, and in a pinch I just may end up using it.

At least it won't *double* my traffic-- just the number of requests and the bandwidth used trading headers.
posted by joshwa at 7:26 PM on August 17, 2005

You mention .htaccess--I take it that you are not being permitted to override CustomLog there? Otherwise that would be the obvious solution.

I sincerely doubt RewriteLog would suit your purposes, and I echo the question above as to whether you'd be allowed to override that either (although I guess I could see a host permitting Rewrite directives in a .htaccess while preventing other things).

Otherwise I have no real suggestions, although either revgeorge's or nicwolff's suggestion should probably work, even though they aren't super clean.

Oh, and a final thought, which isn't exactly helping--why do you want to do log analysis in the first place? My experience has turned me off from it; the amount of referral spam and other noise always drowns out whatever signal I can get out of such analysis, rendering the process completely useless :(
posted by cyrusdogstar at 8:30 PM on August 17, 2005
