How do I correct a discrepancy between MRTG and Apache?
April 5, 2007 1:43 AM

How can I go about rectifying the discrepancy between AWStats/Analog/Apache log tools and MRTG graphs?

I've searched endlessly on Google, but everyone else has the opposite problem. My MRTG graphs from my provider (LayeredTech, for what it's worth) are consistently lower than what is reported by the Apache logs, and therefore by tools like AWStats, Analog, Webalizer and the like. I figure this is due to people starting downloads (and thus getting logged with HTTP 200 status codes) but then cancelling them.

The discrepancy is as much as 30GB a month (45GB in MRTG vs. 75GB elsewhere), so a solution to this would be greatly appreciated. Any pointers, AskMefi?
posted by PuGZ to Computers & Internet (7 answers total)
 
Your provider might be disregarding 206 codes. You will get lots of these with large file downloads, especially PDFs. There is (or was; I don't know if it has been fixed) also a bug with some version combination of IE and Acrobat Reader that causes each data request (200 and 206) to be sent to the server twice. Maybe your provider recognizes that and filters it out. Your provider may also not count graphics loaded from pages and stylesheets.

You may have to configure your other software to disregard these hits.
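
Before deciding what to disregard, it may help to see how many logged bytes each status code accounts for. Here is a rough Python sketch, assuming the access log is in Apache's Common or Combined Log Format; the function name `bytes_by_status` and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the quoted request followed by the status code and the size field.
# Assumes Common/Combined Log Format, where the size is a number or "-".
LOG_TAIL = re.compile(r'"[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)')

def bytes_by_status(lines):
    """Sum the logged response sizes per HTTP status code."""
    totals = defaultdict(int)
    for line in lines:
        m = LOG_TAIL.search(line)
        if m is None:
            continue  # skip lines that don't look like access-log entries
        size = m.group("size")
        totals[m.group("status")] += 0 if size == "-" else int(size)
    return dict(totals)

# Hypothetical sample entries, not taken from any real log.
sample = [
    '203.0.113.5 - - [05/Apr/2007:01:43:00 +0000] "GET /big.pdf HTTP/1.1" 200 1000000',
    '203.0.113.5 - - [05/Apr/2007:01:43:02 +0000] "GET /big.pdf HTTP/1.1" 206 250000',
    '203.0.113.9 - - [05/Apr/2007:01:44:10 +0000] "GET /style.css HTTP/1.1" 304 -',
]

totals = bytes_by_status(sample)
for status, total in sorted(totals.items()):
    print(status, total)
```

To run it against a real log, pass an open file instead of `sample`, e.g. `bytes_by_status(open("access.log"))`, where the path is whatever your setup uses.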

Web statistic analysis is such an inexact science. Different analysis tools report different things in different ways. It's hard to help you without knowing more and seeing your setup.
posted by chillmost at 5:35 AM on April 5, 2007


Response by poster: Hm. The provider only gives me MRTG graphs (it's an unmanaged dedicated box in a datacentre), so I trust that their data is accurate. I would imagine that these statistics programmes would take into account range requests for 206 code files? Perhaps not. I'll have to research that.

I know that those three tools (AWStats, Analog and Webalizer) all give me the same results, which leads me to believe there's not much hope. Then again, I know it must be possible because others can do it!

I hope that helps somehow. I can do any research into my setup that you think might help. :-)
posted by PuGZ at 6:00 AM on April 5, 2007


Just remember, the provider's MRTG graph is generated from the byte counter on their switch port. (Or perhaps from flow data, if you don't have a dedicated port.)

Anything your analysis software does will be an estimate, based on whatever Apache puts in the log files for the size of the data returned in response to a request, which doesn't count headers, as far as I can tell. As long as Apache reports the total file size when it doesn't send the whole file, there's not much the log analyzers can do, since the data just isn't there to begin with.
posted by wierdo at 6:19 AM on April 5, 2007


Best answer: Are you compressing with mod_gzip or the like, so that the data actually sent is compressed? If so, MRTG, which reports the actual I/O on your data port, will only see the gzipped content, while your Apache log will report the uncompressed size.
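
To get a feel for how big that gap could be for text content, here is a minimal Python sketch using the standard `gzip` module. The repeated sample sentence is a stand-in and compresses far better than real files would, and `compression_summary` is just an illustrative name, so treat the ratio as a rough upper bound rather than a measurement of your traffic.

```python
import gzip

def compression_summary(payload: bytes):
    """Return (uncompressed size, gzipped size) in bytes for a payload."""
    compressed = gzip.compress(payload)
    return len(payload), len(compressed)

# Stand-in for a text file; highly repetitive, so it compresses unusually well.
text = ("The quick brown fox jumps over the lazy dog.\n" * 2000).encode("ascii")

raw_size, wire_size = compression_summary(text)
print(f"uncompressed: {raw_size} bytes, gzipped: {wire_size} bytes")
print(f"gzipped size is {wire_size / raw_size:.1%} of the original")
```

Running the same function over a few of your actual text files would give a more realistic idea of whether compression alone could account for the difference.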
posted by cmm at 6:26 AM on April 5, 2007


I was going to say what cmm just said... that's your most likely issue...

MRTG is far more trustworthy than AWStats.
posted by twiggy at 6:55 AM on April 5, 2007


Response by poster: Yeah, I know that; that's why I want to fix the stats to match MRTG's output, which I know to be correct.

cmm's solution might explain it, actually! I send out a *lot* of text files.
posted by PuGZ at 6:32 PM on April 5, 2007


Response by poster: Further research shows that it's not mod_gzip funny business. Hm.
posted by PuGZ at 7:48 PM on April 5, 2007

