Help with stats
June 6, 2007 7:49 AM
Why is there such a difference between Google Analytics, awstats and webalizer? Google Analytics says my site received just under 5000 visitors for the month of May, awstats says I received just under 28,000 and webalizer says I received 65,001 visitors. Which one is the most accurate? Which one do I point advertisers to?
There will always be a big gap between JavaScript-based page-tagging analytics tools (Google Analytics) and log-based tracking tools (awstats). I'd recommend reading Web Analytics Demystified to get a more detailed understanding of how these different systems work and what they imply for tracking site activity.
A number like 65K from webalizer looks more like total page views than visitors; it's far too out of whack. First of all, check that each system is actually measuring the same thing: visitors, unique visitors, visits, unique visits and page views all mean something different.
posted by GuyZero at 7:53 AM on June 6, 2007
Yes, and as chunking express points out, page-tag-based solutions will generally ignore all robot/spider activity (although I did once see a web spider that executed JavaScript on pages, which was kind of odd). So it could be as simple as that.
posted by GuyZero at 7:54 AM on June 6, 2007
To elaborate on GuyZero, Google Analytics uses JavaScript, so only "real browsers" will register on it, whereas things like spambots and search engine crawlers (and people with ancient browsers) don't.
awstats and webalizer work by looking at your logfiles, and will thus include all hits, human or not. I have no idea why they're coming up differently, though.
posted by fogster at 9:10 AM on June 6, 2007
...people with ancient browsers, people with JavaScript disabled, people using browsers like Lynx, people who use plugins or extensions that only allow JavaScript from certain trusted sites...
JavaScript is an even worse metric than the log files themselves. At least the log files measure something defined and measurable: the number of requests to your site, or for a given file, that made it to your server.
posted by wierdo at 9:12 AM on June 6, 2007
As others have noted, Google Analytics, by its nature, tends not to count bots and spiders (unless they have a working JavaScript engine, and even then, Google probably has enough data to filter many of them out). In addition, Analytics uses a cookie to better identify unique visitors.
AWStats doesn't use a cookie or a JavaScript tracker. It just works from the log file. It identifies visits (a visit is typically all the page requests from a single IP where the gap between requests doesn't exceed a certain period of time, often 15-30 minutes). It also identifies visitors, probably by counting the number of IP addresses, though it could go further and look at the combination of IP address + user agent string (typically browser type, version and OS info).
Webalizer doesn't claim to report visitors at all; it reports visits, probably using a technique similar to the one I described above, though it may use a different time window, which could cause different results.
posted by Good Brain at 9:14 AM on June 6, 2007
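To make the visit/visitor distinction concrete, here is a minimal Python sketch of the log-based approach Good Brain describes. It is not AWStats' or Webalizer's actual code; the 30-minute timeout, the `count_traffic` function and the sample records are all hypothetical. It shows how the same five requests come out as two or three "visitors" depending on whether the user agent is part of the key.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity window; the thread only says "often 15-30 minutes".
VISIT_TIMEOUT = timedelta(minutes=30)


def count_traffic(requests, timeout=VISIT_TIMEOUT, key_on_user_agent=False):
    """Count page views, visits and visitors from (ip, user_agent, timestamp) tuples.

    A visitor is keyed by IP alone, or by IP + user agent when
    key_on_user_agent is True. A new visit starts whenever the gap since
    that visitor's previous request exceeds `timeout`.
    """
    by_visitor = defaultdict(list)
    for ip, user_agent, ts in requests:
        key = (ip, user_agent) if key_on_user_agent else ip
        by_visitor[key].append(ts)

    visits = 0
    for timestamps in by_visitor.values():
        timestamps.sort()
        visits += 1  # a visitor's first request always opens a visit
        for prev, cur in zip(timestamps, timestamps[1:]):
            if cur - prev > timeout:
                visits += 1  # gap too long: start a new visit

    return {"page_views": len(requests), "visits": visits, "visitors": len(by_visitor)}


# Hypothetical sample data.
t0 = datetime(2007, 5, 1, 12, 0)
log = [
    ("10.0.0.1", "Firefox", t0),
    ("10.0.0.1", "Firefox", t0 + timedelta(minutes=5)),
    ("10.0.0.1", "Firefox", t0 + timedelta(hours=2)),    # long gap: second visit
    ("10.0.0.2", "MSIE 6.0", t0),
    ("10.0.0.2", "Firefox", t0 + timedelta(minutes=1)),  # same IP, other browser
]
print(count_traffic(log))
# {'page_views': 5, 'visits': 3, 'visitors': 2}
print(count_traffic(log, key_on_user_agent=True))
# {'page_views': 5, 'visits': 4, 'visitors': 3}
```

Change the timeout or the key and the visit and visitor figures move while page views stay put, which is one way tools reading similar data can still disagree.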
Google Analytics also misses those of us who use the NoScript Firefox extension to disable JavaScript...
posted by Arthur Dent at 9:28 AM on June 6, 2007
I'd bet that most sites have only a tiny number of users who use Lynx or disable/filter JavaScript.
It's more likely that their ISP routes page requests through a caching proxy, so they'll never hit your server when they revisit the same page during the same visit, or when the page is already in the cache from a previous visit or a recent visit by another user.
Any web metric is going to have significant sources of error, which is why there are so many different approaches.
posted by Good Brain at 9:28 AM on June 6, 2007
...people with ancient browsers, people with JavaScript disabled, people using browsers like Lynx, people who use plugins or extensions that only allow JavaScript from certain trusted sites...
In other words, almost no one. You are probably losing a negligible number of your visitors by tracking via JavaScript, unless your site caters to one of the niches listed above (i.e. your site is popular amongst the blind, the paranoid, etc.).
I still think the Analytics numbers are probably the most accurate of the lot.
posted by chunking express at 10:01 AM on June 6, 2007
The last stats I saw, where someone tried to measure JavaScript enablement by serving unique visitor cookies and having JavaScript send them back (and seeing which ones never came back), showed something like 95%+ of web visitors with JavaScript enabled.
For mass-market sites, page tagging is generally a better solution, unless you have a very fancy back-end which serves unique cookies and tracks visitors itself.
posted by GuyZero at 10:36 AM on June 6, 2007
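A rough sketch of the measurement GuyZero describes, assuming you already have two lists: the visitor IDs your server handed out in cookies, and the IDs a JavaScript beacon reported back. The function name and sample IDs are made up for illustration. Note that visitors who refuse cookies would also show up here as "JavaScript disabled", so this can only give a rough figure.

```python
def js_enabled_fraction(served_ids, returned_ids):
    """Share of server-issued visitor IDs that a JavaScript beacon echoed back.

    Hypothetical helper: served_ids come from your server's cookie-issuing
    code, returned_ids from whatever endpoint the page script calls.
    """
    served = set(served_ids)
    if not served:
        raise ValueError("no visitor IDs were served")
    # Ignore IDs we never issued (e.g. forged or stale cookies).
    returned = served & set(returned_ids)
    return len(returned) / len(served)


served = ["a1", "b2", "c3", "d4"]
returned = ["a1", "b2", "c3", "zz"]  # "d4" never came back; "zz" was never issued
print(js_enabled_fraction(served, returned))  # 0.75
```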
I was curious about Google Analytics, so I looked around their site. I found the signup page which reads:
"(5M pageview cap per month for non AdWords advertisers.)"
Maybe this is why Google is only reporting 5k?
posted by boreddusty at 11:36 AM on June 6, 2007
Awstats breaks out how many hits you got from bots in a special section. Try subtracting that out and see if your numbers are more similar to google analytics.
For reasons mentioned above, there's a lot that Google Analytics won't capture. I wish Google would release a logfile analyzer that presented data as nicely as their site does... it so kills awstats, but I feel more comfortable with the numbers from awstats because it's from my logfiles so it's more complete.
posted by twiggy at 12:33 PM on June 6, 2007
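Here's a crude Python sketch of the "subtract the bots" idea: classify each request by user-agent substring before counting. The marker list is hypothetical and tiny; AWStats maintains its own much larger robot list, and any bot that sends a normal browser user agent will slip through a check like this.

```python
# Hypothetical, deliberately tiny list of user-agent substrings that mark a robot.
# Real tools (AWStats included) use far longer lists; this is only illustrative.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_bot(user_agent):
    """Crude check: does the user-agent string contain a known robot marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_human_and_bot(requests):
    """Partition (ip, user_agent, path) tuples into (human, bot) lists."""
    human, bots = [], []
    for req in requests:
        (bots if is_bot(req[1]) else human).append(req)
    return human, bots


# Hypothetical sample requests.
requests = [
    ("10.0.0.1", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/2.0", "/index.html"),
    ("66.249.66.1", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "/index.html"),
    ("10.0.0.3", "Mozilla/5.0 (compatible; Yahoo! Slurp)", "/about.html"),
]
human, bots = split_human_and_bot(requests)
print(len(human), len(bots))  # 1 2
```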
it's from my logfiles so it's more complete.
Not necessarily. Logfiles record partially delivered pages that users may never actually see. Also, logfiles do not record requests answered by a caching proxy, or revisits where the page comes straight from the browser cache.
Neither technique is perfect.
posted by GuyZero at 12:40 PM on June 6, 2007
Er, do be careful with your terminology here; a "Visitor" probably implies a unique user (what webalizer calls a "Site", though that's really a count of unique IPs, which isn't the whole story thanks to proxies and dynamic IPs), while each "Visit" is a single session from one of those users.
Even if you are comparing the right numbers, ultimately they're all poorly defined ad-hoc metrics, and there's a lot of educated (and not so educated) guessing involved in working out which hits are from the same user and the same session. I'd expect advertisers are well aware of this and have sufficient supplies of salt to cope.
Are you sure they wouldn't prefer a raw count of hits to content pages per day and unique IPs per day, or some similar count of something that isn't going to differ by a factor of 10 depending on how you squint?
posted by Freaky at 3:59 PM on June 6, 2007
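Freaky's suggestion is easy to compute straight from an Apache-style Common Log Format file. The sketch below is a hypothetical Python example, not a feature of any of the three tools: it treats only 200 responses for paths that don't look like images, CSS or JavaScript as "content pages", which is an assumption you'd want to adjust for your own site (for instance, whether 304 Not Modified responses should count).

```python
import re
from collections import defaultdict

# Assumed input: Common Log Format, e.g.
# 10.0.0.1 - - [01/May/2007:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 5120
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] "\S+ (\S+)[^"]*" (\d{3})'
)
# Assumption: paths ending in these extensions are page furniture, not content.
NON_CONTENT = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def daily_counts(lines):
    """Return {day: (content_page_hits, unique_ips)} from CLF log lines."""
    hits = defaultdict(int)
    ips = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        ip, day, path, status = m.groups()
        path = path.split("?", 1)[0].lower()  # drop any query string
        if status != "200" or path.endswith(NON_CONTENT):
            continue
        hits[day] += 1
        ips[day].add(ip)
    return {day: (hits[day], len(ips[day])) for day in hits}


# Hypothetical sample log lines.
sample = [
    '10.0.0.1 - - [01/May/2007:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.1 - - [01/May/2007:10:00:01 -0700] "GET /logo.gif HTTP/1.1" 200 830',
    '10.0.0.2 - - [01/May/2007:11:30:00 -0700] "GET /about.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [02/May/2007:09:15:00 -0700] "GET /index.html HTTP/1.1" 304 0',
    '10.0.0.3 - - [02/May/2007:09:20:00 -0700] "GET /index.html?ref=x HTTP/1.1" 200 5120',
]
print(daily_counts(sample))
# {'01/May/2007': (2, 2), '02/May/2007': (1, 1)}
```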
posted by chunking express at 7:52 AM on June 6, 2007