I've got the low-down weblog blues
October 20, 2006 3:56 PM

Hullo. I'm compiling some weblog traffic reports for a site running on two servers (for load balancing). As they are wont to do, each server has collected different data...

I'm OK as far as the hard numbers go (visitors, views, etc.), but I'm having trouble figuring out how to present data on things like top ten pages visited, top referrers, etc., since the data for each server is different. Unfortunately these older weblogs are crunched in Webalizer, which isn't a particularly robust stats program. Any tips or advice from anyone else who does these types of reports on how to lay this out clearly for a persnickety client?
posted by missmobtown to Computers & Internet (5 answers total)
 
Could you use a simple table format, and just do two columns when data from the two servers is different? Label them "Server 1" and "Server 2" at the top, then do rows for whatever variable you're looking at (Top Page #1, Top Page #2, etc.)?

If I am understanding your question wrong, let me know.
posted by lisaj32 at 4:11 PM on October 20, 2006


Response by poster: I should also add that I'm doing quarterly reports for 2005, so everything is in 3-month chunks.
posted by missmobtown at 4:20 PM on October 20, 2006


Do you have the original log files? When I used to perform reporting on load balanced web servers, I would pull the logs to my reporting server, cat them all together there, and then sic Urchin Stats on them. Urchin would pay attention to the log line timestamps, not caring that they were out of order in a given file, and create stats based on that as though they were for one server.
posted by autojack at 5:06 PM on October 20, 2006
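A minimal sketch of autojack's merge-then-analyze approach, assuming one access log per server per quarter (all file names below are hypothetical):

```shell
# Concatenate the pair into one file for the reporting tool.
# Urchin orders hits by the timestamp inside each log line, so the
# merged file does not need to be sorted first.
mkdir -p merged
cat server1/access_log.2005-q1 server2/access_log.2005-q1 \
    > merged/combined.2005-q1.log
```

The same one-liner repeats for each of the four quarterly chunks.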


What autojack said- the key question is whether you can get access to the original log files, and whether they are in a parseable format (they should be- web servers typically log in some kind of delimited text format, like the Apache Common Log Format).

If so, simply archive the logs to a central location (not a bad idea anyway- the webservers shouldn't be carrying more than a short period of recent log information on them), and sic something like Logparser on the logs. It will handily take multi-log input and run any SQL-style query you could think of, including top-N reports, time-quantized reports (e.g., totals of each HTTP status code by minute/hour/day, top files requested, top 404 requests, etc.), and any other variation you can think of across multiple log files. It can also produce charted output as well as CSV/TSV/XML text, or even export directly to a SQL table.

The logparser site also has some examples of people doing things like autocharting and reporting using logparser to build those views automatically, in near-real time. It has a checkpointing feature so it can re-analyze a growing file faster by jumping ahead to the last point it read, which is great for a web log.
posted by hincandenza at 5:33 PM on October 20, 2006
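Logparser is a Windows tool, so as a hedge, the same kind of top-N report can be pulled from a merged log with standard Unix text tools instead (the file name is hypothetical; field 7 of a Common Log Format line is the requested path):

```shell
# Top ten pages requested across both servers, from the merged log:
# extract the request path, count occurrences, sort descending.
awk '{print $7}' combined.log | sort | uniq -c | sort -rn | head -10
```

Swapping `$7` for `$11` (the referrer field in Combined Log Format) gives a top-referrers report the same way.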


Hrm. Again, you might need access to the log files themselves, but AWStats can handle multiple log files from load-balanced servers, and it provides more interesting detail than Webalizer. Might be worth looking into.

What data is the client asking for? Are they asking for some crazy metric stuff that is commonly found in Webtrends reports? (Webtrends = expensive, and a huge pain in the ass. We deal with it far too often.)
posted by drstein at 8:39 PM on October 20, 2006
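If the raw logs do turn up, AWStats covers this case with its bundled logresolvemerge.pl script, which merges several log files in date order and can be piped straight into the LogFile directive. A sketch of the relevant awstats.conf line (the install and log paths are assumptions):

```
# Merge both servers' logs on the fly before parsing
# (script location varies by install)
LogFile="/usr/local/awstats/tools/logresolvemerge.pl /logs/server1/access.log /logs/server2/access.log |"
```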


This thread is closed to new comments.