How to begin on this multi user PHP stats project?
December 5, 2006 12:24 PM

How would you write an application in PHP that tracks multiple users, and the users' multiple sites?

I need to write something that will track the stats of users' sites. Information such as: referrer, date/time, pages visited, length of session, search engine and search terms needs to be kept. A user should be able to enter a date range to see specific data over time.

There are plenty of stats packages out there, and they're fairly easy to write. However, I haven't found any that were made with the multi-user, multi-site problem in mind.

How best to store the data?

Throw it in an Apache-like log, and only parse it when the user makes requests?

Put it straight into a MySQL DB? But then what would be a good way to keep the date/time data intact without having an enormous number of rows to parse?

I'm an intermediate PHP developer, and this is the first project where I really couldn't think of the best place to start.

PS: When I say "multi user, multi site" I'm talking about a Google Analytics-type setup. They have a very large number of users, and an even larger number of sites belonging to those users. However, my project is MUCH smaller in scale.
posted by gradient to computers & internet (7 comments total)
So do you have access to the server logs? Or is your PHP code going to be sitting on all of the pages, or are you collecting information with Javascript or something else? The question sounds kinda wide open. You'll probably get more answers if you give more details.
posted by miniape at 1:02 PM on December 5, 2006


Well, you've got a place to start. Your storage medium is as good a place as any.

The db method has a problem in that your indexes and tables are going to get *huge*. The file method is slow to parse and return data when you give it arbitrary date ranges. If you're only going to have a few sites on this thing, db is not a bad way to go. If you're going to have a large number of sites, I would use flat data files, parse your most frequently used reports into other flat files at a set time each day (generally in the wee hours, when your incoming data load is lowest), and then parse the other, dynamic reports on the fly.
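A minimal sketch of that nightly pre-parse step, assuming PHP 7 or later and a made-up raw format of one hit per line, tab-separated as time, page, referrer. The function name, file layout, and the choice of "hits per page" as the frequently used report are all assumptions for illustration, not anything specified in this thread.

```php
<?php
// Hypothetical nightly job: roll one raw hit file up into a small
// "page<TAB>count" summary file that reports can read cheaply.
// Assumed raw line format: time <TAB> page <TAB> referrer
function summarize_day(string $rawFile, string $summaryFile): array
{
    $counts = [];
    $fh = fopen($rawFile, 'rb');
    if ($fh === false) {
        return $counts;
    }
    // Read line by line so a large log never has to fit in memory.
    while (($line = fgets($fh)) !== false) {
        $parts = explode("\t", rtrim($line, "\r\n"));
        if (count($parts) < 2) {
            continue; // skip malformed lines
        }
        $page = $parts[1];
        $counts[$page] = ($counts[$page] ?? 0) + 1;
    }
    fclose($fh);

    arsort($counts); // most viewed pages first

    $out = '';
    foreach ($counts as $page => $n) {
        $out .= $page . "\t" . $n . "\n";
    }
    file_put_contents($summaryFile, $out, LOCK_EX);
    return $counts;
}
```

A cron entry could call this once per site per day; the on-demand reports then only need to read the short summary files.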

With the file method, you need to be aware of file locking problems. You might need to use something like syslogd or another pipe that keeps your data files open and simply accepts data from the many web server processes that are collecting the reports. With the DB method, you run into latency when your database server(s) do inserts in high-load situations, and you can exceed the maximum resources (drive space, memory for open tables), although it'd make the on-demand reporting a snap, because it'd just be a query with counts and different GROUP BY clauses.
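For a smaller setup, the locking concern can be sketched with PHP's built-in flock(), which is a simpler alternative to the syslogd-style pipe mentioned above and not a replacement for it under heavy load. The helper name and the tab-separated field layout are assumptions for illustration; the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: append one hit as a tab-separated line while
// holding an exclusive lock, so concurrent web server processes do not
// interleave their writes.
function append_hit(string $logFile, array $fields): bool
{
    // Strip tabs and newlines so one hit always stays on one line.
    $clean = array_map(function ($v) {
        return str_replace(["\t", "\r", "\n"], ' ', (string) $v);
    }, $fields);
    $line = implode("\t", $clean) . "\n";

    $fh = fopen($logFile, 'ab');
    if ($fh === false) {
        return false;
    }
    $ok = false;
    // LOCK_EX blocks until no other process holds the lock.
    if (flock($fh, LOCK_EX)) {
        $ok = fwrite($fh, $line) === strlen($line);
        fflush($fh);
        flock($fh, LOCK_UN);
    }
    fclose($fh);
    return $ok;
}
```

Note that flock() is advisory: it only protects against other writers that also take the lock.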

But in short, the way that you split the data is that the site 'knows' who it is when it's reporting the data to the storage engine, and the storage engine stores it appropriately -- appropriately in this case being in whichever method you choose.
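One way to picture the site "knowing who it is": each tracked page requests the collector with a site ID in the query string, and the collector tags the stored record with it. This is only a sketch assuming PHP 7.1 or later; the `site` and `page` parameter names and the record layout are hypothetical.

```php
<?php
// Hypothetical collector logic: turn one tracking request into a record
// tagged with the site it came from. Returns null for requests that do
// not carry a plausible numeric site ID.
function build_hit(array $query, array $server, int $now): ?array
{
    $site = $query['site'] ?? '';
    if (!is_string($site) || preg_match('/\A[0-9]{1,9}\z/', $site) !== 1) {
        return null;
    }
    return [
        'site_id'  => (int) $site,
        'time'     => gmdate('Y-m-d H:i:s', $now), // stored in UTC
        'page'     => (string) ($query['page'] ?? ''),
        'referrer' => (string) ($server['HTTP_REFERER'] ?? ''),
    ];
}
```

In a real collector script this would be called as `build_hit($_GET, $_SERVER, time())`, and the result handed to whichever storage method you choose.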
posted by SpecialK at 1:11 PM on December 5, 2006


miniape, I think he's talking about a system like Google Analytics where there's a bit of code, or a generated image, or something else that's sitting on each page of many websites on many servers in many locations around the world, i.e. the old 'invisible gif' stats collection method.
posted by SpecialK at 1:15 PM on December 5, 2006


I have access to everything. This is going to be a module for a larger site with user generated content. I can really do absolutely anything I want.

So far the file method is starting to sound like a good method. I believe Urchin 5 uses this method by parsing the Apache logs at the intervals you give it. I'll probably look into how it functions and stores its data.

I suppose I could store each day's data in a new file (since sorting by anything smaller shouldn't be necessary), which would allow me to select date ranges easily without actually parsing through long files. Then it's just a matter of parsing each selected day's data and generating reports.
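A rough sketch of the one-file-per-day idea, assuming PHP 7 or later and a made-up directory layout of `<base>/<site id>/<YYYY-MM-DD>.log`. The function name and layout are hypothetical; the point is only that a date range maps directly to a short list of files.

```php
<?php
// Hypothetical helper: list the daily log files covering an inclusive
// date range for one site. Dates are 'YYYY-MM-DD' strings; an invalid
// date string makes DateTimeImmutable throw an exception.
function daily_log_paths(string $baseDir, int $siteId, string $from, string $to): array
{
    $utc   = new DateTimeZone('UTC');
    $start = new DateTimeImmutable($from, $utc);
    $end   = new DateTimeImmutable($to, $utc);

    $paths = [];
    for ($day = $start; $day <= $end; $day = $day->modify('+1 day')) {
        $paths[] = sprintf(
            '%s/%d/%s.log',
            rtrim($baseDir, '/'),
            $siteId,
            $day->format('Y-m-d')
        );
    }
    return $paths; // empty if $from is after $to
}
```

The report code would then loop over the returned paths, skipping any that do not exist (days with no traffic).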

However, this method would cause a lot of disk access if users are switching date ranges, or refreshing the page a lot. I suppose I could generate a static report like Google Analytics does for time ranges, but that can be very annoying to the end user. They'd want their data now, not when the server has time. Any suggestions for caching results or otherwise making viewing the statistics less resource intensive?

Thanks a lot for the responses so far, this is helping me think it through!
posted by gradient at 3:13 PM on December 5, 2006


Yeah, don't provide detailed statistics for large date ranges. ;) Google Analytics only provides summary data, and that data was easy to parse into summary files.
posted by SpecialK at 3:36 PM on December 5, 2006


Any suggestions for caching results or otherwise making viewing the statistics less resource intensive?

Smarty allows fine-tuning for caching parts or all of a page.
posted by and hosted from Uranus at 5:53 AM on December 6, 2006


Cache Lite is a great little caching system, easy to implement. I've done something similar to what you're doing; remember that 3rd party cookies make browsers unhappy. I used MySQL to store the info but had a script move the data out periodically so the tables didn't get huge.
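To show the general shape of result caching without depending on any library, here is a plain file-based sketch. It is not Cache Lite's or Smarty's API; the function name, the cache key format, and the file naming are assumptions, and it assumes PHP 7 or later and that reports are arrays.

```php
<?php
// Hypothetical helper: return a cached report if it is younger than
// $ttl seconds, otherwise build it with $build and store the result.
function cached_report(string $cacheDir, string $key, int $ttl, callable $build): array
{
    $file = rtrim($cacheDir, '/') . '/' . md5($key) . '.cache';

    clearstatcache(true, $file); // avoid stale mtime within one request
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $data = unserialize((string) file_get_contents($file));
        if (is_array($data)) {
            return $data; // cache hit
        }
    }

    $data = $build(); // cache miss: do the expensive parsing
    file_put_contents($file, serialize($data), LOCK_EX);
    return $data;
}
```

A key such as `"site7:2006-12-01:2006-12-05"` would let repeated views of the same date range reuse one parsed result instead of hitting the raw data each time.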
posted by bertrandom at 5:44 PM on December 8, 2006

