How to programmatically access referrer data for an html page visit
August 2, 2007 6:29 PM

How can I programmatically access the "referring page" environment data for an HTML or SHTML page visit? I'm trying to save the referring page ("HTTP_REFERER") to a database for every access to a specific .shtml page. I'd like it to be triggered by the page visit, or at least to update frequently. I tried calling CGI and PHP scripts using server-side includes, but that replaces the referrer with the URL of the calling page. Is there some simple way to pass this data, or to get it into a form that can be programmatically manipulated?
posted by Manjusri to Technology (6 answers total)
 
If you've got logging enabled, it's pretty much a simple matter of parsing your web server's log for the page you're interested in and writing whatever record fields you want to a database of your choice. You can certainly do this in Perl from a CGI script, run at whatever frequency you like. If you're looking for a place to start, grab a copy of Webalizer.
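
Something like this, as a starting sketch in Perl, assuming a combined-format Apache log (the log path and page name are placeholders):

#!/usr/bin/perl
# Scan an Apache combined-format access log and print the referrer
# for every hit on one page; swap the print for a database INSERT.
use strict;
use warnings;

my $log  = '/var/log/apache2/access_log';   # assumed log location
my $page = '/mypage.shtml';                 # the page you care about

open my $fh, '<', $log or die "Can't open $log: $!";
while (my $line = <$fh>) {
    # combined format: ... "GET /path HTTP/1.1" status bytes "referer" "agent"
    next unless $line =~ m{"(?:GET|POST) \Q$page\E[^"]*" \d+ \S+ "([^"]*)"};
    my $referer = $1;
    next if $referer eq '-';    # direct visits carry no referrer
    print "$referer\n";
}
close $fh;
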
posted by paulsc at 6:37 PM on August 2, 2007


You can grab the referrer via JavaScript (it's in document.referrer) and then hit myloggingscript.php?page=foo&referrer=bar, and have the PHP script store them in a database.
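
The receiving script might look like this; a sketch written as a Perl CGI to match the rest of the thread, though a PHP script works the same way (the database file and table name are placeholders):

#!/usr/bin/perl
# Store the page and referrer passed as query parameters.
# SQLite via DBI is an assumption; use whatever database you have.
use strict;
use warnings;
use CGI;
use DBI;

my $q   = CGI->new;
my $dbh = DBI->connect('dbi:SQLite:dbname=hits.db', '', '', { RaiseError => 1 });

$dbh->do('CREATE TABLE IF NOT EXISTS hits (ts INTEGER, page TEXT, referrer TEXT)');
$dbh->do('INSERT INTO hits (ts, page, referrer) VALUES (?, ?, ?)',
         undef, time, scalar $q->param('page'), scalar $q->param('referrer'));

# The response body doesn't matter if the request comes from a hidden image.
print $q->header('text/plain'), "ok\n";
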
posted by Firas at 6:58 PM on August 2, 2007


Best answer: I guess you can't change the configuration of your Web server, or read its log files? You could put

<img style="display: none" src="/log_referers.cgi?<!--#echo encoding="url" var="HTTP_REFERER" -->">

in your .shtml page; when a browser loads the page, it will then request /log_referers.cgi with the referrer in the query string (the encoding="url" keeps characters like ? and & in the referring URL from breaking the query string).
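
A minimal sketch of what /log_referers.cgi could look like; here the referrer arrives as the whole query string, and a flat file stands in for whatever database you're using:

#!/usr/bin/perl
# The SSI-built <img> URL puts the referrer directly in the query
# string, so no parameter parsing is needed. Log path is a placeholder.
use strict;
use warnings;
use URI::Escape qw(uri_unescape);

my $referer = uri_unescape($ENV{QUERY_STRING} || '');

open my $out, '>>', '/home/me/referers.log' or die "Can't append: $!";
print $out scalar(localtime), "\t$referer\n";
close $out;

# Any small response keeps the hidden <img> request happy.
print "Content-type: text/plain\n\nok\n";
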
posted by nicwolff at 7:33 PM on August 2, 2007 [1 favorite]


Response by poster: nicwolff's technique works and is exactly what I was looking for.

I do have access to the server logs (it's shared hosting). That would seem a better alternative, except that I'd like to update as often as every minute, and the log files are rather large. I'm thinking that grepping the whole log file every minute or so to get the new records would be a bad idea. Unless there is a better way to extract the new records from the log?
posted by Manjusri at 10:27 PM on August 2, 2007


"I'm thinking that grepping the whole log file every minute or so to get the new records would be a bad idea."

An old Pentium MMX machine I have, with a 2 GB SCSI drive, does a 10 MB log file in about 15 seconds using Webalizer, while serving 20 to 30 connections a minute on a database-backed Web site. That's old hardware. A decent shared hosting setup shouldn't be too taxed.

Even on a shared host, piping the log through tail to restrict your query to the last x lines (tail -n 1000 access_log | grep mypage.shtml, say) is simple. Why would you grep the whole file for each update?
posted by paulsc at 11:52 PM on August 2, 2007


Best answer: Uh, how would he know how many lines to ask "tail" for? You could pipe "tail -f" to a long-running process, but that will keep tailing the old file when the log is rotated, so you'd have to restart it from the log-rotating script.

Instead, use Perl and the File::Tail module. I do this on a very busy maillog to keep track of IPs that have recently authenticated for POP, which I then allow to connect for SMTP.
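
A minimal sketch of the File::Tail approach, again assuming a combined-format log (path and page name are placeholders):

#!/usr/bin/perl
# Follow the access log with File::Tail, which notices when the file
# is truncated or replaced and reopens it, so log rotation is handled.
use strict;
use warnings;
use File::Tail;

my $tail = File::Tail->new(
    name        => '/var/log/apache2/access_log',
    maxinterval => 10,    # check for new lines at least every 10 seconds
    resetafter  => 60,    # recheck the file after a quiet minute
);

while (defined(my $line = $tail->read)) {
    next unless $line =~ m{"(?:GET|POST) /mypage\.shtml[^"]*" \d+ \S+ "([^"]*)"};
    my $referer = $1;
    next if $referer eq '-';
    print "$referer\n";    # or insert into the database here
}
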
posted by nicwolff at 1:07 AM on August 3, 2007 [1 favorite]


This thread is closed to new comments.