Apache/GrepFilter: Number Four Son, He Never Calls
November 4, 2007 8:17 AM

GrepFilter: How to pull the last 10 authusers from an Apache log file, without all the noise?

This is why I flunked out of compsci.

I made a little extranet web thingie for my world-scattered family to use to stay in touch and so on. It works great, technically, but of course it's a pain always nagging everyone to use it. So I'd like to use the power of passive-aggressive love, by just showing the last time each user has logged in on the home page. Then my family can all guilt each other into being more regular.

Everyone's an htaccess user, so the info is right there in the Apache logs. But I'm out of my depth here. I'm looking for a quickie grep/shellscript/perl way to pull the last ten unique authusers (htusers) from a standard Apache logfile, with dates.

My regular old Apache log looks like this:


in.log
====
74.53.68.133 - no1son [01/Nov/2007:01:13:15 -0400] "GET /page.htm HTTP/1.1" 200 - "referrer stuff"
74.53.68.133 - no1son [01/Nov/2007:01:13:31 -0400] "GET /page1.htm HTTP/1.1" 200 - "referrer stuff"
64.13.232.151 - no2son [02/Nov/2007:12:13:01 -0400] "GET /page.htm HTTP/1.1" 200 - "referrer stuff"
68.178.232.99 - no2son [02/Nov/2007:12:13:11 -0400] "GET /page3.htm HTTP/1.1" 200 - "referrer stuff"
68.178.232.99 - no2son [02/Nov/2007:12:13:22 -0400] "GET /page.htm HTTP/1.1" 200 - "referrer stuff"
74.53.68.130 - no1son [03/Nov/2007:14:13:36 -0400] "GET /page.htm HTTP/1.1" 200 - "referrer stuff"
74.53.68.130 - no1son [03/Nov/2007:14:13:37 -0400] "GET /page4.htm HTTP/1.1" 200 - "referrer stuff"
68.178.232.99 - no3son [04/Nov/2007:03:13:32 -0400] "GET /page9.htm HTTP/1.1" 200 - "referrer stuff"
68.178.232.99 - no3son [04/Nov/2007:03:13:35 -0400] "GET /page3.htm HTTP/1.1" 200 - "referrer stuff"
64.13.232.151 - no2son [04/Nov/2007:09:13:36 -0400] "GET /page.htm HTTP/1.1" 200 - "referrer stuff"



And I want to turn that into this (just a text file) on demand:


out.txt
=====
Most Recent Logins By:
-----
no1son 03/Nov/2007 14:13:37
no2son 04/Nov/2007 09:13:36
no3son 04/Nov/2007 03:13:35
no4son never


So I'd like a shell script or something (perl?) that I can invoke when I feel like it, or from a cron job, or something. Notice that number four son, he never logs in at all. I need to cover that, maybe by calling the script with a list of uids I'm looking for?

But my perl is poor, and my grep skills worse. I can get the last lines from each user out, I think, but then I hit the wall of ripping apart the "fields" usefully and the wall of trying to grep (?) out stuff with square brackets and dashes... and those walls HURT my poor little head.

Can someone show me a simple way to do this? If the big parts work, I can muddle my way through refining it. But a couple hours of Google has just left me feeling inadequate, like a bad son. :(
posted by rokusan to Computers & Internet (10 answers total)
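
Editor's note: the square brackets and dashes are easier to handle with one regular expression than with grep. Here is a minimal Python 3 sketch. It assumes the common/combined log layout shown above (host, ident, authuser, [timestamp], "request", ...). The `parse_line` helper is a hypothetical name used only for illustration.

```python
import re

# host ident authuser [timestamp] "request" ...
# \S+ is a run of non-spaces; \[([^\]]*)\] captures whatever is inside the brackets.
LOG_RE = re.compile(r'(\S+) \S+ (\S+) \[([^\]]*)\] "([^"]*)"')


def parse_line(line):
    """Return a dict of the leading fields, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    host, user, stamp, request = m.groups()
    return {
        "host": host,
        "user": None if user == "-" else user,  # "-" means no htaccess login
        "timestamp": stamp,
        "request": request,
    }


line = ('74.53.68.133 - no1son [01/Nov/2007:01:13:15 -0400] '
        '"GET /page.htm HTTP/1.1" 200 - "referrer stuff"')
print(parse_line(line))
```

After the username and the raw timestamp are in separate fields, choosing the last line per user is a dictionary lookup rather than a grep problem.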
 
Something like this would probably do what you want:

awk '$3 != "-" { print $3 }' foo.log | tail -n 10
posted by roue at 8:30 AM on November 4, 2007
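
Editor's note: the pipeline above prints the last ten log *lines* that have a username. The same person can therefore appear several times, and the dates are dropped. The sketch below covers the "last ten unique authusers, with dates" part of the question. It assumes Python 3.7+ (dicts keep insertion order) and a log in chronological order. `last_unique_users` and `SAMPLE` are illustrative names only.

```python
import re

LOG_RE = re.compile(r"\S+ \S+ (\S+) \[([^\]]*)\]")

SAMPLE = """\
74.53.68.133 - no1son [01/Nov/2007:01:13:15 -0400] "GET /page.htm HTTP/1.1" 200 -
64.13.232.151 - no2son [02/Nov/2007:12:13:01 -0400] "GET /page.htm HTTP/1.1" 200 -
74.53.68.130 - no1son [03/Nov/2007:14:13:37 -0400] "GET /page4.htm HTTP/1.1" 200 -
68.178.232.99 - no3son [04/Nov/2007:03:13:35 -0400] "GET /page3.htm HTTP/1.1" 200 -
64.13.232.151 - no2son [04/Nov/2007:09:13:36 -0400] "GET /page.htm HTTP/1.1" 200 -
"""


def last_unique_users(lines, n=10):
    """Return up to n (user, timestamp) pairs, most recent visitor last."""
    seen = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group(1) == "-":  # skip junk and anonymous hits
            continue
        user, stamp = m.groups()
        seen.pop(user, None)  # removing, then re-adding, moves the user to the end
        seen[user] = stamp
    return list(seen.items())[-n:]


for user, stamp in last_unique_users(SAMPLE.splitlines()):
    print(user, stamp)
```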


Best answer: Using a proper log parser would be better, but in this case enough information can be pulled out with a simple regular expression.


% python rokusan.py < rokusan.log
no1son 03/Nov/2007:14:13:37 -0400
no2son 04/Nov/2007:09:13:36 -0400
no3son 04/Nov/2007:03:13:35 -0400
no4son never


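#!/usr/bin/python

import sys, re

# Users to report even if they never show up in the log.
all_names = ['no1son', 'no2son', 'no3son', 'no4son']

# host, ident, authuser, then the [timestamp] in brackets
pattern = re.compile(r"\S+ \S+ (\S+) \[(.*?)\]")

last = {}
for name in all_names:
    last[name] = "never"

for line in sys.stdin:
    m = pattern.match(line)
    # Skip lines that don't parse and hits with no htaccess user ("-").
    if not m or m.group(1) == "-": continue
    # The log is in time order, so later lines overwrite earlier ones.
    last[m.group(1)] = m.group(2)

for who, when in sorted(last.items()):
    print("%s %s" % (who, when))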
posted by jepler at 8:34 AM on November 4, 2007


Argh, I was bound to botch something in there. The line to run the program was:
% python rokusan.py < rokusan.log
posted by jepler at 8:35 AM on November 4, 2007
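
Editor's note: the question floated "calling the script with a list of uids". The script above can take those names from the command line instead of the hardcoded `all_names`. This is a Python 3 sketch. It assumes every name you pass should be reported, with "never" when it is absent from the log. `latest_logins` and the script name `lastlogin.py` are hypothetical. Like the version above, it reads the log from stdin.

```python
import re
import sys

LOG_RE = re.compile(r"\S+ \S+ (\S+) \[([^\]]*)\]")


def latest_logins(lines, names):
    """Return sorted (name, last timestamp or "never") pairs for the given names."""
    last = dict.fromkeys(names, "never")
    for line in lines:
        m = LOG_RE.match(line)
        # Only track users we were asked about; this also ignores "-".
        if m and m.group(1) in last:
            last[m.group(1)] = m.group(2)
    return sorted(last.items())


if __name__ == "__main__":
    names = sys.argv[1:]
    if names:
        for who, when in latest_logins(sys.stdin, names):
            print(who, when)
    else:
        print("usage: python3 lastlogin.py no1son no2son ... < access.log",
              file=sys.stderr)
```

Usage would look like `python3 lastlogin.py no1son no2son no3son no4son < in.log`. To add a new relative, you edit the cron line and leave the script alone.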


Response by poster: Jesus, Jepler, that's beautiful. It's also completely opaque to me, though, so... that outputs the whole time/date as one glob.

no1son      03/Nov/2007:14:13:37 -0400

I'd rather have the three bits of info discretely, and lose the time zone...

no1son     03/Nov/2007    14:13:37

So rather than "who, when"... I'd like "who, date, time."

Sorry, I know this is nitpicky and even more kindergarten than the original question, but damn, I wasn't expecting Python, wow. I'm definitely going to have to learn me some of that. It's pretty.
posted by rokusan at 9:17 AM on November 4, 2007
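
Editor's note: you can get the date and time as separate pieces without counting character positions. Parse the timestamp with Python's `datetime.strptime`, then format each part yourself. This minimal sketch assumes Python 3 (the `%z` directive for "-0400" needs 3.2+) and the default C locale, so that `%b` matches "Nov". `split_stamp` is a made-up helper name.

```python
from datetime import datetime

# Apache's default timestamp layout, e.g. "03/Nov/2007:14:13:37 -0400".
APACHE_FMT = "%d/%b/%Y:%H:%M:%S %z"


def split_stamp(stamp):
    """Return (date, time) strings, dropping the UTC offset."""
    dt = datetime.strptime(stamp, APACHE_FMT)
    return dt.strftime("%d/%b/%Y"), dt.strftime("%H:%M:%S")


date, clock = split_stamp("03/Nov/2007:14:13:37 -0400")
print("no1son", date, clock)
```

Because `dt` is a real datetime, the same function can emit any other layout (for example `"%Y-%m-%d"`) by changing the `strftime` format strings.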


The requirement that you only print one line per username makes this a bit tricky. And there's no simple way to do the "never" part without hardcoding a list of usernames. Here's a hack for you that does the rest:

$ awk '{ print NR, $4, $3 }' < /tmp/log | sort -k 3 | uniq -f 2 | sort -n | awk '{ print $3, $2 }' | tr -d \\[

no1son 01/Nov/2007:01:13:15
no3son 04/Nov/2007:03:13:32
no2son 04/Nov/2007:09:13:36

Sadly, 3/4 of my programming these days seems to be awful shell hacks like this.
posted by Nelson at 9:19 AM on November 4, 2007
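
Editor's note: for each user, the pipeline above keeps whichever line `uniq` sees first. The other answers rely on the log being in time order, so they keep the last line instead. If the input might not be ordered (for instance, rotated log files concatenated in the wrong order), one option is to compare parsed timestamps directly. This Python 3 sketch assumes the same log layout. `latest_by_time` is a hypothetical name.

```python
import re
from datetime import datetime

LOG_RE = re.compile(r"\S+ \S+ (\S+) \[([^\]]*)\]")
APACHE_FMT = "%d/%b/%Y:%H:%M:%S %z"


def latest_by_time(lines):
    """Return {user: newest datetime}, whatever order the lines arrive in."""
    newest = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group(1) == "-":
            continue
        user = m.group(1)
        when = datetime.strptime(m.group(2), APACHE_FMT)  # offset-aware datetime
        if user not in newest or when > newest[user]:
            newest[user] = when
    return newest


# Deliberately out of order: the newer no1son line comes first.
lines = [
    '74.53.68.130 - no1son [03/Nov/2007:14:13:37 -0400] "GET / HTTP/1.1" 200 -',
    '74.53.68.133 - no1son [01/Nov/2007:01:13:15 -0400] "GET / HTTP/1.1" 200 -',
]
for user, when in sorted(latest_by_time(lines).items()):
    print(user, when.strftime("%d/%b/%Y %H:%M:%S"))
```

The parsed datetimes carry their UTC offset, so comparisons stay correct when the offset changes (for example across a daylight-saving switch).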


perl -e 'while(<>) { $l->{$1} = $2 if m,^\S+ \S+ (\S+) \[([^\]]+)\],; } for my $n (qw/no1son no2son no3son no4son/) { $l->{$n} ||= "never"; print "$n $l->{$n}\n"; }' <in.log
posted by Rhomboid at 9:29 AM on November 4, 2007


Best answer: Oh, I didn't see that you wanted the date and time separate:

perl -e 'while(<>) { $l->{$1} = "$2\t$3" if m,^\S+ \S+ (\S+) \[([^:]+):(\S+),; } for my $n (qw/no1son no2son no3son no4son/) { $l->{$n} ||= "never"; print "$n\t$l->{$n}\n"; }' <in.log
posted by Rhomboid at 9:33 AM on November 4, 2007
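
Editor's note: Rhomboid's second one-liner splits the bracketed timestamp at its first colon. Everything before the colon is the date, and the part after it, up to the space, is the time. The same idea fits in Python without a regex by using `str.partition`. This sketch assumes the timestamp has already been pulled out of the brackets. `date_and_time` is only an illustrative name.

```python
def date_and_time(stamp):
    """Split '03/Nov/2007:14:13:37 -0400' into ('03/Nov/2007', '14:13:37')."""
    date, _, rest = stamp.partition(":")  # split at the FIRST colon only
    time_of_day = rest.split(" ")[0]      # drop the " -0400" offset
    return date, time_of_day


print(*date_and_time("03/Nov/2007:14:13:37 -0400"))
```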


Response by poster: I gotta split the best answer award on that one, if only because now I'm inspired to learn Python. Rhomboid's was the kind of arcane answer I expected, Jepler's is just plain cool, though. Nelson loses for getting the FIRST occurrence of each user, rather than the most recent, though I've no doubt he could do it either way. Thanks guys!

AskMeFi needs tip jars.
posted by rokusan at 9:42 AM on November 4, 2007


Since the date is printed in a fixed-width style, you can simply slice it up to get the day and time-of-day separately. Replace the final loop with this:

for who, when in sorted(last.items()):
    if when == "never":
        print("%s never" % who)
    else:
        day = when[:11]      # "03/Nov/2007"
        time = when[12:-6]   # "14:13:37" (skips the ":" and drops " -0400")
        print("%s %s %s" % (who, day, time))

No tips are necessary.
posted by jepler at 9:39 AM on November 4, 2007
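
Editor's note: the sketch below combines these pieces to produce the out.txt layout from the question: a header, a row of dashes, one line per person, and "never" for no-shows. It assumes Python 3.6+ and a hardcoded name list like jepler's. The names `latest`, `render` and `write_report`, plus any paths you pass in, are hypothetical. The report goes to a temporary file that is then renamed, so a page reading out.txt never sees a half-written file.

```python
import os
import re
import tempfile

NAMES = ["no1son", "no2son", "no3son", "no4son"]  # who should always appear
LOG_RE = re.compile(r"\S+ \S+ (\S+) \[([^:\]]+):(\S+)")  # user, date, time


def latest(lines, names):
    """Return {name: (date, time)}, or {name: None} if they never showed up."""
    last = dict.fromkeys(names)
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(1) in last:  # later lines overwrite earlier ones
            last[m.group(1)] = (m.group(2), m.group(3))
    return last


def render(last):
    """Format the report like out.txt in the question."""
    out = ["Most Recent Logins By:", "-----"]
    for name in sorted(last):
        seen = last[name]
        out.append(f"{name} {seen[0]} {seen[1]}" if seen else f"{name} never")
    return "\n".join(out) + "\n"


def write_report(log_path, out_path, names=NAMES):
    """Read log_path and atomically replace out_path with the report."""
    with open(log_path, errors="replace") as log:
        text = render(latest(log, names))
    # Write a temp file next to the target, then rename it over the old report.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(out_path)))
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the web server read it
    os.replace(tmp, out_path)
```

A small script run by cron could then call something like `write_report("/path/to/access_log", "/path/to/out.txt")`, with both paths adjusted to your setup.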


Here's an awk/sed one-liner that'll do it... others have provided some really nice solutions, but I thought I'd put this out there.

cat apache-log | awk -F" " '/[a-z0-9A-Z]/ {print $3, $4}' | sed s/\\[//g | awk -F":" '{print $1 , $2":"$3":"$4}' | awk '$1 != "-" { last[$1] = $0 } END { for (u in last) print last[u] }'
posted by The_Auditor at 3:15 PM on November 4, 2007

