Perl is making my head explode
May 21, 2010 1:55 PM

Script-filter: I've been tasked with finding out who at our company is synchronizing to their home folders on our fileservers, and how often. Although initially it was recommended I do this in perl, my head is blowing up trying to get it to work. Lots of blundering attempts to ask for help inside.

I hate scripting in perl. Hate it. I don't have an alternative scripting language I prefer. They all seem to make me react the same way. My head simply locks up and refuses to learn it, which I recognize as a very real problem. I can configure routers and switches all day long, and I've created shell scripts and batch files with very little trouble. But perl seems to be a different animal entirely for me.

Let's say for this we have 6 locations, all meshed via VPN and all accessible on the domain. Each location has a Win2k3 file server. On each file server, there is a \home\department directory for each department. In each \home\department directory, there is a "Users" folder. And in each "Users" folder, there is a folder for each user in that department.

My task as it was given to me: Find out who is using their home folder, and how often. To do this it was suggested I create a script that will go through these User directories (and subdirs) and pull up the last accessed (or modified) file date. Provide easy to understand results. So if I last synchronized my home directory on 5/19/2010, I would want an email or a text file resulting from the script to show:
Routergirl - 05/19/2010.

It would be nice, even, if it came back with something like:
Fileserver - Engineering - Routergirl - 05/19/2010.

My current script has had me baffled for several days, and when I go to people in the company who know more about perl than I do, they seem to either want to run with it and create a new one on their own that still doesn't work, or they're too busy to help me. The person who has been the most help is my brother, who is currently 3k miles away and busy with his own job. Although perl was initially recommended by my boss, he is perfectly willing to accept other methods that provide the hoped-for results.

Is there a better way to do this? Could it be that perl is just the wrong solution? Maybe there is some automated free program we could be using for this that would save a load of headaches. I'm not convinced that checking last modified or last accessed date on the files will even be a good check for this. People could circumvent it by browsing to their home folder on the fileserver and just modifying a file, therefore not actually synchronizing, but bringing up current results. Maybe there is something that can actually check when a user last accessed the directory itself?

Entirely open to any and all suggestions. I realize this question is fairly scattered. If it comes down to all of you saying, "Yes, perl is the best way to do this," then I know to sign myself up for some force feeding...errrr....classes. If anyone can provide resources on existing scripts that do this exact thing already (my google fu has failed me so far - they're all just enough different that they won't work), that would be much appreciated. If anyone knows of a better way - please tell me, cos maybe I'm just missing the obvious, and I've been known to overthink.
posted by routergirl to Computers & Internet (15 answers total)
perl isn't your immediate issue. Perl is just duct tape.

What you first need to figure out is *exactly* how to get the answer you need - is checking last-modified-date the right way to do it? What if the user is syncing the other way, and the activity on the share is read-only? Do you have a record of last-accessed date at all? Log files that show when the share was last accessed? anything like that?

Don't even bother figuring out what to write it in until you have the answer to that question.

After that, I'd use whatever the standard scripting language is at your employer for this kind of task - that varies from company to company.

FWIW, perl is *fantastic* for this kind of stuff, but that's not your immediate issue.
posted by swngnmonk at 2:05 PM on May 21, 2010 [2 favorites]

I have no great love for Perl, but this task isn't impossible using it.

That aside, define "synchronization." If you mean that "every file in every subdirectory here is identical to every matching file in subdirectories over there with no differences in files," that's far different from just noticing when there's a recent change.

The former involves recursing through directories in lockstep and running a binary comparison between each file, as well as looking at the difference of sets between files in directories. Dates aren't perfect for this — someone intent on skirting the system could touch a file to make the date on it current.
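For what it's worth, that lockstep walk is only a screenful of Perl. A rough sketch (File::Compare and File::Find ship with Perl; the paths, layout, and function name here are made up, not anyone's actual setup):

```perl
use strict;
use warnings;
use File::Find;
use File::Compare qw(compare);
use File::Spec;

# Walk the left tree; for every file, look for a twin at the same
# relative path in the right tree and binary-compare the pair.
sub compare_trees {
    my ($left, $right) = @_;
    my (@missing, @differ);
    find({ no_chdir => 1, wanted => sub {
        return unless -f $File::Find::name;
        my $rel  = File::Spec->abs2rel($File::Find::name, $left);
        my $twin = File::Spec->catfile($right, $rel);
        if    (!-e $twin)                         { push @missing, $rel }
        elsif (compare($File::Find::name, $twin)) { push @differ,  $rel }
    }}, $left);
    return (\@missing, \@differ);   # on one side only / bytes differ
}
```

Dates alone can't catch a touched file; the binary compare does, at the cost of actually reading everything.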
posted by adipocere at 2:07 PM on May 21, 2010

This is definitely a task that could easily be accomplished with a shell script. I know for sure I could do it in a few lines of bash. I'm not a Windows guy, but I understand PowerShell is similarly capable. A cursory Google search suggests something like: get-childitem -recurse | where-object {$_.lastwritetime etc, etc}
posted by signalnine at 2:14 PM on May 21, 2010

Well, this comes from users not synchronizing their home directory, and then complaining when something happens to their laptop and they lose data. Our company policy is that anything in your home directory is backed up, so if your laptop blows smoke we can get you back up and running as soon as possible. Over the last few months we ran into some users who had forgotten their home folders even existed. We're going through now and trying to be proactive and determine who is not even using their home directory so we can fix whatever is stopping them and/or educate them as needed.

swngnmonk: I agree about the perl being duct tape. You're spot on, and I think it's something that was dawning on me as I wrote up the question. It was handed to me as "This is what we want, and here is a recommendation." If I had just taken the "This is what we want," I might have skipped perl altogether. We don't have an official company scripting language. We have people from all over the place, and my supervisors have said - "Use whatever scripting language you feel comfortable with." We just happen to already have a few perl scripts we use. I wish it didn't make my head lock up. Maybe digging into my reticence to learn it will be my next step.

adipocere: We're not being that specific about it. In fact, my boss hasn't even considered people logging in and changing last-modified date on files. He just wants me to get the first part done, which is "Are they using their home directories?"
posted by routergirl at 2:19 PM on May 21, 2010

A basic stab at this in Perl could be:
my $root = '\home\Department\Users';
opendir(my $rootdh, $root) or die "can't open $root: $!";
foreach my $dir (grep { !/^\.\.?$/ } readdir($rootdh)) { 
  my ($atime, $mtime) = (stat("$root\\$dir"))[8,9]; 
  print join(" ", "$root\\$dir", scalar localtime $atime, scalar localtime $mtime), "\n";
}
This is getting the last access time and modified time on the directory itself, not looking for the most recently modified file in the directory. I'm not a Windows person, so you should carefully double-check what stat returns on Windows, and what it means (it could differ between directories and files).
posted by Zed at 2:22 PM on May 21, 2010

I'll throw in some other stuff. If you do stick with Perl, and you're running this on Windows, please pick up the Dave Roth Win32 Perl Programming book. It is very handy for these sorts of tasks.

I'll throw in a hand-waving description of something I wrote in Python, recently, as sort of a curiosity satisfier — given a directory tree, how do I find unique files?

First, I recursed through the whole directory tree and built a nice little data structure to house what I had. Then I placed every file in one big group in the data structure. For my first pass, I looked at filesizes — if someone had a different filesize, they went into a different group. At the end of it, I had a bunch of groups based on filesizes. Then on my next pass, I went over each group and took the md5 hash of the first 1% of the files in each, separating groups out that way even more.

Then on my second pass, I did it with one of the sha variants. Third pass, 10% into md5. Fourth pass, 10% into sha. Fifth pass, 100% into md5. Sixth pass, 100% into sha. Only on the last pass did I do a binary comparison within the tiny groups left.

It was ridiculous overkill, for fun, but I probably saved myself a lot of big expensive binary comparisons. Similarly, I think a multipass (no, not Leeloo) approach might help. Compare first on filesizes. If they're different, you don't even need to do a binary comparison. If they're the same, yeah, it might warrant further inspection.
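If you wanted the first couple of passes of that in Perl rather than Python, a sketch could look like this (Digest::MD5 is core; I'm hashing whole files instead of a percentage to keep it short, and the function name is made up):

```perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Pass 1: bucket files by size. Pass 2: within any bucket that still
# has more than one member, bucket again by MD5 of the contents.
# Whatever survives with more than one member is a duplicate candidate.
sub duplicate_candidates {
    my (@roots) = @_;
    my %by_size;
    find({ no_chdir => 1, wanted => sub {
        push @{ $by_size{ -s $File::Find::name } }, $File::Find::name
            if -f $File::Find::name;
    }}, @roots);

    my %by_md5;
    for my $bucket (grep { @$_ > 1 } values %by_size) {
        for my $file (@$bucket) {
            open my $fh, '<', $file or next;
            binmode $fh;
            push @{ $by_md5{ Digest::MD5->new->addfile($fh)->hexdigest } }, $file;
        }
    }
    return grep { @$_ > 1 } values %by_md5;   # list of arrayrefs
}
```

Size-bucketing first means singletons never get read at all, which is where the savings come from.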

Oh, and I think there's a Windows function, buried deep, that allows you to wait for changes in a directory. Maybe that is what you want, if you just want to know they touched their synchro-directories.

If you get stumped with some Perl network sharing stuff, let me know — I could dust off some old code.
posted by adipocere at 2:25 PM on May 21, 2010 [1 favorite]

I'm finding that in Windows if I look at Last accessed, it's pointless, because as soon as I open the directory to check the last accessed...well. Yeah.

Initially I was just looking at the last-modified date on the user's directory. But what I found was that the last modified on the directory isn't always showing up correctly. You would think that if I put a new text file in my directory on 5/20, the last modified for the routergirl directory would be 5/20. But it isn't, which totally threw me off for a bit. It frustrated me, too, because it added the necessity to dig into deeper directories and check all files. Which is where my script is currently getting jammed. If it hits a subdir with a space in the name, or even a file with a space in the name, it appears to loop indefinitely. If anyone's interested in looking at what I have, I'd be happy to send it out. If you promise not to laugh. Or at least promise that if you do, you'll laugh to my face. I want this to be easy, but it *has* to be accurate.

/taking notes on all suggestions. Thanks guys. Keep it coming.
posted by routergirl at 2:37 PM on May 21, 2010

For recursive directory walking, File::Find is your friend. This, for instance, gives you the most recently modified file under some given dir:
use File::Find;
my $max = 0;
sub maxtime {
  my $mtime = (stat($File::Find::name))[9];
  $max = $mtime if defined $mtime && $mtime > $max;
}
find(\&maxtime, "/your/dir/here");
print scalar localtime $max, "\n";
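And to get all the way to the "Fileserver - Engineering - Routergirl - 05/19/2010" lines the question asked for, a sketch along these lines might do. The server and department arguments are just labels you pass in; the Users-directory layout matches the question, everything else here is assumption:

```perl
use strict;
use warnings;
use File::Find;
use POSIX qw(strftime);

# Newest mtime of anything under $dir (the dir itself counts too).
sub newest_mtime {
    my ($dir) = @_;
    my $max = 0;
    find({ no_chdir => 1, wanted => sub {
        my $mtime = (stat $File::Find::name)[9];
        $max = $mtime if defined $mtime && $mtime > $max;
    }}, $dir);
    return $max;
}

# One "Server - Department - User - mm/dd/yyyy" line per user folder.
sub report_lines {
    my ($server, $dept, $users_dir) = @_;
    opendir my $dh, $users_dir or die "can't open $users_dir: $!";
    my @lines;
    for my $user (sort grep { !/^\./ && -d "$users_dir/$_" } readdir $dh) {
        my $when = newest_mtime("$users_dir/$user");
        push @lines, sprintf '%s - %s - %s - %s', $server, $dept, $user,
            $when ? strftime('%m/%d/%Y', localtime $when) : 'never';
    }
    closedir $dh;
    return @lines;
}
```

You'd call it once per department's Users folder and mail yourself the joined output.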

posted by Zed at 2:50 PM on May 21, 2010 [1 favorite]

I suspect the answer might actually be to enable auditing on the server for the User folders, and then focus the scripting on the audit data (possibly in PowerShell or VBScript rather than Perl. I know lots about Perl but not a lot about Perl on Windows; still, as long as you can get the event logs into something plaintext Perl can work with, Perl will do fine).

That should give you a pile of auditing data about all types of access to files and folders, which you can then mine for the data you want.
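Once the events are out of the Event Log and into plaintext, the Perl side is straightforward. The three-column CSV format below (timestamp, user, path) is entirely hypothetical; match it to whatever your export tool actually emits:

```perl
use strict;
use warnings;

# Read exported audit events (hypothetical "timestamp,user,path" CSV)
# and keep the latest timestamp per user for paths under \home\.
# ISO-style timestamps compare correctly as plain strings.
sub last_access_by_user {
    my ($fh) = @_;
    my %latest;
    while (my $line = <$fh>) {
        chomp $line;
        my ($stamp, $user, $path) = split /,/, $line, 3;  # path may contain commas
        next unless defined $path && $path =~ m{\\home\\}i;
        $latest{$user} = $stamp
            if !exists $latest{$user} || $stamp gt $latest{$user};
    }
    return %latest;
}
```

From there, printing "user - date" lines or mailing a report is the easy part.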
posted by TravellingDen at 4:49 PM on May 21, 2010

TravellingDen's suggestion has a Perl library, Win32::EventLog, to back it up. I'm pretty sure that there's an entry in the Dave Roth book for it.

So the good news is that you have at least three ways to approach the problem:

1) Auditing plus processing the Event Log
2) Recursing through files and picking out data
3) Directory change notification
posted by adipocere at 5:08 PM on May 21, 2010

Some hacking...

This PowerShell bit might be something you can hack away at to get the results you want:

Get-ChildItem -Recurse | Sort-Object -Property LastWriteTime -Descending | Select-Object -First 1 | Format-Table -Property LastWriteTime,FullName -AutoSize

Wrap that into a full PowerShell script, include the list of servers to run through (it can connect remotely), and the usernames should be obvious from the folder names.

You could throw in a few more operations, like how much is in each folder and whatnot, too. PowerShell is crazy powerful, though I haven't used it much.
posted by TravellingDen at 5:28 PM on May 21, 2010

An irritant with Perl is that, since it has no REPL (interactive command line), it can be hard to detect and correct minor mistakes. Python and Ruby (among others) don't suffer this limitation. I'd probably do this in Python, given my skillset (using os.walk, and the os and sys modules).

All of that is of course tangential to the "what are you actually trying to check" business though, as others have pointed out!
posted by gregglind at 9:37 AM on May 27, 2010

Perl has a REPL.
posted by Zed at 9:44 AM on May 27, 2010

Zed, news to me! I stand corrected :)
posted by gregglind at 3:04 PM on May 27, 2010

Thank you, everyone. I got a bit sidetracked with another project, but will have to come back to this one soon, and this will be a load of help. Oh yeah, I'll also be picking up Dave Roth's Win32 Perl Programming book, along with taking a basic perl class.
posted by routergirl at 7:04 AM on June 1, 2010
