Hey, watch this!
January 3, 2007 6:57 PM

Is there a way to monitor internal (i.e. behind the firewall) web pages for changes? I’m looking for a piece of software that works like WatchThatPage but runs from my PC (which is on the corporate LAN via VPN) instead of as an external service.

This would be a nifty utility if it exists. I use information aggregation services (like MileageManager to track my frequent flyer miles), but they require me to enter my username and password for each site, and I’m not comfortable doing that with, for example, financial info (or risking getting fired for handing my corporate login to a third-party provider). I have cookies enabled so I can log in to lots of sites automatically, and such a service, running from my own PC, ought to be able to access those sites (and check specific pages for changes) as well - something WatchThatPage can't do.
posted by JParker to Computers & Internet (7 answers total) 1 user marked this as a favorite
 
So, you need to clarify what you're looking for and what exactly you're monitoring. The example you gave is an external site, but the main post mentions monitoring internal pages, which I assume means intranet pages hosted on the local network. Can you clarify that?

At some level, you're either going to script something (which isn't hard - a quickie vbscript using winhttp will work fantastically well), or use macro-based software to drive an actual browser session (if you're going to rely on saved cookies, etc.). There may be tools that auto-grab webpages, but a script is easy, and once you have it you can make it do just about anything you want.
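(For a sense of scale, the core of a winhttp page grab is only a few lines - the URL here is just a stand-in for whatever page you want:

' minimal winhttp fetch
Set myHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
myHttp.Open "GET", "http://www.google.com", False    ' False = synchronous
myHttp.Send
WScript.Echo myHttp.Status          ' http status code, e.g. 200
WScript.Echo myHttp.ResponseText    ' the raw page source

Everything else is just file handling around that.)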
posted by hincandenza at 7:31 PM on January 3, 2007


Response by poster: I would be happy just to be notified by email on a daily basis when the content of a particular web page has changed. My company has thousands of web pages, and over a dozen separate sites focused on what I do, so checking them manually is just impossible.

Content aggregation would be a bonus, but that's clearly more full-blown app functionality and not a requirement.

Your suggestions both seem feasible, but I don't know vbscript and don't know of any pre-built browser add-ins that do this. Basically, you're saying this is something I would have to write myself? If so, well... continuing my self-education is on my new year's resolutions list.
posted by JParker at 7:59 PM on January 3, 2007


So, if the internal sites don't require a login (or even if they do, actually), the winhttp object in vbscript is incredibly easy to use - I keep recommending it for just these kinds of questions.

Basically, I would approach this as a two-pronged problem. One, something that will go out, grab each page in a list, and save it to a file. Two, a separate batch job or whatever that will compare yesterday's pages to today's and notify you if there are any differences.

If you have thousands of pages to check (although at that point you'd really want to be parsing webserver logs to look for failed pages - is the point simply to find out whenever any page changes?), and want to know when they change on a daily basis, here's how I would do it:
1) Set up a scheduled task on your PC to run, once a day, a vbscript that consumes a list of URLs from a file and writes each page's response to its own file. The list of sites would be a text file, say list.txt, where each line is a friendly name (no spaces or special characters), a comma, and then the actual link:
GoogleHome,http://www.google.com
YahooHome,http://www.yahoo.com


I've whipped up a simple vbscript that does just this, which I'll post shortly - you can go to the Windows Scripting Center to get loads more info, including a great Windows Script Host 5.2 downloadable help file that's an incredible resource for quickly learning vbscript. Search on winhttp there if you want to learn more sophisticated tricks.

2) Have a totally separate scheduled job that simply diffs the files from one day's scan to the next (checking whether a file's size changed is probably good enough). That job is responsible for finding differences and then emailing you, or otherwise leaving a "what's different" list of files that you can easily check. It could be as simple as a batch file that says "for every file in one folder, find the same file in this other folder and see if they're different", or it could use the windiff or fc utilities to do the same thing. I'll leave that as an exercise for you to solve - it shouldn't be too hard (though see the sketch below for a head start).
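To give you that head start, here's a rough sketch in vbscript - it assumes you pass the two scan folders on the command line, and it only compares file sizes, so treat the file name and the details as placeholders rather than a finished tool:

' compare.vbs - rough sketch: flag files whose size changed between two scans
' usage: cscript compare.vbs d:\mypages\20070103_0600 d:\mypages\20070104_0600
Set FSO = CreateObject("Scripting.FileSystemObject")
Set oldFolder = FSO.GetFolder(WScript.Arguments(0))
newFolderPath = WScript.Arguments(1)

For Each oldFile In oldFolder.Files
    newFilePath = newFolderPath & "\" & oldFile.Name
    If Not FSO.FileExists(newFilePath) Then
        WScript.Echo oldFile.Name & " is missing from the newer scan"
    ElseIf FSO.GetFile(newFilePath).Size <> oldFile.Size Then
        WScript.Echo oldFile.Name & " changed size - probably updated"
    End If
Next

Pipe that output to a file, or feed it to something like CDO.Message if you want the email part.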
posted by hincandenza at 9:45 PM on January 3, 2007


Best answer: Here is the vbscript. Save this text as a file called crawler.vbs somewhere on your system, along with a file called list.txt in that same folder containing the list of URLs in the format I described above. You might even start with the simple two-URL google/yahoo list to try it out.

Then create a folder for storing the pages, such as d:\mypages. You can adjust this location easily by editing the script in the obvious place at the beginning.

Now just run it, preferably from a command line - you should see it create a subfolder named for the current time inside the parent folder you created above, and every page in the list should be downloaded into it. This script has virtually no error handling, but adding some is a good exercise - I think you'll find vbscript straightforward and easy to use, and you'll pick it up real quick with this as a basis to get started.
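(If you haven't run a .vbs from the command line before: open a cmd prompt, cd to the folder where you saved it, and type

cscript //nologo crawler.vbs

The //nologo switch just suppresses the version banner. Double-clicking the file works too, but running it with cscript keeps any output in the console window.)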

' Define root folder- this is where each day's "scan" will be stored
RootFolder = "D:\myPages\" ' this is the storage folder
myList = "list.txt" ' this is the list file

' Create necessary COM objects
Set FSO = CreateObject("Scripting.FileSystemObject")
Set myHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
' 5.1 is right on Windows XP; if this errors out, try removing the .1

' Get current date, reformat it as the folder name mypages\YYYYMMDD_HHMM, and create that folder
myFolderName = RootFolder & MyTimeStamp(Now)
Set myFolder = FSO.CreateFolder(myFolderName)

' open the list file for reading
Set myURLs = FSO.OpenTextFile(myList, 1)    ' 1 = ForReading

' Loop through the list until the file ends, i.e. AtEndOfStream
On Error Resume Next    ' crude, but keeps one bad page from killing the whole run
Do While Not myURLs.AtEndOfStream

    ' split the line into a friendly name and the URL
    myEntry = myURLs.ReadLine
    arrItems = Split(myEntry, ",", -1, 1)
    myFriendlyName = arrItems(0)
    myLink = arrItems(1)

    ' fetch the page synchronously
    myHttp.Open "GET", myLink, False
    myHttp.Send

    ' get the http status code and response text
    myResponseCode = myHttp.Status
    myResponseText = myHttp.ResponseText

    ' write the output to <scan folder>\friendlyname.txt
    myOutputName = myFolderName & "\" & myFriendlyName & ".txt"
    Set myOutputFile = FSO.CreateTextFile(myOutputName, True)
    myOutputFile.WriteLine myResponseCode & vbCrLf & myResponseText
    myOutputFile.Close
    Set myOutputFile = Nothing

Loop

WScript.Quit

'-------------

' formats a date-time as a folder name, YYYYMMDD_HHMM
Function MyTimeStamp(curTime)
    myTime = curTime

    ' zero-pad each part so folder names sort chronologically
    myYear = DatePart("yyyy", myTime)
    myMonth = DatePart("m", myTime)
    If Len(myMonth) = 1 Then myMonth = "0" & myMonth
    myDate = DatePart("d", myTime)
    If Len(myDate) = 1 Then myDate = "0" & myDate
    myHour = DatePart("h", myTime)
    If Len(myHour) = 1 Then myHour = "0" & myHour
    myMin = DatePart("n", myTime)
    If Len(myMin) = 1 Then myMin = "0" & myMin

    MyTimeStamp = myYear & myMonth & myDate & "_" & myHour & myMin

End Function

posted by hincandenza at 9:50 PM on January 3, 2007 [2 favorites]


Response by poster: hincandenza,
Nice, I think that's exactly what I need. A little more complex than I was hoping for, but I get to learn something in the process. I appreciate your extensive answer. Thank you.
posted by JParker at 9:52 PM on January 3, 2007


No problem- if you have other questions, go ahead and contact me and I can give you tidbits of advice. The script center has plenty of sample scripts to work from, as well.

There might well be an app that does this, but a little scripting work not only gives you everything you need, it also teaches you a ton in the process.

also, clicking that "mark as a favorite" link would be kind of nice. :)
posted by hincandenza at 10:00 PM on January 3, 2007


Response by poster: Done! I also sent you an email at the address in your profile.
posted by JParker at 10:24 PM on January 3, 2007

