Web scraping online banking sites?
September 7, 2006 4:28 PM

Is there a tool or tutorial for web scraping secure sites, like online banking or other sites that verify it's really you logging in?

I've been teaching myself web scraping with Perl and LWP and have found it extremely useful for things like building proxies for my Nokia tablet and automation.

I want to write something that will grab my bank balance, but online banking sites seem to jump through a lot of hoops to make sure you are the one sitting at your computer trying to log in.

I've tried deciphering the tactics they use (JavaScript, cookies), but it seems there are more tricks in play that I don't know about.

Is there a utility that lets you analyze what actually happens behind the scenes (headers, cookies, etc.) so I can just repeat it?

A low-level tool would be great, but I'd be happy with a web recorder/automation program that would log me in (based on keystrokes, etc.) and save the HTML so I could get my balance.
posted by mphuie to Computers & Internet (7 answers total)
Capture the HTTP traffic between the browser and the server as you log in. This will give you a much better feel for what redirects and cookies and the like are being used.
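
If a browser-side capture isn't enough, the same idea can be scripted. The following is only a rough sketch in Python's standard library (the thread is about Perl, so treat this as an illustration of the technique, not a drop-in tool). It assumes you want a cookie-aware client that records every request and response, including each redirect hop. The names TrafficLogger and build_logged_opener are made up for this example.

```python
import urllib.request
from http.cookiejar import CookieJar


class TrafficLogger(urllib.request.BaseHandler):
    """Hypothetical helper: records every request and response it sees."""

    # Run after the cookie processor (order 500) so the Cookie header is
    # already attached, but before the error processor (order 1000) so that
    # 3xx redirect responses are logged before they are followed.
    handler_order = 900

    def __init__(self):
        self.events = []

    def http_request(self, request):
        self.events.append(
            ("request", request.get_method(), request.full_url,
             request.header_items())
        )
        return request

    def http_response(self, request, response):
        self.events.append(
            ("response", response.status, request.full_url,
             list(response.headers.items()))
        )
        return response

    # Use the same hooks for HTTPS traffic.
    https_request = http_request
    https_response = http_response


def build_logged_opener():
    """Return (opener, cookie jar, logger) sharing one session."""
    jar = CookieJar()
    logger = TrafficLogger()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar), logger
    )
    return opener, jar, logger
```

You would then call opener.open() on the login URL (something like https://bank.example/login, a placeholder) and walk through logger.events to see which redirects were followed and which Set-Cookie headers came back at each step. Anything the site does in JavaScript will not show up here.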
posted by kindall at 4:39 PM on September 7, 2006

Not a full solution (and maybe not a solution at all), but the AutoLoginJ Greasemonkey script automatically logs you in to websites where Firefox has remembered the password. It works on my bank's website.
posted by joshuaconner at 4:39 PM on September 7, 2006

You could easily do something like this with Watir, which would actually drive your browser and let you cherry-pick from the DOM. If you want to see what kind of headers the site is sending, check out the Live HTTP Headers extension for Firefox.
posted by subclub at 5:09 PM on September 7, 2006

This idea did cross my mind at one stage, but the idea of the banks calling the authorities over unusual traffic turned me off the idea. Just look what happened to the guy trying to donate money to the Tsunami relief effort in London.
posted by a. at 5:13 PM on September 7, 2006

I wonder if I can fit another 'idea' into that post.
posted by a. at 5:15 PM on September 7, 2006

In Perl, WWW::Mechanize was written to do exactly this.
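
One chore a tool like this takes off your hands is reading the login form, hidden fields included, and sending everything back. A minimal sketch of just that step, written in Python's standard library rather than Mechanize itself; FormFieldParser, prepare_login, and the field names in the usage note are all hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFieldParser(HTMLParser):
    """Collects the action, method, and input fields of the first form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "GET"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden inputs often carry per-session tokens the server expects.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def prepare_login(page_url, html, credentials):
    """Return (method, absolute submit URL, encoded body) for the first form."""
    parser = FormFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(credentials)  # fill in the visible fields ourselves
    submit_url = urljoin(page_url, parser.action or "")
    return parser.method, submit_url, urlencode(fields).encode("ascii")
```

For a page containing a form with a hidden "token" input plus "user" and "pass" inputs, prepare_login(page_url, html, {"user": ..., "pass": ...}) hands back a body that includes the token along with your credentials, ready to send with whatever HTTP client you are using. It will not help if the site computes values in JavaScript before submitting.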
posted by mendel at 6:22 PM on September 7, 2006

Mendel, I've fiddled around with WWW::Mechanize (it's more of a playback tool), but I need something to record the underlying actions.

I've also looked at WWW::Proxy, which is supposed to create Mechanize scripts but isn't working for me.

Thanks for the other ideas, I'll look into those tools/extensions.
posted by mphuie at 11:13 AM on September 8, 2006
