Help creating a bot
December 24, 2007 2:57 PM

I need help creating a web bot (for non-malicious purposes). Are there any tools out there to facilitate this?

I need to retrieve some datasets from a web server. Getting the datasets involves making a form request, getting assigned a random job number, waiting for the job to finish (usually 5-15 minutes), and then downloading the datasets from the links presented on the job webpage (i.e.[jobnumber]).
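The "wait for the job to finish" step is the part a script handles best: check the job page periodically and stop when it reports completion. Here is a minimal Python sketch of that polling loop. The status check is passed in as a function because the server's actual job page isn't shown. The name `poll_until_done`, the 60-second interval, and the 20-minute timeout are all illustrative assumptions, not anything the server documents.

```python
import time
from typing import Callable


def poll_until_done(
    is_done: Callable[[], bool],
    interval: float = 60.0,
    timeout: float = 20 * 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call is_done() every `interval` seconds until it returns True.

    Returns True once the job reports finished, or False if `timeout`
    seconds pass first. `sleep` and `clock` are injectable so the loop
    can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_done():  # e.g. fetch the job page and look for a "done" marker
            return True
        if clock() >= deadline:  # give up rather than poll forever
            return False
        sleep(interval)
```

In practice `is_done` might be something like `lambda: "complete" in fetch_job_page(job_id)`. Here `fetch_job_page` and the `"complete"` marker are hypothetical, so inspect the real job page to see what it shows when a job finishes. Polling once a minute adds negligible load for a job that takes 5-15 minutes.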

Access to this server is unlimited (it's a public government server), this kind of use is not against its usage policy, and my planned usage is very light, so let's skip that issue. The problem is that the interface is clumsy, and it would be very tedious to sit there and submit each job for the data I need. I want to automate this and let the computer worry about it. Is there a good scripting language that can help me with this? I guess Perl is the obvious solution, but I am not very well versed in it, and I've seen very little in the way of short, elegant Perl code.

I'm very good with DOS scripts and Delphi (which I will probably resort to if there's no simple method).
posted by chips ahoy to Computers & Internet (11 answers total)
posted by zengargoyle at 3:00 PM on December 24, 2007

Seconding WWW::Mechanize. Or, if you don't feel like dealing with the overhead of learning a language you may not know, go post a job at one of the many freelance coding sites out there -- this is probably something you can get done for about $20.
posted by Doofus Magoo at 3:14 PM on December 24, 2007

I'm clueless about using scripting languages, but I've used dapper occasionally, with some success. I don't know if it will accommodate the time delay part, but it might be worth checking out.
posted by Slacker Manager at 4:31 PM on December 24, 2007

I like Python for this kind of work, and BeautifulSoup is one of my tools of choice.
posted by migurski at 4:46 PM on December 24, 2007
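BeautifulSoup is a third-party package. As a dependency-free stand-in, the sketch below uses Python's standard `html.parser` module to pull download links off a finished job page and resolve them to absolute URLs. The `.csv` suffix filter and the example URL in the usage note are assumptions for illustration, because the real job page layout isn't shown.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str, suffix: str = "") -> list[str]:
    """Return absolute link URLs, optionally only those ending in `suffix`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return [link for link in parser.links if link.endswith(suffix)]
```

For example, `extract_links(page_html, "http://example.org/jobs/123/", ".csv")` would return only the `.csv` links on a page at that (placeholder) address. Each URL could then be fetched with `urllib.request.urlopen`.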

I barely know Perl, but was able to write a screen-scraping app with it. The app eventually had on the order of 100 users. There's a way to make it record your interaction with a website, then dump that as a Perl script.

Another option would be to use Firefox and Greasemonkey, and write the functionality in JavaScript. This will let you use XPath (though you can probably do XPath in Perl too). If you want to get Greasemonkey to save to disk, there's a simple way to hack Greasemonkey so that the security policy that prevents saving to disk is bypassed -- it's a couple of lines of modification, but I forget now exactly what I did.
posted by orthogonality at 4:55 PM on December 24, 2007

Thirding WWW::Mechanize. It also takes care of various bot-specific functionality (like limiting the rate at which you make requests, if you want), which is nice.
posted by hattifattener at 6:36 PM on December 24, 2007
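WWW::Mechanize is a Perl module. To keep these examples in one language, here is a minimal Python sketch of the request-pacing idea instead. It is a hand-rolled throttle, not WWW::Mechanize's own API. The class name `MinIntervalThrottle` is hypothetical, and any interval you pick is your own choice rather than a figure from the thread or from the server's policy.

```python
from __future__ import annotations

import time
from typing import Callable


class MinIntervalThrottle:
    """Ensure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: float | None = None  # time of the previous request, if any

    def wait(self) -> None:
        """Block until it is polite to send the next request."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)  # only sleep for the part not yet elapsed
                now = self.clock()
        self._last = now
```

Calling `throttle.wait()` before every fetch, including each status check while polling, keeps the request rate bounded no matter how fast the rest of the script runs.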

Damnit, my initial answer should have been: "I barely know Perl, but was able to write a screen-scraping app with WWW::Mechanize."
posted by orthogonality at 7:56 PM on December 24, 2007

Good to see all the recommendations for mech. Also WWW::Mechanize::Shell gives you the fast track via a command line interface, or HTTP::Recorder by way of setting up a local proxy server (doesn't always work).
posted by singingfish at 11:36 PM on December 24, 2007

There's a Python version of mechanize as well that comes well recommended.
posted by ph00dz at 12:28 AM on December 25, 2007
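The Python mechanize package drives forms at the browser level, handling field selection, cookies, and redirects. If you only need to submit a single known form, the same POST can be built with the standard library alone. The sketch below does this. The action URL and field names are hypothetical placeholders. The real ones would come from the form's `action` attribute and the `name` attributes of its `<input>` elements.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(action_url: str, fields: dict[str, str]) -> Request:
    """Build a POST request equivalent to submitting an HTML form."""
    body = urlencode(fields).encode("ascii")  # e.g. b"name=value&other=1"
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def submit_form(action_url: str, fields: dict[str, str], timeout: float = 30.0) -> str:
    """Submit the form and return the response body as text (network call)."""
    with urlopen(build_form_request(action_url, fields), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The page returned by `submit_form` is where the assigned job number would appear. Finding it depends on that page's markup, which isn't shown here. If the server relies on session cookies, mechanize (or `http.cookiejar` with a custom opener) handles that more conveniently than bare `urlopen`.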

I can vouch for the Ruby version of Mechanize. Ruby's syntax (or Python's, for that matter) is easier to pick up than Perl's.
posted by sixacross at 2:47 AM on December 25, 2007

AutoIt is extremely easy to learn and, while it was originally intended as a Windows-based automation package, it has been extended to include webpage elements and navigation, so you can automatically load a webpage, fill in and submit forms, parse the HTML, etc. Windows only, though.
posted by MarkLark at 5:14 AM on December 25, 2007
