I want to code a really basic screen-scraper to make a tedious work task slightly less tedious. I'll probably be doing it in Python. Some assistance required!
This is the NYS Corporation database search site. You type in a name, it gives you a few matches, you pick one and that leads to this, for example.
Here's what I need a program to do:
- Query the site
- Allow me to pick the correct entry
- Scrape the info in a bunch of those fields
- Convert the formatting (title case instead of caps, "St." instead of "Street," etc.)
- Spit the data into pre-determined places in a boilerplate text document.
- Save each instance of this search as an individual file.
OR
- Email the text to a specific address
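A rough sketch of the convert, fill and save steps, using only the Python standard library. The abbreviation table, the boilerplate wording, the field names `name` and `address`, and the helper names `tidy`, `fill` and `save` are all made up for illustration; the real document and fields would replace them.

```python
import re
from pathlib import Path
from string import Template, capwords

# Hypothetical abbreviation table; extend it with whatever the boilerplate needs.
ABBREVIATIONS = {"Street": "St.", "Avenue": "Ave.", "Road": "Rd."}
_ABBREV_RE = re.compile(r"\b(" + "|".join(ABBREVIATIONS) + r")\b")

# Hypothetical boilerplate; $name and $address are placeholder field names.
BOILERPLATE = Template("Entity: $name\nAddress: $address\n")


def tidy(text):
    """Turn ALL CAPS into title case, then abbreviate whole words like Street."""
    titled = capwords(text)  # capitalizes each whitespace-separated word
    return _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], titled)


def fill(record):
    """Clean every field of a scraped record and drop it into the boilerplate."""
    cleaned = {key: tidy(value) for key, value in record.items()}
    return BOILERPLATE.substitute(cleaned)


def save(record, directory="."):
    """Write one filled-in document per search, named after the entity."""
    stem = re.sub(r"\W+", "_", tidy(record["name"])).strip("_")
    path = Path(directory) / (stem + ".txt")
    path.write_text(fill(record))
    return path
```

Calling `fill({"name": "ACME WIDGETS, INC.", "address": "123 MAIN STREET, ALBANY"})` returns the finished text block. One caveat: `capwords` lowercases everything after the first letter, so two-letter state codes such as NY would need a special case.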
Right now I'm doing this by copying and pasting into a Word file and manually emailing each text block. I'm pretty fast, but I know my time is better spent figuring out a way to get the computer to do it. Plus, it'll impress the hell out of my boss, as he has to do this from time to time and finds it equally annoying.
My friend suggested I do this in Python using mechanize and BeautifulSoup. Fair enough. Anyone have any opinions or counter-opinions on that? Getting this done painlessly is top priority.
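For a sense of what the scraping step involves, here is a minimal sketch that pulls label/value pairs out of a results table. It uses the standard library's `html.parser` as a dependency-free stand-in; BeautifulSoup would do the same job in fewer lines. The markup in `SAMPLE_HTML`, the field labels, and the names `FieldParser` and `scrape_fields` are assumptions for illustration, not the actual structure of the NYS page.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real result page's structure is not known here.
SAMPLE_HTML = """
<table>
  <tr><th>Current Entity Name:</th><td>ACME WIDGETS, INC.</td></tr>
  <tr><th>County:</th><td>ALBANY</td></tr>
</table>
"""


class FieldParser(HTMLParser):
    """Collect label/value pairs from rows shaped like <th>label</th><td>value</td>."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cell = None  # "th" or "td" while inside one, otherwise None
        self._label = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts: forget the previous label and value.
            self._label, self._value = "", ""
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_data(self, data):
        if self._cell == "th":
            self._label += data
        elif self._cell == "td":
            self._value += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._label.strip():
            # Store the finished row, dropping the trailing colon from the label.
            self.fields[self._label.strip().rstrip(":")] = self._value.strip()


def scrape_fields(html):
    """Return a dict mapping each field label to its text value."""
    parser = FieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

`scrape_fields(SAMPLE_HTML)` returns a plain dict, which is a convenient shape to hand to whatever does the formatting and boilerplate steps.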
Speaking of which, if I am to use Python, is there a guide out there that will hold my hand through getting Python actually running on a Windows Vista box? There's all sorts of different versions, different packages, different implementations and just a world of stuff I have no interest in picking apart. I just want an idiot-proof guide that lands me in front of a reasonably smart IDE with keyword highlighting, bracket checking, indenting and so on.
Background: I have a few years of programming classes behind me (a good ten years back). I coded something similar to what I need now for my high school Visual Basic project, so I'm pretty sure I'm capable of it now. I also mess around with Arduino at home, so it would be pretty sweet to have practice for a hobby at work.
posted by griphus at 1:36 PM on May 14, 2012