WinXP Scripting
August 8, 2004 7:15 AM   Subscribe

What's a relatively easy way to script browser actions (events?) in Windows XP? (mi)

All I want to do is automate the act of going to a website, logging in, and clicking on certain links, resulting in the download of a file that must be saved with a name that I designate in advance. And then do it again about ten more times, choosing different files to download and giving them different names each time. The web interface will always be exactly the same, and so will the choices.

If at all possible, the browser should be the latest IE or Netscape 7.2.

Note that I am not looking for some kind of slurping tool that will download the whole site, and everything linked to it.
posted by bingo to Computers & Internet (11 answers total)
 
Response by poster: For clarity: the "certain links" are always exactly the same. The layout of the website is always exactly the same. Occasionally, there may be one more choice to click on in a list of downloads, but not very often, and that's all the variety there is.

The downloaded files are .csv files which will, in a perfect world, be handed off to an Excel macro that will bring them to their actual purpose.
posted by bingo at 7:29 AM on August 8, 2004


I'm assuming that the CSV files are generated by the website?

I don't think it's possible to script that much detail, even using the Windows Script Host in XP.

Depending on how the website is set up (or, even if you have access to modify the site yourself), you might be able to fudge the URL to automatically pick the options you require for each file.
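
If it does turn out that the options live in the URL, even a few lines of Python would grab the files for you. Something like this — the URL and parameter names here are completely made up, just to show the shape of it:

import urllib

# hypothetical URL pattern -- yours will differ
base = 'http://thesite.com/export.asp?report=%s&format=csv'

# report option -> the filename you designated in advance
jobs = [('14', 'bingo08.csv'), ('15', 'bingo09.csv')]

for option, filename in jobs:
    # if the login box is the browser-popup kind, urllib also accepts
    # http://user:password@thesite.com/ style URLs
    urllib.urlretrieve(base % option, filename)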

If you want, email me, or post some more details.
posted by cheaily at 8:36 AM on August 8, 2004


Other than the incredibly expensive WinRunner, I can only think of Mac and Unix solutions. If you install curl, it can do a lot of this logic, especially if the site is barely changing.
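
For one file it might look like this — the URL is invented, but the flags are real (-u sends the login, -o names the saved file):

curl -u bingo:password -o bingo08.csv "http://thesite.com/export.asp?report=14&format=csv"

Put ten of those lines in a .bat file with different names and you're done.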
posted by jragon at 8:40 AM on August 8, 2004


I did it with Python, Bingo. I'm looking to move, so I wanted to scrape the MLS.CA website for new listings every day. Turns out to be dead easy! I'm sure the code is inefficient and ugly, but it took less than two hours for me to go from zero to full-speed. I love Python!
from BeautifulSoup import BeautifulSoup
import urllib2, ClientCookie, pickle, webbrowser

def MakeSoup(pageno):
    #fetch a page and make it into delicious soup
    urlstart = 'XXXobscuredXXX'
    urlend = 'XXXobscuredXXX'

    request = urllib2.Request(urlstart+str(pageno)+urlend)

    request.add_header('Accept-charset','utf-8,*')
    request.add_header('Cookie',"LegalDisclaimer=1")

    f = ClientCookie.urlopen(request)
    response = f.read()
    f.close()

    #build a fresh soup for each page, rather than feeding one
    #global soup over and over
    soup = BeautifulSoup()
    soup.feed(response)
    return soup


def ScrapeNumbers(soup):
    #scrape the numbers out of the soup it's handed
    numlist = []
    idlist = soup.fetch('div', {'class': 'Label'})

    for i in idlist:
        s = str(i.contents[0])
        start = s.find('MLS')
        end = s.find('</a>')  #assumes the number runs up to the closing tag

        if start > 0:
            #got an mlsno
            mlsno = s[start+10:end].strip()
            #get a property id, too
            start = s.find('PropertyID')
            end = s.find('">MLS')
            propid = s[start+11:end].strip()
            numlist.append((mlsno,propid))
    return numlist


#########################
# load our MLS number history
try:
    mlshistory = pickle.load(open('mlslist.pickle'))
except IOError:
    #no history file yet, start fresh
    mlshistory = []

# first pass gets the page count
print "Getting page count..."
pageno = 1
soup = MakeSoup(pageno)
# identify page count
pagelist = soup.first('span', {'class': 'PageHeader'})
s = str(pagelist.contents[0])
start = s.find('of')
end = s.find('-')
pagecount = s[start+2:end].strip()
print "There are "+str(pagecount)+" pages"

# scrape the first page, then make more soup for the rest
numlist = ScrapeNumbers(soup)
while int(pageno) < int(pagecount):
    pageno += 1
    print "Processing page "+str(pageno)
    soup = MakeSoup(pageno)
    numlist.extend(ScrapeNumbers(soup))

# parse out new numbers
newnumbers = []
for i in numlist:
    if i not in mlshistory:
        newnumbers.append(i)

print "New Numbers:"
print newnumbers

for i in newnumbers:
    mlshistory.append(i)
    webbrowser.open('XXXobscuredXXX'+str(i[1]),1)

#save the updated MLS number history
pickle.dump(mlshistory,open('mlslist.pickle','w'))


posted by five fresh fish at 9:29 AM on August 8, 2004


Response by poster: fff: Thanks, but I don't know enough about coding to follow that well enough to adapt it to my situation.

cheaily: Playing with the URL is an interesting idea.

The csv files are generated by the website, but I don't need the script to actually do anything with them other than download them. If handing them off to the Excel macro has to be done manually, and that's the worst of my problems, then I'll be fine.

Here's what happens when I do it manually: I go to the site, which I have designated as a shortcut in IE. The username/password box comes up immediately, and IE fills it in for me. I click OK, and am presented with a page full of links that offer me choices for what kind of file I want to create. I click on blah, I click on blah, and voilà, the information is displayed before me. To show that I want to download it as a csv file, I click on blah. Windows asks me if I want to open the file or save it. I say that I want to save it. Windows asks me what name to save it under. I tell it. The deed is done. I then go back to the website and choose some more options, and do it again.

Surely, to a browser, these links all have numbers, or some other labels, that can be remembered and used to find the same links every time? I imagine (in my non-programmer mind) code that does something like this (This is me talking to the browser):

a) go to the main page
b) enter username and password, click ok
c) You will see 20 links. Click on link #14.
d) You will see 5 links. Click on link #2.
e) You will see 7 links and three buttons. Click on button #1.
f) You will get a choice of whether to open the file, or save it. Choose save.
g) You will be asked what to name it and where to put it. Call it "bingo's file #8" and put it in the folder called "bingo's automatically downloaded csv documents."
h) Return to step c), but this time start with link #15 instead of #14.
i) Repeat until you've gone through the cycle starting with links 14, 15, 16, 17, and 18.

Then, in a perfect world, the Excel macro will spring into action without a human having to be there to start it. But just steps a through i would make my life a lot easier.

Thanks.
posted by bingo at 10:06 AM on August 8, 2004


Perl, HTTP::Recorder, and WWW::Mechanize will do this quite easily and nicely.
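
Or, if you'd rather stay close to fff's Python, your steps a through g boil down to something like this. Every URL, name, and realm here is invented, and I'm assuming that login box that "comes up immediately" is plain HTTP authentication:

import urllib2
from BeautifulSoup import BeautifulSoup

# step b: the popup login box is HTTP auth (my assumption), which
# urllib2 can answer for you
auth = urllib2.HTTPBasicAuthHandler()
auth.add_password('Members', 'thesite.com', 'bingo', 'password')
opener = urllib2.build_opener(auth)

# step a: fetch the main page
soup = BeautifulSoup()
soup.feed(opener.open('http://thesite.com/main.asp').read())

# step c: "click" link #14 by grabbing its href (Python counts from 0)
href = soup.fetch('a')[13]['href']
# if the href comes back relative, paste 'http://thesite.com' on the front

# steps d and e work the same way on each page you land on

# steps f and g: save the result under the name you picked in advance
data = opener.open(href).read()
file('bingo08.csv', 'wb').write(data)

Either way, the recorder idea is the real timesaver: HTTP::Recorder watches you click through the site once and writes the script for you.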
posted by nicwolff at 2:40 PM on August 8, 2004


Or, since you're on Windows, I should have linked Perl.
posted by nicwolff at 2:42 PM on August 8, 2004


It doesn't use IE, but Canoo takes an XML file of events (go here, click that, download) and plays them back.
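
From memory, the playback file looks something along these lines — check their docs for the exact step names, and the URL and labels are made up:

<testSpec name="grab csv">
  <steps>
    <invoke url="http://thesite.com/main.asp" />
    <clickLink label="Monthly Report" />
    <clickButton name="export" />
  </steps>
</testSpec>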
posted by holloway at 3:37 PM on August 8, 2004


Response by poster: As far as Canoo goes, the website won't cooperate with anything that doesn't identify itself as a recent version of IE.

nicwolff: Thanks, I guess you've shown me that for someone at my level (programming novice; all I know is some VB and bash shell stuff), this is going to take a lot of reading on my part, even if the result would be simple to achieve for someone who knows enough.

But surely then, this is a gap in the market waiting to be filled? Surely there are others like me, wanting to automate their browsing experience at a fairly simple level, but not knowing Python or Perl (and, in general, not needing them for their jobs)?
posted by bingo at 4:24 PM on August 8, 2004


Say what? Canoo's website works for me in Firefox 0.9.2 / WinXP -- and as you can see from Google, Canoo is very popular.
posted by holloway at 8:51 PM on August 8, 2004


Response by poster: holloway, I'm not talking about the website, I'm talking about the product. I have to use either IE or Netscape for the task.
posted by bingo at 5:23 PM on August 9, 2004

