How can I automate this web browsing task?
August 29, 2008 12:49 PM

Please help me automate a repetitive task (log in to website, then after successful login, load some pages, then log out), in a way that can be scheduled so that it runs at two different hours each day.

Every day, I need to log on to a website. This part is important, as without logging in, I can't do the actions that I want to do.

After logging in, I need to load several specific pages. The pages that need to be loaded can be determined in one of two ways:

1. By programmatically loading one page, retrieving all links that say 'Add', and loading the pages referenced by those links
2. I can manually add the links to a list (or a Firefox bookmark, which is how I've been semi-automating this)

Once that's done, the script (or whatever) can log out.

I'm good with computers. I program for a living and am familiar with a bunch of scripting languages. The reason I'm asking for this is, where I currently work, I am expected to load a page twice a day (*exactly* when a catalogue refreshes, it's time-sensitive) and Add a list of products (it uses HTTP GET so I just need to follow links). It really only takes a couple of minutes, but I hate switching from one task to another and back.

I don't mind manually finding the links on my own time. I just don't want to be obligated to be online at EXACTLY the time that their catalogue refreshes.

Here's what I've come up with so far:

1. (ideal) manually adding the links to a Firefox bookmark, then using some sort of Firefox extension, AppleScript, or similar to log in with my saved login/password and load the bookmarks at a time I specify

I would like to stress that (1) would be ideal.

2. Use WWW::Mechanize to retrieve the Add links, and then follow them using threaded requests.

Any other suggestions, or any tools built specifically for this task?

posted by mebibyte to Computers & Internet (15 answers total) 9 users marked this as a favorite
You might be able to do this with AutoIt if you can't find a way to do it all with browser extensions/add-ons.
posted by blind.wombat at 12:59 PM on August 29, 2008

Response by poster: Sorry! That reminds me. I work with Mac and Linux computers only. No VMs, no wine.
posted by mebibyte at 1:10 PM on August 29, 2008

Can't you do this with AppleScript, Automator, or a cron job?
posted by knowles at 1:12 PM on August 29, 2008

PushToTest is a Java-based (runs on Linux and Windows, I don't know about OSX) web application testing framework that can be used to script things like this. It's designed so that the scripts can be packaged up and run automatically by a Unix daemon, I believe.

(But actually, you ought to be able to write something to do this from scratch in any language you're familiar with, if you feel up to the challenge.)
posted by XMLicious at 1:13 PM on August 29, 2008

you can use any of the following ...




some examples of web client programming can be found here (the content is old but it should still be accurate)

hope this helps
posted by pdxpatzer at 1:24 PM on August 29, 2008

iMacros for Firefox
Automating Firefox
posted by adamrice at 1:25 PM on August 29, 2008

Whoa! Selenium looks sweet. Thanks for posting that, pdxpatzer.
posted by XMLicious at 1:27 PM on August 29, 2008

I'll 2nd Watir. It was created to do automated testing of web apps with Internet Explorer. However, we've been using it at work to script automated tasks on the web.
posted by austinetsu at 1:28 PM on August 29, 2008

You can do this with AppleScript and Safari. There's a "do javascript" command in the Safari library that will help you with a lot of this stuff: filling in forms, submitting, grabbing links. I find macosxhints to be a useful resource for this kind of thing.
posted by hooray at 1:38 PM on August 29, 2008

Untested, but something like this should work...

cd $HOME/tmp/

wget --save-cookies cookies.txt --post-data 'username=foo&password=bar'

wget --load-cookies cookies.txt --recursive --max-depth 0 --delete-after

Note that the second wget will follow all links on the page, not just the ones you want. This may or may not be a problem. (Also the --domains switch might help here.)
posted by rjt at 1:47 PM on August 29, 2008

sorry, that should be --level 1 (wget's recursion depth option is --level, or -l; it has no --max-depth switch)
posted by rjt at 1:57 PM on August 29, 2008
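
If following every link on the page is a problem, one alternative is to pick out only the links whose text is 'Add' before fetching anything. The following is a minimal sketch using only the Python standard library, not something anyone in the thread posted. The names AddLinkParser and find_add_links are made up for illustration, and it assumes the link text is exactly 'Add' (ignoring surrounding whitespace), as the question describes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AddLinkParser(HTMLParser):
    """Collect the href of every <a> whose visible text equals a label."""

    def __init__(self, base_url, label="Add"):
        super().__init__()
        self.base_url = base_url   # used to resolve relative hrefs
        self.label = label         # assumption: link text is exactly this
        self.links = []
        self._href = None          # href of the <a> currently open, if any
        self._text = []            # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == self.label:
                self.links.append(urljoin(self.base_url, self._href))
            self._href = None


def find_add_links(html, base_url, label="Add"):
    """Return absolute URLs of all links in html whose text is label."""
    parser = AddLinkParser(base_url, label)
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling find_add_links(page_html, page_url) returns a list of absolute URLs, which can then be fetched one at a time with whatever tool holds the login cookies.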

Seconding a curl or wget script on a cron job
posted by wongcorgi at 2:42 PM on August 29, 2008

Ruby's WWW::Mechanize is well documented and easy to use. I don't see why you couldn't write a short script to do what you want and run it from cron.
posted by PueExMachina at 5:08 PM on August 29, 2008
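
For comparison, here is roughly what such a short script could look like with only the Python standard library instead of Mechanize. This is a sketch under assumptions, not a tested solution for the asker's site: the form field names username and password are borrowed from the wget example above, all URLs are supplied by the caller, and the function names are made up for illustration. It also does not check that the login actually succeeded, since how to detect that depends on the site.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def make_login_request(login_url, username, password):
    """Build the login POST. The field names are assumptions; check the real form."""
    data = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(login_url, data=data, method="POST")


def run_session(opener, login_url, username, password, page_urls, logout_url):
    """Log in, load each page with a plain GET, then log out."""
    opener.open(make_login_request(login_url, username, password)).close()
    try:
        for url in page_urls:
            opener.open(url).close()
    finally:
        # Log out even if one of the page loads raised an error.
        opener.open(logout_url).close()
```

A script that calls run_session(make_opener(), ...) with the real URLs could then be scheduled with a crontab entry such as "0 9,17 * * * python3 /path/to/script.py", where the hours and the path are placeholders.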

I've done something similar to this using a combination of Aurora and Automator. Basically, each morning, Aurora wakes the Mac, plays a song from an iTunes playlist, then opens Safari, loads the page for NPR's The Writer's Almanac (I'm in the UK, can't get it on the radio), finds the link on the page that has 'play' in it and clicks it.

This works because as well as hooking into iTunes, Aurora has a field that you can drag and drop any file or application to in order to launch it at a certain time. And you can save Automator scripts as Application files.
posted by Happy Dave at 5:21 PM on August 29, 2008 [1 favorite]

I'd use wget, bash and cron.
posted by flabdablet at 5:50 AM on August 30, 2008
