I want to download all files from a page, but there's a catch
February 1, 2008 7:42 AM

How can I download a bunch of .pdfs from a webpage all at once? The page I'm looking at is a list of .pdfs (course materials for a class I'm in), with a link to download each .pdf file. I've tried using the Firefox extension DownThemAll, but there's a catch: the links I click to download the .pdfs individually are javascript popups that look like this: javascript:pop('docs/12004/andes_vegetation.pdf','yes',",",'12004'); Is there a way to grab all of these at once, or am I doomed to clicking each and every link to open a new window, save as, etc.?
posted by entropic to Computers & Internet (9 answers total)
If you're using Firefox, install the WebDeveloper addon.
Then once it's up, go to your download page and on the Webdev bar above you'll see a menu that says INFORMATION.

Click that. When the menu drops down, choose VIEW LINK INFORMATION.

A new tab will pop up that will have every link listed in the order it's placed. Copy the links and voila!
posted by damiano99 at 7:54 AM on February 1, 2008

Try going to http://the.url/and/path/docs/12004 and see if you can see an index there. If so, just point DownThemAll or wget at it (and at whatever other directories exist in place of 12004).

If not, it would be pretty trivial to hack up a script to munge the data as needed, given the source document.
posted by Skorgu at 8:02 AM on February 1, 2008
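
A minimal sketch of the kind of script suggested above, in Python. It assumes the links all have the shape pop('some/path.pdf', ...) as in the question, and that the paths are relative to the page's own URL; http://the.url/and/path/ is the placeholder from the comment above, not the real address. The names POP_LINK and extract_pdf_urls are made up for illustration.

```python
import re
from urllib.parse import urljoin

# Captures the first argument of pop('...') when it ends in .pdf,
# e.g. pop('docs/12004/andes_vegetation.pdf', ...).
POP_LINK = re.compile(r"""pop\(\s*['"]([^'"]+\.pdf)['"]""", re.IGNORECASE)


def extract_pdf_urls(html, base_url):
    """Return absolute PDF URLs from pop('...') calls, in page order, no duplicates."""
    seen = set()
    urls = []
    for path in POP_LINK.findall(html):
        # Assumption: the paths are relative to the page that lists them.
        url = urljoin(base_url, path)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Simplified sample link; the real page passes more arguments to pop().
    sample = """<a href="javascript:pop('docs/12004/andes_vegetation.pdf','yes');">Andes</a>"""
    for url in extract_pdf_urls(sample, "http://the.url/and/path/"):
        print(url)
```

Saving the printed list to a text file gives something DownThemAll or wget can work from. If the site requires a login, the resulting URLs may still need the browser's session to download.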

Are the pdfs all located in the same directory? (i.e., docs/12004/andes_vegetation.pdf) If so, just hack the URL, navigate to that directory and use downthemall, completely bypassing the javascript crap.

For example: http://cms.csc.com/cwf/downloads/docs/pdfs/
posted by desjardins at 8:03 AM on February 1, 2008

If you have grep, cut and wget available, viewing the source and running it through them will probably make this very simple.
posted by Cat Pie Hurts at 8:12 AM on February 1, 2008
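
For anyone without wget handy, the download step can be sketched in Python instead. This is only a rough stand-in for something like wget -i list.txt -P pdfs, assuming you already have a list of full URLs and that the files can be fetched without a login. The function names local_name and download_all and the pdfs directory are illustrative choices, not anything from the thread.

```python
import os
import urllib.request
from urllib.parse import unquote, urlsplit


def local_name(url):
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(unquote(urlsplit(url).path)) or "download.pdf"


def download_all(urls, dest_dir="pdfs"):
    """Fetch each URL into dest_dir and return the list of saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        # Read the whole response into memory; fine for course-handout sized PDFs.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            out.write(response.read())
        saved.append(target)
    return saved
```

Calling download_all(["http://the.url/and/path/docs/12004/andes_vegetation.pdf"]) would save andes_vegetation.pdf under pdfs/, with the placeholder host swapped for the real one.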

You could use the RegEx Coach (for Windows--uses regex, but more gui than grep and easier to see what you're doing) to pull a list of files out of the source and format them in a neat list, and then use the Windows version of wget.
posted by anaelith at 8:32 AM on February 1, 2008

Response by poster: I tried finding an index of the files, but the link to get to the page that has all the javascript popups is this:


which is the exact same link to get to any of the pages of course materials for this course. I tried several ways of hacking the url with /docs/12004 but it just takes me back to the listing of .pdfs.

When I tried damiano99's suggestion, I did get a new tab with all the links, but they're still all javascript popups as described in my original question and downthemall can't see them/do anything with them.

Too bad my school uses such a shitty system of linking to documents.
posted by entropic at 8:41 AM on February 1, 2008

wget can do this. There's no pretty GUI for it, but the configuration isn't too hard to figure out. Try appending 'docs/12004/' to the webpage's URL. Or contact your school's support and see if they can put these files up on FTP or something.
posted by damn dirty ape at 9:11 AM on February 1, 2008

I've had to do this before - save the main html page to your desktop, then open it in a text editor and do a Find & Replace to change the javascript links to <a href=... links. Then resave the page, open it in Firefox, and use DownThemAll to grab the newly created links.
posted by Gortuk at 9:23 AM on February 1, 2008
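
The Find & Replace step above can also be scripted. Here is a hedged Python sketch, assuming every link looks like href="javascript:pop('path', ...);" with double quotes around the href and single quotes around the path, as in the question. JS_HREF and rewrite_links are hypothetical names, and base_url is an optional extra for turning the relative paths into full URLs.

```python
import re
from urllib.parse import urljoin

# Matches href="javascript:pop('<path>', ...)" and captures <path>.
JS_HREF = re.compile(
    r'''href\s*=\s*"javascript:\s*pop\(\s*'([^']+)'[^"]*"''',
    re.IGNORECASE,
)


def rewrite_links(html, base_url=""):
    """Replace javascript:pop(...) hrefs with plain hrefs to the same file."""
    def to_plain_href(match):
        # With an empty base_url the captured path is kept unchanged.
        return 'href="%s"' % urljoin(base_url, match.group(1))
    return JS_HREF.sub(to_plain_href, html)


if __name__ == "__main__":
    # Simplified sample link; the real page passes more arguments to pop().
    page = """<a href="javascript:pop('docs/12004/andes_vegetation.pdf','yes');">Andes</a>"""
    print(rewrite_links(page, "http://the.url/and/path/"))
```

Because the page is opened from your desktop, relative paths would point at your own disk, so passing the real page address as base_url (the one here is a placeholder) is probably what makes the rewritten links usable in DownThemAll.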

Can't you do this with the download function of Acrobat Standard?
posted by ZenMasterThis at 9:48 AM on February 1, 2008
