I want to suck my files!
April 14, 2012 5:31 PM

Unix cURL syntax: multiple URLs to multiple files (download)

I've read the man pages and I've tried an amazing number of things, with varying degrees of failure. Minimal success, or I wouldn't be asking.

I want:

example.com/red
example.com/blue
example.com/orange
example.com/purple
[...]

to be downloaded as red.html blue.html orange.html purple.html

I can go through the list of things I've tried, but in the end I either get just the first file, or I get all the files without extensions.

It seems idiot simple, so it seems I am less than an idiot.

OS: Lion, but I would like to eventually script and cron this. This is for pulling hard copies out of my own CMS (ExpressionEngine), so nothing nefarious.
posted by cjorgensen to Computers & Internet (8 answers total) 1 user marked this as a favorite
 
I would put my list of URLs to download in a file named URLs, and do something like
cat URLs | while read -r url; do
  curl -o "$(basename "$url")" "$url"
done
That uses shell to loop over the URLs. Using basename for the file name isn't awesome. Often I prefer wget to curl, simply because it has a decent default filename.
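For example, a one-line sketch using the same URLs file: wget names each download after the last path component (red, blue, and so on), though still without the .html extension:
wget -i URLs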
posted by Nelson at 5:50 PM on April 14, 2012


Can't you just rename the files after they're downloaded? Have a bash script that looks something like this:
#!/bin/bash

# curl command(s) here, assuming we write to a directory called 'example.com'

for file in ./example.com/*; do
  # crude test: any file containing "html" (e.g. an <html> tag) gets the extension
  if grep -qi html "$file"; then
    mv "$file" "$file.html"
  fi
done
If you need recursion you can adapt find to the task at hand, but that's the basic idea.
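For instance, a find-based sketch of the same rename (untested; it descends into subdirectories and skips files that already end in .html):
find ./example.com -type f ! -name '*.html' -print0 | while IFS= read -r -d '' file; do
  if grep -qi html "$file"; then
    mv "$file" "$file.html"
  fi
done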
posted by axiom at 5:55 PM on April 14, 2012


Response by poster: I'm not opposed to something other than curl, but I am going to do this from a dynamic site. So I would be generating that file on the fly. It won't be static. Is there a way to cat the contents of a URL? I could pull that info way easy.

Basically, I want to download every entry on my blog as an individual static page (from a template specifically designed for this).
posted by cjorgensen at 5:56 PM on April 14, 2012


Response by poster: axiom, I can rename after downloading. If that's the easier option I'll take it. The filenames would just need .html appended.
posted by cjorgensen at 5:58 PM on April 14, 2012


If you're trying to archive your blog, that's basically what wget recursion is for. There are a variety of ways to install wget on MacOS; I installed mine via Homebrew.
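A minimal sketch of that (the depth is a guess, adjust to your site; --adjust-extension appends .html to pages served as HTML, --no-parent keeps the crawl from wandering up the tree):
wget --recursive --level=2 --no-parent --adjust-extension http://example.com/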
posted by Nelson at 6:09 PM on April 14, 2012


You can use bash's parameter expansion to make this simple:
while read -r url; do curl -so "${url##*/}.html" "$url"; done <file_with_urls.txt
Change the final ; to a & and you automatically turn it into a parallel job, where all the files are downloaded simultaneously. (Don't do this if you have a ton of URLs in the file.)
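For reference, that parallel variant looks like this (each curl is backgrounded; the added wait keeps the shell from moving on before the downloads finish):
while read -r url; do curl -so "${url##*/}.html" "$url" & done <file_with_urls.txt
wait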
posted by Rhomboid at 4:41 AM on April 15, 2012


And if instead of file_with_urls.txt you want the contents of a URL, then:
curl -s http://example.com/list_of_urls | while read -r ...
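Spelled out in full, assuming one URL per line in the list:
curl -s http://example.com/list_of_urls | while read -r url; do
  curl -so "${url##*/}.html" "$url"
done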
posted by Rhomboid at 4:45 AM on April 15, 2012


Use `wget`. This standard Unix utility isn't included in OSX AFAIK because of Apple's dislike for GPL-licensed software, so you'll have to install it yourself (instructions for which are readily googled). You can supply multiple URLs or use the `-i` option to read a list of URLs from a file.
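For example (urls.txt is just a placeholder name; adding -E/--adjust-extension tacks .html onto pages served as HTML, which is the extension asked for above):
wget -E -i urls.txt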
posted by faustdick at 6:37 AM on April 15, 2012 [1 favorite]

