I want to suck my files!
April 14, 2012 5:31 PM
Unix cURL syntax: multiple URLs to multiple files (download)
I've read the man pages and tried an amazing number of things, with varying degrees of failure. Minimal success, or I wouldn't be asking.
I want:
example.com/red
example.com/blue
example.com/orange
example.com/purple
[...]
to be downloaded as red.html, blue.html, orange.html, purple.html
I can go through the list of things I've tried, but in the end I either get just the first file, or I get all the files without extensions.
It seems idiot simple, so it seems I am less than an idiot.
OS: Lion, but I'd like to eventually script and cron this. This is for pulling hard copies out of my own CMS (ExpressionEngine), so nothing nefarious.
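(For what it's worth, curl's own URL globbing can express exactly this in one line: a brace list expands to several URLs, and #1 in the -o argument refers back to whatever the braces matched. A minimal sketch, using the placeholder URLs from the question:)

    curl -o "#1.html" "http://example.com/{red,blue,orange,purple}"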
Can't you just rename the files after they're downloaded? Have a bash script that looks something like this:
    #!/bin/bash
    # curl command(s) here, assuming we write to a directory called 'example.com'
    for file in ./example.com/*; do
        if grep -qi html "$file"; then
            mv "$file" "$file.html"
        fi
    done
If you need recursion you can adapt find to the task at hand, but that's the basic idea.
posted by axiom at 5:55 PM on April 14, 2012
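(axiom mentions adapting find for recursion; a hypothetical sketch of that, renaming any non-.html file under example.com whose contents look like HTML:)

    find ./example.com -type f ! -name '*.html' -exec sh -c '
        for f in "$@"; do
            grep -qi html "$f" && mv "$f" "$f.html"
        done
    ' sh {} +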
Response by poster: I'm not opposed to something other than curl, but I am going to do this from a dynamic site, so I would be generating that file on the fly; it won't be static. Is there a way to cat the content of a URL? I could pull that info way easy.
Basically, I want to download every entry on my blog as an individual static page (from a template specifically designed for this).
posted by cjorgensen at 5:56 PM on April 14, 2012
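(On cat-ing the content of a URL: curl writes the response body to standard output by default, so a line like the one below prints the page; the path is just a placeholder.)

    curl -s http://example.com/list-of-entries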
Response by poster: axiom, I can rename after downloading. If that's the easier option I'll take it. The filenames would just need .html appended.
posted by cjorgensen at 5:58 PM on April 14, 2012
If you're trying to archive your blog, that's basically what wget recursion is for. There are a variety of ways to install wget on MacOS; I installed mine via homebrew.
posted by Nelson at 6:09 PM on April 14, 2012
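(A sketch of the wget approach Nelson describes, assuming wget is already installed; --mirror recurses through the site, --adjust-extension appends .html to pages that lack it, and --no-parent keeps it from climbing above the starting path. The /blog/ path is a placeholder:)

    wget --mirror --adjust-extension --no-parent http://example.com/blog/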
You can use bash's word expansion to make this simple:
    while read -r url; do curl -so "${url##*/}.html" "$url"; done <file_with_urls.txt
Change the final ; to a & and you automatically turn it into a parallel job, where all the files are downloaded simultaneously. (Don't do this if you have a ton of URLs in the file.)
posted by Rhomboid at 4:41 AM on April 15, 2012
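(If the & version fires off too many downloads at once, a hedged alternative is bounded parallelism with xargs: -P 4 runs four curl processes at a time, and the filename logic is the same ${url##*/} trick as above:)

    xargs -n 1 -P 4 sh -c 'curl -so "${1##*/}.html" "$1"' _ <file_with_urls.txt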
And if instead of file_with_urls.txt you want the contents of a URL, then:
    curl -s http://example.com/list_of_urls | while read -r ...
posted by Rhomboid at 4:45 AM on April 15, 2012
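(Spelled out, that pipeline might look like the following; list_of_urls is a hypothetical endpoint returning one URL per line:)

    curl -s http://example.com/list_of_urls | while read -r url; do
        curl -so "${url##*/}.html" "$url"
    done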
Use `wget`. This standard Unix utility isn't included in OS X, AFAIK because of Apple's dislike for GPL-licensed software, so you'll have to install it yourself (instructions for which are readily googled). You can supply multiple URLs or use the `-i` option to read a list of URLs from a file.
posted by faustdick at 6:37 AM on April 15, 2012 [1 favorite]
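(A sketch of the -i form faustdick mentions, assuming urls.txt holds one URL per line; adding --adjust-extension makes wget append .html to pages whose names don't already end in it:)

    wget -i urls.txt --adjust-extension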