Automator to download files from web that have changed since last week?
May 22, 2015 3:05 PM Subscribe
I wish to automate downloading an mp3 from a specific web directory (I have permission, and it is not password protected) each week, but only the single mp3 file that has changed since the previous week.
I run a small LP/Internet radio station and we syndicate certain shows (with permission) and each one has its own unique way of getting files to their affiliates. This one show dumps their files into a web directory (http://www.ecoshock.net/affiliates/). I can easily write an Automator workflow that will Launch Safari, go to that directory and only download files that start with ES_ and end with _Affiliates.mp3, but I'd like to only download the most recently added mp3 that matches the criteria I mentioned. I assume it could be accomplished by adding a line or two of Applescript to the Automator workflow, but I do not know how to write Applescript.
Here's my current Automator workflow:
1) Launch Application - chooses Safari
2) Get Specified URLs - points to the directory I listed above
3) Get Link URLs from Webpages - Only return URLs in the same domain as the starting page
4) Filter URLs
ALL
Name contains ES_
Name contains _Affiliates.mp3
5) Download URLs
Right now this is to a folder on Desktop, but I eventually will have it add to iTunes and add some tags, create a Smart Playlist, and then close Safari.
I ass-u-me I can swap in a line of Applescript for step 4 above or add it right after, but I am open to other options.
PS: If it were via FTP access I could also synchronize the directory using Transmit or Fetch, but that isn't an option. I even checked with the Panic developers, and they confirmed it.
I don't know how one would get the timestamp...
Could you just download the URLs that aren't already downloaded?
posted by Monochrome at 3:38 PM on May 22, 2015
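That suggestion could be sketched in shell like so (an illustration, not Monochrome's code; the EcoShock folder is an assumption, and the loop skips any filename already on disk):

dir="$HOME/Documents/EcoShock"
# scrape the directory index for matching filenames, then fetch only the ones we don't have yet
for f in $(curl -s "http://www.ecoshock.net/affiliates/" | grep -o 'ES_[0-9]*_Affiliates\.mp3' | sort -u); do
  [ -e "$dir/$f" ] || curl -s -o "$dir/$f" "http://www.ecoshock.net/affiliates/$f"
done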
Best answer: It looks like the filename contains the date of the show. You could use something like this:
wget "http://www.ecoshock.net/affiliates/ES_`date +%y%m%d`_Affiliates.mp3"
...and wrap that in the aforementioned cron job. Schedule it to run weekly after they've dropped the MP3, the morning after or something along those lines. You may need to install wget (or curl), which entails its own set of tasks, but isn't too hairy. On reasonable Linux hosts, both of those tools should be standard issue.
posted by jquinby at 4:10 PM on May 22, 2015 [1 favorite]
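For reference, the weekly crontab entry jquinby describes might look like this (a sketch: the Saturday 11 PM time matches the drop schedule mentioned later in the thread, the wget path and target folder are assumptions, and % must be escaped as \% inside a crontab command):

# minute hour day-of-month month day-of-week command -- this runs every Saturday at 11 PM
0 23 * * 6 /usr/local/bin/wget -P "$HOME/Documents/EcoShock" "http://www.ecoshock.net/affiliates/ES_$(date +\%y\%m\%d)_Affiliates.mp3"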
Ah, also - you'd want to make sure the job runs on the same day as the day they dropped the file, or the output of the 'date' command would create a filename that doesn't exist on the webserver.
posted by jquinby at 4:11 PM on May 22, 2015
You don't even need a web server; you should be able to do it all locally. And there are multiple ways to get timestamp info if you can't just use the filename or one of the other methods already mentioned.
posted by turkeyphant at 4:17 PM on May 22, 2015
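One such method, sketched in shell (an assumption-laden illustration, not turkeyphant's code: it scrapes the HTML index and leans on the yymmdd date embedded in each filename, so the newest match sorts last):

url="http://www.ecoshock.net/affiliates/"
# list every matching filename in the index; the embedded date makes an alphabetical sort chronological
latest=$(curl -s "$url" | grep -o 'ES_[0-9]*_Affiliates\.mp3' | sort -u | tail -n 1)
# download it only if something matched
[ -n "$latest" ] && curl -s -o "$HOME/Documents/EcoShock/$latest" "$url$latest"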
Response by poster: Thanks for the ideas! Chiming in to say, in case it wasn't clear, that I am a serious noob when it comes to many of the suggested methods. My colleague is reading this thread, and he understands the wget, but not how to "wrap it in a ... cron job".
posted by terrapin at 4:29 PM on May 22, 2015
cron is a program that runs on your Mac. You'll need to configure it from the command line, but it can be used to run scripts and whatnot on a regular schedule - once daily, every 1st and 15th of the month, once a year on New Year's Day, every 15 minutes, and so on. This page has a good introduction and explanation.
posted by jquinby at 6:22 PM on May 22, 2015
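In practice that configuration looks like this (the script path below is a placeholder):

# open your personal crontab for editing
crontab -e
# each entry is: minute hour day-of-month month day-of-week command
# for example, this line runs a script at 6:15 AM on the 1st and 15th of each month:
15 6 1,15 * * /path/to/your-script.sh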
Best answer: Thanks, all. I think I was able to combine all the tips here with the features I want from Automator (add import to iTunes, and add tags). I used "Run Shell Script" at the start of my Automator workflow with curl:
curl http://www.ecoshock.net/affiliates/ES_`date +%y%m%d`_Affiliates.mp3 > ~/Documents/EcoShock/ES_`date +%y%m%d`_Affiliates.mp3
The programmer has informed me that the show is updated in that directory on Saturday evenings, so I have set it to run late Saturday evening.
Will test tonight.
Thanks again!
posted by terrapin at 7:58 AM on May 23, 2015 [1 favorite]
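For anyone assembling the same workflow, the body of that "Run Shell Script" action might look something like this (a sketch only; the EcoShock folder and the final 'open -a iTunes' import step are assumptions, not details terrapin gave):

# build this week's filename from the show's date-stamped naming convention
f="$HOME/Documents/EcoShock/ES_$(date +%y%m%d)_Affiliates.mp3"
curl -s -o "$f" "http://www.ecoshock.net/affiliates/ES_$(date +%y%m%d)_Affiliates.mp3"
# hand the downloaded file to iTunes for import
open -a iTunes "$f"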
Nice. curl is fabulously useful. One suggestion - if redirecting output like that goes wonky, try using the "-o" (output) switch like so:
curl http://www.ecoshock.net/affiliates/ES_`date +%y%m%d`_Affiliates.mp3 -o "~/Documents/EcoShock/ES_`date +%y%m%d`_Affiliates.mp3"
posted by jquinby at 8:43 AM on May 23, 2015
Don't put double-quotes around the ~ part of the path, or the shell will fail to expand ~ to the pathname of your home folder. In fact there's no need to use the double-quotes at all in this instance, since the pathname contains no spaces; if it did, you could use something like
~/"Folder Name With Spaces/Eco Shock/ES_`date +%y%m%d`_Affiliates.mp3"
posted by flabdablet at 11:12 PM on May 23, 2015
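Putting flabdablet's fix together with jquinby's -o suggestion, the working command becomes (same URL and folder as above, tilde left unquoted so the shell can expand it):

curl http://www.ecoshock.net/affiliates/ES_`date +%y%m%d`_Affiliates.mp3 -o ~/Documents/EcoShock/ES_`date +%y%m%d`_Affiliates.mp3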
This thread is closed to new comments.