Good way to dump hundreds of podcasts into one RSS file?
February 2, 2015 5:46 AM
I'm making a Swift iOS app that lets the user search years of daily podcasts from a certain WordPress site. Right now it is working off an RSS feed that only goes back several days, but I want to be able to dump everything into one giant file. Any ideas?
* Current working idea is to make a server-side cron job that runs through every "page" of the WordPress RSS feed to compile one large, concatenated text file...
* ...since it seems as if it would cause major problems to just change WordPress's RSS item limit to 9,999, making everything hang while the long feed is generated, and anyway we still want to keep the short version of the RSS feed for other purposes...
* Have also thought about somehow having the app itself do the work of paging through all the WordPress RSS, compiling one big master graph of everything it has found, but I'm wondering if this would be more work and pain than just going the Python daemon route.
Any suggestions or tools appreciated.
Yeah, forgot to add, page after page of RSS calls from the phone app seems like a waste of battery power.
posted by johngoren at 6:00 AM on February 2, 2015
Maybe I've missed something, but it seems like you'd want to parse the RSS files as you go and put the results in something like a SQLite database.
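The parse-as-you-go idea above can be sketched roughly as follows. This is only a minimal illustration using Python's standard library, not a tested recipe for this particular site: the table name `episodes`, its columns, the function name `store_feed`, and the sample feed are all made up for the example, and it assumes the feed is plain RSS 2.0 with the audio file in an `<enclosure>` tag.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_feed(conn, rss_xml):
    """Parse one RSS 2.0 document and upsert its items; return the item count."""
    # Hypothetical schema: one row per episode, keyed by the item's guid.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS episodes (
               guid TEXT PRIMARY KEY,
               title TEXT,
               pub_date TEXT,
               audio_url TEXT
           )"""
    )
    root = ET.fromstring(rss_xml)
    count = 0
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        audio_url = enclosure.get("url") if enclosure is not None else None
        # Fall back to the link when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link")
        if guid is None:
            continue  # nothing stable to key the row on
        # INSERT OR REPLACE keeps re-runs from creating duplicate rows.
        conn.execute(
            "INSERT OR REPLACE INTO episodes VALUES (?, ?, ?, ?)",
            (guid, item.findtext("title"), item.findtext("pubDate"), audio_url),
        )
        count += 1
    conn.commit()
    return count


# Made-up sample feed, just to exercise the function.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Demo</title>
<item><title>Episode 1</title><guid>ep-1</guid>
<pubDate>Mon, 02 Feb 2015 05:46:00 +0000</pubDate>
<enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="1"/></item>
<item><title>Episode 2</title><guid>ep-2</guid></item>
</channel></rss>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_feed(conn, SAMPLE)
    store_feed(conn, SAMPLE)  # second pass replaces rows instead of adding
    print(conn.execute("SELECT guid, title FROM episodes").fetchall())
```

Keying on the guid means the same cron job can run over overlapping pages every night without the database growing duplicates, and the resulting SQLite file could in principle be shipped to the app instead of a giant XML file.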
posted by wotsac at 7:38 AM on February 2, 2015
A giant text file would be fairly easy to make though - I forget what tag encloses the items in an RSS feed, but you find the end of the start tag in the new rss file, and put everything after that in the giant text file, starting at the location of the start of the closing tag in that file. But don't do that.
posted by wotsac at 7:42 AM on February 2, 2015
You can also try the JSON API available for WP, which is only one step from actually querying the database, which might be even better. I wouldn't use WP's own RSS functions for this kind of task if you want to do it repeatedly, as they often have a lot of overhead attached to them.
If you take the Crystal Lake route you will love SwiftyJSON.
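For what the JSON route might look like from a server-side script, here is a rough sketch. Big assumption: it targets the REST API that ships with current WordPress core (`/wp-json/wp/v2/posts`, the `per_page` and `page` query parameters, and the `X-WP-TotalPages` response header), which is not necessarily the plugin meant above, so the routes may differ on an older install. The function names and the `fetch` hook are invented for the example.

```python
import json
import urllib.request


def fetch_page(base_url, page, per_page=100):
    """Fetch one page of posts; return (posts, total_pages).

    Assumes the core WordPress REST API, where per_page is capped at 100
    and the total page count comes back in the X-WP-TotalPages header.
    """
    url = f"{base_url}/wp-json/wp/v2/posts?per_page={per_page}&page={page}"
    with urllib.request.urlopen(url) as resp:
        total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
        return json.load(resp), total_pages


def fetch_all_posts(base_url, fetch=fetch_page):
    """Walk every page and return all posts as one list of dicts.

    `fetch` is a hypothetical hook so the loop can be exercised
    without touching the network.
    """
    posts = []
    page = 1
    while True:
        batch, total_pages = fetch(base_url, page)
        posts.extend(batch)
        # Stop on the last advertised page, or early if a page is empty.
        if page >= total_pages or not batch:
            return posts
        page += 1
```

Something like `fetch_all_posts("http://example.com")` (placeholder URL) would then return every post in one list, which could be written out once as a single JSON file for the app to download.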
posted by KMB at 8:06 AM on February 2, 2015
Off the cuff thoughts...
You can use the "paged" URL parameter on an RSS feed to get page by page of posts in WordPress. See the answers on Pagination of RSS2 feed. Concatenating the pages of RSS into one mega file feels like a brittle way to do it; I'd be thinking more along the lines of native database mechanisms.
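A rough sketch of walking the feed with that parameter, using only Python's standard library. The function names, the `get` hook, and the `max_pages` safety cap are invented for illustration; it also assumes the site answers with a 404 or an item-less feed once you run past the last page, which is worth checking against the real site before relying on it.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def http_get(url):
    """Return the response body, or None when the server answers 404."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def iter_feed_items(feed_url, get=http_get, max_pages=1000):
    """Yield (title, link) for every item, requesting ?paged=1, 2, 3, ...

    `get` is a hypothetical hook so the loop can be tried offline;
    `max_pages` is a safety cap in case the stop condition never triggers.
    """
    sep = "&" if "?" in feed_url else "?"
    for page in range(1, max_pages + 1):
        body = get(f"{feed_url}{sep}paged={page}")
        if body is None:
            return  # ran past the last page
        items = ET.fromstring(body).findall("./channel/item")
        if not items:
            return  # feed came back empty
        for item in items:
            yield item.findtext("title"), item.findtext("link")
```

Because this is a generator, each page can be handed to a database insert as it arrives rather than being glued into one big file first.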
You're essentially making a kind of spider. In that case, spidering lots of different RSS feeds from lots of different clients out there seems like a recipe for DoSing some servers. I'd be considering a proxy: you run your own server that in turn spiders content as needed and can be a reliable connection.
posted by artlung at 7:21 PM on February 2, 2015
This thread is closed to new comments.
posted by pocams at 5:58 AM on February 2, 2015