Good way to dump hundreds of podcasts into one RSS file?
February 2, 2015 5:46 AM

I'm making a Swift iOS app that lets the user search years of daily podcasts from a certain WordPress site. Right now it is working off an RSS feed that only goes back several days, but I want to be able to dump everything into one giant file. Any ideas?

* Current working idea is to make a server-side cron job that runs through every "page" of the WordPress RSS feed and compiles one large, concatenated text file...

* ...since it seems that just changing WordPress's RSS item count to 9,999 would cause major problems, with everything hanging while the long feed is generated, and anyway we still want to keep the short version of the RSS feed for other purposes...

* Have also thought about somehow having the app itself do the work of paging through all the WordPress RSS and compiling one big master graph of everything it has found, but wondering if this would be more work and pain than just going the Python daemon route.

Any suggestions or tools appreciated.
posted by johngoren to Computers & Internet (6 answers total) 3 users marked this as a favorite
If you have access to the WordPress site and backend, I would set up a server-side script to do the actual searching and return RSS for only the podcasts that match the search. You wouldn't want to pull down a whole bunch of data to your app just to do one search, and you definitely wouldn't want to pull down page after page of RSS. Doing the work on the server would be kindest to your users' battery life and your app's responsiveness.
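
A rough sketch of what such a server-side script could do, in Python: take the feed XML plus a search term and hand back RSS containing only the matching items. The name `filter_feed` and the choice to match on each item's title and description are assumptions made for illustration, not something WordPress provides.

```python
import xml.etree.ElementTree as ET


def filter_feed(rss_xml, term):
    """Return RSS 2.0 text containing only the items that mention `term`.

    Hypothetical helper: assumes a plain RSS 2.0 document with a single
    <channel> element, and matches case-insensitively on title/description.
    """
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    needle = term.lower()
    # findall() returns a list, so removing items while looping is safe.
    for item in channel.findall("item"):
        haystack = " ".join(
            [
                item.findtext("title", default=""),
                item.findtext("description", default=""),
            ]
        ).lower()
        if needle not in haystack:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")
```

The app would then download only the small filtered feed instead of the whole archive.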
posted by pocams at 5:58 AM on February 2, 2015


Yeah, forgot to add, page after page of RSS calls from the phone app seems like a waste of battery power.
posted by johngoren at 6:00 AM on February 2, 2015


Maybe I've missed something, but it seems like you'd want to parse the RSS files as you go and put the results in something like a SQLite database.
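
Something like this minimal Python sketch, say, using only the standard library. The table layout, the column names, and the helpers `store_feed_page` and `search` are all made up for illustration; it assumes plain RSS 2.0 items with the audio file in an `<enclosure>` tag.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_feed_page(conn, rss_xml):
    """Parse one page of RSS 2.0 and upsert each <item> into SQLite.

    Returns the number of items stored. Schema is an assumption.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "guid TEXT PRIMARY KEY, title TEXT, link TEXT, "
        "pub_date TEXT, audio_url TEXT)"
    )
    stored = 0
    for item in ET.fromstring(rss_xml).iter("item"):
        # Fall back to the link when a feed omits <guid>.
        key = item.findtext("guid") or item.findtext("link")
        if key is None:
            continue  # nothing stable to key on, so skip the item
        enclosure = item.find("enclosure")
        conn.execute(
            "INSERT OR REPLACE INTO episodes VALUES (?, ?, ?, ?, ?)",
            (
                key,
                item.findtext("title", default=""),
                item.findtext("link", default=""),
                item.findtext("pubDate", default=""),
                enclosure.get("url") if enclosure is not None else None,
            ),
        )
        stored += 1
    conn.commit()
    return stored


def search(conn, term):
    """Return (title, audio_url) pairs whose title contains `term`."""
    return conn.execute(
        "SELECT title, audio_url FROM episodes WHERE title LIKE ? ORDER BY guid",
        (f"%{term}%",),
    ).fetchall()
```

Because rows are keyed on the guid, re-running it over pages it has already seen just overwrites the same rows instead of duplicating them.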
posted by wotsac at 7:38 AM on February 2, 2015


A giant text file would be fairly easy to make, though. I forget what tag encloses the items in an RSS feed, but you find the end of that opening tag in the new RSS file and copy everything after it into the giant text file, starting at the position where the closing tag begins in that file. But don't do that.
posted by wotsac at 7:42 AM on February 2, 2015


You can also try the JSON API available for WP, which is only one step removed from actually querying the database, which might be even better. I wouldn't use WP's own RSS functions for this kind of task if you want to do it repeatedly, as they often have a lot of overhead attached to them.

If you take the Crystal Lake route you will love SwiftyJSON.
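
For a feel of the JSON route from a server-side script, here is a hedged Python sketch. It assumes the `/wp-json/wp/v2/posts` route of the REST API that ships with later WordPress versions, along with its `search`, `per_page`, and `page` parameters; a site running a different JSON plugin will use different paths and field names. The helper names are invented for illustration.

```python
import json
import urllib.parse
import urllib.request


def search_url(site, term, page=1, per_page=100):
    """Build a post-search URL (route and parameters are assumptions)."""
    query = urllib.parse.urlencode(
        {"search": term, "per_page": per_page, "page": page}
    )
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def parse_posts(body):
    """Reduce the JSON array of posts to the few fields an app needs."""
    return [
        {"title": p["title"]["rendered"], "link": p["link"], "date": p["date"]}
        for p in json.loads(body)
    ]


def search_posts(site, term):
    """Fetch and parse the first page of matches (needs network access)."""
    with urllib.request.urlopen(search_url(site, term), timeout=30) as resp:
        return parse_posts(resp.read())
```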
posted by KMB at 8:06 AM on February 2, 2015


Off the cuff thoughts...

You can use the paged URL parameter on an RSS feed to get posts page by page in WordPress. See the answers on Pagination of RSS2 feed.
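
A minimal Python sketch of walking the feed that way. It assumes the server answers a page past the end with either a 404 or a feed containing no items; `fetch_page`, `iter_all_items`, and the one-second pause between requests are illustrative choices, not anything prescribed by WordPress.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def fetch_page(feed_url, page):
    """Download one page of the feed; return None if the page is a 404."""
    sep = "&" if "?" in feed_url else "?"
    url = f"{feed_url}{sep}{urllib.parse.urlencode({'paged': page})}"
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def iter_all_items(feed_url, fetch=fetch_page, max_pages=1000, delay=1.0):
    """Yield every <item> element, walking ?paged=1, 2, 3, ... in order."""
    for page in range(1, max_pages + 1):
        body = fetch(feed_url, page)
        if body is None:
            return  # past the last page
        items = ET.fromstring(body).findall("./channel/item")
        if not items:
            return  # empty page: treat as the end of the feed
        yield from items
        time.sleep(delay)  # be gentle with the server between pages
```

The `fetch` argument is there so the loop can be tried against canned pages before it is pointed at a real site.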

Concatenating the pages of RSS into one mega file feels like a brittle way to do it; I'd be thinking more along the lines of native database mechanisms.

You're essentially making a kind of spider. In that case, having lots of different clients out there spidering lots of different RSS feeds seems like a recipe for DoSing some servers. I'd be considering a proxy: you run your own server, which in turn spiders content as needed and can be a reliable connection.
posted by artlung at 7:21 PM on February 2, 2015

