Data table sources with multiple pagination
October 21, 2010 9:47 PM

I need to gather data from paginated tables and return it as a feed, or as something else that can easily be pulled into Excel as a query.

Here's the first example of a page I'm having problems with. I only need to save the coupons, on all pages:

http://print.coupons.com/Couponweb/Index.aspx?pid=13306&zid=iq37&nid=10&bid=alk1021211002f0520f9a58012

I could copy and paste and reformat, but where's the fun in that? I'm trying to write this query so that it can update and refresh in Excel.

They do have an RSS feed, but it only returns 100 records. Today's list is showing 238 coupons. I need the text and other info from each coupon in cells in Excel.

Sorry for the rambling, it's just hard to put it into words. I've had luck with other sources, but this one is giving me a hard time.
posted by malcommc to Computers & Internet (4 answers total)
 
Have you looked at the source? They're making heavy use of JavaScript to render everything. All the coupon data is in one JavaScript array. Writing a script to translate a JavaScript array to a CSV file (or whatever format) should be pretty easy, since the data is already neatly structured.
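
A minimal sketch of that kind of conversion, in Python. It assumes the array is assigned to a named variable and that its literal happens to be valid JSON (double-quoted keys and strings); the variable name `couponData` and the `brand`/`offer` fields are made up for illustration and are not taken from the actual page.

```python
import csv
import io
import json
import re


def extract_js_array(page_source, var_name):
    """Return the list assigned to `var_name` in the page source.

    Assumption: the array literal is valid JSON. A real page may need
    extra cleanup (single quotes, unquoted keys, trailing commas).
    """
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*\[", page_source)
    if match is None:
        raise ValueError("no array assignment found for " + var_name)
    start = match.end() - 1  # index of the opening bracket
    # raw_decode parses one JSON value and ignores whatever follows it.
    data, _ = json.JSONDecoder().raw_decode(page_source, start)
    return data


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text; columns are the union of keys."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical page fragment, standing in for the real source.
    sample = 'var couponData = [{"brand": "Acme", "offer": "Save $1.00"}];'
    print(rows_to_csv(extract_js_array(sample, "couponData")))
```

Excel can open the resulting CSV directly, so the output only needs to be written to a file.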
posted by demiurge at 9:55 PM on October 21, 2010


Sorry, but why are you using Excel? Excel is not a database. Don't use Excel. Excel make programmer so angry! Programmer smash!

Ok, uh, sorry. Use, um, wget. Or curl. Or better yet, something like Python's urllib2. This is a programming project, and the further down the road you go with some hacked-up Excel spreadsheet and an RSS feed, the kludgier it is going to get.
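
A rough sketch of the fetching side in Python. In Python 3 the `urllib2` functionality lives in `urllib.request`, which is what this uses. The page-numbering scheme, the stop-on-empty-page rule, and the `url_for_page` helper are assumptions for illustration; the actual site may paginate differently.

```python
import urllib.request


def fetch(url, timeout=30):
    """Download one URL and return its body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all_pages(url_for_page, fetch_page=fetch, max_pages=50):
    """Fetch pages 1, 2, 3, ... until one comes back empty or max_pages is hit.

    `url_for_page` is a caller-supplied function mapping a page number to a
    URL (hypothetical; it depends on how the site numbers its pages).
    """
    pages = []
    for number in range(1, max_pages + 1):
        body = fetch_page(url_for_page(number))
        if not body.strip():
            break
        pages.append(body)
    return pages
```

Passing the fetch function in as a parameter keeps the paging loop easy to try out against canned pages before pointing it at a live site.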
posted by sophist at 11:09 PM on October 21, 2010


Sorry, maybe that was too harsh. If you already have something to parse the RSS and put it into the spreadsheet, why not just monitor the RSS feed for items as they come in?
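
One way to do that monitoring, sketched in Python: parse each poll of the feed and keep only items not seen before, so the running list can grow past the feed's 100-record cap. This assumes a plain RSS 2.0 feed with un-namespaced `item`, `title`, `link`, and `guid` elements; the real feed's fields have not been checked.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_text):
    """Return one dict per <item> with its title, link, and guid."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "guid": item.findtext("guid", default=""),
        })
    return items


def merge_new_items(seen, items):
    """Add unseen items to `seen` (keyed by guid, else link); return the new ones."""
    added = []
    for item in items:
        key = item["guid"] or item["link"]
        if key and key not in seen:
            seen[key] = item
            added.append(item)
    return added
```

The `seen` dict would need to be saved between runs (to a file, say) for this to accumulate items over days or weeks.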
posted by sophist at 11:17 PM on October 21, 2010


Response by poster: Thanks guys for the info so far. Sophist, my apologies for not giving more info. I'm totally with you on the "Excel is not a database" comment!

What I left out is that this is for a benchmarking analysis. This isn't the only site that I have to tally up, and I'd rather not copy and paste all of this weekly, let alone go from page to page and repeat.

So this is just for a report, which is why it ends up in Excel so that I can manipulate some info. Also, Excel is all I have to work with on my company computer at the moment.
posted by malcommc at 5:41 AM on October 22, 2010

