Can I pull content from another website to display on ours?
April 4, 2013 10:44 AM

We have a Joomla site that serves as a portal for our community, and would like to display some updated content from another website's fishing report.

We have permission to do this, but so far not the means. The other site appears to be built and managed with either Dreamweaver or MS Expression Web, and the content lies in an "editable region". Is there a way to grab this automatically and display it on our site? Here's the fish report.

I'm not familiar with dynamic web templates, but I'm guessing they're less dynamic than the name implies. So, is there a way to do this without making significant changes on their end?
posted by klinefelter to Computers & Internet (4 answers total)
 
Short answer: No.
Long answer: Yes.

The fish report appears to just exist as content within the document, with nothing to semantically distinguish it. Unless there's something specific to Dreamweaver or MS Expression that I'm not aware of, you're going to have to write some sort of script that scrapes the content of that page and makes a best effort at parsing out the information that you actually want.

You could probably make this easier by asking them to embed 'id' attributes so that you can better distinguish the content you're looking for. The optimal solution would be for them to provide this report in something that could be more easily parsed like XML or RSS.
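
For a sense of how much easier a feed would be, here is a minimal sketch in Python using only the standard library. The feed contents and the names `SAMPLE_FEED` and `latest_items` are made up for illustration; the resort does not currently publish anything like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; the titles and descriptions are invented.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Fish Report</title>
    <item>
      <title>April 4</title>
      <description>Trout are biting near the inlet.</description>
    </item>
    <item>
      <title>April 3</title>
      <description>Slow morning, better after noon.</description>
    </item>
  </channel>
</rss>
"""


def latest_items(feed_xml, limit=1):
    """Return the first `limit` items of an RSS 2.0 feed as dicts."""
    root = ET.fromstring(feed_xml)
    items = []
    # In RSS 2.0 the <item> elements sit directly under <channel>.
    for item in root.findall("./channel/item")[:limit]:
        items.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


for entry in latest_items(SAMPLE_FEED):
    print(entry["title"], "-", entry["description"])
```

With a real feed you would download the XML first and pass it to the same function; no guessing about page layout is needed because every report is its own `<item>`.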
posted by RonButNotStupid at 11:02 AM on April 4, 2013 [1 favorite]


Best answer: Yeah. It looks like they're editing that entire page by hand (and have included no semantic markup in it at all). There's no awesome or easy way to do this for you, since the internal structure of their site could change dramatically if their web editor presses backspace one too many times when updating the daily report.

Basically, your best bet is to do something like
<iframe src="http://www.parchersresort.net/fishreport.htm" width="600" height="600"></iframe>
and just embed the entire page into your site.

Not to be harsh, but I don't think that XML and RSS are going to be in the Parchers Resort's skillset.
posted by schmod at 11:06 AM on April 4, 2013


Looking at the code for the page, it would be pretty easy to scrape the content (assuming that they don't make any major changes to the format). You would just find the span for the report and rip out the text you want. I do this sort of thing with BeautifulSoup in Python, but it would be similarly easy to do in any other decent screen-scraping library. I don't know what screen-scraping libraries/plugins exist for PHP/Joomla, though.
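
Roughly, the idea looks like the sketch below. To keep it dependency-free it uses Python's built-in `html.parser` rather than BeautifulSoup, and it runs against a made-up snippet instead of the real page. The `id="report"` attribute, the sample HTML, and the names `SpanTextExtractor` and `extract_report` are all assumptions for illustration; the actual page would need whatever selector matches its markup.

```python
from html.parser import HTMLParser


class SpanTextExtractor(HTMLParser):
    """Collect the text inside the first <span> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while we are inside the target span
        self.done = False    # set once the target span has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "span":
                self.depth += 1  # nested span inside the target
        elif tag == "span" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "span":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_report(html, target_id):
    parser = SpanTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    # Collapse the runs of whitespace left over from the markup.
    return " ".join("".join(parser.chunks).split())


# Hypothetical page fragment, not the real fish report.
SAMPLE = """
<html><body>
<p>Welcome to the resort.</p>
<span id="report"><b>April 4:</b> Trout are biting
   near the inlet.</span>
<p>Call for rates.</p>
</body></html>
"""

print(extract_report(SAMPLE, "report"))
```

In practice you would fetch the page on a schedule, run something like this over it, and cache the result, so that a layout change on their side shows up as an empty string rather than a broken page on yours.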
posted by burnmp3s at 11:06 AM on April 4, 2013 [1 favorite]


Response by poster: So, it seems like it might be easier to install something like WordPress on their server, hack some code from that into their DWT, and have them use WP to manage the fish report? Then grab the RSS from the blog?
posted by klinefelter at 11:35 AM on April 4, 2013

