

How best to backup/move a strange old blog
July 2, 2014 3:01 PM

As part of hoovering up old blogs and content before more disappear, I'm stuck with the most difficult blog in terms of format: this one (Digital Sands). Is there an efficient way of saving both the posts and the comments so I can make them publicly available elsewhere?

I am gradually moving many years of blog posts, blogs, diary entries and other materials into one www.blogger.com blog. My problem is this particular old blog, set up and hosted by the BBC as part of their web expansion project in the last decade. These blogs have sat in an archived state for years, and at some point they'll presumably be wiped.

Is there a neat way of moving all of the material over to either the blogger.com blog I'm building, or something else I can control? Cutting and pasting the posts, and moving the pictures, into the blogger.com blog could be done, though this introduces inevitable manual errors. The number of comments is an order of magnitude larger, and I can't see how I could move those into blogger.com's comment format. Or should I create a new blog (WordPress?) just for importing?

An additional consideration is that, though the posts and the pictures within are all mine, the comments were made by other people (though no-one used their real name). I'm not sure if there is any legal complication over ownership here.

I've tried to contact the BBC but I didn't get a reply; it's likely, from other knowledge of their web projects, that the staff who worked on this left the BBC a long time ago.
posted by Wordshore to Computers & Internet (4 answers total) 4 users marked this as a favorite
 
Have you tried wget?
posted by pompomtom at 3:08 PM on July 2


I've successfully used this to scrape a beloved old website before it went dark, so hopefully this will work for you. I just made a local directory and it saved all of the HTML pages and images. Comments would be saved as part of the HTML page, iirc. Formatting might be funky depending on custom scripts, CSS, etc. that would have been used on the host domain.

It's been a few years since I last used it, so I'm not sure what has changed since then. I'm also not sure if it will work on secure sites or not, but it's worth a shot...?

GNU Wget
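For what it's worth, a typical mirroring invocation looks something like the sketch below. The URL is a placeholder, not the blog's real address, and the flags shown are the common ones for making a browsable offline copy:

```shell
# Sketch: mirror a site for offline viewing. The URL is a placeholder;
# substitute the actual address of the blog being archived.
#   --mirror           recurse with timestamping (like -r -N -l inf)
#   --convert-links    rewrite links so the saved copy browses locally
#   --page-requisites  also fetch the images/CSS each page needs
#   --wait=1           pause between requests to be polite to the server
wget --mirror --convert-links --page-requisites --wait=1 \
     --tries=1 --timeout=10 \
     "http://example.com/old-blog/" \
  || echo "wget could not finish (offline, or the placeholder URL is wrong)"
```

Comments rendered inline in the HTML come along for free with this approach; comments loaded by script after the page renders do not, which is worth checking on one saved page first.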


There is probably an easier method out there, though. Hopefully someone with more web-fu than me can weigh in on this.
posted by cardinality at 3:09 PM on July 2


HTTrack is a venerable but still-maintained open-source application for spidering and downloading static copies of a website, with a couple of GUI options as well as a command-line interface.
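For reference, a command-line run might look like this; the URL and the filter pattern are placeholders to be replaced with the blog's real address:

```shell
# Sketch of an HTTrack run; the URL below is a placeholder.
#   -O            sets the local output directory for the mirror
#   "+example.com/*"  filter allowing the spider to follow links within the site
httrack "http://example.com/old-blog/" -O ./blog-mirror "+example.com/*" \
  || echo "httrack not installed, or the placeholder URL is unreachable"
```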
posted by XMLicious at 3:51 PM on July 2


If you are trying to parse the content to remove the nav and such, I've heard good things about Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/

I don't write python, so I have no practical experience with it tho'.
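To illustrate the idea, here is a minimal sketch that pulls posts and comments out of a saved page. The `post` and `comment` class names are assumptions purely for illustration; the real markers would need to be read out of the HTML that wget or HTTrack saved:

```python
# Minimal sketch: extract post and comment text from a locally saved page.
# The div class names ("post", "comment", "nav") are assumed for this
# example; inspect the real saved HTML to find the actual markers.
from bs4 import BeautifulSoup

html = """
<html><body>
  <div class="nav">Home | About</div>
  <div class="post"><h2>First entry</h2><p>Hello from the old blog.</p></div>
  <div class="comment"><p>Nice post!</p></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(" ", strip=True) joins the tag's strings with spaces and trims them
posts = [d.get_text(" ", strip=True) for d in soup.find_all("div", class_="post")]
comments = [d.get_text(" ", strip=True) for d in soup.find_all("div", class_="comment")]

print(posts)     # post text only, with the nav div left behind
print(comments)  # comment text, kept separate for re-importing elsewhere
```

Keeping posts and comments in separate lists like this would also make it easier to deal with the ownership question about other people's comments, since they could be republished (or withheld) independently of the posts.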
posted by chocolate_butch at 6:13 AM on July 4


