2025: How to print an entire blog?
January 10, 2025 1:25 PM
I blogged, approximately one million years ago, via Blogspot. There are about 500 total posts and most of it still seems to be there. Is there an easy or automatic way to print it all out? Or to compile it into a digital doc? Basically, I'd love to save it in some fashion but not so much that I want to spend a ton of time or energy on it. I am, however, willing to pay money.
This is probably not in *your* time-and-energy budget, but it's money-free: scraping a defined site is an excellent practice project for programming, e.g. using BeautifulSoup in Python. Maybe that will do another reader some good.
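For anyone who does want to try that route, here is a minimal sketch, assuming a placeholder URL and the default Blogger theme's class names (post, post-title, post-body, blog-pager-older-link); real themes vary, so treat the selectors as starting points rather than a finished script.

# Minimal sketch: walk a Blogspot blog via its "older posts" links and dump
# every post's title and text into one file. The URL is a placeholder and the
# CSS class names are common Blogger defaults, not guaranteed for every theme.
import requests
from bs4 import BeautifulSoup

BLOG_URL = "https://example.blogspot.com/"   # placeholder, not the asker's blog

def scrape(start_url, out_path="blog_dump.txt"):
    url = start_url
    with open(out_path, "w", encoding="utf-8") as out:
        while url:
            soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
            for post in soup.select(".post"):                   # theme-dependent selector
                title = post.select_one(".post-title")
                body = post.select_one(".post-body")
                out.write((title.get_text(strip=True) if title else "(untitled)") + "\n")
                out.write((body.get_text("\n", strip=True) if body else "") + "\n\n")
            older = soup.select_one("a.blog-pager-older-link")  # pagination link
            url = older["href"] if older else None              # stop at the last page

scrape(BLOG_URL)

The output is a single plain-text file, which is easy to paste into a word processor or print as-is.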
posted by clew at 1:55 PM on January 10
I also use Blogspot and have captured about 4,000 posts in blocks of 100. Blogspot pages are 7 posts long, so it takes me about 40 minutes to cut and paste the (100 / 7 ≈ 14) blocks for each batch into LibreOffice and do a rough edit to remove footers etc., and maybe as much again to get it camera-ready. It's a bit ploddy but not taxing. I'm glad I did it in easy chunks, because I anticipate the plug being pulled at short notice. I'd say budget for 10 hours of paid labor.
posted by BobTheScientist at 2:12 PM on January 10
Best answer: I think I have done this with BlogSpot's "Export Blog" button, followed by manually copy-pasting and reviewing each page.
I see there are some services offering to do this nowadays:
BlogBooker.com
IntoRealPages.com
posted by Phssthpok at 2:34 PM on January 10
An LLM would be a low-cost, easy way to do this.
posted by MisantropicPainforest at 2:51 PM on January 10
This is the kind of thing I enjoy noodling on if you are comfortable sharing a link.
posted by phil at 4:02 PM on January 10
This will neither convert it to a printable version nor compile it into a single digital doc, but if you want to preserve it offline in close to its current form (as a website), I have used httrack for this in the past, with good results.
It is a command-line tool specifically for preserving websites offline. Despite its basic/outdated website, the tool itself is powerful and has a lot of options. It does take some tweaking of the parameters to get the right results, so it might be a job for someone you pay (as you mention not wanting to spend energy on it yourself).
The expected result is a folder on your hard drive with an HTML file for each crawled page, plus all the necessary assets (images, JavaScript, fonts), so you can browse the offline files in your browser as if it were a live website.
In my experience the results are not 100% faithful to the original (depending on what technologies/scripts the site uses), but with a bit of tweaking you can get pretty close, with just a few missing images and maybe some styles looking off.
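If it helps whoever ends up running it, here is a minimal sketch of a first attempt driven from Python; it assumes httrack is installed and on your PATH, the URL is a placeholder, and the options shown are just the common basics, not a tuned setup.

# Minimal sketch of a first httrack run, driven from Python. Assumes httrack
# is installed and on the PATH; the blog URL is a placeholder.
import subprocess

BLOG_URL = "https://example.blogspot.com/"      # placeholder

subprocess.run(
    [
        "httrack", BLOG_URL,
        "-O", "blog-mirror",                    # output folder for the offline copy
        "+*.example.blogspot.com/*",            # scan rule: stay on the blog's own domain
        "-v",                                   # verbose progress output
    ],
    check=True,                                 # raise if httrack exits with an error
)

Expect to iterate on the scan rules and options a few times before the mirror looks right.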
posted by Gomez_in_the_South at 7:14 PM on January 10 [2 favorites]
I can't speak to the printing itself but you should be able to use Google Takeout to download the whole blog. You should not need to scrape it. If it doesn't work please memail me.
posted by potrzebie at 9:18 PM on January 10
oh! then Phssthpok probably has the easiest answer. Can you find an "Export Blog" button?
posted by clew at 11:51 AM on January 11
Response by poster: Hmmm...I do not. Any clues, clew, about where to find that? I just ran through everything I can see in what seems to be the control center/panels for it.
posted by BlahLaLa at 12:26 PM on January 11
Response by poster: Whoops, hang on - I think I found it and am currently downloading it. Let's see what that gets me.
posted by BlahLaLa at 12:28 PM on January 11
Response by poster: Okay, that didn't work. I got a long .xml document which is full of mush.
posted by BlahLaLa at 12:29 PM on January 11
Probably the services Phssthpok linked have reverse engineered the blogspot xml and can extract the content. It's possible that one of the crawlers, like httrack, will be easier for you.
(If you search for one of your unusual sentences in the xml, is it there? It's sort of reasonable for an export into one file to have a huge amount of code to represent the page-linky-clicky parts of a website, surrounding a ha'porth of text-for-humans. It's also a known annoying possibility that the exporter encodes the text-for-humans "for compression" but really to keep people from leaving.)
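If someone wants to dig the text-for-humans out of that export, here is a minimal sketch; it assumes the export is the usual Blogger Atom feed, where post entries carry a category term ending in "#post", and the filename blog-export.xml is just a stand-in.

# Minimal sketch: pull post titles and bodies out of a Blogger "Export blog"
# XML file. Assumes the usual Blogger Atom layout, where post entries carry a
# <category> whose term ends in "#post"; the filename is a stand-in.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def extract_posts(xml_path="blog-export.xml"):
    root = ET.parse(xml_path).getroot()
    for entry in root.findall(f"{ATOM}entry"):
        kinds = [c.get("term", "") for c in entry.findall(f"{ATOM}category")]
        if not any(k.endswith("#post") for k in kinds):
            continue                              # skip comments, settings, templates
        title = entry.findtext(f"{ATOM}title", default="(untitled)")
        body = entry.findtext(f"{ATOM}content", default="")
        yield title, body                         # body is HTML, ready to paste or convert

for title, body in extract_posts():
    print(title, "-", len(body), "characters of HTML")

Each body comes out as HTML, so pasting it into a word processor or running it through an HTML-to-PDF tool gets most of the way to a printable doc.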
posted by clew at 1:56 PM on January 11 [1 favorite]
Response by poster: Thank you, clew. At this point I will def. look at one of the paid services because I'm already bumping up against how much energy I want to put into this.
posted by BlahLaLa at 1:57 PM on January 11 [1 favorite]
It worked pretty well in that it did capture all the text, but images were hit or miss (though that's the case on the site I downloaded anyway, and I mostly wanted the text). It did duplicate a lot of pages, but that could just be because of how my site was structured (it was a former WordPress site that had been made static).
It doesn't compile everything into one file, though, but into separate folders (although all those folders are in one folder), so that may be a dealbreaker for you. But once again, that could just be how my site was structured.
posted by edencosmic at 1:36 PM on January 10