can someone walk me through downloading an old blog
July 22, 2023 4:35 PM
I'm grieving and bereft and can't figure this out on my own. I just found out that a dear friend of mine and the person who introduced me to metafilter recently passed away. I saved a link to one particular post he had written years ago but had not thought to download his blog. It's really important for me to spend time with the parts of him I still have access to but I think the blog is offline now. Can someone please walk me through how I can download his writings?
Best answer: You can also see if there is a Google cache of the blog.
Use this format: https://webcache.googleusercontent.com/search?q=cache:example.com and replace example.com with the blog's URL.
posted by gemmy at 5:53 PM on July 22, 2023 [6 favorites]
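For anyone who would rather not edit the address by hand, the format above can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, it assumes the cache address wants the blog URL without the leading http:// or https:// (as in the example.com pattern above), and it only builds the address. Whether Google still holds a cached copy of a given page is not guaranteed.

```python
def google_cache_url(page_url):
    """Build a Google cache address for page_url (hypothetical helper)."""
    # Drop a leading "http://" or "https://" if present, matching the
    # bare "example.com" form shown in the comment above.
    bare = page_url.split("://", 1)[-1]
    return "https://webcache.googleusercontent.com/search?q=cache:" + bare


if __name__ == "__main__":
    # Replace example.com with the blog's address, then paste the
    # printed line into a browser.
    print(google_cache_url("https://example.com"))
```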
Best answer: Diskeater: "You can try inputting the link over at the Wayback Machine to see if it's saved over there."
Specifically, if you search the Wayback site for the home URL followed by a slash and an asterisk, it will show all archived pages under that domain:
myblog.com = myblog.com/*
blogname.blogspot.com = blogname.blogspot.com/*
etc.
Also, if you happen to subscribe to the blog in an RSS reader, many will store archived copies of posts as long as you're subscribed to the feed.
Good luck -- I'm sorry for your loss.
posted by Rhaomi at 9:40 PM on July 22, 2023 [7 favorites]
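The same "everything under this domain" lookup can also be done from a script. The sketch below assumes the Wayback Machine's CDX search endpoint and its `url`, `output`, `fl` and `collapse` parameters. The function names and myblog.com are placeholders for illustration, and it is a minimal sketch rather than a polished tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the Wayback Machine's capture index (CDX) search.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site):
    """Build a query for every archived page under site (e.g. "myblog.com")."""
    params = {
        "url": site.rstrip("/") + "/*",  # same slash-and-asterisk wildcard as above
        "output": "json",
        "fl": "timestamp,original",      # only the capture time and the page URL
        "collapse": "urlkey",            # one row per distinct page
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Turn the JSON rows (first row is the header) into a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def list_captures(site):
    """Fetch and parse the capture list. This needs a network connection."""
    with urlopen(cdx_query_url(site), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return parse_cdx_rows(json.loads(body)) if body.strip() else []


if __name__ == "__main__":
    for capture in list_captures("myblog.com"):  # placeholder domain
        print(capture["timestamp"], capture["original"])
```

Each printed line pairs a capture timestamp with the original page address, which is enough to open that snapshot in the Wayback Machine.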
Best answer: One option you might consider is using a company like this one https://waybackdownloads.com/ . They pull everything associated with a given web site off archive.org, package it up and send it to you.
You can use the archive.org pages without charge, and you could do what this type of company does yourself; however, the fee is quite small, and it's a lot less hassle if you do want a copy for yourself.
There are some limits to how big the site can be.
I'm not associated with them except that I've used their service and been happy with it.
posted by southof40 at 9:41 PM on July 23, 2023 [2 favorites]
Best answer: No need to pay any money.
You can recover sites from Wayback Machine for free using this command line tool https://github.com/hartator/wayback-machine-downloader
posted by gingerbeardman at 10:28 AM on July 24, 2023
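For a sense of what a downloader like this does under the hood, here is a rough Python sketch of saving one archived page to disk. It is not the linked tool's code. It assumes the Wayback Machine serves the unmodified original page when `id_` is appended to the timestamp in a snapshot address, and the function names, the `saved_blog` folder, and the example timestamp and URL are all made up for illustration.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def snapshot_url(timestamp, original):
    """Address of the raw archived copy (assumes the "id_" snapshot form)."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def local_path(original, root="saved_blog"):
    """Choose where to store a page on disk, mirroring the site's layout."""
    parts = urlsplit(original)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"              # give folder-style URLs a file name
    host = parts.netloc.replace(":", "_")  # avoid ":" in folder names
    return Path(root) / host / path.lstrip("/")


def save_snapshot(timestamp, original, root="saved_blog"):
    """Download one snapshot and write it to disk. This needs a network connection."""
    target = local_path(original, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(snapshot_url(timestamp, original), timeout=30) as resp:
        target.write_bytes(resp.read())
    return target


if __name__ == "__main__":
    # Hypothetical timestamp and page; real values come from the
    # Wayback Machine's listing for the blog.
    print(save_snapshot("20100501000000", "http://myblog.com/2010/05/post.html"))
```

Running it for each page in the Wayback Machine's listing would build up a local copy of the blog, one file per archived page.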
Best answer: First, my condolences for your loss.
You wrote "I think the blog is offline now", which (for me anyway) leaves open the possibility that it may still be online. Have you had a chance in the past couple of days to confirm whether it is or isn't? (I'm sorry to ask for clarification; I know writing while grieving is difficult, and getting asked annoying questions on top of that is, well, annoying. I'm trying to ask gently; my sincere apologies if I'm failing.)
For sites that are still online and accessible to you, there are a number of software tools and services that can crawl the pages of a website and save them in a format you can keep. People here will no doubt be happy to offer some suggestions in that case.
If the site is definitely not online, then unfortunately things are much more difficult. Here are all the options I can think of:
1. If someone sent the URLs to the Internet Archive, or the IA crawled it itself, then as others posted upthread, you may be able to find the site (or at least a subset of the pages) in the Wayback Machine.
2. Were the blog pages by any chance hosted by an institution of some kind? If so, the Internet Archive has a service called Archive-It used by a number of institutions (mainly academic) to archive their entire sites. If your friend's blog was hosted by a participating institution, maybe there's a copy there.
3. If it's not in IA, then I would next try to contact the owner or operator(s) of the website where the blog was located. It's possible that the owners/operators have a copy in their system, or have backups. Whether they'd be permitted to give you a copy, as well as be willing and have the time/resources to do that, is another question – there may be privacy policies or other limitations that would prevent them from doing that, and let's be honest, most people wouldn't have the time or resources. But you might get lucky if you ask and explain why.
4. Next, I would try to put the URL of the blog pages in Google itself, meaning, search for the URL. (I don't mean to mansplain if you already know how to do this, but I can't tell if you do, so to be more clear: don't type the URL in your web browser's URL field, but rather go to www.google.com, and in the middle of the page where the search input field is located, paste the URL that you have, and press return, to make Google search using the URL as the search term.) This may turn up copies of the page elsewhere, or other people's pages pointing to the blog or writing about the blog. It's a long shot, yes, but sometimes I find things this way. If the full URL to a blog page doesn't turn up anything, successively remove parts from the end of the URL and try again, repeating until you can't go any further. For example, if the page is
https://somesite.com/blog/2023/02/some-topic.html
and searching for that URL in Google doesn't turn up anything, try each of these in turn:
https://somesite.com/blog/2023/02/some-topic
https://somesite.com/blog/2023/02
https://somesite.com/blog/2023/
https://somesite.com/blog
If you get any results, it may mean a lot of clicking and investigating, but there's a chance that someone (e.g.) blogged their own thoughts in response to a posting by your friend and (e.g.) excerpted or took a screenshot of a page from your friend's blog.
5. Somewhat similar to #4 but using a different tack: try searching for your friend's name, or the name of their blog (if it was different), or bits of their writing.
6. If the blog was an older site that existed before 2013, and it had an RSS feed, then there's a chance there's a copy of it (or some of it) in the Google Reader archive. Gwern Branwen has a detailed explanation of searching the Google Reader archives.
7. Can you let us know at least the domain where the blog was located? If it's a popular blogging site, it may be that other people have made their own archives of the site. People here might recognize the site and know of possible other archives or copies. (Long shot, but worth mentioning.)
8. Finally, there's the Google Cache, mentioned upthread.
Good luck. I hope something works out.
posted by StrawberryPie at 8:56 PM on July 24, 2023 [1 favorite]
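The URL-trimming step in item 4 is mechanical enough to sketch in Python, in case it helps to have the list of addresses generated rather than typed out. The function name is made up for this example, and the rule it follows (drop the file extension first, then remove one path segment at a time) is one reading of the somesite.com example, so treat it as an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def trimmed_urls(url):
    """Return url followed by progressively shorter versions to search for."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    path = parts.path.rstrip("/")
    candidates = [base + path]

    # First drop a file extension such as ".html", if there is one.
    stem, ext = posixpath.splitext(path)
    if ext:
        path = stem
        candidates.append(base + path)

    # Then remove one trailing path segment at a time, stopping once
    # only the first segment (e.g. "/blog") is left.
    while "/" in path.lstrip("/"):
        path = path.rsplit("/", 1)[0]
        candidates.append(base + path)
    return candidates


if __name__ == "__main__":
    for candidate in trimmed_urls("https://somesite.com/blog/2023/02/some-topic.html"):
        print(candidate)
```

Each printed line can be pasted into the Google search box, starting from the top and moving down until something turns up.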
Response by poster: I'm sorry that I haven't replied to any of the messages in this thread yet. My friend passed away in June but I only found out about it last Friday night (the 21st?). The news of his passing, and the manner in which I found out, have just wrecked me. I had been worried about him for the past several months but he ended our relationship in May and I was trying to respect his decision. I'm one week out and even though I've spent several hours with a counsellor, the pain and everything associated with my friend's passing is 'still-beating-heart-ripped-out-of-chest' intense.
The blogs are definitely both offline. I was able to find 2 websites belonging to him, one located through something called posterous and the other a sympatico blog (Canadian phone and internet provider linked to Bell). I found parts of both blogs through the wayback machine and while the wayback machine downloader is something I should be able to understand, it's beyond my ability to figure out currently. I barely understand what day it is right now, so please don't worry about oversimplifying anything right now. When this initial stuff lessens, I will reread all of your helpful instructions and try to download the blogs.
I've been rereading our text messages, replaying conversations and looking through photos that he took over the years. I didn't know him very long - we met online in October and he struggled through most of our time together, but he was undeniably one of the most compassionate, kind, intelligent and empathetic people I've ever had the privilege to meet. I was grateful when he encouraged me late last fall to join Metafilter because this community was an important part of his life and I appreciated his willingness to let me in here. I think that because I have so few connections to him in real life, I'm holding on tightly to the ones that I do have, even as I know he isn't his blog or his metafilter post history.
posted by Ceridwen at 6:11 PM on July 27, 2023
This thread is closed to new comments.
posted by Diskeater at 4:42 PM on July 22, 2023 [8 favorites]