Easiest and fastest way to save 150+ Wordpress articles?
May 8, 2019 10:01 AM
What is my best option to save 155 Wordpress-published columns in the fastest way possible, preserving the formatting of the original page?
So for three years, I wrote a weekly books column for an online publication that uses Wordpress. Unfortunately, the company decided to seriously downsize its operations last week, and I lost my freelance gig (my editor and most of the other people I know who worked for the site also lost their jobs, so I'm lucky in that it was a very part-time gig I mostly did for fun). I cannot assume that the website will stay open in its current form for long after the massive reorganization.
I have all my columns saved as Word documents, but would like some evidence of them in their online, formatted form. I started to save them one by one as webpages, but I'm not sure this is what I really want: each one comes with a folder that is essentially the webpage in pieces (individual files for each of the photos used, etc.), and I already have these photos. I just want something that looks like a snapshot of the webpage. I have tried saving the page as a PDF, but the sidebar on the website winds up covering half of my text, so the PDF is basically unreadable. Is there a better way to do this? Thanks!
A screenshot tool like qSnap is what I'd recommend.
posted by Medieval Maven at 10:30 AM on May 8, 2019
This tutorial may help.
Basically, you are looking for "scrolling capture."
posted by srboisvert at 10:42 AM on May 8, 2019
Best answer: If you use Firefox, the built-in "Take a Screenshot" helper will save the entire page contents as an image.
Alternatively, if you can find all of the URLs and have some knowledge of command-line tools, WeasyPrint does a much better job of printing to a file than the browser's built-in print function. It can produce PDF or PNG.
(memail me if you want help batch-capturing these. Unless the articles are paywalled or ridiculously hard to find, this is likely a few minutes of scripting to grab everything.)
posted by scruss at 11:14 AM on May 8, 2019 [2 favorites]
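For the batch job described above, a minimal Python sketch might look like this. It assumes the `weasyprint` command is installed and on your PATH, and that the article URLs have already been collected one per line in a text file; the script name, the `pdfs` output folder and the helper names are hypothetical. The optional `stylesheet` argument maps to WeasyPrint's `-s` option for extra CSS fixes.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Use the last path segment as the file name: .../my-column/ -> my-column."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return segments[-1] if segments else parsed.netloc


def build_command(url: str, out_dir: Path, stylesheet: Optional[Path] = None) -> list[str]:
    """Assemble: weasyprint [-s extra.css] URL OUT_DIR/slug.pdf"""
    cmd = ["weasyprint"]
    if stylesheet is not None:
        cmd += ["-s", str(stylesheet)]
    cmd += [url, str(out_dir / f"{slug_from_url(url)}.pdf")]
    return cmd


def main(url_file: str, out_dir: str = "pdfs") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    urls = [ln.strip() for ln in Path(url_file).read_text().splitlines() if ln.strip()]
    for url in urls:
        # One PDF per article; report failures and keep going.
        result = subprocess.run(build_command(url, out))
        if result.returncode != 0:
            print(f"failed: {url}", file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python save_columns.py urls.txt [output_folder]
    main(*sys.argv[1:])
```

Naming each PDF after the post's slug keeps the files in a recognizable order without any extra bookkeeping.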
the sidebar on the website winds up covering half of my text, so the PDF is basically unreadable.
If there is anything like a checkbox that says "Simplify" in your print dialogue, clicking it might remove the sidebar. It is not a guarantee, but worth a try.
posted by soelo at 11:25 AM on May 8, 2019
Do you still have admin access to the WP backend? If so, you can do a post export.
posted by humboldt32 at 12:34 PM on May 8, 2019
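The WordPress export (Tools > Export in the dashboard) downloads an XML file in WordPress's WXR format, which is RSS 2.0 with extra namespaces. As a sketch, assuming that layout, the Python standard library can pull each post's title, public link and HTML body out of the file, and the links could then feed a batch PDF job. The function names are hypothetical; the `wp:` namespace URI carries a version number that varies between exports, so the code matches on the local tag name only.

```python
import xml.etree.ElementTree as ET

# Post bodies live in <content:encoded> (the RSS content module namespace).
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def _post_type(item: ET.Element) -> str:
    # e.g. <wp:post_type>post</wp:post_type>; match the local name, not the versioned URI.
    for child in item:
        if child.tag.endswith("}post_type"):
            return child.text or ""
    return ""


def list_posts(wxr_xml: str) -> list[dict]:
    """Return title, link and HTML body for each post <item>, skipping pages and attachments."""
    root = ET.fromstring(wxr_xml)
    posts = []
    for item in root.iter("item"):
        if _post_type(item) not in ("", "post"):
            continue
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "body": item.findtext("content:encoded", default="", namespaces=NS),
        })
    return posts
```

Usage: `list_posts(open("export.xml", encoding="utf-8").read())`, where `export.xml` stands in for whatever name the downloaded file has.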
Mirror the whole site with WinHTTrack.
You might also check whether the articles you wrote are stored in the Internet Archive (change the URL to the online publication's URL).
posted by gregr at 1:01 PM on May 8, 2019 [1 favorite]
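To check many columns at once, the Wayback Machine has a public availability endpoint (https://archive.org/wayback/available?url=...) that returns JSON describing the closest archived snapshot. A minimal sketch, assuming that response layout; the helper names are hypothetical, and the network call is kept separate from the parsing so the parsing can be tried offline.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    # The article URL is passed as a percent-encoded query parameter.
    return f"{API}?{urlencode({'url': page_url})}"


def closest_snapshot(response_json: str) -> Optional[str]:
    """Return the archived copy's URL, or None if no snapshot is reported."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check(page_url: str) -> Optional[str]:
    # Network call: ask the Wayback Machine about one article.
    with urlopen(availability_url(page_url), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Looping `check()` over the list of column URLs shows which ones still need to be archived.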
Best answer: All of the OP's articles that are indexed by Google and Bing are now on the Internet Archive. As long as the publisher doesn't do something evil with a robots.txt file, they should be there forever. Two ways to store pages in the archive are a browser extension such as Wayback Machine, or prefixing the URL with https://web.archive.org/save/.
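The save-prefix approach can be scripted as well. A minimal sketch, assuming a list of article URLs is already at hand: it requests https://web.archive.org/save/ followed by each URL and pauses between requests, since the Wayback Machine throttles rapid saves. The 10-second pause, the User-Agent string and the function names are assumptions, not documented values.

```python
import time
from urllib.request import Request, urlopen

SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(page_url: str) -> str:
    # The article URL is appended unchanged after the prefix.
    return SAVE_PREFIX + page_url


def save_all(urls: list[str], pause_seconds: float = 10.0) -> None:
    for url in urls:
        req = Request(save_url(url), headers={"User-Agent": "column-archiver (personal backup)"})
        try:
            with urlopen(req, timeout=120) as resp:
                print(resp.status, url)
        except OSError as err:  # URLError, HTTPError and timeouts all derive from OSError
            print("failed:", url, err)
        time.sleep(pause_seconds)  # be gentle with the archive
```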
As for saving the articles, the site had some really creatively broken CSS that caused some issues for WeasyPrint. Should anyone else have to get articles from a WP site that they don't have admin access to, this at least might produce something sensible:
weasyprint -s <(echo '.article-nav { visibility: hidden !important; } .nav { visibility: hidden !important; } @page { size: Letter; margin: 2.5cm; }') url out.pdf
posted by scruss at 12:20 PM on May 10, 2019
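The `<(echo ...)` part of that command relies on bash process substitution. Where that isn't available (for example a plain Windows command prompt), one option is to write the same rules to a real file and pass that file to `-s`. A sketch, with the hypothetical file name `print-fixes.css`; the `.article-nav` and `.nav` selectors belong to that one site's theme and would need adjusting for any other WordPress site.

```python
from pathlib import Path

# Hide the site's navigation blocks and set the page size and margins.
# .article-nav and .nav are that site's class names; inspect your own theme for its equivalents.
PRINT_CSS = """
.article-nav { visibility: hidden !important; }
.nav { visibility: hidden !important; }
@page { size: Letter; margin: 2.5cm; }
"""


def weasyprint_command(url: str, out_pdf: str, css_path: str = "print-fixes.css") -> list[str]:
    """Write the stylesheet to disk and return the matching weasyprint command line."""
    Path(css_path).write_text(PRINT_CSS)
    return ["weasyprint", "-s", css_path, url, out_pdf]
```

The returned list can be run with `subprocess.run(...)`, or typed by hand as `weasyprint -s print-fixes.css URL out.pdf`.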
This thread is closed to new comments.
posted by Wild_Eep at 10:11 AM on May 8, 2019