Downloading/cloning a WordPress site
August 3, 2020 5:53 AM

The site is published by a large contractor, who is posting gigabytes of docs related to the consultation process for a very large and potentially hazardous local infrastructure project. They are also posting notices of public meetings, feedback from public meetings, etc. From a community history/activism point of view, it would be nice to snapshot this. I'm using a Mac. Thank you!

The contractors are legally obliged to provide this information, although not necessarily in the form of a website (they can deposit the docs somewhere and we'd have to write in and request copies). We could trawl the site for PDFs etc. (using DownThemAll, for example), but we feel it would be better to preserve the publishing context, web pages, etc. I was also wondering to what extent WordPress, being a content management system, only makes some files available 'on the fly' at the time they are requested.
posted by carter to Computers & Internet (4 answers total) 1 user marked this as a favorite
 
If you just want to make a local static snapshot of everything that's accessible by following links on some given web page, wget can do that regardless of whether the server it's pulling from makes files available 'on the fly' or not.

It won't grab stuff you have to fill in boxes and click buttons for, though.
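
For anyone trying this, here is a minimal sketch of what such a wget run might look like, wrapped in Python so the options can be commented one by one. The URL, the destination folder name, and the one-second delay are placeholders rather than anything from this thread; the flags themselves are standard wget options, but check `man wget` for the version you installed.

```python
import shlex
import subprocess


def build_wget_command(start_url, dest_dir="site-snapshot", wait_seconds=1):
    """Build a wget argument list that mirrors everything reachable by links.

    start_url, dest_dir and wait_seconds are illustrative placeholders.
    """
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--page-requisites",   # also fetch the CSS, images and scripts pages need
        "--convert-links",     # rewrite links so the local copy browses offline
        "--adjust-extension",  # save generated pages with an .html extension
        "--no-parent",         # never climb above the starting directory
        f"--wait={wait_seconds}",  # pause between requests to be polite
        "--random-wait",       # vary the pause a little
        f"--directory-prefix={dest_dir}",  # where the snapshot is written
        start_url,
    ]


def run_snapshot(start_url):
    """Run the mirror; requires wget to be installed and on the PATH."""
    subprocess.run(build_wget_command(start_url), check=True)


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a terminal.
    print(shlex.join(build_wget_command("https://example.org/")))
```

Printing the command first makes it easy to check before running it against a large site.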
posted by flabdablet at 6:10 AM on August 3, 2020 [6 favorites]


I have used httrack in the past to archive WordPress sites, and with a little experimentation to find the right settings it works very well.

httrack can create an internal cache which speeds up the process of updating your local copy of the website.

Be sure to read the docs about rate-limiting requests to sites you don't control.
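
As a rough sketch of a throttled httrack run (not a tested recipe): the URL, folder name, and limits below are made-up placeholders, and the options used (`-O` for the output path, `-c` for the number of simultaneous connections, `-A` for the maximum transfer rate in bytes per second, `--update` for refreshing an existing mirror) should be checked against the httrack manual for your version.

```python
import subprocess


def build_httrack_command(start_url, dest_dir="httrack-snapshot",
                          sockets=2, max_bytes_per_sec=50000, update=False):
    """Build an httrack argument list with conservative rate limits.

    All default values here are illustrative assumptions.
    """
    cmd = [
        "httrack",
        start_url,
        "-O", dest_dir,             # mirror, logs and cache go under this folder
        f"-c{sockets}",             # limit simultaneous connections
        f"-A{max_bytes_per_sec}",   # cap the transfer rate (bytes per second)
    ]
    if update:
        cmd.append("--update")      # refresh an existing mirror using its cache
    return cmd


def run_httrack(start_url, update=False):
    """Run httrack; requires httrack to be installed and on the PATH."""
    subprocess.run(build_httrack_command(start_url, update=update), check=True)
```

The first run would use `update=False`; later runs with `update=True` reuse the cache mentioned above.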
posted by dweingart at 9:26 AM on August 3, 2020 [1 favorite]


Response by poster: Update: Thank you flabdablet! I managed to get wget installed by cut-and-pasting from tutorials. Now trying to get it to do what I want ...
posted by carter at 3:32 AM on August 4, 2020


WordPress is a pretty full-featured content management system. The place where files are typically stored is called the Media Library, which may help you if you're trying to understand its features. Important: in its natural state, all files stored in WordPress are publicly available on the web if WordPress is installed on a publicly available web server. This is true even if the file is not linked anywhere on the "front end" or public-facing website. The links will typically look like http://site.com/wp-content/uploads/some-date/whatever.pdf.
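
One way to get a list of Media Library files without crawling pages is the WordPress REST API's media endpoint (`/wp-json/wp/v2/media`), which returns a `source_url` for each item. This is only a sketch, and it assumes the site has left the REST API publicly reachable, which many sites do not; the site address and the function names are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_page(url):
    """Fetch one page of JSON results; treat HTTP 400 (page out of range) as empty."""
    try:
        with urlopen(url) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 400:
            return []
        raise


def list_media_urls(site_root, fetch=fetch_page, per_page=100):
    """Collect the source_url of every media item the REST API exposes.

    site_root is a placeholder such as "https://example.org"; the API
    allows at most 100 items per page.
    """
    urls = []
    page = 1
    while True:
        endpoint = (f"{site_root.rstrip('/')}/wp-json/wp/v2/media"
                    f"?per_page={per_page}&page={page}")
        items = fetch(endpoint)
        urls.extend(item["source_url"] for item in items)
        if len(items) < per_page:  # a short page means we reached the end
            break
        page += 1
    return urls
```

The resulting list could be saved to a text file and handed to a downloader, alongside a page-level snapshot that preserves the publishing context.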

I would not recommend a tool like wget for this. If I'm understanding you correctly, it sounds like you want to be able to access the WordPress site locally (meaning on your personal computer) for future reference, but you don't want it to be online anymore. If this is correct, I would use something like Local by Flywheel. (Side note: looks like they recently rebranded as just "Local.")

They made this free tool to help people use their hosting service, but it works incredibly well for running WordPress sites locally w/ no technical knowledge. Here are some great instructions for how you import a pre-existing WordPress site into Local.
posted by nosila at 6:29 AM on August 4, 2020

