Downloading/cloning a WordPress site
August 3, 2020 5:53 AM
The site is published by a large contractor, who is posting GBs of docs related to the consultation process for a very large and potentially hazardous local infrastructure project. They are also posting notices of public meetings, feedback from public meetings, etc. From a community history/activism point of view, it would be nice to snapshot this. I'm using a Mac. Thank you!
The contractors are legally obliged to provide this information, although not necessarily in the form of a website (they can deposit the docs somewhere and we'd have to write in and request copies). We could trawl the site for PDFs etc. (using DownThemAll, for example), but we feel it would be better to preserve the publishing context, web pages, etc. I was also wondering about the extent to which WordPress, as a content management system, generates some files only 'on the fly' at the time they are requested.
I have used httrack in the past to archive WordPress sites, and with a little experimentation to find the right settings it works very well.
httrack can create an internal cache which speeds up the process of updating your local copy of the website.
Be sure to read the docs about rate-limiting requests to sites you don't control.
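For illustration, a rate-limited httrack run might look roughly like the sketch below. The address https://example.org/, the output folder, and the connection and bandwidth numbers are placeholder assumptions, not values from this thread, so check them against the httrack docs and the size of the site. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with the real site and a folder you choose.
SITE_URL="https://example.org/"
SITE_HOST="example.org"
OUT_DIR="$HOME/site-snapshot-httrack"

# Dry run by default: print the command instead of running it.
# Set DRY_RUN=0 in the environment to actually start the download.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

snapshot_with_httrack() {
  # -O          folder for the mirror (httrack also keeps its cache here)
  # "+host/*"   filter: only follow links on this host
  # --sockets   number of simultaneous connections
  # --max-rate  bandwidth cap in bytes per second (here about 250 KB/s)
  run httrack "$SITE_URL" -O "$OUT_DIR" "+$SITE_HOST/*" --sockets=2 --max-rate=250000
}

snapshot_with_httrack
```

Because the cache lives in the output folder, running `httrack --update` from inside that folder later should refresh the copy without downloading everything again.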
posted by dweingart at 9:26 AM on August 3, 2020 [1 favorite]
Response by poster: Update: Thank you flabdablet! I managed to get wget installed by cut-and-pasting from tutorials. Now trying to get it to do what I want ...
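For anyone following along, a polite wget mirror could be sketched roughly as below. The address https://example.org/, the output folder, and the wait and rate values are placeholder assumptions rather than anything recommended in this thread. The script only prints the command unless DRY_RUN=0 is set, so the options can be reviewed first.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with the real site and a folder you choose.
SITE_URL="https://example.org/"
OUT_DIR="$HOME/site-snapshot-wget"

# Dry run by default: print the command instead of running it.
# Set DRY_RUN=0 in the environment to actually start the download.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

snapshot_with_wget() {
  # --mirror            recursive download with timestamp checks
  # --page-requisites   also fetch the images, CSS and scripts each page needs
  # --convert-links     rewrite links so the copy can be browsed offline
  # --adjust-extension  save pages with .html so they open locally
  # --no-parent         never climb above the starting address
  # --wait/--random-wait/--limit-rate  slow down to be kind to the server
  run wget \
    --mirror \
    --page-requisites \
    --convert-links \
    --adjust-extension \
    --no-parent \
    --wait=2 \
    --random-wait \
    --limit-rate=500k \
    --directory-prefix="$OUT_DIR" \
    "$SITE_URL"
}

snapshot_with_wget
```

Since `--mirror` turns on timestamp checking, re-running the same command later should only fetch files that have changed on the server.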
posted by carter at 3:32 AM on August 4, 2020
WordPress is a pretty full-featured content management system. The place where files are typically stored is called the Media Library, which may help you if you're trying to understand its features. Important: in its natural state, all files uploaded to WordPress are publicly available on the web if WordPress is installed on a publicly available web server. This is true even if the file is not linked anywhere on the "front end" or public-facing website. The links will look like http://site.com/wp-content/uploads/some-date/whatever.pdf.
I would not recommend a tool like wget for this. If I'm understanding you correctly, it sounds like you want to be able to access the WordPress site locally (meaning on your personal computer) for future reference, but you don't want it to be online anymore. If this is correct, I would use something like Local by Flywheel. (Side note: looks like they recently rebranded as just "Local.")
They made this free tool to help people use their hosting service, but it works incredibly well for running WordPress sites locally w/ no technical knowledge. Here are some great instructions for how you import a pre-existing WordPress site into Local.
posted by nosila at 6:29 AM on August 4, 2020
This thread is closed to new comments.
It won't grab stuff you have to fill in boxes and click buttons for, though.
posted by flabdablet at 6:10 AM on August 3, 2020 [6 favorites]