Automatically downloading text off web pages
January 8, 2007 8:16 PM
There's an ancient website that's in the Wayback Machine, and I need to download a few dozen pages off of it before it disappears forever - I need the material for research purposes and it just doesn't exist anywhere else.
I could just print everything out, but I could really use the electronic versions so I can index them. If I had the time (which I will make if I can't find a way to do this programmatically), I could just cut and paste into a text file. But I'd really prefer not to resort to that for 60+ pages' worth of stuff.
There used to be utilities that would do this (back in Ye Golden Age of Ye Internets) but I have no idea what to use today.
I don't need the actual page, just the text on the pages. And again, to make it clear, this is not to steal or bootleg or appropriate, but to use data that will be lost when these pages disappear forever.
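To make "just the text" concrete, here is a rough sketch in Python of stripping a saved page down to its text, using only the standard library. The names `TextExtractor` and `html_to_text` are made up for illustration, and this is only one possible approach under the assumption that the pages are plain HTML; it is not a complete scraper.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []        # text fragments in document order
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.parts.append(text)


def html_to_text(html):
    """Return the text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    # Made-up sample page, not taken from the site in question.
    sample = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    print(html_to_text(sample))
```

Each page's HTML would still have to be fetched first (for example with `urllib.request.urlopen` on each archived URL) and the result written to a text file; that part is left out here.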
Thanks for any reasonable suggestions.
posted by micawber to computers & internet (16 comments total)
posted by docgonzo at 8:20 PM on January 8, 2007