Saving a webpage with javascript
September 15, 2010 11:26 AM

Is it possible to download a site that has complex javascript and run it locally on your computer?

I'm interested in learning how a particular site made some nice animations (this site).

I don't want to copy the site verbatim, but I would like to learn one or two of the techniques used here and apply them to a different type of visualization, using similar interactivity. So I wanted to save the site and tinker with the code; however, the saved copy does not work. I tried a few different options, such as wget and the browser's 'save page'.

Is there any strategy for doing this?

If I want to use any of the code directly I will email the developers to ask permission first, to clarify the ethics of this question.
posted by a womble is an active kind of sloth to Computers & Internet (5 answers total)
 
I haven't checked this site specifically to see if they have any JS which intends to prevent you from doing this, but one approach is to locally mirror their HTML page but reference their JS, etc, then gradually swap in local components of yours as you make modifications. If they aren't restricting their webserver by referrer, and their javascript isn't checking to see where it is running from, then this should work.
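A minimal sketch of that idea, in case it helps: take your saved copy of the HTML and rewrite every relative src/href so it points back at the original server, except for the files you have already replaced with your own. The function name `absolutize` and the `keep_local` parameter are made up for illustration, and the regex is a rough approximation of HTML attribute parsing, not a real parser.

```python
import re
from urllib.parse import urljoin

# Matches src="..." or href="..." with either quote style.
# This is a rough regex, not a full HTML parser.
ATTR_RE = re.compile(r'''\b(src|href)\s*=\s*(["'])(.*?)\2''', re.IGNORECASE)

# Values that should never be rewritten.
SKIP_PREFIXES = ("#", "data:", "javascript:", "mailto:")


def absolutize(html, base_url, keep_local=()):
    """Point relative src/href values at base_url.

    Values listed in keep_local (hypothetical parameter) are left alone,
    so they keep resolving against your local copy.
    """
    def repl(match):
        attr, quote, value = match.group(1), match.group(2), match.group(3)
        if value in keep_local or value.startswith(SKIP_PREFIXES):
            return match.group(0)
        # urljoin leaves already-absolute URLs unchanged.
        return f"{attr}={quote}{urljoin(base_url, value)}{quote}"

    return ATTR_RE.sub(repl, html)


page = '<script src="graph.js"></script><link href="main.css" rel="stylesheet">'
# graph.js stays local (you are editing it); main.css is fetched remotely.
print(absolutize(page, "http://cephea.de/gde/beta/", keep_local={"graph.js"}))
```

As you modify more files, add them to `keep_local` and re-run the rewrite on the original HTML.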

People may point out that this entails using their server for purposes they didn't intend, but if you keep your traffic relatively low-volume, it shouldn't hurt them.
posted by doteatop at 11:50 AM on September 15, 2010


wget is normally a great way to download a page fully, but it does not execute JavaScript, so resources loaded by JavaScript won't get downloaded. To analyze sites that are highly complex, I usually use something like HttpFox to watch the files come down the wire and look at them. When I point it at the site you mention, I get this list (note: I've sorted the list of files alphabetically). So a start would be to get these files and save them locally.
http://cephea.de/favicon.ico
http://cephea.de/gde/beta/
http://cephea.de/gde/beta/bubblechart.html
http://cephea.de/gde/beta/data_factbook.js
http://cephea.de/gde/beta/data_graph.js
http://cephea.de/gde/beta/graph.html
http://cephea.de/gde/beta/graph.js
http://cephea.de/gde/beta/graph_country2contient.js
http://cephea.de/gde/beta/graph_themes.js
http://cephea.de/gde/beta/img_deco.png
http://cephea.de/gde/beta/img_deco2.png
http://cephea.de/gde/beta/img_logo.png
http://cephea.de/gde/beta/lib_combobox.js
http://cephea.de/gde/beta/lib_factbook.js
http://cephea.de/gde/beta/lib_protovis-r3.2.js
http://cephea.de/gde/beta/lib_textFilter.js
http://cephea.de/gde/beta/main.css
http://cephea.de/gde/beta/map.js
http://cephea.de/gde/beta/map.xhtml
http://cephea.de/gde/beta/map_centers.js
http://cephea.de/gde/beta/map_countries.js
http://cephea.de/gde/beta/map_iso2name.js
http://cephea.de/gde/beta/map_name2iso.js
http://cephea.de/gde/beta/museo500.otf
http://cephea.de/gde/beta/parallel_coordinates.html
http://cephea.de/gde/beta/parallel_coordinates.js
http://cephea.de/gde/beta/settings.js
http://cephea.de/gde/beta/triangleCami.png
That's a start, though you may need to alter these files to point at your own server, so ideally you would run a local web server for this. You could search for instances of http://cephea.de/gde/ and replace them with http://example.local/gde/ . Note that doing so may have side effects, so keep backups.
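Here is a rough sketch of how the download and the search-and-replace could be scripted. It only uses the Python standard library. The helper names (`local_path_for`, `rewrite_base`, `download_all`), the `.orig` backup suffix, saving directory URLs as index.html, and treating the text files as UTF-8 are all my own assumptions, not anything the site requires.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

OLD_BASE = "http://cephea.de/gde/"
NEW_BASE = "http://example.local/gde/"  # placeholder host, as in the text above

# File types that may contain the old base URL and are safe to edit as text.
TEXT_SUFFIXES = {".html", ".xhtml", ".js", ".css"}


def local_path_for(url, root):
    """Map a URL to a path under root, mirroring the URL's directory layout."""
    path = urlparse(url).path.lstrip("/")
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumption: directory URLs become index.html
    return Path(root) / path


def rewrite_base(text, old=OLD_BASE, new=NEW_BASE):
    """Replace every occurrence of the old base URL with the new one."""
    return text.replace(old, new)


def download_all(urls, root):
    """Fetch each URL into root; rewrite text files and keep a backup."""
    for url in urls:
        target = local_path_for(url, root)
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response:
            data = response.read()
        if target.suffix in TEXT_SUFFIXES:
            # Keep an untouched backup next to the rewritten file.
            target.with_name(target.name + ".orig").write_bytes(data)
            text = data.decode("utf-8")  # assumption: text files are UTF-8
            target.write_text(rewrite_base(text), encoding="utf-8")
        else:
            target.write_bytes(data)  # images, fonts: save unchanged


# No network access here: just show where a file would land.
print(local_path_for("http://cephea.de/gde/beta/graph.js", "mirror"))
print(rewrite_base('<script src="http://cephea.de/gde/beta/map.js"></script>'))
```

To actually fetch the files, call `download_all` with the list of URLs above and a target directory such as "mirror". Running `python -m http.server` inside that directory is one simple way to serve the result locally.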

Also note that the site you mention does not work in Firefox, so my list is probably incomplete: the page may have stalled because of incompatibilities.

I think you might be able to generate a similar listing using Chrome: Developer > Developer Tools > Timeline
posted by artlung at 11:57 AM on September 15, 2010


Try HTTrack.
posted by Bangaioh at 1:36 PM on September 15, 2010


I was about to suggest HTTrack as well. It's a pretty nice site mirroring tool. I can't guarantee it will get everything, but it's free and easy. Worth a shot.
posted by dozo at 2:10 PM on September 15, 2010


I used HTTrack on the http://cephea.de/gde/beta/ address and it downloaded ~2 MB of files that, when opened with a browser working offline, behave like the original site, as far as I can tell.
posted by Bangaioh at 3:06 PM on September 15, 2010

