I need to find a method/product that allows me to analyze the contents of a website.
October 27, 2004 7:24 AM

I need to find a method/product that allows me to analyze the contents of a website. [More Inside]
posted by smcniven to Computers & Internet (7 answers total)
I hear web browsers are good for that.

(I kid! I kid! Just waiting for the MI...)
posted by mkultra at 7:27 AM on October 27, 2004

Response by poster: I've been put on a project that is looking at integrating a group of websites into an existing content management system (in-house product).

One of the first things I need to do is examine the new websites and break down how they are built (sections, content, images, etc.). Once that's done I can make recommendations to my boss on how to proceed.

Are there any products that let you spider a website and get a full report on what it contains? Maybe a command-line tool? Please note that my shop is a Windows-based environment, so any solution must be able to work in that environment.

Thanks in advance
posted by smcniven at 7:29 AM on October 27, 2004

If you just want to download the entire web site as it stands, the SpiderZilla plugin for Firefox (basically HTTrack with a nicer UI) does the job. But it sounds like you want it to come back and tell you things like "20 pages had 100 KB or more of content", "there were 380 different images", etc. In which case, I'm no help at all.
posted by humuhumu at 7:49 AM on October 27, 2004
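
The kind of report described above ("20 pages had 100 KB or more", "380 different images") can be roughed out once a mirror exists on disk. This is only a sketch, assuming the site was already downloaded to a local folder; the function name `summarize_mirror` and the extension lists are made up for illustration and would need adjusting for a real site. It uses only the Python standard library, so it should run on Windows.

```python
import os
from pathlib import Path

# Assumed extension lists; adjust to whatever the mirrored site really uses.
PAGE_EXTS = {".html", ".htm", ".asp", ".php"}
IMAGE_EXTS = {".gif", ".jpg", ".jpeg", ".png"}


def summarize_mirror(root, big_page_bytes=100 * 1024):
    """Walk a locally mirrored site and return a few simple counts."""
    root = Path(root)
    pages = 0
    big_pages = 0
    images = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            ext = path.suffix.lower()
            if ext in PAGE_EXTS:
                pages += 1
                # Count pages at or above the size threshold (100 KB by default).
                if path.stat().st_size >= big_page_bytes:
                    big_pages += 1
            elif ext in IMAGE_EXTS:
                # Distinct images are identified by their path inside the mirror.
                images.add(path.relative_to(root).as_posix())
    return {"pages": pages, "big_pages": big_pages, "images": len(images)}
```

Calling `summarize_mirror(r"C:\mirrors\example")` (a hypothetical path) returns a small dictionary of counts that can be printed or pasted into a report.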

The big issue I see is this approach will provide you with the rendered pages as they appear to the browser. It won't tell you what the files themselves look like (do they use includes, scripting language code, etc.). Is it possible to FTP the files down (either instead or in addition to whatever else you do)?
posted by yerfatma at 8:08 AM on October 27, 2004
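
If the source files can be pulled down over FTP, a quick scan can flag which ones use includes or embedded scripting. A minimal sketch, assuming the files sit in a local folder; the marker patterns (server-side include directives, `<%` blocks, `<?php` tags) and the extension list are guesses at what such a site might contain, not a complete list, and `scan_sources` is a name invented here.

```python
import re
from pathlib import Path

# Assumed set of text-like source extensions worth scanning.
SOURCE_EXTS = {".html", ".htm", ".shtml", ".asp", ".php", ".inc"}

# Assumed markers of server-side processing; extend as needed.
MARKERS = {
    "ssi_include": re.compile(r"<!--\s*#include", re.IGNORECASE),
    "asp_block": re.compile(r"<%"),
    "php_block": re.compile(r"<\?php", re.IGNORECASE),
}


def scan_sources(root):
    """Map each source file (relative path) to the markers found in it."""
    root = Path(root)
    hits = {}
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in SOURCE_EXTS:
            continue
        # latin-1 decodes any byte sequence, so odd encodings will not raise.
        text = path.read_text(encoding="latin-1")
        found = sorted(name for name, rx in MARKERS.items() if rx.search(text))
        if found:
            hits[path.relative_to(root).as_posix()] = found
    return hits
```

Files that come back with no markers are probably plain static HTML; the rest are the ones that will look different on the server than they do in a browser.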

When you say "how they are built", are you talking about a logical structure (e.g. how pages are linked), or how things are broken out by physical directory?

If it's the former, there are a bunch of sitemapping tools that you can run your sucked-down site through. Sorry I don't have any specific app names, but I'm a Mac guy. Website management tools like Dreamweaver can do this fairly easily.

If it's the latter, any web site downloader worth its salt will create the appropriate directories for you.
posted by mkultra at 8:09 AM on October 27, 2004
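
For the logical structure (how pages link to each other), a rough link map can be pulled out of a downloaded copy. This is a sketch under the assumption that the pages are `.htm`/`.html` files in a local folder; `LinkCollector` and `link_map` are illustrative names, and the hrefs are recorded as written, without resolving relative paths.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def link_map(root):
    """Return {page path: [hrefs found on that page]} for a mirrored site."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*.htm*")):
        if not path.is_file():
            continue
        parser = LinkCollector()
        parser.feed(path.read_text(encoding="latin-1"))
        result[path.relative_to(root).as_posix()] = parser.links
    return result
```

The physical layout, by contrast, is just the set of keys in that dictionary, since each key is a path relative to the mirror's root.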

Best answer: I have not seen any spidering tool that produced any output that would be useful for planning migration to a new CMS. If all you want is a big list of assets and directories, a spider will do the trick. But for CMS migration, you need to know the information architecture of the existing sites, and to get that you need to do a content inventory.
posted by jjg at 8:34 AM on October 27, 2004
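
A content inventory is mostly manual work, but the starting spreadsheet can be generated. The sketch below assumes a local mirror of `.htm`/`.html` pages and writes one CSV row per page; treating the top-level folder as the "section" is purely an assumption for illustration, and `TitleParser` and `write_inventory` are invented names. Someone would still need to go through the rows and annotate content type, owner, and where each page belongs in the new CMS.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Capture the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def write_inventory(root, out_csv):
    """Write path, section, title, and size for each page; return row count."""
    root = Path(root)
    rows = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "section", "title", "bytes"])
        for path in sorted(root.rglob("*.htm*")):
            if not path.is_file():
                continue
            parser = TitleParser()
            parser.feed(path.read_text(encoding="latin-1"))
            rel = path.relative_to(root)
            # Assumption: the first folder under the root names the section.
            section = rel.parts[0] if len(rel.parts) > 1 else ""
            writer.writerow([rel.as_posix(), section,
                             parser.title.strip(), path.stat().st_size])
            rows += 1
    return rows
```

The resulting CSV opens directly in Excel, which suits a Windows shop.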

I was looking for something similar a little while ago. (We were re-building a client's site, and the old version had tons of cruft we had to sort through.)

This was one of the better products I found (although I wouldn't necessarily give it a ringing endorsement). I don't know if it'll do everything you need on a technical level, but it's a decent tool to survey an entire site and map it out.
posted by LairBob at 8:43 AM on October 27, 2004
