how many pages in a website?
July 15, 2009 6:38 PM

geeks, help! how can I (relatively quickly) get a rough idea of how many pages are in a particular website?

Googling around, I found and tried this site:
which basically does what I'm looking for (it's a sitemap generator; I don't care about the sitemap, but it does tell you roughly the number of pages in the site, which is what I'd like to know). However, the free version is limited to sites that have fewer than 500 pages. Booh.

Any ideas what I could use for larger sites? A program or some kind of analytics tool online? Preferably free of course. Thanks!
posted by jak68 to Computers & Internet (6 answers total) 5 users marked this as a favorite
Best answer: If I google for my own site by using in the search box and then when I get the results click "more results from" the number it gives me seems pretty close to the number of pages on my site. Keep in mind that on a blog-run site that will include category and tag pages as well as archival pages and whatnot.
posted by jessamyn at 6:40 PM on July 15, 2009 [1 favorite]

if you're on a mac and/or unix machine you can use the command:

wget -r

to download said website.

then, something like

ls -R | grep '\.html$' | wc -l

to list everything in the directory, grep for .html files, and count the number of results.

it's not foolproof, but it works
posted by jimmy0x52 at 6:42 PM on July 15, 2009
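
A slightly more careful version of the count above, as a minimal sketch: it assumes the site has already been downloaded into a local directory, and it counts only regular files ending in .html or .htm, so directory names that happen to contain "html" are not counted. The function name count_html_pages and the directory argument are hypothetical, not part of any tool mentioned in this thread.

```shell
# Count files that look like HTML pages under a downloaded site directory.
# Usage: count_html_pages DIR   (DIR defaults to the current directory)
count_html_pages() {
    dir="${1:-.}"
    # -type f skips directories; -iname matches the extension in any case.
    # tr strips the padding that some wc implementations put before the number.
    find "$dir" -type f \( -iname '*.html' -o -iname '*.htm' \) | wc -l | tr -d ' '
}
```

For example, `count_html_pages ./example.com` would print the number of saved HTML files under that directory. Pages saved under other names (for instance, dynamic URLs without an .html extension) are not counted, so treat the result as a rough figure.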

I've done that using Visio. Here are instructions for generating a sitemap with Visio 2002. You can find similar guides for newer versions of Visio as well.
posted by ttyn at 6:46 PM on July 15, 2009

Response by poster: thanks jess, that worked very well as a ballpark.

if anyone has any ideas for methods that can count only content pages, I'd be happy to hear those suggestions too.
posted by jak68 at 6:48 PM on July 15, 2009
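
One rough way to approximate a content-only count, sketched here under assumptions: given a list of the site's URLs, one per line (for example, exported from a crawler), drop the ones whose paths look like tag, category, archive, or feed listings and count what is left. The function name count_content_urls and the path patterns are assumptions based on common blog layouts, so adjust them for the site in question.

```shell
# Read URLs on stdin, one per line, and print how many distinct URLs remain
# after dropping blank lines and paths that look like listing pages.
# The patterns below are guesses at common blog layouts, not a standard.
count_content_urls() {
    grep -Ev '^$|/(tag|tags|category|categories|archive|archives|feed)(/|$)' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}
```

For example, `count_content_urls < urls.txt` would print the filtered count, where urls.txt is a hypothetical file holding the URL list.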

The trouble you might have is that most websites, especially large ones, are dynamically generated. For all intents and purposes my own website only has one page (Default.aspx), and the content management system fills in the content of that page based on context. The days of static HTML pages are long gone.
posted by Lokheed at 8:10 PM on July 15, 2009 [1 favorite]

Best answer: Xenu's Link Sleuth. If you select "Statistics" in the preferences, you'll get a list showing the number of each type of page (HTML, image, etc.). We use it for link checking, and it's free.
posted by Joleta at 8:54 PM on July 15, 2009
