How to comprehensively spider a site
June 5, 2008 9:42 AM
I have been charged with doing a full audit of my company's "web portal solution". This involves going through hundreds of pages and developing an incredibly detailed sitemap that shows how all the pages link to one another. Please help me do this efficiently and accurately - I want to impress.
I will add that this "web portal solution" is indeed online; however, it is password-protected, so I have not been able to find a web service that can automate this task. The ideal solution would create a document with a tree-type structure, or maybe a flowchart layout, detailing which child URLs branch off which parent URLs.
It gets tricky because there are several external links that do not need to be followed, and several links are just ASP pages (e.g. .../menupage.asp?pageid=21, .../menupage.asp?pageid=22, etc.). Does this complicate things?
Is there a Firefox add-on that can track where I click and then create a logical, visual output of where I visited? Basically, I need something that looks at all the links on a page, follows those links to the subpages, then repeats this process until all links in the domain have been followed.
Any ideas?
posted by yoyoceramic to technology (4 comments total)
You could also write a program (or hire a programmer) to spider the site for you, but if you can buy a tool to do what you want for well under $100, it's going to be much cheaper to go that way.
posted by pocams at 10:15 AM on June 5, 2008
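For anyone taking the write-a-program route, the process described in the question (look at every link on a page, follow the ones inside the domain, repeat until nothing new turns up) can be sketched in Python with only the standard library. This is a minimal sketch under assumptions, not a finished tool: `crawl`, `print_tree`, and `LinkParser` are names made up for this example, and `fetch` is a hypothetical function you supply that takes a URL and returns the page's HTML. For a password-protected portal, `fetch` would have to send whatever login session the site expects, which is not shown here. URLs are compared with their query strings intact, so .../menupage.asp?pageid=21 and .../menupage.asp?pageid=22 count as separate pages, and links to other hosts are skipped.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl. Returns {page URL: [same-host URLs it links to]}.

    fetch is a caller-supplied (hypothetical) function: URL -> HTML text.
    """
    host = urlparse(start_url).netloc
    site_map = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in site_map:
            continue  # already visited
        parser = LinkParser()
        parser.feed(fetch(url))
        children = []
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc != host:
                continue  # external, mailto:, javascript: and similar links
            if target not in children:
                children.append(target)
        site_map[url] = children
        queue.extend(children)
    return site_map


def print_tree(site_map, url, depth=0, seen=None):
    """Prints an indented tree; pages already shown are not expanded again."""
    seen = set() if seen is None else seen
    repeat = url in seen
    print("  " * depth + url + (" (see above)" if repeat else ""))
    if repeat:
        return
    seen.add(url)
    for child in site_map.get(url, []):
        print_tree(site_map, child, depth + 1, seen)
```

Calling `site_map = crawl(start_url, fetch)` and then `print_tree(site_map, start_url)` gives the parent/child outline the question asks for. The same dictionary could be fed to a graphing tool if a flowchart is preferred.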