<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel>
	  <title>Ask MetaFilter questions tagged with webcrawling</title>
      <link>http://ask.metafilter.com/tags/webcrawling</link>
      <description>Questions tagged with 'webcrawling' at Ask MetaFilter.</description>
	  <pubDate>Fri, 30 Jan 2009 09:06:08 -0800</pubDate>
	  <lastBuildDate>Fri, 30 Jan 2009 09:06:08 -0800</lastBuildDate>

      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>	  
	<item>
	<title>How do we speel check an entire web sight?</title>
	<link>http://ask.metafilter.com/112967/How%2Ddo%2Dwe%2Dspeel%2Dcheck%2Dan%2Dentire%2Dweb%2Dsight</link>	
	<description>How do we spell check (and link check) an entire web site? I look after a number of sites, and we&apos;re starting to consider how to spell check an entire web site. The site uses a content management system and a number of other applications that deliver content to the user. We have things in place to spell check in the editor window of the CMS, but we&apos;re looking for a way to crawl the site and report errors. If the toolset also did link checking and validation, we&apos;d be even happier. Got any recommendations?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.112967</guid>
	<pubDate>Fri, 30 Jan 2009 09:06:08 -0800</pubDate>
	<category>spellcheck</category>
	<category>spelling</category>
	<category>validation</category>
	<category>webcrawling</category>
	<category>websites</category>
	<dc:creator>advicepig</dc:creator>
	</item>
	<item>
	<title>How to comprehensively spider a site</title>
	<link>http://ask.metafilter.com/93305/How%2Dto%2Dcomprehensively%2Dspider%2Da%2Dsite</link>	
	<description>I have been charged with doing a full audit of my company&apos;s &quot;web portal solution&quot;. This involves going through the hundreds of pages and essentially developing an incredibly detailed sitemap showing how all the pages link to one another. Please help me do this efficiently and accurately - I want to impress. I will add that this &quot;web portal solution&quot; is indeed online; however, it is password protected, and therefore I have not been able to find a web service that can automate this task. The ideal solution would create a document with a tree-type structure, or maybe a flowchart layout, detailing which child URLs branch off which parent URLs.&lt;br&gt;
&lt;br&gt;
It gets tricky because there are several external links that do not need to be followed, and several links are just ASP pages (i.e. .../menupage.asp?pageid=21, .../menupage.asp?pageid=22, etc.). Does this complicate things?&lt;br&gt;
&lt;br&gt;
Is there a Firefox add-on that can track where I click and then create a logical, visual output of where I visited? Basically I need something to look at all the links on a page, follow those links to the subpages, then repeat this process until all links in the domain have been followed.&lt;br&gt;
&lt;br&gt;
Any ideas?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.93305</guid>
	<pubDate>Thu, 05 Jun 2008 09:42:41 -0800</pubDate>
	<category>spidering</category>
	<category>webcrawling</category>
	<dc:creator>yoyoceramic</dc:creator>
	</item>
	<item>
	<title>What to do with spare download bandwidth?</title>
	<link>http://ask.metafilter.com/92949/What%2Dto%2Ddo%2Dwith%2Dspare%2Ddownload%2Dbandwidth</link>	
	<description>Is there anything meaningful I can do with 10Mbps worth of spare download bandwidth? My ISP currently has an offer on a 10Mbps connection, under a 2-year contract. I&apos;ve been thinking about getting a separate connection, mainly because the price is good and because I want good upload bandwidth for torrent-seeding (don&apos;t ask).&lt;br&gt;
&lt;br&gt;
That plan comes with 1Mbps upload bandwidth which, as mentioned, will be largely used for torrent-seeding. That leaves the 10Mbps download bandwidth largely unemployed. I will still be downloading every so often, but hardly enough to put more than the occasional dent in it.&lt;br&gt;
&lt;br&gt;
This is where I need to pick your brains. Are there any meaningful projects out there requiring spare bandwidth, and if so, how can I contribute? Is there any other way I can put it to good use?&lt;br&gt;
&lt;br&gt;
I&apos;ve also been thinking about social network-related experiments and data-logging webcrawlers that would gather historical, social and other data (of course, ideally without maiming other servers), but have yet to think of a meaningful way to turn that into useful insights or conclusions.&lt;br&gt;
&lt;br&gt;
If anyone has ideas on what to do with that download bandwidth (that would preferably leave my upload bandwidth alone), please hit me with them =)</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.92949</guid>
	<pubDate>Sun, 01 Jun 2008 22:57:09 -0800</pubDate>
	<category>bandwidth</category>
	<category>datalogging</category>
	<category>optimisation</category>
	<category>social</category>
	<category>webcrawling</category>
	<dc:creator>kureshii</dc:creator>
	</item>
	
	</channel>
</rss>

