<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: How to track the sources a website pulls news from?</title>
	<link>http://ask.metafilter.com/225595/How-to-track-the-sources-a-website-pulls-news-from/</link>
	<description>Comments on Ask MetaFilter post How to track the sources a website pulls news from?</description>
	<pubDate>Sat, 29 Sep 2012 17:41:07 -0800</pubDate>
	<lastBuildDate>Sun, 30 Sep 2012 11:33:57 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: How to track the sources a website pulls news from?</title>
		<link>http://ask.metafilter.com/225595/How-to-track-the-sources-a-website-pulls-news-from</link>	
		<description>Can anyone suggest a tool or website that lists the sites a particular website pulls its news from? I know I can do this manually by looking at articles and noting which sites are cited, but I&apos;m wondering if there is an easier way to do it.

Any ideas?

Thanks, 
- Michael</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2012:site.225595</guid>
		<pubDate>Sat, 29 Sep 2012 17:41:07 -0800</pubDate>
		<dc:creator>ISeemToBeAVerb</dc:creator>
		
			<category>SEO</category>
		
			<category>Marketing</category>
		
			<category>OnlineMarketing</category>
		
			<category>SearchEngineMarketing</category>
		
			<category>SearchEngineOptimization</category>
		
			<category>Journalism</category>
		
			<category>Research</category>
		
	</item>
	<item>
		<title>By: Orb2069</title>
		<link>http://ask.metafilter.com/225595/How-to-track-the-sources-a-website-pulls-news-from#3265070</link>	
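		<description>If you have access to a UNIX command line, you could try something like:&lt;br&gt;
&lt;br&gt;
wget -qO- http://particular_website.com | grep -of file_with_root_website_addresses.txt | sort | uniq -c &gt; output.txt&lt;br&gt;
&lt;br&gt;
which writes each matched address, with the number of times it appears on the page, to output.txt, and where file_with_root_website_addresses.txt holds one grep pattern per line, like:&lt;br&gt;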
&lt;blockquote&gt;&lt;br&gt;
http://[^/]*\.nbcnews\.com&lt;br&gt;
http://www.reuters.com&lt;br&gt;
http://www.eweek.com&lt;br&gt;
...etc&lt;br&gt;
&lt;/blockquote&gt;&lt;br&gt;
...for each news agency you were looking for.&lt;br&gt;
Since this is more of a hint than explicit instructions, here are the &lt;a href=&quot;http://www.gnu.org/software/wget/manual/html_node/&quot;&gt;wget manual&lt;/a&gt; and the &lt;a href=&quot;http://www.gnu.org/software/grep/manual/html_node/index.html&quot;&gt;grep manual&lt;/a&gt;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2012:site.225595-3265070</guid>
		<pubDate>Sun, 30 Sep 2012 11:33:57 -0800</pubDate>
		<dc:creator>Orb2069</dc:creator>
	</item>
	<item>
		<title>By: ISeemToBeAVerb</title>
		<link>http://ask.metafilter.com/225595/How-to-track-the-sources-a-website-pulls-news-from#3265162</link>	
		<description>Thanks, Orb2069. That sounds promising; I&apos;ll give it a shot.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2012:site.225595-3265162</guid>
		<pubDate>Sun, 30 Sep 2012 12:51:58 -0800</pubDate>
		<dc:creator>ISeemToBeAVerb</dc:creator>
	</item>
	</channel>
</rss>
