Comments on: How to track the sources a website pulls news from?

Question: How to track the sources a website pulls news from?

ISeemToBeAVerb — Sat, 29 Sep 2012 17:41:07 -0800

I'm wondering if anyone could suggest a tool or website that allows me to see a list of sites that a particular website is pulling news from. I know I can do this manually by looking at articles and noting the sites that are sourced, but I'm wondering if there is an easier way to do it. Any ideas? Thanks, - Michael

By: Orb2069

Orb2069 — Sun, 30 Sep 2012 11:33:57 -0800

If you have access to a UNIX command line, you could try something like this, which counts the lines on the page that mention any of the sources you list:

wget -qO- http://particular_website.com | grep -cf file_with_root_website_addresses.txt > output.txt

where file_with_root_website_addresses.txt holds one grep pattern per line (regular expressions, not shell wildcards), something like:

http://[^/]*\.nbcnews\.com/
http://www.reuters.com
http://www.eweek.com
...etc

...with one line for each news agency you were looking for.
Since this is more of a hint than explicit instructions, see the wget manual and the grep manual for the details.
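
If you don't know in advance which agencies to look for, a variation on the same idea is to pull every linked host out of the page and tally them. This is only a rough sketch: count_link_hosts is a made-up helper name, and it assumes sources show up as plain http:// or https:// links in the HTML, which won't hold for every site.

```shell
# Hypothetical helper (not a standard tool): tally the hosts a page links to.
# Reads HTML on stdin and prints "count host" lines, most-linked host first.
count_link_hosts() {
    grep -oE 'https?://[A-Za-z0-9.-]+' |  # keep only scheme://host of each URL
        sed 's|^.*://||' |                # drop the scheme
        tr 'A-Z' 'a-z' |                  # host names are case-insensitive
        sort | uniq -c | sort -rn         # count repeats, biggest first
}
```

You would feed it the page the same way as above, e.g. wget -qO- http://particular_website.com | count_link_hosts > output.txt. Expect the site's own host and assorted ad/CDN hosts in the list too, so it still needs a manual skim.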

By: ISeemToBeAVerb

ISeemToBeAVerb — Sun, 30 Sep 2012 12:51:58 -0800

Thanks, Orb2069. That sounds promising; I'll give it a shot.