Where does it all come from?
How do I find what sites a particular aggregator like, say, metafilter.com links to the most?

I want to find the sites that are the most popular sources for posts in the major aggregators (reddit, digg, metafilter, etc.).

It would be especially excellent if I could limit my search by keyword. How would I go about this?

I pretty much finished reading each line of your question with a "what?"

Where does what all come from? The stuff you read on the internet is either reporting on something that happened away from a computer screen or something specifically created for publishing and distribution on the internet. It comes from anyone who sees something happen out in the world, then sits down at a computer and reports that it happened.

The reporting takes place in the online venue of that person's choosing. That could be anything from a personal blog to a community message board to YouTube to email to anonymous questions on a random site to wherever it strikes someone to post something online. Other people presumably peruse or consult that venue (otherwise why report it?) and then pass on a link or recounting via another venue. Aggregators are the popular sources where lots of eyeballs wind up, specifically because everything else comes from the little places that individuals frequent for whatever reason but aren't widely popular.

So there are all of these sources with their individual small audiences, and I suppose the frequency with which a source is used probably varies to an extent (along with the composition of its audience), but there aren't really many sources that are wayyy more popular than others aside from the bleedingly obvious ones like YouTube and news sites. If what you're wondering is 'where do people go to find good links?' then there is no one good answer to that specific question (except for: aggregators!). People who turn up consistently good links trawl for them, ranging over many sources, evaluating, and eventually hypertextually reporting what they find via an aggregator. Aggregators like Digg facilitate the distribution and evaluation process by allowing the audience to decide what of all the trawlings is important or popular enough to share first. Ones like MetaFilter rely less on the audience and more on the individual trawler in dictating what's reported.

I want to find the sites that are the most popular sources for posts in the major aggregators (reddit, digg, metafilter, etc.).
What? OK, on the other hand you could be looking for the most popular sources for individual aggregators. Like I said above, this probably isn't going to result in a small number of hugely-referenced sites that aren't aggregators. There'll be some variation between the rest of the links, for sure, but that's as much a function of the size of the audiences on the other end of the link as it is anything else. So, to find the sites most-linked-by-a-given-aggregator just look for sites with already-large audiences. But this information could also be found from the aggregators. You'd basically have to locate a specific site's admin (their equivalent of pb or cortex) and request they report that information, assuming it's stored in their database somehow. For instance, I believe that sort of information is available to those who know how to manipulate MetaFilter's Infodump. I don't think there's any passive way (i.e., searching) for you to figure it out on your own. You could scrape the site and analyze the links over a given span of time... I'm not sure if there's some sort of reference service that does this already and reports findings. And if there is something like that, I think it may rely on crowdsourcing or self-selective reporting (like Alexa) and not on any rock-solid, across-the-board observational reporting.
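
If you did go the scraping route, the counting step might look something like this rough Python sketch. It assumes you've already saved an aggregator's pages as HTML text. `LinkCollector` and `top_domains` are names made up for illustration, and the keyword filter only looks at each link's visible text, which is a crude stand-in for a real search.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def top_domains(html, own_domain, keyword=None, n=10):
    """Count outbound link domains in one page of aggregator HTML.

    own_domain is the aggregator itself (e.g. "metafilter.com"); links to it
    and its subdomains are skipped. keyword, if given, keeps only links whose
    visible text contains it (case-insensitive).
    """
    parser = LinkCollector()
    parser.feed(html)
    counts = Counter()
    for href, text in parser.links:
        host = urlparse(href).hostname or ""   # relative links have no host
        if host.startswith("www."):
            host = host[4:]
        if not host or host == own_domain or host.endswith("." + own_domain):
            continue
        if keyword and keyword.lower() not in text.lower():
            continue
        counts[host] += 1
    return counts.most_common(n)
```

You'd call `top_domains(page_html, "metafilter.com", keyword="cats")` on each saved page and add up the results over whatever span of time you care about. Actually fetching the pages (and doing so politely) is a separate problem this doesn't touch.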

It would be especially excellent if I could limit my search by keyword.
What? What search?

In short, I'd suggest you attempt to clarify what you're looking for. The questions as you state them seem to be based on some odd conceptions of how hypertext and information aggregation/dissemination occur. ...I think.
Or I could just restate the question. =\
