Database of famous persons' names + disambiguations - where?
May 8, 2007 4:13 AM

Where can I find a data source containing names and disambiguations of most well-known (living and dead) people?

I downloaded some data from Wikipedia, but biographical articles are not distinguished as such, and there are some 3.5 million entries. Hrm.

Any thoughts?
posted by masymas to Computers & Internet (3 answers total)
I did a data mining project like this once. It's a bitch. I found no single data source. I ended up picking a proxy for popularity, like appearance in the NYT (or some collection of prominent newspapers). If I recall, I downloaded a year or so of text, stripped out all the proper names (they have a standard format: two consecutive words, both of which begin with capitals), dumped them into Excel, did a frequency count, and then sorted. That gives you a list of the most mentioned names in the press, sorted by number of mentions. Actually, I remember one striking finding: there are about 3000 famous people in America, dead and alive. Not 4. Not 5. But 3000.
posted by MarshallPoe at 5:13 AM on May 8, 2007
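For anyone who would rather script this than use Excel, here is a rough Python sketch of the extract, count, and sort steps described above. It is not MarshallPoe's actual code: the regex, the function names `count_names` and `most_mentioned`, and the sample sentences are illustrative assumptions. The "two capitalized words" rule is only a heuristic, so it will also pick up things like "New York" or "The Times".

```python
import re
from collections import Counter

# Heuristic from the answer above: two consecutive words that each start
# with a capital letter, e.g. "George Bush". Matches are non-overlapping,
# so a three-word name like "John Paul Jones" yields only "John Paul".
# Non-person pairs such as "New York" or "The Times" also match.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def count_names(texts):
    """Count candidate proper names across an iterable of article texts."""
    counts = Counter()
    for text in texts:
        counts.update(NAME_PATTERN.findall(text))
    return counts


def most_mentioned(texts, n=10):
    """Return the n most frequent candidate names, most-mentioned first."""
    return count_names(texts).most_common(n)


if __name__ == "__main__":
    # Hypothetical sample sentences standing in for a year of newspaper text.
    sample = [
        "Tony Blair met George Bush in Washington.",
        "George Bush spoke on Tuesday. Tony Blair did not.",
        "Critics of George Bush were quoted.",
    ]
    for name, count in most_mentioned(sample, 3):
        print(count, name)
```

On the sample above this prints "George Bush" (3 mentions) ahead of "Tony Blair" (2). In practice you would still need to weed out place names and publication titles by hand or against a list such as an authority file.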


You need to pick a source to use as your authority. This is a common problem in library cataloging.

Most libraries use the Library of Congress Name Authority File. The interface takes a bit of practice to learn, but this is probably what you are looking for.
posted by rachelpapers at 5:53 AM on May 8, 2007


Not sure, but maybe this will help.
posted by SuperNova at 4:37 PM on May 8, 2007

