Entertaining Data Sets
June 12, 2016 9:14 PM
I'm looking for a whimsical large data set. Something bizarrely specific, or just an area you'd never guess there was that much to measure. For example, the amount of materials used by the Teddy Bear industry or the shifting volume of animal feces at major zoos. I've been searching through the many big data set sites to no avail.
The ideal set would contain around a million nodes and, if relational, a few hundred thousand connections. A factor of ten either way would be reasonable though.
I don't know of any such databases, so I went to the datahoarders' exchange subreddit and found 10 million fan fiction stories, plus a SQLite database, presumably for filtering. The NFL is tracking football with RFID now, but hasn't released the data. You can still grab NFL play-by-play data though.
EFF released their SSL Observatory, which pokes at certs they've found across the internet. There are actually some interesting relationship queries one can build -- good certs are signed by other certs, and IIRC they found some long chains. At one point someone tried putting as much of DNS into SQL as possible, but I don't know that they made the dataset available.
After the 2001 collapse, the Enron email archives were cleaned up, put into SQL, and published.
Jeopardy! questions. From a subreddit I didn't know existed.
/r/datasets/ sorted by upvote has some interesting ones, like Weekly Liquor Sales in Iowa.
posted by pwnguin at 10:39 PM on June 12, 2016
Oh yeah, OpenStreetMap has a lot of random shit. Someone tells me that pretty much every remaining tree in Germany is in there.
posted by pwnguin at 10:39 PM on June 12, 2016
Did you poke through the British Government's open data collection? All sorts of obscure stuff, but not really guided or explained, naturally!
posted by danteGideon at 5:06 AM on June 13, 2016
posted by waninggibbon at 10:34 PM on June 12, 2016