Where can I find data?
June 7, 2015 9:32 AM

Where can I just find spreadsheets of interesting data?

I'm working on a data visualization project, but I need some real data to put in it. I've looked everywhere, but I have no idea where people actually find their data. I tried all the gov't websites (UNESCO etc.) but found nothing that interesting (or at least, nothing I can do anything with). I'm basically just looking for sites/APIs that have interesting, correlated data that I can put in scatterplots and stuff. Preferably in CSV/XLS/Excel format, but I'm desperate for anything that can be scraped/parsed! Thanks in advance!

PS I can't use infographics or bar graphs that have already been made. I'm specifically looking for spreadsheets of numbers that I can put in my own graph.
posted by lhude sing cuccu to Computers & Internet (15 answers total) 33 users marked this as a favorite
The R package "datasets" contains a bunch of datasets... List here. If you don't know R, it's free, and it wouldn't take much googling to figure out how to get the data and export it in whatever format you need.
posted by brainmouse at 9:39 AM on June 7, 2015 [2 favorites]

The Broad Institute has a lot of publicly available data sets, but I don't know if it's the kind of thing you're looking for. You probably need to be more specific.

GEO has Gene Expression data sets, so microarrays and stuff. Again, no idea if this is appropriate for you though.
posted by shelleycat at 9:45 AM on June 7, 2015

How raw? What subject area? This is often called a "teaching dataset".
posted by unknowncommand at 9:52 AM on June 7, 2015

Canadian government data: http://open.canada.ca/en
posted by Poldo at 10:04 AM on June 7, 2015

NYC Open Data has some good data sets, exportable in convenient formats.
posted by aparrish at 10:04 AM on June 7, 2015

I like these from the Pew Research Center.
posted by pantarei70 at 10:05 AM on June 7, 2015

The Greater London Authority make a, frankly, obscene amount of data available on everything to do with the city - including a shedload of interesting data about usage of the London Underground and bus routes.

You can find it all here in the London Datastore.
posted by garius at 10:25 AM on June 7, 2015

I'm a fan of the US Bureau of Labor Statistics: http://www.bls.gov/data/
posted by ndfine at 10:26 AM on June 7, 2015

The MeFi Wiki:
Infodump and Excel
posted by Little Dawn at 10:29 AM on June 7, 2015

Lahman baseball database, which is available in CSV format. It covers the standard statistics (no sabermetric stuff, no play-by-play) for every baseball player in history (with different entries for each season).
posted by vogon_poet at 11:29 AM on June 7, 2015
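
For anyone starting from a CSV like the ones mentioned above, here is a minimal sketch of pulling two numeric columns out of a file and checking how strongly they correlate before plotting them. It uses only the Python standard library. The column names (`at_bats`, `hits`) and the sample rows are made-up placeholders for illustration, not the actual Lahman schema or real statistics; swap in whatever columns your file has.

```python
import csv
import io
import math

# Placeholder rows for illustration only: NOT real statistics,
# and the column names are hypothetical.
SAMPLE_CSV = """player_id,year,at_bats,hits
a01,2001,100,25
a01,2002,200,60
b02,2001,300,75
b02,2002,400,120
"""


def load_xy(csv_file, x_col, y_col):
    """Read two numeric columns from an open CSV file object.

    Rows where either value is blank or non-numeric are skipped.
    """
    xs, ys = [], []
    for row in csv.DictReader(csv_file):
        try:
            x = float(row[x_col])
            y = float(row[y_col])
        except (TypeError, ValueError):
            continue  # blank or non-numeric cell
        xs.append(x)
        ys.append(y)
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / (spread_x * spread_y)


# With a real file you would use: open("yourfile.csv", newline="")
xs, ys = load_xy(io.StringIO(SAMPLE_CSV), "at_bats", "hits")
r = pearson(xs, ys)
print(f"{len(xs)} points, r = {r:.3f}")
```

The `xs` and `ys` lists can go straight into whatever plotting tool you are using for the scatterplot; the correlation value is just a quick way to see whether a pair of columns is worth graphing.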

The Japanese Patent Office compiles statistics on the global Patent Prosecution Highway (.xlsx).
posted by invisible ink at 1:14 PM on June 7, 2015

The UCI datasets are aimed more at machine learning people, but they cover a range of subjects; you might find something useful there.

More generally, the various US government agencies are pretty brilliant about putting their data into the public domain. You could start with the USGS or NOAA, but simply Googling $ABBREVIATION + "datasets" is a good way to find huge quantities of CSV data. I'm away from my work machine at the moment but will check my bookmarks when I get back.
posted by Zeinab Badawi's Twenty Hotels at 6:17 AM on June 8, 2015
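
If you want to run the "$ABBREVIATION + datasets" search for a whole list of agencies at once, a rough sketch like the following builds the search URLs for you. The agency list and the helper name `dataset_search_urls` are just illustrative choices, and it only constructs URLs to open in a browser; it does not fetch or scrape anything.

```python
from urllib.parse import urlencode


def dataset_search_urls(abbreviations, base="https://www.google.com/search"):
    """Map each agency abbreviation to a search URL for '<ABBR> datasets'."""
    return {
        abbr: f"{base}?{urlencode({'q': f'{abbr} datasets'})}"
        for abbr in abbreviations
    }


# Example agencies taken from this thread; add your own.
urls = dataset_search_urls(["USGS", "NOAA", "BLS"])
for abbr, url in urls.items():
    print(abbr, url)
```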

I know you said you had looked at government data, but maybe take a second look at these?

Federal Reserve Economic Data (FRED)


US Bureau of Labor Statistics (BLS)

The data is generally dry, but if you poke around they have some interesting things. For instance:

Federal Debt, Corporate Profits After Tax, Per Capita Disposable Personal Income (FRED)

And also if you like maps they now have GeoFRED.

European Migrant Integration, Accidents at Work, all broken down by country and region (EUROSTAT)

American Time Use Survey, Mass Layoff Statistics, Union Affiliation Data (BLS)
posted by BusyBusyBusy at 4:39 AM on June 9, 2015
