Giant CSV Files Needed
October 30, 2012 4:13 AM

Where can I find publicly-available scientific data to use in an Excel project?

I need to do a project for a computer science course I'm doing that involves VBA and Excel. Basically, we need to develop a fairly complex spreadsheet application that manipulates data sets - preferably real data, and preferably a lot of it.

I'm looking for data sources that might make for an interesting project. At the moment I'm using a catalog of exoplanets, but I'm wondering if there's something else out there along similar lines that might contain more information to work with. Ideally, I'd like something that can be downloaded in CSV format.
posted by anaximander to Computers & Internet (18 answers total) 10 users marked this as a favorite
 
Some of the World Bank data sets may be of interest.
posted by xyzzy at 4:27 AM on October 30, 2012 [1 favorite]


Data dot gov has a lot.
posted by oceanjesse at 4:47 AM on October 30, 2012 [1 favorite]


If you want lots of data, you can look at Amazon's public data sets. They are only available inside the Amazon cloud, so getting them to your local machine might take some work.
posted by Idle Curiosity at 5:21 AM on October 30, 2012 [2 favorites]


The IPEDS data center lets you download zipped .csv files containing lots of data about US post-secondary educational institutions. You can choose to download pre-defined datasets, or build your own.

I looked at one of the default sets, and it had about 7500 rows and 65 columns of varying data types -- not sure if that's big enough for you. Using the custom data set generator, you could probably string together a lot more columns. Perhaps worth looking at.
posted by mean square error at 5:23 AM on October 30, 2012


Stack Exchange Creative Commons data dump
posted by katrielalex at 6:06 AM on October 30, 2012


There is a crazy amount of scientific data from microarray experiments located within the Gene Expression Omnibus at NCBI: http://www.ncbi.nlm.nih.gov/sites/GDSbrowser. I think their file formats can be easily converted to CSV.
posted by sevenyearlurk at 6:06 AM on October 30, 2012


The Enron Email Dataset might be of interest to you.

Also, if you're willing to do a bit of data pre-processing, then libraries at University of Michigan and Harvard University have released enormous datasets of their bibliographic data, too.

All three of these datasets are pretty huge. There are also googleable examples out there of people who have done cool things with the data, so that might give you some ideas.
posted by skye.dancer at 6:21 AM on October 30, 2012


Is this a self link? I guess so, but it is completely relevant. I work on the Water Quality Portal, which provides water quality data for 2 million sites and almost 200 million samples. This is pretty big data though, and you will have to scope your query to fit within Excel's limits (roughly 1 million rows).
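
If a download still ends up larger than one worksheet can hold, one option is to split the CSV before opening it. The following is only a minimal Python sketch of that idea, assuming the file has a single header row and using the 1,048,576-row worksheet limit of Excel 2007 and later. The function name `split_csv` and the sample columns are made up for illustration and are not part of any portal's tooling.

```python
import csv
import io

# Per-worksheet row limit in Excel 2007 and later (header row counts toward it).
EXCEL_MAX_ROWS = 1_048_576


def split_csv(reader, max_rows=EXCEL_MAX_ROWS):
    """Yield lists of rows; each list repeats the header and has at most max_rows rows."""
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header plus one data row")
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    chunk = [header]
    for row in reader:
        if len(chunk) == max_rows:
            yield chunk
            chunk = [header]
        chunk.append(row)
    if len(chunk) > 1:  # skip a trailing chunk that holds only the header
        yield chunk


# Tiny hypothetical sample; a real file would be opened with open(path, newline="").
sample = io.StringIO("site,value\nA,1\nB,2\nC,3\n")
chunks = list(split_csv(csv.reader(sample), max_rows=3))
for number, rows in enumerate(chunks, start=1):
    print(f"chunk {number}: {len(rows) - 1} data rows")
```

Each chunk could then be written to its own file with `csv.writer`. Narrowing the query at the source is still the better first step, since it avoids downloading data you won't use.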
posted by rockindata at 6:22 AM on October 30, 2012


Marinexplore offers 450 million datapoints collected from the world's oceanographic instruments.
posted by Egg Shen at 6:28 AM on October 30, 2012


Nat'l Center for Education Statistics is here.
posted by smirkette at 7:05 AM on October 30, 2012


FAA on-time flight performance data is here.
posted by crazycanuck at 7:31 AM on October 30, 2012


The Guardian's data section has links to a lot of publicly available data. You could also look at the Department for Education's Research and Statistics Gateway.
posted by paduasoy at 8:06 AM on October 30, 2012


The ENCODE Project Data Summary is a spreadsheet of 3776 data sets made available by the project.
posted by grouse at 8:51 AM on October 30, 2012 [1 favorite]


Scientific data? Environmental Protection Agency.
posted by croutonsupafreak at 9:25 AM on October 30, 2012


Have a look around DataDryad and see what takes your fancy. It's:

"an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Dryad enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies."

posted by roofus at 10:26 AM on October 30, 2012


The National Climatic Data Center has enough downloadable climate data (in CSV amongst other formats) to keep you busy for a long time. (6 petabytes and counting -- I believe it's all downloadable but haven't checked in detail.)
posted by pont at 4:09 PM on October 30, 2012


IMDB's reasonably big.
posted by pompomtom at 5:21 PM on October 30, 2012


Quora List
posted by vegetableagony at 2:46 PM on November 9, 2012

