Database dumps
March 2, 2011 4:41 PM

Wikipedia makes its entire contents available in a (gargantuan) single downloadable file: the Wikipedia Dump. What are some other websites or public databases that do this?

I am trying to convince the operator of a public information database to release its entire contents as a single file or archive so I (and others) can play with the data locally. They should be open to the idea since their mandate is to make these data available to everyone, but it would help if I could point to as many examples as possible of databases that let the public access their contents like this.

P.S. Ripping the db myself is not an option.

P.P.S. I know about the Mefi Infodump. I'm more interested in databases that make their entire contents available in a single file or archive.
posted by hayvac to Technology (11 answers total) 22 users marked this as a favorite
 
CiteSeer used to offer a single giant XML file with their millions of bibliographic entries. Don't know if they still do.

The Internet Archive sort of fits the bill. They don't release their whole DB in a single file (it would be too big), but their whole purpose is to make all of their archives available to everybody.

You haven't said what kind of data you've got, but I would urge you to consider the privacy implications before releasing a DB. If there's ANY information that's specific to individuals, then it can often be used to identify them, even if actual identifiers are anonymized.
posted by qxntpqbbbqxl at 4:48 PM on March 2, 2011 [1 favorite]


IMDB
posted by pompomtom at 4:58 PM on March 2, 2011 [1 favorite]


PubChem does this, though it's not all in a single file.
posted by invitapriore at 5:04 PM on March 2, 2011


data.gov
posted by grouse at 5:15 PM on March 2, 2011 [1 favorite]


Discogs.com (artist, label and release information, from 2008 to the present). I think this is the page for the similar MusicBrainz db, but I could be wrong. Of course, FreeDB's database is also free.
posted by filthy light thief at 5:42 PM on March 2, 2011


BoingBoing
posted by travis08 at 5:56 PM on March 2, 2011


Infochimps's entire purpose is to catalog such dumps of data; check it out.
posted by neustile at 6:30 PM on March 2, 2011 [1 favorite]


Stack Exchange, which has data for sites such as Stack Overflow and Server Fault.
posted by austinetsu at 7:01 PM on March 2, 2011


OpenStreetMap has this; their most recent weekly dump is about 15GB compressed.
posted by teraflop at 7:57 PM on March 2, 2011 [2 favorites]


CKAN is a registry of (mostly) open datasets.
posted by metaquarry at 5:22 AM on March 3, 2011


MusicBrainz has their entire database of artists and their music available. I also second the InfoChimps recommendation.

DataMarket has 100 million time series available from various sources.
posted by cheerleaders_to_your_funeral at 9:02 AM on March 3, 2011 [1 favorite]

