Distributed DBMS
November 6, 2008 4:44 PM

I need distributed database suggestions.

One of my clients has data that needs to have high availability.

I'm looking for a database system (it doesn't have to be relational) that is geographically distributed, has redundancy, and that we can install ourselves on our own hardware.

Ideally, I could distribute different parts of the database across servers hosted at different ISPs. If one ISP temporarily goes down, the data is still available on the other servers.
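To make the layout described above concrete, here is a minimal in-memory sketch, assuming a rule (hypothetical, not taken from any product mentioned in this thread) that each key is stored on `n` servers, no two of which share an ISP. Replicas are chosen with rendezvous-style hashing so every client picks the same servers for a key, and a read succeeds as long as one replica is up. The `Server` class, `replicas_for`, `put`, and `get` are illustrative names only.

```python
import hashlib


class Server:
    """A hypothetical storage node hosted at a particular ISP."""

    def __init__(self, name, isp):
        self.name, self.isp, self.up, self.data = name, isp, True, {}


def replicas_for(key, servers, n):
    """Pick n servers for `key`, never two at the same ISP."""
    # Rank servers by a hash of (key, server name): a stable order, so every
    # client computes the same replica set for the same key.
    ranked = sorted(
        servers,
        key=lambda s: hashlib.sha1(f"{key}:{s.name}".encode()).hexdigest(),
    )
    chosen, isps = [], set()
    for s in ranked:
        if s.isp not in isps:
            chosen.append(s)
            isps.add(s.isp)
        if len(chosen) == n:
            break
    if len(chosen) < n:
        raise ValueError("not enough distinct ISPs for the requested replica count")
    return chosen


def put(key, value, servers, n=2):
    """Write to every reachable replica; return the number of acknowledgements."""
    acks = 0
    for s in replicas_for(key, servers, n):
        if s.up:
            s.data[key] = value
            acks += 1
    if acks == 0:
        raise RuntimeError("no replica reachable")
    # Note: a replica that was down misses this write; catching it up
    # afterwards (anti-entropy, conflict resolution) is not handled here.
    return acks


def get(key, servers, n=2):
    """Read from the first reachable replica that holds the key."""
    for s in replicas_for(key, servers, n):
        if s.up and key in s.data:
            return s.data[key]
    raise KeyError(key)
```

The hard part that real systems add on top of this is reconciling replicas after an outage, which is exactly where the products suggested below differ.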

Amazon's SimpleDB looks *great* for what I need, but I can't depend on only one provider for the data. If Amazon goes down, which has happened a couple of times, I will have problems.

Right now we are using MySQL replication, but we are having problems with the replica lagging far behind, and very soon that will no longer be acceptable.
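One way to keep an eye on that lag is to poll `SHOW SLAVE STATUS` on the replica and look at the `Slave_IO_Running`, `Slave_SQL_Running`, and `Seconds_Behind_Master` columns. The sketch below only classifies a row that has already been fetched; it assumes the row arrives as a dict keyed by column name (as a dict cursor in a MySQL client library would return it), and the 30-second threshold is an arbitrary placeholder, not a recommendation.

```python
def replication_health(status, max_lag_seconds=30):
    """Classify one row of MySQL's SHOW SLAVE STATUS output.

    `status` is assumed to be a dict keyed by column name; fetching it from
    the server is left out of this sketch.
    """
    # If either replication thread has stopped, the replica is not keeping up at all.
    if status.get("Slave_IO_Running") != "Yes" or status.get("Slave_SQL_Running") != "Yes":
        return "broken"
    lag = status.get("Seconds_Behind_Master")
    # MySQL reports NULL here when it cannot compute the lag.
    if lag is None:
        return "unknown"
    if int(lag) > max_lag_seconds:
        return "lagging"
    return "ok"
```

`Seconds_Behind_Master` is an approximation based on event timestamps, so it is best treated as an alarm signal rather than a precise freshness guarantee.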

(Some Oracle product probably does this, but I don't think we can afford it.)

posted by edmz to Computers & Internet (8 answers total)
Have you looked at CouchDB?
posted by nicwolff at 5:29 PM on November 6, 2008

There are lots of ideas for PostgreSQL. I haven't tried them myself, but IMHO PostgreSQL is much better constructed than MySQL; the development team does not implement half-solutions. Greenplum offers a customized high-availability version, though I'm unsure about the cost.
posted by benzenedream at 6:19 PM on November 6, 2008

My experience with Lotus Notes is about ten years old, but even back then this kind of synchronization of distributed databases was something Notes was fabulous at. You'd have to pay for it, but depending on your exact needs you might well get by with buying someone's license for an old version.

The other thought that occurs to me, again depending on exactly what you need, is monotone. monotone isn't a database product; it's actually a distributed source code management system backed by SQLite on each node. But it's designed to handle both text and binary files and to be very robust and secure. It's pretty mature, too: back in 2005 it was a candidate to replace the SCM system used for the Linux kernel, although the kernel developers decided to write their own, Git, which is the same sort of thing. Mercurial is another open source product in this category.
posted by XMLicious at 6:38 PM on November 6, 2008

benzenedream: I much prefer PostgreSQL to MySQL too, but replication is the one area where it really is kind of half-solved. The Slony engine has for years been the usual solution, but it's trigger-driven and somewhat slow. The Pg team got serious about building in a replication feature this summer, and they're working on a log-shipping solution for inclusion in 8.4, but it's aimed more at failover than at distributed operation.
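The log-shipping idea mentioned above can be illustrated with a toy model: the primary appends every change to an ordered log, and a standby replays whatever records it hasn't applied yet. This is only a sketch of the general technique; it does not reflect PostgreSQL's actual WAL format or API, and the `Primary`/`Standby` classes are hypothetical. It shows why a standby's lag is simply the number of records not yet shipped and replayed, and why one standby following one primary fits failover better than multi-site writes.

```python
class Primary:
    """Accepts writes and records each change in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered change records, in the spirit of a write-ahead log

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Standby:
    """Read-only copy that replays the primary's log in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log records replayed so far

    def catch_up(self, log, max_records=None):
        """Replay unseen records, optionally only a limited batch of them."""
        end = len(log) if max_records is None else min(len(log), self.applied + max_records)
        for key, value in log[self.applied:end]:
            self.data[key] = value
        self.applied = end

    def lag(self, log):
        """How many records the standby still has to replay."""
        return len(log) - self.applied
```

For example, after three writes on the primary and a `catch_up` limited to two records, the standby is one record behind until the next `catch_up` call.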
posted by nicwolff at 7:01 PM on November 6, 2008

It might be an abuse of the technology, but ICE could do what you want if you use it in ways it wasn't designed for. Depending on how much data will be pushed into the data store, it might work.
posted by krisak at 5:25 AM on November 7, 2008

It sounds like Drizzle is what you're after:
The Drizzle project is building a database optimized for Cloud and Net applications. It is being designed for massive concurrency on modern multi-cpu/core architecture. The code is originally derived from MySQL.
I'm not sure how old the project is or how complete it is; I haven't even used it. Moreover, Ohloh can't read its bzr format. (Incidentally, bzr is another SCM of the kind XMLicious mentioned.)
posted by pwnguin at 7:01 AM on November 7, 2008

If I'm understanding you correctly, Lotus Notes/Domino is perfect for this. Besides providing fail-over server capability (via "clustering") and distributing data subsets (via "replication formulas" and "Readers fields") between servers & clients, a Lotus Notes system lets users work off-line, disconnected from any server, and sync-up their data later when they're able. Dunno if this last part appeals to your particular application, but for many applications (sales force automation, for example), it's critical. There are a number of service providers you can rent server space from if you don't want to maintain your own infrastructure.

Disclaimer: Lotus Notes application development has been my main gig for over 10 years.
posted by LordSludge at 7:13 AM on November 7, 2008

I'll take a different tack: have you worked with anyone to improve your MySQL replication, or have you validated that your distance, data volume, and freshness requirements are beyond what normal MySQL installations are capable of?

Anything short of distributed transactions that update all the databases in one go will inevitably have some latency before changes show up on all the systems.
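A distributed transaction of the kind described above is classically done with two-phase commit: every database first stages the change and votes, and the change is committed everywhere only if all of them vote yes. The sketch below is an in-memory illustration under simplifying assumptions (the `Participant` class and its `reachable` flag are hypothetical, and there is no durable logging or coordinator crash recovery, which a real implementation needs).

```python
class Participant:
    """A hypothetical database node taking part in a distributed transaction."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.data = {}
        self.pending = None

    def prepare(self, key, value):
        # Phase 1: stage the change and vote yes; an unreachable node cannot vote.
        if not self.reachable:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        # Phase 2 (success): make the staged change visible.
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def abort(self):
        # Phase 2 (failure): discard any staged change.
        self.pending = None


def two_phase_commit(participants, key, value):
    """Apply the write on every participant, or on none of them."""
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The trade-off is visible even in this toy: every write waits for the slowest site, and a single unreachable ISP blocks all writes, which works against the availability goal in the question. That is why most geographically distributed systems accept some replication lag instead.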
posted by mmascolino at 7:39 AM on November 7, 2008
