MySQL vs PostgreSQL?
May 22, 2008 4:32 PM Subscribe
What's the hive mind's opinion on MySQL vs PostgreSQL?
In a web dev shop, rebuilding the platform. It seems like at MySQL 5.2, the differences have become rather negligible, but I am still interested in your experiences. Currently we're using MySQL (5.0 though), but I've always heard wonderful things about Postgres's robustness and speed.
Everything I find on google for 'mysql vs postgresql' seems to date bate to 2003.
In a web dev shop, rebuilding the platform. It seems like at MySQL 5.2, the differences have become rather negligible, but I am still interested in your experiences. Currently we're using MySQL (5.0 though), but I've always heard wonderful things about Postgres's robustness and speed.
Everything I find on google for 'mysql vs postgresql' seems to date bate to 2003.
I don't know much about the subject, but people I know who do all seem to favour PostGreSQL. This exhaustive wikipedian feature comparison might be of interest. (Another article I read recently, plus a poorly edited wiki comparison.)
posted by Luddite at 4:47 PM on May 22, 2008
posted by Luddite at 4:47 PM on May 22, 2008
I know something of them both, I guess you could say.
MySQL version 5.0 is stable -- there is no 5.2. It goes 5.0, 5.1 (frozen), 6.0 (active devel).
You don't find very much since 2003 because there's no good reason to choose one over the other these days, except your other 3rd-party software and perhaps your preference for the other very-minor variation and which community you like more.
posted by cmiller at 4:51 PM on May 22, 2008
MySQL version 5.0 is stable -- there is no 5.2. It goes 5.0, 5.1 (frozen), 6.0 (active devel).
You don't find very much since 2003 because there's no good reason to choose one over the other these days, except your other 3rd-party software and perhaps your preference for the other very-minor variation and which community you like more.
posted by cmiller at 4:51 PM on May 22, 2008
MySQL is easier to start using.
Nicer UI tools. Faster, if you don't use ACID. More tolerant of invalid data. Autoincrement columns are as easy as typing autoincrement.
Posgtres is much more standards compliant, but it's uglier and more complicated, especially from a UI perspective. It requires manual vacuuming, and actually enforces referential integrity (which is a great thing that can be a pain in the ass). Autoincrement is much more flexible, but requires sequences, and wait, what's an OID?
So if you don't really know or care much about databases, data validity, ACID compliance, etc, but you do care about ease and speed, you tend to go with MySQL.
Too many "web programmers" know a lot about "web 2.0" or PHP or Java, but don't know much about database theory or practice ("an index? what's that?"). They tend to see a database as just a fancy hashtable or bag of data, and indeed one that's not anywhere as dynamically changeable or forgiving as a hashtable.
For these folks, MySQL -- because until 5.0 it wasn't really an RDBMS, and in many ways still is not -- is a godsend. It's "faster" than the competition, and doesn't "waste time" on "esoteric" database stuff a web programmer doesn't want, understand, or see the value of.
For somebody with a database background, on the other hand, MySQL is a minefield: stuff that should work (complicated views, group bys, order bys in group bys) may work or may if you're lucky crash the server, or if you're unlucky just give results with incorrect data.
I've spent days working around some of these things in admittedly complicated by not extraordinarily complex views and group bys.
And MySQL isn't really faster. If you're using InnoDb tables for ACID (or just because at more than 30 Million rows, MyIsam tables tend to get crappy), yes a straight one-table select is probably faster than in pg. But add in joins, and pg is suddenly significantly faster. (MySQL is especially bad at inner joins.)
In summary: if to you the database is a bag, if you never intend to do data mining or reporting, if you're mostly interested in serving up big hunks of text with few relations or updates-- that is, if you're using a database to power a blog, MySQL is a great choice.
But if you're actually managing data, if you understand that data lives longer than front-end programs and middle-tier business rules, if you need the features of a real databse, use pg.
A "web programmer" who has decided all his table structures can be auto-generated by Hibernate (or some other ORM) looks at that and says, "too complicated" and "I bet complicated means more cost and slower speed" and so he goes with MySQL.
As I said, pg is far superior, and I hate dicking with MySQL bizarre bugs, and I think that overall pg performance is probably better than MySQL for any even slightly complicated query.
But MySQL makes things look (deceptively) simple, so you get a lot of people who don't really understand database design figuring that MySQL is a great choice.
Use pg. It's consistent, it's reliable, it's standards-compliant, it's faster on (even moderately) complicated queries, it doesn't completely throw off your schedule with weird bugs.
posted by orthogonality at 5:22 PM on May 22, 2008 [19 favorites]
Nicer UI tools. Faster, if you don't use ACID. More tolerant of invalid data. Autoincrement columns are as easy as typing autoincrement.
Posgtres is much more standards compliant, but it's uglier and more complicated, especially from a UI perspective. It requires manual vacuuming, and actually enforces referential integrity (which is a great thing that can be a pain in the ass). Autoincrement is much more flexible, but requires sequences, and wait, what's an OID?
So if you don't really know or care much about databases, data validity, ACID compliance, etc, but you do care about ease and speed, you tend to go with MySQL.
Too many "web programmers" know a lot about "web 2.0" or PHP or Java, but don't know much about database theory or practice ("an index? what's that?"). They tend to see a database as just a fancy hashtable or bag of data, and indeed one that's not anywhere as dynamically changeable or forgiving as a hashtable.
For these folks, MySQL -- because until 5.0 it wasn't really an RDBMS, and in many ways still is not -- is a godsend. It's "faster" than the competition, and doesn't "waste time" on "esoteric" database stuff a web programmer doesn't want, understand, or see the value of.
For somebody with a database background, on the other hand, MySQL is a minefield: stuff that should work (complicated views, group bys, order bys in group bys) may work or may if you're lucky crash the server, or if you're unlucky just give results with incorrect data.
I've spent days working around some of these things in admittedly complicated by not extraordinarily complex views and group bys.
And MySQL isn't really faster. If you're using InnoDb tables for ACID (or just because at more than 30 Million rows, MyIsam tables tend to get crappy), yes a straight one-table select is probably faster than in pg. But add in joins, and pg is suddenly significantly faster. (MySQL is especially bad at inner joins.)
In summary: if to you the database is a bag, if you never intend to do data mining or reporting, if you're mostly interested in serving up big hunks of text with few relations or updates-- that is, if you're using a database to power a blog, MySQL is a great choice.
But if you're actually managing data, if you understand that data lives longer than front-end programs and middle-tier business rules, if you need the features of a real databse, use pg.
A "web programmer" who has decided all his table structures can be auto-generated by Hibernate (or some other ORM) looks at that and says, "too complicated" and "I bet complicated means more cost and slower speed" and so he goes with MySQL.
As I said, pg is far superior, and I hate dicking with MySQL bizarre bugs, and I think that overall pg performance is probably better than MySQL for any even slightly complicated query.
But MySQL makes things look (deceptively) simple, so you get a lot of people who don't really understand database design figuring that MySQL is a great choice.
Use pg. It's consistent, it's reliable, it's standards-compliant, it's faster on (even moderately) complicated queries, it doesn't completely throw off your schedule with weird bugs.
posted by orthogonality at 5:22 PM on May 22, 2008 [19 favorites]
[Postgres] requires manual vacuuming
Not since version 8.1 - we're at 8.3 now.
Otherwise, what you said; Postgres is much more complete, compliant, reliable, scalable, and mature than MySQL. Also, INSERT...RETURNING, yay.
posted by nicwolff at 5:41 PM on May 22, 2008
Not since version 8.1 - we're at 8.3 now.
Otherwise, what you said; Postgres is much more complete, compliant, reliable, scalable, and mature than MySQL. Also, INSERT...RETURNING, yay.
posted by nicwolff at 5:41 PM on May 22, 2008
Autoincrement columns [in MySQL] are as easy as typing autoincrement
And autoincrement columns in Postgres are as easy as typing "serial".
posted by nicwolff at 5:43 PM on May 22, 2008
And autoincrement columns in Postgres are as easy as typing "serial".
posted by nicwolff at 5:43 PM on May 22, 2008
nicwolff writes "Not since version 8.1 - we're at 8.3 now."
and writes "And autoincrement columns in Postgres are as easy as typing 'serial'."
Yeah, sorry, I was trying to explain how pg got a (deserved) rep as being less user friendly . Crap. I copied my own comment from reddit, where I went through the same thing:
and writes "And autoincrement columns in Postgres are as easy as typing 'serial'."
Yeah, sorry, I was trying to explain how pg got a (deserved) rep as being less user friendly . Crap. I copied my own comment from reddit, where I went through the same thing:
Yes, as I mentioned, sequences are much more flexible than autoincrement: I can use the same sequence for more than one table (as say, two concrete derived types of one abstract super type).posted by orthogonality at 5:57 PM on May 22, 2008
And yes, lately there's a vacuuming daemon.
If referential integrity is important to you, go for Postgres, but both will have their “gotchas.” If cheap shared hosting is important, Postgres may be harder to support, but there are options. I started on pgsql, but have used MySQL for the past 4-5 years. I’d prefer to go back to pgsql.
posted by ijoshua at 5:58 PM on May 22, 2008
posted by ijoshua at 5:58 PM on May 22, 2008
I agree with everything orthogonality said. MySQL does have the advantages that it's more user-friendly, plus it's got much better docs (last I looked) and a larger userbase (meaning a more active user community, helpful when you have questions, and more third-party tools), but its weird quirks and eccentricities always seem to wind up costing me time in the long run.
For the sort of tasks that I would tend to favor MySQL for - simple schemas and quick prototyping, mostly - I tend to use single-file databases like sqlite these days anyways, since they generally don't require installing an entire complex application.
posted by whir at 6:12 PM on May 22, 2008 [2 favorites]
For the sort of tasks that I would tend to favor MySQL for - simple schemas and quick prototyping, mostly - I tend to use single-file databases like sqlite these days anyways, since they generally don't require installing an entire complex application.
posted by whir at 6:12 PM on May 22, 2008 [2 favorites]
My personal opinion is that if you want to use MySQL and don't specifically know why, you should use SQLite. SQLite fills that niche of "basically a hash table or some other place to dump some data" that just happens to use SQL syntax, which makes it easier for those of us who know said syntax. And it doesn't require a server process or any of that junk. MySQL has a reputation for being fast precisely because it doesn't do all that Postgres does (and Postgres has essentially caught up in the speed department), but SQLite is faster because it does so little.
Basically what I'm saying is that if you want an actual database, go with PostgreSQL. If you don't, use SQLite. Forget MySQL unless you have a specific reason to use it, such as having no other option with your web host. (or if you just really, really love phpmyadmin..I think using the psql command line tool's help is better, but that's just me)
posted by wierdo at 8:14 PM on May 22, 2008
Basically what I'm saying is that if you want an actual database, go with PostgreSQL. If you don't, use SQLite. Forget MySQL unless you have a specific reason to use it, such as having no other option with your web host. (or if you just really, really love phpmyadmin..I think using the psql command line tool's help is better, but that's just me)
posted by wierdo at 8:14 PM on May 22, 2008
Mysql is to MS Access as Postgres is to SQL Server 2000.
MySQL is a toy, but it's a pretty good toy for doing things with the web. It just gets really, really hairy if you're trying to do big data things like a business system or data warehouse in it.
posted by SpecialK at 9:31 PM on May 22, 2008
MySQL is a toy, but it's a pretty good toy for doing things with the web. It just gets really, really hairy if you're trying to do big data things like a business system or data warehouse in it.
posted by SpecialK at 9:31 PM on May 22, 2008
+1 to orthogonality's answer...
posted by russm at 9:46 PM on May 22, 2008 [1 favorite]
posted by russm at 9:46 PM on May 22, 2008 [1 favorite]
mySQL supports referential integrity if you use InnoDB tables.
posted by missmagenta at 11:11 PM on May 22, 2008
posted by missmagenta at 11:11 PM on May 22, 2008
PostgreSQL is technically better, but MySQL is a bit easier and more widely used/supported. If you need to deploy clients' sites on existing hosting, and use ready-made PHP apps, then MySQL will give you fewer problems and is a sensible option, whereas if you have full control over the servers & code and have the kind of app & data that would benefit from PostgreSQL's features/performance then give it a try.
posted by malevolent at 11:32 PM on May 22, 2008
posted by malevolent at 11:32 PM on May 22, 2008
I agree with the prevailing opinions here. Postgres also has a number of established, useful extensions (such as PostGIS for spatial data). Most web devs don't need this, but depending on your problem domains, one of these may come in handy.
posted by devilsbrigade at 11:43 PM on May 22, 2008
posted by devilsbrigade at 11:43 PM on May 22, 2008
Just another data point. I work in the Financial Services industry. In this industry, you either have loads of money and can afford to pay Oracle and Microsoft their fees, or you're living in the real world and have to dip into the open-source kool-aid drink mix every now and again because, frankly, we enjoy a broader diet than Shin Ramen noodles every other night.
That said, we use Postgres. I imagine the only reason we haven't committed ourselves 100% to it isn't because of any perceived "lack of support" (there's plenty) but because the stored procedure language is so different than what MS/Oracle people are used to, which means you can't migrate shit without wasting months in testing. See, folks, that's how they getcha.
posted by Civil_Disobedient at 3:15 AM on May 23, 2008
That said, we use Postgres. I imagine the only reason we haven't committed ourselves 100% to it isn't because of any perceived "lack of support" (there's plenty) but because the stored procedure language is so different than what MS/Oracle people are used to, which means you can't migrate shit without wasting months in testing. See, folks, that's how they getcha.
posted by Civil_Disobedient at 3:15 AM on May 23, 2008
Just to clarify the above: they meaning for-profit DBs (Sybase/MS, Oracle), getcha meaning having to pay up your firstborn for software you never even really get to own, how meaning those friendly 1-seat free "developers" versions they hand out to get your developers hooked like crack fiends.
Also minor correction: It's MS and Sybase that share the TSQL stored procedure language, not Oracle, which uses its own proprietery language (PL/SQL).
posted by Civil_Disobedient at 3:25 AM on May 23, 2008
Also minor correction: It's MS and Sybase that share the TSQL stored procedure language, not Oracle, which uses its own proprietery language (PL/SQL).
posted by Civil_Disobedient at 3:25 AM on May 23, 2008
# mySQL supports referential integrity if you use InnoDB tables.
Not really. InnoDB supports ACID transactions, and doesn’t ignore referential constraints wholesale the way MyISAM does, but if you read through my link above, each of the “gotchas” listed do affect InnoDB tables. This bug report also shows that you cannot rely on triggers to be invoked via foreign key constraints. If you care about referential integrity, MySQL will let you down.
Also, MySQL doesn’t support INTERSECT or MINUS, some of the most basic set operations.
posted by ijoshua at 5:42 AM on May 23, 2008
Not really. InnoDB supports ACID transactions, and doesn’t ignore referential constraints wholesale the way MyISAM does, but if you read through my link above, each of the “gotchas” listed do affect InnoDB tables. This bug report also shows that you cannot rely on triggers to be invoked via foreign key constraints. If you care about referential integrity, MySQL will let you down.
Also, MySQL doesn’t support INTERSECT or MINUS, some of the most basic set operations.
posted by ijoshua at 5:42 AM on May 23, 2008
My opinion, as a sysadmin and systems architect.
The stuff I've seen running on MySQL tends to be Ad-Hackery. There are exceptions, but MySQL's primary feature is "I'm easy!"
The stuff I've seen running on Postgres seems to be much more solid. I suspect the reason is that only real DBAa and data modelers are willing to learn Postgres, because they understand the technology advantages that Postgres has. General developers don't, so they don't see the win in stepping to Postgres, and they don't.
MySQL seems to be fine in basic select situations, but in complicated scenarios, I've seen more than one developer trying to figure out what the heck MySQL just did. This seems, well, bad -- if MySQL is throwing curveballs at you, how do you develop reliable and scalable applications?
There is a real reason that Microsoft, Oracle and IBM charge what they do for their production SQL servers. It is *not* a trivial problem.
So: If you are not dealing with a trivial problem set, I suspect Postgres will be a better answer, but there will be a ramp-up cost in learning it. This is true, BTW, of *any* major infrastructure change.
posted by eriko at 7:20 AM on May 23, 2008
The stuff I've seen running on MySQL tends to be Ad-Hackery. There are exceptions, but MySQL's primary feature is "I'm easy!"
The stuff I've seen running on Postgres seems to be much more solid. I suspect the reason is that only real DBAa and data modelers are willing to learn Postgres, because they understand the technology advantages that Postgres has. General developers don't, so they don't see the win in stepping to Postgres, and they don't.
MySQL seems to be fine in basic select situations, but in complicated scenarios, I've seen more than one developer trying to figure out what the heck MySQL just did. This seems, well, bad -- if MySQL is throwing curveballs at you, how do you develop reliable and scalable applications?
There is a real reason that Microsoft, Oracle and IBM charge what they do for their production SQL servers. It is *not* a trivial problem.
So: If you are not dealing with a trivial problem set, I suspect Postgres will be a better answer, but there will be a ramp-up cost in learning it. This is true, BTW, of *any* major infrastructure change.
posted by eriko at 7:20 AM on May 23, 2008
I used to do large scale packet analysis by converting everything to SQL and running queries against a fully RAM-backed database. MySQL would return answers in seconds, PostgreSQL in minutes.
Sure, if you're trying to use all the hairy modes of SQL, PostgreSQL is superior. If your queries look very much like select * from foo where a>100 and b="timmeh", MySQL is often the obvious choice.
posted by effugas at 11:16 AM on May 23, 2008
Sure, if you're trying to use all the hairy modes of SQL, PostgreSQL is superior. If your queries look very much like select * from foo where a>100 and b="timmeh", MySQL is often the obvious choice.
posted by effugas at 11:16 AM on May 23, 2008
I don't know that we should let the "speed" argument for MySQL go unchecked. My understanding is that Postgres is more tuneable to match your performance needs than MySQL, and in many cases the speed differences can be made up (plus some, even) with tuning. Don't quote me on that, though.
posted by abcde at 1:58 PM on May 23, 2008
posted by abcde at 1:58 PM on May 23, 2008
Oh, and this goes for everyone: If you're starting a new MySQL instance, please for the love of all that's good, turn on strict "sql_mode". Please. Think of the children!
posted by cmiller at 3:52 PM on May 23, 2008
posted by cmiller at 3:52 PM on May 23, 2008
Some good answers here, but I feel there's more to it than this. I'm a PostgreSQL user who have thinks of MySQL a bit like Linux or Mac people think about Windows users — I pity them and cannot for the life of me fathom why they are putting up with the crap.
Some background is needed. It is worth pointing out that MySQL is extremely popular, and that MySQL's popularity precedes its maturity: that is, MySQL was popular long before InnoDB, long before Oracle saw it as a serious competitor, long before it was available as part of every ISP's virtual server config. MySQL grew up with Linux: A much-hyped concept a few years back was LAMP (Linux, Apache, MySQL, PHP), a stack of free software that was easy to get up and running with as an newbie developer. In this environment, people chose things like the LAMP components because they were free and simple, whereas Windows and Oracle were not. Of course, the momentum of a crowd means that MySQL expertise is easy to come by.
PostgreSQL has lagged in popularity in part due to its reputation as a slow, difficult if reliable database; when MySQL works and does the trick, why bother? Back in the early 1990s, when 7.4 was the reigning version of PostgreSQL and the development team had barely come off the stage where they fully understood the whole codebase (PostgreSQL was inherited from Postgre95, which came from Michael Stonebraker's original Postgres project at Berkeley), this was absolutely true; sure, as a database it had a level of maturity unmatched by MySQL — real transactions, triggers, foreign key integrity, table inheritance, some interesting data types — but it was slow and required occasional manual "vacuuming".
PostgreSQL is no longer slow. The core team has since been able to make huge leaps in performance and usability, and now exceeds MySQL in its ability to scale. In fact, if you compare the history of PostgreSQL and MySQL, you will see that PostgreSQL started out with great features and crap performance; MySQL with great performance and crap features, and both have been approaching each other from opposite end of the scale. Now MySQL has been piling on all the features it needs to compete with PostgreSQL, Oracle, SQL Server and so on, all at the expense of performance; while PostgreSQL has been able to spent all that time carefully optimizing the performance of their existing feature set.
For some benchmarks showing how PostgreSQL scales past MySQL on SMP boxes under heavy load, see here and here (particularly this comparison).
I'm not sure why PostgreSQL is known as being "hard to use". On Debian, installing PostgreSQL is a matter of doing "aptitude install postgresql-8.2". Maybe it's because MySQL's SQL prompt is called "mysql", whereas PostgreSQL's is called "psql", and nobody likes reading manuals. Personally I always have a hard time setting up MySQL because the root user has a separate password and you don't use "mysqladmin" to set up a new user — you have to connect as root and run "grant all privileges". I mean, huh? What does "grant" have to do with creating users?
All right, so what do I think are reasons to use PostgreSQL over MySQL? Here's my list.
PostgreSQL has an open-source spirit. It's a BSD license, which means anyone can (and does) use and customize it for commercial projects. The project is run by a foundation, whereas MySQL is a purely commerical ventrue. There are plenty of commercial players in the PostgreSQL arena.
The database kernel is stable and well-defined. MySQL has several "pluggable storages", each with its own idiosynchacies; for example, MyISAM does not support transactions or referential integrity, and each storage has different limitations on columns lengths, indexing behaviour and so on. PostgreSQL has one database architecture and focuses on implementing it really well.
PostgreSQL has modern data types. It has unlimited-length string ("text") and binary ("bytea") datatypes; MySQL has "varchar" which must be defined with an explicit length — sure, there's the "text" datatype, but it's stored as a "blob" outside the table and suffers from a bunch of typical MySQL limitations. Ever noticed how Digg headlines are always truncated mid-word? That's MySQL for you. PostgreSQL also supports date/time values with embedded time zones, as well as time intervals; the latter allows you to store something like "2 weeks" in a column, and then check whether a given timestamp is within that span. PostgreSQL also has sequences (see blow).
Sequences. MySQL has a crummy "autoincrement" type and there can be only one per table. PostgreSQL has Oracle-style sequence objects, which are like named, persistent counters. Each can have a starting count and a step. And a table can have multiple columns all fed from different sequences.
PostgreSQL rows are practically unlimited in size. MySQL has a 64KB row limit. PostgreSQL's is around 2GB. You won't care until you hit the limit, and then you'll kick yourself.
MySQL is a moving target. PostgreSQL's developers don't play loose and fast with how their database behaves. For example, MySQL has traditionally been very lax about how it treats your data — insert 100 characters into a "text(50)" column and you lose 50 characters silently. They had to invent an optional "strict" mode to solve such legacy problems, but wait! In 5.1.24, which is a minor version, strict mode is always enforced for such columns! PostgreSQL has always been careful about preserving behaviour and not surprising you.
PostgreSQL has transactional DDL. This one is really nice. So DDL is "data definition language", ie. everything that touches your schemas: "create table", "drop table" etc. are DDL statements. PostgreSQL is one of the few databases where you can execute these statements inside a transaction; MySQL can't do it, Oracle can't do it, SQL Server can't do it, DB2 can't do it. Ingres can, and that's the only other SQL database I know that can. This is incredibly useful when upgrading to new application versions because you can make sure that everything in your migration works or else is rolled back, and you can keep your application running at the same time. We always wrap our Rails migrations in transactions, the app runs live while we do it.
PostgreSQL has a sensible user account model. PostgreSQL can authenticate logins using things like PAM — in other words, when you connect it can check your password against your Linux user account. PostgreSQL has a "real" security model, instead of just a table with names and hashed passwords; to change a password you use an "alter user" command.
PostgreSQL has a plugin architecture. You can write stuff in C that is embedded into the database. People have created nice extensions with this system: PostGIS is a big one, but there are also contributed plugins for things like calculating earth distance, working with matrices, working with tree structures, etc.
PostgreSQL has PostGIS. PostGIS, which implements the OGC model, is pretty much the de facto GIS system in the open source world as well as in science. There's nothing quite like it for MySQL, Oracle, MS SQL Server or IBM DB2. Sure, they all provide geographic extensions, but they're not as nice as PostGIS. MySQL implements part of OGC and allows indexing, but again the different storages behave differently: With MyISAM you get proper R-trees as with PostgreSQL, with InnoDB you get useless B-trees which only work for exact searches (and not, say, "select everything with 20km of..."-type queries).
PostgreSQL has no legacy cruft. MySQL has tons of weird stuff that is being crudely painted over with "strict modes" or conflicting bug fixes. PostgreSQL has no ugly features that exist for legacy reasons.
PostgreSQL has hardly any known gotchas. MySQL has tons of gotchas. Lots of ugly behaviour, much of legacy, much of it intentional due to the multiple storage backends offered.
PostgreSQL has a real optimizer. PostgreSQL's cost-based optimizer uses genetic algorithms to quickly compute the best table plan for your query, and collects statistics about column value distribution to choose the right indexes and operators; for example, if a table is very small, a simple sequential scan (search table from top to bottom looking for the value) is the most efficient algorithm, but if the table is large, an index (typically a B-tree structure) should be used. But MySQL doesn't even have an optimizer. Sure, MySQL looks at whether you're using indexed columns and so on, but that's about it. It doesn't know how the result set is going to be, so it lets you provide yummy hints like "select sql_big_result ... from ..." to tell the database what it already knows.
PostgreSQL has neat index parameters. You can create functional indexes, eg. "create index ... on people (lowercase(name))" and PostgreSQL will understand when to use the index (in this case, "select ... where lowercase(name) = 'bob'") to speed up queries. You can also create conditional indexes: "create index ... on people (name) where (employer_id is null)". This also optimizes that particular query.
PostgreSQL has pluggable languages. You can create triggers or functions in Python, Ruby, etc. and they run "inside" the database. A lot of people choose to use to run much of their business logic inside the database this way; it's an acquired taste.
Of course, MySQL has a bunch of things going for it. NDB, the in-memory database storage they purchased and integrated into MySQL, is neat and could be useful for certain applications. MySQL has a less invasive way to implement partitioning that resembles Oracle's; PostgreSQL requires much more work. MySQL has updateable views; so does PostgreSQL, but you have to write the update rules manually.
MySQL also has replication built in, making it super easy to set up several read-only nodes; with PostgreSQL you have to install Slony, Bucardo or any of the other third-party products. The benefit is that there's a market for replication products that compete on quality; the downside is that they're mostly trigger-based and invasive, and not at all easy to set up.
In the end, choosing a database is not a light decision. I suggest downloading both and giving them a go. Good luck.
posted by gentle at 12:03 PM on May 24, 2008 [10 favorites]
Some background is needed. It is worth pointing out that MySQL is extremely popular, and that MySQL's popularity precedes its maturity: that is, MySQL was popular long before InnoDB, long before Oracle saw it as a serious competitor, long before it was available as part of every ISP's virtual server config. MySQL grew up with Linux: A much-hyped concept a few years back was LAMP (Linux, Apache, MySQL, PHP), a stack of free software that was easy to get up and running with as an newbie developer. In this environment, people chose things like the LAMP components because they were free and simple, whereas Windows and Oracle were not. Of course, the momentum of a crowd means that MySQL expertise is easy to come by.
PostgreSQL has lagged in popularity in part due to its reputation as a slow, difficult if reliable database; when MySQL works and does the trick, why bother? Back in the early 1990s, when 7.4 was the reigning version of PostgreSQL and the development team had barely come off the stage where they fully understood the whole codebase (PostgreSQL was inherited from Postgre95, which came from Michael Stonebraker's original Postgres project at Berkeley), this was absolutely true; sure, as a database it had a level of maturity unmatched by MySQL — real transactions, triggers, foreign key integrity, table inheritance, some interesting data types — but it was slow and required occasional manual "vacuuming".
PostgreSQL is no longer slow. The core team has since been able to make huge leaps in performance and usability, and now exceeds MySQL in its ability to scale. In fact, if you compare the history of PostgreSQL and MySQL, you will see that PostgreSQL started out with great features and crap performance; MySQL with great performance and crap features, and both have been approaching each other from opposite end of the scale. Now MySQL has been piling on all the features it needs to compete with PostgreSQL, Oracle, SQL Server and so on, all at the expense of performance; while PostgreSQL has been able to spent all that time carefully optimizing the performance of their existing feature set.
For some benchmarks showing how PostgreSQL scales past MySQL on SMP boxes under heavy load, see here and here (particularly this comparison).
I'm not sure why PostgreSQL is known as being "hard to use". On Debian, installing PostgreSQL is a matter of doing "aptitude install postgresql-8.2". Maybe it's because MySQL's SQL prompt is called "mysql", whereas PostgreSQL's is called "psql", and nobody likes reading manuals. Personally I always have a hard time setting up MySQL because the root user has a separate password and you don't use "mysqladmin" to set up a new user — you have to connect as root and run "grant all privileges". I mean, huh? What does "grant" have to do with creating users?
All right, so what do I think are reasons to use PostgreSQL over MySQL? Here's my list.
PostgreSQL has an open-source spirit. It's a BSD license, which means anyone can (and does) use and customize it for commercial projects. The project is run by a foundation, whereas MySQL is a purely commerical ventrue. There are plenty of commercial players in the PostgreSQL arena.
The database kernel is stable and well-defined. MySQL has several "pluggable storages", each with its own idiosynchacies; for example, MyISAM does not support transactions or referential integrity, and each storage has different limitations on columns lengths, indexing behaviour and so on. PostgreSQL has one database architecture and focuses on implementing it really well.
PostgreSQL has modern data types. It has unlimited-length string ("text") and binary ("bytea") datatypes; MySQL has "varchar" which must be defined with an explicit length — sure, there's the "text" datatype, but it's stored as a "blob" outside the table and suffers from a bunch of typical MySQL limitations. Ever noticed how Digg headlines are always truncated mid-word? That's MySQL for you. PostgreSQL also supports date/time values with embedded time zones, as well as time intervals; the latter allows you to store something like "2 weeks" in a column, and then check whether a given timestamp is within that span. PostgreSQL also has sequences (see blow).
Sequences. MySQL has a crummy "autoincrement" type and there can be only one per table. PostgreSQL has Oracle-style sequence objects, which are like named, persistent counters. Each can have a starting count and a step. And a table can have multiple columns all fed from different sequences.
PostgreSQL rows are practically unlimited in size. MySQL has a 64KB row limit. PostgreSQL's is around 2GB. You won't care until you hit the limit, and then you'll kick yourself.
MySQL is a moving target. PostgreSQL's developers don't play loose and fast with how their database behaves. For example, MySQL has traditionally been very lax about how it treats your data — insert 100 characters into a "text(50)" column and you lose 50 characters silently. They had to invent an optional "strict" mode to solve such legacy problems, but wait! In 5.1.24, which is a minor version, strict mode is always enforced for such columns! PostgreSQL has always been careful about preserving behaviour and not surprising you.
PostgreSQL has transactional DDL. This one is really nice. So DDL is "data definition language", ie. everything that touches your schemas: "create table", "drop table" etc. are DDL statements. PostgreSQL is one of the few databases where you can execute these statements inside a transaction; MySQL can't do it, Oracle can't do it, SQL Server can't do it, DB2 can't do it. Ingres can, and that's the only other SQL database I know that can. This is incredibly useful when upgrading to new application versions because you can make sure that everything in your migration works or else is rolled back, and you can keep your application running at the same time. We always wrap our Rails migrations in transactions, the app runs live while we do it.
PostgreSQL has a sensible user account model. PostgreSQL can authenticate logins using things like PAM — in other words, when you connect it can check your password against your Linux user account. PostgreSQL has a "real" security model, instead of just a table with names and hashed passwords; to change a password you use an "alter user" command.
PostgreSQL has a plugin architecture. You can write stuff in C that is embedded into the database. People have created nice extensions with this system: PostGIS is a big one, but there are also contributed plugins for things like calculating earth distance, working with matrices, working with tree structures, etc.
PostgreSQL has PostGIS. PostGIS, which implements the OGC model, is pretty much the de facto GIS system in the open source world as well as in science. There's nothing quite like it for MySQL, Oracle, MS SQL Server or IBM DB2. Sure, they all provide geographic extensions, but they're not as nice as PostGIS. MySQL implements part of OGC and allows indexing, but again the different storages behave differently: With MyISAM you get proper R-trees as with PostgreSQL, with InnoDB you get useless B-trees which only work for exact searches (and not, say, "select everything with 20km of..."-type queries).
PostgreSQL has no legacy cruft. MySQL has tons of weird stuff that is being crudely painted over with "strict modes" or conflicting bug fixes. PostgreSQL has no ugly features that exist for legacy reasons.
PostgreSQL has hardly any known gotchas. MySQL has tons of gotchas. Lots of ugly behaviour, much of legacy, much of it intentional due to the multiple storage backends offered.
PostgreSQL has a real optimizer. PostgreSQL's cost-based optimizer uses genetic algorithms to quickly compute the best table plan for your query, and collects statistics about column value distribution to choose the right indexes and operators; for example, if a table is very small, a simple sequential scan (search table from top to bottom looking for the value) is the most efficient algorithm, but if the table is large, an index (typically a B-tree structure) should be used. But MySQL doesn't even have an optimizer. Sure, MySQL looks at whether you're using indexed columns and so on, but that's about it. It doesn't know how the result set is going to be, so it lets you provide yummy hints like "select sql_big_result ... from ..." to tell the database what it already knows.
PostgreSQL has neat index parameters. You can create functional indexes, eg. "create index ... on people (lowercase(name))" and PostgreSQL will understand when to use the index (in this case, "select ... where lowercase(name) = 'bob'") to speed up queries. You can also create conditional indexes: "create index ... on people (name) where (employer_id is null)". This also optimizes that particular query.
PostgreSQL has pluggable languages. You can create triggers or functions in Python, Ruby, etc. and they run "inside" the database. A lot of people choose to use to run much of their business logic inside the database this way; it's an acquired taste.
Of course, MySQL has a bunch of things going for it. NDB, the in-memory database storage they purchased and integrated into MySQL, is neat and could be useful for certain applications. MySQL has a less invasive way to implement partitioning that resembles Oracle's; PostgreSQL requires much more work. MySQL has updateable views; so does PostgreSQL, but you have to write the update rules manually.
MySQL also has replication built in, making it super easy to set up several read-only nodes; with PostgreSQL you have to install Slony, Bucardo or any of the other third-party products. The benefit is that there's a market for replication products that compete on quality; the downside is that they're mostly trigger-based and invasive, and not at all easy to set up.
In the end, choosing a database is not a light decision. I suggest downloading both and giving them a go. Good luck.
posted by gentle at 12:03 PM on May 24, 2008 [10 favorites]
This thread is closed to new comments.
posted by Blazecock Pileon at 4:43 PM on May 22, 2008