Better than vi versus emacs!
April 2, 2009 9:23 AM Subscribe

Please help settle a bet: Is a spreadsheet a simple type of database, or are they fundamentally different things?

A friend emailed me today with this:

i am having a spirited discussion about what is a database and what is a spreadsheet. i claim that a spreadsheet is a simple non-relational database. i need an unimpeachable source. do you know of any? thanks.

As his resident computer geek, I have an opinion, but I wouldn't say its a authoritative. So please help my buddy out. Links to relevant web pages are welcome (I couldn't find any that were spot on), as well as informed opinions.

posted by perrce to Computers & Internet (37 answers total) 6 users marked this as a favorite

No, define spreadsheet first!
posted by spaceman_spiff at 9:27 AM on April 2, 2009

Semantic game dependent on your definition of "database". Especially when your friend slips in "non-relational". Yeah, if it's non-relational, I suppose a Comma Separated Value file is a database too, given that there's a trivial transform from spreadsheet to CSV. So yeah, it a database is just n-tuple after n-tuple.... But it's not, because an RDBMS is relational.
posted by orthogonality at 9:28 AM on April 2, 2009

No, they're fundamentally different. A spreadsheet is a collection of rows and columns that don't preserve data like a record in a database. If you sort a column in a database, the other columns sort appropriately and stay attached to the values in the sorted column. If you sort a column in a spreadsheet, your data is destroyed.
posted by The Michael The at 9:30 AM on April 2, 2009 [2 favorites]

orthogonality: Ah, but what is a relation? Wikipedia Relation: "In the relational model of databases, a set of tuples (also called rows), otherwise known as a table." Doesn't that sound like a spreadsheet to you?
posted by themel at 9:32 AM on April 2, 2009

I would say it meets the definition of a flat-file database. Most of your modern spreadsheet applications use XML, which is still a flat file.
posted by Lyn Never at 9:33 AM on April 2, 2009 [1 favorite]

They're different things, for sure, but a spreadsheet could be used as a database. Like a wall isn't a notebook, but you can write on it.
posted by ae4rv at 9:33 AM on April 2, 2009 [4 favorites]

I would say no. I would say that a non-relational database would still have records/rows. You don't have distinct "records" in a spreadsheet that you adhere to. I'd say that a spreadsheet could act as a non-relational database, but a text file could also.
posted by wongcorgi at 9:34 AM on April 2, 2009

A flat-file database is a subset of both databases and spreadsheets; that both can be reduced to an equivalent form doesn't mean that they're the same. For example, look at what Excel does with PivotTables. It's neither typical db functionality nor is it really report writing. It's something else completely.

In a venn diagram, the db bubble is bigger and the spreadsheet & db bubbles overlap about 75%.
posted by GuyZero at 9:37 AM on April 2, 2009

Like orthogonality says, it really depends on how you define the terms. You can certainly come up with an expansive definition of "database" that would include a spreadsheet (and many other things), but that would not accurately describe what most technically competent people mean when they use the word "database."

So your friend will never prove his case since this is essentially a semantic issue. If he insists, people are going to roll their eyes and perhaps think he is ignorant or stupid.
posted by grouse at 9:40 AM on April 2, 2009 [1 favorite]

As a practical matter, non-IT office workers will often use a spreadsheet as a simple database (and a single-user PC database app to run a database for an entire department, and so on). So regardless of whether you think a spreadsheet is a database or not in some Platonic sense, it's a database if a user uses it like one.
posted by metaquarry at 9:43 AM on April 2, 2009 [4 favorites]

The answer entirely depends upon how the data in the spreadsheet is structured. If it's laid out as one record per row, with separate columns for the fields in the record, then it is a single-table database.
posted by wheat at 9:44 AM on April 2, 2009

Whether a spreadsheet has fixed rows or records is a matter of interpretation rather than a property of the format. In principle it's possible to sort a column of a spreadsheet and destroy the rows. In principle it's possible to destroy the records of a database, too. That means nothing.

(Lots of people use Excel spreadsheets to do linear algebra stuff. The answers change when you swap around numbers in a matrix, especially if you are not preserving rows. So spreadsheets must be perfectly good for cases where you need to preserve rows / records.)
posted by grobstein at 9:45 AM on April 2, 2009

Metaquarry makes my point better. Whether a given spreadsheet is a database is a matter of interpretation. It frequently is one. Therefore, it makes sense to say that spreadsheet is a database format.
posted by grobstein at 9:46 AM on April 2, 2009

If you (simplistically) define a database as something that can be accessed via ODBC, then Excel spreadsheets can qualify.
posted by metaquarry at 9:48 AM on April 2, 2009

Depending on your definition of "database," a flat file could be one, too.

In the not-too-distant past, and still on many mainframe systems, data is stored in what are basically flat text files, where each line is a "record", and specific character positions within the line are designated for particular fields. E.g., you might have a fixed record length (line length) of 133 characters, and you designate the first 10 positions for account number, the next 20 for customer name, etc. This was very common in the COBOL world.

There's really no difference between a rigidly-constructed flat file and a spreadsheet; you can perform many of the same tasks with them. (And most spreadsheet programs allow you to trivially import fixed-record-length files.)

However I would not call a flat file constructed in this way "a database." To do so makes the word "database" meaningless. You could construct a database using files like this as a data store, but really, "database" ought to refer to how the data is made available to the user. I.e., can you query it, perform calculations, return records/rows that match certain criteria but not others, etc.

So, I think whether something is a database or not is totally separate from how the data is stored; you could build a database that used Excel sheets as a data store, or you could use fixed-width flat files, or you could use CSV, or any of the more modern storage formats actually used by RDBMSes — the difference is in the layers above that.

The Excel program (separate from Excel files) has some database-like features, but I'd argue that's not really its primary focus and that at any rate, it's a poor database when it tries to act like one, and best not used that way even if you can shoehorn it into the definition, or vice versa.
posted by Kadin2048 at 9:53 AM on April 2, 2009

When most programmers say "database" they mean "relational database," which is specialized and which most spreadsheets are not. But that doesn't mean that the meaning of "database" has changed; it means, essentially, a collection of information (data).

I have text files (CSV) that I use as databases. They work fine. In a way, they become databases because of the programming I do to read and process the contents of a file.

Some programmers scoff at the idea that MS Access is a database program. Of course it is, but there are other properties they imply when they say "database" that MS Access does not possess.

Here's the Wikipedia definition of database:

A database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships.

This definitely includes my text file, although it excludes the stack of hand-written paper forms or even punch cards that my old file structures prof used as examples of primitive databases.

The first dictionary.com version is broader (it says it's sourced from the Random House dictionary):

a comprehensive collection of related data organized for convenient access, generally in a computer.

Context is all.
posted by amtho at 9:56 AM on April 2, 2009

I'm with wongcorgi on this. Though it is obviously confused by the fact that Excel is increasingly set up to play the part of a database, even though this was not its original function.
posted by desuetude at 9:57 AM on April 2, 2009

themel writes "orthogonality: Ah, but what is a relation? Wikipedia Relation: 'In the relational model of databases, a set of tuples (also called rows), otherwise known as a table.' Doesn't that sound like a spreadsheet to you?"

Well, no, because in an RDBMS, I'd refer to a row in another table by its key, in a spreadsheet i can only refer to it by an offset ("Row 3, column 4) (which implies a certain ordering).
posted by orthogonality at 9:59 AM on April 2, 2009

I'd say: a spreadsheet is not necessarily a database, but it can be used as one. It may not be the best tool for the job, though.

It's like asking whether an ice pick is a weapon, or if a patch of earth is a garden. It all depends on what you do with it.
posted by amtho at 10:08 AM on April 2, 2009 [1 favorite]

With a broad definition, a phonebook is a database. A piece of paper is a database.

My idea of a database includes both "relational" as a concept, and the ability to run SQL queries against it.

Ask your friend if he or she, upon purchasing a database package, when he or she opened the box, would he or she be annoyed to find a paper ledger book? If he or she would be annoyed, then they're just arguing to argue. If annoyance does not occur, I recommend that they stay out of the software purchasing department.
posted by adipocere at 10:18 AM on April 2, 2009 [3 favorites]

The variety of different types of software that the word "database" refers to is quite wide. There are the hierarchical mainframe databases that Kadin2048 mentions, where the word originally came from, that didn't include SQL interpreters and may not have been "relational". Modern "XML database" and "object database" systems similarly have a hierarchical rather than an orthogonal structure internally and may or may not include SQL interpreters. Then there's the concept of a "flat-file database" as Lyn Never mentions: not only can various applications and programming languages use a simple file as a database, with utilities such as Microsoft LogParser you can run simple SQL queries against a CSV or tab-separated file!

And of course the "desktop database" like FileMaker or Microsoft Access is actually a presentation UI and application development platform as much as it is a data store.

So my verdict is that by all means, the concept of a database has been sufficiently poorly defined by the computing community that it embraces spreadsheets.
posted by XMLicious at 10:46 AM on April 2, 2009

Is a spreadsheet a simple type of database, or are they fundamentally different things?

Yes. Not either or, but both.
posted by caddis at 10:53 AM on April 2, 2009

orthogonality: Well, no, because in an RDBMS, I'd refer to a row in another table by its key, in a spreadsheet i can only refer to it by an offset ("Row 3, column 4) (which implies a certain ordering).

... but a RDBMS is not the same thing as a RDB. Anyone who knows what they're doing will agree that a spreadsheet is an awful management system for a relational database, but that doesn't mean you can't use it as one; the way you refer to a given value isn't essential to the nature of relational databases.

This is the mistake the poster's question makes. A spreadsheet is emphatically not a database; neither, technically, is an SQL server, or the SQL language. These are all parts of a database management system; but the difference between the management system and the database itself is not trivial. When a person makes the decision to port all of their information from an Excel spreadsheet to a PostgreSQL server, their database does not change (at least not on purpose); their database management system changes.

As far as definitions of 'database' and 'relational database' that havw any authority goes, personally I'd turn to Stéphane Faroult, who says on the second page of The Art of SQL that

A database is nothing but a model of a small part of a real-life situation. As any representation, a database is always an imperfect model, and a very narrow depiction of a rich and complex reality. There is rarely a single way to represent some business activity, but rather several variants that in a technical sense will be semantically correct. However, for a given set of processes to apply, there is usually one representation that best meets the business requirement.

The relational model is thus named, not because you can relate tables to one another (a popular misconception), but as a reference to the relationships between the columns in a table. These are the relationships that give the model its name; iin other words, relational means that if several values belong to the same row in a table, they are related. The way columns are related to each other defines a relation, and a relation is a table (more exactly, a table represents one relation).

In other words, a spreadsheet, which is really just a collection of value-containing cells organized into a grid, can contain something other than a database (I've made, for example, order forms in Excel spreadsheets) - or it can contain a database if it presents a model, however incomplete, of real-world values. It can even contain a relational database; as long as all columns and all rows each separately have common relationships amongst them, the database is a relational database.

The value of different management systems for databases isn't insignificant, however. A spreadsheet is a database management system about three steps above a sheet of graph paper; not really very good in terms of flexibility or stability. An SQL server allows the user to access data and to store it in a more rational and cohesive manner. There are even some people who make the argument that SQL has some serious problems in relational database modeling.
posted by koeselitz at 11:01 AM on April 2, 2009 [3 favorites]

Echoing koeselitz above, in a previous life I got paid to put together an inventory management system in ACIUS 4D. It was sweet, but I could have done the same thing in Excel.

If it's got organized data, it's a database.
posted by mrt at 11:26 AM on April 2, 2009

I disagree with the response you marked as 'best' by adipocere. There are plenty of databases that aren't relational, and databases existed long before SQL existed. So saying that it's only a database if it's relational and support SQL sets a pretty arbitrary standard.

I also think the claim that "a piece of paper is a database" is a bit much. A better example would be an address book application. Is that a database? Sure. It has records which contain fields. You can search for records that have particular values in particular fields. You can sort it. To me that makes it a simple, special-purpose database.

Similarly, there are specialized apps for tracking recipes or wine. These are often described as databases.

Given that broader definition, I would say that a spreadsheet can be used for implementing relatively simple databases. That doesn't mean a spreadsheet is a database, any more than a computer is a database. A spreadsheet (like a computer) can many things other than implement database applications. But it can be used to create databases.

The key point, though, is that many people use spreadsheets as simple databases. That's because spreadsheets are easy to use, and people know how to use them, and full-featured databases are harder to use and fewer people know how to use them. So very very many people, when they want to keep track of a list of records with fields, will do so by creating a spreadsheet. That's a simple database they've created.
posted by alms at 12:00 PM on April 2, 2009

A lot of the above posts seem to be focusing on only relational database counting as real databases, but there is a strong trend in favor of non relational databases that offer various advantages, such as CouchDB and SimpleDB.

For those unfamiliar, check out the SimpleDB blurb:

Today, many developers correlate the word “database” with Relational Database Management Systems (RDBMS). While RDBMS offerings provide deep functionality, for many use cases, they introduce more complexity (and more cost) than is necessary. Many developers simply want to store, process, and query their data without worrying about managing schemas, maintaining indexes, tuning performance or scaling access to their data. Amazon SimpleDB removes the need to maintain a schema, while your attributes are automatically indexed to provide fast real-time lookup and querying capabilities. This flexibility minimizes the performance tuning required as the demands for your data increase.
posted by avex at 12:00 PM on April 2, 2009

MSPaint, it's a database! I just can't remember if I have 600x800 tables with three columns and one row each or 800 single column tables each with 1800 rows.

Nethack, it's a database! I just arrange dead slimes to keep track of my uncle's finances.

Onion seeds, why, just as useful as a spreadsheet, I plant them in my garden, latitude represents the date and longitude represents quarterly US steel production... oh, never mind, you don't care.
posted by bdc34 at 1:01 PM on April 2, 2009 [1 favorite]

"latitude represents the date and longitude represents quarterly US steel production"

I laughed.
posted by ae4rv at 1:46 PM on April 2, 2009

bdc34: Onion seeds, why, just as useful as a spreadsheet, I plant them in my garden, latitude represents the date and longitude represents quarterly US steel production... oh, never mind, you don't care.

Well, I don't know exactly what you're getting at, but the point is that building a successful database is about two things:

(1) choosing the right tool

and

(2) having the right understanding of the database itself.

The first one is about not doing any real sizeable database in Excel. (Or with onion seeds.) But (and it took me a while of learning to realize this) the second is a hell of a lot more important; it really doesn't matter what system you're using to set up the database if you haven't concieved of the data correctly. Most of the SQL databases I've built so far are atrocious; I don't think I've even really gotten to the second normal form yet.

Normalizing data makes the databae more predictable and reliable by eliminating repetition and modeling it more closely on the real-life relationships it's trying to describe. Spreadsheets make crappy database management systems precisely because it's completely impractical to try to normalize them even to first normal form, which means making it possible to update the database without editing several different fields (so that you don't have a chance to screw it up), making sure there's room to put in all the data that you might need, and getting rid of null fields; there are ways to reference cells, and these ways work relatively well, but pretty soon any reasonable complex database becomes impossible to maintain on a spreadsheet.

However, it's possible (and even very easy) to set up an SQL database that's so bad that it may as well be made out of onion seeds. All you have to do is ignore the data structure completely.
posted by koeselitz at 2:22 PM on April 2, 2009

Can you do transactions and rollback in Excel? If not, it isn't a database. At least, not a "real" database. In another sense though, a flat file or a scrap of paper would qualify as a database.
posted by chairface at 3:56 PM on April 2, 2009

A spreadsheet isn't a database because there is no data structure. Any cell can be any thing, and further, any cell can be either data or an operator or a result. A spreadsheet is more of a programming language than a database.

A database stores data, a spreadsheet analyzes it.
posted by gjc at 4:23 PM on April 2, 2009

Just to throw my own two cents into the mix…

If you can have titles and headings and columns that are right-justified and cells merged with other cells and fonts and SUM columns that just have a line and then the word "Sum" on its own and then the actual sum right next to it, completely intertwined with raw data… well, that's a spreadsheet. A database doesn't have any cutesy-sections where you can put a little chart or something. A database is rows of similar things organized by columns of similar things.

I suppose if you left the cutesy out of the spreadsheet, you could use it as a database. But the two are not synonymous.
posted by Civil_Disobedient at 4:44 PM on April 2, 2009

At the risk of redundancy, the reason there is no agreement in this thread is that the term "database" has different meanings in different contexts.

Historical: Since the 1960s, databases (originally "data base") have evolved from flat files in file systems to some of the most advanced pieces of software available. Thus, depending on when in history you're talking about, a database might provide none or all of the guarantees of modern databases.

Cultural: People often say "database" when they mean a dataset that could be organized in any number of ways.

Marketing: Marketers have destroyed the term database in a myriad number of ways. Software programs might have "databases" where they store their data, Oracle (or IBM or whoever else) are happy to sell you whatever you need as a database, and so on.

Academic: Academics can sometimes be happy enough to call anything a database. A recent text that I have says that a database is "nothing more than a collection of information that exists over a long period of time." Academics might also equate a "database" with a Database Management System (DBMS), which generally guarantees persistent storage, has a data model, has a language for interfacing with the data model, and so on.

Database Researcher: Database researchers presenting or reviewing for conferences like SIGMOD, VLDB, and ICDE generally take an abstract view of what a database is. They also might treat things like stream systems (where a stream of data goes by and SQL-like languages filter the data) and XML flat files as databases.

Software Engineer: It is hard to think of a context when a programmer who asks for a license to a database product would be happy if you gave him a copy of Excel.

Relational Database DBA: A DBA probably would assume that any database that they would use would be ACID compliant, implement most of the SQL89 and SQL92 standards, have support for standard interfaces, and so on.

For most of the above contexts, a spreadsheet could easily be considered a "database," though by the last two more operational contexts, it clearly wouldn't be.

Anyway, I think the losers of this bet are all of us.
posted by pbh at 6:11 PM on April 2, 2009 [1 favorite]

This thread is so geekalicious, with all kinds of nerds trying to define a database as having to be relational. Perhaps a quality database is, but the dictionary definition is far, far broader, something like a usually large collection of data organized especially for rapid search and retrieval (as by a computer) and by such a spreadsheet easily qualifies.
posted by caddis at 7:24 PM on April 2, 2009

and by that last comment I mean that the first answer in the thread was the best and the rest of us are just making noise
posted by caddis at 7:25 PM on April 2, 2009

My idea of a database includes both "relational" as a concept, and the ability to run SQL queries against it.

Um, no. This is rather like saying that a computer language must be object-oriented and run arbitrary Java code. The relational model is only one way in which one can structure a database. Some of the other ways are:
* Basic key-value databases (good old Berkeley DB, which is still developed by Oracle)
* Hierarchal databases (XML, and XPATH, which is good when you have arbitrarily branching trees)
* Object-Oriented databases (db4o and a layer on top of Berkely DB are examples)
posted by KirkJobSluder at 7:54 AM on April 3, 2009 [1 favorite]

Yeah Kirk Job.

What about BerkeleyDB.

WHAT ABOUT DBD::CSV which is an interface for Perl's databasing module to use CSVs AS databases. Granted they don't support fancy character encoding or any flavor-dependent functions, BUT, you still use vanilla SQL '99 to talk to it when you're writing queries. (part of Perl5's database-agnostic methodology).

What about relational XML's AS databases. Microsoft Excel *already* supports this.
posted by judge.mentok.the.mindtaker at 8:18 AM on April 3, 2009

« Older How to attract people to blogs | hdtv picture dim Newer »

This thread is closed to new comments.

Ask MetaFilter

Better than vi versus emacs!
April 2, 2009 9:23 AM Subscribe

Tags

Share

Better than vi versus emacs! April 2, 2009 9:23 AM Subscribe

Tags

Share

Better than vi versus emacs!
April 2, 2009 9:23 AM Subscribe