Do relational databases always require a static structure?
January 20, 2005 2:03 PM   Subscribe

Do relational databases always require a static structure? [+]

I always thought that they had to be rigidly formed, but now I am working with a programmer trying to design a database, and he is proposing the construction of data tables with varying numbers of columns (dependant on product.)

It really doesn't make much sense to me, (and goes against everything I've ever been taught about db modeling, but wtfdik?) but I have to document the design for a presentation. The whole thing sounds like XML to me - but I can't SAY that - around here "XML" is like some sort of profanity.........*grumble*....
posted by lilboo to Technology (25 answers total)
 
Strictly, there's no reason why not, as long as you don't start dropping fields being used as relations. Of course, the proper way to do it is to normalize your database enough that you don't need dynamic tables. Separate each type of dependent data into its own table and then key it back to the other items that use it, a la 3NF.
posted by wackybrit at 2:19 PM on January 20, 2005


Response by poster: Yeah, we are trying to keep the number of tables down, but we need to update the thing on a weekly basis - plus other people have to program off of it. It just seems risky to me.
posted by lilboo at 2:28 PM on January 20, 2005


proposing the construction of data tables with varying numbers of columns (dependant on product.)

My instincts say that this is something that should be done in code once data has been fetched from the DB, but maybe your co-worker has a good reason for wanting it this way?
posted by falconred at 2:28 PM on January 20, 2005


wackybrit is right on target. Normalize the database properly and you won't have to worry about "dynamic" columns. Also, don't let anybody tell you that "binary xml" is the solution to your problem. Binary XML is never the answer.
posted by idlemind at 2:32 PM on January 20, 2005


Having dynamic columns makes it more like a glorified Excel spreadsheet with a database engine.

The power in a relational database are the relationships derived from normalization (surprise, surprise). On the other side, having a flat file "sheet" is wonderful for doing anyalytics, but less useful for data management. In other words, it's easy in a database to go from tables to columns with a join. Going the other way around is data management hell.

I often see database models reflect the user interface and vice versa. They're mutally exclusive. That's the beauty. You can make managing data easy with a database and transform it into a friendly tabular format.

In my experience, XML is great for moving and passing data, but less useful when it comes to managing document internals.
posted by pedantic at 2:37 PM on January 20, 2005


Response by poster: Excellent points here. Sounds like I've got to take this to the data architect - well, it looks like I'm off to start inter-departmental WWIII....K
posted by lilboo at 2:45 PM on January 20, 2005


lilboo, that sounds scary and wrong. Is there an actual reason for this design? Sometimes people deliberately choose the scary, wrong way for a compelling reason, but I don't hear a compelling reason here. Wackybrit is right: normalisation is the way to solve the problem of products with varying attributes.

Crude example with four tables:
- Products (has product ID, other consistent data like names).
Eg 001 "Spleenco Lawnmower"
- Product Attributes (relates products to product attributes)
Eg 001 789
- Attributes (attribute id, type_id, value)
Eg 789 456 "Briggs and Stratton 50cc 4 stroke"
- Attribute Types (here's where you go nuts with product -specific shit)
Eg 456 engine-type

Anyone who specialises in this could come up with something better, I just bump into RDBMSs a lot.
posted by i_am_joe's_spleen at 2:51 PM on January 20, 2005


As far as I know you are right - there is no such thing as a database table where columns come and go from record to record. So even if this was a good idea (and it isn't), I don't know how you'd implement it without writing your own new database server.

Lucky for you there is a model for this, and here it is:

So you have a type of record (I'll call them "items"), and individual records can have different data fields (I'll call them "attributes").

Three tables:
table ITEMS has fields used by all items - say ITEM_ID and ITEM_NAME.

table ATTRIBUTE_TYPES has fields ATTRIBUTE_TYPE_ID and ATTRIBUTE_NAME. This table has one row for each type of attribute that your items can have. This is what's called a domain table.

table ITEM_ATTRIBUTES has fields ITEM_ID, ATTRIBUTE_ID and ATTRIBUTE_VALUE. This is where attributes are connected to items.

Each ITEM has a variable number of rows related to it in the ITEM_ATTRIBUTES table.

So let's say you have one item that uses an attribute called "color", and the value of that attribute is "green". Here's how that would be expressed in this data model:

one row in ATTRIBUTE_TYPES:
attribute_id: 5
attribute_name: color

one row in ITEMS:
item_id: 1
item_name: shirt

one row in ITEM_ATTRIBUTES:
item_id: 1
attribute_id: 5
attribute_value: green

Now you know more than your programmer about data modeling. Congratulations!

On preview: pretty much what joe's spleen said - but you know, I couldn't bear to trash this comment after spending all this time on it.










I
posted by newton at 2:59 PM on January 20, 2005


I can't think of a relational database that allows you a variable number of columns for each row. The nearest construct to this would be the use of binary or free-form text as a column. This column can be of any length, and could (for example) contain its own set of variable item data.

Solutions which use variable length fields to hold multiple items of data are a BAD IDEA. Don't do it unless you have to.

It's possible, however that your programmer is talking about a situation where the table has a fixed number of columns, and some of these columns don't contain any information. A simple example of this would be a user table. Some users may have a mobile number, and some may not. The field for unmobiled users would be null or blank. This usage of unused fields is acceptable.

If there are a large number of associated fields (mobile number, mobile serial #, mobile SIM #, mobile supplier, network, etc), and all of these are either blank or filled in then it's generally better to place those details in a separate table.

A handy rule of thumb would be to visualise your table as a spreadsheet. If there is a lot of blank space, then your design is probably wrong.

Also, because you're basically talking about columns with null values, it's worth noting that there is an academic argument that tables containing Null Values are not properly normalised. (I think we're talking about 4NF here, but I may be wrong). Academic arguments don't pay the bills though, and most people accept that sometimes your table has to contain fields which are not used.

b.t.w Whatever the SlashDot crowd may tell you, XML is not a good format for databases.

Finally. A more concrete example of what your programmer is suggesting would be good. What is the table format he is suggesting.

Gut feeling. What your programmer is suggesting is wrong. Redesign.
posted by seanyboy at 3:20 PM on January 20, 2005


Second (3rd, 4th, etc.) what everybody else said. I have no idea what planet this so-called "programmer" is from. Is it possible you misunderstood? Some of us programmers have poorer linguistic skills and have a hard time making themselves clear.
posted by matildaben at 3:32 PM on January 20, 2005


Are they actually varied numbers of columns? Or varied definitions? I've met more than a few databases where column A is the product description, and column B is the number of wheels, unless the product description is 'FOO' in which case column B is actually the colour of the sky aobve the lab where they make DB2.
posted by jacquilynne at 3:41 PM on January 20, 2005


A single table with varying numbers of columns is... shall we say, not very smart. Not to mention physically impossible to produce (as a single table) within any traditional RDBMS that I'm familiar with (Oracle, SQL Server, MySql,and Postgre). However, there is a simple construct that can be built, using well formed tables and proper normalization, that can simulate a single table, with varying columns per row.

Example:

__ItemType__
ItemTypeID

__Item__
ItemID
ItemTypeID
Name
Description
Blah1
Blah2
Blah3

__Widget__
WidgetID
ItemID
WidgetBlah1
WidgetBlah2

__Sprocket__
SprocketID
ItemID
SprocketBlah1
SprocketBlah2
SprocketBlah3
SprocketBlah4

Then you create a view, using a union of all widgets and sprockets. The shape of the view will look like this:

__ItemsView__
ItemID
ItemTypeID
WidgetID
SprocketID
WidgetBlah1
WidgetBlah2
SprocketBlah1
SprocketBlah2
SprocketBlah3
SprocketBlah4

When you query the view, all widgets will simply have NULL values in the sprocket columns, and vice versa for the sprockets.

The advantage is that, theoretically, you save space as you don't have lots of blank or NULL columns for each distinct type of item. The trade-off is that it becomes computationally intesive to retrieve the data because you're going through a view, which (under normal circumstances) just executes the union query each time you access it.

1) Saving space is not normally a consideration these days. Space is CHEAP. Performance is expensive.

2) You can create the union view as a materialized view (at least that's the Oracle term) which basically saves the state of the query to a table, which it automagically manages, thus eliminating the performance bottleneck. But at the same time, eliminating your space savings, so why even bother going down this road in the first place?

This is just a long winded blathering that says much of the same as above. Though I hope you can use it as ammunition, if you need to.

Good luck.

PS. Try to get your developer to ensure that the database reflects the best shape to store the data. It is the data access layer's responsibility to reshape the data so that it meets the application's needs. Not an easy task, but it pays off in the long run if you plan to keep the application and/or database around for a long time.
posted by C.Batt at 3:58 PM on January 20, 2005


Not to mention physically impossible to produce (as a single table) within any traditional RDBMS that I'm familiar with

Not so.
The extra attributes could be formatted and placed in a text/notes field. You could even use XML for this, and (with the correct syntax) you could use word indexing to search for specific attributes.

For example, the example you gave (containing multiple widget and sprocket details) is probably held in one field on the metafilter database.
posted by seanyboy at 4:22 PM on January 20, 2005


Strictly speaking, the reason a database schema should not be dynamic is that there is no reason it has to be. Any data structure that would use a variable number of columns (including key/value associations) can be adequately represented in a normalized database. Perhaps the data design work is a bit harder than pretending the RDBMS is a spreadsheet, but a design that includes "dynamic schema" is flawed from the get go. Both in Ivory Tower theory and in Down and Dirty practice, this is a bad idea and will cost you performance, maintainability, and potentially reliability of the application.

Whatever this guy is trying to pull, he's outright wrong. When told this, if your programmer insists he is correct, fire him, because he is incompetent, too inexperienced for the work, or malicious. Give him a chance to correct the flawed design, but under no circumstances should you accept it as is.
posted by majick at 4:47 PM on January 20, 2005


One other possibility - your database is not relational, but something old-skool and exciting, like Pick, in which case your programmer's suggestion may be entirely conventional for that environment.
posted by i_am_joe's_spleen at 6:31 PM on January 20, 2005


he is proposing the construction of data tables with varying numbers of columns

This is a recipe for heartbreak. Don't do this. Even in an object-relational database, don't do this. I suggest that your "programmer" is ignorant and inexperienced, and I mean that in the kindest possible way.
posted by SPrintF at 6:44 PM on January 20, 2005


In Pick, which these days bills itself as "extended relational" and "multi-dimensional" you can pack an arbitrary number of sub-fields into a column, and this is normal and expected (and very, very wrong from a relational POV).

lilboo hasn't told us much about their environment so it's just possible littleboo's working in a shop with a legacy -R- DBMS.
posted by i_am_joe's_spleen at 6:57 PM on January 20, 2005


Response by poster: Well luckily for me, I'm just assisting here, and it will have to undergo a pretty rigorous peer review process, so I am not too worried about this thing going into production "as-is." And joe's_spleen is pretty much on target; the programmer in question is a real-old-school type, who is more than a little in love with using the (much) older technology, and (IMO) making things just a little more complicated than they have to be, for kicks.
posted by lilboo at 7:35 PM on January 20, 2005


Not so.
The extra attributes could be formatted and placed in a text/notes field. You could even use XML for this, and (with the correct syntax) you could use word indexing to search for specific attributes.

For example, the example you gave (containing multiple widget and sprocket details) is probably held in one field on the metafilter database.
I know that at this point, all I'm being is pedantic by addressing this tangent, but... I'm a pedant.

In the situation you describe, I'm very sure that you are not actually defining a table with variable sized rows, though it does appear that way conceptually. When you define a large object (CLOB, BLOB, TEXT, etc...) field you are actually telling the RDBMS to maintain a list of blocks/chunks (4k, 16k, etc...) that will hold an arbitrarily sized bit of data. These chunks are stored in a separate space from the rows, and each row only contains a pointer to the LOB, not the LOB itself (perhaps in some DBs this is not the case; I'm speaking from Oracle and SQL Server experience here).

The data in the LOB can have embeded meta-data, but the meta-data will not be usable by an ANSI SQL statement. Yes it can be searched using other methods but it doesn't change the fact that the table is still, essentially, a rectangular shape with each row being the same length (aside from the variable length large object columns, which are acknowledge as special cases).

Whatever the case, it amounts to just another kludge/trick, much like the one I outlined in my previous statement.

BTW, not arguing, just... picking nits. Feel free to bash away at my knowledge.

Now, with Oracle 10 (and perhaps other DBs) you can combine Regular Expressions with your SQL statement so that you can parse the sub-data in the large objects during query execution. This is quite cool and does open up some interesting possibilities, but I think that it's more of a hack that encourages lazy design. Time will tell of course.
posted by C.Batt at 9:10 PM on January 20, 2005


the programmer in question is a real-old-school type

Still, if I had to guess as to whether he's attempting normal use of an unusual product, or attempting unusual use of a normal product, I'd guess the latter. And like the others, I can't emphasize strongly enough how desirable it is to normalize your data... it will pay for itself over-and-over down the road if you ever need to alter the kinds of things you represent in the database.

(Or D, mush your links info into a custom text-based format in a single field and parse it with regex).

I believe the SQL specification actually defines LIKE conditions to help with regex and allowing fields with internal structure, so C.Batt's suggestion isn't a total kludge.

Another way to get around normal normalized (hah!) design, in Postgresql at least (and I assume Oracle too) is through array fields. If you're storing user profiles, and want to let record their favorite links A) Create an ugly table with fields, say, cool_link1 through cool_link5, and leave the ones the user doesn't need NULL, B) Create a profile table and a seperate links table that lets links refer to the profile they belong to (thus allowing any number of links per profile, cleanly), or C) put a varchar array field in the profile (allowing any number of links per profile, not particularly cleanly, because you're limited to storing very simple data).

B is still best of course, but C is another technique to be aware of.
posted by gsteff at 11:09 PM on January 20, 2005


(Or D, mush your links info into a custom text-based format in a single field and parse it with regex. B is still best).
posted by gsteff at 11:10 PM on January 20, 2005


Doh.
posted by gsteff at 11:11 PM on January 20, 2005


That programmer doesn't seem to be a very good database designer, judging by what you said.

You really should do what wackybrit said in the very beginning. All the other proposed solutions are just hacks, unless you have a small fixed number of these fields (say, less than 10) and can afford to have most of them null. Dumping all the attributes into a varchar is particularly ugly in my opinion.
posted by azazello at 12:12 PM on January 21, 2005


Also, with rare exceptions, dynamically modified table structure is evidence of poor database design. For one thing, it renders your indexes useless, killing performance. Also, it leads to the necessity of a much uglier client side than it needs to be.
posted by azazello at 12:16 PM on January 21, 2005


(client side as in the code querying/modifying the database)
posted by azazello at 12:17 PM on January 21, 2005


« Older graduate student searching for grant money to...   |   Tranny Dating Newer »
This thread is closed to new comments.