Lookup Table or Lookup Tables
January 17, 2012 8:38 PM   Subscribe

Database lookup tables...one per relationship or a general lookup table?

I tend to have a single lookup table on my lightly used database applications. The table would be constructed something like this:

lookup_id (auto increment)
lookup_parent_table (table name for one half of the relationship)
lookup_parent_id (row being referenced from above table)
lookup_child_table (table name for other half of the relationship)
lookup_child_id (row for that)

Sometimes I add a couple of columns to contain information about the relationship, like order (if there is one parent and many siblings and they need to be organized).

I index it on parent_table, child_table.

This table can obviously have a lot of rows. It's nice for me because any new tables added to the database don't need their own lookup table, they can use this one, and queries are somewhat standardized. However, I can't get over the nagging feeling that I'm Doing It Wrong and every parent table/child table relationship is supposed to have its own lookup table.

What say you?
posted by maxwelton to Computers & Internet (16 answers total) 1 user marked this as a favorite
Bad. You can't do foreign key constraints this way. And, although this might not be a concern for your data size, it makes your indexes needlessly large and your queries just that much more complicated. Instead of just indexing on the ID (a simple integer I'm assuming) you're indexing on (character, integer) that that's going to be less efficient.
posted by sbutler at 8:50 PM on January 17, 2012

I think this is a bad decision in general, but before I get into precisely why, I'm curious what database system you're using and what sort of applications you're building. Are they web based multi-user or is this driving a single user Access Forms type interface? Is it realistic to assume that they will never become more heavily used or is it possible that at some point in the future usage might spike significantly?
posted by feloniousmonk at 8:55 PM on January 17, 2012

Response by poster: This is a mysql/php based CMS used by a single user for data entry, though the public-facing site gets a (very) moderate amount of traffic. Performance hasn't been an issue at the loads we see. (Doesn't mean it wouldn't die at higher loads, of course. We cache flat HTML on sites where traffic might spike.)

I figure I Did It Wrong, but am interested to learn how to improve. Obviously data entry by a single user doesn't really get into performance issues; on the public side, a page would look at the lookup table to discover which template it uses and which widgets are supposed to be on it. To do that the query would be something like

SELECT * FROM lookup WHERE lookup_parent_table='page' AND lookup_parent_id='1'

which would return an array of child tables and ids from those tables. Just typing this out I can see that I'm doing extra work here, aren't I, since I then have to query the child tables instead of doing a join (?) on the lookup and the child table, eh? Hm.
posted by maxwelton at 9:08 PM on January 17, 2012

If you have a one-to-many or many-to-one relationship then your foreign key goes in what you'd refer to as the child take and no lookup table is required.

Say you have multiple pages, each with a page ID, and then multiple comments for each page. So each page would have a row in the pages table. And in the comments table there would be a 'page_id' column. Then when you wanted the comments for a page you do:
SELECT * FROM comments WHERE page_id = 1
The only time you'll need a lookup table is when you have a many-to-many relationship. Where say a comment could appear on multiple pages, and a page can have multiple comments.
posted by sbutler at 9:15 PM on January 17, 2012 [1 favorite]

Response by poster: Widgets, for example, would be that. Page 1 could have widgets 1 & 2, Page 2 could have widget 1, Page 3 could have widgets 1 & 3, etc. I generally have been using this lookup table for those types of relationships. Comments are stored as you suggest, as an example.
posted by maxwelton at 9:22 PM on January 17, 2012

Yes, the alternative would be something like:

SELECT page.*, widget.* FROM page INNER JOIN widget ON page.PageId = widget.PageId WHERE page.PageId = 1

This allows you to index any of the FK relationships (page.PageId and widget.PageId, in this example) separately. I suspect that your current approach would result in a table scan for every query whereas this approach would be a straight bookmark lookup in the index. I haven't worked with MySQL in ages though, so I'm not certain about how its indexes function. You can verify this one way or another using mysql's explain feature.

Generally speaking, the lookup table isn't necessary in the case of a one to many relationship, which your examples are. You might use a lookup table (typically this is referred to as an association table, which is why I asked about Access, because I think it uses the lookup table terminology) in the case of many to many relationships, such as if you wanted to implement a tagging system.

As long as your caching approach is not somehow grievously flawed, the truth of it is that this sort of optimization is almost certainly not necessary. I would not approach it from the perspective of performance but rather maintenance. The 3NF approach is widespread and any DBA or developer will recognize it and be able to work with it. Additionally, as time passes and more requirements are uncovered, data integrity becomes an increasingly important issues, and not being able to implement FK relationships will result in an increasing burden on your application to maintain integrity in the face of deletes and updates.
posted by feloniousmonk at 9:22 PM on January 17, 2012

Based on your last comment, widgets are probably not the best example to use in my join, but the general concept is the same.
posted by feloniousmonk at 9:23 PM on January 17, 2012

Your query using an association table would look something like this:

SELECT page.*, widget.* FROM page INNER JOIN pageWidgets ON page.PageId = pageWidget.PageId INNER JOIN widget ON pageWidget.WidgetId = widget.WidgetId WHERE page.PageId = 1

You would have the same benefits of optimized indexes and the ability to use FKs here as well.

I should also mention that I'm not 100% on MySQL's join syntax. There may be some quirk I'm forgetting, but the above should get you started at least.
posted by feloniousmonk at 9:39 PM on January 17, 2012

Yes, every relationship is supposed to have its own lookup table. What if at some point you realize that one of your relationships has an attribute (e.g. widget 123 is on page 456 in location 3)? In your current schema you have to add that attribute to the lookup table and ignore it for other relationship types; add a few of these for different relationships and your unified lookup table will be quite a mess.
posted by nicwolff at 9:40 PM on January 17, 2012 [2 favorites]

The 3NF approach is widespread and any DBA or developer will recognize it and be able to work with it.

Not to mention that ORM libraries expect it. OP, you aren't using one now but in the future someone might want to write a rails or java app that generates something from the database and your structure makes that much harder to do.
posted by sbutler at 9:43 PM on January 17, 2012

That is a good point about ORM. I don't work with PHP, but I would suggest you investigate the crop of ORMs available to you and choose one or more to investigate in depth. I don't think there is a good argument to be made against using them in this context, so if one seems compelling, go for it, but even if you decide not to use any of them, the familiarity you gain with their general approach to data access will be influential in your own designs.
posted by feloniousmonk at 9:54 PM on January 17, 2012

This is a bad idea. To explain how bad it is, let me tell you I've been designing and developing database-driven applications for over a decade and I wasn't even sure what a "lookup table" was. Yes, your question and some of the answers provided some insight into the actual problem, but I really wanted to get a handle on this "lookup table" thing. And thanks to the SOPA/PIPA blackout protest, I couldn't use Wikipedia to figure it out. After some googling and reading this, it's even worse than I thought.

May I call you Max? Excellent. Here's the thing, Max: don't think of a single-user app without much traffic as something that shouldn't be designed well. I'm not saying you should overdesign, but don't shove a bunch of crap into a single table. Take the time to make the entities and relationships you need. If you don't, you're really limiting yourself to keeping this as a single-user app without much traffic. And it will be painful to maintain.

I agree heavily with checking out modern ORMs and getting to know their approach to data access. The design and modeling lessons you learn will be very helpful.
posted by cardioid at 10:52 PM on January 17, 2012

That is the wrong way to do it, for the reasons outlined above, but as screwy database design goes, it's not that bad. Far worse things hide in lots of production systems.

If you want, you could standardize the names of your join tables, so that they are always table1_table2_join (or whatever). That would let you use one table per join without any programmer cognitive overhead. (ORM systems usually have a naming scheme like this, because they hide the join table from the programmer.) On the other hand, I almost always find that a many-to-many relationship wants to be a pair of one-to-many relationships, with the hidden join table turning into a full-fledged object. Eg, instead of pages and widgets, I end up with pages, widgets, and widgetInstantiations. From SQL's point of view, there's not a difference, but from mine, it helps to give a name to the join.

One thing your combined lookup table does get you is being able to look up all the relationships of an object in one query, instead of having to look them all up individually. But that's a performance hack more than anything else, and you should only do it if you actually measurably need it.
posted by hattifattener at 11:18 PM on January 17, 2012

Response by poster: Thanks everyone for your insight. To be clear, I don't think the way I did it was a good solution, but since I'm largely self-taught by hacking around, it's what I came up with when fooling around a couple of years ago, and it's not been pressing at me to change. However, I'm building a new application, such as it is, and thought I'd see how it was really done.

This has been a very good look into what I should be thinking about. And to be clear, no one is paying me huge bucks for my DB "knowledge," it's mostly personal stuff.

And apologies for using the term "lookup," not sure where I got it lodged in my brain, I'm not an access guy. In fact, my table is actually named "assoc," but somehow thought "lookup" was clearer. Amateur in the house, obviously.
posted by maxwelton at 2:23 AM on January 18, 2012

I'm largely self-taught by hacking around

This is a way to get your feet wet in databases and there's nothing wrong with that, but when it comes time to design a new database and you want to design it like the pros do, you need a better background. Buy a book on database design for beginners. The lights will come on, the landscape will be magnificent, and you will work faster as your understanding and confidence grow.
posted by exphysicist345 at 9:47 AM on January 18, 2012

I recommend The Art of SQL if you're looking for a book.
posted by feloniousmonk at 12:24 PM on January 18, 2012

« Older Can you translate this Chinese character?   |   Closed Embassy Newer »
This thread is closed to new comments.