What's the best way to store hierarchical data?
September 6, 2006 12:48 PM   Subscribe

What's the best way to store hierarchical information in a database, and what's the best way to retrieve it?

I'm programming a web application for planning events. The data will be stored in a SQL Server database and I'm using ColdFusion for the application.

Events can belong to one or more categories and subcategories. They don't have to have a subcategory. So, I'll have at least one events table, one categories table, and a table for linking the events and categories. What's the best way to record the relationships of subcategories to their parent categories? I found this SitePoint article, but the Modified Preorder Tree Traversal it recommends is confusing.

Finally, I'll need to display the events in a category that aren't assigned to a category, then display the events that belong to the subcategory. What's the best way to do that?

In an earlier version of the application, the categories and subcategories were all in the same table. Each record had an ID, and the categories' IDs were the parent IDs of the subcategories. For each category, I did a query for events with the category's ID, then looped through the category's subcategories and did the same thing.
posted by kirkaracha to Computers & Internet (11 answers total) 2 users marked this as a favorite
 
Best answer: If the number of categories is small, I tend to store the category relationships in a seperate table and load the whole thing into memory on server startup. I have a server that makes this possible though (via persistent global variables and arrays), this may not be possible in all cases.

If you have the whole thing in memory you can traverse the sub-category-tree (and parent-tree if you wish) in memory, gathering category ids as you go, and then select * from whatever where category_id in (list of category ids)

I also tend to have a table that links "things" to which category they're in, because most things I want to do have more than one category to be in.
posted by RustyBrooks at 12:56 PM on September 6, 2006


Best answer: I always just use the id / parent_id model (adjaceny list). Put the recursive function in the database as a stored procedure and the performance problem is moot.

I don't really understand the second question. You need to display events in a category that aren't assigned to a category? That seems like an oxymoron. If you mean display events in a category that aren't assigned a subcategory, that is fairly easy. If you could elaborate on the second question a bit (maybe with fake data), I might have a better answer.
posted by toomuch at 12:59 PM on September 6, 2006 [1 favorite]


Oh, and regarding how I actually store the categories, it's just some variation of:

category_id
categroy_name
parent_id

where the "root" either has a special id like 0 or -1 or null, depending on what database I use.
posted by RustyBrooks at 1:00 PM on September 6, 2006 [1 favorite]


My suggestion is to use paths. Have a column in the categories/subcatgories table that;s string with the format "catid:subcatid:subcatid[etc]". This is easy to retrieve using WHERE path LIKE "catid:subcatid:%". The only drawback is having to be careful to ensure the integrity of the hierarchy whenever you edit it.
posted by cillit bang at 1:00 PM on September 6, 2006


Depending on the number of categories and on how much control you have over the application architecture, you might consider storing the hierarchical category information in an XML tree rather than in SQL tables. Then put your events in the SQL database and associate them with the correct category using ID attributes in the XML.

That's how I'd build it -- partly because database normalization makes my brain hurt, but also (and more justifiably) because XML is a really simple and convenient way to handle hierarchical data.
posted by ook at 1:02 PM on September 6, 2006


Response by poster: I don't really understand the second question.

Sorry, I phrased the second question poorly. You're right, I meant I need to find events in a category that aren't assigned a subcategory.

cillit bang's path suggestion is interesting.

So is ook's XML suggestion, but we're using ColdFusion 5, and it can't handle XML. Maybe I could use another text format.
posted by kirkaracha at 2:06 PM on September 6, 2006


I agree with Rusty's suggestions.

I've read that article before and agree that modified tree-order traversal is terribly confusing. I don't use it.

If you go the recursive route, don't follow the example in the article unless your hierarchies are only one or two levels deep. Instead, get all the category relationship information in a single query, populate a hash (or associative array) and recurse through that in memory.
posted by justkevin at 2:10 PM on September 6, 2006


>What's the best way to record the relationships of subcategories to their parent categories?

Why does there have to be a relationship?

If everything has a category assigned to it, and everything may or may not have a subcategory assigned to it, and all you need to do is store, edit and retrieve the items based on that, what difference does it make to the db that "french" is a subcategory of "languages"?

It's just a flag next to the name.

Humans see and understand the relationship but your code doesn't need to. Even if you have duplicate subcategories and "french" also appears as a subcategory of "sexual terms", it doesn't matter.
posted by AmbroseChapel at 4:47 PM on September 6, 2006


What toomuch said, or an ORM (ORMs are now mature enough to "just work"). If I do an ORM solution, I neverthless model it as toomuch and Rusty do.
posted by orthogonality at 8:52 PM on September 6, 2006


Hierarchies seem like a good idea, but suck in general. (This is why relational tables (e.g., SQL, and 'has-a') are so much easier to deal with than hierarchies (e.g., XML, and 'is-a'.))

I think it would definitely be worth your time to come up with a use case which can't be accomplished with either a fixed two-level hierarchy, or a tagging system like MetaFilter's (i.e., AmbroseChapel's suggestion,) before you embark on creating what may be an overly general solution.
posted by blenderfish at 3:47 AM on September 7, 2006


Response by poster: Here's my use case:

1. For each category, find events that are assigned to the category and don't have a subcategory.

2. For each subcategory within a category, find the events that are assigned to the subcategory. (There are a few cases where there's a sub-subcategory, but that's as deep as it goes.)

I'll probably go with the adjaceny list, since I understand it (and it's what I did before). Thanks for the help, everyone.
posted by kirkaracha at 6:56 AM on September 7, 2006


« Older Diwali music   |   Where to buy a small refrigerator online? Newer »
This thread is closed to new comments.