One table to rule them all?
May 1, 2008 3:12 AM
Can you help structure my oracle database appropriately?
I'm writing an helpdesk application that supports n types of call. A call is logged with the system through an HTML form.
There are certain properties common to each calltype (ID, datelogged, dateclosed, owner, status, for example), but beyond that each calltype has a completely different form. Calltype1 may just have a notes field. Calltype2 may have a notes field plus 10-15 other fields that allows the user to add in more specific information particular to that calltype.
My question is: should I have one table for the common call information and a separate table for each calltype, or one big table that contains all the call information covering the fields of all calltypes? I guess my concern is that I'm adding extra complexity to the database by using many calltype tables to spare what I see as wasted space in the big calltype table.
Is my concern justified?
I'm writing an helpdesk application that supports n types of call. A call is logged with the system through an HTML form.
There are certain properties common to each calltype (ID, datelogged, dateclosed, owner, status, for example), but beyond that each calltype has a completely different form. Calltype1 may just have a notes field. Calltype2 may have a notes field plus 10-15 other fields that allows the user to add in more specific information particular to that calltype.
My question is: should I have one table for the common call information and a separate table for each calltype, or one big table that contains all the call information covering the fields of all calltypes? I guess my concern is that I'm adding extra complexity to the database by using many calltype tables to spare what I see as wasted space in the big calltype table.
Is my concern justified?
Complexity in normalizing your schema costs a little now, in the design phase, but will save you immeasurable time and grief when you will inevitably make changes to the types of calls you may handle.
posted by Blazecock Pileon at 4:16 AM on May 1, 2008
posted by Blazecock Pileon at 4:16 AM on May 1, 2008
I'm uncomfortable enough with what I wrote above that I'm hoping someone's going to come along and rip it apart, and convince you to normalise everything to hell and back. But OTOH, normalisation's a tool, not a religion, and I've at least given you a third option.
posted by Leon at 5:02 AM on May 1, 2008
posted by Leon at 5:02 AM on May 1, 2008
Yes, your concern is justified. A giant table can be frustrating to work with. So can a properties table.
You may gain some flexibility with a call_properties table, but the SQL for inserting and updating a call is going to be a bit more complicated. A properties table can also suffer performance problems. Another option would be a table for each of your call types:
calltype1_call_info: call_id, notes, respondant, etc...
It's hard without knowing more about your problem domain. How often would these call types be created/changed? Do you know at query time which call types you're looking for? If you are changing these often and they're more free-form, I'd go with the call_properties table. Otherwise, you may be able to use this scheme to have more structured data.
posted by anomie at 5:35 AM on May 1, 2008
You may gain some flexibility with a call_properties table, but the SQL for inserting and updating a call is going to be a bit more complicated. A properties table can also suffer performance problems. Another option would be a table for each of your call types:
calltype1_call_info: call_id, notes, respondant, etc...
It's hard without knowing more about your problem domain. How often would these call types be created/changed? Do you know at query time which call types you're looking for? If you are changing these often and they're more free-form, I'd go with the call_properties table. Otherwise, you may be able to use this scheme to have more structured data.
posted by anomie at 5:35 AM on May 1, 2008
Inheritance in the DB isn't that clean. I'd say make a parent CALL table which all the common fields between all your call types, and then child tables that share the same PK with the CALL table for each call type that have the extended attributes. So something like:
REQUEST:
call_id integer primary key not null,
calltimestamp timestamp not null,
processed_by integer foreign key to user table not null,
priority integer not null,
assigned_to etc...
PROBLEMREQUEST:
call_id integer primary key foreign key to CALL not null
affectedasset_id integer foreign key to asset table
etc...
CHANGEREQUEST:
call_id integer primary key foreign key to CALL not null
changecontrol_id etc...
INFOREQUEST:
...
This is rigid, but the more rigid your data model is, the more secure you are that the underlying data is sound. There are good app reasons to have name/value pairs in the DB, but only if your application lets you dynamically add fields to forms and such.
With this model, your combined lists just hit the CALL table and you push all common fields up there to support these combined lists. Lists of specific types just use INNER JOIN to get the CALL and their specific type information. You can FK to whichever table you need to best describe what you are doing, so like the COMMENT table that lets service folk post comments to requests can have a CALL_ID in it FKed to REQUEST.CALL_ID. But other places that just need to RI to a problem can FK to PROBLEMREQUEST.CALL_ID.
The downside here is inserts and deletes are slower. But it's not too complicated.
posted by cmm at 6:22 AM on May 1, 2008
REQUEST:
call_id integer primary key not null,
calltimestamp timestamp not null,
processed_by integer foreign key to user table not null,
priority integer not null,
assigned_to etc...
PROBLEMREQUEST:
call_id integer primary key foreign key to CALL not null
affectedasset_id integer foreign key to asset table
etc...
CHANGEREQUEST:
call_id integer primary key foreign key to CALL not null
changecontrol_id etc...
INFOREQUEST:
...
This is rigid, but the more rigid your data model is, the more secure you are that the underlying data is sound. There are good app reasons to have name/value pairs in the DB, but only if your application lets you dynamically add fields to forms and such.
With this model, your combined lists just hit the CALL table and you push all common fields up there to support these combined lists. Lists of specific types just use INNER JOIN to get the CALL and their specific type information. You can FK to whichever table you need to best describe what you are doing, so like the COMMENT table that lets service folk post comments to requests can have a CALL_ID in it FKed to REQUEST.CALL_ID. But other places that just need to RI to a problem can FK to PROBLEMREQUEST.CALL_ID.
The downside here is inserts and deletes are slower. But it's not too complicated.
posted by cmm at 6:22 AM on May 1, 2008
Are you sure you really want to store a million different properties about a thousand different calls, either in a properties table (aka the Leon opton) or the Thousand Table option (aka the anomie option)?
Why do you want a database in the first place?
I imagine that you want one in order to track open orders and report on closed orders.
In that case, you want a database that records the call questions (WHO, WHAT, WHEN, WHERE, WHY), relevant dimensions, (WHO might connect to your HR database or another list of employees, their contact information, location, and supervisors), a freeform field for the support staff to read about the issue, a more specific field that lists the name of what would have been a thousand different forms, a free form field that allows you to document the resolution, and another non-free form "category" of solution (e.g. "Did you try rebooting?")
This will make it simple to use, simple to code, simple to maintain (can you imagine the pain it will be under your original scenario to create new forms for each new call type!?), and simple to report off of.
posted by Pants! at 6:25 AM on May 1, 2008
Why do you want a database in the first place?
I imagine that you want one in order to track open orders and report on closed orders.
In that case, you want a database that records the call questions (WHO, WHAT, WHEN, WHERE, WHY), relevant dimensions, (WHO might connect to your HR database or another list of employees, their contact information, location, and supervisors), a freeform field for the support staff to read about the issue, a more specific field that lists the name of what would have been a thousand different forms, a free form field that allows you to document the resolution, and another non-free form "category" of solution (e.g. "Did you try rebooting?")
This will make it simple to use, simple to code, simple to maintain (can you imagine the pain it will be under your original scenario to create new forms for each new call type!?), and simple to report off of.
posted by Pants! at 6:25 AM on May 1, 2008
I should preface this with a warning, I've not tested this, and test this, I've not got an oracle system handy.
A more normalized approach:
So you'll have your general table with the common information, 'CALLS' but one calls can have zero or one of a number of different table types:
So a two table example:
COMMON 0----1 HR_CALL
COMMON 0----1 SERVICE_CALL
Common has (ID, datelogged, dateclosed, owner, status);
HR_CALL has (id, commonID, rep_name, moo, bar, foo);
SERVICE_CALL has (id, commonID, other, example, field);
All's well and good, but how do we query? Like so (hopefully):
SELECT * FROM common, c, hr_call h, service_call s
where c.id = h.commonid (+)
and c.id = s.commonid (+);
Which should (please test!) just give you the information you require, the (+) specifies that it's an outer-join. I'll see if I can get my hands to an Oracle prompt later and confirm. I'm certainly getting rusty.
posted by Static Vagabond at 6:42 AM on May 1, 2008
A more normalized approach:
So you'll have your general table with the common information, 'CALLS' but one calls can have zero or one of a number of different table types:
So a two table example:
COMMON 0----1 HR_CALL
COMMON 0----1 SERVICE_CALL
Common has (ID, datelogged, dateclosed, owner, status);
HR_CALL has (id, commonID, rep_name, moo, bar, foo);
SERVICE_CALL has (id, commonID, other, example, field);
All's well and good, but how do we query? Like so (hopefully):
SELECT * FROM common, c, hr_call h, service_call s
where c.id = h.commonid (+)
and c.id = s.commonid (+);
Which should (please test!) just give you the information you require, the (+) specifies that it's an outer-join. I'll see if I can get my hands to an Oracle prompt later and confirm. I'm certainly getting rusty.
posted by Static Vagabond at 6:42 AM on May 1, 2008
Thanks to everyone for their answers. I've chosen cmm's answer as best as that seemed to be the direction I was heading in anyway but that explanation put to rest any doubts I had. I don't really understand the principle of the properties table so would be wary of putting it into practice, but thanks for introducing me to the concept.
In answer to a couple of questions: from my initial analysis i think I've identified all the calltypes required, but I'm designing the db in a way that will allow more to be added if it becomes necessary. A main CALL table with joined CALLTYPE tables looks like a good way to go. There will also be a CALLTYPE table that will identify the different calltypes. If I need to add a new one I can add a new record to that table, and add a new [calltype] table to the database without disrupting anything else.
Thanks also for raising the issue of referential integrity. As cmm mentions, this seems to be the most rigid model to use and will cut down on many RI issues down the road.
posted by urbanwhaleshark at 6:47 AM on May 1, 2008
In answer to a couple of questions: from my initial analysis i think I've identified all the calltypes required, but I'm designing the db in a way that will allow more to be added if it becomes necessary. A main CALL table with joined CALLTYPE tables looks like a good way to go. There will also be a CALLTYPE table that will identify the different calltypes. If I need to add a new one I can add a new record to that table, and add a new [calltype] table to the database without disrupting anything else.
Thanks also for raising the issue of referential integrity. As cmm mentions, this seems to be the most rigid model to use and will cut down on many RI issues down the road.
posted by urbanwhaleshark at 6:47 AM on May 1, 2008
That syntax is fine. It's oracle proprietary.
Oracle 9 and later support ANSI SQL join syntax, so you can use "FROM common LEFT JOIN hr_call ON common.id = hr_call.commonid"...
posted by cmm at 6:50 AM on May 1, 2008
Oracle 9 and later support ANSI SQL join syntax, so you can use "FROM common LEFT JOIN hr_call ON common.id = hr_call.commonid"...
posted by cmm at 6:50 AM on May 1, 2008
cmm's approach enforces a one-to-many relationship between caterogies and calls. You need to be very sure that a call will never appear in more than one category before you go that route, as changing it later would be a real headache. (For what it's worth I like Pants' response - he's identifying the actors in the system, and leaving room for alternative workflows. His code would probably be shorter than mine, as he's obviously thinking in objects).
posted by Leon at 7:56 AM on May 1, 2008
posted by Leon at 7:56 AM on May 1, 2008
This thread is closed to new comments.
Other Properties are not present for every Call, and are more likely to change as the application evolves ("can you just add some fields here for when the call gets passed to second-line support?"). These are good candidates for being represented, not as hard-coded tables, but as name-value pairs. I'd suggest something along the lines of:
call (callid, etc etc etc)
property (propertyid, callid, name, value)
in other words, treat the Property table as a dictionary array, and leave it to the code to decide what fields are appropriate for a given Call.
(You lose a lot of the benefits of having a database (eg referential integrity) when you go this route, and as you scale it can become horribly inefficient, but I can't imagine that being much of an issue for the volume of data you're likely to be dealing with here).
posted by Leon at 4:12 AM on May 1, 2008