What is the prefered way to represent status changes over time in a relational database?
September 4, 2007 12:53 PM
What is the prefered way to represent status changes over time in a relational database?
I've got this neato idea for a mysql-based inventory database for my work that'll be awesome once I clear this hurdle, so I'll be much obliged for any enlightenment you can provide!
Let's say you have a database acting as a national registry of cars: make, model,
owner, owner's state, and VIN could be the relevant entities. In addition to this, you want to record any change of owner or owner's state such that you could later call up the entire history of an individual vehicle, or what vehicles an owner has owned, or what states a car has been registered in. What's the best way to model this scenario for a relational database system? (All I can think of is to have an additional table that acts like a spreadsheet or flat file db, recording each relationship as it changes, plus the timestamp of the change. This isn't how the pros from Dover do it, is it?)
Finally, what is the proper terminology for the concept of being able to record and search on the history of status changes?
I've got this neato idea for a mysql-based inventory database for my work that'll be awesome once I clear this hurdle, so I'll be much obliged for any enlightenment you can provide!
Let's say you have a database acting as a national registry of cars: make, model,
owner, owner's state, and VIN could be the relevant entities. In addition to this, you want to record any change of owner or owner's state such that you could later call up the entire history of an individual vehicle, or what vehicles an owner has owned, or what states a car has been registered in. What's the best way to model this scenario for a relational database system? (All I can think of is to have an additional table that acts like a spreadsheet or flat file db, recording each relationship as it changes, plus the timestamp of the change. This isn't how the pros from Dover do it, is it?)
Finally, what is the proper terminology for the concept of being able to record and search on the history of status changes?
You need a normalised bunch of tables. (Go look up normalisation and 3rd normal form, it will do you good).
You want something like:
Cars:
- car id
- make
- model
- vin
States:
- state id
- state name
- state code
Owners
- owner id
- owner name
Car-Owner:
- car id
- owner id
- date
Car-State
- car id
- state id
- date
That'll do until orthogonality gets here to mock my feeble efforts.
As a start, try to avoid putting everything into one flat table, and think about how to specify tables such that every item is recorded in one place only.
posted by i_am_joe's_spleen at 1:03 PM on September 4, 2007
You want something like:
Cars:
- car id
- make
- model
- vin
States:
- state id
- state name
- state code
Owners
- owner id
- owner name
Car-Owner:
- car id
- owner id
- date
Car-State
- car id
- state id
- date
That'll do until orthogonality gets here to mock my feeble efforts.
As a start, try to avoid putting everything into one flat table, and think about how to specify tables such that every item is recorded in one place only.
posted by i_am_joe's_spleen at 1:03 PM on September 4, 2007
Forgot:
Car-history (for servicing, modification):
- car id
- service-type-id
- memo
- date
Service-types:
- type id
- description
posted by i_am_joe's_spleen at 1:06 PM on September 4, 2007
Car-history (for servicing, modification):
- car id
- service-type-id
- memo
- date
Service-types:
- type id
- description
posted by i_am_joe's_spleen at 1:06 PM on September 4, 2007
Read and learn.
Once you have this down, you will never again write something like "an additional table that acts like a spreadsheet or flat file db" without a shudder of horror.
posted by i_am_joe's_spleen at 1:11 PM on September 4, 2007
Once you have this down, you will never again write something like "an additional table that acts like a spreadsheet or flat file db" without a shudder of horror.
posted by i_am_joe's_spleen at 1:11 PM on September 4, 2007
Normalized tables are great, but I don't really see how that solves the history/change tracking problem.
posted by smackfu at 2:25 PM on September 4, 2007
posted by smackfu at 2:25 PM on September 4, 2007
(Well, I guess every query could say DATE = (select MAX(DATE) from table), but that seems like it would suck for performance.)
posted by smackfu at 2:28 PM on September 4, 2007
posted by smackfu at 2:28 PM on September 4, 2007
When you add a record to the history table, you could also store the autogenerated ID of the new record in the car's record in the car table. Then getting the latest record for all cars is again a simple join.
posted by kindall at 2:36 PM on September 4, 2007
posted by kindall at 2:36 PM on September 4, 2007
What you have here isn't as simple as they're making out; the data modeling in your application will get moderately hairy. It can absolutely be done, but you will need to think very carefully about it, and maybe enlist a pro to help you with the design.
Why is this hard? Because any owner can have many cars, and any car can have many owners. So you have a many-to-many relationship of cars to owners, which requires an intermediate join table. That would probably be car id, owner id, and date possession taken... you could have a null owner for cars that are presently unused. When looking up who the owner was for a car, you'd select where car=car id, ordered by date, and take the last record... the newest owner will be the current one. You probably need more stuff there too, but I'm just shooting from the hip, trying to give you a broad overview.
If you can get comfortable with normalization and many-to-many joins, you should be able to implement this: without those, you're gonna have a really hard time.
posted by Malor at 2:38 PM on September 4, 2007
Why is this hard? Because any owner can have many cars, and any car can have many owners. So you have a many-to-many relationship of cars to owners, which requires an intermediate join table. That would probably be car id, owner id, and date possession taken... you could have a null owner for cars that are presently unused. When looking up who the owner was for a car, you'd select where car=car id, ordered by date, and take the last record... the newest owner will be the current one. You probably need more stuff there too, but I'm just shooting from the hip, trying to give you a broad overview.
If you can get comfortable with normalization and many-to-many joins, you should be able to implement this: without those, you're gonna have a really hard time.
posted by Malor at 2:38 PM on September 4, 2007
smackfu, maybe I'm misunderstanding the question, but I read:
"you could later call up the entire history of an individual vehicle, or what vehicles an owner has owned, or what states a car has been registered in."
And the schema I suggested allows that, with minimal redundancy. (Actually Car-Owner needs a little love, as Malor says, to model the fact that you acquire a car and then get rid of it, and the car may not have an owner, but still...) The point is that there is no magic, standard answer here other than normal practice for a relational design.
It seemed like the original poster was stuck on having an additional flat table, knowing that was a bit iffy, but not knowing how to take it further.
posted by i_am_joe's_spleen at 2:54 PM on September 4, 2007
"you could later call up the entire history of an individual vehicle, or what vehicles an owner has owned, or what states a car has been registered in."
And the schema I suggested allows that, with minimal redundancy. (Actually Car-Owner needs a little love, as Malor says, to model the fact that you acquire a car and then get rid of it, and the car may not have an owner, but still...) The point is that there is no magic, standard answer here other than normal practice for a relational design.
It seemed like the original poster was stuck on having an additional flat table, knowing that was a bit iffy, but not knowing how to take it further.
posted by i_am_joe's_spleen at 2:54 PM on September 4, 2007
Having actually had experience in the auto/data industry I can say that it is many-to-many as Malor points out above. That is, one car (VIN actually is the primary key) has many owners, past and present. And one person has many cars, also past and present.
Just add some extra fields in a joined table to specify begin and end dates of ownership and, optionally, more details about the transaction itself. You can query this to get the history.
The main table is the car table though, since thats the focus. A car not only has many owners but also, for example, many service incidents and whatnot. That is, it helps to think of "current owner" as a property of the car just like current color.
posted by vacapinta at 3:33 PM on September 4, 2007
Just add some extra fields in a joined table to specify begin and end dates of ownership and, optionally, more details about the transaction itself. You can query this to get the history.
The main table is the car table though, since thats the focus. A car not only has many owners but also, for example, many service incidents and whatnot. That is, it helps to think of "current owner" as a property of the car just like current color.
posted by vacapinta at 3:33 PM on September 4, 2007
i_am_joe's_spleen gives you good advice, and so does Malor.
Temporal modeling is hard -- it's what 6th Normal Form addresses -- and not well supported in current RDBMSes.
But it's doable; what's gonna bite your ass is the granularity. 6th Normal Form (caveat: as I understand it) supports temporal modeling by making every non-key (non-key:, i.e., anything "on" the entity that can change without the entity losing its identity) a separate relation. To this, presumably, you add a timestamp or version number. That solves the granularity problem by making everything a join, but it's hell on efficiency, and requires you to figure out all keys and non-key attributes; this tends to be non-trivial.
A basic answer to your question is: use "lookup" tables like i_am_joe's_spleen suggests, and everywhere you assert a relation ("jack owns the car with VIN 123") you assert a time: "jack owns the car with VIN 123 now" so that you can simultaneously say, "jane owned the car with VIN 123 from 1 Jan 2000 to yesterday". Obviously these relations are many-to-many, (jack can own more than one cvar now, and more than one over his lifetime, too, and jane can have previously owned a car jack owns now).
So we have a table of owners, and a table of cars (well, of VINs), and a table (a "cardinality reduction table") that relates owners and cars (by ID, of course). To the relation table (i_am_joe's_spleen calls it car-owner, you should call it car_owner), we add a start_date and an end_date.
Now, you want to record the owner's residency or the car's state of registration. If the latter, we can add that to the relation table (which is now car_owner_registration_state), by adding the state id.
(NOTE: one can make an argument, I think, for making two relation tables, car_owner and car_registration_state, and a third that relates the two relation tables. This could make some queries cleaner.)
Ok, now given car_owner_registration_state, we can join it to car, owner, and state, and search for a particular car (VIN) to see all the owners and states, or by owner to see all his cars and states. Note we get state transitions (hahahhahahahhahahahha, programmer pun) whether or not we want them; this is the argument for the different structure mentioned in the note above. Order either these joins by the timestamp, and you have your history.
We haven't covered the problem of overlapping dates (or whether this IS a problem; think a car owned jointly by two persons, if that's allowable), and I haven't tested any of this. Your mileage (hahhahahahhahahha, non-programmer pun) may vary.
posted by orthogonality at 5:23 PM on September 4, 2007
Temporal modeling is hard -- it's what 6th Normal Form addresses -- and not well supported in current RDBMSes.
But it's doable; what's gonna bite your ass is the granularity. 6th Normal Form (caveat: as I understand it) supports temporal modeling by making every non-key (non-key:, i.e., anything "on" the entity that can change without the entity losing its identity) a separate relation. To this, presumably, you add a timestamp or version number. That solves the granularity problem by making everything a join, but it's hell on efficiency, and requires you to figure out all keys and non-key attributes; this tends to be non-trivial.
A basic answer to your question is: use "lookup" tables like i_am_joe's_spleen suggests, and everywhere you assert a relation ("jack owns the car with VIN 123") you assert a time: "jack owns the car with VIN 123 now" so that you can simultaneously say, "jane owned the car with VIN 123 from 1 Jan 2000 to yesterday". Obviously these relations are many-to-many, (jack can own more than one cvar now, and more than one over his lifetime, too, and jane can have previously owned a car jack owns now).
So we have a table of owners, and a table of cars (well, of VINs), and a table (a "cardinality reduction table") that relates owners and cars (by ID, of course). To the relation table (i_am_joe's_spleen calls it car-owner, you should call it car_owner), we add a start_date and an end_date.
Now, you want to record the owner's residency or the car's state of registration. If the latter, we can add that to the relation table (which is now car_owner_registration_state), by adding the state id.
(NOTE: one can make an argument, I think, for making two relation tables, car_owner and car_registration_state, and a third that relates the two relation tables. This could make some queries cleaner.)
Ok, now given car_owner_registration_state, we can join it to car, owner, and state, and search for a particular car (VIN) to see all the owners and states, or by owner to see all his cars and states. Note we get state transitions (hahahhahahahhahahahha, programmer pun) whether or not we want them; this is the argument for the different structure mentioned in the note above. Order either these joins by the timestamp, and you have your history.
We haven't covered the problem of overlapping dates (or whether this IS a problem; think a car owned jointly by two persons, if that's allowable), and I haven't tested any of this. Your mileage (hahhahahahhahahha, non-programmer pun) may vary.
posted by orthogonality at 5:23 PM on September 4, 2007
(Seriously, this is all off the top of my head while eating dinner and watching PBS. It is not a substitute for doing your own domain modeling.)
posted by orthogonality at 5:25 PM on September 4, 2007
posted by orthogonality at 5:25 PM on September 4, 2007
Wow, I knew that was a hard problem, but I didn't realize just how hard. Ouch.
posted by Malor at 5:55 PM on September 4, 2007
posted by Malor at 5:55 PM on September 4, 2007
See Temporal database on Wikipedia.
You can sometimes avoid the complexity of temporal databases by periodically creating snapshots of the data, and then running the queries over the snapshots.
posted by Sharcho at 1:07 PM on September 5, 2007
You can sometimes avoid the complexity of temporal databases by periodically creating snapshots of the data, and then running the queries over the snapshots.
posted by Sharcho at 1:07 PM on September 5, 2007
Sharcho, that answer is gold. Very interesting!
posted by i_am_joe's_spleen at 3:52 PM on September 5, 2007
posted by i_am_joe's_spleen at 3:52 PM on September 5, 2007
Thanks, everyone. I had no idea how complicated this was!
posted by kimota at 4:10 AM on September 6, 2007
posted by kimota at 4:10 AM on September 6, 2007
kimota: the takeaway from this is that you asked a very good question indeed. :)
posted by Malor at 6:46 PM on September 6, 2007
posted by Malor at 6:46 PM on September 6, 2007
This thread is closed to new comments.
You've got your car table:
tblCar
====
CarID (autogenerated)
CarName
CarMake
CarModel
etc.
One row per car. You've also got another table to track its history:
tblCarHistory
========
CarHistoryID (autogenerated)
CarID
CarHistoryDate
CarHistoryType
CarHistoryDescription
etc.
The "CarID" column, which is unique to tblCar, is what links each car to tblCarHistory. The simplest way to put them together via SQL is thus:
SELECT c.*,ch.*
FROM tblCar c
LEFT JOIN tblCarHistory hc ON c.CarID = hc.CarID
ORDER BY hc.CarHistoryDate
posted by mkultra at 12:59 PM on September 4, 2007