Datawarehouse 101
August 28, 2007 2:22 AM
Subscribe
I am changing jobs in a couple of months, and need to know more about database design, and more specifically datawarehouse and datamart for reporting systems.
I will be working on the ETL and reporting side, but would like to sound more knowledgeable about general database design techniques, such as normalisation, star schemas and so on, with particular emphasis on datawarehousing. The last time i looked at this Ralph Kimball was the datawarehouse reference point. Would it still be worth reading his Lifecycle Toolkit? Or is there somebody new out there that is a must-read? Online or hard-copy recommendations would be most welcome.
posted by jontyjago to computers & internet (2 comments total)
1 user marked this as a favorite
Once you have your database, do some "warehousing": what's the median track count of songs per genre per decade? Which composers have the most covers (in unique tracks; don't count tracks twice, i.e,, that are republished on "Best Of" or compilation albums)? The most artists covering them (i.e., don't count a different version of the cover by the same artist)? Of artists in "bands" (i.e, , several artists together on a song, e.g, John, Paul, George & Ringo made up the Beatles), what percentage of artists have more than three solo albums in the decade after the band has disbanded? (Can you even determine when a band has disbanded by examining your database (this is a surprisingly subtle thing to map)?
Learn that, whether you're doing ETL, reporting, or warehousing, views are your friends. Get in the habit of writing views for anything you do; the view will serve to functionally decompose your queries, and will provide building blocks for more complex queries. "First, build a view."
Learn the value of indices, both for efficiency and for semantic correctness.
Finally, learn what I mean when I say, "If you lie to the database, the database will lie to you." Explore how poorly-conceived database structures can retain unimportant information, lose vital information, or worst, make wrong outputs out of corrrect inputs.
posted by orthogonality at 4:12 AM on August 28, 2007 [3 favorites]