I need to become a database-design expert ASAP!
November 11, 2005 2:09 AM   Subscribe

I need to become a database-design expert ASAP!

We are designing an MS Access human resources (HR)database for a well-established company of about 5000 people. We already have Access or Excel files for recruitment, current employees, and alumni which we want to integrate and set-up to record historical data and generate commonly needed reports. I'm pretty good in Excel, Access, and with basic programming but I've never designed a database before - let alone something this complex. If anyone knows how I can get up to speed please let me know. I can buy some books, too, since my company will reimburse me.

Regarding anticipated comments: (1) Yes, there are other knowledgable people helping me, but it is my responsibility to set this up. There is no option to pass off the project to someone else, outsource, or buy a commercial HR product like SAP or Oracle. (2) The database is going to be done in MS Access. I know there are better programs but my hands are tied on this.
posted by BigBrownBear to Computers & Internet (20 answers total) 7 users marked this as a favorite
 
Use Access to design the data entry forms for your database. Now go and buy a good book on SQL (new or second hand, it doesn't matter, but published any time after the early 90's). Read up on data normalisation.

Now install a database server on the machine that will be serving the data to the Access users and plug it in using ODBC. (You can find out about that in the SQL book too). Mysql, Postresql, SQLserver, it doesn't matter which (only the latter two for a database application with a larger user base than the one your description implies) but don't use the JET engine included with Access for a serious multi-user applicaiton (by which I mean more than one user).

Once you've worked out the data normalisation process, and the kind of reports you want to generate, get your employer to send you on a week long course, or pay for evening classes or whatever. You're gonna need it.

These days, with the maturity of the web as a platform, I wouldn't even bother with Access, it's got too many excentricities, and writing code for it is ... horrible ... and worse with 2003 as far as I can see ... but you don't have a say in that at this stage by the sounds of it.

HTH, sorry to sound negative about Access.
posted by singingfish at 2:26 AM on November 11, 2005


celko has written some good books on databases, but they're more the kind of book you read and reflect on, rather than the kind of book you read in a desperate attempt to pull of a job you shouldn't be doing.

i don't know if it's any consolation, but as i learn more about databases, i'm increasingly surprised at the number of really bad databases out there. so experience suggests that you will be able to get something working, even if you know nothing about it.

this is what i do (i'm learning too, so may miss things):

- design the tables. a process similar to oo domain modelling works quite well, except that you can't use inheritance. basically you're looking for the "different things" and the relations between them. you can get gui tools for this.

- drilling down to your basic "things" and there relations gets you a good way to a normalized schema. so the next step is to make sure it really is normalized, and clean it up.

- add not null and uniqueness/primary keys.

- add secondary key constraints, deciding how you want deletion to be handled.

- look at the operations you want to perform on the database and write procedures for them, making sure that you take care of the various constraints you have.

- around this point you may worry about triggers/rules to guarantee certain conditions. you may also need to add more tables to store state for processes that happen over time.

- make sure that transactions are grouping things together in the right size chunks so that failures leave you in a clean state.

basically you need to get three things right: store the data and relationships (easy); make it efficient (hopefully you'll be ok just by adding indices to everything); guaranteeing integrity. the last one is the most interesting - the database will do an awful lot of work for you if you let it. if you're used to pre/post conditions and invariants in programming (or declarative programming in general) then that's a big help - effectively that's what you're building into your database with the foreign key constraints, triggers, etc.

if you do go with a database behind access (which is what i understand singingfish is suggesting) then i'd suggest postgresql. mysql is only just starting to support all the integrity stuff, and in my experience with the latest release, it's a bit flakey still (although hugely better than earlier versions).
posted by andrew cooke at 2:52 AM on November 11, 2005 [1 favorite]


I wasn't going to reply, but ignoring your constraints does not a helpful reply make.

I do this stuff every day, but I don't know of any good beginner's books/tutorials (in fact, every one I ever read was verbose and overly complex - I learnt by doing, repeatedly, until something went click), so I'll leave that to someone else.

You need to start with data modelling. List all the "objects" in your system (Person and EmploymentStatus are the only ones you've mentioned so far). Create a table for each object which has a primary key and a list of properties for that object (Person has properties Name, Address, DoB, etc etc).

Now you need to decide how your objects relate to each other. What you're looking for is whether it's a one-to-many relationship (eg on a web discussion board one Person writes many ForumPosts), or whether it's a many-to-many relationship (one Book can have many Authors, and one Author can write many Books).

One-to-many relationships are easy - just add the One table's primary key to the Many table's column list as a foreign key. Thus, if Person is identified by personid, then we add a foreign key personid to table ForumPost. Then we can do:

SELECT ForumPost.* FROM Person, ForumPost WHERE Person.personid = ForumPost.personid AND Person.name = 'John'

to find all the ForumPosts written by John.

Your Person/EmploymentStatus example is probably of this type (Status is one of 'PotentialEmployee', 'Employee', or 'Alumni', and your EmploymentStatus table consists of (employmentstatusid, description). Your Person table has employmentstatusid as a foreign key. Then we do

SELECT Person.* FROM Person, EmploymentStatus WHERE Person.employmentstatusid = EmploymentStatus.employmentstatusid AND EmploymentStatus.description = 'Employee'

to get all the current employees.

Many-to-many relationships are more complex. To model these you need a joining table. Thus, in our Author/Book example, we create a new table called, say, AuthorWroteBook, which consists of (authorid, bookid). Then we can do:

SELECT Author.* FROM Book, AuthorWroteBook, Author WHERE Book = 'Dynamic HTML' AND Book.bookid = AuthorWroteBook.bookid AND AuthorWroteBook.authorid = Author.authorid

to get all the authors of a given book, and

SELECT Book.* FROM Author, AuthorWroteBook, Book WHERE Author.name = 'John' AND Author.authorid = AuthorWroteBook.authorid AND AuthorWroteBook.bookid = Book.bookid

to get all the books wriitten by John.

This is the 5-minute precis... I'm not sure how helpful it will be. If you have people around who can help with this, dive in with a graphical modelling tool (I'd recommend Visio Enterprise if you have access to it), ask them lots of questions along the way, and get them to check your design when you're done.

The biggest difficulty is deciding what is an object, and what is a property of an object. Is Address a property of Person, or an object unto itself? Rule of thumb: if there's a one-to-one relationship between two pieces of data (a person lives at one address, an address houses one person), then they probably belong in the same object. But this is where experience comes in.

A note on primary keys and joining tables: a primary key is a column or group of columns that identify a row uniquely. personid is a primary key because it's unique. On the joining table above, (authorid, bookid) taken together are a primary key, because they identify the row they're on uniquely. Joining tables don't need primary keys of their own, they shoud be able to be indetified uniquely based on all their foreign keys.
posted by Leon at 3:06 AM on November 11, 2005


ignoring your constraints does not a helpful reply make

i was wondering what that comment referred to and realised that maybe some of what i described isn't possible in access. i don't know - i've never used it (i assumed it's an sql db with some ms "extras", sorry if that's completely wrong).

in which case, the following may be no good either, but going one step on from the select examples above, sometimes you want to do that, but have NULLs if there's info missing in the other tables. for that look in the documentation for "OUTER JOIN".

also, and again it may not be in access i suppose, you may find that some SELECTs keep cropping up again and again - it would be nice if the information that they generated was a table itswelf, but then you'd be duplicating information. that's one example of when views can be useful - they're like tables, but are constructed dyamically.
posted by andrew cooke at 3:23 AM on November 11, 2005


Not you, andrew: I was referencing the first post, which IMO jumped off into "how to get around the Access-only constraint" rather than answering the substantive question in the post. Mentioning it at all was snarky of me. I apologise.
posted by Leon at 3:26 AM on November 11, 2005


oh, ok, no probs.
posted by andrew cooke at 3:27 AM on November 11, 2005


This is an old tutorial on using Access and ASP 3.0 to create a database driven website. You can skin the web programming parts and focus on the database sections. It's how I started with SQL a few years ago. Then I looked at SQL tutorials and Database design books, but I felt well-prepped by this little tutorial.

Access should work fine for you. I run an intranet that provides comprehensive data on a $1 billion portfolio of construction projects - financial, legal, construction, & CRM data - with an Access backend. The main drawback is that complex queries are a little slower.
posted by putzface_dickman at 3:57 AM on November 11, 2005


Also,this AskMe thread has instructions on moving your data from excel into access. I think you can do this. I'm self taught, but when I hire degreed programmers to work on our system, I find that I have to fix their database design work, or do it myself to get it right.
posted by putzface_dickman at 4:03 AM on November 11, 2005


This pdf contains a 35 page chapter of an O'Reilly book, entitled Java Database Best Practices. It could be part of what you are looking for. The chapter deals with Relational Data Architecture and really has no relation to Java so dont let that put you off.

BigBrownBear - I have already mailed it to you as I could not find the link which I obviously since found!
posted by kenaman at 4:46 AM on November 11, 2005


B^3, check the email in your profile.
posted by Richat at 4:57 AM on November 11, 2005


Response by poster: Dude, you guys are all incredibly awesome. Thank you very much for all of the suggestions and lengthy well-thought-out responses. If you know any more tutorials or book suggestions, please let me know. You are absolutely right that I am in over my head and I shouldn't be doing this. It is a long, complicated story, and thank you all for helping me out.
posted by BigBrownBear at 7:34 AM on November 11, 2005


Two observations of my own, based on years of experience:

1. Be prepared to throw away your first design. In fact, plan on it. As you move from your initial design to production, you will likely uncover relationships and special cases in your data that were previously unidentified. Do not be disheartened; you cannot hope to capture every nuance in your first attempt.

2. Since this is an HR database:

(A) Do NOT use SSN as your Employee Number. There are privacy issues around the publication of SSNs. Avoid these problem by restricting the use and visibility of SSNs.

(B) Do NOT use Employee Number as your primary key in the Employee table. Your Employee primary key will be used as a foreign key everywhere in your database. If your organization decides to change everyone's Employee Number (and this has happened to me twice), then you will have major integrity problems as you attempt to reassign every foreign key reference.

(As a general rule, I've found that it's best for every independent object to have a primary key that is invisible; ie, never published in a report or query. Primary keys, once assigned, should never be changed. If you publish an identifying data element, be it an Employee Number, SKU Number, Organization Number, or whatever, chances are, somewhere down the line, someone will want to re-issue or re-structure these ids.)
posted by SPrintF at 10:43 AM on November 11, 2005


Best answer: I am beginning to learn access database design and programming, so I'm a relative beginner here myself. That said, the books I've found most helpful are:

the Microsoft Access xxxx Bible, where xxxx refers to the release of Access you are working with. If you're using an older version of access you can find the correct edition very cheap on the web. It has a pet hospital database on a cd that you work with, and helps you get hands on practice dealing with all the issues above;

Access Database Design and Programming, by Roman; and

Access Cookbook
by Baron, Getz, and Litwin. This is an interesting book which just has questions, like how do I do xyz, and there's the answer right there.

There are also a number of incredibly helpful websites for particular questions, (e.g., dealing with Date functions). You are probably best off going to some local bookstore that has a good selection of books on Access, SQL, VBA, and just browsing to see which one is at your level, and has examples that are closest to the issues you will be dealing with.

Also, one thing I've done is to get one of the Microsoft template databases that is sort of similar to what I want, and then take it apart (looking at query and report field properties, relationships, etc.) to see how they make it work.

I appreciate all the advice above also - I'm really struggling with figuring out what elements should have their own table. I'm realizing I actually should tear the design I have apart and start over, so SPrintF's advice is heartening.
posted by jasper411 at 12:36 PM on November 11, 2005


I have been looking into learning database design myself for a project relating to my side business.

One book that I heard recommended a lot was "Database Design for Mere Mortals" by Michael J. Hernandez. I have checked out a good number of books that are related to the topic (as my primary job I work at the Penn State main library, so I have access to oodles of books) and this book is by far the best book I have found so far. The approach this book takes is independent of the database package that you are going to use, so if you want to learn about Access in particular you need another book, but it is very good at explaining the way to go about designing a database. This book is very clearly written, the majority of books that I looked at quickly went into very theoretical explanations, but this book gets to the essence of applying the theoretical constraints to actually developing a database.

I have yet to find a book, that I consider a good book, that teaches Access. If you find one I would be very interested in checking it out. Good luck.
posted by jefeweiss at 12:44 PM on November 11, 2005


the whole "what goes in what table" is really very like object or domain modelling. the big difference is, as was explained earlier up in the thread, that some relationships also become tables.

start table for each recognisable "thing". that will give you too many tables, since you have a separate table for every separate item. then look for things that are common children of just one larger thing, and which exist just once for that parent thing. those can be moved into columns in the larger thing's table. that (along with how you handle many-to-one relationships described above) is really 90% of getting a normalized design.
posted by andrew cooke at 3:17 PM on November 11, 2005


Not to derail the thread, but the issue that I've been working on building a database for is to track the progress of a referral for kids who are being considered to participate in a mental health program. And the issue about what should go in its own table is driving me a little nuts.

It's obvious to me that the youth and his characteristics comprise a table. But then there's all these events that take place that we want to track. Like 1) the referral is given to a supervisor on this date, 2) the supervisor calls the family on this date and they agree or not, 3) if they agree the family calls the probation officer, etc.

I've been wondering if each of these events are all part of the youth table, as they are things that happen to him/her. Or if they should each have their own table. Or if there should be a table called Events each row that would be linked to an Event Detail table which carries the data elements (usually only 2 or 3 elements) for each event.
posted by jasper411 at 4:04 PM on November 11, 2005


if in doubt, use a separate table.

your event table is an approach that i have used quite often, but it only works well if either (1) all events have the same details or (2) there are a fixed (and small) number of different kinds of events. if neither of those hold, it's a hard problem.

for the events in a sequence (the progress you are tracking), one possible approach is to store the events without worrying about how they are part of a sequence, and then construct a separate view (probably with a very complex select statement) that shows how the case of each person is progressing. but this only works well when the sequence is the same for everyone. so you might have a single table of events, with a column that is "event type", and another column that is "youth id". then the progress table might be something like

select y.id, exists (select * from events where event.id = y.id and event.type = referral-code) as referral-sent, exists (select * from events where event.id = y.id and event.type = called-code) as called family, ... from youths as y

don't know if that helps (disclaimer, just come back from a dinner; rather full, sleepy, and full of wine).
posted by andrew cooke at 8:47 PM on November 11, 2005


jasper411: I don't know anything about your problem domain, or even if you're still reading, but I'm assuming that your "Events" are going to mostly be presented to the end-user as reports in chronological order.

In that case I'd be tempted to make Events simple text lines, with some metadata attached to them. For example, an Event might consist of:

(eventid, personid, date, class, description)

where class is one of ('referral to supervisor', 'family called', 'probation officer called', 'other', etc etc), and description is a short descriptive note entered by the user. Example row:

(123, 12, '2005-11-14', 'family called', 'Spoke to mother Monday AM. Mother seems positive, agreed.')

I'm using enums for class here, but you could use a separate table.

I think of this as an "invoice pattern", as that's where I use it most often... Person is to Invoice as Event is to InvoiceLine. An invoice can be thought of as a bunch of InvoiceLines with data associated with them:

(invoicelineid, invoiceid, price, quantity, description)
posted by Leon at 6:21 AM on November 14, 2005


Response by poster: Once again, I want to thank all you guys a lot. From many of your suggestions I feel that I am getting a good handle on the necessary principles. Also, I'm glad I asked cause had I tried to just put something together without looking into this, it would be a big disaster. Thank you again.
posted by BigBrownBear at 2:40 AM on November 18, 2005


Response by poster: For the sake of archival purposes, I want to say that the most useful book I found was this on-line: Learn Access Now and "Database Design for Mere Mortals" by Michael J. Hernandez.

I also read the O´Reilly Access Database Design & Programming and their Access Cookbook, but both were a little unneccessarily heavy in the DAO/ADO/SQL coding for someone just starting out.

Also, it was very useful to download and play with the MS Templates.
posted by BigBrownBear at 4:03 AM on November 30, 2005 [1 favorite]


« Older IIS setup   |   Why are country songs so often complete sentences? Newer »
This thread is closed to new comments.