Designing a gradebook database
February 10, 2012 7:13 AM Subscribe
MS Access / relational database question. I'm trying to set up a database for a gradebook for multiple classes, that I can maintain indefinitely. I've got the basics in terms of tables for students, classes, etc. But in my ideal world, I'll be able to do item analysis, too -- like seeing what proportion of students got Question 17 correct on a certain exam, or pulling up the specific items that a student missed on Exam 3. The only way I can think of to get this level of detail is with a junction table that has a separate record matching up each test item with each student for each exam. Of course this will eventually swell into hundreds of thousands of records (students * test items * exams * classes * years). Am I thinking about this correctly? Is there a better way of designing this? Are there any examples you can point me to of databases that accomplish similar things? Thanks!
Response by poster: Thanks for the response mrgrowler, but when you say the inter-table should only have thousands (not hundreds of thousands) of records, it makes me concerned that I'm still not understanding the design correctly. As I'm picturing it, 6 classes * 30 students * 4 exams * 30 test items would mean over 20,000 records per year, which would be over 100k in 5 years. Were you suggesting a different kind of design?
posted by svenx at 7:48 AM on February 10, 2012
I think if you do a students_questions junction table and track which questions were on which exams on a separate table, you can infer how students did on the exams, yeah?
posted by empath at 7:51 AM on February 10, 2012 [1 favorite]
Yeah, it's not clear to me what the table layout you are envisioning, or already have, looks like. However, if you are duplicating data anywhere, you're likely not doing it optimally.
posted by utsutsu at 7:55 AM on February 10, 2012
Best answer: As utsutsu says, an inter-table (which might be what you're calling a junction table) should contain only IDs, so it will be fairly lean.
It could look something like this:
Student Table
Student_ID | First Name | more fields about the student | ...
Exam Table
Exam_ID | Exam Name | other exam details | ...
Questions Table
ExamQuestion_ID | Exam_ID | Question | ...
ExamResults Table
Student_ID | Exam_ID | ExamQuestion_ID | ExamDate | QuestionsMark | StudentsMark
Then you can run queries to get your results. I would also consider breaking the results into another table to "normalize" the data further: keep just the overall mark in ExamResults, with an exam detail table holding the individual question breakdown. But that adds a smidgen of complexity you may not need. You'll want the same design for the Student_X_Class junction so you can get that information too.
Another thing you COULD do, since it is your database, is make the inter-table for results specific to each class or exam, but then cross-result comparisons wouldn't be as easy (though I can't imagine why you'd need that).
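To make the layout concrete, here is a rough sketch of that schema in SQLite (via Python). Table and column names follow the list above; the exact types and extra columns are assumptions, not a definitive design:

```python
import sqlite3

# Sketch of the gradebook schema from the comment above, in SQLite.
# Column names follow the layout in the comment; types are assumed.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Students (
    Student_ID INTEGER PRIMARY KEY,
    FirstName  TEXT
);
CREATE TABLE Exams (
    Exam_ID  INTEGER PRIMARY KEY,
    ExamName TEXT
);
CREATE TABLE Questions (
    ExamQuestion_ID INTEGER PRIMARY KEY,
    Exam_ID  INTEGER REFERENCES Exams(Exam_ID),
    Question TEXT
);
-- The inter-table: one lean row of IDs per student per question.
CREATE TABLE ExamResults (
    Student_ID      INTEGER REFERENCES Students(Student_ID),
    Exam_ID         INTEGER REFERENCES Exams(Exam_ID),
    ExamQuestion_ID INTEGER REFERENCES Questions(ExamQuestion_ID),
    ExamDate        TEXT,
    QuestionsMark   REAL,   -- points the question was worth
    StudentsMark    REAL    -- points the student earned
);
""")
```

The point to notice is that ExamResults holds nothing but IDs, a date, and two small numbers, which is why even a hundred thousand rows stays tiny.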
Does that help at all?
posted by mrgroweler at 8:10 AM on February 10, 2012
Response by poster: This is still in the preliminary stages, but as it stands, there's a Students table and a Classes table, with a junction table connecting them, and an Exams table (one record per exam), with a junction table connecting it to Classes. What I'm imagining that I'll need to do is add a TestItems table, with one record per test item, and a junction table that links that to Students. It's that last table that has me worried -- it would involve individual records like "John Smith answered 'B' on TestItem 147", which seems like it would get pretty unwieldy pretty quickly.
posted by svenx at 8:11 AM on February 10, 2012
Response by poster: If I'm reading you correctly, mrgroweler, it looks like your suggestion is basically what I was planning -- but I'm concerned about the stability of that last table once it grows to hundreds of thousands of records. I don't think there's much, if any, data duplication going on. Will the fact that each record contains relatively little information mean that the large number of records won't be such an issue?
posted by svenx at 8:16 AM on February 10, 2012
For something like "John Smith answered B on TestItem 147" why can't you make a query on the fly, instead of keeping a whole table? If you're interested in John Smith's answers on a specific test, just pull up something like
SELECT * FROM Answers WHERE TestID = x AND Student_ID = y
(I may have the syntax wrong, it's been a little while.)
posted by desjardins at 8:16 AM on February 10, 2012
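Building on the on-the-fly query idea: the item analysis from the original question (what proportion of students got Question 17 right) can also be a single aggregate query over the junction table rather than anything stored separately. A sketch, assuming mrgroweler's ExamResults layout and treating full marks as "correct":

```python
import sqlite3

# Minimal ExamResults table (names assumed from the schema upthread).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ExamResults (
    Student_ID INTEGER, Exam_ID INTEGER, ExamQuestion_ID INTEGER,
    QuestionsMark REAL, StudentsMark REAL)""")

# Three students answer question 17 on exam 3; two earn full marks.
con.executemany("INSERT INTO ExamResults VALUES (?,?,?,?,?)",
                [(1, 3, 17, 1.0, 1.0),
                 (2, 3, 17, 1.0, 0.0),
                 (3, 3, 17, 1.0, 1.0)])

# Proportion of students with full marks on question 17 of exam 3.
# The comparison yields 0/1 per row, so AVG gives the proportion.
(prop,) = con.execute("""
    SELECT AVG(StudentsMark >= QuestionsMark)
    FROM ExamResults
    WHERE Exam_ID = 3 AND ExamQuestion_ID = 17
""").fetchone()
print(prop)  # 2 of 3 correct
```

The same WHERE clause with a Student_ID filter instead gives "which items did this student miss on Exam 3".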
Best answer: You're logging 20,000 answers to test questions each year because that's the level of granularity you need. It seems like a large number to you, but it's the whole point of databases. That's what they are made to do. They hold tons of information. I don't believe 100,000 not-very-large records would be too much for Access.
(If you wanted to save space, you could consolidate the information. Instead of logging an answer for each student, you could create one "correct" entry (23 got it right) and one "wrong" entry (7 got it wrong) for each question, rather than creating 30 separate entries. But that may not be as useful.)
Access is a file-based database, like SQLite. It has limitations that server databases, such as MS SQL Server and MySQL, don't have. But it should be good for 100,000 records; the hard limit for an Access file is 2 GB. You'll be fine for years.
Once you hit 1 GB or so, in 2018 or whenever, make a duplicate of the database as an archive, give it a relevant name (say, add the year to the end), purge the working copy of old information, and keep adding to that. Whenever you need older statistics, you can go back to the archived copy. Everything should remain nicely responsive. Repeat as needed.
Alternately, you could keep a rolling three or four years worth of information in the active database, and every summer archive the oldest year. The old information would remain available, and the one you use most often should have plenty of room to grow.
Either method will provide for future growth indefinitely, and the data would still be available for statistical analysis.
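The yearly archive step can even live inside one file: keep an archive table alongside the active one and move the oldest rows over each summer. A sketch (the table names, the date column, and the cutoff are all assumptions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ExamResults (Student_ID INTEGER, Exam_ID INTEGER,
    ExamQuestion_ID INTEGER, ExamDate TEXT, StudentsMark REAL);
-- Archive table with the same shape, created empty.
CREATE TABLE ExamResults_Archive AS SELECT * FROM ExamResults WHERE 0;
""")
con.executemany("INSERT INTO ExamResults VALUES (?,?,?,?,?)",
                [(1, 1, 1, "2008-05-01", 1.0),
                 (1, 2, 1, "2012-01-15", 0.0)])

# Copy everything older than the cutoff into the archive,
# then delete it from the active table.
cutoff = "2009-01-01"
con.execute("INSERT INTO ExamResults_Archive "
            "SELECT * FROM ExamResults WHERE ExamDate < ?", (cutoff,))
con.execute("DELETE FROM ExamResults WHERE ExamDate < ?", (cutoff,))
con.commit()
```

Old statistics stay queryable from the archive table, and the active table stays small.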
If it's not in your plans already, please remove student names from your database once they are no longer in your class.
posted by jsturgill at 8:21 AM on February 10, 2012 [1 favorite]
are your tests all multiple choice?
and do you care if they just got the questions right or wrong?
if yes to both, you could concat the answers, so
you'd just need
EXAM_ID | ANSWER_KEY
STUDENT | EXAM_ID | STUDENT_ANSWER
Then:
to find out what questions are wrong/right and percentages, do character comparisons...
okay, this would be messy for the comparisons, but easier on the data...
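For what it's worth, the character comparison itself is only a few lines; the strings and 1-based question numbering here are just illustrative:

```python
# Compare a stored answer string to the exam's answer key,
# character by character (multiple-choice letters).
answer_key = "BCADB"
student    = "BCDDB"

results = [a == k for a, k in zip(student, answer_key)]
wrong = [i + 1 for i, ok in enumerate(results) if not ok]  # 1-based question numbers
score = sum(results) / len(answer_key)

print(wrong)   # questions missed
print(score)   # fraction correct
```

The messy part is doing this inside queries: item analysis across students means unpacking every string, which is exactly what the one-row-per-answer design avoids.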
posted by fozzie33 at 8:21 AM on February 10, 2012
Best answer: At work, I have the data geeks pull 100K records out of our big internal data warehouse and then do quick and dirty little Access databases for shuffling stuff around for analysis in Excel all the time. I really wouldn't worry about this at all. If *you* are worried, just create a big dummy dataset that's larger than you can possibly imagine and see if Access chokes on it. It won't. Efficiency be damned; it's ok to waste a few CPU cycles.
And 10 years from now your iPad will be about as powerful as a current desktop workstation, so what's slow today will be NBD when you actually need it.
Keep lots of backups, though. :)
posted by pjaust at 8:45 AM on February 10, 2012
Exactly what everyone is saying. I am more conservative because of production issues related to pre-1998 Access databases with thousands of remote users. I've personally made Access databases with millions of records for quick and dirty data tasks, but those were all throw-away. And if it got too large, which would take a few years, you could always migrate it to the free SQL Server Express edition that Microsoft makes now.
posted by mrgroweler at 8:54 AM on February 10, 2012
Looking ahead to when the database is up and running, I'm wondering how the test data will be entered. Have you thought about this? I understand your desire for granular information about each student + each test + each answer, but consider how much time it will take to enter 6 classes * 30 students * 30 test items = 5400 records per test. (Or do you have a way to merge test results into your database?) Just wondering.
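On the data-entry point: if the results come out of a scanner or spreadsheet as a CSV with one row per student per question, they can be bulk-loaded rather than typed in. A sketch with assumed column names (the in-memory string stands in for a real exported file):

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ExamResults (Student_ID INTEGER, Exam_ID INTEGER, "
            "ExamQuestion_ID INTEGER, StudentsMark REAL)")

# Stand-in for a file exported from a scanner or spreadsheet.
csv_data = io.StringIO("Student_ID,Exam_ID,ExamQuestion_ID,StudentsMark\n"
                       "1,3,17,1.0\n"
                       "2,3,17,0.0\n")

# Parse each CSV row into a tuple and insert them all in one batch.
rows = [(int(r["Student_ID"]), int(r["Exam_ID"]),
         int(r["ExamQuestion_ID"]), float(r["StudentsMark"]))
        for r in csv.DictReader(csv_data)]
con.executemany("INSERT INTO ExamResults VALUES (?,?,?,?)", rows)
con.commit()
```

In Access itself the equivalent would be its built-in text/spreadsheet import, so 5400 records per test need not mean 5400 keystrokes.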
posted by exphysicist345 at 3:13 PM on February 10, 2012
With 32-bit primary keys and 16-bit answers, you're looking at around 10 bytes per student answer. That works out to about 1 megabyte for your five years of data. Even with overhead for the data structures, I think you're going to be OK.
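That back-of-the-envelope figure checks out against the class sizes from earlier in the thread:

```python
# Rough raw-data estimate: two 32-bit keys plus a 16-bit answer per row,
# using the poster's numbers (classes * students * exams * test items).
bytes_per_row = 4 + 4 + 2
rows_per_year = 6 * 30 * 4 * 30
total = bytes_per_row * rows_per_year * 5   # five years of data

print(rows_per_year)  # about 20,000 rows per year, as the poster estimated
print(total)          # roughly 1 MB of raw data over five years
```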
Anyway, it's not worth throwing away data for performance issues you might have 5 years from now. Premature optimization...
posted by scose at 4:52 PM on February 10, 2012
It has been a while since I used Access, but size should be less of a problem than it used to be. As a caution, though, I would do regular backups and compact/repair runs if your DB gets too large.
In the past I have had Access databases containing hundreds of thousands of records. They weren't great and they were prone to corruption, BUT yours would probably only be in the thousands. I always said my hard limit for a "safe" table in Access was 10,000 records, which was conservative even in Access 2. So I say go with it and have fun, and learn how to use an inter-table to connect the data through queries.
posted by mrgroweler at 7:29 AM on February 10, 2012