Comparing two similar but non-matching tables in Access?
February 5, 2009 7:43 AM   Subscribe

Microsoft Access query problem - Comparing two similar tables that don't match up well. It's lists of classes in a collection of locations from two consecutive years, and what I need is to show the class fees side by side for any class that happened in the same location both years.

So, two tables. Each is something like this:

ClassID# - ClassLocation - ClassName - ClassFee

One table is for 2007, one for 2008. The class ID numbers are ALL different - thus, if location A offered a "yoga" class in both years, the class ID# is different in each table, so the only way to match them up is if both the "ClassLocation" and "ClassName" fields are the same. Which I cannot figure out how to do in Microsoft Access.

Further - the lists of classes differ from year to year - what each location offered in 2007, they didn't necessarily offer also in 2008. (Otherwise one could theoretically just paste the two tables next to each other in Excel.)

So the result I'm looking for is:

ClassLocation - ClassName - 2007fee -2008fee

...for any instance where the "location" and "name" are the same in both years.

Would I be better off combining the two tables into one? (say, adding a column for "year"?) And then running some sort of query off of that?
posted by dnash to Computers & Internet (13 answers total)
 
What you want to do is join the tables together in a meaningful way. I confess to not know Access inside and out, but maybe these steps will get you moving in the right direction.
posted by teabag at 7:55 AM on February 5, 2009


This may overcomplicated things, but what I would do is make a third table with the following structure. The reason I'd do this is presumably you're going to have a 2009 table at some point, and why keep merging them.

ClassTypeID - ClassName
0001 - Yoga
0002 - Meditation
etc.

Run an update query to add the ClassTypeID to both tables. Add relationships between the ClassTypeID table created above and Table2007 and Table2008. Now run a select query where ClassTypeID from Table2007 = ClassTypeID from Table2008 AND Location from Table2007 = Location from Table2008. Display your desired fields.

The other reason I'd create this new table is because otherwise you get data duplication in your merged table (it's not normalized), e.g.

ClassID# - ClassLocation - ClassName - ClassFee - Class Year
0001 - Los Angeles - Yoga - $100 - 2007
0023 - Los Angeles - Yoga - $120 - 2008

Technically, to make it completely normalized you should also create a Locations table with unique IDs.
posted by desjardins at 8:05 AM on February 5, 2009


Btw, "some sort of query" gives me the impression that you're not adept with Access, so if you have follow-up questions, don't be shy.
posted by desjardins at 8:07 AM on February 5, 2009


SELECT a.ClassFee AS ClassFee2007, b.ClassFee AS ClassFee2008
FROM TableFrom2007 a, TableFrom2008 b
WHERE a.ClassLocation = b.ClassLocation and a.ClassName = b.ClassName
posted by grippycat at 8:09 AM on February 5, 2009


You want to drag both the ClassName and the ClassLocation fields from the 2007 table to the 2008 table in the build query view, after adding both tables to the query.

This will only return results where the classname and classlocation were the same for both years. If you added new classes, or removed some classes, they won't be included in the results.
posted by Pants! at 8:09 AM on February 5, 2009


SELECT tbl2007.ClassLocation, tbl2007.ClassName, tbl2007.ClassFee, tbl2008.ClassFee
FROM tbl2007 INNER JOIN tbl2008 ON (tbl2007.ClassName = tbl2008.ClassName) AND (tbl2007.ClassLocation = tbl2008.ClassLocation);
posted by okbye at 8:09 AM on February 5, 2009


okbye stole my brain (and did it with much better syntax!)
posted by grippycat at 8:11 AM on February 5, 2009


Response by poster: okbye - that doesn't work. It gives me tons of duplicate rows. There are about 3800 rows in each table, so that should be the most rows in the query result, right? But your way gives me a result with 15000+ rows. I actually had tried that yesterday and having that problem with the result is why I came to Ask.Me today.
posted by dnash at 8:29 AM on February 5, 2009


Response by poster: I think I figured out what causes the multiple rows - each location can have multiple classes of the same name in each year, and I guess the query is matching each instance of one year to each instance of another

2007 2008
yoga yoga
yoga yoga

gives a query result with FOUR "yoga" classes. (Am I figuring that right?)

Any ideas on how to eliminate that issue, or minimize it?

@desjardins - I did add the "class name ID" number table you suggested. (actually already had one separately, it just wasn't connected to the actual class list database.) I still get the multiple rows using it, though.
posted by dnash at 9:06 AM on February 5, 2009


Best answer: Open up the properties of the query and setting the Unique Values property to Yes. I'm not very well versed in SQL, but I think this is equivalent to SELECT DISTINCT. That might help with the multiple rows of the same thing.
posted by Mister Cheese at 9:45 AM on February 5, 2009


dnash writes "I think I figured out what causes the multiple rows - each location can have multiple classes of the same name in each year, and I guess the query is matching each instance of one year to each instance of another"

Add a "distinct" to the query okbye or gippy gave you:

select DISTINCT tbl2007.ClassLocation, tbl2007.ClassName, tbl2007.ClassFee, tbl2008.ClassFee
FROM tbl2007 INNER JOIN tbl2008 ON (tbl2007.ClassName = tbl2008.ClassName)
AND (tbl2007.ClassLocation = tbl2008.ClassLocation);

In the long run, desjardins is correct: your tables are denormalized which is why this is so much trouble. (And her "technically" note about locations is correct, but I wouldn't call it a technicality.)

You have two kinds of entities, class types and class locations, and a third entity which is a sort of relation between the two: this kind of class at that location. This third entity, call it a class or class instance, has a fee and a year, which are attribute data*.


* Purists will note that a year is an entity too. This is where database design becomes an art: do we make that an entity or not? The brief answer is, "yes, if it is true that a year is not really a year, but an arbitrary period during which we hold classes". Why? Because if we decide, for instance, to hold classes in the Spring and Fall, our year designation breaks down.

Then some idiot will try to talk about "2008-Fall" is "year"'s column type is a string, or "102008" if it's an int, or (clever but still wrong), "2008-03-01" if it's a date type. If year's an entity and thus our class refers to it by a key, we can do two things arbitrarily add attributes to it ("season", "semester", whatever) and/or arbitrarily make however many we need: "2008 Spring", "2008 Fall".

If, however, a year is really a year, and a location can only have any particular class once in ayear, then year is merely attribute data.

Now: what if the same type of class can be held at the same location more than once in any period, that is, say, Section 1 Yoga and Section 2 Yoga, because yoga's so popular this year? I leave this as an exercise for the reader. How would you accommodate this? How would you accommodate this if your client assured you that it could never happen? How would you accommodate this if after assuring you this could never happen and building a front-end that was costly to change, the client then discovered it need to be accommodated?
posted by orthogonality at 9:50 AM on February 5, 2009


Response by poster: Bingo. The "unique values" property (in Access - which changes the SQL to "select distinct") did the trick. Thank you, thank you, thank you.

And yes, I'm aware of the normalization issues. I'm not a real database expert, but I'm fairly experienced, enough to know that this database is not how I would have it be. The trouble is I don't really have control over that - I'm given a big "flat file" of a class schedule every season. I suppose I could work out ways to break it up into a more normalized form every time, but that might be more work than it's worth.

For now, I think that "select distinct" option may have just solved one of my most nagging issues, and hopefully will make my bosses nice and happy. :)
posted by dnash at 10:15 AM on February 5, 2009


I suppose I could work out ways to break it up into a more normalized form every time, but that might be more work than it's worth.

Or you could import (or link) it and save your steps, so it's added (or linked) to your normalized database. *shrugs* It all depends on how much you use the thing - if it's something that you only have to look at once a month, then why bother, but if it's weekly or daily, it's worth the effort.
posted by desjardins at 12:00 PM on February 5, 2009


« Older What is the difference between card stock and...   |   How to fight depression? Newer »
This thread is closed to new comments.