Now that I have torn my hair out. Whats next access?
June 21, 2006 8:25 AM   Subscribe

Yet another Access Query Question. Similar to my other question, but even more frustrating. I need to craft a query that will tally values across many fields and entries.

Ok, So I have this table that has ALOT of fields, basically repeating, but not really. The user has to look at 40 different Criteria and report its state (Ok, NG, NA) and if they are NG they need to choose one of several "failure modes" that vary for each question. My table is built like this:
OpStdA | OpStdB | MasterLabelA | MasterLabelB |etc,etc
Where the A fields hold the OK/NG/NA data, and the B Fields hold the reason why it faid if A was NG
Thanks to the help I recieved on my last question I can now run a query that will pull out all of the "NG" audits, but now I need to take that information and extract even further information from it.
As I said before, each question has different failure modes. So OpStdB may have one of the following
Not Posted
Missing Pages
And the next one, MasterLabelB will have one of these
Missing Label
Wrong Label
Now some may have ones that repeat (Such as 'Op not following standards' which could apply to several different criteria).
I need to write a query that will count each Value for me. I want to know that between all of the results there were 120 Not posted, 10 Missing Pages and 40 Wrong labels.

I know my access-fu is weak, and I should have made this a relational Database originally. But it is what it is, and I just want to make it work at this point rather than remake everything from the ground up.

So hive mind, How do I make access count values to suit my needs.
posted by JonnyRotten to Computers & Internet (27 answers total) 1 user marked this as a favorite
sounds like you need a crosstab query. you can put one set of variables on the row setting, one on the column setting, and then one field that always has a data as the value field that you then set to count.

go to new query, add your table(s) and then select 'crosstab query' from the query types.

so opstdb might be your row value, masterlabelb your column value, and then select any other field that always has data (at my current job, I use the 'state' field in the address portion of the record, or sometimes I create a new field in a table called 'count' and then populate it with 1's)

if your data is complex, or in multiple tables, it might be easier to build a seperate table for the basis for your crosstab.

good luck.
posted by lester at 9:28 AM on June 21, 2006

I'm not sure there's an easy way to do this given your data structure. The following might work, although it assumes the failure modes are known in advance, requires an extra table, and may or may not present the data in the way you want.

1. Create a new table, call it FailureModes. It has one column, let's name it "FailureMode." Populate the table so that every possible failure mode is listed (that is, every possible value of the B fields: Not Posted, Missing Pages, etc).

2. You can then write a SQLquery like this:

SELECT FailureMode,
OpStdCount = SUM( IIF(OpStdB = FailureMode,1,0)),
MasterLabelCount = SUM(IIF( MasterLabelB = Reason, 1 , 0))
[, etc.]
FROM Audits, FailureModes
GROUP BY FailureMode

This will give you the number of failures of each type broken down by each of the 40 columns.
posted by blue mustard at 9:28 AM on June 21, 2006

I need to proofread better... in the SQL query above, the portion that says
SUM(IIF( MasterLabelB = Reason, 1 , 0))
should be
SUM(IIF( MasterLabelB = FailureMode, 1 , 0)).
posted by blue mustard at 9:36 AM on June 21, 2006

You might find the SQLAggregate function "Count(fieldname)" helpful. This counts occupied fields, ignoring Nulls. Trouble is, your data is not normalized, making this harder than it has to be.
posted by kc0dxh at 10:30 AM on June 21, 2006

Seems like "group by " is the important part. Just add it to your previous query.
posted by Four Flavors at 11:07 AM on June 21, 2006

Response by poster: Heres a related question.
What would be an easier way to do it?
I grok relational tables a little. But I don't see how I could use that here, as I have to repeat the form for 50 some presses each time with all 40 questions. How would I be able to relate the order the OK/NG/NA and related failure mode questions were answered in. OpStd is allways Question 1, but how would I show that when I went to call the data back up.
And the form to have them enter the information stumps me too. It would have to be a Form with a Sub form, but how can I make the subform ask all 40 questions and keep the order and labeling correct?
posted by JonnyRotten at 11:25 AM on June 21, 2006

Response by poster: kc0dxh, I wanted to be sure I understand what you mean by not normalized. Do you mean that its not going to be spelled or puncuated the same each time? Because I took out the human error aspect of it, and they choose their failure modes from combo boxes, so there is no chance of misspelling.
posted by JonnyRotten at 11:28 AM on June 21, 2006

JonnyRotten writes a month ago "I have been beating my head for days trying to think of a better table structure. So I guess if its 'naive table structure' I'll have to admit I could not for the life of me think of a better way to do it."

He writes today, "I know my access-fu is weak, and I should have made this a relational Database originally. But it is what it is, and I just want to make it work at this point rather than remake everything from the ground up."

Meta-answer: For over a month, you've been beating your head against a naive and awkward table design. Unless this project ends tomorrow, you're going to continue beating your head, getting frustrated, and coming to AskMefi asking how to make the best of a mess.

It's like someone saying, "I only know Roman Numerals. I've been told it's much more difficult to multiply Roman Numerals than Arabic numerals, but I just want to finish calculating these tables of sines ans cosines in Roman Numerals. I just want to make it work at this point rather than remake everything from the ground up. So tell me what MLCXII times XXXVI times CCCCI is, and by the way, how do I represent floating-point in Roman Numerals?"

(Bonus for readers notice that part of the analogy is that some Roman Numeral strings, like "CCCCI" don't even make sense.)

It's like someone asking, "how do I hammer all these nails with my screwdriver?" The only right answer is "slowly, painfully, and the result will be ugly."

The tables you've got are simple, but, as you've learned, not extensible as the project requirements change, and can only give you the analysis you need by "heroic" brute-force manipulation. Now's the time to step back and convert everything to Arabic Numerals.

The extra work required to set up the correct tables will save you days and weeks of frustration trying to make the wrong tables "work". In my answer to your previous question, a month ago I laid out pretty much the table structure you need. Don't frustrate yourself by continuing to insist on using the wrong tool for the job.

On preview: JonnyRotten writes "kc0dxh, I wanted to be sure I understand what you mean by not normalized."

In Relational database theory, normalization refers to the removal of redundant entities and the data anomalies or inconsistencies that redundant entities cause. There are six normal forms, each of which is a superset of requirements the prior normal form (e.g, for something to be in 3rd normal form, it must also be in 2nd normal form; to be in 2nd normal form it must be in 1st normal form). In practice,,third normal form is considered sufficient for many production databases, though I would (and do) argue that there is value in achieving fifth normal form. A tutorial on the first normals forms ("3NF") can be found here: Read it, it's actually pretty fun.
posted by orthogonality at 11:45 AM on June 21, 2006

Response by poster: Orthogonality, I have a second version of my database that I started when I posted this all the last time, building it like you said, and I still cannot get it to function in the way I need to. I'm sure its errors on my part, and I wish I could get things to jive the way your telling me, but with my lack of in depth knowledge I am haivng a hard time so far.
But I will keep banging on the second version, but I also need to have something working 100% soon, which I can then replace with the streamlined more intelligently designed version down the road. I know its wasted effort, but I can't seem to iron things down doing it relationally.
posted by JonnyRotten at 12:11 PM on June 21, 2006

I feel your pain. Seriously, read the tutorial on normalization. Even if it doesn't help now, you'll find it invaluable later in your career. If my financial situation were better, I'd just re-do your database gratis and for fun (I was sorely tempted last time), but right now I can't work for free.
posted by orthogonality at 12:17 PM on June 21, 2006

johnny: break your tables down: you have too many fields in your records that are not holding data. base your data on the variables you are working with instead: have a incident field, a field that establishes what kind of record it is, and then a field for data.

so your table would be as thus:

idfield (to identify the record)
incident field (this is what links this record with others that are associated with the dataset being entered, it might, for example be a date field)
type of record field (this identifies the kind of variable found in the value field)
value field (actual data)

so if i was handling a table with 10 columns, 1 for respondent number (rn), date, and 8 for answers to questions (q1-9), the initial structure looks thus:
resp. number
and so on.

to handle it differently, i'd make a new table as so:

resp number

qnumber would be a value of q1 through q8. date would be my incident value

that link that ortho has above gives a much more detailed version of this. it'd be good reading if you expect to fully understand relational databases.
posted by lester at 12:30 PM on June 21, 2006

Response by poster: I apperciate the gesture, I could sub-contract this out to a local place no problem, but this is a learning experiance for me as well. So Just having someone do it won't really teach me anything.
I just got done with that Tutorial on normalization. I have a better grasp, but I'm still stumbling on implementation.
I guess what I am having a hard time with is this(using the article you linked):
I have a the Customers table with the CustomerId as the primary key. Then I have the Orders table which pulls information from the three other tables. I don't know how to at some point take the information flow in reverse. I don't know if I am phrasing this right, which may be why I am mind fucking myself trying to understand.
take row 1 in the Orders table. How can I know that OrderId 126 is for "freens r us" at some later time?
posted by JonnyRotten at 12:38 PM on June 21, 2006

the order items table contains a field indicating what the order number is for each record. in a form or report, the order number field would used to link the orders table and order items table with a 1 to many link.
posted by lester at 12:53 PM on June 21, 2006

Response by poster: Ok let me know if I am even in the ballpark here.
IDField (autonumber Primary key)
Auditor (text)

IdField (Number Primary Key)
QuestionID (autonumber)

Question (autonumber)

Table 3 will have all 40 questions, so that when my form has the subform that will ask the questions, I can track that the first question is questionID 1 etc etc.

Am I even close?
posted by JonnyRotten at 1:06 PM on June 21, 2006

You're closer (but it's hard to tell how close, as we don't know exactly what you're trying to achieve).

Try to abstract it out. Leave out the details (date, time, shift, cell, etc.) until you have the basic structure.

From what we know, you have a series of questions, which are in "Table 3". Good so far.

Now, I think, your answers are a limited set too (three possible answers, right?). So make a table for answers. By answers, we mean TYPEs of answers, not the actual answers. So this table has an id and three rows, one for each answer: OK, NG, NA.

Finally a table for the (machines? things the questions are concerning?). Remember, each table concerns a TYPE of entity, and each row a particular entity.

Then we have a table that RELATES thingies to questions to answers. It only contains the KEYS from the other tables. We call it machine_question_answer It looks like this:

Relation_id(pk) Machine_id question_id answer Id
1 1 1 1
2 1 2 1
3 1 3 2

And it means, for machine 1, for question 1, the answer is answer 1
for machine 1, for question 2, the answer is answer 1,
for machine 1, for question 3, the answer is answer 2

Now with the relationship table, we can answer all sorts of interesting questions: How many "bad" answers total did we get?
select count(*) from machine_question_answer where answer_id = 2
How many bad answers did machine #1 get
select count(*) from machine_question_answer where answer_id = 2 and machine_id = 1

How many machine got the "bad" answer for question #10?
select count(*) from machine_question_answer where question_id = 10

Which machines had no bad answers?
select distinct machine_id from machine_question_answer a where not exists ( select * from machine_question_answer b where a.machine_id = b.machine_id and b.answer id <> 1 )

Now, you still need a way to deal with failure modes. You can do this a couple of ways: either get rid of answer "NA" and replace it with a different answer for each failure mode, or add a table of failure modes and a relational table that relates machine_question_answer's id to a failure mode. But don't do this until you have the basic thing working.

Here's basically what you'll have accomplished: instead of having 40 unrelated columns for questions and answers, you'll have a table of 40 rows for questions and a table of three rows for answers, and a table that combines questions and answers (and machines). Because you'll be dealing with rows instead of columns, adding a new question is as easy as inserting a row (instead of altering the table to hold a new columns). Because the questions are in rows, you'll be able to treat them as instances of an entity rather than (in columns) as attributes of a machine.
posted by orthogonality at 1:48 PM on June 21, 2006

take row 1 in the Orders table. How can I know that OrderId 126 is for "freens r us" at some later time?

By looking at a query.
I once read that databases are splitting up data just to put it right back together again and that's right. A simple join query on Orders table with Customers table joined would give you all of the info you needed. You would never really be able to edit within the Orders table by itself, but that's really something you "shouldn't" be doing anyway.

ps. listen to orthogonality
posted by Brainy at 3:01 PM on June 21, 2006

Response by poster: Sorry, should have been more clear on my field descriptions.

AuditResult - The OK/NG/NA result
Problem - When Audit result is NG this will be the problem description.

Problem can be different for each question, Some have 1-2 some have 5 or more.

I think I am getting it otherwise. I see where your going with the example. The queries look ALOT simpler then the monsters I am fighting with.

Playing with this database I am coming across the problem of building my form for data entry. How can I make my subform show all 40 questions at once? Right now I have Form1 subform1 where form1 is basically Table1 information, and Subform1 is table2 (which I will eventually need to ask all 40 questions, but I am starting with 10 to figure it out on a smaller scale)
I would like to open Form1 and have Subform1 display all the entries to fill out at once. Is that possible?

Thank you so much for all your help and guidance!
posted by JonnyRotten at 4:00 PM on June 21, 2006

"Getting" normalization is the biggest hurdle in understanding databases. It might help you to realize that from a properly normalized database, you should be able to generate a query that spits out, pretty much row for row and column for column, your original all-in-one table.

The key to making it all work is those little glue tables that have nothing in them but rows full of primary keys from other tables.

For your questions and problems stuff: you need a table of questions, which will have 40 rows. Then you need a table of problems, which will have as many rows as you have possible problems. Then you need one of those little glue tables, to associate questions with possible problems. Each row in the glue table will have just a question ID, and a problem ID.

One neat hack would be to make the first row in the problem table "no problem".

You'll need a table for auditors; another one for audits; and a glue table where each row relates an audit ID, a question ID and a problem ID (that is: each row says that during THIS audit, the result for THIS question was THIS problem).

A separate glue table would tie audit ID's to machine ID's.

If you ever find yourself needing to deal with different question sets for different machines: you'd need another glue table that relates machine ID's to question ID's.

For the purposes of designing forms, combo boxes and so forth, you generate queries, and use those as record sources.

Basically, the idea is to consider the entire process you're modelling, identify the different kinds of Thing you have (machines, problems, questions, auditors, audits...) and make a table for each of those; then make glue tables that tie those Things to each other in ways that fit in with your processes. As soon as you find yourself putting repeated columns in a table (Question 1, Question 2 etc), realize that there's a New Thing you haven't seen before; figure out what it is; make a new table for it; and add a glue table so you can generate your redundant-columns table using a query.

And once you've got this working in Access, have a look at porting it all over to OpenOffice so you won't have to worry about seat licences :)
posted by flabdablet at 9:06 PM on June 21, 2006

Ortho, I *love* your Roman Numerals analogy :)
posted by flabdablet at 9:07 PM on June 21, 2006

JonnyRotten writes "Problem can be different for each question, Some have 1-2 some have 5 or more. "

"The queries look ALOT simpler then the monsters I am fighting with."

Yeah, that's the point of all the hard upfront design.

Ok, good. A problem is an entity. Now, our "rule" is, every entity gets a row (and only one row) somewhere in our database . Since all individual problems (problem entities) are of general type "Problem", we need a "Problem" table. Our other "rule" is that each entity gets a PK. So this gives us a table with two columns, an id and a description.

Now we have to RELATE the problems to questions. This gives us a relation table with two keys and its own PK. Note that this means that a Problem can be related to one or many questions (e.g, every question might have the problems, "I dunno what the hell the problem is!" and "operator error!") and a question can have zero, one, or many questions. This is a many-to-many relation, which we implement as a cardinality reduction table.

At this point I'd add a "null" problem to avoid actually having null data in the table. (This is an advanced topic, and different designers would differ on what to do here. In RDBMS, the value null should be reserved for "I don't know". The "null problem" is not an "I don't know" but a way of saying this isn't really a problem, it's a placeholder that indicates "we KNOW there is no problem". In other words, we'd use it when the answer is "OK" or "NA". Used correctly, we could factor out "OK" and "NA" by using the null problem placeholder and possibly an "OK" placeholder. You'll see that this is essentially isomorphic to my previous idea ("You can do this a couple of ways: either get rid of answer "NA" and replace it with a different answer for each failure mode") . Again, all this is an advanced topic, it's one that good designers can differ on, and it's dependent on the goal of your whole model. So, for the moment, just do what I suggest.)

Now, with the relation table, let's call it question_problems, we can ask what the allowed answer is for any question with a join. We need to join three tables: the question table, which gives us the question text and id, the problem table, which gives us the problem text and id, and the relation table that relates the questions to problems:

select question.text, problem text from question, problem, question_problem where = question_problem.question_id and = question_problem.problem_id

(Note that is unordered. If we were being really serious, we'd have an order column on question (and on problem). Since we're still learning, we'll just order by PK. In real life, this would be evil, for reasons I won't go into here. Basically, you should never trust a PK to be anything but unique, and you should never use it for any purpose but a key.)

To make your subform show all questions, you just select * from questions. Now, for each question, you show the allowable problems. How you do this gets into non-SQL questions about how Access lays data out, but basically: if there's a control that suppresses repeating rows, you do the select above, ordering by question id. If the control won't suppress repeating rows, you do a select for each question's problems (in a subform) and put them into a drop-down, writing the answer to the machine_question_answers table. Again, this is not a SQL question, it's an Access form question, so it's outside of my scope. But it certainly can be easily done.

So you end up with a main form that show the machines (and any machine attribute data, obviously), a subform that shows the questions for each machine, and a sub-sub-form or dropdown that shows allowed answers for each question. All this crap to write machine_id, question_id, and problem_id to the machine_question_answer table. Again, if you make "OK" special form of "problem", a placeholder, the user can select that "OK" from the "problems" list, and you don't need an answer table.

"NA", I don't know what that means. If "NA" means, we should never ask this question about this machine, you just don't ask, by not including the (machine_id, question_id) tuple in the machine_question relation. If NA means "I dunno" it's a form of null (and in our structure, we either want a placeholder problem value or we want no row in machine_question_answer for it; which depends on your larger design, so I can't answer this.)

To recap:
* three entity tables, machine, question, and problem. Since problem includes an "OK" placeholder, there's no table of answer entities. Each table contains a PK auto-id and a description. It may include more attribute data (time, shift, etc.) later. (See below for a caveat.)
* two non-updated "cardinality reduction" or many-to-many relation tables: machine_question, which relates machines to the questions you ask about each machine, and question_problem, the allowable problems (or answers) for each question. Each contains a PK and the ids of the two entities being related.
* one table you update, machine_question_problem (which we previously called machine_question_answer) which records "for machine X, for question Y, the answer (or problem) was Z". (Database gurus will immediately see a way to reduce this to two columns, and improve referential integrity, but we won't get into that here.)

(Caveat: shift and time are unclear from your problem description, and may actually be attributes of the answer, not the machine. From your problem description, we can't tell.)
posted by orthogonality at 9:08 PM on June 21, 2006

flabdablet writes "As soon as you find yourself putting repeated columns in a table (Question 1, Question 2 etc), realize that there's a New Thing you haven't seen before;"

Yeah, this is great practical advice. A big part of database design is identifying entities. (The other big part is representing entities, and the other big part is representing entities' relations.) flabdablet gives an excellent rule of thumb for recognizing entities.

flabdablet's "glue tables" are what I'm calling relational tables or relations. Both of us indedendently suggested the "placeholder problem" hack.

Neither one of us has really explained why it's necessary to "break apart"stuff so you can "glue it together": the basic answer is each entity gets represented once in the database, so that it becomes impossible for the database to model two conflicting things about the same entity. Rather than duplicating an entity, you refer to it by its key. If you're familiar with the C programming language, a database key serves the same purpose as a C pointer.
posted by orthogonality at 9:20 PM on June 21, 2006

Crap. I'd hoped to spend tonight writing C code to handle the Ipod's scroll wheel.
posted by orthogonality at 9:24 PM on June 21, 2006

Just to get you thinking along the right lines: you'd start a new audit session by doing something like this:

1. Add a new row to the Audits table.

2. For each row in the Questions table, add a new row to the Audit_Questions glue table, with the Audit_ID you got from step 1, the Question_ID from the Questions table row you're working on, a Question_Number you generate on the fly, and NULL for the Problem_ID.

To build the query representing the current audit, you'd join the Questions table to the Audit_Questions table by Question_ID; join the Audit table to it by Audit_ID; specify the current Audit_ID as a match criterion, and order by Question_Number. Expect to fight with Access for a while before figuring out how you refer to the current Audit_ID. It will be accessible by some arcane combination of . and ! and form name and Parent and This and control name.

As your auditor works through the questions in this audit, the NULLs (which mean "we don't yet know the answer to this question for this audit") would be replaced with Problem_ID's selected from a combo box sourced from the Problems table. You can do this replacement right into the query, and Access will figure out which tables to stuff things in as it goes.

You can tell when an audit is complete when there are no more rows in your query with NULL in the Problem_ID field.

Incidentally, the way to do those combo boxes is to make them multi-column, with Problem_ID in the first column and Problem_Description in the second column; then set the width of the Problem_ID column to zero. That way, your user sees only problem descriptions, but you can easily select Problem_ID as the control's data source.

It's been ages since I played with Access, so this is all vague as hell, but hopefully it will start you off in the right direction.
posted by flabdablet at 10:11 PM on June 21, 2006

If your questions are machine-specific, you'd populate Audit_Questions from a query against several joined tables, rather than straight from the Questions table.

Also, you could impose a bit of sanity by having your combo boxes sourced from a query based on orthogonality's question_problem construct, rather than from the raw Problems table.

Just pull up Access's Relationships window and draw lots of boxes and arrows. You'll work it out :)
posted by flabdablet at 10:18 PM on June 21, 2006

flabdablet writes "a Question_Number you generate on"

flabdablet seems to have identified that audit is the master table, not machine, and he's probably right. On the other hand,

flabdablet writes "If your questions are machine-specific, you'd populate Audit_Questions from a query against several joined tables, rather than straight from the Questions table."

It's impossible to tell, because you (the OP) has never fully specified either the problem domain or the model. Not to harp -- you (the OP) are really making good, impressive, progress -- but one thing that really is necessary for a good answer is a complete question.

Again, don't be discouraged about this: when I worked a project at a major mortgage guarantor, they gave me a 90 page requirements document. Which never fully described the problem domain or the methods involved in billing. I spent several months iteratively doing use cases and implementations ("Ok, is this the right billing sequence" "Yes, unless the second insurance is flat rate insurance") with the domain client to piece together the actual financial rules. Almost to the end, there were always "special cases" the requirements didn't cover.

So flabdablet (and I) have to guess and make assumptions that may or may not be right, which may or may not lead you to the right answer. flabdablet clearly knows what he's talking about; it would be a joy to work with him.

But neither he nor I can tell you the right answer when the question is vague or incomplete. And we really, really want to: you've stepped up to the plate, worked hard to understand what we're bitching about, really want to learn, and I'm just tickled to see that, nothing is more satisfying than teaching a hard-working student. But for future reference, make your question as explicit as possible.
posted by orthogonality at 11:48 PM on June 21, 2006

Response by poster: Ok. Let me try to draw out the process I am trying to replace as fully as possible, incase you guys are still checking in on this thread. I'll try to answer your questions as much as I can.
Currently the Auditors have a sheet of paper with 40 questions on it. The Same forty get asked for every machine, but some do not apply (which is why there is a NA choice). Each press runs different parts from time to time, so whats NA today may not be NA tommorow. The questions are *somewhat* part specefic, but not allways, again that may change from day to day as well. So, to me, it warrents asking all questions every time.
We have 60 different presses (machines) and several hundred parts that we run between them.
Shift is what Shift they are working on, we have three a day. I *could* make this relational to the auditor, but if they stayed over or came in early I think that could cause problems. And I intended time to be a timestamp of when they started filling out that exact audit on that press.
Every shift they fill out the audit sheet on every press, usually two rounds around the factory per shift.
Here is my current crappy version that I am trying to replace with your help. Maybe seeing where I am now will help you see what I am trying to accomplish.

Thank you guys so much. Its starting to click. The Glue tables analogy makes sense to me as far as sticking things together. Words cannot even begin to express how much I apperciate everyones help.
posted by JonnyRotten at 6:24 AM on June 22, 2006

The machine I'm at right now doesn't have Access installed, but I've downloaded your .mdb and will have a poke at it next time I have access to Access. Can't promise to do this quickly - now we're getting all specific 'n' stuff, the time required is probably going to go up a fair bit - but I will work on it.

I've been meaning for a while to mess about with OpenOffice's Base component to see how it compares with Access and how to go about porting stuff from one to the other, and this could be a good toy project for me to do that with, so it's not like I'm getting nothing out of this; and, as O says, it's great fun to watch normalization clicking with somebody who actually appears to give a shit.

I'll email you when I've got something to show you.
posted by flabdablet at 8:04 PM on June 22, 2006

« Older robots are taking over   |   Can a noise provoke disgust? Newer »
This thread is closed to new comments.