How do we know survey responses are honest?
June 29, 2010 5:41 AM

How do social scientists, psychologists, etc. detect and control for people lying in response to their questionnaires?

I have no training at all in the social sciences but find some of the research absolutely fascinating.

I read a slow trickle of articles and listen to a couple of social science podcasts* which often feature interviews with the scientists who conducted the study. Every time I read or hear about a study that relies on survey or interview data, I'm struck by the fact that no-one ever mentions the possibility that the answers given were misleading.

For a few examples:

(a) In a survey about drug use and sexual activity that I completed in high school, I and almost all my peers eventually admitted to each other that we'd lied. Boys had generally exaggerated their experience (although a few regular drug-takers strongly downplayed this) and girls generally downplayed theirs. [Of course, it's possible that we were lying to each other about lying on the survey...]

(b) I recently listened to a podcast about a study on teenagers in gangs. Along with aggregate data, individual responses were quoted and discussed; neither the study author nor the interviewer mentioned the possibility that these teenage boys were exaggerating their answers to make themselves sound tough.

(c) Recently on Metafilter, we discussed a study on the development of lesbian couples' children (here) that relied entirely on the parents' testimony. The authors didn't seem to question their testimony at any point, e.g. the possibility that lesbian parents might feel their families to be subject to a lot of scrutiny, and so be more motivated to present their kids' behaviour in a good light than hetero parents.**

It's well established that people tend to stretch the truth to make themselves look good and that, through selective formation and distortion of memories, they actually tend to remember events and patterns in a light that flatters their self-image and agrees with their preconceptions. So for any given survey response, there seem to be at least four possibilities:

1) The response is a full and accurate description of the situation
2) The respondent believes the response to be true, but their memory is biased and/or incomplete
3) The respondent believes the response to be mostly true, but with some deliberately altered details to make themselves look good
4) The response is mostly or entirely a deliberate lie

My gut instinct is that (2) and (3) are the most likely classes of response that people will give. However, all the papers I've read or interviews I've listened to seem to treat the answers as if they're (1).

So, my questions are:
(a) Is it really generally assumed that all responses are honest and accurate? How is this assumption justified?
(b) If not, how do researchers distinguish between honest-and-accurate, honest-but-wrong and dishonest answers?
(c) Is there good data indicating how certain populations tend to lie about certain responses (e.g. "take a teenage boy's reported sexual encounters and divide by three"!), or is trustworthiness just based on gut instinct?
(d) Do surveys or interviews usually include error-checking questions, and how are the sensitivity and specificity of these deception-detecting questions validated?

*The BBC's Thinking Allowed is a great social science podcast, if you're interested.
**NB: I'm NOT saying that this is necessarily the case, as I don't have any information. I'm just using it as a recent example of a potential strong motivation for participants to lie, in a paper where the authors appear to take all the responses at face value.
posted by metaBugs to Science & Nature (21 answers total) 15 users marked this as a favorite
Best answer: Let's say you want to find out what percentage of men beat their wives. You place 2 cards with questions upside down in front of each participant, one with the question "Do you beat your wife?" and the other one with "Do you like drinking tea?". They pick one and answer the question, knowing that the researcher doesn't actually know which question they answered.

Since you know what percentage of men like to drink tea (either it's common knowledge or you find it out in a separate survey), you can then calculate how many of them beat their wives, by taking into account the deviation from that number.

Sorry if my explanation is a bit sloppy. I'm also sure there's a scientific term for it, but I can't remember.
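But the arithmetic is easy to check with a quick simulation (a rough Python sketch with made-up rates, assuming people pick a card 50/50 and answer whichever one they drew truthfully):

```python
import random

def simulate_two_card_survey(n, true_beater_rate, tea_rate, seed=0):
    """Simulate the two-card method with invented rates.

    Each respondent picks one of the two cards at random and answers
    it truthfully, since the researcher can't tell which question any
    individual answered."""
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n):
        if rng.random() < 0.5:                       # picked the sensitive card
            yes_count += rng.random() < true_beater_rate
        else:                                        # picked the tea card
            yes_count += rng.random() < tea_rate
    observed_yes = yes_count / n
    # Observed yes-rate = 0.5*beater_rate + 0.5*tea_rate, so invert:
    return 2 * observed_yes - tea_rate

estimate = simulate_two_card_survey(100_000, true_beater_rate=0.10, tea_rate=0.60)
print(round(estimate, 2))   # close to 0.10
```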
posted by cronholio at 6:07 AM on June 29, 2010

Best answer: Most data I've seen has at least mentioned problems like that. The technical terms to search for are "measurement error", "response bias", and "social desirability", which will get you a lot more information.
posted by anaelith at 6:09 AM on June 29, 2010 [1 favorite]

Best answer: The CDC is behind many of those high school surveys through something called the Youth Risk Behavior Surveillance System. They have done a lot of work on the reliability. Here's one paper on the subject: Assessment of Factors Affecting the Validity of Self-Reported Health-Risk Behavior Among Adolescents (PDF)
posted by smackfu at 6:15 AM on June 29, 2010

Best answer: One very simple technique is to ask essentially the same question in several different ways. This takes advantage of the fact that, while it's easy to lie, it's harder to lie consistently.

For example, say you are trying to figure out the number of sexual partners that a person has during an average year. In a survey with 50 questions, question #3 might be, "How many sexual partners have you had during the past year?" Question #29 might be, "When did you first become sexually active?" and question #30 might be, "How many sexual partners have you had in your entire life?" Finally, Question #47 might be, "How many sexual partners have you had in the past six months?"

Each of these questions elicits similar (though not the same) information in slightly different ways. Obvious incongruities (e.g., answering "5" to #3 and "6" to #47) will identify blatant lies or errors. More sophisticated study design and analysis can help ferret out lying, sloppy answers, and/or recall bias as well.
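A cross-check like this is trivial to automate once the responses are in a spreadsheet; here's a toy Python sketch (the question keys are hypothetical, matching the example numbers above):

```python
def flag_inconsistent(response):
    """Flag blatantly inconsistent answers to the example questions.

    `response` is a dict with (hypothetical) keys:
      q3  - partners in the past year
      q30 - partners in entire life
      q47 - partners in the past six months
    Returns a list of human-readable problems found."""
    problems = []
    if response["q47"] > response["q3"]:
        problems.append("more partners in six months than in the whole year")
    if response["q3"] > response["q30"]:
        problems.append("more partners this year than in an entire lifetime")
    return problems

# The blatant incongruity from the example: "5" to #3 but "6" to #47
print(flag_inconsistent({"q3": 5, "q30": 12, "q47": 6}))
```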
posted by googly at 6:22 AM on June 29, 2010 [1 favorite]

Found it.

See also.
posted by cronholio at 6:22 AM on June 29, 2010 [1 favorite]

You may want to read up on "implicit attitudes" or "implicit measures of attitude" that try to overcome the problems that you mentioned. Here is an example of one such test. There is a demo that you can take to see how it works. From the FAQ: "...people don't always 'speak their minds', and it is suspected that people don't always 'know their minds'."
posted by prenominal at 6:34 AM on June 29, 2010 [1 favorite]

The Validity Scales of the Minnesota Multiphasic Personality Inventory (MMPI) are supposed to determine whether someone is giving false answers. They're based on patterns such as consistency, as well as, IIRC, a tendency to answer in a fashion that is too perfect or too imperfect to be realistically true. As with all aspects of the MMPI, they are not without controversy.
posted by dephlogisticated at 6:39 AM on June 29, 2010

On questionnaires it's possible to check for some of this by including a lie-scale made up of questions that sound weird or objectionable, but are actually true for almost everybody, like "I sometimes get confused" or "I have taken things home with me from work that didn't belong to me." These are mixed in with the regular questions. If a person gets too many of these questions "wrong" (a high score on the lie scale), their answers can be disregarded.
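Scoring a lie scale is mechanical; a toy Python sketch (the items and the cutoff are invented for illustration):

```python
# Toy lie-scale scoring (a sketch; items and cutoff are invented).
# Lie items are phrased so the honest answer is "yes" for almost everyone;
# each "no" adds a point, and a high total suggests impression management.
LIE_ITEMS = ["sometimes_confused", "took_things_home_from_work"]
CUTOFF = 2   # disregard respondents at or above this score

def lie_score(answers):
    """Count lie items answered 'no' (i.e. implausibly virtuous)."""
    return sum(1 for item in LIE_ITEMS if not answers[item])

def keep_respondent(answers):
    return lie_score(answers) < CUTOFF

print(keep_respondent({"sometimes_confused": True,
                       "took_things_home_from_work": False}))  # True: only one "no"
```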

It's also a good idea for researchers to be honest themselves and acknowledge that they're using self-report data and that it has to be taken with a grain of salt: "many participants in our study reported X," etc. These kinds of caveats frequently get omitted in media reports.

(How much this is a problem also depends on how you're interpreting the data. If people in group A report Y more often than people in group B, then there's a good chance people in group A do encounter Y more often, assuming the temptation to lie is the same for both groups - but maybe not at the actual levels reported for either group.)
posted by nangar at 6:39 AM on June 29, 2010 [2 favorites]

Despite the various methods listed above, there are probably a lot of false responses that go unnoticed.
posted by Obscure Reference at 6:44 AM on June 29, 2010

On a more global level, studies that ask questions like these (personal, sensitive, illegal) are often required to be conducted with additional safeguards for personal anonymity - cronholio's example is one way of doing this, but even better is truly anonymous (rather than confidential) research, where once the study has been conducted no one (including the investigator) has any idea who participated. Phone & internet surveys are one way of doing this, provided no phone numbers or IP addresses are recorded. If the study requires more immediate contact (say, face-to-face interviews with gang members), the researcher can get something called a Certificate of Confidentiality that protects data from subpoena.

Of course even with all this you want internal checks like those mentioned above :)

At American research institutions, the organizations that regulate this kind of stuff are called IRBs (institutional review boards), if you're interested in that end of things.
posted by heyforfour at 6:58 AM on June 29, 2010

Not quite the same thing, but many market research studies ask whether the respondent has ever heard of certain brands, whether they're familiar with them, etc. These studies often include questions about imaginary but plausible sounding products to get a baseline measurement of people saying "yes" when they're unsure.
posted by Clambone at 7:18 AM on June 29, 2010

This question and its possible answers were a great subplot in the movie Kinsey. (The movie character) Kinsey's answer was to treat the data very carefully. I have no idea how factual it is, but the character in the movie said something like: a literal questionnaire was of no value compared to a lengthy interview with empathy between researcher and subject, and the researcher had something like intuitive trust regarding the truthfulness of the subject.
posted by bukvich at 7:30 AM on June 29, 2010

Often the false answers themselves are of interest because of what they show about societal attitudes -- I was just reading about the latest iteration of the ubiquitous "how much time do you spend doing X?" study that looks at how Americans spend their time. (These numbers are checked against self-kept time diaries -- and people are still very wrong when they do the survey even when they've been keeping track! -- and against researchers who follow people through their day, both applied to some percentage of the survey-takers.) And the researchers were discussing how men have consistently and significantly lied/been inaccurate about how much time they spend parenting their children (while women are fairly accurate).

But in the 1950s, they lied to say they spent LESS time parenting than they actually had, because it was seen as women's work.

Today, they lie to say they spent MORE time parenting than they actually had, because hands-on dads are now admired.

Which gives us an interesting window not just into what's being surveyed ("how do you spend your time?") but into attitudes towards parenting!
posted by Eyebrows McGee at 7:30 AM on June 29, 2010 [1 favorite]

I'd add to the above by saying that even if you have problems with honest responses, your data can still be quite useful.

Some ways to get around this sort of error: multiple measures, randomization, and several waves of data collection from the same set of individuals over time may also help you to evaluate the honesty of the respondents/ quality of your data.

Multiple measures: say you survey high school student delinquency. In addition to your survey of students, you can do physical observations of the school for things like graffiti, survey teachers, and so on. Seeing how your delinquency survey data match up to these measures can give you a good notion of the relative level of delinquency in a school, though it doesn't necessarily tell you much about the truthfulness of a particular response.

Randomization: knowing that you are going to have some sort of issue with truthfulness, you draw random samples of students so that you know, on average, that your samples will have the same issues with honest responses. You can still therefore talk about differences in responses between the two groups.

Multiple waves of data collection: if you administer a survey to a cohort once a year or so, you can get an idea of how consistent an individual's response to the survey is, which gives you a notion of how honest that person is being.
posted by _cave at 9:19 AM on June 29, 2010

It depends on your data set, but one of the things you can do is outlier analysis. If your data should be normally distributed (say, rank these bands from 1 to 20) and you find that for one band you get a normal curve centered on, oh, say, 14 and then a whole bunch of 20s, you can pretty much bet ballot box stuffing occurred.

I bring up outlier analysis here.
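A crude version of that check in Python (a sketch, not a formal statistical test; the data are invented):

```python
import statistics

def stuffing_ratio(ratings, max_value=20):
    """Crude ballot-stuffing check: fit a normal to everything below the
    top score, then compare the observed count of top scores with what
    that fit predicts. A large ratio means far more top scores than the
    bell curve can explain."""
    body = [r for r in ratings if r != max_value]
    fit = statistics.NormalDist(statistics.mean(body), statistics.stdev(body))
    expected_top = (1 - fit.cdf(max_value - 0.5)) * len(ratings)
    observed_top = sum(r == max_value for r in ratings)
    return observed_top / max(expected_top, 1e-9)

# Normal-ish scores centered on 14, plus a suspicious pile of 20s:
honest = [12, 13, 13, 14, 14, 14, 14, 15, 15, 16, 13, 14, 15, 14]
stuffed = honest + [20] * 10
print(stuffing_ratio(stuffed) > 5)   # True: far more 20s than the curve predicts
```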
posted by Kid Charlemagne at 10:12 AM on June 29, 2010

Here is a concrete example I use in class:

I give you a coin. You flip it. If it comes up heads, you answer the question "Have you ever cheated on a test while at this college?" If it comes up tails, you answer the question "Is today Wednesday (or whatever day it is)"? You give me your answer after flipping the coin in some private location.

Suppose I have 500 volunteers. Say I get 300 "yes" and 200 "no". I reason in the following way:

If someone answered "no", they must have been answering the cheating question, because today is Wednesday.

If they answered yes, about half of them (150) answered the first question and about half of them (150) answered the second question, since heads and tails are equally likely when you flip a fair coin.

So, I had 150 "yes" to my first question, and 200 "no" to my first question, so I estimate that 150/350 = 43% of students have cheated on a test at some point. Then, I can use this estimate to construct a confidence interval or whatever.
posted by wittgenstein at 4:24 PM on June 29, 2010

wittgenstein, why do you assume that half/half of the "yes" answers must be for the test cheating question? I would think that, since heads and tails were about equally likely if you flip a fair coin, then you should assume that 250 of the flips were heads, and 250 were tails. Since all 250 of those tails flips had to answer "yes", then wouldn't that leave only 50 "yes"es for the cheating question, and a total cheater percentage of 20% instead?

Or is this another insidious example of the Monty Hall problem?
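It's easy enough to simulate (a rough Python sketch with a made-up true cheating rate, assuming everyone answers their question honestly):

```python
import random

def simulate_coin_survey(n, true_cheat_rate, seed=1):
    """Simulate the coin-flip survey: heads -> answer the cheating
    question truthfully; tails -> answer "is today Wednesday?"
    (always yes, since it is)."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        if rng.random() < 0.5:                 # heads: cheating question
            yes += rng.random() < true_cheat_rate
        else:                                  # tails: Wednesday question
            yes += 1
    return yes

n = 100_000
yes = simulate_coin_survey(n, true_cheat_rate=0.20)
# Subtract the expected n/2 automatic yeses, then divide by the
# expected n/2 people who actually saw the cheating question:
estimate = (yes - n / 2) / (n / 2)
print(round(estimate, 2))   # close to 0.20
```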
posted by that girl at 1:12 AM on June 30, 2010

Yeah, wittgenstein, getting 150+350 heads+tails from a fair coin is less likely than 250+250 by a factor of 10^18. that girl's analysis is fairer.

Although, I don't immediately see why that method controls for liars. You know that 200 people claim never to have cheated on a test, and you have 90% confidence that you asked between 225 and 275 people about cheating, so somewhere between 25 and 75 admitted cheaters. But if half the cheaters falsely said "no," how are you sensitive to that?

Also: Wednesday already? I don't believe it.
posted by fantabulous timewaster at 5:37 AM on June 30, 2010

Response by poster: Thanks for some fascinating answers - I'll be able to spend more time reading around and searching with some new terms. I know my questions are covering a big swathe of what's probably a complex topic, so thanks for the crash course :).

I've repeatedly seen what an unholy mess the media tend to make of reporting physical science results, so I suppose it's not a big surprise that I don't see the error-catching techniques discussed.

Re. the "coin flip" method of asking difficult questions, I've read about a similar but slightly simpler technique. The participant is given a card with the question, then told "Toss a coin: if it's heads, answer the question; if it's tails just write 'yes' in the answer space". The analysis then runs something like:

Consider a study of 200 people, of whom 143 answered "yes".

Half of the participants (200*0.5 = 100) just wrote "yes" because of the coin toss. This accounts for 100 of the "yes" answers.
The other half of the participants (200-100 = 100) actually answered the question
Of those 100 who actually answered the question, 43 said "yes"
Therefore the rate is 43/100 = 43%
...or so I was told. Of course, there's some error introduced by the coin toss. I'll delightedly follow along any discussion about the interpretation of these statistical tests but can't contribute much. While I understand the Monty Hall problem (and have a reasonable but very utilitarian grip on statistics), thinking about it too much makes my head hurt.
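For concreteness, the arithmetic above plus the noise from the coin toss works out like this (a rough Python sketch of my understanding, not anything from the study I read about):

```python
import math

def forced_yes_estimate(n, yes_count):
    """Estimate the true yes-rate under the "tails -> write yes" design.

    Observed yes-rate = 0.5 + 0.5 * true_rate, so invert that, and
    report a rough standard error to show the noise the coin adds."""
    q = yes_count / n                   # observed yes-rate
    p = 2 * q - 1                       # true rate implied by the design
    se = 2 * math.sqrt(q * (1 - q) / n) # error inflated 2x by the inversion
    return p, se

p, se = forced_yes_estimate(200, 143)
print(round(p, 2), round(se, 2))        # 0.43, with roughly +/-0.06 of coin noise
```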

Following fantabulous timewaster's point (with eponysterical footnote), I know the idea of these studies is that people should no longer be motivated to lie as, if challenged about their answer, they can just blame the coin toss. It's a clever idea, but has it ever been (or can it ever be) validated?
posted by metaBugs at 6:37 AM on June 30, 2010

Good point. I picked 300 out of the air, to make the arithmetic easy (though I am on Metafilter, I probably don't need to do that) but it doesn't make sense. I should have 250 nos, then split the 250 other people up so that the percentage of cheaters is a sensible number (like 43%).

I will have to put a little more thought into my example next time I do it. (Fortunately, my students are not as sharp as you guys.)
posted by wittgenstein at 8:58 AM on June 30, 2010

Also, the idea as I understand it is that people will be comfortable telling the truth because I can't know if any individual person is answering the first question or the second question. But, I do believe there are people who think this method does not work as well in practice as it should in theory. I would hope that someone who actually used this method would be appropriately skeptical of the results.
posted by wittgenstein at 9:02 AM on June 30, 2010

This thread is closed to new comments.