Stats / BI / KPIs in an adverse environment
November 6, 2018 6:02 AM

I'm looking for books, papers, blog posts or anything else that deals with measurement in an environment where the data sources are not simply biased, but actively try to deceive the observer. Is there research into this? If so, what are the key terms to look for? Details, examples below the fold.

I work as a BI developer at a big multinational software company. This is not the first time I've worked in BI, and I am again struck by the same fundamental issues. Basically, I see Goodhart's Law in action all day long, coupled with management committing McNamara's fallacy wherever possible. This got me thinking: is there any serious statistical, management, organizational, or game-theoretic research on environments where the "data sources" are intelligent agents who may or may not fudge their numbers or try to influence the data collected according to their own incentives?

Examples I have in mind:
  • Sales reps hiding data from each other (in a CRM system) for fear of losing a deal to another salesperson

  • Managers inflating reported work, headcount requirements

  • Projects fudging ROI or other metrics

  • Emissions reported by German carmakers are also a good example :)


These are just off the top of my head.

Is there any research on whether it's possible to draw inferences in such an environment? Any pointers are appreciated.

Or if the above problem can be rephrased in other terms, or shown to be analogous to some other field, that could also be useful. For example, if these business situations can be framed as a counterintelligence task (trying to uncover deception by the opposing side), then perhaps techniques and tools from that field could be useful in the business context too. Or is that too far-fetched?
    posted by kmt to Technology (8 answers total) 11 users marked this as a favorite
     
    Demand characteristics are well documented in the social sciences (and, for that matter, in political polling...).

    Not sure what to tell you about managers who misuse KPIs, other than keep a good stock of horror stories to be trotted out during presentations.
    posted by Mogur at 6:18 AM on November 6, 2018 [1 favorite]


    Best answer: I'm not in the BI world, but your question made me think of this paper, which analyzes inflation expectations in an environment where official inflation statistics were (potentially) biased.

    It's a fun paper that you might find interesting, even though it's not quite what you're asking about.
    posted by schroedingersgirl at 6:28 AM on November 6, 2018 [1 favorite]


    Great question. My first thought is: can we articulate how it changes matters statistically if the deception is intentional? That is, how is the scenario different from the types of bias we are used to working with (e.g., censoring)? It seems important to know how far existing methods will get us and where they will fail. The one difference that comes to mind is that in this case the bias may be adaptive. I can imagine this raising special issues in statistical process control, where comparisons over time are important.
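    A toy illustration of how adaptive bias can defeat a fixed control chart (all numbers below are invented for the demo):

```python
# Shewhart-style chart: alarm when a reported value falls outside
# mean +/- 3*sigma. An adaptive reporter who shades each number just
# inside the limits never trips it, even as the true process drifts.
mean, sigma = 100.0, 5.0
upper, lower = mean + 3 * sigma, mean - 3 * sigma

def alarms(reported):
    """Indices of points outside the control limits."""
    return [t for t, x in enumerate(reported) if not (lower <= x <= upper)]

true_process = [100 + 2 * t for t in range(12)]             # steady upward drift
gamed_report = [min(x, upper - 0.1) for x in true_process]  # clipped to stay in-bounds
```

    Here alarms(true_process) fires once the drift crosses the upper limit, while alarms(gamed_report) stays empty even though the underlying drift is identical.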
    posted by eirias at 6:38 AM on November 6, 2018


    Try “statistical fraud detection” as a key phrase.
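    A minimal sketch of one such screen (not tied to any particular dataset): a Benford's-law check on leading digits, a common first pass when figures are suspected of being fabricated.

```python
# Benford's-law screen: leading digits of many naturally occurring
# figures follow P(d) = log10(1 + 1/d); fabricated numbers often don't.
import math
from collections import Counter

def benford_deviation(values):
    """Chi-square-style deviation of observed leading digits from the
    Benford distribution; larger values are more suspicious."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Exponential growth is a classic Benford generator; a series whose
# leading digits are spread uniformly deviates far more.
natural = [1.05 ** k for k in range(1, 300)]
flat = [d * 100 + 7 for d in range(1, 10)] * 30
```

    In this sketch benford_deviation(natural) comes out far smaller than benford_deviation(flat); in practice this is only a screen, not proof of fraud.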
    posted by eirias at 7:23 AM on November 6, 2018 [1 favorite]


    There are lots of examples of this on the blog Statistical Modeling, Causal Inference, and Social Science. See, in particular, the numerous blog entries about Cornell researcher Brian Wansink. Note, however, that there are also many more examples of people who act in good faith (more or less, anyway) but whose statistical analyses lead to spurious results -- results that usually conform with what the researcher hopes to see.
    posted by alex1965 at 7:40 AM on November 6, 2018 [1 favorite]


    I have known call center agents to manipulate their own individual performance metrics; does that fit with what you're looking for? If so, I've primarily seen two types of manipulation:

    - Deliberately choosing cases that are easy or quick wins or are otherwise much more likely to result in positive customer satisfaction scores, and deliberately avoiding those that are more likely to be complex or involve telling the customer no
    - Artificially inflating their case solve counts by not merging duplicates or consolidating related cases, deliberately seeking out cases that can be closed with no work, and in very rare instances, creating false cases
    posted by rhiannonstone at 8:42 PM on November 6, 2018 [1 favorite]


    Response by poster: schroedingersgirl, thanks for that paper! Though it is not exactly what I had in mind, it's pretty close, and it seems to have a couple of nice references I can follow to dig up more papers.

    All the other suggestions are great, and provide food for thought.

    rhiannonstone - your examples are a perfect fit for what I'm looking for. Even if there's no field systematically investigating these problems, having more examples like this helps a lot!
    posted by kmt at 12:49 AM on November 7, 2018 [1 favorite]


    Response by poster: Almost a year later, and totally by accident, I found a paper exactly about this question. Thought others might find it useful: Manheim, David: Building Less Flawed Metrics. The abstract:
    Metrics are useful for measuring systems and motivating behaviors. Unfortunately, naive application of metrics to a system can distort the system in ways that undermine the original goal. The problem was noted independently by Campbell[Cam79] and Goodhart[Goo75], and in some forms it is not only common, but unavoidable due to the nature of metrics[MG18]. There are two distinct but interrelated problems that must be overcome in building better metrics; first, specifying metrics more closely related to the true goals, and second, preventing the recipients from gaming the difference between the reward system and the true goal. This paper describes several approaches to designing metrics, beginning with design considerations and processes, then discussing specific strategies including secrecy, randomization, diversification, and post-hoc specification. Finally, it will discuss important desiderata and the trade-offs involved in each approach.
    It's worth skimming if only for the bibliography!
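    As a rough sketch of the "randomization" and post-hoc-specification strategies the abstract names (the metric pool and report fields below are invented for illustration):

```python
# Randomized, post-hoc metric choice: score each period on one metric
# drawn from a pool only after the data is in, so no single number is
# worth gaming in advance. Metric names and report fields are made up.
import random

METRICS = {
    "cases_closed": lambda r: r["closed"],
    "satisfaction": lambda r: r["csat"],
    "reopen_rate": lambda r: -r["reopened"] / max(r["closed"], 1),
}

def score_period(report, rng=random):
    """Reveal the scoring metric only after the period ends."""
    name = rng.choice(sorted(METRICS))
    return name, METRICS[name](report)
```

    Seeding rng (e.g., random.Random(seed)) makes the draw reproducible for audits while still being unpredictable to the people being measured.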
    posted by kmt at 3:53 AM on October 24, 2019 [1 favorite]

