
Monday, March 26, 2012

"We're better than those guys

Statisticians rarely make good members of sports teams.

I found this out the hard way when I used to be on a frisbee team. Most people run on overconfidence. I've had numerous arguments with people over whether this makes sense or not. The general view is that if I psych myself up that we're going to win, I'm going to try harder to make it happen. If I believe I'll fail, I'll be demoralised and not try hard.

The idea is thus that belief in success and failure has a self-fulfilling component. Only a component, mind you - if I really truly believe I can beat Kobe Bryant to the net in a game of one-on-one, I will fail. But I'll still have a better chance than if I don't believe in myself.

Frankly, I was always a bit skeptical of this argument, as it reeks of a second-best solution. In other words, if you're being rational, better answers are unlikely to come from deliberately feeding in faulty inputs, including your estimate of your chances of victory. This only works as a workaround for some other faulty process - one bias (an inability to try hard in the face of likely failure) is offset by another bias (convincing yourself that you won't fail).

But I remain committed to the belief that the first-best solution is always to eliminate the biases - in this case, figure out how to try hard even if you do think you'll lose.

Since this is what I aim at, I want to know the true probability, and work from there. It's a fair bet that most other team members (if they're non-economists or non-statisticians) won't feel that way. They'll view you as a negative nancy.

I remember this came to its zenith when we were down at half time. The captain of the team was trying to get us fired up. He said 'hands up who thinks we're going to win this game'. About half the team put up their hands. He responded, 'Right, you guys are on the field'. Personally, I thought this was absurd, but that's probably part of the reason I never got made captain.

The net effect of all this is that you end up with the absurd result that on any given sports field, at least 70% of the players think they're going to win. They think that they're better than the other team. Talk about the Lake Wobegon Soccer team effect.

It also leads to a hilarious misconception of what it means to be 'better' than the other team. For most people, if they lose on a knife-edge, they'll be bitterly disappointed.

But the statistician sees it differently.

If we play against a really rubbish team, we'll win about 95% of the time. Then we'll advance higher and play a better team, which we'll beat 70% of the time. We'll advance higher still, until we're playing a team that we have an edge over, but it's tough - we might win 60% of the time.

And eventually, we'll get to a point where we're playing against a team that's very evenly matched. We'll have a 50% chance of winning. And we might just end up in a 16-16 game to 17. And someone drops the disc, and the other team scores. And we lose.

The non-statistician weeps.

The statistician is sanguine. In expectation, we got exactly where we should have. We bet on a fair coin, and it came up tails. This time we lost. Next time we'll win.

But there's no disappointment just because the coin landed on tails.
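If you want to see why the statistician shrugs, here's a quick back-of-the-envelope sketch in Python. The per-round win probabilities are just the illustrative figures from the ladder above (95%, 70%, 60%, 50%), not data from any real tournament:

```python
# Back-of-the-envelope sketch of the tournament ladder described above.
# The per-round win probabilities are the illustrative ones from the text.
round_win_prob = [0.95, 0.70, 0.60, 0.50]  # rubbish team, better team, tough team, even match

reach_prob = 1.0  # probability of getting to the current round
for rnd, p in enumerate(round_win_prob, start=1):
    exit_here = reach_prob * (1 - p)
    print(f"Round {rnd}: win probability {p:.2f}, chance the run ends here {exit_here:.3f}")
    reach_prob *= p

print(f"Chance of winning all four rounds: {reach_prob:.3f}")
```

Under those made-up numbers, even a team that performs exactly to expectation loses somewhere on that ladder about 80% of the time, and the final coin flip goes against it half the time it gets there. Nothing in the loss says the team was worse than it thought.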

Funnily enough, that might make for a reasonable consolation speech afterwards. It would certainly be more effective than the 'we're probably not going to win, but I plan to try jolly hard anyway' speech.

On the other hand, I'd be much more inspired by the speech that talked about the true probabilities.

After all, not everybody who's willing to face up to true probabilities is necessarily a coward. The best response to likely defeat is to stare the truth in the face, and give it the finger.

Thursday, December 1, 2011

Psychologists Getting Statistics Wrong

Ace of Spades links to a study that claims to show that people view atheists as being less trustworthy. This was also covered in the National Post. The headline claim is attention-grabbing:
Atheists cannot be trusted: Religious people rank non-believers alongside rapists, study
Controversial stuff. As with all these things, you should always read the original study before rubbishing it. The author, Will Gervais, kindly has a version on his webpage, which you can read here. And I'm sorry to say that nearly the whole study appears to be done wrong.

So how exactly does Mr Gervais establish that atheists are as untrustworthy as rapists? Let the study tell the story - this is Study 2 of 6, but 5 out of the 6 studies have the same problem:
One hundred five UBC undergraduates (age range 18–25 years, M = 19.95; 71% female) participated for extra credit. Participants read the following description of an untrustworthy man who is willing to behave selfishly (and criminally) when other people will not find out:

Richard is 31 years old. On his way to work one day, he accidentally backed his car into a parked van. Because pedestrians were watching, he got out of his car. He pretended to write down his insurance information. He then tucked the blank note into the van’s window before getting back into his car and driving away. Later the same day, Richard found a wallet on the sidewalk. Nobody was looking, so he took all of the money out of the wallet. He then threw the wallet in a trash can.
Next, participants chose whether they thought it more probable that Richard was either (a) a teacher or (b) a teacher and XXXX. We manipulated XXXX between subjects. XXXX was either “a Christian” (n = 26), “a Muslim” (n = 26), “a rapist” (n = 26), or “an atheist (someone who does not believe in God)” (n = 27).
So the authors are relying on the conjunction fallacy of Tversky and Kahneman (1983) - logically, the probability of being a teacher and [Y] is less than or equal to the unconditional probability of being a teacher, for all values of [Y]. People sometimes get this the wrong way around if the behaviour is associated with the trait. That is what the authors are trying to test (I think). They report that the proportion of people who answered (wrongly) that the person was more likely to be a teacher and an atheist was higher than the proportion who answered (wrongly) that the person was more likely to be a teacher and a Christian.

The first thing that should make alarm bells start ringing in your head is the way the question is phrased. To say 'are atheists untrustworthy?' is to ask the probability of being untrustworthy given you're an atheist. But the question implicitly being asked in the survey is something different, namely the probability of being an atheist given you're untrustworthy. These are not the same thing!!!! And this is really going to screw up the inferences.

If statistics bore you, let me skip to the punchline - the authors screw it up because they're not taking into account that there's tons of atheists and very few rapists. This means that the probability of being an atheist given you're untrustworthy is always going to be much higher than the probability of being a rapist given you're untrustworthy. But this says nothing at all about trustworthiness, and everything about how rare it is that a person is a rapist! And this makes the whole study flawed.
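Before the formal version, here's the punchline as a toy head-count in Python. Every number below is invented purely for illustration - none of it comes from the study or any survey:

```python
# Toy population showing how base rates drive the comparison, even when
# rapists are assumed to be far less trustworthy than atheists.
# All numbers are made up for illustration.
population = 1_000_000
n_atheist = int(0.40 * population)  # atheists are common
n_rapist = 20                       # rapists are very rare

p_untrustworthy_given_atheist = 0.05  # assume atheists are rarely untrustworthy
p_untrustworthy_given_rapist = 1.00   # assume every rapist is untrustworthy

untrustworthy_atheists = n_atheist * p_untrustworthy_given_atheist
untrustworthy_rapists = n_rapist * p_untrustworthy_given_rapist

# Among the untrustworthy, which label is more common?
print(untrustworthy_atheists)  # 20000.0
print(untrustworthy_rapists)   # 20.0
```

Among the untrustworthy people in this toy population, atheists outnumber rapists a thousand to one, even though we assumed a rapist is twenty times more likely than an atheist to be untrustworthy. The base rates are doing all the work.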

For stats people, what is actually being asked is whether people erroneously believe that:
P(teacher | Untrustworthy actions)  <  P(teacher AND atheist | Untrustworthy actions).

This answer is then compared to answers to the question as to whether:
P(teacher | Untrustworthy actions)  <  P(teacher AND rapist | Untrustworthy actions).

Since the left hand side is the same in each inequality, let's think about what could drive differences in the right hand side (even if people are screwing it up via the conjunction fallacy, this is still the implicit comparison). Using Bayes Rule:

P(A1 | B) / P(A2 | B)  =  [P(B | A1) / P(B | A2)]  ×  [P(A1) / P(A2)]

(where A1 = Teacher and Atheist, A2 = Teacher and Rapist, and B = Untrustworthy).

Let's ignore the teacher bit for simplicity (it doesn't change the logic). What the author really wants to know is the second ratio - are people viewed as more likely to be untrustworthy given they're an atheist, relative to being untrustworthy given they're a rapist.

What they're actually measuring is the first ratio: the probability of being an atheist given you're untrustworthy versus the probability of being a rapist given you're untrustworthy.

But the difference between the two ratios is also driven by the third ratio -  the overall probability of being a rapist versus an atheist, regardless of whether you're untrustworthy.

And this ratio is huge! The study was done at the University of British Columbia. According to Wikipedia, 42.2% of Vancouver is atheist. What's the probability of being a rapist? The overall rate of rape crimes in Canada is 0.016 per 1000 people. As long as each rape is committed by only one rapist, this will overstate the probability of being a rapist (i.e. if a rapist has multiple victims, the probability of being a rapist will be lower; if a single rape is committed by multiple people, the number will be higher, however).

So the third term is equal to 42.2/0.0016 = 26,375! In other words, suppose that people thought that you were 1000 times more likely to be untrustworthy if you were a rapist than an atheist (i.e. the second ratio equals 1/1000). The left hand side will be equal to 26375/1000 = 26.375. In other words, P(atheist | untrustworthy) will always be much higher than P(rapist | untrustworthy), even if rapists are considered far less trustworthy than atheists.
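For anyone who wants to check the arithmetic, here's the same calculation in Python (the 42.2/0.0016 in the text is just this ratio with both sides expressed as percentages). The 1-in-1000 likelihood ratio is the hypothetical figure from the paragraph above, not an estimate of anything:

```python
# Plugging the post's numbers into Bayes' Rule, ignoring the 'teacher' part.
p_atheist = 0.422          # Wikipedia figure for Vancouver cited above
p_rapist = 0.016 / 1000    # 0.016 rape crimes per 1000 people, treated as P(rapist)

base_rate_ratio = p_atheist / p_rapist  # the 'third ratio': P(atheist) / P(rapist)
likelihood_ratio = 1 / 1000             # hypothetical: rapists 1000x more likely than
                                        # atheists to be untrustworthy

# The 'first ratio': P(atheist | untrustworthy) / P(rapist | untrustworthy)
posterior_ratio = likelihood_ratio * base_rate_ratio

print(round(base_rate_ratio))     # 26375
print(round(posterior_ratio, 1))  # 26.4
```

Even under that extreme assumption about rapists, the quantity the survey question actually gets at still comes out roughly 26-to-1 in the atheists' "favour".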

The authors only report the proportion of respondents who made the conjunction error - in other words, they report the number who state that P(teacher | Untrustworthy actions)  <  P(teacher AND Y | Untrustworthy actions), which is clearly wrong, and compare this for different values of Y. Sadly, this doesn't allow us to say anything about the real ratio, which is P(Untrustworthy | Atheist) versus P(Untrustworthy | Rapist).

In other words, the study is unsalvageable if you're trying to answer the question you're hoping to ask. Which is a shame, because it's actually an interesting question.