Wednesday, May 8, 2013

Hate Generalisations? You Probably Just Hate Statistics

One of the most oft-repeated nonsense claims by a certain type of low-wattage intellectual lefty is that one 'shouldn't generalise'. (For reasons that are worthy of a separate post, this seems to me to be reasonably correlated with people who also proudly announce that they 'don't judge'.)

Apparently, one of the Worst Things In The World you can do is to notice that information about the generality of a distribution may be useful in predicting where a specific point in the distribution will lie.

For those people that don't like to 'generalise', I wonder what, if any, statistical measures they actually find interesting or legitimate.

What is an average, if not a statement that lets one generalise from a large number of data points to a concise summary property about all of the points combined? Or a standard deviation? Or a median?
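To make that concrete, here is a minimal sketch using Python's standard `statistics` module. The sample is made up purely for illustration; the point is only that each of these measures boils many data points down to one general statement about all of them.

```python
import statistics

# Hypothetical sample, invented for illustration only (not real data).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # one number summarising the centre
median = statistics.median(data)  # the middle value once the data is sorted
spread = statistics.pstdev(data)  # population standard deviation

print(f"mean={mean}, median={median}, std dev={spread}")
```

Every one of those three numbers is a generalisation in exactly the sense the anti-generalisers object to.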

The anti-generalisers tend to apply their argument ('assertion' is probably a better description) in two related ways, varying slightly in stupidity:

a) One should not summarise a range of data points into a general trend (e.g. 'On average, [Group X] commits murders at a higher rate than [Group Y]').

b) One should not use a general trend to form probabilistic inferences about a particular data point (e.g. 'Knowing statement a), if I also know that person A is in Group X, and person B is in Group Y, I should infer that person A has a higher probability of committing a murder than person B').

Version a) says you shouldn't notice trends in the world. Version b) says you shouldn't form inferences based on the trends you observe.

Both are bad in our hypothetical interlocutor's worldview, but I think version b) is what particularly drives them batty.

But unless you just hate Bayesian updating, the two statements flow from each other. b) is the logical consequence of a).
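A rough sketch of why b) follows from a), under loudly stated assumptions: the counts below are invented, the groups are abstract, and 'event' stands in for whatever is being measured. If all you know about a person is that they are in the population, your best estimate is the overall rate. Once you learn which group they are in, the estimate moves to that group's rate.

```python
from fractions import Fraction

# Hypothetical counts, invented purely for illustration (not real data).
counts = {
    "X": {"event": 30, "total": 10_000},
    "Y": {"event": 10, "total": 10_000},
}

def rate(group):
    """P(event | group), read straight off the counts."""
    g = counts[group]
    return Fraction(g["event"], g["total"])

# Estimate before learning the group: the overall rate in the population.
total_events = sum(g["event"] for g in counts.values())
total_people = sum(g["total"] for g in counts.values())
prior = Fraction(total_events, total_people)

# Estimates after learning the group.
p_x = rate("X")
p_y = rate("Y")

print(f"P(event) = {prior}")
print(f"P(event | X) = {p_x}")
print(f"P(event | Y) = {p_y}")
```

Note that with these made-up numbers both conditional probabilities are still tiny. The update changes the relative ranking of the two estimates; it does not turn a rare event into a likely one.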

Now, this isn't a defence of every statement about the world that people make by invoking a) and b). If you're going to be a Bayesian, you have to update correctly.

You can have priors that are too wide, or too narrow.

You can make the absurd mistake of assuming that P(R|S) = P(S|R).

You can update too fast or too slowly based on new information.

And none of this has even begun to specify how you should treat the people you meet in life in response to such information.

None of my earlier statements are a defence of any of this. The first three are all incorrect applications of statistics. The last one is a question about manners, fairness, and how we should act towards our fellow man.
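The second of those mistakes is easy to see with numbers. The sketch below applies Bayes' theorem, P(R|S) = P(S|R) P(R) / P(S), to three probabilities that I have simply made up; R and S are abstract events and the function name `bayes` is my own.

```python
from fractions import Fraction

def bayes(p_s_given_r, p_r, p_s):
    """Return P(R|S) given P(S|R), P(R) and P(S), via Bayes' theorem."""
    return p_s_given_r * p_r / p_s

# Hypothetical probabilities, invented for illustration only.
p_s_given_r = Fraction(9, 10)  # S is very likely when R holds
p_r = Fraction(1, 100)         # but R itself is rare
p_s = Fraction(1, 10)          # and S is fairly common overall

p_r_given_s = bayes(p_s_given_r, p_r, p_s)

print(f"P(S|R) = {p_s_given_r}")
print(f"P(R|S) = {p_r_given_s}")
```

With these inputs the two conditionals differ by a factor of ten, which is what you get whenever the base rates P(R) and P(S) differ by that much. Swapping them is a statistical error, not an argument against updating.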

But there's nothing wrong with the statistical updating.

If your problem is with 'generalising', your problem is just some combination of 'the world we live in' and 'rationality'.

Suppose the example statements in a) and b) made you slightly uncomfortable. Let me ask you the following:

What groups X and Y did you have in mind when I spoke about the hypothetical murder trends example? Notice I didn't specify anything.

One possibility you may think I had in mind is X = 'Blacks' and Y = 'Whites'. People don't tend to like talking about that one.

In actual fact, what I had in mind was X = 'Men' and Y = 'Women'. This one is not only uncontroversial, but it almost goes without saying.

As it turns out, both are true in the data.

Do inferences based on these two both make you equally uncomfortable? Somehow I doubt it.

And if they don't, you should be honest enough to admit that your problem is not actually with statistical updating, or 'generalisations'. You're just trying to launder some sociological or political concern by browbeating the correct application of statistics.

So stop patronisingly sneering that something is a generalisation, and using that as an implied criticism of an argument or moral position. Otherwise zombie Pierre-Simon Laplace is going to come and beat yo' @$$ with a slide rule.
