
Friday, December 11, 2020

Last Thoughts on Voter Fraud

Winston Churchill once observed that a good definition of a fanatic was someone who can’t change his mind, and won’t change the subject.

On the subject of voter fraud, I like to think that I meet neither arm of the test.

On the first part, I feel like I’m definitely open to having my mind changed, but not many people engage with the better evidence on the subject, so I don’t often hear good arguments to the contrary. Then again, every fanatic on every topic feels the same way, so perhaps this doesn’t distinguish me very much.

But I can at least make sure I don't fall foul of the second arm. Few things in this life, even if true, are worth driving away those near and dear to you over, having friends of long standing view you as some crank and lost-cause obsessive. My twitter feed the past month has been that of a single-issue kook, which has gained me a lot of new followers, but I never really wrote to build a large audience, and definitely wrote for the sheer joy of being able to say whatever was on my mind, not for advancing a single cause.

To know if you’ve started to become viewed as a crank, you have to listen to the silences – the friends that don’t respond to your whatsapp messages when you send them something on the subject, the people on twitter who used to engage that you haven’t heard from for a while. You don’t have to change your beliefs about the election because others don’t agree with you, but you do need to value your audience, especially when they are friends and loved ones.

In finance, most trades are essentially neutral – if you buy a stock, and nothing happens, you stay flat. However, a famous trade in foreign exchange is the carry trade – borrow in low interest rate currencies, and invest in high interest rate currencies. There, if nothing happens to the exchange rate, you win (on the difference in interest rates). This term, “carry”, gets used broadly to describe any such trade with this property, where you win by things staying the same. An anti-carry trade is thus the opposite. If nothing happens, you lose.
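A toy back-of-the-envelope makes the asymmetry concrete. This is illustrative Python with invented rates, not any real market data:

# Toy carry trade: borrow in a 0.1% currency, invest at 4.0% for a year.
borrow_rate = 0.001
invest_rate = 0.040
fx_move = 0.0                      # the exchange rate goes nowhere

carry_pnl = invest_rate - borrow_rate + fx_move
print(f"carry trade, nothing happens: {carry_pnl:+.1%}")    # +3.9%
print(f"anti-carry, nothing happens: {-carry_pnl:+.1%}")    # -3.9%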

Since the Wednesday morning after the election, it has been quite clear that Biden had a strong carry trade, and Trump had an anti-carry trade. Something fairly large had to happen to change the answer. The Supreme Court case with Texas was my last bet on what that something large might be. As with my post earlier this year on how Republicans can't get their appointed judges to stay conservative, the answer was depressing, if not surprising. The number of ways the outcome can change at this point is small, most of them would be highly alarming if they occurred, and not many of them seem to hinge upon a great new empirical analysis of voter fraud being written by me.

So having written much on the subject, this is my coda to the past month's thinking, at least for the time being. Like the Dylan poem to which the title is an homage, it's not that the issue is suddenly dead; it's just a way of collecting one's thoughts and drawing a line under a chapter that seems to be coming to a close. I will probably have more to say on the subject, like every addict, but the time for being a single-issue author has passed. Please bear with me even if you feel heartily sick of the subject. I have spent an extraordinary amount of time thinking about these issues over the past month, and I feel confident I may yet be able to tell you something new - the things that at least I didn't know before I started out. Without further ado, they are as follows.

The average American believes three things about voter fraud in his country.

First, he believes that there is very little of it, perhaps almost zero, and certainly not enough to swing an election.

Second, he believes that if there were a reasonable amount of it in general, he would have heard about it, from experts on the subject.

Third, he believes that if any single election had been fraudulent, said experts would be able to identify such fraud and bring it to light before it was able to decide the election outcome.

I am not going to have much to say about the first point, at least not directly. I suspect that by this juncture, the number of people who haven’t made up their mind about this is very small. My firm belief is that one’s priors on this should be quite wide, but that’s another subject.

Rather, I want to convince you that the second point, and especially the third point, are wrong.

While I don’t want to inflate my credentials here, I am one of those fortunate people (or unfortunate, depending on perspective) whose skills and training put them in a good position to actually study the question of voter fraud empirically. There are few academic papers on the subject that I would not back myself to be able to read and understand.

I have spent almost the entire past month digging into various ways of trying to find voter fraud. Much of that work has been out of the public eye, and not all of it was ever released officially to anyone. This is how data digging works – you do a lot of analysis for everything you actually write, in the “measure twice, cut once” manner.

And I can tell you, as someone who’s hunted very hard for it – voter fraud is extremely difficult to prove using only public data, whether it actually happens or not.

To which you might immediately think – that’s because there isn’t much voter fraud!

On the contrary. It is not at all difficult to find extremely alarming and weird anomalies in election data.

A good working definition of fraud is “wrong data entered for malicious reasons”. The big challenge is that a good working definition of data errors is “wrong data entered for innocent reasons”.

The extremely hard part is thus not finding anomalous and suspicious patterns in the data, but proving with certainty that these arise due to malicious intent. Moreover, one has to rule out every possible innocent reason these errors could arise, where the functional form of errors is allowed to be incredibly vague. Further still, the counties and election officials are given almost every single benefit of the doubt. Moldbug is right on this point. The sovereign is he who determines the null hypothesis.

One can very easily find loads of extremely suspicious things in the data.

One can find 169 updates in the New York Times county-level election update data where the vote count in one category (in-person or absentee) actually decreased in an update. Here is one of the most suspicious, in Montgomery County, PA, which still hasn’t been well explained. You have not even heard of the remaining 168. Here’s the count by state:


    state |      Freq.     Percent        Cum.
------------+-----------------------------------
         AL |          1        0.59        0.59
         AR |         12        7.10        7.69
         AZ |          5        2.96       10.65
         FL |          3        1.78       12.43
         GA |         24       14.20       26.63
         IA |         20       11.83       38.46
         ID |          1        0.59       39.05
         IN |          1        0.59       39.64
         KS |          2        1.18       40.83
         MA |          1        0.59       41.42
         MI |         21       12.43       53.85
         MS |          1        0.59       54.44
         NH |          1        0.59       55.03
         NJ |          4        2.37       57.40
         NM |          1        0.59       57.99
         NY |          3        1.78       59.76
         PA |          9        5.33       65.09
         SC |         30       17.75       82.84
         TX |         11        6.51       89.35
         UT |          1        0.59       89.94
         VA |         15        8.88       98.82
         WI |          1        0.59       99.41
         WV |          1        0.59      100.00
------------+-----------------------------------

Several of the disputed and contentious states are heavily represented – Georgia, Michigan, Pennsylvania. But so are places you haven’t heard of. Arkansas. Virginia. Iowa. South Carolina.

(By the by, through my various digging, Virginia is my bet for “state with the most election fraud in 2020 that you never read about”, and not just because of the metric above.)

Look at how much work went into the analysis of Montgomery PA, which covered one of these data points, trying to rule out every possible innocent explanation, and showing additional evidence that points to fraud. Do you think anyone is digging that much into the remaining 168? The NYT data can be downloaded in a bunch of places, and it's not hard to find these updates. I've looked at them; about half are quite small, less than 100 votes. Some of the rest look like a single set of ballots being reclassified from one category to another. But even after taking all of these out, there's still a large number where frankly I have no idea what's going on, and I doubt you would either.
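If you want to check for yourself, here is a minimal sketch of the search, assuming you have one of the scraped NYT timeseries as a CSV. The file name and columns (state, county, timestamp, votes_absentee, votes_in_person) are hypothetical placeholders and will differ depending on which scrape you download:

import pandas as pd

# Hypothetical file and column names; adjust to your particular scrape.
df = pd.read_csv("nyt_county_updates.csv", parse_dates=["timestamp"])
df = df.sort_values(["state", "county", "timestamp"])

# Within each county, a legitimate running total should never decrease.
for col in ["votes_absentee", "votes_in_person"]:
    df[f"{col}_chg"] = df.groupby(["state", "county"])[col].diff()

decreases = df[(df["votes_absentee_chg"] < 0) |
               (df["votes_in_person_chg"] < 0)]
print(decreases.groupby("state").size())   # counts by state, as in the table above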

One can find vote updates that look like colossal outliers in terms of the fairly intuitive rule that updates can be either large or unrepresentative, but not generally both. Here’s a long analysis of this. The most suspicious, in Wisconsin, Michigan and Georgia (surely a coincidence with the states identified on the metric above!), also came in the middle of the night, and were large enough to swing the election. The defenders argue that this is all just normal absentee votes. At least for Milwaukee, one can also find corroborating evidence of suspicious patterns in down-ballot races, which at least don’t fit simple stories about mail ballots.
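The outlier screen itself is not complicated. A rough sketch, again with hypothetical column names and deliberately arbitrary thresholds, comparing each update's size against how far its two-party split sits from the running split before the update arrived:

import pandas as pd

df = pd.read_csv("nyt_state_updates.csv", parse_dates=["timestamp"])
df = df.sort_values(["state", "timestamp"])
g = df.groupby("state")

# Size of each update, and the Biden share within the update itself.
df["new_total"] = g["votes_total"].diff()
df["new_biden"] = g["votes_biden"].diff()
df["update_share"] = df["new_biden"] / df["new_total"]

# Running Biden share *before* the update arrived.
df["running_share"] = (df["votes_biden"] - df["new_biden"]) / \
                      (df["votes_total"] - df["new_total"])

# Flag updates that are both very large and very far from the running share.
suspicious = df[(df["new_total"] > 50_000) &
                ((df["update_share"] - df["running_share"]).abs() > 0.25)]
print(suspicious[["state", "timestamp", "new_total", "update_share"]])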

But suppose you don’t believe the New York Times data. That could all just be errors! Indeed. Couldn’t it all.

One can find 58 Pennsylvania registered voters born in the year 1800, 11 born between 1801 and 1899, and 25 born in 1900. Admittedly, these particular cases are more likely just errors - if this is voter fraud, it’s the stupidest form ever, since it’s going to stick out like a sore thumb. But it proves beyond any doubt that errors in this data do not get checked or corrected anywhere. And indeed, these implausible years of birth are in fact the mere tip of the iceberg of suspicious patterns in birthdays, which follow much more notable patterns indicating fraud, involving round-numbered days of the month and months of the year, plus month distributions that are too smooth. These patterns consistent with fraud correlate with how heavily counties voted for Biden, including at record levels.
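For a flavour of the birthday tests, here is a minimal sketch (the file name and birth_date column are placeholders). Real birthdays are close to uniform across days 1 through 28, so fabricated dates piling up on round days like the 1st and 15th show up in an ordinary chi-square:

import pandas as pd
from scipy.stats import chisquare

voters = pd.read_csv("pa_voter_file.csv", parse_dates=["birth_date"])

# Day-of-month counts. Restrict to days 1-28 so every month contributes
# equally, making the uniform null hypothesis exact.
days = voters["birth_date"].dt.day
counts = days[days <= 28].value_counts().sort_index()
stat, p = chisquare(counts)
print(f"day-of-month chi-square p-value: {p:.2g}")
print("counts on 1st and 15th:", counts[1], counts[15], "vs mean", counts.mean())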

Or suppose you don’t believe statistics at all. You insist on hard evidence! In Wayne County, MI, you can find totally normal scenes from election night, like workers boarding up the windows of the vote counting center to stop observers even seeing in. In Fulton County, GA, you had the insane spectacle that on election night, election officials sent all the observers home, telling them that counting was over for the night. In the press, dubious accounts were circulated implying that a burst pipe was the cause, although it turns out that may have been from that morning, or may not have happened at all. In any case, an hour later, they started counting again, with no observers in the room, using ballots in suitcases under a desk that had been delivered at 8:30am that day. Oh, and all this was caught on video. As part of this, you can also watch the officials scan the same set of ballots multiple times. As has been noted before – if this were happening in a third world country, the State Department would declare it presumptively fraudulent. This isn't an exhaustive list. These are just the ones I managed to remember and write down, while working furiously on other things over the whole period, and where the main allegations were actually caught on video. If you go through everything alleged in affidavits in the lawsuits, many are much more shocking, though also harder to verify.

My point is not that you should believe this absolutely nails down fraud, let alone how widespread you should infer the fraud to be based on these incidents. My point is to emphasise how difficult the task is, even if there were actually fraud. Fraud would look exactly like this. People switching votes back and forth to swing a total, or deleting inconvenient votes from the count. Bringing fake and colossally unrepresentative ballot dumps in during the middle of the night. Registering tons of fake voters to flood in mail ballots. Counting happening in secret after observers are sent home under false pretenses. Reports coming in from whistleblowers in affidavits.

But how sure are you that these aren’t just data errors in very noisy data? That someone incorrectly entered a vote total in a database, and later corrected it? That patterns in absentee ballots, while highly weird, represent odd preferences of mail-in voters? That the ballots in Georgia were all scanned regularly, and that the machine will never count ballots twice if they’re scanned twice, and that there’s not some innocent mixup as to why everyone was sent home? That the witnesses in the lawsuits were confused about what they saw?

If every benefit of the doubt is given to the other side, what are the chances you can ever overcome them all?

Suppose, like a number of readers, you are in the category of someone who still isn’t convinced. There’s some weird stuff going on, sure, but it doesn’t rise to the level of “fraud may have decided the election result”.

Three good questions to ask are the following.

1. What kind of voter fraud do you have in mind?

2. What evidence would actually convince you that there might have been this kind of voter fraud?

3. What data is actually available, and based on this, how likely is it that this evidence might ever conceivably be discovered?

The first question, as it turns out, is actually the most important. Because fraud comes in many different types, and the likelihood of catching them varies enormously.

The most egregious type is to make up election returns out of whole cloth. In this version, the vote totals are plucked from someone’s head, and don’t correspond to any actual ballots or button presses in the real world.

This type is actually the most likely to get caught. Totally fake numbers leave lots of traces that can be studied by things like digit analysis via Benford’s Law. Only the most basket-case third world countries do this. I think one can say with high certainty that, at the conservative end, this does not occur very often in US elections, and I would wager strongly that in America it does not occur at all.
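For reference, the first-digit version of a Benford test fits in a few lines. Under Benford's Law the leading digit d occurs with probability log10(1 + 1/d); wholly invented totals tend to deviate from this. A sketch (and note Benford tests on vote counts are themselves contested as a fraud detector, so treat this as an illustration of digit analysis, not a verdict machine):

import numpy as np
from scipy.stats import chisquare

def benford_test(totals):
    """Chi-square test of leading digits against Benford's law."""
    totals = [t for t in totals if t > 0]
    first = np.array([int(str(t)[0]) for t in totals])
    observed = np.bincount(first, minlength=10)[1:10]
    expected = np.log10(1 + 1 / np.arange(1, 10)) * len(first)
    return chisquare(observed, expected)

# e.g. stat, p = benford_test(df["precinct_total"])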

The next category of obvious fraud is when some dictator reports winning 99% of the vote. Like Theodore Dalrymple observed about propaganda in communist countries, this kind of election is not actually meant to convince anyone, but rather to humiliate them, to insist on obvious lies and dare them to say differently.

But even here, most of the argument about fraud is already at the level of a smell test. Suppose you had to prove statistically that it was impossible that these election results in Cuba or Syria were genuine. How exactly would you do it? I suspect you’ll find it’s a lot harder than you might think. Bear in mind, in 2020 the “Norristown 2-2” precinct in Montgomery County had reported mail-in votes up to November 10th where Biden had won 98.7% of the two-party vote, across 150 votes. Please tell me how you plan to show that this number is genuine, yet Assad’s 88.7% of the vote is not. Not by digging up the raw ballots (though even here, if Assad can produce his fake ballots, you may still be out of luck). From your computer, which is what nearly all of us have had to do.

Or put it differently. Suppose that Assad in Syria decided to rig the elections, but instead of generating insane levels of support, he decided to replace all the genuine ballots with fake ones that showed him getting support levels between 60% and 71%, with turnout at 70% of the electorate. He has total control of the vote counting process.

You know this is bullshit. But that’s not the question. How would you go about proving it?

Almost anything below the first two cases – making up numbers out of whole cloth, or 90% vote shares – is actually extremely difficult to prove, even if it’s occurring. I mean, Assad kicked out the observers, which is pretty bad. But so did Fulton County, GA, and they kept on counting.

Let’s take some scenarios more likely to actually occur in the US.

You are an election official who is not being closely monitored. There is a list of eligible voters in your precinct. Suppose it is a normal year, with relatively few absentee/mail ballots. You have hidden a box of pre-filled ballots, on genuine ballot papers, that you know contains 1000 votes total, of which 97% are for your candidate. All registered voters in your precinct are on a list, and get crossed off as they come in. You wait until polls close, and you can see the list of everyone who hasn’t voted. You cross 1000 names off the list, and bring in your pre-filled box of ballots, mingling it with the main ones.

How do you propose to identify that in the data? If you had periodic updates, you can maybe find batches that look really anomalous, sure. That’s what this analysis did! And this one! The scenario wasn’t exactly the same, but it was similar. Did you find it sufficient proof?

In this particular variant, every voter is a genuine, registered voter. Every voter votes exactly once. Every ballot paper is a genuine ballot. Every vote corresponds to a ballot paper that can be counted and re-counted. No ballot gives any indication it was not cast by a genuine voter.
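A toy simulation makes the point (every parameter here is invented). Generate a cloud of honest precincts with naturally varying turnout and preferences, apply the stuffing scheme to one precinct, and see where its final numbers land:

import numpy as np

rng = np.random.default_rng(0)

# Cloud of honest precincts: 8000 registered voters each, with turnout
# and candidate preference varying naturally across precincts.
n = 5000
registered = 8000
turnout = rng.beta(12, 8, n)          # mean ~60%
share   = rng.beta(11, 9, n)          # candidate A's mean ~55%
votes   = rng.binomial(registered, turnout)
votes_a = rng.binomial(votes, share)

# One stuffed precinct: started at 60% turnout and 55% for A, then the
# official crossed 1000 non-voters off the list and added a hidden box
# of 1000 genuine ballot papers running 97% for A.
v, va = 4800 + 1000, 2640 + 970
print(f"stuffed precinct: turnout {v/registered:.1%}, A share {va/v:.1%}")

# How many honest precincts look at least as extreme on both margins?
both = ((votes / registered >= v / registered) &
        (votes_a / np.maximum(votes, 1) >= va / v)).sum()
print(f"honest precincts at least this extreme: {both} of {n}")

On a typical run, a few percent of the honest precincts are at least as extreme on both turnout and vote share, so nothing in the final totals flags the stuffed one.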

Let us agree on this much. Unless you catch the person in the act, this will be flat out impossible to detect just by looking at final election results. I actually don’t know how you’d prove it with any other data either. Don’t believe me? Propose a test. I’m all ears. I have heard stories from campaign operatives that this actually happens; I didn’t think up this idea myself.

But I’m not here to convince you to believe those stories. Suppose one accepts, as indeed you’re told, that there is no evidence of this kind of voter fraud. It’s true. There broadly isn’t. Now, ask yourself, what’s the signal to noise ratio of this kind of lack of evidence? If there were no voter fraud of this kind, we’d expect to find no evidence. If there were voter fraud of this type, but we lacked any realistic ability to catch it, we would also expect to find no evidence. So the lack of evidence tells us almost precisely zero one way or the other.
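In Bayesian terms: if the probability of finding evidence is tiny whether or not fraud occurred, the likelihood ratio of "no evidence found" is close to one, and the posterior barely moves off the prior. A three-line check, with made-up numbers:

prior = 0.30                      # whatever your prior on fraud is
p_no_evidence_if_fraud    = 0.95  # detection power assumed near zero
p_no_evidence_if_no_fraud = 1.00

posterior = (p_no_evidence_if_fraud * prior) / (
    p_no_evidence_if_fraud * prior +
    p_no_evidence_if_no_fraud * (1 - prior))
print(f"prior {prior:.0%} -> posterior {posterior:.1%}")   # 30% -> ~28.9%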

Especially germane to the current election, there are many types of fraud involving mail ballots. It is much easier for a person to send in mail ballots for someone else than to turn up at a polling station and claim to be five different people of different ages. This mail then gets handled by postal workers, with a crazily weak chain of custody, in the same system that leads to your Amazon packages being stolen with reasonable frequency. This leads to a number of stories you can find for the search string “ballots found in the trash”. Meanwhile, signature verification on potentially fraudulent ballots got greatly weakened in 2020 in many of the key states, just as the number of mail ballots increased massively, as described in the Texas lawsuit. A discussion I had with a campaign operative (which I haven’t been able to verify, so I’m just reporting the claim, not asserting it) said that in Arizona, once the signature was verified on the envelope, the envelope got thrown away, making it impossible for anyone to verify after the fact what it said.

Don’t think about “was there fraud”. I’m not interested in haggling over the specific details of what precisely happened in each place, and you can make up your own mind on that. Rather, I care much more about the question of “if there were fraud, would it have been caught?”

And here’s the crazy part, if you’re sure that election fraud in general would have been caught. 2020 is actually the single best year in history to catch election fraud. Because unlike in the past, we have periodic snapshots of the counts, scraped from the NYT website by internet amateurs, rather than just the final tally. We can also download a ton of stuff from the internet.

For most past elections, we can get final vote counts at the precinct level if we’re lucky, or the county level more likely. Votes by candidate. That’s it. You want to go back and find out if the 2016 election was fraudulent, that’s basically the overwhelming extent of the data you’ve got to work with. Oh, and four years later, that data is still riddled with errors, because it has to get kludged together from 3,300-odd counties, with vastly different reporting systems.

Tell me what kinds of fraud you are confident you can identify from those numbers. Not just you, but “the experts” who study this stuff.

I understand enough about this data to know that while there are clearly some tricks one can do if one is clever, there are large and fundamental limitations to how much fraud you can ever hope to identify from this kind of data.

And that’s it. That’s basically what you’ve got. Or you can hope that someone does something dumb and gets caught in the act. But is that the state-of-the-art strategy? How many would slip through the net for each one that gets caught, like in Fulton County, GA? Not that anything is going to happen to the people in Fulton County, which also is quite revealing. In a year, I predict fairly confidently, it will be one more rumored and then forgotten local story, and the videos will eventually disappear. Along those lines, if more evidence does come to light, you certainly can't publish it on Youtube, no matter what you find from here on out, as they've said that their policy is to delete all such videos. Big tech has spoken! The matter is closed. There is no evidence of voter fraud, and also, you had a total of four weeks to come up with any of it, before the verdict is entered for all time.

I think there is a strong case to be made that, for many types of fraud, catching them is extremely difficult.

And so almost the entire question comes down to one of priors. We have no reasonable hope of actually identifying it from the data. Most people are sure it is extremely rare. I am not. The evidence demanded to budge their priors is enormous. That evidence will never be found, whether there is fraud, or whether there is no fraud.

And so finally, we get to the last question. Even if fraud could be caught, eventually, somehow, with enough time and analysis and manpower, would it be caught in time?

Reader, prepare yourself, because the next sentence may be shocking to you. 

The Trump campaign, in many respects, was not very well organized.

But I have come to have enormous sympathy for the sheer scale and difficulty of the task in front of them, even if they were well organized.

A campaign is not a permanent organization, but a bunch of operatives coming together for a particular period and task. I suspect, and it accords with the few anecdotal discussions I’ve had with people who’ve worked on them, that most presidential campaigns are a shitshow at the best of times, but some candidate has to win, so we assume after the fact that their campaign internally must have been great, when it probably wasn’t.

So what happens after the dubious election returns start coming in, in the dead of night, on the Wednesday after the election?

You have a small staff. Most of it is lawyers and political operatives, not statisticians and data scientists. Everyone is absolutely frazzled. You are trying to put out a thousand fires. You are trying to coordinate dozens of people and teams. Everyone is demoralised and worrying about their employment future, since most were working on an implicit promise of employment in the administration if they won, which is now looking unlikely. You are trying to keep track of ten thousand different leads and reports coming in from all over the country. Half of them will be straight up wrong, either bogus third hand accounts, or claims from someone genuinely concerned but insufficiently skeptical and not probing into alternatives. Avoiding this is actually quite hard, to be honest. When one really wants to find fraud (or indeed any empirical result) it is psychologically difficult to then switch gears to convincing oneself of all the ways the hypothesis could be false, and then trying to find evidence of that.

Of the other half of the leads, perhaps 80% will be plausible, but either inconclusive, or admitting of multiple interpretations. The genuine ones may be contained in a two-hour video that’s not very well explained, and you don’t have time to watch the whole thing. They may be written down in some long technical piece that you don’t have the training to follow entirely, or which doesn't explain clearly what it's doing. Even if you think it seems legit and you understand what it’s doing, you have to take a gamble that it’s not a coding error or bad data cleaning or some other screwup. They may be some anonymous whistleblower that you have to spend resources to try to find out if they’re fake or well-intentioned, if they’re right or wrong, if their claims are provable or unprovable.

Now, you have to figure out, can I get this in an affidavit? Is this author willing to go public? Will this convince a judge? Can I get an expert witness to testify, assuming a judge is even interested in hearing evidence, which often they're not? As far as I can tell, the statistical analyses I liked the most were all written pseudonymously. It is not a surprise that they didn’t find their way into the major lawsuits. The Williams professor who did a god damn confidence interval for the Matt Braynard analysis got dragged in the papers by his utterly contemptible colleagues. The chances that they would do this if he’d computed a confidence interval for literally any other survey in history are zero. Are you surprised that more people aren't signing up to put their professional reputations on the line for what's almost certainly a Hail Mary, and which won't even benefit them personally?

But even if you can find an expert willing to go public, how long do they have to generate such a report? You need to scramble to scrape and download the data straight away from lots of sources, and start analyzing it. Find the weird anomalies, dig into them, try to figure out which ones might be errors. Think of different ways to test them. Think of different data you might get that would corroborate this. Manually do more gathering, and cleaning, and merging. Think of which things might rise above the metric of “dubious” to “very hard to explain with anything other than fraud”. Run the results. Double check the results. Triple check the results, because if you start making false claims, you’ve actively hurt the cause (and you’ll feel like a total fool and fraud). Start writing the results up. Refine the writeup to make it less jargon-y. Try to balance the tension between “easily accessible to public readers”, “understandable to smart but busy and innumerate lawyers” and “detailed enough to withstand public scrutiny by hostile experts or readers”. Also, there are dozens of different investigative angles you can take. Each one takes a few days or a week to look into, let alone write up, let alone actually get published. You’re pulling 80 hour weeks, but even so, there’s not many weeks you have. How many such analyses can you write? Meanwhile, you're working against the clock without knowing quite what the deadline is for "too late to matter", but you know it can't be very long.

Now, consider the media environment you are operating in, if you are the Trump team. The same media that in 2016 was willing to report uncritically every breathless allegation of Russian interference, that was willing to circulate as evidence a single anonymous dossier of allegations about Trump and treat it as a basis for campaign wiretaps and impeachment, now is loudly insisting that a) the race is over, and b) “experts assure us there is no voter fraud”. Meanwhile, on the rare occasions they do report on the matter, they only focus on the most ludicrous witness statements and the most easily debunked claims. These are sure to circulate widely, so that by the time previously open-minded readers get around to seeing actual good evidence, they’re largely exhausted and cynical, and often won't even read it.

Partly for the fun of trolling, and partly just as an experiment, I started asking the Montgomery County twitter account, and its commissioner in charge of the election, Ken Lawrence Jr, why it was that their county looked so crooked on multiple dimensions, both in terms of having the most suspicious vote update in America, and the third most suspicious set of voter birthdays among Pennsylvania counties. They never answered. I tried poking newspaper reporters from multiple papers. Most didn’t bite. Ross Douthat, to his credit, linked to the Montgomery piece, admittedly in a one-liner in his NYT article on how weird it is that these kooks believe in conspiracy theories. I asked him in multiple places – have you, or any other journalist, actually just asked these guys in Montgomery County what their explanation is for it? Even just to get a response on the record? No dice. Nobody was interested. Hell, I couldn't even get a response out of the Pennsylvania Republican Party twitter account!

I didn’t really expect anything different, so my demeanour was mostly one of trollish entertainment, rather than disappointment. But at the end, even I found myself more cynical than I expected.

If you are Republican, and alleging voter fraud by the Democrats, the media will be actively opposed to you at every single step. How could they not be? These are the same people that have been writing about how Trump was Hitler for the past four years. Does any reasonable person expect them to voluntarily start digging into stories that might make Trump actually get another four years, when they can just turn a blind eye and end it all? Besides, if they start being called a voter fraud truther, it will be disaster for their career.

There is one more piece of the puzzle worth noting.

How many people do you think there actually were working on this, total, over the past month? At least on the data side?

The average person probably assumes that there must have been thousands of highly paid professionals working on it.

I estimate that the number is perhaps 40 at the high end, and maybe as low as 20. (If the sides had been flipped, it would definitely be more, perhaps a lot more, but I don't know). I’d estimate that nearly all of them were volunteers juggling other full time jobs. I personally knew about ten of them working on analysis, and there were a number of other excellent people helping enormously with data gathering and processing. 

That's it. That's the full extent of resources around the world that have gone into investigating from a statistical point of view whether the 2020 election may have been decided by fraud. With the time and resources available, it's remarkable we found as much as we did.

At least personally, I never really expected to change the outcome. The task was basically impossible, but damn it, we worked until the end anyway.

This is all one can ever do. 

To live not by lies, as Mr Solzhenitsyn put it.

And to fill the unforgiving minute with sixty seconds worth of distance run, as Mr Kipling put it.

To the ten, and to all those I know who helped in the effort – friends, it was a true honour and pleasure to work with you.

Sunday, July 2, 2017

On the time-series, the cross-section, and epistemic humility

One of the advantages of the economist's training is just the ability to instinctively think in terms of empirical tests. There's a reason that economists have tended to colonise other fields like sociology, law and politics - as much as anything, it comes from knowing about how to design proper empirical tests, and what good identification is.

Perhaps more importantly, it comes from knowing what poor identification is, including basic issues of endogeneity, reverse causality and omitted variables. For instance, knowing that you cannot infer almost anything about the relationship between prisons and crime just by looking at the variation over time in the total number of prisoners and the total number of criminals in a society.

The gold standard for identification, of course, is pure randomisation. When that isn't available, as it usually isn't outside a laboratory, you go for natural experiments - where something almost exogenous occurred. This gets used as an instrument - in the prisons and crime case, for instance, Steve Levitt used ACLU prison overcrowding litigation as a quasi-random shock to the prison population.

Of course, the pendulum swings back and forth. If the initial identification push corrected the free-for-all of 1980s empirical work (regressing the number of left shoes on the number of right socks!), the subsequent view seems to have gone towards 'identification uber alles'. In other words, the only question of interest is whether you've really truly identified totally exogenous variation, not the importance of the underlying topic. Plus, oddly, very few people seem willing to learn from an imperfect instrument. It seems to me that if there are 100 potential explanations, and an imperfect instrument rules out 90 of them, then we've learned something quite valuable - the answer is either the main hypothesis, or one of the remaining 10. But making the perfect the enemy of the good seems to be the way of things these days.

These questions end up being most important when you actually run empirical tests. Me, I'm lazy - there is a large hurdle for me to actually download a dataset and start fiddling with it. I'm always impressed by people like Audacious Epigone and Random C. Analysis who do this stuff all the time.

But there is another aspect of empirical training that ends up being even more useful for the computationally lazy - helping you sort through hypotheses just by knowing the panel nature of the dataset.

In particular, a lot of questions that purport to be about a time-series are really about a panel. That is, it's not really about a single variable over time (the time-series), it's about a group of different individuals over time. And thinking of the cross-section simultaneously with the time-series greatly clarifies a lot of things. That is, instead of just coming up with hypotheses about why the average of some variable has changed over time, think about whether the hypothesis would also be able to explain which individuals would change more or less, or would be higher or lower on average.

One of my favorite examples is birthrates. The classic question is about the time-series: why have birthrates, on average, declined over time?

But there is also a cross-section - whose birthrates? This could be of individuals, or countries, or characteristics. Moreover, the cross-section exists both today, and in the past. If you're too lazy to run a regression and just want to sort through hypotheses about the time-series with help from your friend William of Occam, one rule of thumb might be as follows: A good variable explains both the time-series and the cross-section. A mediocre variable explains the time-series, but not the cross-section. A bad variable explains the time-series, but predicts the cross-section in the wrong direction.
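Mechanically, checking a candidate variable against both margins takes only a few lines. A sketch with pandas, on a hypothetical long-format panel with columns country, year, tfr and some candidate explanatory variable x:

import pandas as pd

panel = pd.read_csv("fertility_panel.csv")   # columns: country, year, tfr, x

# Time-series margin: does x track tfr within each country over time?
within = panel.groupby("country").apply(lambda d: d["tfr"].corr(d["x"]))
print("within-country (time-series) correlations:\n", within)

# Cross-sectional margin: across countries in a given year, does x
# predict which countries are high or low, with the right sign?
cross = panel[panel["year"] == 2000]
print("cross-sectional correlation in 2000:", cross["tfr"].corr(cross["x"]))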

For instance, suppose I wanted to know why birthrates in the west had declined overall. Here's the US Total Fertility Rate over time



You might look at that graph, and notice a big decline starting in the early 1960s. Certain hypotheses start suggesting themselves. What was going on in the 60's? Feminism? The Pill, approved by the FDA in 1960?

Quite possibly. But what hypotheses would come to mind if instead I showed you this graph:


The US is now in green. But we've also plotted New Zealand, in purple, Sweden, in light blue, and Japan in pink.

I personally would be astonished if your reaction isn't at least partly like mine, thinking that the world is actually quite a lot more complicated than you'd bargained for. Sweden was actually rising in the early 1960s, and New Zealand was higher than the US for a long time. Japan, meanwhile, has a completely different picture altogether.

You can add any number of these together at the excellent World Bank website to test whatever theory you have.
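The same data is also available programmatically; the World Bank's indicator code for total fertility rate is SP.DYN.TFRT.IN. A minimal sketch against their public v2 API (endpoint details as I understand them; check the response shape on your end):

import requests

url = ("https://api.worldbank.org/v2/country/USA;NZL;SWE;JPN/"
       "indicator/SP.DYN.TFRT.IN")
resp = requests.get(url, params={"format": "json", "per_page": 1000})
meta, rows = resp.json()   # first element is paging metadata

for row in rows[:5]:
    print(row["country"]["value"], row["date"], row["value"])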

But there are other cross-sections within countries which can be used to test theories further.

Suppose I were primarily interested in the west, and my theory was about a rising cost of having children. It's a lot more expensive to raise children than it was in the past, as you have to pay for daycare, and "good schools", and all this other stuff, because societal expectations are higher.

I certainly hear people with kids complaining about cost all the time, which tells me that maybe there's something to it.

In the first place, I don't think this hypothesis stands up well to an actual examination of the data above. It turns out there's nothing quite like downloading the bloody data and plotting it to explode a lot of preconceptions. This probably actually is the most basic economist's tool of all, to be honest. Because the US graph above shows that not only did birth rates in general go down, but they went down precipitously starting in the 1960s, hit their low point around 1975, and have been slightly rising since then. Of course, since the graph only starts in the 1960s, you have to be wary of giving prominence to this date alone for the initial decline. But still, does this look like a graph of what you expect the cost of raising a child was?

I'd guess not, but with something hard to measure like "the cost of raising a child", who knows, maybe it is. After all, the aggregate time series is always hard. We only have one run of history, and lots of things are changing at once. But the cross-section has a lot more data.

For instance, consider the cross-section of rich and poor. At least in 2000, here's how they looked:


In other words, the poor have more children than the rich.

This immediately doesn't sound like a cost story. Even if the cost has gone up for everyone, why are the rich less able to bear it than the poor? Even if you think that the poor are on welfare so they don't care about the cost, as long as the rich have more money, they still are better placed to be able to deal with it. Are children some sort of inferior good, getting substituted for jetskis as income rises?

And there's another aspect - there's a historical cross-section as well. I suspect, though it turns out to be harder than I thought to find an easy citation for this, that this dysgenic pattern in birthrates with respect to income is a relatively recent phenomenon. Certainly in pre-history or in polygamous societies, only the rich could afford to have large families (or multiple wives, in the polygamy case). When the Malthusian limit binds, access to resources matters, and the rich outbreed the poor.

I've written before about how I think improved birth control is a big part of the story. But doubt it not, this does little better than the cost story at explaining the current cross-section. That is, it approximately fails to predict it at all (making it mediocre by my rule of thumb), rather than predicting it in the wrong direction, as costs do. To get birth control to explain the cross-section of income as well, you'd need to believe that the poor are unable to afford birth control, but are able to afford the resulting children. Seems hard to square to me.

Cost, incidentally, runs into similar panel problems when it comes to the increase in obesity. Leftists love the 'food desert' explanation, whereby the poor are forced to become obese because the stores in their area don't have enough fresh fruits and vegetables, and hence the only options open to them are potato chips and coke.

Again, from a cross-sectional point of view, this is possible. But it's a disaster from the time-series point of view. As society in general has gotten richer, we've also gotten fatter. How is it that the poor today "can't afford to eat healthily", but the poor in the 1930s could?

So explanations have to get more complicated. There's no rule of nature that everything has a single explanation. I've picked fertility and obesity because they are two of the most stubborn problems facing the west today, which suggests, but does not guarantee, that they may not be amenable to a single simple explanation.

I think there's something quite humbling about looking at the totality of the data, because it rarely looks like any one neat explanation of anything. It reminds you that your models of the world are just that - models. You include what you think are the most important parts, but you leave out lots of other stuff too. Even if you're right on what's important (a big if), the world is a large and complicated place.

Friday, November 7, 2014

They're all IQ tests, you just didn't know it

Here's one to file under the category of 'things that may have been obvious to most well-adjusted people, but were at least a little bit surprising to me'.

Many people do not react particularly positively when you tell them what their IQ is, particularly when this information is unsolicited.

Not in the sense of 'I think you're an idiot', or 'you seem very clever'. Broad statements about intelligence, even uncomplimentary ones, are fairly easy to laugh off. If you think someone's a fool, that's just, like, your opinion, man.

What's harder to laugh off is when you put an actual number to their IQ.

Having done this a couple of times now, the first thing you realise is that people are usually surprised that you can do this at all. IQ is viewed as something mysterious, requiring an arcane set of particular tasks like pattern spotting in specially designed pictures, which only trained professionals can ascertain.

The reality is far simpler. Here's the basic cookbook:

1. Take a person's score on any sufficiently cognitively loaded task = X

2. Convert their score to a normalised score in the population (i.e. calculate how many standard deviations above or below the mean they are, turning their score into a standard normal). Subtract off the mean score on the test, and divide by the standard deviation of scores on the test: Y = [X - E(X)] / σ(X)

3. Convert the standard normal to an IQ score by multiplying the standard normal by 15 and adding 100:
IQ = 100 + 15*Y

That's it.

Because that's all IQ really is - a normal distribution of intelligence with a mean of 100 and a standard deviation of 15.

Okay, but how do you find out a person's score on a large-sample, sufficiently cognitively-loaded task?

Simple - ask them 'what did you get on the SAT?'. Most people will pretty happily tell you this, too.

The SAT pretty much fits all the criteria. It's cognitively demanding, participants were definitely trying their best, and we have tons of data on it. Distributional information is easy to come by - here, for instance. 

You can take their score and convert it to a standard normal as above - for the composite score, the mean is 1497 and the standard deviation is 322. Alternatively you can use the percentile information they give you in the link above and convert that to a standard normal using the NORM.INV function in excel. At least for the people I looked at, the answers only differed by a few IQ points anyway. On the one hand, this takes into account the possibly fat-tailed nature of the distribution, which is good. On the other hand, you're only getting percentiles rounded to a whole number of percent, which is lame. So it's probably a wash.
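Putting the whole cookbook into code (scipy's norm.ppf plays the role of Excel's NORM.INV; the 1497 and 322 are the figures quoted above):

from scipy.stats import norm

def iq_from_sat(sat, mean=1497.0, sd=322.0):
    """Method 1: z-score the raw SAT composite, then rescale to IQ."""
    return 100 + 15 * (sat - mean) / sd

def iq_from_percentile(pctile):
    """Method 2: convert a percentile (0-1) straight to IQ via NORM.INV."""
    return 100 + 15 * norm.ppf(pctile)

print(iq_from_sat(2100))          # ~128
print(iq_from_percentile(0.96))   # ~126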

And from there, you know someone's IQ.

Not only that, but this procedure can be used to answer a number of the classic objections to this kind of thing.

Q1: But I didn't study for it! If I studied, I'm sure I'd have done way better.

A1: Good point. Fortunately, we can estimate how big this effect might be. Researchers have formed estimates of how much test preparation boosts SAT scores after controlling for selection effects. For instance:
When researchers have estimated the effect of commercial test preparation programs on the SAT while taking the above factors into account, the effect of commercial test preparation has appeared relatively small. A comprehensive 1999 study by Don Powers and Don Rock published in the Journal of Educational Measurement estimated a coaching effect on the math section somewhere between 13 and 18 points, and an effect on the verbal section between 6 and 12 points. Powers and Rock concluded that the combined effect of coaching on the SAT I is between 21 and 34 points. Similarly, extensive metanalyses conducted by Betsy Jane Becker in 1990 and by Nan Laird in 1983 found that the typical effect of commercial preparatory courses on the SAT was in the range of 9-25 points on the verbal section, and 15-25 points on the math section. 
So you can optimistically add 50 points onto your score and recalculate. I suspect it will make less difference than you think. If you want a back of the envelope calculation, 50 points is 50/322 = 0.16 standard deviations, or 2.3 IQ points.

Q2: Not everyone in the population takes the SAT, as it's mainly college-bound students, who are considerably smarter than the rest of the population. Your calculations don't take this into account, because they're percentile ranks of SAT takers, not the general population. Surely this fact alone makes me much smarter, right?

A2: Well, sort of. If you're smart enough to think of this objection, paradoxically it probably doesn't make much difference in your case - it has more of an effect for people at the lower IQ end of the scale. The bigger point though, is that this bias is fairly easy to roughly quantify. According to the BLS, 65.9% of high school graduates went on to college. To make things simple, let's add a few assumptions (feel free to complicate them later, I doubt it will change things very much). First, let's assume that everyone who went on to college took the SAT. Second, let's assume that there's a rank ordering of intelligence between college and non-college - the non-SAT cohort is assumed to be uniformly dumber than the SAT cohort, so the dumbest SAT test taker is one place ahead of the smartest non-SAT taker.

So let's say that I'm in the 95th percentile of the SAT distribution. We can use the above fact to work out my percentile in the total population, given I'm assumed to have beaten 100% of the non-SAT population and 95% of the SAT population:
Pctile (true) = 0.341 + 0.95*0.659 = 0.967

And from there, we convert to standard normals and IQ. In this example, the 95th percentile is 1.645 standard deviations above the mean, giving an IQ of 125. The 96.7th percentile is 1.839 standard deviations above the mean, or an IQ of 128. A surprisingly small effect, no?

For someone who scored in the 40th percentile of the SAT, however, it moves them from 96 to 104. So still not huge. But the further you go down, the bigger it becomes. Effectively you're taking a weighted average of 100% and whatever your current percentile is, and that makes less difference when your current one is already close to 100.
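In code, reproducing the two worked examples above (the uniform rank-ordering assumption is doing all the work here):

from scipy.stats import norm

COLLEGE_SHARE = 0.659   # BLS share of graduates going on to college

def adjusted_iq(sat_pctile):
    """IQ after re-basing an SAT percentile to the whole cohort,
    assuming non-takers rank uniformly below all SAT takers."""
    true_pctile = (1 - COLLEGE_SHARE) + sat_pctile * COLLEGE_SHARE
    return 100 + 15 * norm.ppf(true_pctile)

for p in (0.95, 0.40):
    raw = 100 + 15 * norm.ppf(p)
    print(f"SAT pctile {p:.0%}: IQ {raw:.0f} -> adjusted {adjusted_iq(p):.0f}")
    # 95% -> 125 becomes 128; 40% -> 96 becomes 104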

Of course, the reality is that if someone is offering these objections after you've told them their IQ, chances are they're not really interested in finding out an unbiased estimate of their intelligence, they just want to feel smarter than the number you told them. Perhaps it's better to not offer the ripostes I describe.

Scratch that, perhaps it's better to not offer any unsolicited IQ estimates at all. 

Scratch that, it's almost assuredly better to not offer them. 

But it can be fun if you've judged your audience well and you, like me, occasionally enjoy poking people you know well, particularly if you're confident the person is smart enough that the number won't sound too insulting.

Of course, readers of this august periodical will be both a) entirely comfortable with seeing reality as it is, and thus would nearly all be pleased to get an unbiased estimate of their IQ, and b) are all whip-smart anyway, so the news could only be good regardless.

If that's not the case... well, let's just say that we can paraphrase Eliezer Yudkowsky's advice to 'shut up and multiply', in this context instead as rather 'multiply, but shut up about it'.

The strange thing is that even though people clearly are uncomfortable having their IQ thrown around, they're quite willing to tell you their SAT score, because everybody knows it's just a meaningless test that doesn't measure anything. Until you point out what you can measure with it. 

I strongly suspect that if SAT scores were given as IQ points, people would demand that the whole thing be scrapped. On the other hand, the people liable to get furious were probably not that intelligent anyway, adding further weight to the idea that there might be something to all this after all.

Monday, July 14, 2014

Lionel Messi and Soccer Equilibrium Outcomes

So another World Cup has come and gone. Enough water had passed under the bridge that I no longer resented Argentina for their dismal performance in 2002 when I wagered on them. I was vaguely hoping for an Argentine win, just because I would have liked to see Lionel Messi win a cup.

'Twas not to be, of course.

A very good starting point for understanding Messi is this excellent post at Nate Silver's FiveThirtyEight going through a whole lot of metrics of soccer success and showing that Messi is not only an outlier, he's such an outlier that his data point is visibly distinct from the rest even in simple plots. Like this one:

[Figure: scatter plot from the FiveThirtyEight piece, with Messi's data point far from everyone else's] (image credit)

Seriously, go read the whole thing. If you're apt to be swayed by hard data, it's a pretty darn convincing case.

So what happened in the World Cup? Why didn't he seem nearly this dominant when you watched him play?

The popular narrative is that there's some inability to perform under pressure - in the big situations when it really counts, he doesn't come through with the goods. He's a choker, in other words.

This is hard to disprove exactly, but one thing that should give you pause is that with Messi on the team, Barcelona has won two FIFA Club World Cups and three UEFA championships. This at least suggests that the choking hypothesis seems more specific to World Cups.

So one explanation consistent with the choking hypothesis is that the World Cup is much higher stakes than the rest, hence the choking is only visible in that setting. It's possible, and hard to rule out.

But another possibility is that the difference comes from the way that opposing teams play against Messi in each setting.

Remember, a player's performance is an equilibrium outcome. It's determined by how skilfully the person plays that day (which everyone thinks about), but also by how many opposing resources are focused on the person (which very few people think about).

Let's take the limiting case, since it's easiest. Suppose I take a team comprised of Lionel Messi and ten guys from a really good high school team, and pit them against a mid-range club team. My guess is that Messi wouldn't perform that well there, and not just because he wouldn't have as many other good people to pass to. Rather, the opposing team is going to devote about 4 defenders just to covering Messi, since it's obvious that this is where the threat is. Throw enough semi-competent defense players on someone, and you can make their performance seem much less impressive.

Have a look at the pictures from the Daily Mail coverage of the game against the Netherlands. In one, Messi is surrounded by four Dutch defenders. In another, he's surrounded by three. The guy is good, but that's a pretty darn big ask of anyone.

In other words, Messi may be better than the rest of the Argentine players by a large enough margin that opposing teams will throw lots of resources into covering him, making it harder for him to shine. In soccer, like in martial arts reality (as opposed to martial arts movies), numbers matter. Jet Li may beat up 12 bad guys at a time, but if you try that in real life, you're on your way to the emergency room or the morgue, almost regardless of your martial arts skill.

The last piece of the puzzle for this hypothesis is the question of why this doesn't happen when Messi plays at Barcelona.

I'm a real newb at soccer (evidenced by me referring to it as 'soccer' - you can take the boy out of Australia, etc.), but my soccer-following friends can tell me if I'm right here or not.

My guess is that the rest of the Barcelona team is much closer to Messi's level of skill than the rest of the Argentine team. This means that if opposing teams try to triple mark Messi in a Barcelona game, the rest of the attackers will be sufficiently unguarded that they'll manage to score and the result will be the same or even worse than if Messi were totally covered. As a result, Messi goes less covered and scores more.

There's a reason that the sabermetricians (who tend to be among the most sophisticated of sports analysers) talk about wins above replacement. You need to think about the counterfactual where the person wasn't there, not the direct effect of what they did or didn't do in equilibrium.

Of course, the skeptics will point out the cases where great stars did manage to individually play a big role in lifting their national teams to great success. What about Maradona, they say?

This is a fair question. Sometimes you really can get it past five defenders to win a World Cup. Maybe that's what a true champion would have done yesterday.

Or maybe the English just weren't marking as well as the Dutch were.

Or maybe, even more pertinent, the rest of the Argentine team in '86 was sufficiently better in relative terms that England couldn't afford to mark Maradona as hard. The effect of this, if true, would be for Maradona's performance to look more spectacular relative to the rest of his team - having a good team means fewer defenders on you, which means more heroics. And when that happens, you look individually more brilliant, leading to you getting all the credit and making it look like you won the game single-handedly. If you really were that much better than everybody else, you would be less likely to deliver a performance that showed this fact to a novice observer.

Not many people think in equilibrium terms. This is why we analyse data.

The data case, however, is clear. Viva Messi!

Monday, May 26, 2014

Lies, Damn Lies, and STD Risk Statistics, Part 2

Continued from Part 1.

If you've just joined us, we're giving a good fisking to the Mayo Clinic's worthless list of STD risk factors, namely:
Having unprotected sex. 
Having sexual contact with multiple partners. 
Abusing alcohol or using recreational drugs. 
Injecting drugs. 
Being an adolescent female 
The biggest proof that their advice is completely worthless comes from the full description of the first point, 'having unprotected sex'. At a minimum, they don't make even the most basic distinction between vaginal, anal and oral intercourse. But even within that, the whole thing is basically a ridiculous scare campaign:
Vaginal or anal penetration by an infected partner who is not wearing a latex condom transmits some diseases with particular efficiency. Without a condom, a man who has gonorrhea has a 70 to 80 percent chance of infecting his female partner in a single act of vaginal intercourse. Improper or inconsistent use of condoms can also increase your risk. Oral sex is less risky but may still transmit infection without a latex condom or dental dam. Dental dams — thin, square pieces of rubber made with latex or silicone — prevent skin-to-skin contact.
This one I know is in the 'deliberately misleading to fool the public' category. You know why? Because they use the weasel words 'some diseases'. They then back it up with the gonorrhea example, where one-off unprotected vaginal transmission rates are high. But people don't generally stay up late at night freaking out about getting gonorrhea, do they? As a matter of fact, you don't hear about it much, because it can be treated with antibiotics. What people actually worry about the most is HIV. Why not tell them about that instead?

So what are the chances of HIV transmission from unprotected vaginal intercourse with someone who is HIV positive? This is such a classic that I want to put the answer (and the rest of the post, which gets even more awesome by the way, though you may not believe it's possible) below the jump. Suppose a man and a woman have unprotected vaginal intercourse once. 
a) If the man is HIV positive, what is the chance the woman contracts HIV?
b) If the woman is HIV positive, what is the chance the man contracts HIV?
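
If you want to sanity-check your guess before reading on, here's a minimal sketch using the commonly cited order-of-magnitude estimates for penile-vaginal intercourse - roughly 8 in 10,000 per act from an infected man to a woman, and 4 in 10,000 from an infected woman to a man. Treat those two numbers as illustrative assumptions, not as the official answer:

    # Commonly cited per-act estimates; illustrative assumptions only.
    P_M_TO_F = 0.0008   # infected man, uninfected woman
    P_F_TO_M = 0.0004   # infected woman, uninfected man

    def cumulative_risk(p_per_act, n_acts):
        # assuming independence across acts, risk compounds as 1 - (1 - p)^n
        return 1.0 - (1.0 - p_per_act) ** n_acts

    for n in (1, 10, 100):
        print(f"{n:>3} acts: M->F {cumulative_risk(P_M_TO_F, n):.2%}, "
              f"F->M {cumulative_risk(P_F_TO_M, n):.2%}")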

Tuesday, May 20, 2014

Lies, Damn Lies, and STD Risk Statistics, Part 1

Every time I read anything about STD risks, I tend to get mightily annoyed at how difficult it is to get any useful information from the medical profession, at least in the popular press, about the actual magnitude of different types of risks. I remember talking about this problem in the case of cancer risks and smoking. Smoking causes cancer, living under power lines causes cancer, and eating burnt steak causes cancer, but they do not all cause cancer at anything like the same rate. Same thing with STDs. I sometimes find it hard to tell how much of this is because the people writing it are morons when it comes to causal inference, and how much is due to them knowing the right answer but spinning nonsense for public consumption, assuming that everyone is a child unable to make their own risk assessments. 

Let's hear from the Mayo Clinic - they're a famous hospital, so surely they'll have top-quality medical advice about which big-ticket items to avoid. And their list of risk factors is... (drumroll)...:
Having unprotected sex.
Having sexual contact with multiple partners.
Abusing alcohol or using recreational drugs.
Injecting drugs.
Being an adolescent female.
Seriously. 

The first thing to know is that what people mostly want are estimated treatment effects of particular actions: if I do X, my chance of infection goes up by Y%. Instead, what you get is a mish-mash of treatment effects, correlations with prevalence, correlations with transmission rates, and absolutely nothing on relative magnitudes, all leading to answers that are just laughable.

'Abusing alcohol or using recreational drugs' is hilariously stupid, because it doesn't map to anything directly. It could be correlation, it could be treatment, it could be both, who knows. They explain it as if it's mostly a treatment effect - "Substance abuse can inhibit your judgment, making you more willing to participate in risky behaviors." In other words, the whole of their advice is that once you're drunk, you might do other stupid stuff. So just list that stuff! Of course, there's a strong correlation between people who get drunk all the time and people who do other stupid things. At a minimum, any treatment effects are going to be wildly heterogeneous. I'm pretty sure that if your Aunty Gladys has a few too many sneaky shandies, the increase in her STD risk is zero. If you're a normally sensible person and you get drunk once, the chance of you picking up an STD is similarly low, because I'm guessing that most people are unlikely to rush out and have anal sex with strangers just because they got drunk, though obviously some will. Most of the effect that makes this a risk factor has to be straight correlation with omitted factors, namely a tendency for reckless and risky behaviour. This is marginally actionable, if it tells you to avoid sleeping with perpetual drunks, but that's about it.
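
To see how far pure correlation can take you, consider a minimal simulation in which every number is invented and drinking is given a treatment effect of exactly zero - recklessness drives both the drinking and the infections:

    import random

    random.seed(0)
    counts = {"drinker": [0, 0], "non-drinker": [0, 0]}   # [infections, people]

    for _ in range(100_000):
        reckless = random.random() < 0.2
        # reckless people are far more likely to be heavy drinkers...
        drinker = random.random() < (0.8 if reckless else 0.1)
        # ...but infection depends only on recklessness; drinking adds nothing
        infected = random.random() < (0.10 if reckless else 0.01)
        key = "drinker" if drinker else "non-drinker"
        counts[key][0] += infected
        counts[key][1] += 1

    for key, (inf, tot) in counts.items():
        print(f"{key}: {inf / tot:.1%} infection rate")

Drinkers come out with several times the infection rate of non-drinkers, despite drinking doing literally nothing in this model. A naive reading of that output would happily list 'abusing alcohol' as a major risk factor.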

'Being an adolescent female' is even more stupid. The actionable interpretation of the previous statement was that perhaps we were being given correlations with overall prevalence. But how the hell do you interpret this one then? Do you really think that 'adolescent females' have high STD rates? Of course not. They may have higher transmission rates of certain diseases relating to cervical cancer, but this is a very different proposition. In what sane ordering is this among the five biggest STD risks for the general population to worry about? What adolescent females do have is a high rate of unplanned pregnancies, and it would be greatly in their interest to start using condoms regularly. So just say that! Stop trying to sell us a bunch of bull$#!& about how they also have massively high STD risks.

Since this post is already turning into a monster, I'll be back with Part 2 in a few days.

Monday, December 2, 2013

Australia as a Triumph of Reversion to the Mean

Not many people really understand the idea of reversion to the mean in the context of genetics. If it’s discussed at all, it’s usually in terms of the rich smart guy having an idiot son who ruins the family business. But there’s more to it than that.

The first part you need to realise is that it's often unhelpful to think of your genes as a deterministic set of instructions that will be replicated over and over in your children unless mutations intervene.

Instead, one crude metaphorical way to think of the process of Mendelian inheritance is that your genetic outcome is the realization of a random variable drawn from the joint distribution of your mother's family and your father's family. Combined, you can think of this as your family genetic distribution.

Your particular genes contain information both about you (i.e. the one particular realization of that variable) and the overall distribution of traits in your family (the possible range of other realizations of you and your siblings). When you have children, each child is a realization of the joint distribution of your family traits and your husband or wife’s family traits. If you have enough children, you’ll start to see the outlines of the whole distribution of possible traits – ranges of height, ranges of facial features, ranges of hair colors, etc.

So what this means is that when it comes to whether your children will be smart, the question is not just whether you and your wife are smart. The question is whether you and your wife come from families that are generally smart. If you and your wife are both smarter than the rest of your families, unfortunately your children will probably be less smart than either of you. They’ll be closer to the average of the joint distributions, whereas you two are closer to your respective maximums.
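
Here's a minimal sketch of that logic, with every parameter made up for illustration. Each child is a fresh draw centred on the two family means, so parents who were themselves unusually lucky draws have children who fall back toward those means:

    import random

    DRAW_SD = 10.0   # assumed spread of individual draws around the family midpoint

    def child_trait(father_family_mean, mother_family_mean):
        midpoint = (father_family_mean + mother_family_mean) / 2.0
        return random.gauss(midpoint, DRAW_SD)

    # Two parents from ordinary families who were themselves lucky draws:
    father_family_mean, mother_family_mean = 100.0, 100.0
    father, mother = 125.0, 120.0

    kids = [child_trait(father_family_mean, mother_family_mean)
            for _ in range(10_000)]
    print("parents' average:", (father + mother) / 2)               # 122.5
    print("children's average:", round(sum(kids) / len(kids), 1))   # ~100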

So what’s this got to do with Australia?

Australia was a society settled from the dregs of Britain. Not the absolute dregs, mind you – it didn't take too much to earn the gallows in those days, but a mid-level crime like larceny or burglary might get you transported. But it's fair to say that the convicts being transported were likely below average for Britain at the time, like most convicts in most societies.

Suppose you take a cross-section of people from the lower end of the genetic distribution and put them in an environment with British laws and institutions. What happens next?

The crucial part is that we've got people who are probably below their familial averages. But these cases get the benefit of mean reversion – if you're dumber or more aggressively antisocial than your family average, your children will on average be smarter and less antisocial than you.

Run this forward a few generations, and you’re basically back to where you started. The convict starting point still lingers a little in terms of anti-authoritarian cultural attitudes, but that’s about it. You can take the dregs of society, but the next generation won’t be the same dregs. Thankfully. Mean reversion taketh away, but mean reversion giveth as well. So while the British who were sending convicts to Australia probably thought they were going to create a permanent colony of antisocial idiots, what they actually ended up creating was Britain #2, but with much better weather. The joke’s on them, really.
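
As a back-of-the-envelope calculation (the ten-point founding deficit and the one-half persistence coefficient are both pure guesses), the convergence is quick:

    POP_MEAN = 100.0   # the British population mean, normalised
    H = 0.5            # assumed fraction of each generation's deviation that persists

    cohort = 90.0      # convict founders, guessed at ten points below the mean
    for gen in range(1, 6):
        cohort = POP_MEAN + H * (cohort - POP_MEAN)
        print(f"generation {gen}: {cohort:.1f}")

By the fifth generation the cohort is within a third of a point of the mean it started ten points below.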

The practical punch line, of course, is that if you’re worried about how your children might turn out, pay close attention to the extended family, not just your partner. A son or daughter who’s not too bright but who has lots of doctors and lawyers and scientists in the family is still a pretty good bet.

Wednesday, May 8, 2013

Hate Generalisations? You Probably Just Hate Statistics

One of the most oft-repeated nonsense claims by a certain type of low-wattage intellectual lefty is that one 'shouldn't generalise'. (For reasons that are worthy of a separate post, this seems to me to be reasonably correlated with people who also proudly announce that they 'don't judge'.)

Apparently, one of the Worst Things In The World you can do is to notice that information about the generality of a distribution may be useful in predicting where a specific point in the distribution will lie.

For those people that don't like to 'generalise', I wonder what, if any, statistical measures they actually find interesting or legitimate.

What is an average, if not a statement that lets one generalise from a large number of data points to a concise summary property about all of the points combined? Or a standard deviation? Or a median?

The anti-generalisers tend to apply their argument ('assertion' is probably a better description) in two related ways, varying slightly in stupidity:

a) One should not summarise a range of data points into a general trend (e.g. 'On average, [Group X] commits murders at a higher rate than [Group Y]').

b) One should not use a general trend to form probabilistic inferences about a particular data point (e.g. 'Knowing statement a), if I also know that person A is in Group X, and person B is in Group Y, I should infer that person A has a higher probability of committing a murder than person B').

Version a) says you shouldn't notice trends in the world. Version b) says you shouldn't form inferences based on the trends you observe.

Both are bad in our hypothetical interlocutor's worldview, but I think version b) is what particularly drives them batty.

But unless you just hate Bayesian updating, the two statements flow from each other. b) is the logical consequence of a).

Now, this isn't a defence of every statement people make that invokes claims a) and b). To a Bayesian, you have to update correctly.

You can have priors that are too wide, or too narrow.

You can make the absurd mistake of assuming that P(R|S) = P(S|R).

You can update too fast or too slowly based on new information.

And none of this has even begun to specify how you should treat the people you meet in life in response to such information.

None of my earlier statements are a defence of any of this. The first three are all incorrect applications of statistics. The last one is a question about manners, fairness, and how we should act towards our fellow man.
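
To see how badly the second of those mistakes can mislead, here's a one-screen example with invented numbers, where a trait is very common inside a rare group:

    # R = "is in some rare group", S = "has some trait"; numbers invented.
    p_R = 0.01           # base rate of the group
    p_S_given_R = 0.90   # trait is very common inside the group
    p_S_given_notR = 0.10

    # Bayes' rule: P(R|S) = P(S|R) * P(R) / P(S)
    p_S = p_S_given_R * p_R + p_S_given_notR * (1 - p_R)
    p_R_given_S = p_S_given_R * p_R / p_S

    print(f"P(S|R) = {p_S_given_R:.0%}, but P(R|S) = {p_R_given_S:.1%}")
    # P(S|R) = 90%, but P(R|S) = 8.3%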

But there's nothing wrong with the statistical updating.

If your problem is with 'generalising', your problem is just some combination of 'the world we live in' and 'rationality'.

Suppose the example statements in a) and b) made you slightly uncomfortable. Let me ask you the following:

What groups X and Y did you have in mind when I spoke about the hypothetical murder trends example? Notice I didn't specify anything.

One possibility that you may be thinking I had in mind was that X = 'Blacks' and Y = 'Whites'. People don't tend to like talking about that one.

In actual fact, what I had in mind was X = 'Men' and Y = 'Women'. This one is not only uncontroversial, but it almost goes without saying.

As it turns out, both are true in the data.

Do inferences based on these two both make you equally uncomfortable? Somehow I doubt it.

And if they don't, you should be honest enough to admit that your problem is not actually with statistical updating, or 'generalisations'. It's just an attempt to launder some sociological or political concern by browbeating the correct application of statistics.

So stop patronisingly sneering that something is a generalisation, and using that as an implied criticism of an argument or moral position. Otherwise zombie Pierre-Simon Laplace is going to come and beat yo' @$$ with a slide rule.

Wednesday, June 13, 2012

Truly Understanding What Combat Mortality Statistics Mean

I find it interesting sometimes to imagine how my worldview might change if I experienced different events.

It seems elementary that if you've made the best use of the data available, you should only change your mind based on new information. Merely experiencing an event without finding out anything you didn't know before ought not change your perception of things.

So it's funny to read about how the average person's views change with a particular experience, and try to hypothesize where your current views fit along the claimed evolution.

What prompted this (and continuing with the 'All-Fussell-All-The-Time' theme of the blog of late) was Paul Fussell's description of how the average soldier's views on the chances of death change over time.
In war it is not just the weak soldiers, or the sensitive ones, or the highly imaginative or cowardly ones, who will break down. All will break down if in combat long enough. "Long enough" is now defined by physicians and psychiatrists as between 200 and 240 days. For every frontline soldier in the Second World War, according to John Ellis, there was the "slowly dawning and dreadful realisation that there was no way out, that . . . it was only a matter of time before they got killed or maimed or broke down completely." As one British officer put it, "You go in, you come out, you go in again and you keep doing it until they break you or you are dead." This "slowly dawning and dreadful realisation" usually occurs as a result of two stages of rationalization and one of accurate perception:
1. It can't happen to me. I am too clever / agile / well-trained / good-looking / beloved / tightly laced / etc.
Personally, I can't imagine ever thinking this. Death is always certain, and there's always a chance that you're going to draw the unlucky number even in much safer events than combat. So while this might be a subconscious starting point, I doubt it. What about the second stage?
This persuasion gradually erodes into
2. It can happen to me, and I'd better be more careful. I can avoid the danger by keeping extra alert at all times / watching more prudently the way I take cover or dig in or expose my position by firing my weapon / etc.
This conviction attenuates in turn to the perception that death and injury are matters more of bad luck than lack of skill...
At a minimum, I think I'd start at this stage (or the first half, anyway) - it definitely can happen to you. The question is how much agency you have over the matter. Note that the description above tends not to focus on probabilities - it can happen, but if I do X, then it can't. I think this is empirically a good description of how people actually think - most people don't think in probabilities.

But to those that do, it's obvious that you dying in warfare can be both a) largely determined by chance, and b) something you can still shift a bit at the margin by not doing stupid things.

In essence, you're spinning a roulette wheel, and any number above 3 means you're dead, or something equivalent. You can have crummy odds and still understand what the odds are.

So that, in short, is how I think I'd have viewed World War 2 combat probabilities.

But I don't think I would have gotten to the conclusion that makes up Fussell's stage 3:
...making inevitable the third stage of awareness:
3. It is going to happen to me, and only my not being there is going to prevent it.
Huh.

On a number of dimensions, that is actually incredibly clear-sighted. Granted, it still makes the mistake of not thinking in the probabilistic way (a probability of 99% is not the same thing as a probability of 100%).

But which bias are you more likely to be succumbing to? Being overly optimistic that you will somehow be different and escape it all, or ignoring the tiny chance that you might actually make it? To ask the question is to know the answer. The bias is all on the side of optimism - if you round your estimated survival probability down to zero, it won't change the answer by much, just as assuming you'll never win the lottery will almost certainly lead to better choices than assuming any non-trivial probability of winning.

And indeed, it only takes a minor modification to the premise to make it technically correct as well, by beginning the sentence with the phrase 'Given long enough, ...'. This is expressed most memorably in the motto of Zero Hedge - on a long enough timeline, the survival rate for everyone drops to zero.
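
The arithmetic behind 'given long enough' is just compounding. The per-day figure below is pure invention - Fussell gives no such number - but any rate in the right ballpark tells the same story:

    P_DAILY = 0.02   # assumed per-day chance of death or maiming; illustrative only

    for days in (1, 30, 100, 200, 240):
        survival = (1 - P_DAILY) ** days
        print(f"{days:>3} days in the line: {survival:.1%} chance of being intact")

At the 200-to-240-day window the physicians cite, survival under these assumed numbers is down to a percent or two. Rounding it to zero doesn't change much.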

In wartime, you don't even need the timeline to be that long.

Which makes the second half of the sentence all the more powerful - the only way out is to not be there.

That is something that I wouldn't have figured out with equivalent clarity.

In the middle of combat, there are also very few ways out. Desert and you run a good chance of getting shot.

I can imagine that goes a fair way to explaining why people go insane in war - you figure out that it is now inevitable that you'll die a horrible, gruesome death at some random (but imminent) point, and until then you're going to be surrounded by horror and brutality.

The phrase 'only my not being there is going to prevent it' can also be paraphrased as 'the only winning move is not to play.'

Sunday, April 1, 2012

Don't take it personal, kid

Over at reddit a few days ago, there was this thread where a guy talks about how his daughter has cerebral palsy, and now at age three is doing really well.

But what I found interesting was part of the title:
When she was born doctors said she would never walk, talk and would probably need to be institutionalized.
I always find this a strange response. If you want to see what I mean, compare it to an alternative formulation:
She's doing incredibly well, given that her initial condition made it unlikely she'd ever walk or talk, and meant she'd probably need to be institutionalised.
I don't want to pick on this guy - I'm really glad his daughter is doing so well. But I find it an interesting example of a particular mindset.

For some reason, people seem to really like the narrative 'and then the doctor [delivered bad news], but he was totally wrong!'.

It's not enough that things turned out better than expected. Apparently there's an extra sweetness to proving wrong an expert who delivered negative news.

My best guess is that this comes from a combination of:

a) A general lingering dislike of people who deliver bad news

b) A particular dislike of people who deliver bad news that turns out to be wrong, even if it was probabilistically correct at the time, and

c) A sense that medical conditions have substantial scope for self-fulfilling prophecies: if you treat someone like they're disabled, they'll end up disabled, but if you treat them like a normal person, they'll end up comparatively more normal, even if not perfectly able-bodied.

The first one I can't relate to at all. The medical profession is the last place you want to start shooting the messenger - if you in fact have cancer, you're going to be a hell of a lot better off knowing that and starting chemo than pretending that you've got something else.

The second one I can't really relate to much more. I can understand getting irritated at advice that was bad ex-ante. But that doesn't quite explain it. As a layman, you'll probably have very little idea whether the advice was wrong ex-ante, or right ex-ante but you just ended up in the odd end of the distribution. e.g. Most people born with cerebral palsy won't be able to walk, but your daughter ended up as one of the lucky ones.

More importantly, would you be equally mad with a doctor who delivered ex-ante advice that was correct but ended up being too optimistic? "The doctors said she'd probably be able to walk just fine, but she can't." Unless you'd be equally bothered by this one, there's still something funny going on.

The third one may have some merit, but I don't know how much. I tend to be slightly skeptical (without any particular evidentiary basis) only because it sounds too much like wishful thinking - if we only act like there isn't a problem, there won't be a problem!

If you want the extreme opposite view, let me present the great James Bagian, a man who was meant to be on the Challenger space shuttle but was substituted out shortly before the mission. He declined to wax lyrical about beating the odds or to pretend to be shocked that the outcome was as bad as it was:
Was I sad that it happened? Of course. Was I surprised? Not really. I knew it was going to happen sooner or later—and not that much later. At the time, the loss rate was about 4 percent, or one in 25 missions. Challenger was the 25th mission. That's not how statistics works, of course—it's not like you're guaranteed to have 24 good flights and then one bad one, it just happened that way in this case—but still, you think we're going to fly a bunch of missions with a 4 percent failure rate and not have any failures? You gotta be kidding.
I'm going to go out on a limb and predict that a man who can get in a space shuttle and understand exactly what a 4% probability of the thing exploding means is not somebody inclined to blame a doctor for a negative diagnosis that turned out to be wrong. As indeed evidenced by the entire approach he takes in his current job - figuring out how to reduce medical errors.
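
For the record, Bagian's intuition checks out in two lines of arithmetic, taking his 4 percent loss rate at face value:

    P_LOSS = 0.04   # Bagian's stated per-mission loss rate

    for n in (1, 10, 25, 50):
        p_at_least_one = 1 - (1 - P_LOSS) ** n
        print(f"{n:>2} missions: {p_at_least_one:.0%} chance of at least one loss")

A failure somewhere in the first 25 missions wasn't a fluke; it was roughly a two-in-three proposition.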

As always, sign me up with James Bagian.

Thursday, March 29, 2012

Against Compulsory Voting

Justin Wolfers likes the fact that Australia has compulsory voting.
I share Tom Friedman's view that the divisive nature of U.S. democracy is due to non-compulsory voting. But fixing that requires a mandate.
I respectfully disagree.

Less divisive it may be, but it offends me deeply as a statistician.

Why?

Well, who are the marginal people who vote under a compulsory system but not under a voluntary one?

It's the people who didn't care enough to turn up of their own accord under a voluntary system.

Now, some of these people might actually have a firm view of the world, but just be feeling lazy or ambivalent. Maybe we really do want their opinions.

But a large number of the people who you're forcing to vote either

a) know virtually nothing about politics

b) genuinely don't give a flying fig

or both. If those people rationally decide to not vote, that's an entirely sensible decision.

How on earth does the decision-making system improve by forcing these people to pick a random answer? You're just intentionally adding noise to the process.

I remember that my uncle's mother was senile and in a nursing home. He went in on election day to take her to vote, only to be told that she'd already voted with the rest of the nursing home in the morning. Who did she vote for? Who the hell knows! She didn't know. Possibly someone told her who to vote for, and possibly she voted for the candidate suggested to her. Possibly not, too. But her completely random vote counted just as much as that of the guy who read the paper every day. You can rest assured about that.

Lest you think that these people make up an insignificant number of votes, consider the following:

In the 1998 Australian federal election in the seat of Lindsay, there was an independent candidate who stood for office named 'Steve Grim-Reaper'.

Yes, really.

Without delving into the details of his policies, let's assume for the sake of the argument that people voting for a guy called 'Grim-Reaper' are essentially voting for a joke candidate. Let's further assume that the people voting for 'Grim-Reaper' might, if the 'Grim-Reaper' weren't running, vote for anyone at all. They are pure noise in the electoral process.

So how many people voted for the Grim Reaper in 1998?

1,043, or 1.36% of the electorate.

This isn't even counting the additional 4,467, or 1.94% of the electorate, who voted informally (i.e. didn't bother to fill out the ballot properly).

Now, let's look at the seats that changed hands at the 1998 election. How many of these were cases where the margin of victory was less than the number of people in Lindsay who appeared to be voting as a joke?

In Bass, Tas, the margin was 0.06%.
In Dickson, Qld, the margin was 0.12%.
In Kingston, SA, the margin was 0.46%.
In the Northern Territory, the margin was 0.57%.
In Stirling, WA, the margin was 1.04%.
In Paterson, NSW, the margin was 1.22%.

Six seats, where the victory was within the margin of joke voting. What a triumph!
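
For anyone who wants to check, the comparison takes a few lines of Python, using the margins listed above and the 1.36% Grim-Reaper share as the noise benchmark:

    JOKE_VOTE_SHARE = 1.36   # Grim-Reaper's share of the Lindsay vote, in percent

    margins = {   # 1998 seats that changed hands, margin of victory in percent
        "Bass, Tas": 0.06, "Dickson, Qld": 0.12, "Kingston, SA": 0.46,
        "Northern Territory": 0.57, "Stirling, WA": 1.04, "Paterson, NSW": 1.22,
    }

    within = [seat for seat, m in margins.items() if m < JOKE_VOTE_SHARE]
    print(len(within), "seats decided by less than the joke vote:", within)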

In the most recent federal election, in 2010, the Labor Party ended up forming a coalition government with a majority of only one seat.

Meanwhile, the seat of Corangamite, Vic, was decided by 0.82% of the vote, and the seat of Hasluck, WA, was decided by 1.14% of the vote.

It is entirely possible that not only the outcome of a few seats, but in fact the outcome of the entire 2010 election, was decided by morons voting at random.

Justin Wolfers is a highly-trained economist, and a very competent statistician. It would amaze me if he weren't offended by this kind of forced noise in the voting process.

Even if it increases the civility of debate, it seems like a pretty steep price to pay.