Winston Churchill once observed that a good definition of a fanatic
was someone who can’t change his mind, and won’t change the subject.
On the subject of voter fraud, I like to think that I meet
neither arm of the test.
On the first part, I feel like I’m definitely open to having
my mind changed, but not many people engage with the better evidence on the subject,
so I don’t often hear good arguments to the contrary. Then again, every fanatic on
every topic feels the same way, so perhaps this doesn’t distinguish me very
much.
But I can at least make sure I don’t fall foul of the second
arm. Few things in this life, even if true, are worth driving away those near
and dear to you, or having friends of long standing view you as some crank and
lost-cause obsessive. My Twitter feed over the past month has been that of a
single-issue kook, which has gained me a lot of new followers, but I never really
wrote to build a large audience. I wrote for the sheer joy of being able to say
whatever was on my mind, not to advance a single cause.
To know if you’ve started to become viewed as a crank, you
have to listen to the silences – the friends who don’t respond to your
WhatsApp messages when you send them something on the subject, the people on
Twitter who used to engage but whom you haven’t heard from for a while. You don’t
have to change your beliefs about the election because others don’t agree with
you, but you do need to value your audience, especially when they are friends
and loved ones.
In finance, most trades are essentially neutral – if you buy
a stock, and nothing happens, you stay flat. However, a famous trade in foreign
exchange is the carry trade – borrow in low interest rate currencies, and
invest in high interest rate currencies. There, if nothing happens to the
exchange rate, you win (on the difference in interest rates). This term, “carry”,
gets used broadly to describe any trade with this property, where you win
by things staying the same. An anti-carry trade is thus the opposite. If
nothing happens, you lose.
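To make the arithmetic concrete, here is a minimal sketch of a one-period carry trade, assuming simple interest, no transaction costs and a single currency conversion each way. The function names and the numbers in the usage lines are made up for illustration; real carry trades are rolled over and hedged in ways this ignores.

```python
def carry_trade_pnl(notional, borrow_rate, invest_rate, fx_start, fx_end):
    """Profit or loss, in the funding currency, of a one-period carry trade.

    Borrow `notional` in the low-rate funding currency at `borrow_rate`,
    convert at `fx_start` (units of the high-rate currency per unit of the
    funding currency), earn `invest_rate`, convert back at `fx_end`, and
    repay the loan with interest.
    """
    invested = notional * fx_start            # funding currency -> target currency
    grown = invested * (1 + invest_rate)      # earn the higher interest rate
    returned = grown / fx_end                 # target currency -> funding currency
    owed = notional * (1 + borrow_rate)       # repay the cheap loan
    return returned - owed


def anti_carry_pnl(*args):
    """The opposite position: it loses exactly what the carry trade earns."""
    return -carry_trade_pnl(*args)


# Exchange rate unchanged: the carry trade earns the rate differential.
print(carry_trade_pnl(1000, 0.01, 0.05, 1.5, 1.5))   # roughly 40
print(anti_carry_pnl(1000, 0.01, 0.05, 1.5, 1.5))    # roughly -40
# The high-rate currency weakens (more of it per funding unit): carry loses.
print(carry_trade_pnl(1000, 0.01, 0.05, 1.5, 1.6))
```

If nothing moves, the holder of the carry position collects the difference in interest, and the anti-carry holder pays it.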
Since the Wednesday morning after the election, it has been
quite clear that Biden had a strong carry trade, and Trump had an anti-carry
trade. Something fairly large had to happen to change the answer. The Supreme
Court case with Texas was my last bet on what that something large might be.
In keeping with my post earlier this year on how Republicans can’t
get their appointed judges to stay conservative, the answer was depressing,
if not surprising. The number of ways the outcome can change at this point is small, most of them would be highly alarming if they occurred, and not many of them seem to hinge upon a great new empirical analysis of voter fraud being written by me.
So having written much on the subject, this is my coda to
the past month’s thinking, at least for the time being. Like the Dylan poem to which the title is an
homage, it’s not that the issue is suddenly dead, it’s just a way of collecting
one’s thoughts and drawing a line under a chapter that seems to be coming to a
close. I will probably have more to say on the subject, like every addict, but the time for being a single-issue author has passed. Please bear with me even if you feel heartily sick of the subject. I
have spent an extraordinary amount of time thinking about these issues over the past
month, and I feel confident I can still tell you something new: things that I, at
least, didn’t know before I started out. Without further ado, they are as follows.
The average American believes three things about voter fraud
in his country.
First, he believes that there is very little of it, perhaps almost zero, and
certainly not enough to swing an election.
Second, he believes that if there were a reasonable amount of it in general, he would have heard
about it from experts on the subject.
Third, he feels that if any single election had been
fraudulent, said experts would be able to identify such fraud and bring it to
light before it was able to decide the election outcome.
I am not going to have much to say about the first point, at
least not directly. I suspect that by this juncture, the number of people who
haven’t made up their minds about this is very small. My firm belief is that
one’s priors on this should be quite wide, but that’s another subject.
Rather, I want to convince you that the second point, and especially the third point, are wrong.
While I don’t want to inflate my credentials here, I am one
of those fortunate people (or unfortunate, depending on perspective) whose
skills and training put them in a good position to actually be able to
empirically study the question of voter fraud. There are few academic papers on
the subject that I would not back myself to be able to read and understand.
I have spent almost the entire past month digging into
various ways of trying to find voter fraud. Much of that work has been out of
the public eye, and not all of it was ever released officially to anyone. This
is how data digging works – you do a lot of analysis for everything you
actually write, in the “measure twice, cut once” manner.
And I can tell you, as someone who’s hunted very hard for it
– voter fraud is extremely difficult to
prove using only public data, whether it actually happens or not.
To which you might immediately think – that’s because there
isn’t much voter fraud!
On the contrary. It is not at all difficult to find
extremely alarming and weird anomalies in election data.
A good working definition of fraud is “wrong data entered
for malicious reasons”. The big challenge is that a good working definition of
data errors is “wrong data entered for innocent reasons”.
The extremely hard part is thus not finding anomalous and
suspicious patterns in the data, but proving with certainty that these arise
due to malicious intent. Moreover, one has to rule out every possible innocent
reason these errors could arise, where the functional form of errors is allowed
to be incredibly vague. Further still, the counties and election officials are
given almost every single benefit of the doubt. Moldbug is right on this point.
The sovereign is he who determines the null hypothesis.
One can very easily find loads
of extremely suspicious things in the data.
One can find 169 updates in the New York Times county-level
election update data where the vote count in one category (in-person or
absentee) actually decreased in an
update. Here is one of the most suspicious, in Montgomery County, PA, which
still hasn’t been well explained. You have not even heard of
the remaining 168. Here’s the count by state:
      state |      Freq.     Percent        Cum.
------------+-----------------------------------
         AL |          1        0.59        0.59
         AR |         12        7.10        7.69
         AZ |          5        2.96       10.65
         FL |          3        1.78       12.43
         GA |         24       14.20       26.63
         IA |         20       11.83       38.46
         ID |          1        0.59       39.05
         IN |          1        0.59       39.64
         KS |          2        1.18       40.83
         MA |          1        0.59       41.42
         MI |         21       12.43       53.85
         MS |          1        0.59       54.44
         NH |          1        0.59       55.03
         NJ |          4        2.37       57.40
         NM |          1        0.59       57.99
         NY |          3        1.78       59.76
         PA |          9        5.33       65.09
         SC |         30       17.75       82.84
         TX |         11        6.51       89.35
         UT |          1        0.59       89.94
         VA |         15        8.88       98.82
         WI |          1        0.59       99.41
         WV |          1        0.59      100.00
------------+-----------------------------------
      Total |        169      100.00
Several of the disputed and contentious states are heavily
represented – Georgia, Michigan, Pennsylvania. But so are places you haven’t
heard about in this context. Arkansas. Virginia. Iowa. South Carolina.
(By the by, through my various digging, Virginia is my bet
for “state with the most election fraud in 2020 that you never read about”, and not just because of the metric above.)
Look at how much work went into the analysis
of Montgomery, PA, which covered one of these data points, trying to rule out every possible innocent explanation,
and showing additional evidence that points to fraud. Do you think anyone is
digging that much into the remaining 168? The NYT data can be downloaded in a bunch of places, and it's not hard to find these updates. I've looked at them: about half are quite small, less than 100 votes. Some of the rest look like a single set of ballots being reclassified from one category to another. But even after taking out all of these, there are a large number where frankly I have no idea what's going on, and I doubt you would either.
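For readers who want to try this themselves, the basic check is simple: within each county and vote category, sort the snapshots by time and look for any update where a cumulative count went down. Below is a minimal sketch. It assumes the snapshots have already been flattened into records with hypothetical keys (`state`, `county`, `category`, `timestamp`, `votes`); this is not the actual layout of any particular NYT scrape, and the `min_drop` threshold is my own illustrative parameter, mirroring the "less than 100 votes" filter above.

```python
from collections import Counter, defaultdict


def find_decreases(snapshots, min_drop=1):
    """Return updates where a cumulative vote count fell by at least `min_drop`.

    `snapshots` is a list of dicts with hypothetical keys: 'state', 'county',
    'category' (e.g. 'in_person' or 'absentee'), 'timestamp' (sortable), and
    'votes' (the cumulative count reported at that time). `min_drop` should
    be at least 1, otherwise unchanged counts would be reported too.
    """
    series = defaultdict(list)
    for row in snapshots:
        series[(row["state"], row["county"], row["category"])].append(row)

    decreases = []
    for (state, county, category), rows in series.items():
        rows.sort(key=lambda r: r["timestamp"])
        # Compare each snapshot with the one immediately before it.
        for prev, curr in zip(rows, rows[1:]):
            drop = prev["votes"] - curr["votes"]
            if drop >= min_drop:
                decreases.append({
                    "state": state, "county": county, "category": category,
                    "timestamp": curr["timestamp"], "drop": drop,
                })
    return decreases


def count_by_state(decreases):
    """Tally decreases per state, as in the table above."""
    return Counter(d["state"] for d in decreases)


# Made-up toy data, purely to show the mechanics.
toy = [
    {"state": "XX", "county": "A", "category": "absentee", "timestamp": 1, "votes": 500},
    {"state": "XX", "county": "A", "category": "absentee", "timestamp": 2, "votes": 450},
    {"state": "XX", "county": "A", "category": "absentee", "timestamp": 3, "votes": 700},
    {"state": "YY", "county": "B", "category": "in_person", "timestamp": 1, "votes": 1000},
    {"state": "YY", "county": "B", "category": "in_person", "timestamp": 2, "votes": 800},
]
print(count_by_state(find_decreases(toy)))     # both toy states flagged
print(len(find_decreases(toy, min_drop=100)))  # only the 200-vote drop remains
```

Flagging a decrease says nothing about why it happened. A reclassification between categories, for instance, shows up as a drop in one series and a matching rise in another, and that is the kind of innocent explanation one then has to rule out by hand.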
One can find vote updates that look like colossal outliers
in terms of the fairly intuitive rule that updates can be either large or
unrepresentative, but not generally both. Here’s a long
analysis of this. The most suspicious updates, in Wisconsin, Michigan and Georgia (surely a coincidence with the states identified on the metric above!),
also came in the middle of the night, and were large enough to swing the
election. The defenders argue that these are all just normal absentee votes. At
least for Milwaukee, one can also find corroborating evidence in suspicious
patterns in down-ballot races that don’t fit simple stories about mail ballots.
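The "large or unrepresentative, but not both" rule can be made concrete in a few lines. The sketch below is my own simplified version, not the method of the analysis linked above: for each update it measures how big the update is relative to the new running total, and how far its two-party share for candidate A departs from A's running share beforehand. The function names, the thresholds `size_cut` and `gap_cut`, and the data are all illustrative assumptions.

```python
def update_profile(cum_a, cum_b, new_a, new_b):
    """Describe one update relative to the count before it.

    cum_a, cum_b: cumulative votes for candidates A and B before the update.
    new_a, new_b: votes added for A and B in the update.
    Returns (size, share_gap): size is the update's fraction of the new
    running total; share_gap is A's two-party share within the update minus
    A's running two-party share beforehand.
    """
    before = cum_a + cum_b
    added = new_a + new_b
    size = added / (before + added)
    share_gap = new_a / added - cum_a / before
    return size, share_gap


def flag_updates(updates, size_cut=0.05, gap_cut=0.2):
    """Flag updates that are both large and unrepresentative.

    `updates` is a list of (new_a, new_b) pairs in reporting order.
    The default thresholds are arbitrary illustrative choices.
    """
    flagged = []
    cum_a = cum_b = 0
    for i, (new_a, new_b) in enumerate(updates):
        if cum_a + cum_b > 0 and new_a + new_b > 0:
            size, gap = update_profile(cum_a, cum_b, new_a, new_b)
            if size >= size_cut and abs(gap) >= gap_cut:
                flagged.append((i, round(size, 3), round(gap, 3)))
        cum_a += new_a
        cum_b += new_b
    return flagged


# Made-up sequence: the last update is both big and lopsided.
print(flag_updates([(500, 500), (520, 480), (10, 15), (900, 100)]))
```

A flag here only marks an outlier. As the defenders point out, mail ballots really can differ sharply from in-person votes, so a flagged update is where the argument starts, not where it ends.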
But suppose you don’t believe the New York Times data. That
could all just be errors! Indeed. Couldn’t it all?
One can find 58 Pennsylvania registered voters born in the
year 1800, 11 born between 1801 and 1899, and 25 born in 1900. Admittedly, these particular cases are more likely just errors – if
this is voter fraud, it’s the stupidest form ever, since it’s going to stick out like a sore thumb. But it proves beyond any doubt that errors in this data do
not get checked or corrected anywhere. And indeed, these implausible years of
birth are the mere tip of the iceberg of suspicious patterns in
birthdays, which follow much more notable patterns indicating fraud, involving round-numbered days of the month and
months of the year, plus month distributions that are too smooth. These fraud-consistent patterns are associated with counties voting for Biden, including at record levels.
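The simplest of these checks are easy to sketch. The code below assumes the voter file has already been parsed into `datetime.date` birth dates; the function names are hypothetical, and this is not the code behind the analysis linked above. It buckets the implausible birth years and measures how over-represented a given day of the month is relative to a naive uniform calendar.

```python
from collections import Counter
from datetime import date


def implausible_year_buckets(birthdates):
    """Bucket registrants with the implausible birth years discussed above."""
    buckets = Counter()
    for d in birthdates:
        if d.year == 1800:
            buckets["1800"] += 1
        elif 1800 < d.year < 1900:
            buckets["1801-1899"] += 1
        elif d.year == 1900:
            buckets["1900"] += 1
    return buckets


def day_of_month_excess(birthdates, day=1):
    """Observed share of birthdays on `day`, divided by a naive expected share.

    The expected share assumes births spread evenly over a 365-day year,
    ignoring leap years and real seasonal birth patterns: days 1-28 occur in
    12 months, days 29-30 in 11, and day 31 in 7.
    """
    months_with_day = 12 if day <= 28 else (11 if day <= 30 else 7)
    expected = months_with_day / 365
    observed = sum(1 for d in birthdates if d.day == day) / len(birthdates)
    return observed / expected


# Made-up example records.
sample = [date(1800, 1, 1), date(1850, 6, 15), date(1900, 1, 1),
          date(1970, 3, 1), date(1985, 7, 22)]
print(implausible_year_buckets(sample))
print(day_of_month_excess(sample, day=1))  # far above 1 in this toy sample
```

A ratio well above 1 is an anomaly, not a verdict: on its own it cannot distinguish a malicious entry from a clerk filling in a default date, which is exactly the fraud-versus-error problem described earlier.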
Or suppose you don’t believe statistics at all. You insist on hard evidence! In Wayne
County, MI, you can find totally normal scenes from election night, like election workers boarding
up the windows in the vote-counting center to stop observers even seeing
in. In Fulton County, GA, you had the insane spectacle that on election night,
election officials sent all the observers home, telling them that counting was
over for the night. In the press, dubious accounts were circulated implying
that a burst pipe was the cause, although it turns out that may
have been from that morning, or may not have happened at all. In any case,
an hour later, they started counting again, with no observers in the room,
using ballots in suitcases under a desk that had been delivered at 8:30am that
day. Oh, and all this was caught on video.
As part of this, you can also watch the officials scan the same set of
ballots multiple times. As has been
noted before – if this were happening in a third world country, the State
Department would declare it presumptively fraudulent. This isn't an exhaustive list. These are just the ones I managed to remember and write down while working furiously on other things over the whole period, and where the main allegations were actually caught on video. If you go through everything alleged in affidavits in the lawsuits, many claims are much more shocking, though also harder to verify.
My point is not that you should believe this absolutely
nails down fraud, let alone how widespread you should infer the fraud to be based on these incidents. My point is to emphasise how difficult the task is, even if there were actually fraud. Fraud
would look exactly like this. People switching votes back and forth to swing a
total, or deleting inconvenient votes from the count. Bringing fake and
colossally unrepresentative ballot dumps in during the middle of the
night. Registering tons of fake voters to flood in mail ballots. Counting happening in secret after
observers are sent home under false pretenses. Reports coming in from whistleblowers in affidavits.
But how sure are you that these aren’t just data errors in
very noisy data? That someone incorrectly entered a vote total in a database,
and later corrected it? That patterns in absentee ballots, while highly weird,
represent odd preferences of mail-in voters? That the ballots in Georgia were
all scanned regularly, and that the machine will never count ballots twice if
they’re scanned twice, and that there’s not some innocent mixup as to why
everyone was sent home? That the witnesses in the lawsuits were confused about what they saw?
If every benefit of the doubt is given to the other side, what are the chances you can ever overcome them all?
Suppose, like a number of readers, you are in the category
of someone who still isn’t convinced. There’s some weird stuff going on, sure,
but it doesn’t rise to the level of “fraud may have decided the election result”.
Three good questions to ask are the following.
1. What kind of voter fraud do you have in mind?
2. What evidence would actually convince you that there might have been this kind of
voter fraud?
3. What data is actually available, and based on
this, how likely is it that this evidence might ever conceivably be discovered?
The first question, as it turns out, is actually the most
important. Because fraud comes in many different types, and the likelihood of
catching them varies enormously.
The most egregious type is to make up election returns out
of whole cloth. In this version, the vote totals are plucked from someone’s
head, and don’t correspond to any actual ballots or button presses in the real
world.
This type is actually the most likely to get caught. Totally
fake numbers leave lots of traces that can be studied by things like digit
analysis via Benford’s
Law. Only the most basket-case Third World countries do this. I think one
can say with high certainty that, at the conservative end, this does not occur
very often in US elections, and I would wager strongly that in America it does not
occur at all.
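For the curious, here is what a first-digit Benford test looks like in its simplest form: compare the leading digits of a set of vote counts with the Benford probabilities log10(1 + 1/d), using a Pearson chi-square statistic. This is a generic sketch with hypothetical function names, not any particular published election-forensics method.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's Law first-digit probabilities, P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """First significant digit of a positive integer."""
    return int(str(int(n))[0])


def benford_chi_square(counts):
    """Pearson chi-square statistic of observed leading digits vs Benford.

    Zero counts are skipped because they have no leading digit. With 8
    degrees of freedom, values above roughly 15.5 are unusual at the 5% level.
    """
    digits = [leading_digit(n) for n in counts if n > 0]
    total = len(digits)
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in benford_expected().items()
    )


# Made-up numbers: counts that all start with 9 fail badly.
print(benford_chi_square([900, 950, 975, 910] * 25))
```

Two caveats. First-digit tests work best when the numbers span several orders of magnitude; precinct totals often don't, which is why some election researchers prefer second-digit tests. And passing such a test shows only that the numbers weren't crudely invented, which is the kind of fraud that is easiest to catch.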
The next category of obvious fraud is when some dictator
reports winning 99% of the vote. Like Theodore Dalrymple observed about
propaganda in communist countries, this kind of election is not actually meant
to convince anyone, but rather to humiliate them, to insist on obvious lies and
dare them to say differently.
But even here, most of the argument about fraud is already
at the level of a smell test. Suppose you had to prove statistically that
it was impossible that these election results in Cuba or Syria
were genuine. How exactly would you do it? I suspect you’ll find it’s a lot
harder than you might think. Bear in mind, in 2020 the “Norristown 2-2”
precinct in Montgomery County had reported mail-in votes up to November 10th
where Biden had won 98.7% of the two-party vote, across 150 votes. Please tell
me how you plan to show that this number is genuine, yet Assad’s 88.7% of the
vote is not. Not by digging up the raw ballots (though even there, if Assad can produce his fake ballots, you may still be out of luck), but from your computer, which is what nearly all of us have had to do.
Or put it differently. Suppose that Assad in Syria decided
to rig the elections, but instead of generating insane levels of support, he
decided to replace all the genuine ballots with fake ones that showed him
getting support levels between 60% and 71%, with turnout at 70% of the
electorate. He has total control of the vote counting process.
You know this is bullshit. But that’s not the question. How
would you go about proving it?
Almost anything below the first two cases – making up
numbers whole, or 90% vote shares – is actually extremely difficult to prove,
even if it’s occurring. I mean, he kicked out the observers, which is pretty bad. But so did Fulton County, GA, and kept on counting.
Let’s take some scenarios more likely to actually occur in
the US.
You are an election official who is not being closely
monitored. There is a list of eligible voters in your precinct. Suppose it is a
normal year, with relatively few absentee/mail ballots. You have hidden a genuine
ballot box of pre-filled ballots, with genuine ballot papers, that you know
contains 1000 votes total, of which 97% are for your candidate. All registered
voters in your precinct are on a list, and get crossed off as they come in. You
wait until polls close, and you can see the list of everyone who hasn’t voted.
You cross 1000 names off the list, and bring in your pre-filled box of
ballots, mingling them with the main ones.
How do you propose to identify that in the data? If you had
periodic updates, you could maybe find batches that look really anomalous, sure. That’s
what this analysis did! And this one! The scenario wasn’t exactly the same, but it was similar. Did you find
it sufficient proof?
In this particular variant, every voter is a genuine,
registered voter. Every voter votes exactly once. Every ballot paper is a
genuine ballot. Every vote corresponds to a ballot paper that can be counted and re-counted. No ballot gives any indication it was not cast by a genuine voter.
Let us agree on this much. Unless you catch the person in
the act, this will be flat-out impossible
to detect just by looking at final election results. I actually don’t know
how you’d prove it with any other data either. Don’t believe me? Propose a
test. I’m all ears. I have heard stories from campaign operatives that this
actually happens; I didn’t think up this idea myself.
But I’m not here to convince you to believe those stories.
Suppose one accepts, as indeed you’re told, that there is no evidence of this kind of voter fraud. It’s true. There broadly isn’t.
Now ask yourself: what’s the signal-to-noise ratio of this kind of lack of
evidence? If there were no voter fraud of this kind, we’d expect to find no evidence. If
there were voter fraud of this type,
but we lacked any realistic ability to catch it, we would also expect to
find no evidence. So the lack of evidence tells us almost precisely zero one way or
the other.
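The same point can be put in Bayesian terms. Finding no evidence only moves your beliefs to the extent that "no evidence" is more likely without fraud than with it. A minimal sketch, where the function name, the prior and both likelihoods are illustrative numbers I have picked, not estimates of anything:

```python
def posterior_fraud(prior, p_no_evidence_if_fraud, p_no_evidence_if_clean):
    """Bayes' rule: P(fraud | no evidence found)."""
    joint_fraud = prior * p_no_evidence_if_fraud
    joint_clean = (1 - prior) * p_no_evidence_if_clean
    return joint_fraud / (joint_fraud + joint_clean)


prior = 0.2  # illustrative only
# Detection is nearly hopeless: fraud would leave no evidence 95% of the time.
print(posterior_fraud(prior, 0.95, 1.0))  # about 0.19, barely moved
# Detection is good: fraud would leave no evidence only 10% of the time.
print(posterior_fraud(prior, 0.10, 1.0))  # about 0.024, a big update
```

The ratio of the two likelihoods is what carries the information. When it is close to 1, the posterior stays close to the prior, whatever that prior happened to be.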
Especially germane to the current election, there are many
types of fraud involving mail ballots. It is much easier for a person to send
in mail ballots for someone else, than to turn up at a polling station and
claim to be five different people of different ages. This mail then gets
handled by postal workers, with a crazily weak chain of custody, the same
people whose handling leads to your Amazon packages being stolen with reasonable
frequency. This produces a number of stories you can find with the search string
“ballots found in the trash”. Meanwhile, signature verification on potentially fraudulent
ballots got greatly weakened in 2020 in many of the key states, just as the number of mail ballots increased massively, as described in
the Texas lawsuit. A campaign operative told me (a claim I haven’t been
able to verify, so I’m just reporting it, not asserting it) that in
Arizona, once the signature was verified on the envelope, the envelope got
thrown away, making it impossible for anyone to check after the fact what it
said.
Don’t think about “was there fraud?” I’m not interested in
haggling over the specific details here of what precisely happened
in each place, and you can make up your own mind on that. Rather, I care much more about the question of “if there were fraud, would it have been caught?”
And here’s the crazy part, if you’re sure that election
fraud in general would have been caught: 2020 is actually the single best year in history to catch
election fraud. Unlike in the past, we have periodic snapshots of the count updates,
scraped from the NYT website by internet amateurs, rather
than just the final tally. We can also download a ton of other stuff from the
internet.
For most past elections, we can get final vote counts at the
precinct level if we’re lucky, or the county level more likely. Votes by
candidate. That’s it. You want to go back and find out if the 2016 election was
fraudulent, that’s basically the overwhelming extent of the data you’ve got to
work with. Oh, and four years later, that data is still riddled with errors,
because it has to get kludged together from 3,300-odd counties, with vastly
different reporting systems.
Tell me what kinds of fraud you are confident you can
identify from those numbers. Not just you, but “the experts” who study this
stuff.
I understand enough about this data to know that while there
are clearly some tricks one can do if one is clever, there are large and
fundamental limitations to how much fraud you can ever hope to identify from
this kind of data.
And that’s it. That’s basically what you’ve got. Or you can
hope that someone does something dumb and gets caught in the act. But is that
the state of the art strategy? How many would slip through the net for each one
that gets caught, like in Fulton County, GA? Not that anything is going to happen to the people in Fulton County, which is also quite revealing. In a year, I predict fairly confidently, it will be one more rumored and then forgotten local story, and the videos will eventually disappear. Along those lines, if more evidence does come to light, you certainly can't publish it on YouTube, no matter what you find from here on out, as they've said that their policy is to delete all such videos. Big tech has spoken! The matter is closed. There is no evidence of voter fraud, and also, you had a total of four weeks to come up with any of it, before the verdict is entered for all time.
I think there is a strong case to be made that, for many
types of fraud, catching them is extremely difficult.
And so almost the entire question comes down to one of
priors. We have no reasonable hope of actually identifying it from the data.
Most people are sure it is extremely rare. I am not. The evidence demanded to
budge their priors is enormous. That evidence will never be found, whether
there is fraud, or whether there is no fraud.
And so finally, we get to the last question. Even if fraud could be caught, eventually, somehow,
with enough time and analysis and manpower, would
it be caught in time?
Reader, prepare yourself, because the next sentence may be shocking to you.
The Trump campaign, in many respects, was not very well
organized.
But I have come to have enormous sympathy for the sheer
scale and difficulty of the task in front of them, even if they were well organized.
A campaign is not a permanent organization, but a bunch of
operatives coming together for a particular period and task. I suspect, and it
accords with the few anecdotal discussions I’ve had with people who’ve worked
on them, that most presidential campaigns are a shitshow at the best of times,
but some candidate has to win, so we assume after the fact that their campaign
internally must have been great, when it probably wasn’t.
So what happens after the dubious election returns start
coming in, in the dead of night on the Wednesday after the election?
You have a small staff. Most of it is lawyers and political
operatives, not statisticians and data scientists. Everyone is absolutely
frazzled. You are trying to put out a thousand fires. You are trying to
coordinate dozens of people and teams. Everyone is demoralised and worrying about their employment future, since most were working on an implicit promise of employment in the administration if they won, which is now looking unlikely. You are trying to keep track of ten
thousand different leads and reports coming in from all over the country. Half
of them will be straight-up wrong, either bogus third-hand accounts, or claims from someone genuinely concerned but
insufficiently skeptical and not probing into alternatives. Avoiding this is
actually quite hard, to be honest. When one really wants to find fraud (or
indeed any empirical result) it is psychologically difficult to then switch
gears to convincing oneself of all the ways the hypothesis could be false, and then trying to find evidence
of that.
Of the other half of the leads, perhaps 80% will be plausible,
but either inconclusive or admitting of multiple interpretations. The genuine
ones may be buried in a two-hour video that’s not very well explained,
and you don’t have time to watch the whole thing. They may be written down in
some long technical piece that you don’t have the training to follow entirely, or which doesn’t explain clearly what it’s doing.
Even if you think it seems legit and you understand what it’s doing, you have
to take a gamble that it’s not a coding error or bad data cleaning or some
other screwup. They may come from some anonymous whistleblower, and you have to spend
resources trying to find out if they’re fake or well-intentioned, if they’re
right or wrong, if their claims are provable or unprovable.
Now, you have to figure out, can I get this in an affidavit?
Is this author willing to go public? Will this convince a judge? Can I get an
expert witness to testify, assuming a judge is even interested in hearing
evidence, which often they're not? As far as I can tell, the statistical analyses I liked the most were
all written pseudonymously. It is not a surprise that they didn’t find their
way into the major lawsuits. The Williams professor who did a goddamn
confidence interval for the Matt Braynard analysis got dragged in the papers by
his utterly contemptible colleagues. The chances that they would do this if he’d
computed a confidence interval for literally any other survey in history are
zero. Are you surprised that more people aren't signing up to put their professional reputations on the line for what's almost certainly a Hail Mary, and which won't even benefit them personally?
But even if you can find an expert willing to go public, how long do they
have to generate such a report? You need to scramble to scrape and download the data straight away from lots of sources, and start analyzing it.
Find the weird anomalies, dig into them, try to figure out which ones might be
errors. Think of different ways to test them. Think of different data you might
get that would corroborate this. Manually do more gathering, and cleaning, and merging. Think of which things might rise from the level
of “dubious” to “very hard to explain with anything other than fraud”. Run the
results. Double check the results. Triple check the results, because if you
start making false claims, you’ve actively hurt the cause (and you’ll feel like
a total fool and fraud). Start writing the results up. Refine the writeup to
make it less jargon-y. Try to balance the tension between “easily accessible to
public readers”, “understandable to smart but busy and innumerate lawyers” and
“detailed enough to withstand public scrutiny by hostile experts or readers”.
Also, there are dozens of different investigative angles you can take. Each one takes a few
days or a week to look into, let alone write up, let alone actually get published. You’re pulling 80-hour weeks, but even so,
you don’t have many weeks. How many such analyses can you write? Meanwhile, you're working against the clock without knowing exactly when it becomes "too late to matter", but you know it can't be long.
Now, consider the media environment you are operating in, if
you are the Trump team. The same media that in 2016 was willing to report
uncritically every breathless allegation of Russian interference, that was
willing to circulate as evidence a single anonymous dossier of allegations
about Trump and treat it as a basis for campaign wiretaps and impeachment, now
is loudly insisting that a) the race is over, and b) “experts assure us there
is no voter fraud”. Meanwhile, on the rare occasions they do report on the
matter, they only focus on the most ludicrous witness statements and the most
easily debunked claims. These are sure to circulate widely, so that by the time
previously open-minded readers get around to seeing actual good evidence,
they’re largely exhausted and cynical, and often won't even read it.
Partly for the fun of trolling, and partly just as an
experiment, I started asking the Montgomery County twitter account, and its
commissioner in charge of the election, Ken Lawrence Jr, why it was that their county looked so crooked
on multiple dimensions, both in terms of having the most suspicious vote update
in America, and the third most suspicious set of voter birthdays among
Pennsylvania counties. They never answered. I tried poking newspaper reporters
from multiple papers. Most didn’t bite. Ross Douthat, to his credit, linked to
the Montgomery piece, admittedly in a one-liner in his NYT article on how weird
it is that these kooks believe in conspiracy theories. I asked him in multiple
places – have you, or any other journalist, actually just asked these guys in Montgomery County what their explanation is for
it? Even just to get a response on the record? No dice. Nobody was interested. Hell, I couldn't even get a response out of the Pennsylvania Republican Party twitter account!
I didn’t really expect anything different, so my demeanour
was mostly one of trollish entertainment, rather than disappointment. But at
the end, even I found myself more cynical than I expected.
If you are Republican, and alleging voter fraud by the Democrats,
the media will be actively opposed to you at every single step. How could they
not be? These are the same people that have been writing about how Trump was
Hitler for the past four years. Does any reasonable person expect them to
voluntarily start digging into stories that might make Trump actually get another
four years, when they can just turn a blind eye and end it all? Besides, if
they start being called voter fraud truthers, it will be a disaster for their
careers.
There is one more piece of the puzzle worth noting.
How many people do you think there actually were working on
this, total, over the past month? At least on the data side?
The average person probably assumes that there must have
been thousands of highly paid professionals working on it.
I estimate that the number is perhaps 40 at the high end, and maybe as low
as 20. (If the sides had been flipped, it would definitely be more, perhaps a lot more, but I don't know.) I’d estimate that nearly all of them were volunteers juggling other full-time
jobs. I personally knew about ten of them working on analysis, and there were a number of other excellent people helping enormously with data gathering and processing.
That's it. That's the full extent of resources around the world that have gone into investigating from a statistical point of view whether the 2020 election may have been decided by fraud. With the time and resources available, it's remarkable we found as much as we did.
At least personally, I never really expected to change the outcome. The task was basically impossible, but damn it, we worked until
the end anyway.
This is all one can ever do.
To live not by lies, as Mr Solzhenitsyn put it.
And to fill the unforgiving minute with
sixty seconds worth of distance run, as Mr Kipling put it.
To the ten, and to all those I know who helped in the effort – friends, it was a true honour and pleasure to
work with you.