Sunday, July 2, 2017

On the time-series, the cross-section, and epistemic humility

One of the advantages of the economist's training is just the ability to instinctively think in terms of empirical tests. There's a reason that economists have tended to colonise other fields like sociology, law and politics - as much as anything, it comes from knowing about how to design proper empirical tests, and what good identification is.

Perhaps more importantly, it comes from knowing what poor identification is, including basic issues of endogeneity, reverse causality and omitted variables. For instance, knowing that you can infer almost nothing about the relationship between prisons and crime just by looking at the variation over time in the total number of prisoners and the total number of criminals in a society.

The gold standard for identification, of course, is pure randomisation. When that isn't available, as it usually isn't outside a laboratory, you go for natural experiments - where something almost exogenous occurred. This gets used as an instrument - in the prisons and crime case, for instance, Steve Levitt used ACLU prison overcrowding litigation as a quasi-random shock to the prison population.

Of course, the pendulum swings back and forth. If the initial identification push corrected the free-for-all of 1980s empirical work (regressing the number of left shoes on the number of right socks!), the subsequent view seems to have gone towards 'identification uber alles'. In other words, the only question of interest is whether you've really truly identified totally exogenous variation, not the importance of the underlying topic. Plus, oddly, very few people seem willing to learn from an imperfect instrument. It seems to me that if there are 100 potential explanations, and an imperfect instrument rules out 90 of them, then we've learned something quite valuable - the answer is either the main hypothesis, or one of the remaining 10. But making the perfect the enemy of the good seems to be the way of things these days.

These questions end up being most important when you actually run empirical tests. Me, I'm lazy - there is a large hurdle for me to actually download a dataset and start fiddling with it. I'm always impressed by people like Audacious Epigone and Random C. Analysis who do this stuff all the time.

But there is another aspect of empirical training that ends up being even more useful for the computationally lazy - helping you sort through hypotheses just by knowing the panel nature of the dataset.

In particular, a lot of questions that purport to be about a time-series are really about a panel. That is, it's not really about a single variable over time (the time-series), it's about a group of different individuals over time. And thinking of the cross-section simultaneously with the time-series greatly clarifies a lot of things. That is, instead of just coming up with hypotheses about why the average of some variable has changed over time, think about whether this hypothesis would also be able to explain which individuals would change more or less, or would be higher or lower on average.
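To make the distinction concrete, here is a minimal sketch in Python. The panel is a made-up toy example (units A, B and C over three periods), not real data, and the function names are picked purely for illustration. It shows how the same panel yields a time-series (the average across units in each period), a cross-section (the average across periods for each unit), and a per-unit change that the average alone would hide.

```python
from collections import defaultdict
from statistics import mean

# Made-up toy panel: (unit, period) -> value. Illustrative only, not real data.
panel = {
    ("A", 1): 4.0, ("A", 2): 3.0, ("A", 3): 2.0,
    ("B", 1): 2.0, ("B", 2): 2.0, ("B", 3): 2.0,
    ("C", 1): 3.0, ("C", 2): 4.0, ("C", 3): 2.0,
}


def time_series(panel):
    """Average across units within each period."""
    by_period = defaultdict(list)
    for (_unit, period), value in panel.items():
        by_period[period].append(value)
    return {p: mean(vals) for p, vals in sorted(by_period.items())}


def cross_section(panel):
    """Average across periods for each unit."""
    by_unit = defaultdict(list)
    for (unit, _period), value in panel.items():
        by_unit[unit].append(value)
    return {u: mean(vals) for u, vals in sorted(by_unit.items())}


def change_by_unit(panel):
    """Last-period value minus first-period value for each unit."""
    by_unit = defaultdict(dict)
    for (unit, period), value in panel.items():
        by_unit[unit][period] = value
    return {
        u: vals[max(vals)] - vals[min(vals)]
        for u, vals in sorted(by_unit.items())
    }


if __name__ == "__main__":
    print("time series:   ", time_series(panel))
    print("cross-section: ", cross_section(panel))
    print("change by unit:", change_by_unit(panel))
```

In this toy panel the average falls between the first and last period, but unit B does not move at all. A hypothesis about the average has to survive that kind of detail too.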

One of my favorite examples is birthrates. The classic question is about the time-series: why have birthrates, on average, declined over time?

But there is also a cross-section - whose birthrates? This could be of individuals, or countries, or characteristics. Moreover, the cross-section exists both today, and in the past. If you're too lazy to run a regression and just wanted to sort through hypotheses about the time series with help from your friend William of Occam, one rule of thumb might be as follows: A good variable explains both the time series and the cross section. A mediocre variable explains the time series, but not the cross-section. A bad variable explains the time-series, but predicts the cross-section in the wrong direction.
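The rule of thumb can be written down as a tiny sketch. The names `Verdict` and `grade`, and the +1/0/-1 coding of the cross-section, are assumptions made for illustration. The rule as stated only covers variables that do explain the time series, so the remaining case is labelled as outside the rule rather than graded.

```python
from enum import Enum


class Verdict(Enum):
    GOOD = "good"
    MEDIOCRE = "mediocre"
    BAD = "bad"
    NOT_COVERED = "not covered by the rule"


def grade(explains_time_series: bool, cross_section: int) -> Verdict:
    """Grade a candidate variable by the rule of thumb.

    cross_section uses an illustrative coding (an assumption of this sketch):
      +1  predicts the cross-section in the right direction
       0  makes no clear prediction about the cross-section
      -1  predicts the cross-section in the wrong direction
    """
    if cross_section not in (-1, 0, 1):
        raise ValueError("cross_section must be -1, 0 or +1")
    if not explains_time_series:
        # The rule only ranks variables that explain the time series.
        return Verdict.NOT_COVERED
    if cross_section == 1:
        return Verdict.GOOD
    if cross_section == 0:
        return Verdict.MEDIOCRE
    return Verdict.BAD


if __name__ == "__main__":
    for ts, cs in [(True, 1), (True, 0), (True, -1), (False, 1)]:
        print(ts, cs, "->", grade(ts, cs).value)
```

The inputs are still judgment calls. The sketch only makes explicit that a hypothesis gets checked against both dimensions before it is believed.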

For instance, suppose I wanted to know why birthrates in the west had declined overall. Here's the US Total Fertility Rate over time:

You might look at that graph, and notice a big decline starting in the early 1960s. Certain hypotheses start suggesting themselves. What was going on in the 1960s? Feminism? The Pill, approved by the FDA in 1960?

Quite possibly. But what hypotheses would come to mind if instead I showed you this graph:

The US is now in green. But we've also plotted New Zealand, in purple, Sweden, in light blue, and Japan in pink.

I personally would be astonished if your reaction isn't at least partly like mine, thinking that the world is actually quite a lot more complicated than you'd bargained for. Sweden was actually rising in the early 1960s, and New Zealand was higher than the US for a long time. Japan, meanwhile, has a completely different picture altogether.

You can add any number of these together at the excellent World Bank website to test whatever theory you have.

But there are other cross-sections within countries which can be used to test theories further.

Suppose I were primarily interested in the west, and my theory was about a rising cost of having children. It's a lot more expensive to raise children than it was in the past, as you have to pay for daycare, and "good schools", and all this other stuff, because societal expectations are higher.

I certainly hear people with kids complaining about cost all the time, which tells me that maybe there's something to it.

In the first place, I don't think this hypothesis stands up well to an actual examination of the data above. It turns out there's nothing quite like downloading the bloody data and plotting it to explode a lot of preconceptions. This probably actually is the most basic economist's tool of all, to be honest. Because the US graph above shows that not only did birth rates in general go down, but they went down precipitously starting in the 1960s, hit their low point around 1975, and have been slightly rising since then. Of course, since the graph only starts in the 1960s, you have to be wary of giving prominence to this date alone for the initial decline. But still, does this look like a graph of what you expect the cost of raising a child was?

I'd guess not, but with something hard to measure like "the cost of raising a child", who knows, maybe it is. After all, the aggregate time series is always hard. We only have one run of history, and lots of things are changing at once. But the cross-section has a lot more data.

For instance, consider the cross-section of rich and poor. At least in 2000, here's how they looked:

In other words, the poor have more children than the rich.

This immediately doesn't sound like a cost story. Even if the cost has gone up for everyone, why are the rich less able to bear it than the poor? Even if you think that the poor are on welfare so they don't care about the cost, as long as the rich have more money, they still are better placed to be able to deal with it. Are children some sort of inferior good, getting substituted for jetskis as income rises?

And there's another aspect - there's a historical cross-section as well. I suspect, though it turns out to be harder than I thought to find an easy citation for this, that this dysgenic pattern in birthrates with respect to income is a relatively recent phenomenon. Certainly in pre-history or in polygamous societies, only the rich could afford to have large families (or multiple wives, in the polygamy case). When the Malthusian limit binds, access to resources matters, and the rich outbreed the poor.

I've written before about how I think improved birth control is a big part of the story. But doubt it not, this does not do much better at explaining the current cross-section than the cost story does. That is, it approximately fails to predict it at all (making it mediocre by my rule of thumb), rather than predicting it in the wrong direction like costs do. To get birth control to explain the cross-section of income as well, you'd need to believe that the poor are unable to afford birth control, but are able to afford the resulting children. Seems hard to square to me.

Cost, incidentally, also has similar cross-sectional problems when it comes to the increase in obesity. Leftists love the 'food desert' explanation, whereby the poor are forced to become obese because the stores in their area don't have enough fresh fruits and vegetables, and hence the only options to them are potato chips and coke.

Again, from a cross-sectional point of view, this is possible. But it's a disaster from the time-series point of view. As society in general has gotten richer, we've also gotten fatter. How is it that the poor today "can't afford to eat healthily", but the poor in the 1930s could?

So explanations have to get more complicated. There's no rule of nature that everything has a single explanation. I've picked fertility and obesity because they are two of the most stubborn problems facing the west today, which suggests, but does not guarantee, that they may not be amenable to a single simple explanation.

I think there's something quite humbling about looking at the totality of the data, because it rarely looks like any one neat explanation of anything. It reminds you that your models of the world are just that - models. You include what you think are the most important parts, but you leave out lots of other stuff too. Even if you're right on what's important (a big if), the world is a large and complicated place.


  1. Great piece as always. A little more clarity in the introduction may have helped as I was somewhat lost on what exactly the subject was (I have zero familiarity with econometrics - yes I had to look that word up :) ).

    All this reminds me of the time Thomas Sowell was asked in an interview why he doesn't subscribe to the Leftist's faith in social engineering. His answer has always stayed with me. Loosely paraphrasing him, he said he was cured of utopian fantasies simply by working in a government agency. The sheer magnitude, complexity and interconnections of modern society and all its variables makes attempting to 'engineer' all its components hopelessly futile at best and destructive at worst. It speaks to the ignorance and arrogance of those who think they can.

    If your article says one thing, it's something I have begrudgingly come to say in response to almost any question on any issue these days: 'it's complicated'. I say begrudgingly precisely for the reason you suggest the experience is 'humbling' - as an idealistic youth (read: narcissist) I had a one line answer to all things and assumed that if made king for the day I could make all well again! Humbling indeed to realize that I 'know' far less now at 40 than I did at 15. I take comfort in that Socrates was said to be the wisest of all Athenians because he was the only one aware of his ignorance :)

    I suppose it's related to the notion that people tend to move to the right as they get older - it's not so much that they realize the truth of things and that Conservatism has a monopoly on said truths, but simply that they learn enough to know that nothing is black and white; that because of this the perspectives of the left are far too reductionist (class and identity politics as a whole), and that Conserving what works, although not understood, makes more sense than Progress for its own sake.

    1. Fair point about the econ jargon - I was feeling lazy, and explaining that stuff properly would take most of another post. Having started writing it, the sunk cost fallacy prevented me from just deleting it, which is what I probably should have done. :)

      I think your point about the Thomas Sowell skepticism is exactly right. While I no longer tend to subscribe to a lot of what politically is described as "conservatism", the conservative disposition is something that I still retain. Part of the reason to not fiddle too much with what is working is because tradition itself is valuable, but the other part is the long odds of success on untried schemes of human engineering. At the moment, nearly all such schemes are on the left. But it's an open question how much better the success rate would be for untried right-wing social engineering. Not that such a thing seems like it's about to take off, but still.