A few weeks ago, I wrote the following nerdy and somewhat-cryptic tweet:
Abromowitz polarization variable: if(yr<96,0,if(is.Inc | app>0,1,-1)) counts as 4 degs of freedom? http://j.mp/SbHyzx
The “degrees of freedom” that I reference is the statistical term, which represents the difference between the wealth of data at your disposal and the complexity of your model. Allow me to explain as best I can:
Let’s say that you want to forecast the 2012 result based on the 16 presidential elections since 1948. In this example, you start with 16 degrees of freedom because those 16 election results could be anything. And in the simplest model, you can retain all 16 of those degrees of freedom:
The simplest model would be to guess that the incumbent party will garner 50% of the vote. (A plausible reason for this guess is the Median Voter Theorem, which says that parties move to the center and votes will generally be split equally.) Because your guess stays at 50% no matter what the 16 presidential elections could possibly be, the 50% model costs you zero degrees of freedom, and you retain all 16 degrees.
But, perhaps a keener idea would be to actually examine the historical trends in an attempt to improve accuracy. In the 16 elections since 1948, the average popular vote percentage for the incumbent party’s nominee is 52.1% — suggesting that the Median Voter Theorem by itself may be missing an inherent, incumbent party advantage. This new model, displayed as the horizontal line in the chart below, costs one degree of freedom for the averaging parameter.
Here’s why that model costs one degree of freedom. Imagine that you only had one election result to work with: the 2008 campaign, in which McCain (the incumbent party nominee) lost with 46.3% of the two-party vote. Averaging the historical data is trivial: the sole election of 2008 averages to 46.3%. The fact that your modeled forecast exactly equals the historical data means that you have no degrees of freedom left: one data point minus one averaging parameter equals zero degrees of freedom. In general, each additional data point adds a degree of freedom, and each additional variable in your model subtracts one.
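In code, that accounting is just a subtraction. Here is a minimal Python sketch (the function name is mine, not part of any published model):

```python
# Residual degrees of freedom: data points minus fitted parameters.
def residual_dof(n_data_points, n_parameters):
    """Each data point adds a degree of freedom; each model parameter costs one."""
    return n_data_points - n_parameters

# One election (2008) fit with one parameter (the average): nothing left over.
print(residual_dof(1, 1))   # 0

# Sixteen elections fit with that same single averaging parameter: 15 remain.
print(residual_dof(16, 1))  # 15
```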
A similar thought process holds for two data points. Since one line (defined by a slope and an intercept) can always pass exactly through any two points, the line’s two variables cancel out the two data points, and there are no degrees of freedom left. Adding a third data point will (usually) cause the line to start missing points; even if the line fits the data well, that mis-estimation is the remaining degree of freedom.
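To see the two-point case concretely, here is a sketch that solves for the slope and intercept exactly; the two vote-share values are made-up inputs, not real election results:

```python
# A line's two parameters (slope, intercept) can always be chosen to pass
# exactly through any two points with distinct x-values.
def line_through(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)    # assumes x1 != x2
    intercept = y1 - slope * x1
    return slope, intercept

# Two hypothetical elections: election index vs. incumbent-party vote share.
slope, intercept = line_through((0, 52.0), (1, 46.3))

# The fitted line reproduces both points exactly: zero residual error, and
# 2 data points - 2 parameters = 0 degrees of freedom left.
assert abs((slope * 0 + intercept) - 52.0) < 1e-9
assert abs((slope * 1 + intercept) - 46.3) < 1e-9
```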
In general, if you have lots of degrees of freedom but your model still estimates the data well, then you’re in good shape. Nate Silver has written extensively on forecasting, saying “A general rule of thumb is that you should have no more than one variable for every 10 or 15 cases in your data set.” This rule keeps your degrees of freedom from edging toward zero as you add complexity (e.g., variables) to your model.
Abromowitz’s 2008 forecasting model, which was based on 15 data points, included three variables: presidential approval, the economy (specifically, GDP), and whether an incumbent president is running. Throw in the implicit historical average, and his model’s complexity costs him four degrees of freedom, leaving him with 11. With so few presidential elections to learn from, he was already pushing the limits of the data. Yet, after the 2008 election, he noticed another trend in the data:
The unexpected closeness of all four presidential elections since 1996 suggests that growing partisan polarization is resulting in a decreased advantage for candidates favored by election fundamentals, including first-term incumbents. … In fact, the last four presidential elections have produced the closest victory margins and the smallest inter-election vote swings of any four consecutive elections in the past century.
To account for this trend, he added another variable to his model: polarization. At first blush, this new parameter may appear to cancel out only one degree of freedom from the model (leaving Abromowitz with the same number of DoFs as he had going into 2008 because he gained one data point from Obama v McCain). But, the description of the variable demonstrates more complexity than normal:
For elections since 1996, the polarization variable takes on the value 1 when there is a first-term incumbent running or when the incumbent president has a net approval rating of greater than zero; it takes on the value -1 when there is not a first-term incumbent running and the incumbent president has a net approval rating of less than zero.
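As a sketch, that quoted rule can be written out in Python, matching the tweet’s if(yr<96,0,if(is.Inc | app>0,1,-1)). The function and argument names are mine, and the inputs in the examples are illustrative rather than actual historical values:

```python
# Polarization variable, per the quoted definition: 0 before 1996; from 1996
# on, +1 if a first-term incumbent is running OR net approval is positive,
# else -1.
def polarization(year, first_term_incumbent, net_approval):
    if year < 1996:
        return 0
    if first_term_incumbent or net_approval > 0:
        return 1
    return -1

print(polarization(1988, False, 10))   # 0  (pre-1996)
print(polarization(2012, True, -1))    # 1  (first-term incumbent running)
print(polarization(2008, False, -30))  # -1 (no first-term incumbent, negative approval)
```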
If the polarization variable took on a value of 1 post-1996 and 0 before 1996, then that addition would clearly cost Abromowitz one degree of freedom, just as approval and incumbency each cost him one. To be specific, what costs Abromowitz the degree of freedom is his ability, retrospectively, to examine the data and atheoretically pick 1996 as the transition point. However, his polarization variable does not just take on one value after 1996. Rather, the variable’s post-1996 value depends on both approval rating and incumbency; his picking of those two criteria eliminates another two degrees of freedom, costing him a total of three. (The question mark in my tweet was very appropriate, as I miscounted originally.)
Thus, Abromowitz’s model is now down to (16 – 4 – 3) nine degrees of freedom — in other words, he’s used up nearly half of the degrees available from his data. While I believe that the American electorate has become more polarized, this level of model complexity (relative to the data set) reeks of overfitting to me.
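The tally, written out (the parameter counts are the post’s, not mine):

```python
elections = 16           # presidential elections, 1948-2008
original_params = 4      # approval, GDP, incumbency, plus the implicit average
polarization_cost = 3    # the 1996 cutoff, plus the two post-1996 criteria
print(elections - original_params - polarization_cost)  # 9
```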