Recap of my 2024 Presidential Election Forecast Model
This semester, I explored a variety of indicators and datasets week to week to predict the outcome of the 2024 U.S. Presidential Election. Most posts on this blog used one type of data (e.g., economic, polling, demographic, or advertising data) to predict the election outcome. For my final model, I amalgamated data from across these sets into one large dataset and then ran a regression in each state to see which predictors were most effective at explaining variance in Democratic Vote Share over time, as measured by their R² values. Then, in each state, I fit a new multilinear regression using only the top five most predictive variables and predicted Democratic Vote Share from that.

For some variables, data goes back as far as 1972. In building a 2024 dataset to predict on, I had to use some projected metrics, which may have affected my model's accuracy: for example, I used projected demographic data, since the last Census was conducted in 2020, and projected turnout from the political scientist Michael McDonald, because actual 2024 turnout data did not yet exist when I made my prediction. My model did not make a prediction for Washington, D.C. due to a lack of data, nor did it predict the national popular vote, but it did generate a Democratic Vote Share estimate for every state.
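To make that pipeline concrete, here is a minimal sketch of the per-state procedure in Python. The DataFrame layout and column names ("state", "year", "dem_two_party_share", and one column per candidate predictor) are hypothetical stand-ins, not the exact structure of my dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def predict_state(df: pd.DataFrame, state: str, predictors: list[str],
                  df_2024: pd.DataFrame, n_top: int = 5) -> float:
    hist = df[df["state"] == state].dropna(subset=["dem_two_party_share"])

    # Rank each predictor by the R^2 of a one-variable regression on this state's history.
    r2_scores = {}
    for p in predictors:
        sub = hist.dropna(subset=[p])
        if len(sub) < 3:
            continue  # not enough history to score this predictor
        fit = LinearRegression().fit(sub[[p]], sub["dem_two_party_share"])
        r2_scores[p] = fit.score(sub[[p]], sub["dem_two_party_share"])

    # Keep the top n_top predictors and refit a multilinear regression on them.
    top = sorted(r2_scores, key=r2_scores.get, reverse=True)[:n_top]
    train = hist.dropna(subset=top)
    final = LinearRegression().fit(train[top], train["dem_two_party_share"])

    # Predict 2024 Democratic Vote Share from that state's projected 2024 values.
    x_2024 = df_2024.loc[df_2024["state"] == state, top]
    return float(final.predict(x_2024)[0])
```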
Results and Accuracy of the Model
The table below shows each state's projected Democratic Vote Share (DVS) next to its actual, observed Democratic Vote Share in the 2024 election. Cells with a DVS below 50% are shaded red, because a Republican would win them, and cells with a DVS above 50% are shaded blue.
State | Predicted D2PV (%) | Actual D2PV (%) |
---|---|---|
Alabama | 36.58 | 34.2 |
Alaska | 41.66 | 41.0 |
Arizona | 50.31 | 46.7 |
Arkansas | 27.86 | 33.5 |
California | 55.86 | 58.8 |
Colorado | 28.95 | 54.2 |
Connecticut | 59.31 | 56.4 |
Delaware | 18.10 | 56.6 |
Florida | 48.21 | 43.0 |
Georgia | 48.53 | 48.5 |
Hawaii | 67.75 | 60.6 |
Idaho | 27.91 | 30.4 |
Illinois | 57.59 | 54.4 |
Indiana | 84.54 | 39.7 |
Iowa | 44.16 | 42.7 |
Kansas | 15.40 | 41.0 |
Kentucky | 38.07 | 33.9 |
Louisiana | 46.26 | 38.2 |
Maine | 27.68 | 52.1 |
Maryland | 100.45 | 62.4 |
Massachusetts | 67.12 | 61.3 |
Michigan | 47.18 | 48.3 |
Minnesota | 51.97 | 51.1 |
Mississippi | 44.20 | 37.3 |
Missouri | 40.43 | 40.1 |
Montana | 39.63 | 38.5 |
Nebraska | 40.71 | 22.5 |
Nevada | 45.90 | 47.5 |
New Hampshire | 55.79 | 50.9 |
New Jersey | 26.64 | 51.8 |
New Mexico | 54.38 | 51.9 |
New York | 49.24 | 55.9 |
North Carolina | 48.76 | 47.8 |
North Dakota | 31.96 | 30.8 |
Ohio | 40.32 | 43.9 |
Oklahoma | 34.69 | 31.9 |
Oregon | 56.08 | 55.6 |
Pennsylvania | 49.59 | 48.6 |
Rhode Island | 56.35 | 55.7 |
South Carolina | 36.78 | 40.4 |
South Dakota | 40.58 | 34.2 |
Tennessee | 37.45 | 34.5 |
Texas | 79.40 | 42.4 |
Utah | 19.88 | 37.9 |
Vermont | 64.37 | 64.3 |
Virginia | 80.33 | 51.8 |
Washington | 59.71 | 57.8 |
West Virginia | 36.98 | 28.1 |
Wisconsin | 83.59 | 48.9 |
Wyoming | 30.20 | 26.1 |
Hypotheses of Inaccuracy
Number 1. Historical Trends
The amount of data available for each predictor varied: some predictors had data recorded back to 1972, while others only went back to 2000. Additionally, some predictors (like polling averages) had multiple data points per year, while others provided just a single point.
The benefit of using data that stretches back to 1972 is that I had more results on which to train my model; that is, more data points about how demographics, the economy, candidates' polling, and advertising related to Democratic Vote Share. But since the political landscape looks vastly different today than it did in 1972 (with certain groups of voters swapping parties and certain states reliably voting for a different party than they used to), the older data may bias the model.
For predictors with multiple data points per year, I aggregated them into a mean and used that as the predictor value for the Democratic two-party vote share of that year. This could have hurt the model's performance by neglecting the (potential) effect that fluctuations in values like polling averages in the months leading up to an election have on Democratic Vote Share.
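Here is a minimal sketch of that within-year aggregation, assuming a hypothetical DataFrame `polls` with columns "state", "year", and "poll_average" recorded multiple times per year.

```python
import pandas as pd

# Collapse all readings in a given state-year to a single mean value,
# which then serves as that year's predictor value.
yearly_polls = (
    polls.groupby(["state", "year"], as_index=False)["poll_average"]
         .mean()
)
```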
Number 2. Equal Weight to All Variables
To adjust for the potential bias from older historical data, I could have weighted the observations used to fit my state-by-state models based on how long before 2024 they were recorded. For example, I could have made observations from 1972 worth less than observations from 2020. To do this, I could have (1) linearly decreased the weights of data points the further they were from 2024 (the drawback being that a recent but surprising, "once in a blue moon" outcome would be weighted more heavily than an older, more typical one), or (2) divided the data into \(k\) training and test sets, used each test set to measure the accuracy of the prediction made from its training set, and then down-weighted the years (in the training set) whose models produced the highest test errors. The problem with the second approach is that we cannot attribute inaccuracy to a specific year's data, because it might be due to the other data in its training set, or to the particular combination of years used in the training set. To account for this, we could select \(k\) so that every combination of years since 1972 serves as the training set once, but it is still imperfect. A minimal sketch of option (1) appears below.
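This sketch linearly down-weights older election years in an ordinary least-squares fit. The names `hist` and `top_predictors` are hypothetical, standing in for one state's historical data and its selected predictors.

```python
from sklearn.linear_model import LinearRegression

years = hist["year"].to_numpy()
# Weights grow linearly from the oldest year (e.g., 1972) to the most recent one,
# so a 2020 observation counts for more than a 1972 observation.
weights = (years - years.min() + 1) / (years.max() - years.min() + 1)

weighted_fit = LinearRegression()
weighted_fit.fit(hist[top_predictors], hist["dem_two_party_share"], sample_weight=weights)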
I also could have chosen to weight some variables more than others in the original multilinear regression. If I wanted polling data to appear in all of my states' models, for example, and to be weighted more heavily than the economic, demographic, and advertising data, I could have multiplied it by a constant greater than 1, or multiplied the other features by some fraction, before incorporating them into the regression (though in an ordinary unpenalized regression, rescaling a feature only rescales its coefficient, so this kind of weighting matters most in combination with a regularized method like those discussed below).
Number 3. Use of 5 Predictors in Every State
Regardless of the R² values of the predictors in each state, I decided to use five of them. In a future iteration of this model I could parameterize the number of predictor variables used in each state, either (1) selecting all of the features with an R² value above some threshold (and only those features), (2) adding predictors until the marginal return in each state model's accuracy falls below some threshold, or (3) training models with different numbers of predictors (3, 4, 5, 6, etc.) and selecting whichever has the best overall accuracy. To assess accuracy, I would keep some years in a training set and others in a test set and use out-of-sample error because, in predicting an election, we can't measure error against the "truth" until the election occurs.
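A minimal sketch of option (3), choosing the number of predictors by out-of-sample error, might look like the following. It assumes `hist` holds one state's historical rows and `ranked` is that state's predictor list already sorted by single-variable R² (both hypothetical names from the earlier sketch).

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

def best_n_predictors(hist, ranked, candidate_ns=(3, 4, 5, 6)):
    errors = {}
    for n in candidate_ns:
        X, y = hist[list(ranked[:n])], hist["dem_two_party_share"]
        # Leave-one-election-out error: each year is predicted from the other years.
        scores = cross_val_score(LinearRegression(), X, y,
                                 cv=LeaveOneOut(),
                                 scoring="neg_mean_absolute_error")
        errors[n] = -scores.mean()
    # Return the predictor count with the lowest out-of-sample error.
    return min(errors, key=errors.get)
```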
Number 4. Collinearity
We can see that some of the predictors, particularly demographic predictors, are highly correlated with each other, a common problem when working with high-dimensional data. To mitigate collinearity, I could apply regularization, either LASSO or Ridge. This is similar to the previous idea of limiting the number of predictors in each state, though that approach would not necessarily reduce dimensionality everywhere; in some states it could even increase it. Both LASSO and Ridge fit the regression by minimizing the sum of squared residuals plus a penalty term on the size of the coefficients: LASSO penalizes the sum of the absolute values of the coefficients, while Ridge penalizes the sum of their squares. LASSO can shrink some coefficients all the way to zero, effectively removing those features from the model, while Ridge only shrinks them toward zero. These techniques accept a small increase in bias in exchange for a reduction in variance, and the strength of the penalty controls how aggressively correlated or redundant features are shrunk or dropped.
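Here is a minimal sketch of a LASSO fit with the penalty strength chosen by cross-validation, again assuming the hypothetical `hist` DataFrame and `predictors` list from earlier. Standardizing first matters because the penalty is applied to raw coefficient sizes.

```python
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

lasso = make_pipeline(
    StandardScaler(),
    LassoCV(cv=5)  # tries a grid of penalty strengths and keeps the best by CV error
)
lasso.fit(hist[predictors], hist["dem_two_party_share"])

# Predictors whose coefficients LASSO shrank exactly to zero are effectively dropped.
kept = [p for p, c in zip(predictors, lasso.named_steps["lassocv"].coef_) if c != 0]
```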
Number 5. Reliance on the R² Statistic
R² is the proportion of variance in the dependent variable explained by the independent variable(s). But the interpretation of R² changes when there are multiple predictors in a model, and the R² attributed to each individual predictor changes as the number of predictors in the model changes. Other statistics I could have used to select predictor variables include the statistical significance (p-value) of each coefficient in the regression model, or I could have relied on a traditional regularization method (as described in the section above).
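For reference, the single-model definition of R² that I used to rank predictors can be written as

\[ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \]

where \(y_i\) is the observed Democratic Vote Share in year \(i\), \(\hat{y}_i\) is the model's fitted value, and \(\bar{y}\) is the mean observed share. One hedge against the multiple-predictor issue would be adjusted R², which penalizes each additional predictor when comparing models of different sizes.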
Overall, much of my model's inaccuracy can be attributed to my goal of creating a generalizable model to predict the 2024 Presidential Election outcome in each of the 50 states. Had I set out with a unique approach to predicting each state, potentially crafting 50 very different models using different features, I may have wound up with different and/or better results.
Tests to See if the Proposals Above Would Improve the Model
The major issue with my model was high variance. Looking at the actual results, it's true that Democratic Vote Share differs greatly between states (as low as 22.5% in Nebraska overall and as high as 64.3% in Vermont). But my model's variance issues occurred in states like New York, Delaware, and Indiana, where it severely over- or under-estimated Democratic performance. Conversely, when it was close to the actual Democratic two-party vote share in a state, it was very close. To test whether any of the above methods could solve this variance problem, I would generate an empirical distribution of past Democratic Vote Shares in each state (dating back to 1972, the start of my data) and see where each updated prediction falls within that distribution. The further from the center of the distribution an updated prediction falls, the less likely that the change in method actually corrected any error from the original model. However, if the correction moves the prediction back within the range of previous Democratic Vote Shares in that state, I would adopt the correction into my model.
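A minimal sketch of that check is below. It assumes `past_dvs` is an array of a state's Democratic Vote Shares since 1972, and `old_pred` and `new_pred` are hypothetical predictions from the original and updated models.

```python
import numpy as np

def adopt_correction(past_dvs, old_pred, new_pred):
    center = np.median(past_dvs)
    # Adopt the updated method only if it moves the prediction toward the center
    # of the state's historical distribution of Democratic Vote Shares.
    return abs(new_pred - center) < abs(old_pred - center)
```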
What I Would Do Differently Next Time
Next time I try to build a predictive model using features of different natures (e.g., economic data vs. demographic data vs. polling data), I would employ a super learning technique. This means building individual models for each state on each type of data, reserving some data as a held-out test set, measuring each state model's accuracy on predictions made for that test set, and then weighting the models' outputs in a linear supermodel based on how accurate each model was on data it was not trained on. I also would not shy away from changing the methodology of each state's model based on prior belief. This is a very Bayesian practice, though the way I would have done it in this model would have been very qualitative (based on the news, polls, and historical trends I was reading). In this year's prediction, I wanted to avoid introducing my own bias into the model. However, in predicting something as volatile and nuanced as U.S. Presidential Election outcomes, some bias might be favorable if it reduced the gap between predictions and reality. Something I learned this election cycle is that every pollster is different, and that each introduces bias through the methods they choose to employ. For example, the famous historian Allan Lichtman applied an entirely qualitative model, reflecting his own biases and his experience correctly predicting election outcomes in previous years. His method differed greatly from that of pollsters and quantitative social scientists, even if he could be just as confident in it.
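To close, here is a minimal sketch of the stacking ("super learning") idea for one state. The base models `econ_model`, `demo_model`, and `poll_model`, their feature lists `econ_cols`, `demo_cols`, and `poll_cols`, the held-out rows `X_hold` and `y_hold`, and the 2024 frame `X_2024` are all hypothetical names, not objects from my actual code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

base = [(econ_model, econ_cols), (demo_model, demo_cols), (poll_model, poll_cols)]

# Each base model's held-out predictions become one column of the supermodel's input.
Z = np.column_stack([model.predict(X_hold[cols]) for model, cols in base])

# The linear supermodel learns how heavily to weight each base model,
# rewarding the ones that were most accurate out of sample.
super_model = LinearRegression().fit(Z, y_hold)

# To forecast 2024, stack the base models' 2024 predictions the same way.
Z_2024 = np.column_stack([model.predict(X_2024[cols]) for model, cols in base])
forecast = super_model.predict(Z_2024)
```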