Although it may be weeks until we have verified vote counts for the U.S. presidential election, the error in the polls is undeniable, with state-level polling consistently underestimating support for President Donald Trump. As in 2016, the polling profession is scrambling to understand what went wrong, but clearly, something needs to change. The most likely culprit of the demonstrable polling problems in 2020 is also the most fundamental: not knowing who is going to vote.
The two culprits most commonly cited in early explanations, while superficially plausible, are upon closer examination unlikely to explain the faulty results in presidential polling. The notion of the “shy Trump voter” holds that many respondents felt social pressure to conceal their true preference for a controversial candidate. The other frequently cited source of error is what pollsters call “non-response bias,” which attributes the underestimation of Trump’s share to Trump voters being less willing to participate in surveys than Biden supporters.
These theories seem even less plausible now than they did in 2016, when the postmortems conducted by professional organizations such as the American Association for Public Opinion Research found little evidence for either; nor do preliminary assessments point in these directions, though more thorough examination of the 2020 results will follow.
We think going back to the fundamentals of the polling exercise offers the most likely explanation for the errors in polling and points to the most promising remedies.
The error most likely came from pollsters’ predictions of who would actually cast a vote, what they call their “likely voter models.” Actual voters are a subset of registered voters, who themselves are a subset of eligible voters. Estimating the electoral preferences of some group of voters requires a prior judgment about whether someone from the larger pool of potential voters will become an actual voter. This is done by some combination of looking at voting history, stated intentions in the survey, or a statistical model, and it involves both calculation and subjective judgment calls.
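To make the idea concrete, here is a minimal sketch of how a likely voter screen might blend vote history with stated intention. The weights, cutoff, and function names are entirely hypothetical illustrations, not any pollster’s actual model; real models are proprietary and far more elaborate.

```python
# Hypothetical likely-voter screen: blend past turnout with stated intention.
# All weights and the cutoff below are invented for illustration.

def likely_voter_score(past_votes: int, max_past_elections: int,
                       stated_intent: float) -> float:
    """Combine vote history and stated intention, each scaled to 0..1."""
    history = past_votes / max_past_elections   # share of recent elections voted in
    return 0.6 * history + 0.4 * stated_intent  # subjective weighting choice

def is_likely_voter(past_votes: int, stated_intent: float,
                    max_past_elections: int = 4, cutoff: float = 0.5) -> bool:
    """Classify a respondent as a likely voter if their score clears the cutoff."""
    return likely_voter_score(past_votes, max_past_elections, stated_intent) >= cutoff

# A respondent who voted in 1 of the last 4 elections but says they will
# definitely vote (intent = 1.0):
print(is_likely_voter(1, 1.0))  # 0.6*0.25 + 0.4*1.0 = 0.55 → True
```

The point of the sketch is that both the weights and the cutoff are judgment calls, and small changes to either reshape the modeled electorate.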
Defining who is a likely voter is always challenging, but this election proved more so with a surging pandemic, a crippled economy, and, somewhat confoundingly, record-breaking turnout. Texas alone added more than 2 million voters over 2016 turnout and nearly 3 million over 2018.
The impossibility of knowing who is going to vote until people actually do so is an underlying source of uncertainty in election polling that can never be entirely dispelled. When we poll the public on their attitudes about almost anything other than elections, we usually know a lot about the size and characteristics of that population. But when we are polling voters before the election has taken place, we are necessarily speculating on a population that doesn’t actually exist yet.
The way forward requires more transparency. This could be accomplished by media and aggregation sites requiring pollsters to provide demographic breakdowns of their likely voter surveys before they’re reported on or included in anyone’s election forecast. For example, you should be skeptical of a Texas election poll that envisions an electorate composed of 20% African Americans or 70% Anglos, because we know, historically and demographically, that this would be impossible in 2020. But you have to have that information to exercise that skepticism.
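The plausibility check described above can be sketched as a simple range test. The group names and historical bounds below are invented placeholders, not actual Texas turnout figures; the point is only the mechanics of comparing a poll’s reported composition against known history.

```python
# Toy plausibility check on a poll's reported electorate composition.
# The historical ranges below are invented for illustration only.
historical_range = {
    "african_american": (0.08, 0.14),  # hypothetical plausible share bounds
    "anglo": (0.40, 0.60),
}

def composition_plausible(reported: dict) -> bool:
    """True if every group's reported share falls inside its historical range."""
    return all(lo <= reported.get(group, 0.0) <= hi
               for group, (lo, hi) in historical_range.items())

# The implausible electorate from the example in the text:
print(composition_plausible({"african_american": 0.20, "anglo": 0.70}))  # False
```

A check like this is only possible if pollsters disclose the composition in the first place, which is the article’s point about transparency.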
In a more honest approach, pollsters could begin providing more than one estimated result from their polls based on different expectations of the electorate.
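Reporting more than one estimate from the same poll amounts to reweighting the same responses under different assumed electorates. The support rates and compositions below are invented numbers chosen only to show the arithmetic.

```python
# Hypothetical illustration: one set of responses, two turnout scenarios.
# All support rates and compositions are invented for the example.

# Candidate support within each demographic group (from the raw responses).
support = {"group_a": 0.65, "group_b": 0.35}

# Two assumed electorates: each maps a group to its share of all voters.
scenarios = {
    "high-turnout electorate": {"group_a": 0.45, "group_b": 0.55},
    "low-turnout electorate":  {"group_a": 0.55, "group_b": 0.45},
}

for name, composition in scenarios.items():
    # Weight each group's support by its assumed share of the electorate.
    estimate = sum(composition[g] * support[g] for g in support)
    print(f"{name}: {estimate:.1%}")
```

Here the same underlying responses yield 48.5% under one turnout assumption and 51.5% under the other, which is exactly the spread of outcomes a single headline number conceals.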
This change would make it more difficult for the poll aggregators and media outlets to give voters an inflated and unjustified sense of confidence in the outcome. This would call for a thorough revamping of how pollsters report their results, and, especially, how the political press, including poll aggregators, report and explain those results.
Uncertainty is inherent in polling, and we need to begin doing a better job of putting that uncertainty front and center.