Polling Data: The good, the bad, and the ugly
We are data nerds. Much of what we do is based on an understanding of how statistics are run and applied. Naturally, we spent time ruminating about how the presidential election polls could be so wrong. We are offering some thoughts here:
Since the inception of elections, people have been trying to predict the winner. However, polling data and its accuracy are often ambiguous and heavily debated. As Nate Silver recently stated on The Daily Show, “Polling is like democracy: it’s the least worst system ever invented.”
If you are interested in the nitty-gritty of how polling data is collected, here is a comprehensive reference.
The quick and dirty version is that polling data is often subject to a fair number of methodological issues: it is based on constantly changing public opinion, it often misses large portions of the population, and the analytics applied to it rely on past information and patterns that are not always adequate for predicting future trends.
First, questions are often worded differently across polls, and not always worded optimally for “accurate” results.
Consider the following example (borrowed from electoral-vote.com):
• If the Nevada Senate election were held today, would you vote for the Democrat or the Republican?
• If the Nevada Senate election were held today, would you vote for the Republican or the Democrat?
• If the Nevada Senate election were held today, would you vote for Catherine Cortez Masto or Joe Heck?
• If the Nevada Senate election were held today, would you vote for Joe Heck or Catherine Cortez Masto?
• If the Nevada Senate election were held today, would you vote for Democrat Catherine Cortez Masto or Republican Joe Heck?
• If the Nevada Senate election were held today, would you vote for Republican Joe Heck or Democrat Catherine Cortez Masto?
• If the Nevada Senate election were held today, for whom would you vote?
First, there is inherently a lot of error (random and systematic) associated with surveying and polling. One example of systematic error in the questions above is question order: asking the same questions in a different sequence can lead to different estimates of candidate support (Hillygus, 2011). Another example is question format. Some of these questions list specific options, some provide time frames, and some are open ended. People will respond very differently depending on a host of variables (not excluding whether or not they had gotten around to a cup of coffee that morning). Oftentimes, the data collected from these different questions is aggregated to provide a much simpler picture than the questions provide on their own. As you have probably guessed by now, aggregating data based on different questions does not always play well statistically.
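To make that last point concrete, here is a toy simulation of our own (every number in it is invented for illustration). Each question wording carries its own systematic bias, and averaging the resulting polls does not make those biases go away; only the random sampling error shrinks with sample size.

```python
import random

random.seed(42)

TRUE_SUPPORT = 0.50  # true share of voters supporting candidate A

# Hypothetical systematic biases introduced by different question
# wordings (these values are made up for the sketch).
wording_bias = {
    "party_first": +0.03,
    "name_first": -0.02,
    "open_ended": -0.05,
}

def run_poll(bias, n=1000):
    """Simulate one poll of n respondents: each says 'A' with
    probability true support + wording bias; the finite sample
    supplies the random error."""
    p = TRUE_SUPPORT + bias
    votes = sum(1 for _ in range(n) if random.random() < p)
    return votes / n

estimates = {q: run_poll(b) for q, b in wording_bias.items()}
aggregate = sum(estimates.values()) / len(estimates)

for q, est in estimates.items():
    print(f"{q:12s}: {est:.3f}")
print(f"aggregate   : {aggregate:.3f}  (true value: {TRUE_SUPPORT})")
```

Increasing `n` tightens each estimate around its *biased* target, not around the truth: the aggregate settles near the average of the wording biases, which only equals the true value if the biases happen to cancel.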
Second, the “Bradley effect,” a manifestation of the desire to behave in a socially acceptable way, led those being polled to answer in ways that were not a true reflection of their voting intentions. In other words, when a pollster asks individuals whether they are voting for the black candidate and they say “undecided,” they may not be undecided at all. They may have no real intention of voting for that candidate but do not want to admit it (Hillygus, 2011). Said another way, participants said what they thought the pollster might want to hear, ushering an unprecedented amount of error into aggregated polling data (Trunde, 2016).
Third, many of the polls did not ask questions about the third-party candidates, a vital piece of the puzzle that was missed. Last, think about who is likely to answer a random phone call, or stop and take a poll on the street. Probably not much of the population that came out in droves to vote on Nov 8th (Kurtzleben, 2016).
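That last point, nonresponse bias, is also easy to simulate. In this sketch of our own (all numbers are hypothetical), the electorate splits into two equal groups with different preferences, but one group answers pollsters far more often than the other. The poll then faithfully measures the responders, not the electorate.

```python
import random

random.seed(0)

# Hypothetical electorate: two equal-sized groups with different
# candidate preferences and very different response rates.
groups = [
    # (share of electorate, support for A, prob. of answering the poll)
    (0.5, 0.40, 0.30),  # rarely answers random calls
    (0.5, 0.60, 0.70),  # happily takes polls
]

true_support = sum(share * support for share, support, _ in groups)

# Draw voters; only those who answer the poll are counted.
responses = []
for _ in range(100_000):
    g = groups[0] if random.random() < groups[0][0] else groups[1]
    _, support, answer_rate = g
    if random.random() < answer_rate:
        responses.append(random.random() < support)

polled_support = sum(responses) / len(responses)
print(f"true support:   {true_support:.3f}")
print(f"polled support: {polled_support:.3f}")
```

Because poll-friendly voters are overrepresented in the sample, the polled figure lands around 0.54 while the true figure is 0.50, and no amount of extra dialing fixes it; only reweighting or a more representative contact method can.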
So what can we do better? First, we should consider a more widely agreed-upon question format. Second, we should place a stronger emphasis on word choice and use a more representative method of collecting data rather than relying on what is most accessible. Third, we should learn as much about a poll’s construction as possible before consuming the information it provides. In the meantime, we advise taking this data with a hefty grain of salt.
· Hillygus, D. S. (2011). The evolution of election polling in the United States. Public Opinion Quarterly, 75, 962-981.
· Kurtzleben, D. (2016). 4 Possible reasons the polls got it so wrong this year. NPR. Retrieved from: http://www.npr.org/2016/11/14/502014643/4-possible-reasons-the-polls-got-it-so-wrong-this-year
· Trunde, S. (2016). It wasn’t the polls that missed, it was the pundits. RealClear Politics. Retrieved from: http://www.realclearpolitics.com/articles/2016/11/12/it_wasnt_the_polls_that_missed_it_was_the_pundits_132333.html