"E pur si muove."

After the Inquisition forced him to recant his belief that the Earth revolved around the Sun, Galileo is rumored to have muttered the phrase "E pur si muove": "And yet it moves." This was his rejection of the conventional wisdom at the time - that the Earth was the stationary center of the universe - which we now know to have been most spectacularly false.

While not the sole topic of this blog, much of what I write revolves around this theme - that the conventional wisdom is often flawed, and that all lies, inexorably, must eventually lead to the truth.

Sometimes I write because I have something to say; other times, simply because I find it helpful to see my ideas written out; occasionally it's to see if one of my harebrained ideas actually holds any water. Either way, I hope you'll enjoy at least a few of my fairly random rants! If you care to read more about my motivations behind starting this blog, please click here. Feel free to comment on any of my posts; your feedback is always greatly appreciated.

Wednesday, May 28, 2008

Super Tuesday Rant

Originally Posted: February 5th, 2008

There's almost always a disclaimer on political polls which reads something like, "This is not a scientific poll." As consumers of media products, we have no idea what this actually means. That is, exactly WHICH rules did they violate? Well, you can be pretty sure in most cases that at the very least, the sample size was not statistically significant.

Undoubtedly, you're voting. You're either taking your lunch break to do so, or waiting until you get out of work, or skipping class... Or doing whatever it is you have to do in order to VOTE. If you're not, then I'll request that you please turn in your credentials as a human being and live in the wild for the rest of your existence. You have a say, and you'd best be using it!


However, that's not what's grinding my gears today. If you refuse to exercise your right to active citizenship, so be it. Today, I'm going to complain, once again, about the plague upon humanity that is the prevailing conventional wisdom.

All of the pundits on television have been brashly rattling off polling results and attempting to predict today's outcomes based on those results from across the country. Conventional wisdom holds that these polls are good indicators of the outcomes, and that they have strong predictive power. Hence, the debate and discussion revolving around which candidates are going to win which states is rife with statistics derived from the results of these polls.

How many people would you guess are being questioned in these polls? 1000? 5,000? 10,000? One would imagine that a significant percentage of the population were undoubtedly being questioned - such is the resolute conviction with which the results are being used to predict who will win each state!

Haha. Fat chance. Take a look at the bottom of that little box they have the chart in next time. Usually they have the sample size in there. State polls? I've seen numbers as low as 200. National? Try 1,000. The question is - why? The pollsters don't have enough time or money to ask EVERYONE how they're going to vote, so they have to pin down a reliable sample size. The conventional wisdom says that these sample sizes are derived by responsible statisticians, and are widely adhered to by polls around the country.

Sadly, this is NOT the case. Most of the criticism you'll see of the polls is that they employ an unreliable sample frame - that the people being asked who they are voting for are often inadequately representative of the entire population, be it due to demographic distribution or researcher error. There is, however, an additional concern: is the sample size large enough to warrant statistical significance?

If you don't have a background in stats and are curious as to how this decision should actually be made, here's what is generally accepted as the definitive study: http://www.osra.org/itlpj/bartlettkotrlikhiggins.pdf

There are many variables that determine what the minimum sample size should be for any given population: the alpha level for each tail, the variance estimate, the number of standard deviations, and the acceptable margin of error. Since they're all different and we aren't looking at any data, it's hard to say what the minimum sample size for a given state would be. Two things, however, are certain. The first is that the data is categorical, which pushes the required minimum sample size up. The second is that we'll be choosing a fairly small t-value, because it's a two-tailed test, and we're reasonably sure the actual margin of error isn't greater than acceptable.

Haha - half of you are yelling at me - "WHAT DOES THAT MEAN IN ENGLISH???" Well, instead of talking in math for the next few paragraphs, let's do something simpler. Working backwards, the majority of state polls I've seen have been averaging about 300 responders. After doing a few quick estimates with the aforementioned assumptions, the minimum sample size from a population of 100 would be about 80-something. From a population of 1000? About 400. From 10,000? Well over 600. See a pattern here? 10,000 is FAR below the number of persons even voting in a small state like New Hampshire.
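For anyone who wants to poke at those estimates, here's a rough sketch of the kind of calculation behind them. It assumes Cochran's formula for categorical data with a finite population correction, a 50/50 split (the worst case for variance), a 5% margin of error, and an alpha of 0.01. Those settings are my guess at the strict assumptions, not something stated above, and the function name is made up for illustration. It also uses the normal critical value in place of a t-value, which is close enough at these sizes.

```python
import math
from statistics import NormalDist


def min_sample_size(population, alpha=0.01, margin=0.05, p=0.5):
    """Minimum sample size for estimating a proportion (hypothetical helper).

    Assumed settings: p=0.5 (worst-case variance), 5% margin of error,
    two-tailed alpha of 0.01. None of these come from actual poll data.
    """
    # Two-tailed critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Cochran's sample size for an effectively unlimited population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction shrinks the requirement for small populations.
    return math.ceil(n0 / (1 + n0 / population))


for population in (100, 1_000, 10_000):
    print(population, min_sample_size(population))
```

Under those assumed settings the results land in the same neighborhood as the figures above: 80-something, about 400, and over 600. Change alpha or margin and the numbers move quite a bit, which is exactly why the assumptions matter.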

Even if we severely relax our assumptions, which is undoubtedly what many of these folks are doing, we end up needing more than 250 responders in a population of 10,000. The only way you can relax those assumptions is by being absolutely sure that your sample frame is as diverse and representative as it should be. Sadly, since this isn't observational data and people have to volunteer information, you're really going to want a much bigger sample size than folks have been using.
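You can also work it backwards: take a sample size and ask what margin of error it actually buys. The sketch below is only an illustration under assumptions I'm picking myself: a simple random sample, a 50/50 split, an alpha of 0.05, and the population of 10,000 used above. The function name is hypothetical. Keep in mind it says nothing about a bad sample frame, which no formula can fix.

```python
import math
from statistics import NormalDist


def margin_of_error(sample, population, alpha=0.05, p=0.5):
    """Margin of error for a proportion from a simple random sample.

    Assumed settings: p=0.5 (worst case) and a two-tailed alpha of 0.05.
    """
    # Two-tailed critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of a proportion for the given sample size.
    standard_error = math.sqrt(p * (1 - p) / sample)
    # Finite population correction for sampling without replacement.
    fpc = math.sqrt((population - sample) / (population - 1))
    return z * standard_error * fpc


for sample in (200, 300, 1_000):
    # Printed as a percentage, rounded to one decimal place.
    print(sample, round(100 * margin_of_error(sample, 10_000), 1))
```

With these settings, a 300-person sample comes out somewhere between 5 and 6 points either way, and a 200-person sample is worse. In a close race, that's wider than the gap between the candidates.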

Not to mention that on TOP of all that statistical mumbo-jumbo, some people LIE, some people change their minds, and some people don't even vote. If you're trying to guess the outcomes this Super Tuesday, my advice is as follows: forget about the polls, and read between the lines. Take a look at the political data on each state and make an educated guess. The conventional wisdom is that the polls are a fair predictor of outcomes and therefore very useful tools. The truth is that they are generally statistically flawed and, while useful at times, should generally be taken like everything else in politics: with a grain of salt, and a lot of tequila.
