File this one under "eternal vigilance." The Daily KOS has been running weekly polling results for about a year and a half. They recently parted company with the polling firm, Research 2000 (R2K), when FiveThirtyEight found that R2K had a very poor record. If you want to know which polls are reliable, click the previous link.
Perhaps the most famous R2K poll found that large numbers of Republican voters believe outlandish things, such as the birther myth.
Yesterday DKOS announced that there were much graver problems with some of the R2K polls than mere inaccuracy.
A bit over two weeks ago, a group of statistics wizards (Mark Grebner, Michael Weissman, and Jonathan Weissman) approached me with a disturbing premise -- they had been poring over the crosstabs of the weekly Research 2000 polling we had been running, and were concerned that the numbers weren't legit.
I immediately began cooperating with their investigation, which concluded late last week. Daily Kos furnished the researchers with all available and relevant information in our possession, and we made every attempt to obtain R2K's cooperation which… was not forthcoming.
The wizards' report makes for fascinating reading. It is a brilliant example of how an expert, and maybe even a layman, can determine that a statistical summary is bogus without any access to the underlying data. Consider this one sample from R2K:
Approval of: | Favorable (Men) | Favorable (Women) | Unfavorable (Men) | Unfavorable (Women) | Undecided (Men) | Undecided (Women) |
Obama | 43 | 59 | 54 | 34 | 3 | 7 |
Pelosi | 22 | 52 | 66 | 38 | 12 | 10 |
Reid | 28 | 36 | 60 | 54 | 12 | 10 |
McConnell | 31 | 17 | 50 | 70 | 19 | 13 |
Boehner | 26 | 16 | 51 | 67 | 33 | 17 |
Cong. (D) | 28 | 44 | 64 | 54 | 8 | 2 |
Cong. (R) | 31 | 13 | 58 | 74 | 11 | 13 |
Party (D) | 31 | 45 | 64 | 46 | 5 | 9 |
Party (R) | 38 | 20 | 57 | 71 | 5 | 9 |
This is a pretty simple polling question: do you have a favorable view of X, an unfavorable view, or are you undecided? However, there is something very wrong with those numbers. Can you see it? Look carefully. I would like to think that I would have seen it if I had looked long enough, but I'll never know because I skipped ahead to the explanation. Give yourself a few minutes, and if it doesn't jump out, then…
Overall, the results are unsurprising. Men are more conservative than women. Adjusted for that, everyone thinks that both parties and both houses aren't worth a pitcher of warm spit. Now let me give you a clue.
Approval of: | Favorable (Men) | Favorable (Women) | Unfavorable (Men) | Unfavorable (Women) | Undecided (Men) | Undecided (Women) |
Obama | 43 | 59 | **54** | **34** | 3 | 7 |
Pelosi | **22** | **52** | **66** | **38** | **12** | **10** |
Reid | **28** | **36** | **60** | **54** | **12** | **10** |
McConnell | 31 | 17 | **50** | **70** | 19 | 13 |
Boehner | **26** | **16** | 51 | 67 | 33 | 17 |
Cong. (D) | **28** | **44** | **64** | **54** | **8** | **2** |
Cong. (R) | 31 | 13 | **58** | **74** | 11 | 13 |
Party (D) | 31 | 45 | **64** | **46** | 5 | 9 |
Party (R) | **38** | **20** | 57 | 71 | 5 | 9 |
Now do you see the problem? I've highlighted all the even numbers. In every male/female comparison both numbers are even or both numbers are odd. There isn't a single case where the stat for men is even and that for women is odd, or vice versa. That is statistically all but impossible.
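If you would rather let a machine do the squinting, here is a minimal sketch of the same check. The numbers are copied from the table above; the names `ROWS` and `parity_matches` are my own, not anything from the report.

```python
# Each row: (name, fav_men, fav_women, unf_men, unf_women, und_men, und_women),
# copied from the R2K table above.
ROWS = [
    ("Obama", 43, 59, 54, 34, 3, 7),
    ("Pelosi", 22, 52, 66, 38, 12, 10),
    ("Reid", 28, 36, 60, 54, 12, 10),
    ("McConnell", 31, 17, 50, 70, 19, 13),
    ("Boehner", 26, 16, 51, 67, 33, 17),
    ("Cong. (D)", 28, 44, 64, 54, 8, 2),
    ("Cong. (R)", 31, 13, 58, 74, 11, 13),
    ("Party (D)", 31, 45, 64, 46, 5, 9),
    ("Party (R)", 38, 20, 57, 71, 5, 9),
]


def parity_matches(rows):
    """Count men/women pairs that are both even or both odd.

    Returns (matches, total_pairs).
    """
    matches = 0
    total = 0
    for _name, *values in rows:
        # values alternate men, women, men, women, ...
        for men, women in zip(values[0::2], values[1::2]):
            total += 1
            if men % 2 == women % 2:
                matches += 1
    return matches, total


matches, total = parity_matches(ROWS)
print(f"{matches} of {total} men/women pairs share the same parity")
```

With honest data you would expect roughly half of the pairs to match, not all of them.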
Consider a double coin toss. You get HH, TT, HT, or TH. If you do it 27 times, you get about six or seven of each result, provided the game is fair. If you get all HH and TT, something is very wrong. If you get all HH, you're using double-headed coins.
Of course the numbers above are rounded off, but rounding has a fifty/fifty chance of turning an even number into an odd one or vice versa. No random sampling would have produced the even and odd pairs in the table above. It's not just 27 examples, unfortunately (for R2K).
Were the results in our little table a fluke? The R2K weekly polls report 778 M-F pairs. For their favorable ratings (Fav), the even-odd property matched 776 times. For unfavorable (Unf) there were 777 matches.
Common sense says that that result is highly unlikely, but it helps to do a more precise calculation. Since the odds of getting a match each time are essentially 50%, the odds of getting 776/778 matches are just like those of getting 776 heads on 778 tosses of a fair coin. Results that extreme happen less than one time in 10^228. That's one followed by 228 zeros. (The number of atoms within our cosmic horizon is something like 1 followed by 80 zeros.) For the Unf, the odds are less than one in 10^231. (Having some Undecideds makes Fav and Unf nearly independent, so these are two separate wildly unlikely events.)
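The coin-toss arithmetic in that passage is easy to reproduce. What follows is a rough sketch, not the report's own method: I am assuming a fair coin and counting only the one-sided tail (at least k matches out of n). The function names are mine.

```python
from fractions import Fraction
from math import comb, log10


def tail_probability(n, k):
    """Exact probability of at least k heads in n tosses of a fair coin."""
    favorable = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(favorable, 2 ** n)


def log10_of(p):
    """Base-10 logarithm of a positive Fraction, safe for very small values."""
    return log10(p.numerator) - log10(p.denominator)


fav = tail_probability(778, 776)  # favorable ratings: 776 matches in 778 pairs
unf = tail_probability(778, 777)  # unfavorable ratings: 777 matches in 778 pairs

print(f"Fav: about 10^{log10_of(fav):.1f}")
print(f"Unf: about 10^{log10_of(unf):.1f}")
```

Using exact integers and `Fraction` avoids any worry about floating-point underflow. Under these assumptions, both probabilities come out smaller than the one-in-10^228 and one-in-10^231 bounds quoted above.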
The authors of the report were reasonably cautious in drawing conclusions, but I won't be. Research 2000 was reporting manifestly fraudulent results. This isn't a matter of distorting data, or asking misleading questions. It is a matter of making the data up.
There is no political scandal here. The Daily KOS, to its credit, exposed the fact that its own polls were compromised. I suspect that this is probably a simple case of professional incompetence followed by fraud. R2K, or someone in that firm, for whatever reason, couldn't deliver a genuine product on schedule. So they delivered a fake one. If you own R2K stock, I'd sell. Now.
It is a good reminder that you should always be suspicious. It is also really cool to look at something bogus and see it for what it is.
Update! The Politico is reporting that R2K polls were among the reasons that Blanche Lincoln was expected to lose the Arkansas Primary. If so, and these polls were fraudulent, that would be a major scandal.
Hence Mark Twain's observation that there are three kinds of lies - Lies, Damn lies and Statistics
Posted by: BillW | Thursday, July 01, 2010 at 05:26 AM
Don't generalize to all statistics, Bill W. There are useful stats, and there are bogus stats. Our challenge is to discern them.
And discern them Kos does! I had my doubts on the even-odd thesis at first, but the original article and the analysis of variance are persuasive.
On the birther poll: are you sure that was an R2K poll? I recall making some noise about a March Harris poll, which found 57% of Republicans thinking President Obama is a Muslim and which was found subsequently to suffer from sloppy wording. Is there another such poll by R2K to which you refer?
(background: http://madvilletimes.blogspot.com/2010/03/tea-bag-protest-fueled-by-hatred-and.html)
Posted by: caheidelberger | Thursday, July 01, 2010 at 07:35 AM
Does this mean you will stop quoting Rasmussen, create-a-positive-Republican-narrative, robo-call polls as though they are worth more than a pitcher of warm spit?
Posted by: A.I. | Thursday, July 01, 2010 at 08:41 AM
A.I.: I don't know what you have against Rasmussen, except that you don't like the results. In fact, Rasmussen rates about as well as Gallup. Sorry.
Cory: I am sure lots of polls focused on this question. Here is a link focusing on an R2K/Daily KOS poll. http://volokh.com/2010/06/30/possible-data-fraud-in-research-2000-polls-commissioned-by-the-daily-kos/
Posted by: KB | Thursday, July 01, 2010 at 09:42 AM
ps. Just out of curiosity, I went back and checked the 2008 results. Rasmussen's last poll was 52/46 Obama v. McCain. The actual results were 53/46. Right on target.
But right now you don't need Rasmussen to "create a positive Republican narrative". Check out Gallup.
Posted by: KB | Thursday, July 01, 2010 at 10:59 AM
I find it difficult to believe that Markos Moulitsas wasn't aware of the huge disparity between "his numbers" and those of, essentially, everyone else.
He was aware of this disparity for over a year and a half, wrote numerous articles based upon these polls and also wrote a book!
Now that it has been revealed that Research 2000 "pulled their numbers from where they sit", he's going to "go public" with this "revelation" to try and distance himself from them.
Will he retract everything he's written based on these polls? Not a chance...
The well is already poisoned and Markos is just trying to salvage his "brand".
The "Big Lie" is already established and will have to be refuted, time and time again...
Posted by: William | Thursday, July 01, 2010 at 10:58 PM
William: I do not find it difficult at all. I didn't see the problem until someone pointed it out. I have no trouble understanding why it's anomalous, I just am not trained to look for it. Remember how The New Republic got scammed by Stephen Glass?
I am no big fan of Moulitsas, but it looks like he acted honestly enough on this one.
Posted by: KB | Friday, July 02, 2010 at 11:03 AM
Perhaps I'm too quick in judging his motives, but given the disparity of R2000 polls against most others and his reliance on their results in his writings, I certainly find it difficult to sympathize with him. I certainly thought R2000's polling results were "funny" for quite some time.
Posted by: William | Saturday, July 03, 2010 at 10:49 AM
I won't argue with you on that one. I am just not sure I wouldn't have fallen for the same scam if it had been pulled on me. I confess that I like the Rasmussen polls in large part because they show what I want to see. I hope I am reasonable enough to see through a scam, if I am subject to one. But I am sympathetic in this case to the Daily KOS.
Posted by: KB | Sunday, July 04, 2010 at 01:08 AM