Today the polling inquiry under Pat Sturgis presented its initial findings on what caused the polling error. Pat himself, Jouni Kuha and Ben Lauderdale all went through their findings at a meeting at the Royal Statistical Society – the full presentation is up here. As we saw in the overnight press release the main finding was that unrepresentative samples were to blame, but today’s meeting put some meat on those bones. Just to be clear, when the team said unrepresentative samples they didn’t just mean the sampling part of the process, they meant the samples pollsters end up with as a result of their sampling AND their weighting: it’s all interconnected. With that out of the way, here’s what they said.
Things that did NOT go wrong
The team started by quickly going through some areas that they have ruled out as significant contributors to the error. Any of these could, of course, have had some impact, but if they did it was only minor. The team investigated and dismissed postal votes, falling voter registration, overseas voters and question wording/ordering as causes of the error.
They also dismissed some issues that had been more seriously suggested – the first was differential turnout reporting (i.e. Labour people overestimating their likelihood to vote more than Conservative people); in vote validation studies the inquiry team did not find evidence to support this, suggesting that if it was an issue it was too small to be important. The second was the mode effect – ultimately whether a survey was done online or by telephone made no difference to its final accuracy. This finding met with some surprise from the audience, given there were more phone polls showing Tory leads than online ones. Ben Lauderdale of the inquiry team suggested that was probably because phone polls had smaller sample sizes and hence more volatility, so they spat out more unusual results… but the average lead in online polls and the average lead in telephone polls were not that different, especially in the final polls.
On late swing the inquiry said the evidence was contradictory. Six companies had conducted re-contact surveys, going back to people who had completed pre-election surveys to see how they actually voted. Some showed movement, some did not, but on average they showed a movement of only 0.6% to the Tories between the final polls and the result, so it can only have made a minor contribution at most. People deliberately misreporting their voting intention to pollsters was also dismissed – as Pat Sturgis put it, if those people had told the truth after the election it would have shown up as late swing (but did not), and if they had kept on lying it should have affected the exit poll, BES and BSA as well (it did not).
With all those things ruled out as major contributors to the poll error the team were left with unrepresentative samples as the most viable explanation for the error. In terms of positive evidence for this they looked at the differences between the BES and BSA samples (done by probability sampling) and the pre-election polls (done by variations on quota sampling). This wasn’t a recommendation to use probability sampling (while they didn’t do recommendations, Pat did rule out any recommendation that polling switch to probability sampling wholesale, recognising that the cost and timescale made it wholly impractical, and that the BES and BSA had been wrong in their own way, rather than being perfect solutions).
The two probability based surveys were, however, useful as comparisons to pick up possible shortcomings in the sample – so, for example, the pre-election polls that provided precise age data for respondents all had age skews within age bands; specifically, within the oldest age band there were too many people in their 60s, not enough in their 70s and 80s. The team agreed with the suggestion that samples were too politically engaged – in their investigation they looked at likelihood to vote, finding most polls had samples that were too likely to vote, and didn’t have the correct contrast between young and old turnout. They also found samples didn’t have the correct proportions of postal voters for young and old respondents. They didn’t suggest all of these errors were necessarily related to why the figures were wrong, but that they were illustrations of the samples not being properly representative – and that ultimately led to getting the election wrong.
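As a rough sketch of that kind of within-band comparison – checking a quota sample's age mix inside the oldest band against a probability-survey benchmark – here is a minimal example. All the percentages are invented for illustration; they are not the inquiry's figures.

```python
def within_band_skew(sample_dist, benchmark_dist):
    """Gap, in percentage points, between a poll sample's distribution
    within an age band and a probability-survey benchmark."""
    return {band: round(100 * (sample_dist[band] - benchmark_dist[band]), 1)
            for band in benchmark_dist}

# Invented shares of the oldest age band, split by decade of age
poll_sample = {"60s": 0.55, "70s": 0.30, "80+": 0.15}
benchmark = {"60s": 0.45, "70s": 0.35, "80+": 0.20}
print(within_band_skew(poll_sample, benchmark))
# → {'60s': 10.0, '70s': -5.0, '80+': -5.0}: too many respondents in
#   their 60s, too few in their 70s and 80s
```

A positive number means the poll over-represents that sub-band relative to the benchmark – the pattern the inquiry described, with sixty-somethings standing in for the oldest voters.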
Finally the team spent a long time going through the data on herding – that is, polls producing figures that were closer to each other than random variation suggests they should be. On the face of it the narrowing looks striking – the penultimate polls had a spread of about seven points between the poll with the biggest Tory lead and the poll with the biggest Labour lead. In the final polls the spread was just three points, from a one point Tory lead to a two point Labour lead.
Analysing the polls earlier in the campaign, the spread between different polls was almost exactly what you would expect from a stratified sample (what the inquiry team considered the closest approximation to the politically weighted samples used by the polls). In the last fortnight the spread narrowed though, with the final polls all close together. The reason for this seems to be methodological change – several of the polling companies made adjustments to their methods during the campaign or for their final polls (something that has been typical at past elections; companies often add extra adjustments to their final polls). Without those changes the polls would have been more variable… and less accurate. In other words, some pollsters did make changes in their methodology at the end of the campaign which meant the figures were clustered together, but they were open about the methods they were using and it made the figures LESS Labour, not more Labour. Pollsters may or may not, consciously or subconsciously, have been influenced in the methodological decisions they made by what other polls were showing. However, from the inquiry’s analysis we can be confident that any herding did not contribute to the polling error – quite the opposite: all the pollsters who changed methodology during the campaign were more accurate using their new methods.
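To give a feel for the sort of calculation involved in a herding check, here is a small simulation of how wide a spread of Con-Lab leads you would expect across independent polls from sampling variation alone. The inputs – tied vote shares of 34% each, nine polls of 1,000 respondents – are assumptions for illustration, not the inquiry's actual parameters, and the normal approximation is a simplification of their stratified-sample benchmark.

```python
import math
import random

def expected_lead_spread(n_polls=9, sample_size=1000, p_con=0.34, p_lab=0.34,
                         n_sims=5000, seed=1):
    """Average max-minus-min spread of Con-Lab leads (in points) across
    independent simple random samples, via a normal approximation."""
    # Standard deviation of the lead from one sample of n respondents:
    # 100 * sqrt((p1 + p2 - (p1 - p2)^2) / n)
    sd = 100 * math.sqrt((p_con + p_lab - (p_con - p_lab) ** 2) / sample_size)
    true_lead = 100 * (p_con - p_lab)
    rng = random.Random(seed)
    spreads = []
    for _ in range(n_sims):
        leads = [rng.gauss(true_lead, sd) for _ in range(n_polls)]
        spreads.append(max(leads) - min(leads))
    return sum(spreads) / len(spreads)

print(f"expected spread across 9 polls: {expected_lead_spread():.1f} points")
```

With these assumed inputs the expected spread comes out at around seven to eight points – roughly the sort of variation seen in the penultimate polls – which is why a final-poll spread of just three points looks tighter than chance alone would produce.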
For completeness, the inquiry also took everyone’s final data and weighted it using the same methods – they found a normal level of variation. They also took everyone’s raw data and applied the weighting and filtering the pollsters said they had used to see if they could recreate the same figures – the figures came out the same, suggesting there was no sharp practice going on.
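The replication step can be sketched as simple cell weighting – a simplified stand-in for the rim weighting pollsters actually use. The respondents, demographic cells and population targets below are invented for illustration.

```python
from collections import Counter, defaultdict

def weighted_shares(respondents, targets):
    """Cell-weight raw responses to population targets and return
    weighted vote shares.

    respondents: list of (demographic_cell, vote) pairs
    targets: dict mapping each cell to its population share
    """
    n = len(respondents)
    counts = Counter(cell for cell, _ in respondents)
    # Each respondent's weight is target share / achieved share for their cell
    weights = {cell: targets[cell] / (counts[cell] / n) for cell in counts}
    totals = defaultdict(float)
    for cell, vote in respondents:
        totals[vote] += weights[cell]
    total_weight = sum(totals.values())
    return {vote: w / total_weight for vote, w in totals.items()}

# Invented toy data: the raw sample over-represents older respondents
raw = [("old", "Con"), ("old", "Con"), ("old", "Lab"), ("young", "Lab")]
shares = weighted_shares(raw, {"old": 0.5, "young": 0.5})
# shares["Con"] ≈ 0.33, shares["Lab"] ≈ 0.67
```

Weighting the under-represented young respondents up moves the Labour share from 50% unweighted to two-thirds – the same arithmetic, applied to each pollster's stated scheme, is what let the inquiry confirm that the published figures really did follow from the raw data.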
So what next?
Today’s report wasn’t a huge surprise – as I wrote at the weekend, most of the analysis so far has pointed to unrepresentative samples as the root cause, and the official verdict is in line with that. In terms of the information released today there were no recommendations, it was just about the diagnosis – the inquiry will be submitting their full written report in March. It will have some recommendations on methodology – though no silver bullet – but with the diagnosis confirmed the pollsters can start working on their own solutions. Many of the companies released statements today welcoming the findings and agreeing with the cause of the error; we shall see what different ways they come up with to solve it.