In January the BPC inquiry team announced their initial findings on what went wrong in the general election polls. Today they have published their full final report. The overall conclusions haven’t changed; we’ve just got a lot more detail. For a report about polling methodology written by a bunch of academics it’s very readable, so I’d encourage you to read the whole thing, but if you’re not in the mood for a 120-page document about polling methods then my summary is below:
Polls getting it wrong isn’t new
The error in the polls last year was worse than in many previous years, but it wasn’t unprecedented. In 2005 and 2010 the polls performed comparatively well, but going back further there has often been an error in Labour’s favour, particularly since 1983. Last year’s error was the largest since 1992, but was not that different from the error in 1997 or 2001. There were two reasons it was seen as so much worse. First, it meant the story was wrong: the polls suggested Labour would be the largest party when actually there was a Tory majority, whereas in 1997 and 2001 the only question was the scale of the Labour landslide. Second, in 2015 all the main polls were wrong. In years like 1997 and 2001 there was a substantial average error in the polls, but some companies managed to get the result right, so it looked like a failure of particular pollsters rather than of the industry as a whole.
Not everything was wrong: small parties were right, but Scotland wasn’t
There’s a difference between getting a poll right and being seen to get a poll right. All the pre-election polls were actually pretty accurate for the Lib Dems, Greens and UKIP (and UKIP was seen as the big challenge!). The polls were seen as a disaster because they got the big two parties wrong, and therefore they got the story wrong. It’s the latter bit that’s important – in Scotland there was also a polling error (the SNP were understated, Labour overstated) but it went largely unremarked because the result was a landslide. As the report says, “underestimating the size of a landslide is considerably less problematic than getting the result of an election wrong”.
There was minimal late swing, if any
Obviously it is possible for people to change their minds in those 24 hours between the final poll fieldwork and the actual vote. People really can tell a pollster they’ll vote party A on Wednesday, but chicken out and vote party B on Thursday. The Scottish referendum was probably an example of genuine late swing – YouGov recontacted the same people they interviewed in their final pre-referendum poll on polling day itself, and found a small net swing towards NO. However, when pollsters get it wrong and blame late swing it does always sound a bit like a lame excuse: “Oh, it was right when we did it, people must have changed their minds”.
To conclude there was late swing I’d want to see some pretty conclusive evidence. The inquiry team looked, but didn’t find any. Changes from the penultimate to final polls suggested any ongoing movement was towards Labour, not the Conservatives. A weighted average of re-contact surveys found a change of only 0.6% from Lab to Con, and that included some re-contacts from late campaign surveys rather than final call surveys. Including only re-contacts of final call surveys, the average movement was towards Labour.
There probably weren’t any Shy Tories
“Shy Tories” is the theory that people who were not natural Tories were reluctant to admit to interviewers (or perhaps even to themselves!) that they were going to vote Conservative. If people had lied during the election campaign but admitted it afterwards, this would have shown up as late swing and it did not. This leaves the possibility that people lied before the election and consistently lied afterwards as well. This is obviously very difficult to test conclusively, but the inquiry team don’t believe the circumstantial evidence supports it. Not least, if there was a problem with shy Tories we could reasonably have expected polls conducted online without a human interviewer to have shown a higher Tory vote – they did not.
Turnout models weren’t that good, but it didn’t cause the error
Most pollsters modelled turnout using a simple method of asking people how likely they were to vote on a 0-10 scale. The inquiry team tested this by looking at whether people in re-contact surveys reported actually voting. For most pollsters this didn’t work out that well. However, it was not the cause of the error: the inquiry team re-ran the data, replacing pre-election likelihood to vote estimates with whether people reported actually voting after the election, and the results were just as wrong. As the inquiry team put it, if pollsters had known in advance which respondents would and would not vote, they would not have been any more accurate.
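To make the 0-10 approach concrete, here is a minimal sketch of one simple way such answers can feed into headline figures: weighting each respondent by their stated likelihood divided by ten. That weighting rule is my assumption for illustration (pollsters differ in how they use the scale, and the report is not being quoted here), and the respondents are invented.

```python
from collections import defaultdict

# Hypothetical respondents: (stated vote intention, 0-10 likelihood to vote).
# These figures are invented for illustration, not taken from any poll.
respondents = [
    ("Con", 10),
    ("Con", 8),
    ("Lab", 10),
    ("Lab", 5),
    ("LD", 6),
]


def turnout_weighted_shares(sample):
    """Weight each respondent by likelihood/10 and return party shares in percent.

    The likelihood/10 rule is an assumed, simplified turnout model.
    """
    totals = defaultdict(float)
    for party, likelihood in sample:
        totals[party] += likelihood / 10.0
    weight_sum = sum(totals.values())
    return {party: 100.0 * weight / weight_sum for party, weight in totals.items()}


shares = turnout_weighted_shares(respondents)
for party, share in sorted(shares.items()):
    print(f"{party}: {share:.1f}%")
```

Swapping each likelihood for 10 or 0 according to whether the respondent later reported voting would be a rough analogue of the inquiry's re-run, under the same simplifying assumptions.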
Differential turnout – that Labour voters were more likely to say they were going to vote and then fail to do so – was also dismissed as a factor. Voter validation tests (checking poll respondents against the actual marked register) did not suggest Labour voters were any more likely to lie about voting than Tory voters.
Note that in this sense turnout is about the difference between people *saying* they’ll vote (and pollsters’ estimates of whether they’ll vote) and whether they actually do. That didn’t cause the polling error. However, the polling error could still have been caused by samples containing people who are too likely to vote, something that is an issue of turnout but which comes under the heading of sampling. It’s the difference between having young non-voters in your samples and them claiming they’ll vote when they won’t, and not having them in your sample to begin with.
Lots of other things that people have suggested were factors, weren’t factors
The inquiry put to bed various other theories too – postal votes were not the problem (samples contained the correct proportion of them), excluding overseas voters was not the problem (they are only 0.2% of the electorate), and voter registration was not the problem (in the way it showed up it would have been functionally identical to misreporting of turnout – people who told pollsters they were going to vote, but did not – and for the narrow purpose of polling error it doesn’t matter why they didn’t vote).
The main cause of the error was unrepresentative samples
The reason the polls got it wrong in 2015 was the sampling. The BPC inquiry team reached this conclusion initially by using the Sherlock Holmes method – eliminating all the other possibilities, leaving just one which must be true. However, they also had positive evidence to back up the conclusion. The first piece is the comparison with the random probability surveys conducted by the BES and BSA later in the year, where past vote recall more closely resembled the actual election result. The second is a set of observable shortcomings within the samples: the age distribution within bands was off, and the geographical distribution of the vote was wrong (polls underestimated Tory support more in the South East and East). Most importantly in my view, polling samples contained far too many people who vote, particularly among younger people – presumably because they contain people who are too engaged and interested in politics. Note that these aren’t necessarily the specific sample errors that caused the polling error: the BPC team cited them as evidence that sampling was off, not as the direct causes.
In the final polls there was no difference between telephone and online surveys
Looking at the final polls there was no difference at all between telephone and online surveys. The average Labour lead in the final polls was 0.2% in phone polls, and 0.2% in online polls. The average error compared to the final result was 1.6% for phone polls and 1.6% for online polls.
However, at points during the 2010-2015 Parliament there were differences between the modes. In the early part of the Parliament online polls were more favourable towards the Conservatives; for a large middle part of the Parliament phone polls were more favourable; during 2014 the gap disappeared entirely; and phone polls started being more favourable towards the Tories again during the election campaign, but came bang into line for the final polls. The inquiry suggest that could be herding, but that there is no strong reason to expect mode effects to be stable over time anyway – “mode effects arise from the interaction of the political environment with the various errors to which polling methods are prone. The magnitude and direction of these mode effects in the middle of the election cycle may be quite different to those that are evident in the final days of the campaign.”
The inquiry couldn’t rule out herding, but it doesn’t seem to have caused the error
That brings us to herding – the final polls were close to each other. To some observers they looked suspiciously close. Some degree of convergence is to be expected in the run-up to the election: many pollsters increased their sample sizes for their final polls, so the variance between figures should be expected to fall. However, even allowing for that, polls were still closer than would have been expected. Several pollsters made changes to their methods during the campaign and these did explain some of the convergence. It’s worth noting that all the changes increased the Conservative lead – that is, they made the polls *more* accurate, not less accurate.
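The point about bigger samples producing less variance can be illustrated with the textbook margin of error for a share under simple random sampling. This is only a rough guide, since real polls use quotas and weighting rather than pure random samples; the 35% share and the sample sizes below are hypothetical.

```python
import math


def margin_of_error(share, n, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a party share.

    Assumes simple random sampling, which real quota or panel polls
    only approximate.
    """
    p = share / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical party on 35% at three hypothetical sample sizes.
for n in (1000, 2000, 4000):
    print(n, round(margin_of_error(35.0, n), 2))
```

Because the margin shrinks with the square root of the sample size, quadrupling the sample only halves it, which is why larger final samples explain some convergence between polls but not necessarily all of it.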
The inquiry team also tested to see what the result would have been if every pollster had used the same method. That is, if you think pollsters had deliberately chosen methodological adjustments that made their polls closer to each other, what if you strip out all those individual adjustments? Using the same method across the board the results would have ranged from a four point Labour lead to a two point Tory lead. Polls would have been more variable… but every bit as wrong.
How the pollsters should improve their methods
Dealing with the main crux of the problem, unrepresentative samples, the inquiry have recommended that pollsters take action to improve how representative their samples are within their current criteria, and to investigate potential new quotas and weights that correlate with the sort of people who are under-represented in polls, and with voting intention. They are not prescriptive as to what the changes might be – on the first point they float possibilities about longer fieldwork and more callbacks in phone polls, and more incentives for under-represented groups in online polls. For potential new weighting variables they don’t suggest much at all, worrying that if such variables existed pollsters would already be using them, but we shall see what changes pollsters end up making to their sampling to address these recommendations.
The inquiry also makes some recommendations about turnout, don’t knows and asking if people have voted by post already. These seem perfectly sensible recommendations in themselves (especially asking if people have already voted by post, which several pollsters already do anyway), but given none of these things contributed to the error in 2015 they are more improvements for the future than addressing the failures of 2015.
And how the BPC should improve transparency
If the recommendations for the pollsters are pretty vague, the recommendations to the BPC are more specific, and mostly to do with transparency. Pollsters who are members of the BPC are already supposed to be open about methods, but the inquiry suggest they change the rules to make this more explicit – pollsters should give the exact variables and targets they weight to, and flag up any changes they make to their methods (the BPC are adopting these changes forthwith). They also make recommendations about registering polls and providing microdata to help any future inquiries, and for changes in how confidence margins are reported in polls. The BPC are looking at exactly how to do that in due course, but I think I’m rather less optimistic than the inquiry team about the difference it will make. The report says “Responsible media commentators would be much less inclined, however, to report a change in party support on the basis of one poll which shows no evidence of statistically significant change.” Personally I think *responsible* media commentators are already quite careful about how they report polls; the problem is that not all media commentators are responsible…
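For a sense of what “statistically significant change” means between two polls, here is a minimal sketch using a standard two-sample z-test on a party's share. It assumes two independent simple random samples, which is a simplification of how polls are actually conducted, and it is not the reporting method the inquiry or the BPC propose; the figures in the example are invented.

```python
import math


def change_is_significant(share_a, n_a, share_b, n_b, z_crit=1.96):
    """Return True if the change in share between two polls exceeds sampling noise.

    Uses a two-sample z-test at roughly the 95% level, assuming two
    independent simple random samples (a simplifying assumption).
    """
    p_a = share_a / 100.0
    p_b = share_b / 100.0
    standard_error = math.sqrt(p_a * (1.0 - p_a) / n_a + p_b * (1.0 - p_b) / n_b)
    return abs(p_b - p_a) / standard_error > z_crit


# Hypothetical example: a party moves from 33% to 35% in two polls of 1,000 people.
print(change_is_significant(33.0, 1000, 35.0, 1000))
# Hypothetical example: the same party moves from 33% to 38%.
print(change_is_significant(33.0, 1000, 38.0, 1000))
```

On these assumptions a two point movement between two polls of 1,000 is well within sampling noise, while a five point movement is not.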
There’s no silver bullet
The inquiry team don’t make recommendations for specific changes that would have corrected the problems and don’t pretend there is an easy solution. Indeed, they point out that even the hugely expensive “gold standard” BES random probability surveys still managed to get the Conservative and UKIP shares of the vote outside of the margin of error. They do think there are improvements that can be made though – and hopefully there are (hopefully the changes that some pollsters have already introduced are improving matters already). They also say it would be good if stakeholders were more realistic about the limits of polling, and about how accurately it is really possible to measure people’s opinions.
Polling accuracy shouldn’t be black and white. It shouldn’t be a choice between “polls are the gospel truth” and “polls are worthless, ignore them all”. Polls are a tool, with advantages and limitations. There are limits on how well we can model and measure the views of a complex and mobile society, but that should be a reason for caveats and caution, not a reason to give up. As I wrote last year, despite the many difficulties there are in getting a representative sample of the British public, I still think those difficulties are surmountable, and that ultimately it’s still worth trying to find out and quantify what the public think.