In January the BPC inquiry team announced their initial findings on what went wrong in the general election polls. Today they have published their full final report. The overall conclusions haven’t changed; we’ve just got a lot more detail. For a report about polling methodology written by a bunch of academics it’s very readable, so I’d encourage you to read the whole thing, but if you’re not in the mood for a 120 page document about polling methods then my summary is below:

Polls getting it wrong isn’t new

The error in the polls last year was worse than in many previous years, but wasn’t unprecedented. In 2005 and 2010 the polls performed comparatively well, but going back further there has often been an error in Labour’s favour, particularly since 1983. Last year’s error was the largest since 1992, but was not that different from the error in 1997 or 2001. The reason it was seen as so much worse was twofold. First, it meant the story was wrong (the polls suggested Labour would be the largest party when there was actually a Tory majority; in 1997 and 2001 the only question was the scale of the Labour landslide). Second, in 2015 all the main polls were wrong – in years like 1997 and 2001 there was a substantial average error in the polls, but some companies managed to get the result right, so it looked like a failure of particular pollsters rather than the industry as a whole.

Not everything was wrong: small parties were right, but Scotland wasn’t

There’s a difference between getting a poll right, and being seen to get a poll right. All the pre-election polls were actually pretty accurate for the Lib Dems, Greens and UKIP (and UKIP was seen as the big challenge!). It was seen as a disaster because they got the big two parties wrong, and therefore got the story wrong. It’s that latter bit that’s important – in Scotland there was also a polling error (the SNP were understated, Labour overstated) but it went largely unremarked because it was a landslide. As the report says, “underestimating the size of a landslide is considerably less problematic than getting the result of an election wrong”.

There was minimal late swing, if any

Obviously it is possible for people to change their minds in the 24 hours between the final poll fieldwork and the actual vote. People really can tell a pollster they’ll vote party A on Wednesday, but chicken out and vote party B on Thursday. The Scottish referendum was probably an example of genuine late swing – YouGov recontacted the same people they interviewed in their final pre-referendum poll on polling day itself, and found a small net swing towards NO. However, when pollsters get it wrong and blame late swing it does always sound a bit like a lame excuse: “Oh, it was right when we did it, people must have changed their minds”.

To conclude there was late swing I’d want to see some pretty conclusive evidence. The inquiry team looked, but didn’t find any. Changes from the penultimate to final polls suggested any ongoing movement was towards Labour, not the Conservatives. A weighted average of re-contact surveys found a change of only 0.6% from Lab to Con (and that included some re-contacts from late-campaign surveys rather than final-call surveys; including only re-contacts of final-call surveys, the average movement was towards Labour).

There probably weren’t any Shy Tories

“Shy Tories” is the theory that people who were not natural Tories were reluctant to admit to interviewers (or perhaps even to themselves!) that they were going to vote Conservative. If people had lied during the election campaign but admitted it afterwards, this would have shown up as late swing, and it did not. That leaves the possibility that people lied before the election and consistently lied afterwards as well. This is obviously very difficult to test conclusively, but the inquiry team don’t believe the circumstantial evidence supports it. Not least, if there was a problem with shy Tories we could reasonably have expected polls conducted online, without a human interviewer, to have shown a higher Tory vote – they did not.

Turnout models weren’t that good, but they didn’t cause the error

Most pollsters modelled turnout using a simple method of asking people how likely they were to vote on a 0-10 scale. The inquiry team tested this by looking at whether people in re-contact surveys reported actually voting. For most pollsters this didn’t work out that well; however, it was not the cause of the error – the inquiry team re-ran the data replacing pre-election likelihood-to-vote estimates with whether people reported actually voting after the election, and the results were just as wrong. As the inquiry team put it – if pollsters had known in advance which respondents would and would not vote, they would not have been any more accurate.
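
To make the mechanics of that test concrete, here is a minimal sketch in Python. The respondents, the 0-10 scores and the simple divide-by-ten weighting rule are all invented for illustration – they are not any pollster’s actual model – but the structure of the test is the same: weight by stated likelihood first, then swap in what people said they actually did and see whether the headline shares move.

```python
# Minimal sketch of the turnout test described above. The respondents,
# scores and "divide by 10" weighting rule are illustrative assumptions,
# not any pollster's actual model.

respondents = [
    # (voting intention, stated likelihood to vote 0-10, reported voting afterwards?)
    ("Con", 10, True),
    ("Lab", 8,  False),
    ("Lab", 10, True),
    ("Con", 9,  True),
    ("Lab", 6,  False),
]

def shares(weights_and_votes):
    """Turn a list of (weight, party) pairs into percentage shares."""
    total = sum(w for w, _ in weights_and_votes)
    parties = {p for _, p in weights_and_votes}
    return {p: 100 * sum(w for w, q in weights_and_votes if q == p) / total
            for p in parties}

# (a) Pre-election style: weight each respondent by stated likelihood / 10.
pre_election = shares([(score / 10, party) for party, score, _ in respondents])

# (b) The inquiry's counterfactual: keep only people who reported actually voting.
with_hindsight = shares([(1, party) for party, _, voted in respondents if voted])

print(pre_election, with_hindsight)
```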

Differential turnout – that Labour voters were more likely to say they were going to vote and then fail to do so – was also dismissed as a factor. Voter validation tests (checking poll respondents against the actual marked register) did not suggest Labour voters were any more likely to lie about voting than Tory voters.

Note that in this sense turnout is about the difference between people *saying* they’ll vote (and pollsters’ estimates of whether they’ll vote) and whether they actually do. That didn’t cause the polling error. However, the polling error could still have been caused by samples containing people who are too likely to vote – something that is an issue of turnout, but which comes under the heading of sampling. It’s the difference between having young non-voters in your samples and them claiming they’ll vote when they won’t, and not having them in your sample to begin with.

Lots of other things that people have suggested were factors, weren’t factors

The inquiry put to bed various other theories too – postal votes were not the problem (samples contained the correct proportion of them), excluding overseas voters was not the problem (they are only 0.2% of the electorate), and voter registration was not the problem (in the way it showed up it would have been functionally identical to misreporting of turnout – people who told pollsters they were going to vote, but did not – and for the narrow purpose of explaining the polling error it doesn’t matter why they didn’t vote).

The main cause of the error was unrepresentative samples

The reason the polls got it wrong in 2015 was the sampling. The BPC inquiry team reached this conclusion partly by the Sherlock Holmes method – eliminating all the other possibilities, leaving just one which must be true. However, they also had positive evidence to back up the conclusion. The first was the comparison with the random probability surveys conducted by the BES and BSA later in the year, where recalled vote more closely resembled the actual election result; the second was a set of observable shortcomings within the samples. The age distribution within bands was off, and the geographical distribution of the vote was wrong (polls underestimated Tory support more in the South East and East). Most importantly in my view, polling samples contained far too many people who vote, particularly among younger people – presumably because they contain people who are too engaged and interested in politics. Note that these aren’t necessarily the specific sample errors that caused the polls to be wrong: the BPC team cited them as evidence that sampling was off, not as the direct causes.

In the final polls there was no difference between telephone and online surveys

Looking at the final polls there was no difference at all between telephone and online surveys. The average Labour lead in the final polls was 0.2% in phone polls, and 0.2% in online polls. The average error compared to the final result was 1.6% for phone polls and 1.6% for online polls.

However, at points during the 2010-2015 Parliament there were differences between the modes. In the early part of the Parliament online polls were more favourable towards the Conservatives; for a large middle part of the Parliament phone polls were more favourable; during 2014 the gap disappeared entirely; phone polls started being more favourable towards the Tories during the election campaign, but came bang into line for the final polls. The inquiry suggest that could be herding, but also that there is no strong reason to expect mode effects to be stable over time anyway – “mode effects arise from the interaction of the political environment with the various errors to which polling methods are prone. The magnitude and direction of these mode effects in the middle of the election cycle may be quite different to those that are evident in the final days of the campaign.”

The inquiry couldn’t rule out herding, but it doesn’t seem to have caused the error

That brings us to herding – the final polls were close to one another, and to some observers they looked suspiciously close. Some degree of convergence is to be expected in the run-up to the election: many pollsters increased their sample sizes for their final polls, so the variance between figures should be expected to fall. However, even allowing for that, the polls were still closer than would have been expected. Several pollsters made changes to their methods during the campaign, and these did explain some of the convergence. It’s worth noting that all the changes increased the Conservative lead – that is, they made the polls *more* accurate, not less accurate.

The inquiry team also tested to see what the result would have been if every pollster had used the same method. That is, if you think pollsters had deliberately chosen methodological adjustments that made their polls closer to each other, what if you strip out all those individual adjustments? Using the same method across the board the results would have ranged from a four point Labour lead to a two point Tory lead. Polls would have been more variable… but every bit as wrong.
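
As a rough illustration of what that test involves (a sketch only – the poll samples and the single common age-weighting step below are invented, not the inquiry’s actual method or data), the idea is to take each pollster’s raw respondents, apply one identical adjustment to all of them, and then look at the spread of the resulting leads:

```python
# Sketch of the "same method for everyone" test: strip out each pollster's
# own adjustments and apply one common scheme to every raw sample.
# Poll data and the single weighting step are invented for illustration.

raw_polls = {
    # pollster: list of (party, age band) for each raw respondent
    "Pollster A": [("Lab", "18-34"), ("Con", "55+"), ("Lab", "35-54"), ("Con", "55+")],
    "Pollster B": [("Con", "35-54"), ("Lab", "18-34"), ("Lab", "18-34"), ("Con", "55+")],
}

# One common set of age weights applied to every pollster alike
# (a stand-in for "the same method across the board").
common_age_weights = {"18-34": 0.8, "35-54": 1.0, "55+": 1.2}

def lead(sample):
    """Conservative lead over Labour, in points, under the common weights."""
    totals = {"Con": 0.0, "Lab": 0.0}
    for party, age in sample:
        totals[party] += common_age_weights[age]
    grand_total = sum(totals.values())
    return 100 * (totals["Con"] - totals["Lab"]) / grand_total

leads = {name: round(lead(sample), 1) for name, sample in raw_polls.items()}
print(leads)                                       # lead per pollster under the common method
print(max(leads.values()) - min(leads.values()))   # spread between pollsters
```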

How the pollsters should improve their methods

Dealing with the crux of the problem, unrepresentative samples, the inquiry have recommended that pollsters take action to improve how representative their samples are within their current criteria, and investigate potential new quotas and weights that correlate both with the sort of people who are under-represented in polls and with voting intention. They are not prescriptive as to what the changes might be – on the first point they float possibilities like longer fieldwork and more callbacks in phone polls, and more incentives for under-represented groups in online polls. For potential new weighting variables they don’t suggest much at all, worrying that if such variables existed pollsters would already be using them, but we shall see what changes pollsters end up making to their sampling to address these recommendations.

The inquiry also makes some recommendations about turnout, don’t knows and asking whether people have already voted by post. These seem perfectly sensible recommendations in themselves (especially asking if people have already voted by post, which several pollsters do anyway), but given none of these things contributed to the error in 2015 they are more improvements for the future than remedies for the failures of 2015.

And how the BPC should improve transparency

If the recommendations for the pollsters are pretty vague, the recommendations to the BPC are more specific, and mostly to do with transparency. Pollsters who are members of the BPC are already supposed to be open about methods, but the inquiry suggest they change the rules to make this more explicit – pollsters should give the exact variables and targets they weight to, and flag up any changes they make to their methods (the BPC are adopting these changes forthwith). They also make recommendations about registering polls and providing microdata to help any future inquiries, and for changes in how confidence margins are reported in polls. The BPC are looking at exactly how to do that in due course, but I think I’m rather less optimistic than the inquiry team about the difference it will make. The report says “Responsible media commentators would be much less inclined, however, to report a change in party support on the basis of one poll which shows no evidence of statistically significant change.” Personally I think *responsible* media commentators are already quite careful about how they report polls, the problem is that not all media commentators are responsible…
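
On confidence margins, the basic arithmetic is worth spelling out. The sketch below assumes a pure simple random sample of 1,000 people – which quota and panel samples are not, so if anything it understates the real uncertainty – but even on that generous assumption a two point movement between two polls sits comfortably inside the noise.

```python
from math import sqrt

# Rough margin-of-error arithmetic for a single poll share, and for the
# change in a share between two independent polls. Assumes a simple random
# sample, which real quota/panel samples are not, so treat these figures
# as a lower bound on the true uncertainty.

def moe(share, n, z=1.96):
    """95% margin of error (in points) on a single percentage share."""
    p = share / 100
    return 100 * z * sqrt(p * (1 - p) / n)

def moe_change(share, n1, n2, z=1.96):
    """95% margin of error on the change in a share between two independent polls."""
    p = share / 100
    return 100 * z * sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

print(round(moe(35, 1000), 1))               # ~3.0 points on one poll of 1,000
print(round(moe_change(35, 1000, 1000), 1))  # ~4.2 points on a poll-to-poll change
```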

There’s no silver bullet

The inquiry team don’t make recommendations for specific changes that would have corrected the problems, and don’t pretend there is an easy solution. Indeed, they point out that even the hugely expensive “gold standard” BES random probability surveys still got the Conservative and UKIP shares of the vote outside the margin of error. They do think there are improvements that can be made, though – and hopefully there are (hopefully the changes that some pollsters have already introduced are improving matters). They also say it would be good if stakeholders were more realistic about the limits of polling – about how accurately it is really possible to measure people’s opinions.

Polling accuracy shouldn’t be treated as black and white. It shouldn’t be a choice between “polls are the gospel truth” and “polls are worthless, ignore them all”. Polls are a tool, with advantages and limitations. There are limits on how well we can model and measure the views of a complex and mobile society, but that should be a reason for caveats and caution, not a reason to give up. As I wrote last year, despite the many difficulties there are in getting a representative sample of the British public, I still think those difficulties are surmountable, and that ultimately it’s still worth trying to find out and quantify what the public think.


While the gap between online and telephone polls on the EU referendum has narrowed of late, it is still there, and Populus have put out an interesting paper looking at possible explanations, written by James Kanagasooriam of Populus and Matt Singh of Number Cruncher Politics. The full paper is here.

Matt and James essentially suggest three broad reasons. The first is don’t knows. Most telephone polls don’t prompt people with the option of saying don’t know, but respondents are free to volunteer it. In contrast, in online polls people can only pick from the options presented on the screen, so don’t know has to be offered up front as an option. (Personally, I suspect there’s a mode effect as well as a prompting effect on don’t knows: when there is a human interviewer, people may feel a certain social pressure to give an answer – saying don’t know feels somehow unhelpful.)

Populus tested this in two parallel surveys, one online and one phone, each split. The phone survey was split between prompting people just with the options of Remain or Leave, and explicitly including don’t know as an option in the prompt. The online survey had one split offering don’t know as an option, and one with the don’t know option hidden away in smaller font at the bottom of the page (a neat idea to simulate not explicitly prompting for an option in an online survey).

  • The phone test had a Remain lead of 11 points without a don’t know option (the way phone polls normally ask), but with an explicit don’t know it would have shown only a 3 point Remain lead. Prompting for don’t knows made a difference of eight points in the lead.
  • The online survey had a Leave lead of six points with a don’t know prompt (the way they normally ask), but with the don’t know option hidden down the page it had only a one point Leave lead. Making the don’t know prompt less prominent made a difference of five points in the lead.

The impact here is actually quite chunky, accounting for a fair amount of the difference. Comparing recent phone and online polls the gap is about seven or so points, so if you looked just at the phone experiment here the difference in don’t knows could in theory account for the whole lot! I don’t think that is the case though: things are rarely so simple. Earlier this year there was a much bigger gap, and I suspect there are probably also some issues to do with sample make-up and interviewer effects in the actual answers. In the Populus paper they assume don’t knows make up about a third of a gap of fifteen points between phone and online – obviously that total gap is smaller now.

The second thing Populus looked at was attitudinal differences between online and phone samples. The examples looked at here are attitudes towards gender equality, racial equality and national identity. Essentially, people give answers that are more socially liberal in telephone polls than they do in online polls. This is not a new finding – plenty of papers in the past have found these sorts of differences between telephone and online polling, but because attitudinal questions are not directly tested in general elections they are never compared against reality and it is impossible to be certain which is “right”. Neither can we really be confident how much of the difference is down to different types of people being reached by the two approaches, and how much is down to interviewer effects (are people more comfortable admitting views that may be seen as racist or sexist to a computer screen than to a human interviewer?). It’s probably a mixture of both. What’s important is that how socially liberal people were on these scales correlated with how pro- or anti-EU they were, so to whatever extent there is a difference in sample make-up rather than an interviewer effect, it explains another couple of points of the difference between EU referendum voting intention in telephone and online polls. The questions that Populus asked had also been used in the face-to-face BES survey: the answers there were in the middle – more socially liberal than online polls, less socially liberal than phone polls. Of course, if there are interviewer effects at play here, face-to-face polling also has a human interviewer.

Populus think these two factors explain most of the difference, but are left with a gap of about 3 points that they can’t readily explain. They float the idea that this could be because online samples contain more partisan people who vote down the line (so, for example, online samples have fewer of those odd “UKIP for Remain” voters), when in reality people are more often rather contradictory and random. It’s an interesting possibility, and chimes with my own views about polls containing people who are too politically aware, too partisan. The impact of YouGov adopting sampling and weighting by attention paid to politics last month was mostly to increase don’t knows on questions, but when we were testing it before rollout it did increase the position of Remain relative to Leave on the EU question, normally by two or three points, so that would chime with Populus’s theory.

According to Populus, therefore, the gap comes down partly to don’t knows, partly to the different attitudinal make-up of the samples, and a final chunk to online samples being more partisan. Their estimate is that the reality will be somewhere in between the results being shown by online and telephone polls, a little closer to telephone. We shall see.

(A footnote for just the really geeky among you who have paid close attention to the BPC inquiry and the BES team’s posts on the polling error; it is probably too technical for most readers. When comparing the questions on race and gender, Populus also broke down the answers in the BES face-to-face survey by how many contacts it took to interview people. This is something the BES team and the BPC inquiry team also did when investigating the polling error last May. The inquiries looking at the election polls found that if you took just those people the BES managed to interview on their first or second go, the make-up of the sample was similar to that from phone polls and was too Labour, while people who were trickier to reach were more Conservative. Hence they took “easy for face-to-face interviewers to reach” as a sort of proxy for “people likely to be included in a poll”. In this study Populus did the same for the social liberalism questions and it didn’t work the same way: phone polls were much more liberal than the BES f2f poll, but the easy-to-reach people in the BES f2f poll were the most conservative and the hard-to-reach the most liberal, so “easy to reach f2f” didn’t resemble the telephone sample at all. Populus theorise that this is a mobile sampling issue, but I think it raises some deeper questions about the assumptions we’ve made about what difficulty of contact in the BES f2f sample can teach us about other samples. I’ve never seen any logical justification as to why people whom it takes multiple attempts to reach face-to-face will necessarily be the same group that is hard to reach online – they could easily be two completely different groups. Perhaps “takes multiple tries to reach face-to-face” is not a suitable proxy for the sort of people phone polls can’t reach either…)



Ipsos MORI have released the EU referendum figures from their monthly political monitor. Topline figures are REMAIN 49%, LEAVE 41%, DK/WNV 10%. Full details are here.

There are quite a few differences in how MORI asked the question this month. Up until now they’ve been asking the referendum question using a split sample, with half the sample getting their long-term tracker on whether Britain should leave the EU, and half getting the actual referendum question, without any squeeze question or similar. This month they’ve switched onto a referendum footing – the only question is now the referendum question mentioning the date, and there’s a squeeze question for people who say they don’t know yet. This means that, while these figures are considerably tighter than MORI’s previous polling (last month they gave REMAIN an 18 point lead), we can’t tell to what extent there’s been a shift in opinion, and to what extent it’s down to asking the question differently.

MORI also asked how likely people were to vote in the referendum. At present they are not factoring this into their topline figures and are still looking into the best way to do it, but if they used the same approach as they do with their general election polling it would have reduced the REMAIN lead to just two points.

It’s worth noting that the big gulf between telephone and online polls on the EU referendum has narrowed significantly. In December and January the average REMAIN lead was twenty points in telephone polls and zero in online polls – a towering gulf between the two modes. Polls this month have averaged a 2 point REMAIN lead in online polls and a 6 point REMAIN lead in phone polls. Even excluding the ORB phone poll that seemed completely out of line with all other telephone polls, the average REMAIN lead across ComRes, MORI and Survation was 9 points. There’s still a significant contrast between online and phone polls on the topic… but a gap of seven points is far, far less of a gulf than a gap of twenty points!

UPDATE: I’ve corrected the original post – MORI are NOT prompting for “Undecided”; it’s still something respondents have to volunteer themselves. The increase in don’t knows is suddenly not so easily explained. Perhaps it’s the effect of mentioning that the referendum isn’t until June that’s making people more willing to say they haven’t decided yet.


Following the MORI poll earlier today, there is also a fresh ComRes voting intention poll and a new Survation EU referendum poll.

ComRes for the Daily Mail is in line with what we’ve seen already in the YouGov, ICM and MORI polls – the Conservative lead has collapsed. Topline figures are CON 37%(-1), LAB 35%(+4), LDEM 7%(-1), UKIP 9%(-3). The poll was conducted Friday to Sunday, at the same time as IDS’s resignation. Tabs are here.

Meanwhile a new Survation EU referendum poll has topline figures of REMAIN 46%(-2), LEAVE 35%(+2), DON’T KNOW 19%(nc). Fieldwork was again at the end of last week (so before the bombings in Belgium) and changes are since February. The poll was conducted by telephone, so in this case the robust Remain lead in telephone polls remains mostly undiminished. Full tabs for that are here.


Ipsos MORI’s monthly poll for the Evening Standard follows the trend we’ve seen in other recent polls of a tightening gap between Conservative and Labour. Topline figures are CON 36%(-3), LAB 34%(+1), LDEM 10%, UKIP 11%, GRN 3%.

They also echo YouGov’s recent polling in showing strongly negative figures for George Osborne. Just after IDS’s resignation YouGov found Osborne’s ratings dropping to 17% good job, 58% bad job. MORI find something very similar in their poll: before the budget they had Osborne’s net approval rating at minus 6, now it has slumped to minus 33 (27% satisfied, 60% dissatisfied). The budget gets a solid thumbs down in the MORI poll: 53% think it is bad for the country, 30% good for the country.

There was also a new ComRes EU referendum poll yesterday, conducted for ITV. Topline voting intention figures were REMAIN 48%, LEAVE 41%, Don’t know 11%. The seven point lead for Remain is the smallest ComRes have so far shown in their telephone polls on the referendum (indeed, apart from the unusual ORB poll earlier this month, it’s the lowest lead any telephone poll has shown for Remain). Full details are here. It will be interesting to see what the EU voting intention figures are in the MORI poll, and whether that big gulf between online and telephone EU polling is narrowing a little.