While the gap between online and telephone polls on the EU referendum has narrowed of late, it is still there, and Populus have put out an interesting paper looking at possible explanations, written by James Kanagasooriam of Populus and Matt Singh of Number Cruncher Politics. The full paper is here.

Matt and James essentially suggest three broad reasons. The first is don’t knows. Most telephone polls don’t prompt people with the option of saying don’t know, but respondents are free to volunteer it. In contrast, in online polls people can only pick from the options presented on the screen, so don’t know has to be offered up front as an option. (Personally, I suspect there is a mode effect as well as a prompting effect on don’t knows: when there is a human interviewer, people may feel a certain social pressure to give an answer – saying don’t know feels somehow unhelpful.)

Populus tested this in two parallel surveys, one online and one by phone, each split in two. The phone survey was split between prompting people just with the options of Remain or Leave, and explicitly including don’t know as an option in the prompt. The online survey had one split offering don’t know as an option, and another with the don’t know option hidden away in smaller font at the bottom of the page (a neat idea to try and simulate not explicitly prompting for an option in an online survey).

  • The phone test had a Remain lead of 11 points without a don’t know option (the way phone polls normally ask), but with an explicit don’t know option it showed only a 3 point Remain lead. Prompting for don’t knows made a difference of eight points in the lead.
  • The online survey had a Leave lead of six points with a don’t know prompt (the way they normally ask), but with the don’t know option hidden down the page it had only a one point Leave lead. Making the don’t know prompt less prominent made a difference of five points in the lead.

The impact here is actually quite chunky, accounting for a fair amount of the difference. Comparing recent phone and online polls the gap is about seven or so points, so if you looked just at the phone experiment here the difference in don’t knows could in theory account for the whole lot! I don’t think that is the case though: things are rarely so simple. Earlier this year there was a much bigger gap, and I suspect there are probably also some issues to do with sample make-up and interviewer effects in the actual answers. In the Populus paper they assume it makes up about a third of a fifteen point gap between phone and online polls; obviously that total gap is smaller now.
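To see why offering or withholding a don’t know option can move the headline lead this much, here is a worked example with purely made-up figures (not Populus’s data): all it takes is for the undecideds to break unevenly when they are nudged into picking a side.

```python
# Made-up illustration of the prompting mechanism (not Populus's data): the same
# underlying sample produces different headline leads depending on whether the
# undecided are offered a don't know option or nudged into picking a side.
firm_remain, firm_leave, undecided = 400, 420, 180   # hypothetical sample of 1,000
undecided_remain_split = 0.60                        # assumed break of undecideds if pushed

def lead_with_dk_option():
    """Undecideds say don't know; the lead is calculated among those giving an answer."""
    decided = firm_remain + firm_leave
    return 100 * (firm_remain - firm_leave) / decided

def lead_without_dk_option():
    """No don't know offered; undecideds pick a side along their assumed leanings."""
    remain = firm_remain + undecided * undecided_remain_split
    leave = firm_leave + undecided * (1 - undecided_remain_split)
    return 100 * (remain - leave) / (remain + leave)

print(f"Don't know offered:  Remain lead {lead_with_dk_option():+.1f} points")
print(f"Don't know withheld: Remain lead {lead_without_dk_option():+.1f} points")
```

On those invented figures, simply withholding the don’t know option turns a small Leave lead into a small Remain lead, a swing of about four points in the headline lead – the same direction as the phone/online pattern above.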

The second thing Populus looked at was attitudinal differences between online and phone samples. The examples looked at here are attitudes towards gender equality, racial equality and national identity. Essentially, people give answers that are more socially liberal in telephone polls than they do in online polls. This is not a new finding – plenty of papers in the past have found these sorts of differences between telephone and online polling, but because attitudinal questions are not directly tested in general elections they are never compared against reality and it is impossible to be certain which are “right”. Neither can we really be confident how much of the difference is down to different types of people being reached by the two approaches, and how much to interviewer effects (are people more comfortable admitting views that may be seen as racist or sexist to a computer screen than to a human interviewer?). It’s probably a mixture of both. What’s important is that how socially liberal people were on these scales correlated with how pro- or anti-EU they were, so to whatever extent the difference is one of sample make-up rather than interviewer effect, it explains another couple of points of difference between EU referendum voting intention in telephone and online polls. The questions that Populus asked had also been used in the face-to-face BES survey: the answers there were in the middle – more socially liberal than online polls, less socially liberal than phone polls. Of course, if there are interviewer effects at play here, face-to-face polling also has a human interviewer.
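To make the sample make-up part of that concrete, here is a minimal sketch of the sort of reweighting exercise involved – entirely made-up data and figures, not Populus’s actual method: take an online sample, weight it so its social liberalism distribution matches a more liberal benchmark (say, a phone or face-to-face survey), and see how the weighted EU figures move.

```python
# Illustrative sketch only (made-up data, not Populus's method): reweight an online sample
# so its social liberalism distribution matches a benchmark survey, then compare the
# weighted and unweighted EU referendum figures.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000

# Hypothetical online sample: attitude band (1 = least socially liberal, 3 = most)
# and EU vote intention, with Remain support rising with social liberalism.
attitude = rng.choice([1, 2, 3], size=n, p=[0.40, 0.35, 0.25])
p_remain = {1: 0.35, 2: 0.50, 3: 0.65}
vi = np.array(["Remain" if rng.random() < p_remain[int(a)] else "Leave" for a in attitude])
online = pd.DataFrame({"attitude": attitude, "eu_vi": vi})

# Hypothetical benchmark attitude distribution (e.g. from a phone or face-to-face survey)
benchmark = {1: 0.30, 2: 0.35, 3: 0.35}

# Cell weight = target share / sample share for each attitude band
sample_share = online["attitude"].value_counts(normalize=True)
online["weight"] = online["attitude"].map(lambda a: benchmark[a] / sample_share[a])

def remain_lead(df, weight_col=None):
    w = df[weight_col] if weight_col else pd.Series(1.0, index=df.index)
    remain_share = w[df["eu_vi"] == "Remain"].sum() / w.sum()
    return 100 * (2 * remain_share - 1)

print(f"Unweighted Remain lead: {remain_lead(online):+.1f} points")
print(f"Reweighted Remain lead: {remain_lead(online, 'weight'):+.1f} points")
```

Because Remain support rises with social liberalism in the made-up data, shifting the attitudinal mix towards the more liberal benchmark nudges the headline figure towards Remain – which is the mechanism Populus describe.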

Populus think these two factors explain most of the difference, but are left with a gap of about 3 points that they can’t readily explain. They float the idea that this could be because online samples have more partisan people who vote down the line (so, for example, online samples have fewer of those odd “UKIP for Remain” voters), when in reality people are more often rather contradictory and random. It’s an interesting possibility, and chimes with my own views about polls containing people who are too politically aware, too partisan. The impact of YouGov adopting sampling and weighting by attention paid to politics last month was mostly to increase don’t knows on questions, but when we were testing it before rollout it did improve the position of Remain relative to Leave on the EU question, normally by two or three points, so that would chime with Populus’s theory.

According to Populus, therefore, the gap comes down partly to don’t knows, partly to the different attitudinal make-up of the samples, and a final chunk to what they think is greater partisanship in online samples. Their estimate is that reality will be somewhere in between the results being shown by online and telephone polls, a little closer to telephone. We shall see.

(A footnote for just the really geeky among you who have paid close attention to the BPC inquiry and the BES team’s posts on the polling error; it is probably too technical for most readers. When comparing the questions on race and gender Populus also broke down the answers in the BES face-to-face survey by how many contact attempts it took to interview people. This is something the BES team and the BPC inquiry team also did when investigating the polling error last May. The inquiries looking at the election polls found that if you took just those people the BES managed to interview on their first or second go, the make-up of the sample was similar to that from phone polls and was too Labour, but people who were trickier to reach were more Conservative. Hence they took “easy for face-to-face interviewers to reach” as a sort of proxy for “people likely to be included in a poll”. In this study Populus did the same for the social liberalism questions and it didn’t work the same way: phone polls were much more liberal than the BES f2f poll, but the easy-to-reach people in the BES f2f poll were the most socially conservative and the hard-to-reach the most socially liberal, so “easy to reach f2f” didn’t resemble the telephone sample at all. Populus theorise that this is a mobile sampling issue, but I think it raises some deeper questions about the assumptions we’ve made about what difficulty of contact in the BES f2f sample can teach us about other samples. I’ve never seen any logical justification as to why people who take multiple attempts to reach face-to-face will necessarily be the same group that is hard to reach online – they could easily be two completely different groups. Perhaps “takes multiple tries to reach face-to-face” is not a suitable proxy for the sort of people phone polls can’t reach either…)


Last year the election polls got it wrong. Since then most pollsters have made only interim changes – ComRes, BMG and YouGov have conducted the biggest overhauls, many others have made only tweaks, and all the companies have said they are continuing to look at further potential changes in the light of the polling review. Because of that I’ve seen many people assume that until those changes are complete many polls probably still overestimate Labour support. While on the face of it that makes sense, I’m not sure it’s true.

The reason the polls were wrong in 2015 seems to be that the samples were wrong. That’s sometimes crudely described as samples including too many Labour voters and too few Conservative voters. This is correct in one sense, but it is perhaps describing the symptom rather than the cause. The truth is, as ever, rather more complicated. Since the polls got it wrong back in 1992 almost all the pollsters have weighted their samples politically (using how people voted at the last election) to try and ensure they don’t contain too many Labour people or too few Conservative people. Up until 2015 this broadly worked.

The pre-election polls were weighted to contain the correct proportions of people who voted Labour in 2010 and who voted Conservative in 2010. The 2015 polls accurately reflected the political make-up of Britain in terms of how people voted at the previous election; what they got wrong was how people voted at the forthcoming election. Logically, therefore, what the polls got wrong was not the people who stuck with the same party, but the proportions of people who changed their vote between the 2010 and 2015 elections. There were too many people who said they’d vote Labour in 2015 but hadn’t in 2010, too many people who voted Tory in 2010 but said they wouldn’t in 2015, and so on.
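To put rough numbers on how past-vote weighting works (the figures below are illustrative, not any pollster’s actual targets or data), the mechanics are simple: each respondent is scaled so that the weighted sample’s recalled 2010 vote matches the real 2010 result. Crucially, that fixes the composition of past vote, but says nothing directly about whether the sample contains the right mix of switchers.

```python
# Minimal sketch of past-vote weighting (illustrative targets and sample, not any
# pollster's real data). Each respondent gets a weight of target share / sample share
# for their recalled 2010 vote, so the weighted sample matches the 2010 result.
from collections import Counter

# Hypothetical raw sample of 1,000: (recalled 2010 vote, current vote intention)
sample = ([("Con", "Con")] * 300 + [("Lab", "Lab")] * 330 +
          [("LD", "Lab")] * 90 + [("LD", "Con")] * 40 +
          [("Con", "UKIP")] * 60 + [("Lab", "UKIP")] * 40 +
          [("Other", "Other")] * 140)

# Approximate GB vote shares at the 2010 election, used as weighting targets
targets = {"Con": 0.37, "Lab": 0.30, "LD": 0.24, "Other": 0.09}

n = len(sample)
past_counts = Counter(past for past, _ in sample)
weights = {past: targets[past] / (count / n) for past, count in past_counts.items()}

# Weighted current vote intention
vi_totals = Counter()
for past, current in sample:
    vi_totals[current] += weights[past]

total = sum(vi_totals.values())
for party, weighted in sorted(vi_totals.items(), key=lambda kv: -kv[1]):
    print(f"{party}: {100 * weighted / total:.1f}%")
```

The point of the exercise is that the weights pin down the 2010 columns; whether the 2010 Lib Dems in the sample then split to Labour in realistic proportions is exactly the sort of thing weighting by past vote cannot fix.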

The reason for this is up for debate. My view is that it’s due to poll samples containing people who are too interested in politics; other evidence has suggested it is people who are too easy to reach (these two explanations could easily be the same thing!). The point of this post isn’t to have that debate, it’s to ask what it tells us about how accurate the polls are now.

The day after an election, how you voted is an extremely strong predictor of how you’d vote in another election held immediately. If you voted Conservative on Thursday, you’d probably do so again on Friday given the chance. Over time events happen and people change their minds and their voting intention, so how you voted last time becomes a weaker and weaker predictor. You also get five years of deaths and five years of new voters entering the electorate, who may or may not vote.

Political weighting is the reason why the polls in Summer 2015 all suddenly showed solid Conservative leads when the same polls had shown the parties neck-and-neck a few months earlier: it was just the switch to weighting to May 2015 recalled vote**. In the last Parliament, polls were probably also pretty much right early on, when people’s 2010 vote correlated well with their current support, but as the Lib Dems collapsed and UKIP rose, scattering and taking support from different parties in different proportions, polls must have gradually become less accurate, ending with the faulty polls of May 2015.

What does it tell us about the polls now? Well, it means that while many polling companies haven’t yet made huge changes since the election, current polls are probably pretty accurate in terms of party support, simply because it is early in the Parliament and party support does not appear to have changed vastly since the election. At this point in time, weighting samples by how people voted in 2015 will probably be enough to produce samples that are pretty representative of the British public.

Equally, it doesn’t automatically follow that we will see the Conservative party surge into a bigger lead as polling companies do make changes, though it does largely depend on the approach different pollsters take (methodology changes to sampling may not make much difference until there are changes in party support; methodology changes to turnout filters or weighting may have a more immediate effect).

Hopefully it means that polls will be broadly accurate for the party political elections in May – the Scottish Parliament, Welsh Assembly and London Mayoral elections (people obviously can and do vote differently in those elections than in Westminster elections, but there will be a strong correlation with how they voted just a year before). The EU referendum is more of a challenge, given it doesn’t correlate so closely with general election voting and will rely upon how well pollsters’ samples represent the British electorate. As the Parliament rolls on, we will obviously have to hope that the changes the pollsters do end up making keep polls accurate all the way through.

(**The only company that doesn’t weight politically is Ipsos MORI. Quite how MORI’s polls shifted from neck-and-neck in May 2015 to Tory leads afterwards I do not know; they have made only a relatively minor methodological change to their turnout filter. Looking at the data tables, it appears to be something to do with the sampling – ICM, ComRes and MORI all sample by dialling random telephone numbers, but the raw data they get before weighting is strikingly different. Averaging across the last six surveys, the raw samples that ComRes and ICM get before they weight their data have an equal number of people saying they voted Labour in 2015 and saying they voted Tory in 2015. MORI’s raw data has four percent more people saying they voted Conservative than saying they voted Labour – a much less skewed raw sample. Perhaps MORI have done something clever with their quotas or their script, but it’s clearly working.)



Today the polling inquiry under Pat Sturgis presented its initial findings on what caused the polling error. Pat himself, Jouni Kuha and Ben Lauderdale all went through their findings at a meeting at the Royal Statistical Society – the full presentation is up here. As we saw in the overnight press release, the main finding was that unrepresentative samples were to blame, but today’s meeting put some meat on those bones. Just to be clear, when the team said unrepresentative samples they didn’t just mean the sampling part of the process, they meant the samples pollsters end up with as a result of their sampling AND their weighting: it’s all interconnected. With that out of the way, here’s what they said.

Things that did NOT go wrong

The team started by quickly going through some areas that they have ruled out as significant contributors to the error. Any of these could, of course, have had some minor impact, but if they did it was only minor. The team investigated and dismissed postal votes, falling voter registration, overseas voters and question wording/ordering as causes of the error.

They also dismissed some issues that had been more seriously suggested. The first was differential turnout reporting (i.e. Labour people overestimating their likelihood to vote more than Conservative people): in vote validation studies the inquiry team did not find evidence to support this, suggesting that if it was an issue it was too small to be important. The second was mode effects – ultimately whether a survey was done online or by telephone made no difference to its final accuracy. This finding met with some surprise from the audience, given there were more phone polls showing Tory leads than online ones. Ben Lauderdale of the inquiry team suggested that was probably because phone polls had smaller sample sizes and hence more volatility, so they spat out more unusual results… but the average lead in online polls and the average lead in telephone polls were not that different, especially in the final polls.
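As a rough illustration of the sample-size point (textbook simple random sampling formulas, ignoring the design effects from weighting, which make real polls noisier still), the standard error of a lead shrinks with the square root of the sample size:

```python
# Rough sketch: standard error of a two-party lead under simple random sampling,
# ignoring design effects from weighting and quotas (which inflate it further).
from math import sqrt

def lead_standard_error(p1, p2, n):
    """Standard error of the lead (p1 - p2) in a sample of size n."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return sqrt(variance)

for n in (500, 1000, 2000):
    se = lead_standard_error(0.34, 0.33, n)
    print(f"n={n}: SE of lead ≈ {100 * se:.1f} points, 95% interval ≈ ±{196 * se:.1f} points")
```

On those assumptions a poll of around 500 people has roughly twice the noise on its lead as one of around 2,000, which is enough to throw out the occasional eye-catching result without the average being any different.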

On late swing the inquiry said the evidence was contradictory. Six companies had conducted re-contact surveys, going back to people who had completed pre-election surveys to see how they actually voted. Some showed movement, some did not, but on average they showed a movement of only 0.6% to the Tories between the final polls and the result, so late swing can only have made a minor contribution at most. People deliberately misreporting their voting intention to pollsters was also dismissed – as Pat Sturgis put it, if those people had told the truth after the election it would have shown up as late swing (it did not), and if they had kept on lying it should have affected the exit poll, BES and BSA as well (it did not).

Unrepresentative Samples

With all those things ruled out as major contributors, the team were left with unrepresentative samples as the most viable explanation for the error. In terms of positive evidence for this they looked at the differences between the BES and BSA samples (done by probability sampling) and the pre-election polls (done by variations on quota sampling). This wasn’t a recommendation to use probability sampling (while they didn’t make recommendations at this stage, Pat did rule out any recommendation that polling switch to probability sampling wholesale, recognising that the cost and timing were wholly impractical, and that the BES and BSA had been wrong in their own ways rather than being perfect solutions).

The two probability-based surveys were, however, useful as comparisons to pick up possible shortcomings in the samples. For example, the pre-election polls that provided precise age data for respondents all had age skews within age bands: specifically, within the oldest age band there were too many people in their 60s and not enough in their 70s and 80s. The team agreed with the suggestion that samples were too politically engaged – in their investigation they looked at likelihood to vote, finding most polls had samples that were too likely to vote, and didn’t show the correct contrast between young and old turnout. They also found samples didn’t have the correct proportions of postal voters among young and old respondents. They didn’t suggest all of these errors were necessarily related to why the figures were wrong, but that they were illustrations of the samples not being properly representative – and that ultimately led to getting the election wrong.

Herding

Finally the team spent a long time going through the data on herding – that is, polls producing figures that were closer to each other than random variation suggests they should be. On the face of it the narrowing looks striking – the penultimate polls had a spread of about seven points between the poll with the biggest Tory lead and the poll with the biggest Labour lead. In the final polls the spread was just three points, from a one point Tory lead to a two point Labour lead.

Analysing the polls earlier in the campaign, the spread between different polls was almost exactly what you would expect from a stratified sample (what the inquiry team considered the closest approximation to the politically weighted samples used by the polls). In the last fortnight the spread narrowed, though, with the final polls all close together. The reason for this seems to be methodological change – several of the polling companies made adjustments to their methods during the campaign or for their final polls (something that has been typical at past elections, as companies often add extra adjustments to their final polls). Without those changes the polls would have been more variable… and less accurate. In other words, some pollsters did make changes to their methodology at the end of the campaign which meant the figures were clustered together, but they were open about the methods they were using and it made the figures LESS Labour, not more Labour. Pollsters may or may not, consciously or subconsciously, have been influenced in the methodological decisions they made by what other polls were showing. However, from the inquiry’s analysis we can be confident that any herding did not contribute to the polling error – quite the opposite: all those pollsters who changed methodology during the campaign were more accurate using their new methods.
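For a sense of scale, here is a quick simulation of my own (illustrative shares and sample sizes, not the inquiry’s stratified-sample calculation) of how much spread you would expect across nine independent final polls of around 1,000 people each:

```python
# Quick simulation (my illustration, not the inquiry's calculation): how much spread in the
# Con-Lab lead would we expect across nine independent final polls of ~1,000 people each,
# if every poll sampled from the same underlying population?
import numpy as np

rng = np.random.default_rng(2015)
true_shares = [0.34, 0.34, 0.32]   # Con, Lab, everyone else (illustrative figures)
n_polls, n_respondents, n_sims = 9, 1000, 10_000

spreads = []
for _ in range(n_sims):
    counts = rng.multinomial(n_respondents, true_shares, size=n_polls)
    leads = 100 * (counts[:, 0] - counts[:, 1]) / n_respondents
    spreads.append(leads.max() - leads.min())

spreads = np.array(spreads)
print(f"Expected spread between the most Con and most Lab poll: {spreads.mean():.1f} points")
print(f"Chance of a spread as small as 3 points: {np.mean(spreads <= 3):.1%}")
```

On these made-up inputs the expected gap between the most Conservative and the most Labour poll is around seven or eight points, and a spread as tight as the three points actually observed would be very unlikely by chance alone – which is why the clustering needed explaining, even though it turned out to come from openly reported methodological changes rather than anything underhand.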

For completeness, the inquiry also took everyone’s final data and weighted it using the same methods – they found a normal level of variation. They also took everyone’s raw data and applied the weighting and filtering the pollsters said they had used to see if they could recreate the same figures – the figures came out the same, suggesting there was no sharp practice going on.

So what next?

Today’s report wasn’t a huge surprise – as I wrote at the weekend, most of the analysis so far has pointed to unrepresentative samples as the root cause, and the official verdict is in line with that. In terms of the information released today there were no recommendations, it was just about the diagnosis – the inquiry will be submitting its full written report in March. That will have some recommendations on methodology – though no silver bullet – but with the diagnosis confirmed the pollsters can start working on their own solutions. Many of the companies released statements today welcoming the findings and agreeing with the cause of the error; we shall see what different ways they come up with to solve it.


The MRS/BPC Polling Inquiry under Pat Sturgis is due to release its initial findings at a meeting this afternoon (and a final written report in March). While I expect much more detail this afternoon, they’ve press released the headline findings overnight. As I suggested here, they’ve pointed to unrepresentative samples as the main cause of the polling error back in May. There is not yet any further detail beyond “too much Labour, not enough Conservative”; we’ll have to wait until this afternoon to find out what they’ve said about the exact problems with samples and why they may have been wrong.

The inquiry conclude that other potential causes of error (such as respondents misreporting their intentions (“shy Tories”), unregistered voters, and question wording and ordering) made at most a modest contribution to the error. They say the evidence on late swing was inconclusive, but that even if it did happen, it only accounted for a small proportion of the error. The inquiry also say they could not rule out herding.

The overnight press release doesn’t hint at any conclusions or recommendations about how polls are reported or communicated to the press and the public, but again, perhaps there will be more on that this afternoon. Till then…


On Tuesday the BPC/MRS inquiry into why the polls went wrong publishes its first findings. Here’s what you need to know in advance.

The main thing the inquiry is looking at is why the polls were wrong. There are, essentially, three broad categories of problem that could have happened. First, there could have been a late swing – the polls could actually have been perfectly accurate at the time, but people changed their minds afterwards. Secondly, respondents could have given inaccurate answers – people could have said they’d vote and not done so, said they’d vote Labour but actually voted Tory, and so on. Thirdly, the samples themselves could have been wrong – people responding to polls were honest and didn’t change their minds, but the pollsters were interviewing the wrong mix of people to begin with.

Some potential problems can straddle those groups. For example, polls could be wrong because of turnout, but that could be because pollsters incorrectly identified which people would vote or because polls interviewed people who are too likely to vote (or a combination of the two). You end up with the same result, but the root causes are different and the solutions would be different.

Last year the BPC held a meeting at which the pollsters gave their initial thoughts on what went wrong. I wrote about it here, and the actual presentations from the pollsters are online here. Since then YouGov have also published a report (writeup, report), the BES team have published their thoughts based on the BES data (write up, report) and last week John Curtice also published his thoughts.

The most common theme through all these reports so far is that sampling is to blame. Late swing has been dismissed as a major cause by most of those who’ve looked at the data. Respondents giving inaccurate answers doesn’t look like it will be a major factor in terms of who people said they would vote for (it’s hard to prove anyway, unless people suddenly start being honest after the event, but what evidence there is doesn’t seem to back it up), though it could potentially be a contributory factor in how accurately people reported whether they would vote. The major factor looks likely to be sampling – pollsters interviewing people who are too easy to reach, too interested in politics and too engaged with the political process, and – consequently – getting the differential turnout between young and old wrong.

Because of the very different approaches pollsters use I doubt the inquiry will be overly prescriptive in terms of recommended solutions. I doubt they’ll say pollsters should all use one method, and the solutions for online polls may not be the same as the solutions for telephone polls. Assuming the report comes down to something like the polls getting it wrong because they had samples made up of people who were too easily contactable, too politically engaged and too likely to vote, I see two broad approaches to getting it right. One is to change the sampling and weighting in a way that gets more unengaged people, perhaps ringing people back more in phone polls, or putting some measure of political attention or engagement into sampling and weighting schemes. The other is to use post-collection filters, weights or models to get to a more realistic pattern of turnout. We shall see what the inquiry comes up with as the cause, and how far they go in recommending specific solutions.
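To make the second approach concrete, here is a minimal sketch with invented respondents and invented turnout probabilities: instead of taking stated likelihood to vote at face value, weight each respondent by a modelled probability of actually voting.

```python
# Minimal sketch of a turnout adjustment (invented figures): instead of counting everyone
# who says they are certain to vote, weight each respondent by an assumed probability of
# actually voting, here varying by age group to reflect real differential turnout.
respondents = [
    # (age_group, stated_likelihood_0_to_10, vote_intention)
    ("18-34", 10, "Lab"), ("18-34", 8, "Lab"), ("18-34", 9, "Con"), ("18-34", 10, "Lab"),
    ("35-54", 10, "Con"), ("35-54", 10, "Lab"), ("35-54", 7, "Con"), ("35-54", 9, "Lab"),
    ("55+", 10, "Con"), ("55+", 10, "Con"), ("55+", 10, "Lab"), ("55+", 9, "Con"),
]

# Assumed turnout probabilities by age group (illustrative, loosely echoing the pattern
# that older voters are much more likely to actually vote than younger ones).
turnout_model = {"18-34": 0.45, "35-54": 0.65, "55+": 0.80}

def shares(weighted_votes):
    total = sum(weighted_votes.values())
    return {party: round(100 * w / total, 1) for party, w in weighted_votes.items()}

# Approach A: take stated likelihood at face value (weight = likelihood / 10)
stated = {}
for age, likelihood, vi in respondents:
    stated[vi] = stated.get(vi, 0) + likelihood / 10

# Approach B: weight by the modelled turnout probability for the respondent's age group
modelled = {}
for age, likelihood, vi in respondents:
    modelled[vi] = modelled.get(vi, 0) + turnout_model[age]

print("Stated-likelihood weighting:", shares(stated))
print("Modelled-turnout weighting: ", shares(modelled))
```

Because the younger (and, in this made-up sample, more Labour) respondents say they are nearly as certain to vote as the older ones but are assumed to be much less likely to actually do so, the modelled turnout weighting shifts the headline figures towards the Conservatives – the same direction the 2015 error pointed.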

While the central plank of the inquiry’s report will presumably be what went wrong, there were other tasks within the inquiry’s terms of reference. They were also asked to look at the issue of “herding” – that is, pollsters artificially producing figures that are too close to one another. To some degree a certain amount of convergence is natural in the run-up to an election, given that some of the differences between pollsters come down to different ways of treating things like don’t knows. As the public make their minds up, these will cause less of a difference (e.g. if one difference between two pollsters is how they deal with don’t knows, it will make more of a difference when 20% of people say don’t know than when 10% do). I think there may also be a certain sort of ratchet effect – pollsters are only human, and perhaps we scrutinise our methods more if we’re showing something different from everyone else. The question for the inquiry is whether there was anything more than that. Any deliberate fingers on the scales to make their polls match?
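To illustrate that convergence point with made-up numbers: take two pollsters who handle don’t knows differently – one simply excludes them, the other reallocates half of them back to the party they recall voting for (a crude stand-in for the kind of reallocation some firms use) – and watch how the gap between their headline leads depends on how many don’t knows there are.

```python
# Made-up illustration: two pollsters handle don't knows differently. One excludes them;
# the other reallocates half of them back to the party they recall voting for (a crude
# stand-in for the reallocation methods some firms use). The gap between the two headline
# leads shrinks as the share of don't knows falls.
def headline_lead(con, lab, dk_con_leaners, dk_lab_leaners, reallocate_fraction):
    """Con-Lab lead (points) after reallocating a fraction of don't knows by past vote."""
    con_total = con + reallocate_fraction * dk_con_leaners
    lab_total = lab + reallocate_fraction * dk_lab_leaners
    decided = con_total + lab_total
    return 100 * (con_total - lab_total) / decided

for dk_share in (0.20, 0.10):
    # Assume the decided split evenly and the don't knows lean 60/40 Con by past vote.
    con = lab = (1 - dk_share) / 2
    dk_con, dk_lab = 0.6 * dk_share, 0.4 * dk_share
    lead_exclude = headline_lead(con, lab, dk_con, dk_lab, reallocate_fraction=0.0)
    lead_realloc = headline_lead(con, lab, dk_con, dk_lab, reallocate_fraction=0.5)
    print(f"DK share {dk_share:.0%}: exclude DKs lead {lead_exclude:+.1f}, "
          f"reallocate half lead {lead_realloc:+.1f}, difference {lead_realloc - lead_exclude:+.1f}")
```

Halving the don’t know share roughly halves the difference the two treatments produce, so some convergence between pollsters as polling day approaches is exactly what you would expect even without any herding.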

Finally the inquiry have been asked about how polls are communicated to the commentariat and the public – what sort of information is provided and what guidance is given on how they should be understood and reported. Depending on what the inquiry find and recommend, this area could actually be quite important for how polls are released and reported in the future. Again, we shall see what they come up with.