Yesterday the British Polling Council held an event on how the polls have changed since 2015. This included collecting data from all the companies on what they've done to correct the error and what they are now doing differently – all those summaries are collected here.

In looking at what's changed it's probably best to start with what actually went wrong and what problem the pollsters are trying to solve. As all readers will know, the polls in 2015 overstated Labour support and understated Conservative support. The BPC/MRS inquiry under Pat Sturgis concluded this was down to unrepresentative samples.

Specifically, it looked as if polls had too many younger people who were too engaged and too interested in politics. The effect of this was that while in reality there was a big difference between the high turnout among old people and the low turnout among young people, among the sort of people who took part in polls this gap was too small. In short, the sort of young people who took part in polls went out and voted Labour; the sort of young people who weren't interested and stayed at home didn't take part in polls either.

So, what have polling companies done to correct the problems? There is a summary for each individual company here.

There have been a wide variety of changes (including YouGov interlocking past vote & region, ICM changing how they reallocate don’t knows, ICM and ComRes now both doing only online polls during the campaign). However, the core changes seem to boil down to two approaches: some companies have focused on improving the sample itself, trying to include more people who aren’t interested in politics, who are less well educated and don’t usually vote. Other companies have focused on correcting the problems caused by less than representative samples, changing their turnout model so it is based more on demographics, and forcing it to more accurately reflect turnout patterns in the real world. Some companies have done a bit of both.

Changes to make samples less politically engaged…

  • ICM and YouGov have both added a weight by respondents' level of interest or attention to politics, based upon the British Election Study probability survey (a rough sketch of this sort of weighting is below). YouGov have also added weights by level of educational qualification.
  • Ipsos MORI haven’t added political interest weights directly, but have added education weights and newspaper readership weights, which correlate with political interest.
  • Kantar have added education weighting, and also weight turnout down to the level they project it to be, as a way of reducing the overall level of political engagement in their sample.
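
To make the mechanics concrete, here is a minimal sketch in Python of what weighting by political attention involves. All the bands, target shares and respondents are invented for illustration – they are not the actual BES figures or any company's real weighting scheme.

```python
# A toy sketch of weighting a sample to a target distribution of political
# attention. The bands, target shares and respondents below are invented
# placeholders, not the real BES figures or any pollster's actual scheme.

sample = [
    {"attention": "high", "vote": "Lab"},
    {"attention": "high", "vote": "Con"},
    {"attention": "high", "vote": "Lab"},
    {"attention": "medium", "vote": "Con"},
    {"attention": "low", "vote": "Con"},
]

# Hypothetical population targets (the sort of thing a probability survey
# like the BES would supply).
targets = {"high": 0.3, "medium": 0.4, "low": 0.3}

# Shares actually observed in the sample (here, far too many high-attention
# respondents).
n = len(sample)
observed = {band: sum(r["attention"] == band for r in sample) / n for band in targets}

# Weight = target share / observed share, so over-represented groups are
# weighted down and under-represented groups weighted up.
for r in sample:
    r["weight"] = targets[r["attention"]] / observed[r["attention"]]

# Weighted vote shares.
total = sum(r["weight"] for r in sample)
for party in ("Con", "Lab"):
    share = sum(r["weight"] for r in sample if r["vote"] == party) / total
    print(party, f"{share:.0%}")
```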

Changes to base turnout on demographics…

  • ComRes have changed their turnout model, so it is based on respondents' demographics rather than on how likely they claim they are to vote. The effect of this is essentially to downweight people who are younger and more working class, on the assumption that the pattern of turnout we've seen at past elections remains pretty steady. ICM have a method that seems very similar in its aim (I'm not sure of the technicalities) – weighting the data so that the pattern of turnout by age & social grade is the same as in 2015 (there is a rough sketch of this sort of model after this list).
  • Kantar (TNS) have a turnout model that is partly based on respondents' age (so again, assuming that younger people are less likely to vote) and partly on their self-reported likelihood to vote.
  • ORB weight their data by education and age so that it matches not the electorate as a whole, but the profile of people in the 2015 British Election Study who actually voted (they also use the usual self-reported likelihood-to-vote weighting on top of this).
  • Opinium, MORI and YouGov still base their turnout models on people's answers rather than their demographics, but they have all made changes. YouGov and MORI now weight down people who didn't vote in the past; Opinium downweight people who say they will vote for a party but disapprove of its leader.
  • Panelbase and Survation haven't made any radical changes since 2015, but Panelbase say they are considering using BES data to estimate likelihood to vote in their final poll (which sounds to me as if they are considering something along the lines of what ICM are doing with their turnout model).
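
For the demographic turnout models described above, the basic idea is to attach an assumed probability of voting to each respondent based on who they are (age, and sometimes social grade), rather than taking their word for it. A minimal sketch, using entirely invented turnout probabilities rather than any company's real model:

```python
# Sketch of a demographic turnout model: each respondent's weight is
# multiplied by an assumed probability of voting for their age band.
# The probabilities and respondents are invented placeholders.

turnout_by_age = {"18-34": 0.45, "35-54": 0.65, "55+": 0.80}

respondents = [
    {"age": "18-34", "vote": "Lab", "weight": 1.0},
    {"age": "18-34", "vote": "Lab", "weight": 1.0},
    {"age": "35-54", "vote": "Con", "weight": 1.0},
    {"age": "55+", "vote": "Con", "weight": 1.0},
]

def topline(rows):
    total = sum(r["weight"] for r in rows)
    return {p: round(100 * sum(r["weight"] for r in rows if r["vote"] == p) / total)
            for p in ("Con", "Lab")}

print("Before turnout model:", topline(respondents))   # an even 50/50 split

# Downweight the groups that are assumed less likely to vote.
for r in respondents:
    r["weight"] *= turnout_by_age[r["age"]]

print("After turnout model:", topline(respondents))    # moves towards the Conservatives
```

The assumption doing all the work is the turnout table itself – which is exactly why these models go wrong if real-world turnout patterns shift.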

In terms of actual outcomes, the pollsters who have adopted demographic turnout-models (ComRes, ICM and Kantar) tend to show larger Conservative leads than companies who have tried to address the problem only through sampling and weighting changes. We cannot really tell which is more likely to be right until June 8th. In short, for companies who have concentrated only on making samples more representative, the risk is that it hasn’t worked well enough, and that there are still too many of the sort of young engaged voters who are attracted to Jeremy Corbyn in their samples. For companies who have instead concentrated on demographic-based turnout models, the risk is that the pattern of turnout in 2017 differs from that in 2015, and that Jeremy Corbyn’s Labour really does manage to get more young people to come out to vote than Ed Miliband did. We will see what happens and, I expect, the industry will learn from whatever is seen to work this time round.


Two new voting intention polls today. The first by Survation for Good Morning Britain had topline figures of CON 48%(+1), LAB 30%(nc), LDEM 8%(+1), UKIP 4%(nc). Clearly there is no substantial change since their poll a week ago. Fieldwork was conducted on Friday and Saturday, after the leak of the Labour manifesto, and doesn’t show any sign of any impact.

The second was the weekly ICM poll for the Guardian. Topline figures there are CON 48%(-1), LAB 28%(+1), LDEM 10%(+1), UKIP 6%(nc). As many have noted, ICM are now, along with TNS, one of only two pollsters still showing Labour support below thirty points (MORI's last poll did the same, but that was several weeks ago, when everyone showed Labour that low). It's not that ICM haven't shown Labour support rising a little – they have been showing Labour recovering slightly, it's just that they've been doing so at slightly lower figures: at the start of the campaign ICM had Labour at 25-26% and they now have them at 27-28%.

This seems to be a consistent methodological difference. The methodological differences between pollsters are complicated and various, and some of them work in opposite directions (ICM, for example, also reallocate don't knows in a way that helps Labour), but the most obvious one at the moment is probably the approach to turnout. Traditionally British pollsters have accounted for people's likelihood to vote by getting respondents to estimate it themselves – put crudely, they ask people to say how likely they are to vote on a scale of 0 to 10, and then either weight them accordingly (someone who says they are 8/10 likely to vote is only counted as 8/10ths of someone who says 10/10), or apply a cut-off, ignoring people who rate their chances below a certain threshold (5/10, 9/10 or even 10/10, depending on the company). Since 2015 several companies, including YouGov and Ipsos MORI, have also factored in whether people say they have voted in the past, weighting down past non-voters.
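
Put into code, the two traditional approaches described above look something like this – a toy sketch with invented respondents and scores, not any pollster's actual implementation:

```python
# Two traditional ways of handling self-reported likelihood to vote
# (a 0-10 scale). The respondents and scores are invented for illustration.

respondents = [
    {"vote": "Con", "likelihood": 10},
    {"vote": "Con", "likelihood": 9},
    {"vote": "Lab", "likelihood": 10},
    {"vote": "Lab", "likelihood": 8},
    {"vote": "Lab", "likelihood": 4},
]

def shares(rows, weight_fn):
    weighted = [(r["vote"], weight_fn(r)) for r in rows]
    total = sum(w for _, w in weighted)
    return {p: round(100 * sum(w for v, w in weighted if v == p) / total)
            for p in ("Con", "Lab")}

# Approach 1: weight each respondent by likelihood / 10.
print(shares(respondents, lambda r: r["likelihood"] / 10))

# Approach 2: a cut-off, counting only those who say 9/10 or 10/10.
print(shares(respondents, lambda r: 1.0 if r["likelihood"] >= 9 else 0.0))
```

On these made-up numbers the weighting approach gives a modest Labour lead, while the stricter cut-off flips it the other way – which is why the choice of turnout filter matters so much to the headline figures.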

ICM and ComRes have adopted new approaches. Rather than basing their turnout model on people's self-reported likelihood to vote, they base it on demographics – estimating respondents' likelihood to vote from their age and social grade – the assumption being that younger people and working-class people will remain less likely to vote than older, more middle-class people. This tends to make the results substantially more Conservative and less Labour, meaning that ICM and ComRes tend to produce some of the biggest Tory leads.

Full tabs for the ICM poll are here and the Survation poll here.



Polling myths

Whenever a poll goes up that shows bad news for someone you get the same sort of comments on social media. As I write this piece in May 2017 comments like these generally come from Jeremy Corbyn supporters, but that’s just the political weather at this moment in time. When the polls show Labour ahead you get almost exactly the same comments from Conservative supporters, when UKIP are doing badly you get them from UKIP supporters, when the Lib Dems are trailing you get them from Lib Dem supporters.

There are elements of opinion polling that are counter-intuitive, and many of these myths will sound perfectly convincing to people who aren't versed in how polls work. This post isn't aimed at the hardcore conspiracists who are beyond persuasion – if you are truly convinced that polls are all a malevolent plot of some sort there is nothing I'll be able to do to convince you. Neither is it really aimed at those who already know such arguments are nonsense: it is aimed at those who don't really want to believe what the polls are saying, who see lots of people on social media offering comforting-sounding reasons why they can be ignored, but who are left wondering, "Is that really true, or is it rather too convenient an excuse for waving away an uncomfortable truth…"

1) They only asked 1000 people out of 40 million. That’s not enough

This question has been around for as long as polling itself. George Gallup, the trailblazer of modern polling, used to answer it by saying that it wasn't necessary to eat a whole bowl of soup to know whether or not it was too salty: providing it had been stirred, a single spoonful was enough. The mention of stirring wasn't just Gallup being poetic, it's vital. Taking a single spoonful from the top of a bowl of soup might not work (that could be the spot where someone just salted it), but stirring the soup means that spoonful is representative of the whole bowl.

What makes a poll accurate is not the size of the sample, it is how representative that sample is. You could have a huge sample that was completely meaningless. Imagine, for example, that you did a poll of 1,000,000 over-65s. It would indeed be a huge sample, but it would be very skewed towards the Tories and Brexit. What makes a poll meaningful or not is whether it is representative of the country. Does it have the correct proportions of men and women? Old and young? Middle class and working class? Graduates and non-graduates? If the sample reflects British society as a whole in all these ways, then it should reflect it in terms of political opinion too. A poll of 1,000 people is quite enough to get a representative sample.
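
A quick simulation makes the same point as the soup analogy. The "population" and its opinions below are invented purely for illustration: a representative sample of 1,000 lands within a couple of points of the true figure, while a vastly bigger sample drawn only from one skewed group does not.

```python
import random

random.seed(0)

# An invented population of one million adults: 25% are over-65s, who hold
# a given view at a much higher rate than everyone else.
over_65s = [1] * 175_000 + [0] * 75_000        # 70% agree
everyone_else = [1] * 300_000 + [0] * 450_000  # 40% agree
population = over_65s + everyone_else

true_share = sum(population) / len(population)        # 47.5%

rep_sample = random.sample(population, 1_000)         # small but representative
skewed_sample = random.sample(over_65s, 100_000)      # huge but skewed

print(f"True share:            {true_share:.1%}")
print(f"Representative 1,000:  {sum(rep_sample) / len(rep_sample):.1%}")
print(f"Skewed 100,000:        {sum(skewed_sample) / len(skewed_sample):.1%}")
```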

The classic example of this was at the very birth of modern polling – in the 1936 US Presidential election a magazine called the Literary Digest did a survey of over two million people, drawn from magazine subscribers, telephone directories and so forth. It showed Alf Landon would win the Presidential election. The then newcomer George Gallup did a far, far smaller poll, properly sampled by state, age, gender and so on. He correctly showed a landslide for Roosevelt. A poll with a sample skewed towards people wealthy enough to have phones and magazines in depression-era America was worthless, despite having two million respondents.

2) Who do they ask? I’ve never been asked to take part in a poll!

Sometimes this is worked up to “…and neither has anyone I’ve met”, which does raise the question of whether the first thing these people do upon being introduced to a new person is to ask if MORI have ever rung them. That aside, it’s a reasonable question. If you’ve never been polled and the polls seem to disagree with your experience, where do all these answers come from?

The simple answer is that pollsters obtain their samples either by dialling randomly generated telephone numbers or by contacting people who are members of internet panels. Back when polls were mostly conducted by telephone, the reason you had never been polled was simple maths – there were about forty million adults in Britain, and about fifty or so voting intention polls of a thousand people each conducted every year. Therefore in any given year you had about a 0.1% chance of being invited to take part in a poll.
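
As a back-of-the-envelope check on that 0.1% figure, using the rough numbers in the paragraph above:

```python
adults = 40_000_000        # rough adult population of Britain
polls_per_year = 50        # roughly, phone voting intention polls per year
sample_size = 1_000

interviews_per_year = polls_per_year * sample_size    # 50,000 interviews
chance = interviews_per_year / adults
print(f"{chance:.3%}")                                # about 0.125%, i.e. roughly 1 in 800
```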

These days most opinion polls are conducted using online panels, but even if you are a member of a panel, your chances of being invited to a political poll are still relatively low. Most panels have tens of thousands of people (or for the better known companies, hundreds of thousands of people) and 95% of surveys are about commercial stuff like brands, pensions, grocery shopping and so on. You could still be waiting some time to be invited to a political one.

3) But nobody I know is voting for X!

We tend to know and socialise with people who are quite like ourselves. Our social circles will tend to be people who live in the same sort of area as us, probably people who have a similar sort of social status, a similar age. You probably have a fair amount in common with your friends or they wouldn’t be your friends. Hence people we know are more likely than the average person to agree with us (and even when they don’t, they won’t necessarily tell us; not everyone relishes a political argument). On social media it’s even worse – a large number of studies have shown that we tend to follow more people we agree with, producing self-reinforcing bubbles of opinion.

During the Labour leadership contest almost every one of my friends who is a member of the Labour party was voting for Liz Kendall. Yet the reality was that they were all from a tiny minority of 4.5% – it’s just that the Labour party members I knew all happened to be Blairite professionals working in politics in central London. Luckily I had proper polling data that was genuinely reflective of the whole of the Labour party, so I knew that Jeremy Corbyn was in fact in the lead.

In contrast to the typical friendship group, opinion poll samples are designed so that they reflect the whole population and don't fall into those traps. They will have the correct balance of people from all across the country, the correct age range, the correct balance of social class and past vote and so on. Perhaps there are people out there who, by some freak coincidence, have a circle of acquaintances who form a perfectly representative sample of the whole British public, but I doubt there are very many.

4) Pollsters deliberately don’t ask Labour/Conservative supporters

In so far as there is any rationale behind the belief, it's normally based upon the perception that someone said they were going to vote for X in a poll and was never asked again. As we've seen above, it's a lot more likely that the reason for this is simply that it's relatively rare to be invited to a political poll at all. If you've been asked once, the chances are you're not going to be asked again soon whatever answers you gave.

Under the British Polling Council rules polling companies are required to publish the details of their samples – who was interviewed, what the sample was weighted by and so on. These days almost every company uses some form of political sampling or weighting to ensure that their samples are politically representative. In reality, then, pollsters deliberately include a specific proportion of 2015 Labour voters in their polls – generally the proportion who actually did vote Labour in 2015. Pollsters are required to report these figures in their tables, or to provide them on request. Hence, if you look at last weekend's Opinium poll you'll find that 31% of people in the poll who voted in 2015 voted Labour, the proportion that actually did; if you look at the ICM poll you'll find that 31% of the people who voted at the last election say they voted Labour, the proportion that actually did; and so on with every other company.

5) Pollsters are biased, and fix their figures

Again, this is an accusation as old as polling itself – if you don't like the message, say the person making it is biased. It's made easier by the fact that a lot of people working in political polling do have a background in politics, so if you want to look for someone to build a conspiracy theory upon, you don't need to look far. Over the years I think we've been accused of being biased towards and against every party at one time or another – when Labour were usually ahead in the polls YouGov used to be accused of bias because Peter Kellner was its president; when the Conservatives were ahead different people accused us of being biased because Stephan Shakespeare was the CEO. The reality is, of course, that polling companies are made up of lots of people with diverse political views (which is, in fact, a great benefit when writing questions – you can get the opinion of colleagues whose views differ from your own when making sure things are fair and balanced).

The idea that polling companies would bias their results towards a particular party doesn't really chime with the economics of the business or the self-interest of the companies and those who run them. Because political polls are by far the most visible output of a market research company, there is a common misapprehension that they bring in lots of money. They do not. Political polling brings in very little money and is often done as a loss-leader, a way for companies to advertise their wares to the commercial clients who spend serious money on research into brand perceptions, buying decisions and other consumer surveys. Voting intention polls are one of the very few measures of opinion that get checked against reality – they are done almost entirely as a way for a company to (a) get its name known and (b) demonstrate that its samples can accurately measure public opinion and predict behaviour. Getting elections wrong, however, risks a huge financial cost to market research companies through reputational damage and, therefore, to those running them. It would be downright perverse to deliberately get those polls wrong.

6) Polls always get it wrong

If the idea that polling companies would ruin themselves by deliberately getting things wrong is absurd, the idea that polls can get it wrong through poor design is sadly true: polls obviously can get it wrong. Famously they did so at the 2015 general election. Some polls also got Brexit wrong, though the picture is more mixed than some seem to think (most of the campaign polls on Brexit actually showed Leave ahead). Polls tend to get it right a lot more often than not, though – even in recent years, when their record is supposed to have been so bad, the polls were broadly accurate on the London mayoral election, the Scottish Parliamentary election, the Welsh Assembly election and both of the Labour party leadership elections.

Nevertheless, it is obviously true to say that polls can be wrong. So what's the likelihood that this election will be one of those occasions? Following the errors of the 2015 general election the British Polling Council and Market Research Society set up an independent inquiry into the polling error and what caused it, under the leadership of Professor Pat Sturgis at Southampton University. The full report is here, and if you want to understand how polling works and what can go wrong with it, it is worth putting aside some time to read. The extremely short version, however, is that the polls in 2015 weren't getting samples that were representative enough of the general public – people who agreed to take part in a phone poll, or to join an internet panel, weren't quite normal; they were too interested in politics, too engaged, too likely to vote.

Since then polling companies have made changes to try and address that problem. Different companies have taken different approaches, but the most significant are a mix of new controls on samples by education and interest in politics, and changes to turnout models. We obviously won't know until the election has finished whether these have worked or not.

So in that context, how does one judge current polls? Well, there are two things worth noting. The first is that while polls have sometimes been wrong in the past, their error has not been evenly distributed. They have not been just as likely to underestimate Labour as they have been to overestimate Labour: polling error has almost always overstated Labour support. If the polls don’t get it right, then all previous experience suggests it will be because they have shown Labour support as too *high*. Theoretically polls could have tried too hard to correct the problems of 2015 and be overstating Conservative support, but given the scale of the error in 2015 and the fact that some companies have made fairly modest adjustments, that seems unlikely to be the case across the board.

The second is the degree of error. When polls are wrong, they are only wrong by so much. Even at those elections where the polls got it most wrong, like 1992 and 2015, their errors were nowhere near the size of the Conservative party's current lead.

The short version is: yes, the polls could be wrong, but even the very worst polls have not been wrong enough to cancel out the size of the lead that the Tories currently have, and when the polls have been that wrong, it has always been by putting Labour too high.

So, if you aren’t the sort to go in for conspiracy theories, what comfort can I offer if the polls aren’t currently showing the results you’d like them to? Well, first the polls are only ever a snapshot of current opinion. They do not predict what will happen next week or next month, so there is usually plenty of time for them to change. Secondly, for political parties polls generally contain the seeds of their salvation, dismissing them misses the chance to find out why people aren’t voting for you, what you need to change in order to win. And finally, if all else fails, remember that public opinion and polls will eventually change, they always do. Exactly twenty years ago the polls were showing an utterly dominant Labour party almost annihilating a moribund Tory party – the pendulum will likely swing given enough time, the wheel will turn, another party will be on the up, and you’ll see Conservative party supporters on social media trying to dismiss their awful polling figures using exactly the same myths.


“But the sheer size of the survey […] makes it of interest…”

One of the most common errors in interpreting polls and surveys is the presumption that because something has a really huge sample size it is more meaningful. Or indeed, meaningful at all. Size isn't what makes a poll meaningful; it is how representative the sample is. Picture it this way: if you'd done an EU referendum poll of only over-60s you'd have got a result that was overwhelmingly LEAVE… even if you polled millions of them. If you'd done a poll and only included people under 30 you'd have got a result that was overwhelmingly REMAIN… even if you polled millions of them. What matters is that the sample accurately reflects the wider population you want it to represent, that you have the correct proportions of both young and old (and male & female, rich & poor, etc, etc). Size alone does not guarantee that.

The classic real world example of this is the 1936 Presidential Election in the USA. I’ve referred to this many times but I thought it worth reciting the story in full, if only so people can direct others to it in future.

Back in 1936 the most respected barometer of public opinion was the survey conducted by the Literary Digest, a weekly news magazine with a hefty circulation. At each Presidential election the Digest carried out a survey by mail, sending surveys to its million-plus subscriber base and to a huge list of other people gathered from phone directories, membership organisations, subscriber lists and so on. There was no attempt at weighting or sampling, just a pure numbers grab, with literally millions of replies. This method had correctly called the winner of the 1920, 1924, 1928 and 1932 Presidential elections.

In 1936 the Digest sent out more than ten million ballots. The sample size for their final results was 2,376,523. This was, obviously, huge. One can imagine how today's papers would write up a poll of that size and, indeed, the Digest wrote up their results with not a little hubris. If anything, they wrote it up with huge, steaming, shovel-loads of hubris. They bought all the hubris in the shop, spread it across the newsroom floor and rolled about in it cackling. Quotes included:

  • “We make no claim to infallibility. We did not coin the phrase “uncanny accuracy” which has been so freely applied to our Polls”
  • “Any sane person can not escape the implication of such a gigantic sampling of popular opinion as is embraced in THE LITERARY DIGEST straw vote.”
  • “The Poll represents the most extensive straw ballot in the field—the most experienced in view of its twenty-five years of perfecting—the most unbiased in view of its prestige—a Poll that has always previously been correct.”

[Image: the Literary Digest's published poll results]

You can presumably guess what is going to happen here. The final vote shares in the 1936 Literary Digest poll were 57% for Alf Landon (Republican) and 43% for Roosevelt (Democrat). This worked out as 151 electoral votes for Roosevelt and 380 for Landon. The actual result was 62% for Roosevelt, 38% for Landon. Roosevelt received 523 votes in the electoral college, Landon received 8 – one of the largest landslide victories in US history. Wrong does not even begin to describe how badly off the Literary Digest was.

At the same time George Gallup was promoting his new business, carrying out what would become proper opinion polls and using them for a syndicated newspaper column called "America Speaks". His methods were quite far removed from modern ones – he used a mixed-mode approach, with mail-out surveys for richer respondents and face-to-face interviews for poorer, harder-to-reach respondents. The sample size was also still huge by modern standards, about 40,000*. The important difference from the Literary Digest poll, however, was that Gallup attempted to get a representative sample – the mail-out surveys and sampling points for face-to-face interviews had quotas on geography and on urban and rural areas, and interviewers had quotas for age, gender and socio-economic status.

[Image: Gallup's final "America Speaks" column]

Gallup set out to challenge and defeat the Literary Digest – a battle between a monstrously huge sample and Gallup's smaller but more representative one. Gallup won. His final poll predicted Roosevelt 55.7%, Landon 44.3%.** Again, by modern standards it wasn't that accurate (the poll by his rival Elmo Roper, who was setting quotas based on the census rather than his own turnout estimates, was actually better, predicting Roosevelt on 61%… but he wasn't as media savvy). Nevertheless, Gallup got the story right and the Literary Digest got it hideously wrong. George Gallup's reputation was made and the Gallup organisation became the best known polling company in the US. The Literary Digest's reputation was shattered and the magazine folded a couple of years later. The story has remained a cautionary tale of why a representative poll with a relatively small sample is more use than a poll that makes no effort to be representative, however massive it is.

The question of why the Digest poll was so wrong is interesting in itself. Its huge error is normally explained by where the sample came from – it was drawn from things like magazine subscribers, automobile association members and telephone listings. In depression-era America many millions of voters didn't have telephones and couldn't afford cars or magazine subscriptions, creating an inbuilt bias towards wealthier Republican voters. In fact it appears to be slightly more complicated than that – Republican voters were also far more likely to return their slips than Democrat voters were. All of these factors – a skewed sampling frame, differential response rates and no attempt to combat either – combined to make the Literary Digest's sample incredibly biased, despite its massive and impressive size.

Ultimately, it’s not the size that matters in determining if a poll is any good. It’s whether it’s representative or not. Of course, a large representative poll is better than a small representative poll (though it is a case of diminishing returns) but the representativeness is a prerequisite for it being of any use at all.

So next time you see some open-access poll shouting about having tens of thousands of responses and are tempted to think “Well, it may not be that representative, but it’s got a squillion billion replies so it must mean something, mustn’t it?” Don’t. If you want something that you can use to draw conclusions about the wider population, it really is whether it reflects that population that counts. Size alone won’t cut it.


* You see different sample sizes quoted for Gallup’s 1936 poll – I’ve seen people cite 50,000 as his sample size or just 3,000. The final America Speaks column before the 1936 election doesn’t include the number of responses he got (though does mention he sent out about 300,000 mailout surveys to try and get it). However, the week after (8th Nov 1936) the Boston Globe had an interview with the organisation going through the details of how they did it that says they aimed at 40,000 responses.
** If you are wondering why the headline in that thumbnail says 54% when I’ve said Gallup called the final share as 55.7%, it’s because the polls were sometimes quoted as share of the vote for all candidates, sometimes for share of the vote for just the main two parties. I’ve quoted both polls as “share of the main party vote” to keep things consistent.


Almost a month on from the referendum campaign I've had a chance to sit down and collect my thoughts about how the polls performed. This isn't necessarily a post about what went wrong since, as I wrote on the weekend after the referendum, for many pollsters nothing at all went wrong. Companies like TNS and Opinium got the referendum resolutely right, and many polls painted a consistently tight race between Remain and Leave. However, some did less well, and in the context of last year's polling failure there is plenty we can learn about which of the methodological approaches adopted by the pollsters did and did not work for the referendum.

Mode effects

The most obvious contrast in the campaign was between telephone and online polls, and this contributed to the surprise over the result. Telephone and online polls told very different stories – if one paid more attention to telephone polls then Remain appeared to have a robust lead (and many in the media did, having bought into a “phone polls are more accurate” narrative that turned out to be wholly wrong). If one had paid more attention to online polls the race would have appeared consistently neck-and-neck. If one made the – perfectly reasonable – assumption that the actual result would be somewhere in between phone and online, one would still have ended up expecting a Remain victory.

While there was a lot of debate about whether phone or online was more likely to be correct during the campaign, there was relatively little to go on. Pat Sturgis and Will Jennings of the BPC inquiry team concluded that the true position was probably in between phone and online, perhaps a little closer to online, by comparing the results of the 2015 BES face-to-face data to the polls conducted at the time. Matt Singh and James Kanagasooriam wrote a paper called Polls Apart that concluded the result was probably closer to the telephone polls because they were closer to the socially liberal results in the BES data (an issue I'll return to later). A paper by John Curtice could only conclude that the real result was likely somewhere in between online and telephone, given that at the general election the true level of UKIP support was between phone and online polls. During the campaign there was also a NatCen mixed-mode survey based on recontacting respondents to the British Social Attitudes survey, which found a result somewhere in between online and telephone.

In fact the final result was not somewhere in between telephone and online at all. Online was closer to the final result and, far from being in between, the actual result was more Leave than all of them.

As ever, the actual picture was not quite as simple as this and there was significant variation within modes. The final online polls from TNS and Opinium had Leave ahead, but Populus's final poll was conducted online and had a ten point lead for Remain. The final telephone polls from ComRes and MORI showed large leads for Remain, but Survation's final phone poll showed a much smaller Remain lead. ICM's telephone and online polls had been showing identical leads, but ceased publication several weeks before the result. On average, however, online polls were closer to the result than telephone polls.

The referendum should perhaps also provoke a little caution about probability studies like the face-to-face BES. These are hugely valuable surveys, done to the highest possible standards… but nothing is perfect, and they can be wrong. We cannot tell what a probability poll conducted immediately before the referendum would have shown, but if it had been somewhere between online and phone – as the earlier BES and NatCen data were – then it would also have been wrong.

People who are easy or difficult to reach by phone

Many of the pieces looking at mode effects in the EU polling examined the differences between people who responded quickly and slowly to polls. The BPC inquiry into the general election polls analysed the samples from the post-election BES/BSA face-to-face surveys and showed that people who responded to the face-to-face surveys on the first contact were skewed towards Labour voters; only after including those respondents who took two or three attempts to contact did the polls correctly show the Conservatives in the lead. The inquiry team used this as an example of how quota sampling could fail, rather than as evidence of actual biases which affected the polls in 2015, but the same approach has become more widely used in analysis of polling failure. Matt Singh and James Kanagasooriam's paper in particular focused on how slow respondents to the BES were also likely to be more socially liberal, and concluded, therefore, that online polls were likely to have too many socially conservative people.

Taking people who are reached on the first contact attempt in a face-to-face poll seems like a plausible proxy for people who might be contacted by a telephone poll that doesn’t have time to ring back people who it fails to contact on the first attempt. Putting aside the growing importance of sampling mobile phones, landline surveys and face-to-face surveys do both depend on the interviewee being at home at a civilised time and willing to take part. It’s more questionable why it should be a suitable proxy for the sort of person willing to join an online panel and take part in online surveys that can be done on any internet device at any old time.

As the referendum campaign continued there were more studies that broke down people's EU referendum voting intention by how difficult they were to interview. NatCen's mixed-mode survey in May to June found the respondents that it took longer to contact tended to be more Leave (as well as being less educated, and more likely to say don't know). BMG's final poll was conducted by telephone, but used a six-day fieldwork period to allow multiple attempts to call back respondents. Their analysis painted a mixed picture – people contacted on the first call were fairly evenly split between Remain and Leave (51% Remain), people on the second call were strongly Remain (57% Remain), but people on later calls were more Leave (49% Remain).

Ultimately, the evidence on hard-to-reach people ended up being far more mixed than initially assumed. While the BES found hard-to-reach people were more pro-EU, the NatCen survey's hardest-to-reach people were more pro-Leave, and BMG found a mixed pattern. This also suggests that one proposed solution for making telephone sampling better – taking more time to make more call-backs to those people who don't answer the first call – is not guaranteed to work. ORB and BMG both highlighted their decision to spend longer over their fieldwork in the hope of producing better samples, both taking six days rather than the typical two or three. Neither was obviously more accurate than the phone pollsters with shorter fieldwork periods.

Education weights

During the campaign YouGov wrote a piece raising questions about whether some polls had too many graduates. Level of educational qualification correlated with how likely people were to support EU membership (graduates were more pro-EU, people with no qualifications more pro-Leave, even after controlling for age), so this did have the potential to skew figures.

The actual proportion of "graduates" in Britain depends on definitions (the common NVQ Level 4+ categorisation in the census includes some people with higher education qualifications below degree level), but depending on exactly where you draw the line the figure is around 27% to 35%. In the Populus polling produced for Polls Apart 47% of people had university-level qualifications, suggesting polls conducted by telephone could be seriously over-representing graduates.
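
To give a feel for the scale of correction involved, here is a toy calculation using the figures above (47% graduates in a sample against a notional target of 32%, which sits within the 27-35% range). The Remain shares for each education group are invented purely for illustration:

```python
# Toy illustration of why over-representing graduates matters. The Remain
# shares by education group are invented; the 47% and 32% graduate shares
# echo the figures discussed above.

sample_graduates = 0.47    # share of graduates in the (hypothetical) sample
target_graduates = 0.32    # notional population target

remain_among_graduates = 0.65   # invented
remain_among_others = 0.45      # invented

def remain_share(graduate_share):
    return (graduate_share * remain_among_graduates
            + (1 - graduate_share) * remain_among_others)

print(f"Unweighted Remain share:       {remain_share(sample_graduates):.1%}")
print(f"Education-weighted to target:  {remain_share(target_graduates):.1%}")
```

On these made-up numbers the correction shifts the Remain share by about three points – enough to turn a comfortable-looking lead into a knife-edge race.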

Ipsos MORI identified the same issue with too many graduates in their samples and added education quotas and weights during the campaign (this reduced the Remain lead in their polls by about 3-4 points, so while their final poll still showed a large Remain lead, it would have been more wrong without education weighting). ICM, however, tested education weights on their telephone polls and found it made little difference, while education breaks in ComRes’s final poll suggest they had about the right proportion of graduates in their sample anyway.

This doesn’t entirely put the issue of education to bed. Data on the educational make-up of samples is spotty, and the overall proportion of graduates in the sample is not the end of the story – because there is a strong correlation between education and age, just looking at overall education levels isn’t enough. There need to be enough poorly qualified people in younger age groups, not just among older generations where it is commonplace.

The addition of education weights appears to have helped some pollsters, but it clearly depends on the state of the sample to begin with. MORI controlled for education, but still over-represented Remain. ComRes had about the right proportion of graduates to begin with, but still got it wrong. Getting the correct proportion of graduates does appear to have been an issue for some companies, and dealing with it helped some companies, but alone it cannot explain why some pollsters performed badly.

Attitudinal weights

Another change introduced by some companies during the campaign was weighting by attitudes towards immigration and national identity (whether people considered themselves to be British or English). Like education, both these attitudes were correlated with EU referendum voting intention. Where they differ from education is that there are official statistics on the qualifications held by the British population, but there are no official stats on national identity or attitudes towards immigration. Attitudes may also be more liable to change than qualifications.

Three companies adopted attitudinal weights during the campaign, all of them online. Two of them used the same BES questions on racial equality and national identity from the BES face-to-face survey that were discussed in Polls Apart… but with different levels of success. Opinium, who were the joint most-accurate pollster, weighted people's attitudes to racial equality and national identity to a point halfway between the BES findings and their own findings (presumably on the assumption that half the difference was sample, half interviewer effect). According to Opinium this increased the relative position of Remain by about 5 points when introduced. Populus weighted by the same BES questions on attitudes to race and people's national identity, but in their case used the actual BES figures – presumably giving them a sample that was significantly more socially liberal than Opinium's. Populus ended up showing the largest Remain lead.
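
The halfway-target idea is simple to express: if the face-to-face BES says X% give the socially liberal answer and your own panel says Y%, weight to (X+Y)/2 rather than all the way to X. A sketch with invented figures (not the real BES or panel numbers):

```python
# Sketch of the "halfway" attitudinal weighting target. All figures are
# invented for illustration.

bes_liberal = 0.60      # hypothetical share giving the liberal answer in the BES
panel_liberal = 0.48    # hypothetical share giving it in the online panel

targets = {
    "halfway (Opinium-style)": (bes_liberal + panel_liberal) / 2,
    "full BES (Populus-style)": bes_liberal,
}

for name, target in targets.items():
    w_liberal = target / panel_liberal            # weight on liberal respondents
    w_other = (1 - target) / (1 - panel_liberal)  # weight on everyone else
    print(f"{name}: liberal weight {w_liberal:.2f}, other weight {w_other:.2f}")
```

The further the target sits from the panel's own figure, the harder the socially liberal respondents get weighted up – which is why the choice of target mattered so much to the two companies' final figures.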

It’s clear from Opinium and Populus that these social attitudes were correlated with EU referendum vote and including attitudinal weighting variables did make a substantial difference. Exactly what to weight them to is a different question though – Populus and Opinium weighted the same variable to very different targets, and got very different results. Given the sensitivity of questions about racism we cannot be sure whether people answer these questions differently by phone, online or face-to-face, nor whether face-to-face probability samples have their own biases, but choosing what targets to use for any attitudinal weighting is obviously a difficult problem.

While it may have been a success for Opinium, attitudinal weighting is unlikely to have improved matters for other online polls – online polls generally produce social attitudes that are more conservative than suggested by the BES/BSA face-to-face surveys, so weighting them towards the BES/BSA data would probably only have served to push the results further towards Remain and make them even less accurate. On the other hand, for telephone polls there could be potential for attitudinal weighting to make samples less socially liberal.

Turnout models

There was a broad consensus that turnout was going to be a critical factor at the referendum, but pollsters took different approaches towards it. These varied from the traditional approach of basing turnout weights purely on respondents' self-assessment of their likelihood to vote, through models that also incorporated how often people had voted in the past or their interest in the subject, to models based on the socio-economic characteristics of respondents, estimating people's likelihood to vote from their age and social class.

In the case of the EU referendum Leave voters generally said they were more likely to vote than Remain voters, so traditional turnout models were more likely to favour Leave. People who didn't vote at previous elections leant towards Leave, so models that incorporated past voting behaviour were a little more favourable towards Remain. Demographically based models were more complicated, as older people were more likely to vote and more Leave, but middle-class and educated people were more likely to vote and more Remain. On balance, models based on socio-economic factors tended to favour Remain.

The clearest example is NatCen's mixed-mode survey, which explicitly modelled the two different approaches. Their raw results without turnout modelling would have been REMAIN 52.3%, LEAVE 47.7%. Modelling turnout based on self-reported likelihood to vote would have made the results slightly more Leave – REMAIN 51.6%, LEAVE 48.4%. Modelling the results based on socio-demographic factors (which is what NatCen chose to do in the end) resulted in topline figures of REMAIN 53.2%, LEAVE 46.8%.

In the event ComRes & Populus chose to use methods based on socio-economic factors; YouGov & MORI used methods combining self-assessed likelihood and past voting behaviour (and, in the case of MORI, interest in the referendum); Survation & ORB took a traditional approach based just on self-assessed likelihood to vote. TNS didn't use any turnout modelling in their final poll.

In almost every case the adjustments for turnout made the polls less accurate, moving the final figures towards Remain. For the four companies who used more sophisticated turnout models, it looks as if a traditional approach of relying on self-reported likelihood to vote would have been more accurate. An unusual case was TNS’s final poll, which did not use a turnout model at all, but did include data on what their figures would have been if they had. Using a model based on people’s own estimate of their likelihood to vote, past vote and age (but not social class) TNS would have shown figures of 54% Leave, 46% Remain – less accurate than their final call poll, but with an error in the opposite direction to most other polls.

In summary, it looks as though attempts to improve turnout modelling since the general election have not improved matters – if anything the opposite was the case. The risk of basing turnout models on past voting behaviour at elections, or on the demographic patterns of turnout at past elections, has always been what would happen if patterns of turnout changed. It's true that middle-class people normally vote more than working-class people, and that older people normally vote more than younger people. But how much more, and how much does that vary from election to election? If you build a model that assumes the same levels of differential turnout between demographic groups as at the previous election, it risks going horribly wrong if levels of turnout are different… and in the EU referendum it looks as if they were. In their post-referendum statement Populus have been pretty robust in rejecting the whole idea – "turnout patterns are so different that a demographically based propensity-to-vote model is unlikely ever to produce an accurate picture of turnout other than by sheer luck."

That may be a little harsh: it would probably be a wrong turn if pollsters stopped looking for more sophisticated turnout models than just asking people, and past voting behaviour and demographic considerations may be part of that. It may be that turnout models based on past behaviour at general elections are more successful at modelling general election turnout than referendum turnout. Thus far, however, innovations in turnout modelling don't appear to have been particularly successful.

Reallocation of don’t knows

During the campaign Steve Fisher and Alan Renwick wrote an interesting piece about how most referendum polls in the past have underestimated support for the status quo, presumably because of late swing or don't knows breaking towards the status quo option. Pollsters were conscious of this and, rather than just ignoring don't knows in their final polls, the majority attempted to model how they would vote. This went from simple squeeze questions – which way do don't knows think they'll end up voting, which way are they leaning, and suchlike (TNS, MORI and YouGov) – to projecting how don't knows would vote based upon their answers to other questions. ComRes had a squeeze question and estimated how don't knows would vote based on how people thought Brexit would affect the economy; Populus did so based on how risky don't knows thought Brexit was. ORB just split don't knows 3 to 1 in favour of Remain.
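
As a concrete illustration of the simplest of these approaches, here is what an ORB-style 3-to-1 split of the don't knows does to a topline. The starting figures are invented:

```python
# Effect of reallocating don't knows 3:1 in favour of Remain (the ORB
# approach). The starting shares are invented for illustration.

remain, leave, dont_know = 44.0, 46.0, 10.0

remain_adjusted = remain + 0.75 * dont_know   # three quarters to Remain
leave_adjusted = leave + 0.25 * dont_know     # one quarter to Leave

print(f"Before reallocation: Remain {remain:.0f}, Leave {leave:.0f}")
print(f"After reallocation:  Remain {remain_adjusted:.1f}, Leave {leave_adjusted:.1f}")
```

On these numbers a two-point Leave lead becomes a three-point Remain lead, which is why every final poll that adjusted its don't knows moved towards Remain.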

In every case these adjustments helped Remain, and in every case this made things less accurate. Polls that made estimates about how don't knows would vote ended up more wrong than polls that just asked people how they might end up voting, but this is probably coincidence – both approaches had a similar sort of effect. This is not to say they were necessarily wrong: it's possible that don't knows did break in favour of Remain, and that while the reallocation of don't knows made polls less accurate, it was because it was adding a swing to data that was already wrong to begin with. Nevertheless, it suggests pollsters should be careful about assuming too much about don't knows – for general elections at least such decisions can be based more firmly upon how don't knows have split at past general elections, where hopefully more robust models can be developed.

So what can we learn?

Pollsters don’t get many opportunities to compare polling results against actual election results, so every one is valuable – especially when companies are still attempting to address the 2015 polling failure. On the other hand, we need to be careful about reading too much into a single poll that’s not necessarily comparable to a general election. All those final polls were subject to the ordinary margins of error and there are different challenges to polling a general election and a referendum.

Equally, we shouldn't automatically assume that anything that would have made the polls a little more Leave is necessarily correct, or that anything that would have made them more Remain is necessarily wrong – everything you do to a poll interacts with everything else, and taking each item in isolation can be misleading. The list of things above is by no means exhaustive either – my own view remains that the core problem with polls is that they tend to be done by people who are too interested in and aware of politics, and that the way to solve polling failure is to find ways of recruiting less political people, and quota-ing and weighting by levels of political interest. We found that people with low political interest were more likely to support Brexit, but there is very little other information on political awareness and interest from other polling, so I can't explore to what extent that was responsible for any errors in the wider polls.

With that said, what can we conclude?

  • Phone polls appeared to face substantially greater problems in obtaining a representative sample than online polls. While there was variation within modes, with some online polls doing better than others and some phone polls doing worse than others, on average online outperformed phone. The probability-based samples from the BES and the NatCen mixed-mode experiment suggested a position somewhere between online and telephone, so while we cannot tell what they would have shown, we should not assume they would have been any better.
  • Longer fieldwork times for telephone polls are not necessarily the solution. The various analyses of how people who took several attempts to reach differed from those contacted on the first attempt were not consistent, and the companies that took longer over their fieldwork were no more accurate than those with shorter periods.
  • Some polls did contain too many graduates, and correcting for that did appear to help, but it was not an issue that affected all companies and would not alone have solved the problem. Some companies weighted by education or had the correct proportion of graduates to begin with, but still got it wrong.
  • Attitudinal weights had a mixed record. The only company to weight attitudes to the BES figures overstated Remain significantly, but Opinium had more success at weighting them to a halfway point. Weighting by social attitudes faces problems in determining weighting targets and is unlikely to have made other online polls more Leave, but could be a consideration for telephone polls that may have had samples that were too socially liberal.
  • Turnout models that were based on the patterns of turnout at the last election and whether people voted at the last election performed badly and consistently made the results less accurate – presumably because of the unexpectedly high turnout, particularly in more working-class areas. Perhaps there is potential for such models to work in the future and at general elections, but so far they don't appear successful.