To understand polling we need to start with the sample. A common naive criticism of polls is that 1,000 people cannot possibly represent the views of 60,000,000 people. George Gallup, the father of modern polling, used to reply to the point by saying that you don’t need to drink a whole bowl of soup to know if it is too salty – providing it is properly stirred, a single spoonful will suffice.
Of course, if a sample of 1,000 people is drawn, say, entirely from people having dinner at the Carlton Club or drinking in a working men’s club, it won’t be representative of the country as a whole. The key challenge for pollsters is to get a sample of people that is representative. A sample of 1,000 people needs the same proportions of young and old people, rich and poor people, southern and northern people, right wing and left wing people as the country as a whole does.
The mathematics behind polling is actually very simple. If I have a bag with 10,000 red balls and 10,000 blue balls in it, then it’s clear that if I take 1 ball out of the bag there is a 50% chance that it will be red and a 50% chance that it will be blue. If I take 10 balls out of the bag, it’s most likely that they will be split roughly 50/50 between red and blue – we might get 4 red and 6 blue, or 7 red and 3 blue, but it is very unlikely indeed that we will get 10 blue balls. If I take out a hundred balls, it’s increasingly likely that the split will be roughly equal between red and blue. Splits like 53-47, 45-55 or 43-57 are perfectly possible, but a split of 10-90 is spectacularly unlikely.
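The bag-of-balls argument can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the balls are drawn without replacement (so the counts follow a hypergeometric distribution), and the function names `prob_red_count` and `prob_red_between` are made up for this example.

```python
from math import comb

RED_TOTAL = 10_000   # red balls in the bag, as in the example above
BLUE_TOTAL = 10_000  # blue balls in the bag


def prob_red_count(reds_drawn, draws):
    """Chance of exactly `reds_drawn` red balls when `draws` balls are
    taken out of the bag without putting any back."""
    ways_red = comb(RED_TOTAL, reds_drawn)
    ways_blue = comb(BLUE_TOTAL, draws - reds_drawn)
    ways_any = comb(RED_TOTAL + BLUE_TOTAL, draws)
    return ways_red * ways_blue / ways_any


def prob_red_between(low, high, draws):
    """Chance that the number of red balls is between `low` and `high` inclusive."""
    return sum(prob_red_count(k, draws) for k in range(low, high + 1))


# One ball: an even chance of red.
print("1 draw, 1 red:", prob_red_count(1, 1))
# Ten balls: all ten being blue (zero red) is rare.
print("10 draws, 0 red:", prob_red_count(0, 10))
# A hundred balls: a roughly even split is by far the likeliest outcome,
# and a 10-90 split or worse is vanishingly rare.
print("100 draws, 40 to 60 red:", prob_red_between(40, 60, 100))
print("100 draws, 10 or fewer red:", prob_red_between(0, 10, 100))
```

The larger the number of balls drawn, the more tightly the results cluster around the true 50/50 split, which is the whole basis for trusting a sample of around a thousand people.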
The perfect way to obtain a sample for a poll is to obtain it randomly, with each person in the country having an equal chance of being selected and taking part in the poll. This is uncontroversial: you shouldn’t ever find a pollster who disagrees. The problem is that it is so far divorced from reality as to be laughable.
If you are conducting a poll of voters, then you do at least have a list of all the people who could be in your poll – the electoral register. You can easily pick 2,000 people from it at random. The problem is contacting them. The electoral register doesn’t have phone numbers, let alone email addresses, so you would need to send a letter, or arrive on the doorstep of each one (some well-funded academic or government studies actually do this). That would be a genuinely random sample… assuming you made contact with every last one, and every last one of them agreed to be interviewed.
Of course, they don’t (especially for postal surveys, which have notoriously rubbish response rates), and for a political poll in the media the cost and time commitment would make this approach untenable anyway. Instead media polls use two alternative routes to getting representative samples: quasi-random sampling and quota sampling.
Quasi-random sampling does what it says on the tin. Genuinely random sampling is too costly and time-consuming, so quasi-random samples are taken from an alternative list, one of households rather than people. It doesn’t include every household, but it does include the information needed to contact them very quickly and at low cost: the telephone directory. In fact, taking numbers directly from the telephone directory could produce a bias if ex-directory people have different views (apparently they do – they are more Tory), so what pollsters actually do is take phone directory numbers and randomise the last digit. Since residential numbers are allocated in blocks, this should ensure ex-directory people are also rung up.
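As a rough illustration of the last-digit trick, here is a minimal sketch. It assumes phone numbers are held as plain strings of digits, and the function names and the example numbers (taken from a range reserved for fictional use) are invented for this example; it is not any pollster’s actual procedure.

```python
import random


def randomise_last_digit(number, rng):
    """Return `number` with its final digit replaced by a random digit 0-9."""
    return number[:-1] + str(rng.randrange(10))


def build_call_list(directory_numbers, sample_size, seed=None):
    """Pick `sample_size` listed numbers at random, then randomise the last
    digit of each so that unlisted numbers in the same block can come up."""
    rng = random.Random(seed)
    picked = rng.sample(directory_numbers, sample_size)
    return [randomise_last_digit(number, rng) for number in picked]


# Hypothetical directory entries, purely for illustration.
directory = ["01632960101", "01632960212", "01632960323", "01632960434"]
print(build_call_list(directory, sample_size=3, seed=1))
```

Each number dialled shares all but its last digit with a listed number, so it stays within a block of residential numbers while no longer being restricted to households that appear in the directory.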
This sampling is not entirely random, of course. People with no landline or who use only a mobile phone cannot be picked (though people on the Telephone Preference Service can – it bars only sales and marketing calls, not market research). People who are unemployed, retired or work from home are more likely to be about when the pollster calls, while people with busy social or work lives are less likely to be in. There may also be attitudinal biases – people who are willing to give 20 minutes of their life to a stranger on the phone asking impertinent questions may have a different outlook on life to those who won’t.
In fact this leads to some systemic biases: the main pollsters using quasi-random sampling, ICM and Populus, both find too many Labour voters in their raw samples and have to use weighting to correct it. Weighting is dealt with in detail elsewhere.
If the logic behind quasi-random sampling is to rely on random chance to get a sample that has roughly the right demographic proportions, quota sampling does things the other way round. If we know that a representative sample has 520 women and 480 men, interviewers are sent out and told to find 520 women and 480 men. Rather than using chance, the sample profile is designed and then filled to order.
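The idea of filling a sample to order can be sketched in a few lines. This is a simplified illustration, assuming a single quota variable (sex) and a stream of people the interviewer happens to meet; `fill_quotas` and its parameters are hypothetical names, not part of any real polling system.

```python
from collections import Counter


def fill_quotas(people_met, quotas, group_of):
    """Interview people in the order they are met, but only while the quota
    for their group still has room. Stop once every quota is full."""
    remaining = Counter(quotas)
    interviewed = []
    for person in people_met:
        group = group_of(person)
        if remaining[group] > 0:
            interviewed.append(person)
            remaining[group] -= 1
        if sum(remaining.values()) == 0:
            break
    return interviewed


# Illustrative stream of passers-by: two women for every man.
people_met = [{"sex": "man" if i % 3 == 0 else "woman"} for i in range(3000)]
quotas = {"woman": 520, "man": 480}

sample = fill_quotas(people_met, quotas, group_of=lambda p: p["sex"])
print(Counter(p["sex"] for p in sample))
```

However lopsided the stream of people encountered, the finished sample has exactly the designed profile, because anyone whose group is already full is simply passed over.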
Typically quota sampling was associated with face-to-face polling, and interviewers would be sent off to one of 200-odd sampling points, each given a quota of something like “2 middle class women over 55, 1 working class man under 25 and 2 working class women 25-55”, etc. The sample was not random: interviewers would go door to door looking specifically for the type of people they needed to fill their quota.
Since 2008 Ipsos MORI, who were the only regular pollster using this traditional method for their voting intention polls, have switched to phone polling, so there is no need to dwell on it in great detail. However, YouGov’s method of carrying out online polls has some similarities to it.
YouGov’s samples are drawn not from the public at large, but from a panel of volunteers who are paid to take polls. Clearly this panel is in itself unlikely to be representative, so a random sample drawn from it would not be representative either. Instead samples are designed, with invites to take part going out to the correct proportions of people of each gender, age group, social class and so on.
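To make the idea of a designed sample concrete, here is a minimal sketch of drawing invites from an unrepresentative panel in target proportions. It is an assumption-laden illustration, not a description of YouGov’s actual system: the single `group` field, the `target_shares` figures and the function name `design_invites` are all invented for the example.

```python
import random
from collections import Counter, defaultdict


def design_invites(panel, target_shares, sample_size, seed=None):
    """Choose panel members to invite so that each group makes up its
    target share of the invites, picking at random within each group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for member in panel:
        by_group[member["group"]].append(member)

    invites = []
    for group, share in target_shares.items():
        wanted = round(share * sample_size)
        available = by_group[group]
        if wanted > len(available):
            raise ValueError(f"panel has too few members in group {group!r}")
        invites.extend(rng.sample(available, wanted))
    return invites


# Hypothetical panel that skews young: 700 under-35s, 300 aged 35 or over.
panel = [{"id": i, "group": "under 35" if i < 700 else "35 and over"}
         for i in range(1000)]
# Hypothetical population targets for the same two groups.
target_shares = {"under 35": 0.3, "35 and over": 0.7}

invites = design_invites(panel, target_shares, sample_size=200, seed=42)
print(Counter(member["group"] for member in invites))
```

The invites match the target profile even though the panel does not. What the sketch cannot control is who actually responds, so the completed sample may still drift away from the design.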
Because YouGov know the demographics of the members of their panel, they know who they are inviting, unlike pollsters using quasi-random samples, and so in theory can design a sample that is representative by many different measures. In practice, though, their samples still require weighting to be representative because of variable response rates.