I wrote last November about the dangers of cherry-picking figures out of crossbreaks to come up with sensationalist stories that don’t actually reflect the truth, and I spend an inordinate amount of time nagging about not paying too much attention to regional crossbreaks. Nevertheless, they never seem to go away.

On Friday, for example, the New Statesman was getting overexcited about the crossbreak for under 25s in the most recent YouGov poll, which showed Labour 47 points ahead of the Conservatives amongst young people. The figures were based on a sample of only 71 people, so the margin of error was about 12 points. In fact, because the figures were re-percentaged to exclude Don’t Knows and Won’t Votes, the effective sample was even smaller: only 45 people under 25 actually gave a voting intention, giving a margin of error of plus or minus 15 points.
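As a rough check on margins of error like these, the usual 95% figure can be worked out from the sample size alone. The sketch below assumes a simple random sample and the worst case of a 50% share; the function name `margin_of_error` is purely illustrative, and weighting in a real poll would make the true margin somewhat wider.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points.

    Assumes a simple random sample of size n with no weighting or
    design effect; p=0.5 is the worst case (widest margin).
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Sample sizes mentioned in the post: the full under-25 crossbreak,
# and the subset who actually gave a voting intention.
for n in (71, 45):
    print(n, round(margin_of_error(n)))
```

The same function gives roughly 7 points for a sample of 200 and roughly 10 points for a sample of 100.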

If the New Statesman had taken the time to look at other recent cross-breaks for young people it should have become clear that (a) the figures were very volatile, as you’d expect from such a small sub-sample and (b) that this was an outlier. The average figure for the rest of the last week was CON 24%, LAB 49%, a lead a little over half of Friday’s (this is still a very large Labour lead of course, but not surprising given they have a 12 point lead nationally and there tends to be a correlation between age and voting intention, with young people more Labour and older people more Conservative).

Another example this week was David Skelton at Platform 10, citing regional cross-breaks from Populus polling to demonstrate that support for gay marriage isn’t just amongst a metropolitan elite, but is actually higher in blue-collar Northern areas. Now, while I suspect David’s ultimate argument is correct (after all, it’s not like only Southern middle class people are gay or get married), the evidence he cites doesn’t really hold up. 81% of respondents in the North East did indeed tell Populus that they supported gay marriage… but it was on a sample size of 45 people, giving a margin of error of 15 points and meaning support for gay marriage in the North East was not actually significantly different to that in London.

Here’s what to remember about cross-breaks:

1) Cross breaks often have small sample sizes and are not internally weighted. They are hence very volatile and imprecise, especially for things like age and region where some sample sizes are below 100, and very little weight should be given to them. For a sample size of 200 the margin of error rises to plus or minus 7 points; for 100 it rises to plus or minus 10 points.

2) Where you have a regular tracker such as the YouGov daily poll, the sheer volume of data means it is inevitable that volatile crossbreaks with large margins of error will sometimes produce results that look extreme. However odd these look, unless there is a sustained pattern they are not meaningful. If the actual figure is 50%, but you’ve only got 70 respondents, then you ARE sometimes going to get results showing 62% or 38%… purely from random variation.

3) All this goes double or triple for voting intention polls! For most polls the precise figures don’t matter – it is much the same story if 30% of people support a policy as if 40% do. In contrast, there is a world of difference between Labour being at 30% and Labour being at 40%. When it comes to voting intention, crossbreaks in a single poll should basically be ignored.
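To put a rough number on the second point, the chance of a sample of 70 showing 62% or more, or 38% or less, when the true figure is 50% can be computed exactly from the binomial distribution. This is only a sketch: it assumes a simple random sample with no weighting, and the function name `prob_extreme` and its defaults are illustrative.

```python
from math import comb, ceil, floor

def prob_extreme(n=70, p=0.5, high=0.62, low=0.38):
    """Exact binomial probability that the sample share is >= high or <= low.

    Assumes n independent respondents, each with probability p of
    giving the answer in question (no weighting, no design effect).
    """
    # Probability of exactly k respondents out of n giving the answer.
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    k_high = ceil(high * n)   # smallest count giving a share of at least `high`
    k_low = floor(low * n)    # largest count giving a share of at most `low`
    return sum(pmf[k_high:]) + sum(pmf[:k_low + 1])

print(f"{prob_extreme():.3f}")
```

Under these assumptions the probability comes out at a few percent, which is small for any one crossbreak in any one poll, but a daily tracker with several crossbreaks gives it many chances to happen.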

A couple of months ago Lewis Baston asked me an interesting question on Twitter. Given that regional cross-breaks on polls are so consistently misrepresented and misunderstood, should pollsters publish them at all? It does make me ponder. My starting point is always that it is good for pollsters to be as transparent as possible: unless there is a good reason not to be open, we should be.

Some crossbreaks are very useful in understanding and interpreting polls. Think, for example, of how much voting intention cross-breaks help our understanding of leader approval ratings, best PM figures or my bete noire of “would policy X make you more likely to vote Y” questions. Sometimes they do show interesting things: look, for example, at the huge gender contrast you find in polls on nuclear power or nuclear weapons, or at the many issues where there is a clear correlation with age. Regional cross-breaks are, admittedly, less obviously useful, but there are many instances when cross-breaks are extremely beneficial to our understanding of polls if looked at as crude indicators of trends and correlations, rather than taken out of context.

I wouldn’t want to see pollsters stop giving out data, even data of limited use, for fear of it being misunderstood. The solution is really for political journalists to better understand polling and statistics. Some people will always misunderstand or misrepresent polls… but political journalists shouldn’t; they are too important a part of politics today.


73 Responses to “A reminder about crossbreaks”

  1. Sorry

    My post starting “A reasonable check on the accuracy of your calculations” was for Statgeek.

  2. @OLDNAT

    “In your graph, the Lab and SNP figures are both depressed – though there seems to be a similar 5% Lab lead.”

    According to my calcs (obviously YG poll data), Labour have a 10.5% lead in Scotland, which is interesting. The closest it has been is 3.6%, but that’s through a collection of polls, rather than a short trend.

  3. STATGEEK

    10.5% around 21 May? I must have misread your graph.

    It also suggests that there is something seriously different between a “real” Scottish poll and the X-break average, if you are showing a Labour lead double what a proper poll shows.

  4. @OLDNAT

    “If there is significant variation from these figures in your data, then there be a systemic flaw in YG methodology which your averaging can’t pick up.”

    I’d guess at the flaw being the sample sources. If the majority of Scots reside in the area South of Perth and North of Ayr, then it’s likely the majority of polling samples are taken from these areas. These areas are more Labour than any other part of Scotland, and less Con/Lib/SNP than other parts of Scotland.

    I have a sneaky suspicion that the majority of polling is done in the Central belt area, but have absolutely no basis for my suspicions other than the human nature of many people from the South of the UK having a ‘stop at Edinburgh / Glasgow’ mentality.

    North of Perth is more for carriage, so it must be barren and wild. :)

  5. STATGEEK

    :-)

    Though this discussion really proves Anthony’s point about X-breaks and VI!

  6. @OLDNAT

    No. On the 21st May, it was 6.8%

    See the data since May 1st.

    h ttp://imageshack.us/photo/my-images/802/maddata.png

    (10.4% on most recent btw)

  7. What effect will Greece going through to the Euro quarter finals have on tomorrow’s vote?

    Czechs through as well!

  8. STATGEEK

    I didn’t think I’d misread your graph by that much!

    I’ll still go with the proper poll though.

    Lab led SNP by 5% on Westminster VI
    SNP led Lab by 8% on Holyrood Constituency VI

    Two dominant parties in Scotland, and circumstances will determine election results.

  9. Tonight I’m predicting

    Con 32.0
    Lab 43.2
    LD 8.5

  10. @OLDNAT

    “It also suggests that there is something seriously different between a “real” Scottish poll and the X-break average, if you are showing a Labour lead double what a proper poll shows.”

    Hmm. I don’t make the data, I just fiddle with it. :)

    Take the last ten crossbreaks:

    Lab: 45, 45, 36, 44, 39, 31, 43, 30, 35, 43,
    SNP: 31, 28, 39, 28, 30, 28, 27, 37, 35, 34,

    The 30 and 31 in the Labour set are outliers on the 30-poll MAD calcs. On the SNP set however, the 27, 35, 37 and 39 are outliers. This means that unless the SNP start getting sustained scores above 34% the number won’t rise.

    For all I know the method used perhaps isn’t suitable for crossbreaks, but it does show suitable trends. See the UK MAD for example:

    h ttp://imageshack.us/photo/my-images/715/ukmad.png

    That shows that the most recent MAD average is a Labour lead of 12.5%, which is pretty much in line with what we’ve been seeing in the past week or two.

    I think that the Labour lead in Scotland has risen due to Salmond’s impending Leveson appearance, and Lamont’s ‘better than Grey’ appearances at Holyrood. I expect the SNP’s VI to rise a little and Labour’s to drop a little in the next week or two (unless there’s some other impending attack on Salmond).

  11. OldNat

    I predict a very low turnout as the whole of Greece has a hangover tomorrow.

  12. I am just wondering about the Jon Cruddas input on the
    future of the Labour party. I do think that EM is thinking
    along different lines to the present political orthodoxy. I
    think this could be very interesting. We cannot go on as we
    are perhaps.

  13. @ANN IN WALES
    `I do think that EM is thinking
    along different lines to the present political orthodoxy`

    I read the same article… What Ed has done well is get the most talented to work for him… I am glad that Cruddas indicates that David Miliband and Purnell will join the team, though David joining will cause some confusion amongst voters about Ed and some more confused presenters.

  14. Tonight’s YG prediction

    Labour 44%
    Tory 29%
    LD 8%
    UKIP 9%

  15. Statgeek

    Of course there’s “some other impending attack on Salmond”. That’s how SLab operate – despite none of their charges holding up.

    Whether such issues actually affect VI is another matter. If your thesis was correct, then there would be similar movement in Westminster and Holyrood VI. We have no way of knowing whether that is the case or not.

    That significant numbers of Scots vote differently for Westminster and Holyrood makes me dubious about YG’s idea that “party identity” works as well in Scotland as it does in England.

    I have long believed that YG really need to start reusing their loyal/disloyal labelling in Scotland – as they did in GB for Labour prior to the last UK GE.

  16. @Alan
    “I predict a very low turnout as the whole of Greece has a hangover tomorrow”

    I’m not sure how that differs from any other morning…

  17. POSTAGEINCLUDED

    Are we back to blaming “inferior” people for their problems?

    “Feckless drunks – what can you expect?” – so reminiscent of attitudes to the Irish and others over many years.

  18. Old Nat/Statgeek, my understanding from Anthony’s initial and subsequent comments in reply is that calculating VI from crossbreaks is not valid but that movements in the averages produced by crossbreaks can be a reasonable indicator of trends.

  19. Smukesh, I think DM would be great if he could be persuaded to come on board. Not sure about Purnell
    though. What is interesting is the different way of looking
    at things. The winds of change are blowing.

  20. JIM JAM

    While I don’t disagree that trends among sub-samples can be suggested by X-breaks, they still remain subject to lots of variations. Hence the need to compare them with properly weighted polls.

  21. New thread

  22. @OLDNAT

    “Are we back to blaming “inferior” people for their problems?”

    No. Keep your hair on. We just can’t see what’s so important about a football match. It’s only a game.

    Actually I have a lot of sympathy with the Greeks. A currency union without a single government will always benefit some at the expense of others (even if it’s only a currency union of two independent nations)

    Can’t see my Irish ancestors being all that pleased at the way you immediately associate them with drunkenness though, admittedly, they do drink a good drop more than the Greeks, as do we English, and you Scots.

  23. I mentioned last night I was following Le Mans on its exciting circuit web site coverage on the net.

    On a political note, would a U turn be a ‘Tête à queue’?
