Fifty percent of Bovine TB due to badgers? A spurious statistic and how it was created

A much-quoted statistic about bovine TB (bTB) transmission is that ‘it has been estimated that 50% of bTB incidents could be attributed to infected badgers’.

This 50% figure has appeared in House of Commons debates on the subject, both in main debates and in committees, as well as in various DEFRA publications. As a simply understood and memorable figure amongst a welter of quite complex statistics, it now forms one of the main planks of the pro-cull argument. The fact that it was produced by an eminent statistician strengthens it further.

But in fact the figure is based on fundamentally misapplied statistics, and has arisen from a process of ‘sexing-up’ figures derived from a very thin set of data. I use that term as it reminds me of the 45-minute figure in the Iraq debate: spurious, simple to take on board, and crucial in convincing Parliament.

And so before the fifty percent figure goes any further in its potentially destructive journey, I would like to outline how it was arrived at, why it can be regarded as spurious, and how the statistics surrounding it have been ‘sexed-up’.

Derivation

The figure was originally published in a paper that was attempting to derive various figures from the Randomised Badger Culling Trial (RBCT) triplet data (Donnelly & Hone 2010). In this case the derivation uses the observed TB incidence in each triplet in the year before proactive culling began, and compares it to a prediction of the overall incidence of TB if no infection came from badgers.

The latter figure was projected from a mathematical model derived from a comparison of TB in the cull areas and the control areas. The same paper also outlined a model to take into account the prevalence of TB infection in the badgers as the culls began.

While the modelling is explained in some detail, there is one calculation that did not appear in the paper: that of the final fifty percent ‘estimate’. This was sent subsequently to DEFRA in a letter in January 2012 (link at end of post) from just one of the authors, Donnelly, and reveals not only how spurious this value is, but how the statistics have oddly been presented in a way which hides its imprecision: a bit of a surprise considering its author is a Professor of Statistical Epidemiology.

A Spurious Estimate

First let’s look at the data used, and the explanation of the estimate. This was produced in a table:

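| Trial area | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Observed | 0.113 | 0.099 | 0.076 | 0.113 | 0.034 | 0.029 | 0.029 | 0.175 | 0.150 | 0.070 |
| Fitted | 0.087 | 0.062 | 0.041 | 0.122 | 0.059 | 0.051 | 0.068 | 0.068 | 0.126 | 0.087 |
| Estimated proportion attributed to infectious badgers* | 60.6% | 44.2% | 16.8% | 71.7% | 41.2% | 32.7% | 49.0% | 49.0% | 72.7% | 60.6% |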

The ‘Observed’ row is the per-herd bTB incidence in the year before the cull. The ‘Fitted’ row is the figure produced by the model as described in the 2010 paper, and is the figure used to produce the estimate.

The Estimated Proportion uses ‘the prediction, based on the same model, for each area had there been no badger-to-cattle transmission within the area (0.034)’. This is a direct quote from the letter. So for A it is simply (0.087 - 0.034)/0.087 expressed as a percentage, which the table gives as 60.6%. Simple!
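The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation as I read it from the letter, not the author's own code: the names `fitted`, `NO_BADGER_INCIDENCE` and `attributed_share` are mine, and the inputs are the three-decimal 'Fitted' values from the table, not the unrounded figures the letter presumably used.

```python
# 'Fitted' per-herd bTB incidence for each trial area, as rounded in the table.
fitted = {
    "A": 0.087, "B": 0.062, "C": 0.041, "D": 0.122, "E": 0.059,
    "F": 0.051, "G": 0.068, "H": 0.068, "I": 0.126, "J": 0.087,
}

# Predicted incidence with no badger-to-cattle transmission (quoted in the letter).
NO_BADGER_INCIDENCE = 0.034


def attributed_share(fitted_incidence, baseline):
    """Percentage of the fitted incidence that lies above the no-badger baseline."""
    return 100 * (fitted_incidence - baseline) / fitted_incidence


shares = {
    area: attributed_share(value, NO_BADGER_INCIDENCE)
    for area, value in fitted.items()
}
mean_share = sum(shares.values()) / len(shares)

for area, share in shares.items():
    print(f"{area}: {share:.1f}%")
print(f"mean: {mean_share:.1f}%")
```

Because the inputs are rounded, the per-area results differ from the letter's table by up to about a percentage point, but the mean still comes out at roughly 50%.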

But where did the 0.034 figure come from? That isn’t stated in the 2012 letter, but if you turn to the original 2010 paper (and I wonder how many MPs have done this?) you can find it in the abstract. Quote: ‘Based on the model best fitting all the data, 3.4% of herds (95% CI: 0–6.7%) would be expected to have TB infection newly detected (i.e. to experience a TB herd breakdown) in a year, in the absence of transmission from badgers.’

It is the qualification of the figure ‘(95% CI: 0–6.7%)’ that I want to draw attention to. It means, roughly, that at the 95% confidence level the model is consistent with the true figure lying anywhere between 0 and 6.7%. Normal statistical practice, then, would be to carry the two extremes through the calculation as well, so that you can state the range within which the estimated proportion is very likely to lie.

Let’s do just that. Firstly, the zero: this means that 100% of bTB being due to badgers cannot be ruled out. So the estimated proportion in every case would be 100%.

Next, the 6.7%. Use this figure instead of the 3.4% and the table looks very different:

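| Trial area | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Observed | 0.113 | 0.099 | 0.076 | 0.113 | 0.034 | 0.029 | 0.029 | 0.175 | 0.150 | 0.070 |
| Fitted | 0.087 | 0.062 | 0.041 | 0.122 | 0.059 | 0.051 | 0.068 | 0.068 | 0.126 | 0.087 |
| Estimated proportion attributed to infectious badgers* | 23.0% | 0% | 0% | 45.1% | 0% | 0% | 1.5% | 1.5% | 46.8% | 23.0% |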

The mean is now 14.1%. So the estimate has, statistically, a good chance of being anywhere between 14.1% and 100%!
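For anyone who wants to check my working, here is a minimal Python sketch of the same calculation run at both ends of the confidence interval. It is my own illustration, with my own names (`fitted`, `mean_attributed_share`), and it assumes, as I did in the table, that an area whose fitted incidence falls below the no-badger baseline is counted as 0% rather than as a negative proportion.

```python
# 'Fitted' per-herd bTB incidence for each trial area, as rounded in the table.
fitted = {
    "A": 0.087, "B": 0.062, "C": 0.041, "D": 0.122, "E": 0.059,
    "F": 0.051, "G": 0.068, "H": 0.068, "I": 0.126, "J": 0.087,
}


def mean_attributed_share(fitted_by_area, baseline):
    """Mean percentage attributed to badgers for a given no-badger baseline.

    Areas whose fitted incidence is below the baseline are floored at 0%
    (an assumption of this sketch, matching the table above).
    """
    shares = [
        max(0.0, 100 * (value - baseline) / value)
        for value in fitted_by_area.values()
    ]
    return sum(shares) / len(shares)


# The two ends of the 95% confidence interval quoted in the 2010 abstract.
mean_at_lower_bound = mean_attributed_share(fitted, 0.0)    # baseline of 0%
mean_at_upper_bound = mean_attributed_share(fitted, 0.067)  # baseline of 6.7%

print(f"baseline 0.000: mean {mean_at_lower_bound:.1f}%")
print(f"baseline 0.067: mean {mean_at_upper_bound:.1f}%")
```

With a baseline of zero every area comes out at 100%, and with a baseline of 0.067 the mean falls to about 14%, which is the range quoted above.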

If the ‘fitted’ figures were themselves calculated statistically, with confidence limits, there would be a further inaccuracy in the 50% figure, but one source of inaccuracy is perfectly good enough for this argument.

Sexing-up the stats

I am perfectly happy to have my method and maths taken apart, but, even with my errors, the point is that the 50% estimate was based on very little data. That is why the confidence limits are so wide apart.

The correct statistical conclusion, one that I hope, say, an A-level student would make, is that there is too little data to make any firm pronouncement. Although it looks as though SOME bTB is transmitted by badgers, the proportion cannot realistically be estimated from this dataset.

However, this does not suit DEFRA’s agenda of justifying a badger cull. How tempting is it, therefore, to, ahem, ‘forget’ to mention those wide confidence limits in a non-peer-reviewed letter presented to MPs? They can always be mentioned at a later date, if queried. And this, in fact, is what has happened in one of the subsequent committee minutes. But the main, firm-sounding, unqualified, sexed-up-by-omission figure of 50% transmission by badgers is what remains in the mind, and indeed still appears in unqualified form in publications like DEFRA’s consultation document, where they again present their pro-cull stance.

But the spurious accuracy goes further. Not only is the 0.034 predicted incidence figure presented without confidence limits in the letter, but it is re-stated as 0.03447. A note states that ‘I have provided further decimal places here to make the calculation clear.’

But don’t those extra decimal places make the figure look even more precise? What stunning accuracy the predictive model must be able to achieve! No mention that the confidence limits are over six thousand times as wide as the accuracy implied in that final significant figure. If ever a statistical figure had come out ‘sexed-up’ to show spurious validity, this is it.
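The ‘six thousand times’ comparison is simple arithmetic, sketched below. The variable names are mine, and I am taking ‘the accuracy implied’ to mean one unit in the last quoted decimal place of 0.03447, which is an assumption about how to read the figure.

```python
# Width of the 95% confidence interval from the 2010 abstract (0 to 6.7%).
ci_lower = 0.0
ci_upper = 0.067
ci_width = ci_upper - ci_lower

# 0.03447 is quoted to five decimal places, so one unit in the last place is:
implied_precision = 0.00001

ratio = ci_width / implied_precision
print(f"The interval is about {ratio:.0f} times wider than the implied precision")
```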

These two examples could be schoolboy-level mistakes in statistics: one might expect them from a naive but enthusiastic A-level student, and one would put red ink all over them. But coming from a Professor of Statistical Epidemiology, it seems somewhat unlikely that they are mere inadvertent errors.

Either way, in my view Professor Donnelly should be professionally ashamed of herself for allowing her statistics to produce a spurious result like this: a figure that is hardly better than guesswork, but that has been subsequently misused to great effect by the pro-cull lobby, and could well be seen as misleading Parliament.

Jamie McMillan

Briantspuddle, Dorset

16 September 2013

Christl A. Donnelly & Jim Hone (2010): Is There an Association between Levels of Bovine Tuberculosis in Cattle Herds and Badgers? Statistical Communications in Infectious Diseases, Volume 2, Issue 1.

Donnelly (2012): Letter to DEFRA, 6 Jan 2012 9974_LettertoDefraregardingestimateoftheproportionduetobadgers-Jan2012

 

UPDATE 20 September

Here is a classic example of the use of the 50% pseudo-statistic to support culling:

http://www.tbfreeengland.co.uk/latest-news/tb-free-video-the-vets-perspective/

It was posted in one of DEFRA’s tweets which also quoted 50% as cast-iron.

Now that it has been discredited, the pro-cull lobby and DEFRA will of course fight to rescue it, and it appears that the maths is being ground through again in an effort to narrow the confidence interval down from its near-random width of 86 percentage points.