Sometimes I worry about the level of statistical expertise required these days in many fields of scientific endeavour. I remember reading, more than thirty years ago, about the bad statistics in many of the psychology papers published in very respectable journals – and that was when things were a lot simpler, when there were a lot fewer statistical packages and a lot fewer computers to abuse them with than there are now. Not to mention the heavy statistical lifting needed to get a Higgs boson out of the inverse femtobarns of raw stuff they get out of the LHC (see reference 1). Do the teams which knock out all these scientific papers take sufficient care with their statistics in their scramble up the academic ladder? Or in their scramble for grants? Do they have enough of the sort of statistical support that they need? Are there enough statisticians of the right sort about – that is to say, not the sort that I used to be? One did need to know stuff, but one did not need to know much about statistics to do population statistics. And sometimes I struggle with the bad writing which often seems to clothe what one supposes – or at least hopes – to be good science.
Then a few days ago, more or less by chance, I came across a paper all about the lack of statistical rigour in one particular corner of the neurological field. A paper which was written back in 2008 and made quite a splash for itself at the time by including the word ‘voodoo’ in its title, a word which was dropped in favour of the more prosaic title of the second edition, ‘Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition’. A paper which attracted a huge amount of debate, reaching out into the lay press, with one blogger listing nearly 150 contributions. A paper which led me on to others, all of which was all the more alarming to me as I had considerable trouble working out even some of what the authors were on about, despite having some statistical background, and despite the authors’ evident concern to keep things simple.
So I tried to help myself out with a story.
Suppose we do a series of paired fMRI scans to investigate the effect on the brain of some bit of behaviour. Perhaps with the first of the pair being neutral – resting state might be the right bit of jargon – and the second of the pair being when the subject is doing the bit of behaviour in question. I think the technical term for such a pair is a contrast. For example, thinking about something important, perhaps the price of red lentils from Ontario, a few months before the next harvest is in. Perhaps being shown a picture of same. Or perhaps being frightened or disgusted by something – but we stick with lentils for the moment. The idea then is that you add all this up, play spot the difference between the summed versions of each of the two elements of the pair and then come up with the bit of the brain which worries about the price of lentils.
Elaborating, let us further suppose that we do twice 5 scans for each of 100 people – 5 neutral and 5 lentil per person. Simplifying, we further suppose that each scan comprises a sequence (in time) of 50 volumes (a volume being a three-dimensional image), with each volume made up of roughly 100,000 voxels, each a 3mm cube of brain, perhaps arranged as 30 more or less horizontal slices of 60 by 60 voxels. For each such voxel we have the value of some tricky variable, probably a floating point number, perhaps a variable called activation, a measure of how busy the neurons in that voxel were at the time of the scan, a period of perhaps 3 seconds. Roughly 5 billion numbers in all, so quite a lot of data, although still quite a lot less than the raw output from the scanner.
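For those who like to see the arithmetic written out, a minimal sketch in Python – the figures are just the made-up ones above, and the variable names are mine, not anybody's package:

```python
# Back-of-envelope tally for the toy design above.
# All of the figures are the made-up ones from the text, not real scanner specifications.

people = 100                        # subjects
scans_per_person = 2 * 5            # 5 neutral scans plus 5 lentil scans each
volumes_per_scan = 50               # three-dimensional images per scan
voxels_per_volume = 30 * 60 * 60    # 30 slices of 60 by 60 voxels, roughly 100,000

numbers = people * scans_per_person * volumes_per_scan * voxels_per_volume
print(f"{numbers:,} activation values")   # 5,400,000,000 - the 'roughly 5 billion' above
```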
To keep things in proportion, voxels might be quite small, but might contain, on average, about 150,000 neurons each – a total of 15 billion across our 100,000 voxels. I exclude the cerebellum, which has another 75 billion, with both counts being something of a moveable feast. But the important point here is that one could build a very big computer program, doing all manner of clever things, out of 150,000 of the artificial neurons implemented in lots of computer packages. A voxel could be doing a lot of stuff under the covers.
Let us simplify some more and suppose that we are not interested in progression through time, in time series, that we discard the many and varied possibilities for comparison offered by working with a time series for each voxel rather than just a number. We are just doing scans over short periods of time for which we have some bit of behavioural data. To lentil or not to lentil.
Quite a lot of fancy statistical footwork is needed to turn the raw output of the scanner into these numbers, into these volumes, that is to say into pictures of activation that you or the computer can look at.
Quite a lot more fancy statistical footwork is needed to map all those pictures onto a standard brain so that you can compare the one with the other. Or add the one to the other.
And note that despite all the fancy footwork there is still a lot of noise. The brain itself is noisy, and the business of collecting data from lots of brain over lots of time is also noisy. You can’t smooth all this noise out without smoothing most of the signal out with it – so there will be plenty of noise left. However, provided it is the right sort of noise, when one averages over lots of scans the noise should come out close to zero, leaving something near to pure signal. Provided, that is, that the signals don’t average out to zero too.
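To see why the averaging helps – and roughly how much – here is a toy simulation, with numbers I have simply made up: a small constant signal buried in much larger noise.

```python
import numpy as np

rng = np.random.default_rng(0)

n_scans = 1000       # made-up number of scans being averaged together
signal = 0.5         # a small, constant 'activation' signal
noise_sd = 5.0       # noise much bigger than the signal in any single scan

scans = signal + rng.normal(0.0, noise_sd, size=n_scans)

print(f"a single scan:       {scans[0]:+.2f}")      # dominated by noise
print(f"average of them all: {scans.mean():+.2f}")  # close to the 0.5 signal
# The leftover noise in the average shrinks like noise_sd / sqrt(n_scans), about 0.16 here.
```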
Leaving those problems aside, you can then ask Cortana, Watson or whoever to find a region of the brain from which you can reliably predict whether the scan in hand is a lentil one or a not-to-lentil one. You might simplify things by requiring any such region to be sensible in shape, perhaps something like a small sphere, rather than a torus or, worse still, a collection of bits and pieces taken from all over the brain.
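One way this sort of search is often done – not necessarily the way any of the papers below do it – is to slide a small region over the brain and ask, at each position, how well the voxels inside it predict the behaviour. A very rough sketch on made-up data shrunk to toy size; the planted blob and all the names are mine, and a real analysis would lean on one of the proper neuroimaging packages rather than on this.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Toy data: 200 'scans' (100 neutral, 100 lentil), with the brain shrunk to a 10x10x10 grid.
n_scans, shape = 200, (10, 10, 10)
X = rng.normal(size=(n_scans,) + shape)
y = np.repeat([0, 1], n_scans // 2)          # 0 = neutral, 1 = lentil

# Plant a genuine 'lentil region': a small blob which is a bit more active for lentils.
X[y == 1, 4:6, 4:6, 4:6] += 1.0

def region_score(centre, radius=1):
    """Cross-validated accuracy using only the voxels in a small cube around centre
    (a cube standing in for the small sphere mentioned in the text)."""
    sl = tuple(slice(max(c - radius, 0), c + radius + 1) for c in centre)
    features = X[(slice(None),) + sl].reshape(n_scans, -1)
    return cross_val_score(LogisticRegression(max_iter=1000), features, y, cv=5).mean()

print(f"inside the planted region: {region_score((5, 5, 5)):.2f}")   # well above chance
print(f"somewhere else entirely:   {region_score((1, 1, 1)):.2f}")   # around 0.5
```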
Which all sounds fair enough. But I think the problem now is that the chances are that you will find such regions even when you feed in random data rather than brain data. There are enough small regions that some of them are going to come out significant some of the time. Known elsewhere as the multiple comparisons problem, well known to Wikipedia.
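A quick way to see the problem at work: make up some pure noise for a few scans’ worth of voxels, run an ordinary t-test at every voxel at the usual 5% level, and count how many come out ‘significant’. The numbers are again the made-up ones from above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n_voxels = 100_000       # the made-up voxel count from above
n_scans = 20             # 10 'neutral' and 10 'lentil' scans, all of them pure noise
noise = rng.normal(size=(n_scans, n_voxels))

# Two-sample t-test at every voxel: the 'neutral' scans against the 'lentil' scans.
t, p = stats.ttest_ind(noise[:10], noise[10:], axis=0)

print(f"voxels 'significant' at p < 0.05: {(p < 0.05).sum()}")   # around 5,000 of them
```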
I think an expensive way to deal with this problem – I am thinking here of the considerable expense of all this scanning – is to use some brand new, clean data to test whatever hypothesis you had come up with. Perhaps to get some rival team to do the test. Does your significant region retain its significance with a new set of data? Or was your fine hypothesis grounded in noise and chance?
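In the same toy setting, the check might look something like this: pick the most impressive voxel on one batch of pure noise, then see whether it is still anything special on a brand new batch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def noise_experiment():
    """A fresh batch of pure-noise 'scans': 10 neutral and 10 lentil, 100,000 voxels."""
    return rng.normal(size=(20, 100_000))

first = noise_experiment()
t, p = stats.ttest_ind(first[:10], first[10:], axis=0)
best = p.argmin()                               # the most 'significant' voxel, chosen on noise
print(f"discovery data:   p = {p[best]:.6f}")   # looks very impressive

second = noise_experiment()                     # brand new, clean data
t2, p2 = stats.ttest_ind(second[:10], second[10:], axis=0)
print(f"replication data: p = {p2[best]:.3f}")  # back to being nothing special
```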
The good news is that all this may not be as expensive as it would have been as recently as 2008. Lots and lots of fMRI data is now being loaded into public databases, for the use of all. There are lots of data out there to be mined.
So having got that bit of scanning for dummies off my chest, I hope to return to the substance of the voodoo paper in due course. Are the statistics really that dodgy? In the meantime, for the convenience of the keen reader, I list the references – with google well up for the ones that are not already links.
PS: I forget where I got the illustration from. But thank you, whoever you are.
Reference 1: http://psmv2.blogspot.co.uk/2014/06/and-on-to-physics.html.
Reference 2: Voodoo Correlations in Social Neuroscience – Edward Vul and others – 2008
Reference 3: Circular analysis in systems neuroscience – the dangers of double dipping – Nikolaus Kriegeskorte and others – 2009
Reference 4: Confounds in multivariate pattern analysis: Theory and rule representation case study – Michael T. Todd and others – 2013
Reference 5: http://www.pea-lentil.com/.
Reference 6: http://blogs.discovermagazine.com/neuroskeptic/#.VomFHo_XLIU, the starting point of all this commotion.
Reference 7: http://www.ibm.com/smarterplanet/us/en/ibmwatson/. I don’t think Watson is actually onto all this yet. But it is probably only a matter of time.