Learning Bayesian stats with Sama: April


Now to activity two. I'm reading the papers in the order Sama suggests.

On the past and future of NHST

What strikes me in the abstract is:

Our primary conclusion is that NHST is most often useful as an adjunct to other results (e.g., effect sizes) rather than as a stand-alone result. We cite some examples, however, where NHST can be profitably used alone. Last, we find considerable experimental support for a less slavish attitude toward the precise value of the probability yielded from such procedures.

It reminds me of the point Mlodinow makes in his book, Subliminal: How Your Unconscious Mind Rules Your Behavior. He points to studies that show how folk cling to ideas and theories in which they are heavily invested. But as I work through the paper the balance they achieve is pretty good.

And NHST, it turns out, is 300 years old. So that puts it back to the time of the good Reverend's work. McGrayne cites Arbuthnot in relation to Laplace's research on gender patterns of babies. A lot of dots to join but that's the case with any new terrain.

They write of the misuse of not only NHST but also familiar measures like the mean. It's also the case that all of the methods deployed on the other side of the fence can be misused but we tend not to talk too much about that. I like the fact that they go back to origins.

The history of Fisher and innovations strikes an ANT chord for me and the notoriously difficult job it is to move or replicate a change from one place to another.

There is also more than a touch of a Bayesian approach when they write (my emphasis):

Fisher believed NHST only made sense in the context of a continuing series of experiments that were aimed at nailing down the effects of specific treatments.


NHST as it is used today hardly resembles Fisher's original idea. Its critics worry that researchers too commonly interpret results where p > 0.05 as indicating no effect and rarely replicate results where p < 0.05 in a series of experiments designed to confirm the direction of the effect and better estimate its size. This conception is of a science built largely of single-shot studies where researchers choose to reach conclusions based on these obviously arbitrary criteria.

This is a really useful paper for folk wandering into this stuff. I suspect it will rankle those heavily invested in single-shot studies.

Then to effect size considerations. Also useful. I'd probably add, effect sizes compared with known effect sizes. There was a point about the radiation dose you get from eating 1,000 bananas on Q&I. Compared to background radiation, hardly worth calculating. And always loved the Will Rogers quote:

What we don't know won't hurt us; it's what we do know that ain't!

Pace a great deal of the logics of teaching, learning and thinking about their wicked problems [1].

Gems all through the paper. I'd vote to include this paper for most folk trying to map this stuff.

Now to Teaching Confidence Intervals (CIs) by Fidler and Cumming [2]. A neat illustration of the value of CIs. Useful collection of research re the misuse/misinterpretation of CIs. And then to Confidence Level Misconception! Stunning bit of research about how a sample of gurus got it wrong! Need to work through their example… time presses.

Now to Trafimow and Marks' editorial. The first mention of Bayes and the Laplacian assumption, i.e. begin with equal probabilities for each possibility.

Best line in the editorial:

the state of the art remains uncertain

Why is it that certainty holds such prominence in the social sciences? The more we know the less we know…

Love the use of the metaphor crutch for NHST. What are the crutches for the other side of the fence? I'm avoiding the use of the two Q labels which are simply unhelpful.

And the reactions to the editorial… what fun! Love this from Stephen Senn:

The problem is not the inference in psychology it’s the psychology of inference. Some scientists have unreasonable expectations of replication of results and, unfortunately, many of those currently fingering p-values have no idea what a reasonable rate of replication should be.

and echoing Bayes again,

The editors would be better exercised in promising space to studies that try to repeat previous studies rather than trying to ban all inferential statistics. They should also try to promote a better standard of inference (proper control, pre-specification, avoiding spurious precision, dealing with regression to the mean etc). Even the most dogged anti-frequentists rarely go so far as outlawing the humble standard error.

If you don’t make mistakes you don’t learn. Attempting to eliminate false positives in inference is to attempt scientific sterility and banning formal inferential methods won’t even help to achieve this foolish aim.

Ah. Good to see debate rather than the usual dry and too-polite commentary. Cumming cites the work of John Ioannidis on the reproducibility of results. I did not realise the p-values danced :).
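To get a feel for what dancing p-values might look like, here is a minimal sketch of my own, not Cumming's simulation. It reruns the "same" experiment many times with an unchanged true effect and collects the p-value from each run. The function name, the effect size, the sample size and the use of a simple one-sample z-test with known sigma are all assumptions I made for illustration.

```python
import random
from statistics import NormalDist, mean


def replicate_p_values(true_effect=0.5, sigma=1.0, n=32, replications=20, seed=1):
    """Rerun one hypothetical experiment and return the p-value of each run.

    Every run draws n observations from a normal distribution with the same
    true mean (true_effect) and tests H0: mean = 0 with a two-sided z-test,
    treating sigma as known. All parameter values are illustrative.
    """
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    std_normal = NormalDist()          # standard normal for the tail area
    p_values = []
    for _ in range(replications):
        sample = [rng.gauss(true_effect, sigma) for _ in range(n)]
        z = mean(sample) / (sigma / n ** 0.5)
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


if __name__ == "__main__":
    ps = replicate_p_values()
    # Nothing about the underlying effect changes between runs,
    # yet the p-values can differ a good deal from one run to the next.
    print("smallest p:", min(ps))
    print("largest p: ", max(ps))
    print("runs with p < 0.05:", sum(p < 0.05 for p in ps), "of", len(ps))
```

Changing the seed, or shrinking n, shows how much a single-shot p-value depends on the luck of the draw, which seems to be the point of the replication argument.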

And the last word from Robert Grant:

If we trained researchers to consider all subjectivities and personal biases, and to be open about them, in the way that good qualitative researchers are, far fewer errors would be made. A little dose of philosophy of science early on in training could help avoid common pitfalls later.

This post is excellent as a beginning point for thinking about these issues. Each point of view maps an argument that can be traced back. Compare that with the recipe-based approach to teaching stats.

And then the final word by a seriously heavy organisation, the American Statistical Association. It's as if they are saying, OK all you p-value users you need to do p-values 101! :)

Thinking about an answer to b): The debate spins from the provocations of the editorial but I suspect these issues have been simmering for a very long time. I never had a sense, reading or listening to folk who research in education, that they shared the kinds of reservations and sensibilities that were put to the fore in the editorial and subsequent follow-ups. To me, it seems to be saying that the poor use of p-values and related statistical measures is an indication of poor/bad/sloppy research. It made me wonder what the bluffing devices are in the other approach, the largely numbers-free kind of research.

I did not do 3 because I "know" the problem. I really don't know it but am familiar with how to think about it. It does not mean my sensibility is good or useful, just rehearsed. I still stare at the kinds of examples Silver (The Signal and the Noise: Why So Many Predictions Fail--but Some Don’t) uses, including what I assume to be a classic illustration: mammogram screening for breast cancer. I want to get to the point where I can read the argument and the mind nods rather than still reaching for the intuitive response.
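For rehearsal, the screening argument can be sketched in a few lines. The numbers below are round figures I have made up for illustration; they are not Silver's and not real screening statistics. The function name and parameter names are likewise my own.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Bayes' rule for P(condition | positive test).

    prior               -- P(condition) before the test
    sensitivity         -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Illustrative, assumed values: 1% prior, 80% sensitivity, 10% false positives.
p = posterior_given_positive(prior=0.01, sensitivity=0.80, false_positive_rate=0.10)
print(f"{p:.1%}")  # prints 7.5%
```

The intuitive response reaches for something near 80%. The arithmetic says about 7.5%, because under these assumed numbers the false positives among the many people without the condition (0.99 × 0.10 = 0.099) swamp the true positives among the few with it (0.01 × 0.80 = 0.008).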

Final Exercise

Sama wants to know:

  • What was effective or could be improved?
  • What types of students would benefit (before or after improvement)?

I found the framing useful to a point but you might get a better sense from the way others responded to it. I think the first two chunks complement nicely, i.e. why do you want to know about this stuff and then a fun and entertaining, to me, airing of the issues. You could provide equally informative links back to other sources but for a once over lightly it works pretty well, for me.

For students who know zero about stats it might be worth some kind of non-recipe but case-based illustration of some good stats. Maybe a text that sits alongside an actual text saying things like, this means etc. So a highly annotated and engaging text of a worked paper might be handy to get folk to realise that this stuff can be done well. Maybe a counter example as well. Same method, lots of annotation.

For others, particularly in education, who treat stats as a kind of black box, much the way folk typically treat Leximancer-based analyses, this would open up the considerations that they probably did not get or ignored when they were learning how to do this stuff. What was interesting was the thread or plea to add more description in the discussion of that editorial. I keep thinking that we have an A4 mindset at play here, i.e. we write papers about this work because we still think there are limits to what can be published. That is true of the $-gouging publishers who cling to their old models of publication but there are moves to make/mandate that data generated via publicly funded research be available to the public. The same could easily apply to models where you could play with the model rather than simply being told how it behaved and what the significant features/limitations were.


So, to activity one. I became interested in Bayesian stats when, in 2014, I came across a presentation from Edge (edge.org) by Sendhil Mullainathan that he gave in November of 2013. The talk was titled: What Big Data Means For Social Science. I had been aware of terms like machine learning, neural nets, deep learning, terms which Jerry Kaplan in his recent book, Humans Need Not Apply, calls synthetic intellects.

So, the problem for me was that in any number of places people were gesturing to the Bayesian basis of the algorithms being deployed to do machine, deep or whatever hyped version of AI you prefer, learning.

Then, in 2014, Parlo invited me to prepare a paper for a symposium she was heading up for AARE, Towards a Posthumanist Sociology of Education: Experiments in Emergent Worldly Configurations. I duly did and called it: Theory games: from monstrous puppetry to productive stupidity. The abstract is here and a copy of the slides here, the top entry in 2014. I recorded some of my further explorations of what I now refer to as the good Reverend's work also on this site.

At this point you'd be wondering why go to all this trouble, why not just blackbox these algorithms as most sociologists do and be done with it. Simply because perhaps my major obsession, which derives from my interest in actor-network theory, is the delegation of work to machines.

This is an extremely long-winded answer to Sama's first question but there is no one specific paper or line that I could point to. I can trace my renewed interest in stats to Taleb's work. For example: Antifragile: Things That Gain from Disorder, Fooled by Randomness, and The Black Swan. Taleb brings a delightfully arrogant approach to his no-nonsense attack on folk who, in his view, misuse stats and models.

But I will conform. I think the book I enjoyed most was Sharon McGrayne's The Theory That Would Not Die, from which I have some quotes:

Bayes is a measure of belief. And it says that we can learn even from missing and inadequate data, from approximations, and from ignorance.

I'm quite a fan of ignorance and equally puzzled by the L-word.

And it’s about a method that— refreshed by outsiders from physics, computer science, and artificial intelligence—was adopted almost overnight because suddenly it worked. In a new kind of paradigm shift for a pragmatic world, the man who had called Bayes “the crack cocaine of statistics… . seductive, addictive and ultimately destructive” began recruiting Bayesians for Google.

Prominent Bayesians even advise government agencies on energy, education, and research.

I can see Stephen's policy eyes twinkle a bit here.


But Bayes’ rule is not just an obscure scientific controversy laid to rest. It affects us all. It’s a logic for reasoning about the broad spectrum of life that lies in the gray areas between absolute truth and total uncertainty.

McGrayne's useful historical account helped me put Bayes in context. It's good, detailed and makes connections with the flurry of intellectual work that was bubbling along in the 18th Century.

To the question of what I think all of this means: to me, it is a better way to think about learning, and it sits learning right alongside, in a productive way, what we call ignorance [3].

I'm still not as fluent as I'd like to be with thinking along these Bayesian lines. I have what I think is a workable sensibility about actor-network theory, a mindset if you prefer. I'd like to develop the same for Bayesian stats.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License