Ronald Fisher and the P-Value

Check out Bold Signals 3.01 for an audio version of this piece. Much of this material also appears in “Scenes from a Replication Crisis”.

It’s 1925.

Ronald Fisher is a geneticist and statistician working at Rothamsted Experimental Station, an agricultural research institute located in the English countryside.

Before coming to Rothamsted, Fisher was instrumental in reconciling Charles Darwin’s notion of evolution by natural selection with Gregor Mendel’s Laws of Genetics. Basically, if you’ve ever wondered how Darwin’s observations of finches and Mendel’s experiments with pea plants led to our modern understanding of evolution, one of the people you have to thank for that is Ronald Fisher.

It’s also worth pointing out, given Fisher’s influence on genetics, that he was an outspoken eugenicist. After all, this was the early twentieth century, and the history of science is not exactly a straight line of people with non-horrific views on society.

Anyway, back to the countryside.

Long-term experiments with wheat, grass, and roots abound at Rothamsted, giving Fisher a bumper crop of data to analyze. However, though the overall quantity of data is high, sample sizes are low. An influential study of the effects of rainfall on wheat incorporates data from just thirteen plots of land.

Concerned with generalizing the results of such experiments (after all, the point of this type of research is to increase crop production), Fisher synthesizes several recent advances in “small sample statistics” into a framework known as significance testing.

He takes a statistical test called Student’s t-test, which was initially developed by William Sealy Gosset, a statistician at Guinness who published under the pseudonym “Student”, to monitor the quality of Guinness, and develops a complementary test that he calls the Analysis of Variance (ANOVA).

To ensure these innovations are accessible to the research community beyond Rothamsted, Fisher publishes Statistical Methods for Research Workers. Central to the book, and to significance testing more generally, is the null hypothesis: the position that there is no difference between groups of data. In Fisher’s conception, devices like t-tests and ANOVAs are tests of the null hypothesis. The result of such a test indicates the probability of observing a result at least as extreme as the one actually obtained, assuming the null hypothesis is true. In quantitative terms, this probability is expressed as a p-value.

Fitting its origins in applied research, the utility of Fisher’s framework is best demonstrated with a practical example. Suppose Fisher and his colleagues want to study the effect of a particular method of fertilization on the growth of grass. To do this, they obtain yield measurements from ten plots that use the method and ten that do not. These numbers are small, but reflective of the time and effort that go into harvesting good data. Before examining the two groups of data, Fisher reminds his colleagues that the null hypothesis stipulates that there is no difference between the fertilized and unfertilized plots. This is a really abstract way of talking about something as exciting as watching grass grow, so he reiterates that the null hypothesis is essentially that the fertilization method has no effect. Then, he runs a t-test.

A resulting p-value of 0.50 indicates that, assuming the fertilization method has no effect, the probability of Fisher and his colleagues obtaining a difference in yields at least as large as the one they measured is fifty percent. A p-value of 0.10 indicates that this probability is ten percent. In Statistical Methods for Research Workers, Fisher introduces an informal criterion for rejecting the null hypothesis: p < 0.05.
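A minimal sketch, in Python, of what a p-value measures. Getting a p-value from a t-test requires the t distribution, which Python’s standard library doesn’t provide, so this example swaps in a different technique, a Monte Carlo permutation test, that asks the same question directly: if the fertilization method had no effect, the labels “fertilized” and “unfertilized” would be arbitrary, so how often would randomly reshuffling them produce a difference in mean yield at least as large as the one observed? The function name permutation_p_value, the number of shuffles, and the yields are all hypothetical, chosen only for illustration.

```python
import random
from statistics import fmean


def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    If the treatment has no effect (the null hypothesis), the group labels
    are arbitrary. Shuffle them many times and count how often the shuffled
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(treated) - fmean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_treated]) - fmean(pooled[n_treated:]))
        # Small tolerance so floating-point rounding doesn't hide exact ties.
        if diff >= observed - 1e-9:
            at_least_as_extreme += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical yields (arbitrary units) for ten fertilized and ten
# unfertilized plots; these are made-up numbers, not Rothamsted data.
fertilized = [5.2, 4.9, 5.6, 5.1, 5.4, 4.8, 5.3, 5.0, 5.5, 5.2]
unfertilized = [4.8, 5.0, 4.6, 4.9, 5.1, 4.7, 4.5, 4.9, 5.0, 4.8]

p = permutation_p_value(fertilized, unfertilized)
print(f"p = {p:.4f}")
if p < 0.05:
    print("Reject the null hypothesis at the 0.05 threshold")
else:
    print("Do not reject the null hypothesis")
```

Because the shuffles are random, the estimated p-value varies slightly with seed and n_resamples; more shuffles give a more stable estimate.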

“The value for which p = 0.05, or 1 in 20, is 1.96 or nearly 2 ; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.”
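Fisher’s 1.96 is a standard normal deviate: a distance from the mean, measured in standard deviations, beyond which 5 percent of a normal distribution lies (2.5 percent in each tail). A short Python sketch using only the standard library can check that correspondence in both directions; the function names two_sided_p_from_z and critical_z are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p_from_z(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


def critical_z(alpha):
    """Deviate that leaves a total probability of alpha in the two tails."""
    return NormalDist().inv_cdf(1 - alpha / 2)


print(round(two_sided_p_from_z(1.96), 4))  # 0.05
print(round(two_sided_p_from_z(2.0), 4))   # 0.0455
print(round(critical_z(0.05), 2))          # 1.96
```

Rounding 1.96 up to “nearly 2” shifts the two-sided probability only slightly, to about 0.0455.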

We’ve finally arrived at p < 0.05.

Almost a decade after the publication of Statistical Methods for Research Workers, Jerzy Neyman and Egon Pearson address what they view as a fundamental asymmetry in Fisher’s framework. Namely, though it’s intended to help researchers evaluate the results of experiments, its focus on the null hypothesis doesn’t really give researchers any way to evaluate experimental hypotheses. Basically, they argue with increasing volume, you can use Fisher’s methods to evaluate whether there’s a difference between two groups, but you can’t use them to make a statement about what’s causing it.

Though their “hypothesis testing” framework draws heavily from Fisher’s, Neyman and Pearson’s has a fundamentally different goal. Rather than giving researchers tools to evaluate the results of agricultural experiments, they aim to determine the most efficient test for deciding between competing hypotheses. These hypotheses include Fisher’s null hypothesis, but also a variety of “alternative” or experimental hypotheses. To this end, they introduce three important concepts to the burgeoning field of research-oriented statistics: Type I error, incorrectly rejecting a null hypothesis that is true; Type II error, incorrectly accepting a null hypothesis that is false; and power, the probability of correctly rejecting a false null hypothesis.
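The Type I error rate and power can be made concrete with a small simulation. This Python sketch assumes a simplified setting that is not in the original: yields drawn from a normal distribution with a known standard deviation of 1, ten plots per group, and a two-sided z-test at a significance level (alpha) of 0.05. It uses a z-test rather than a t-test so the standard library suffices. Simulating many experiments where the true effect is zero estimates the Type I error rate; simulating them with a real effect estimates power. The function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_effect, n_per_group=10, alpha=0.05, n_sims=10_000, seed=1):
    """Fraction of simulated experiments in which a two-sided z-test rejects H0.

    With true_effect == 0 the null hypothesis is true, so this estimates the
    Type I error rate. With true_effect != 0 it estimates power, which is
    1 minus the Type II error rate.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of a difference in means when each group's SD is 1.
    se = math.sqrt(2 / n_per_group)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        z = (sum(treated) / n_per_group - sum(control) / n_per_group) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


def analytic_power(true_effect, n_per_group=10, alpha=0.05):
    """Exact rejection probability of the same z-test, for comparison."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = true_effect / math.sqrt(2 / n_per_group)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


print(f"Type I error rate (no effect): {rejection_rate(0.0):.3f}")
print(f"Power (effect = 1 SD): simulated {rejection_rate(1.0):.3f}, "
      f"analytic {analytic_power(1.0):.3f}")
```

Under these assumptions, with ten plots per group and a true effect of one standard deviation, the analytic power is roughly 0.61, so a real effect of that size would go undetected about four times in ten.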

Disagreements between Fisher and Neyman and Pearson soon escalate into open antagonism. No, seriously: reading accounts of these debates, you get the sense that Fisher’s true talent wasn’t in biology or statistics, but in expressing his ego, mostly through yelling.

However, despite the controversy, the two frameworks are soon combined and presented as one in research methods textbooks. What emerges is an enormously and immediately influential model of statistical testing that incorporates Fisher’s null hypothesis, Neyman and Pearson’s alternative hypotheses, and a focus on observing p-values less than 0.05.

So when we talk about p-values and things like p-hacking, we’re talking about a method for evaluating the difference between groups of data that was designed by an evolutionary biologist and eugenicist for use in agriculture. We’re also talking about a debate over what this number means and how to use it that has been ongoing for more than ninety years.

Additional Reading

Box, J. F. (1987). Guinness, Gosset, Fisher, and small samples. Statistical Science, 2(1), 45-52.

Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52, 399-433.

Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 222, 309-368. doi: 10.1098/rsta.1922.0009

Fisher, R. A. (1925). Statistical methods for research workers. Oliver and Boyd.

Halpin, P. F., & Stam, H. J. (2006). Inductive inference or inductive behavior: Fisher and Neyman-Pearson approaches to statistical testing in psychological research (1940-1960). The American Journal of Psychology, 119(4), 625-653. doi: 10.2307/20445367

Lenhard, J. (2006). Models and statistical inference: The controversy between Fisher and Neyman–Pearson. The British Journal for the Philosophy of Science, 57(1), 69-91. doi: 10.1093/bjps/axi152

Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231, 289-337. doi: 10.1098/rsta.1933.0009

Student. (1908). The probable error of a mean. Biometrika, 6(1), 1-25.


03.01: Science and Technology with Brian Nosek

Welcome to the third season of Bold Signals!

In this Episode

  1. Scenes from a Replication Crisis: Ronald Fisher and the P-Value [0:01:00].
  2. An extended interview with Brian Nosek, Social Psychologist and Director of the Center for Open Science [0:09:25].
  3. Bold Signals Documentary Club: Cosmos: A Personal Journey (Episode 1) [0:56:58].


Brian Nosek on Twitter | Brian’s Website

Project Implicit

The Center for Open Science | The Open Science Framework

Bold Signals on Twitter | Facebook

Bold Signals on iTunes | Soundcloud | Figshare

John Borghi on Twitter | John’s Website

Recommended Reading

Greenwald, A. G., Banaji, M. R., Rudman, L. A., Farnham, S. D., Nosek, B. A., & Mellott, D. S. (2002). A unified theory of implicit attitudes, stereotypes, self-esteem, and self-concept. Psychological Review, 109(1), 3-25. doi: 10.1037/0033-295X.109.1.3

Nosek, B. A., Banaji, M. R., & Greenwald, A. G. (2002). Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics: Theory, Research, and Practice, 6(1), 101-115. doi: 10.1037/1089-2699.6.1.101

Nosek, B. A., & Bar-Anan, Y. (2012). Scientific utopia: I. Opening scientific communication. Psychological Inquiry, 23(3), 217-243. doi: 10.1080/1047840X.2012.692215

Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615-631. doi: 10.1177/1745691612459058

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. doi: 10.1126/science.aac4716

Media in this Episode

Cover Art

“Homebrew” by Robert Tinney

DOI: 10.6084/m9.figshare.4308719

Bold Signals Season 3: Coming December 12th

If Bold Signals were to have a central thesis, it would be that science is people.

At the time of this writing, I have recorded the first five interviews for the third season of Bold Signals. I asked a social psychologist about studying unconscious biases and promoting reproducible research, chatted with a physicist about traveling long distances to examine tiny particles, listened to a neuropsychologist and a cognitive ecologist describe their work studying subcortical structures and behavior in prairie voles, and talked with a science journalist about what it means to write about science from the outside. These interviews cover a lot of ground, but they’re united by the notion that science is a fundamentally human enterprise.

Look forward to all this and more, as Bold Signals returns for its third season on December 12th!

What does it mean to talk about science?

Since November 8th, I’ve thought a lot about what it means to talk about science in an environment full of fake news and “post truth”. As a scientist turned librarian, I want to believe that the accuracy and integrity of information matters. As a person who reads the news and lives in the world, I’m not sure what to believe.

I’ve never been shy about the fact that the whole point of Bold Signals is to show that science is populated by a diverse group of hard-working, dedicated people. In an environment where political figures are calling for an end to “political correctness” and “politicized” science, I think it’s important to show that science, like politics, is all about people.

So, in addition to the usual interviews, this season will feature two regular segments born of my anxiety about what it means to talk about science today.

  1. Scenes from a Replication Crisis: I’ve usually used the first segment of each episode to muse about current events in science. From my thoughts on a controversial new book to my concern about a breach of research ethics, I have always grounded these segments in science’s present. This season, in an effort to further understand how science is a product of people, I’ll use this segment to explore science’s past and examine the pressures and incentives that have led to science’s ongoing “replication crisis”. Listen as I describe the life and times of Ronald Fisher and try to make the history of p-values interesting to people without a background in statistics or epistemology!
  2. Bold Signals Documentary Club: If the first segment of the podcast is explicitly about the evolution of scientific methods, the last is about how we talk about science. Last season I used this segment as an excuse to read a bunch of science fiction. This season, I’ll be watching documentaries and exploring the ways in which science is presented and perceived. In the first batch of episodes, I’ll be reviewing Cosmos: A Personal Journey. Listen as I critique what is probably the most beloved science documentary series of all time!

A Note on the Release Schedule

If you were paying close attention, you may have noticed that the release schedule started to slip at the end of season two. I handle every aspect of producing the podcast myself and, though it may not sound like it, a lot of time and effort goes into each episode of Bold Signals. After moving across the country and starting a new job earlier this year, it became difficult to release a new episode every other Friday.

I’d like very much to keep to a regular release schedule, but I have another pretty major life event set to occur near the end of the year. I’m not sure to what extent this will affect the podcast’s release schedule, but I suspect it’ll be rather irregular this season. Sorry for the inconvenience and thanks in advance for your patience.

As always, episodes will be uploaded to both Soundcloud and figshare.