Education research is weak and sloppy. Why?
The replication crisis changed the social sciences — except for ed research

Jo Boaler is a professor of education at the Stanford Graduate School of Education, with an enormously influential body of work arguing that students learn math faster and more effectively through her “discovery”-based methods. Her work got Algebra removed from middle schools across the Bay Area.
It is some of the most incompetently or dishonestly conducted research I have seen in a decade as a journalist.
Take one example: a report she gave to the National Council of Teachers of Mathematics on the stunning success of her innovative new math curriculum at “Railside” (she did not disclose the name of the real school where the study took place). This was a poor, disadvantaged California school where, she said, students adopting her curriculum rocketed ahead of students attending schools with traditional curricula.
When other researchers looked into her work, combing through every school in California to figure out which one “Railside” might be so they could examine the performance data Boaler had declined to share, they found that Boaler had compared the top two quartiles of students at “Railside” to the middle quartiles of students at the other schools. In fact, “Railside” students were dramatically underperforming students at the other schools on every single mathematical ability test conducted during the study period, except the one that Boaler highlighted in her presentation. And that one test was actually administered to a population of students who weren’t even exposed to the innovative new curriculum.
They found that the “tests” Boaler used to evaluate whether students were succeeding generally had the following problems. They:
Contained material two or three years below grade level.
Did not contain any significant Algebra 1 or Geometry material despite being for an Algebra 1 or Geometry class.
Had problems that were incorrectly graded.
Had no “predictive validity” for other measures of math performance like SAT scores.
There was simply no relationship between doing well on Boaler’s error-strewn test of basic math and having mastered the material that students were supposed to master. Furthermore, the paper claimed that Boaler’s tactics closed the gender gap in mathematics performance, with girls scoring as well as boys, but outside tests showed that the gender gap at “Railside” was the same as everywhere else.
On a different occasion, Boaler claimed that a single four-week summer camp could give students several years of math performance gains. Her evidence, when people dug into it, was that she gave the same test at the start of the camp and at the end, and the students’ scores improved. But, as other researchers pointed out, that improvement is probably explained by the fact that the students had seen the exact same questions only a few weeks earlier. These are cartoonishly bad standards of evidence.
I wish this were a critique specific to Jo Boaler, but it isn’t. Across the board, the state of education research is incredibly grim.