I’ve been writing about the dubiousness of “stereotype threat” since 2004, and now it looks like the Replication Crisis in the social sciences is coming for that popular notion that the reason some groups average lower scores than other groups on tests is that stereotypes of them scoring lower threaten their psychological safe spaces. Via Steve Hsu and James Thompson, I see that social psychologist Michael Inzlicht has blogged about the spreading Replication Crisis:
… No, this is not going to be an optimistic post.
… Instead, if you will allow me, I want to wallow.
I have so many feelings about the situation we’re in, and sometimes the weight of it all breaks my heart. I know I’m being intemperate, not thinking clearly, but I feel that it is only when we feel badly, when we acknowledge and, yes, grieve for yesterday, that we can allow for a better tomorrow. I want a better tomorrow, I want social psychology to change. But, the only way we can really change is if we reckon with our past, coming clean that we erred; and erred badly.
To be clear: I am in love with social psychology. I am writing here because I am still in love with social psychology. Yet, I am dismayed that so many of us are dismissing or justifying all those small (and not so small) signs that things are just not right, that things are not what they seem. “Carry-on, folks, nothing to see here,” is what some of us seem to be saying.
Our problems are not small and they will not be remedied by small fixes. Our problems are systemic and they are at the core of how we conduct our science. …
I think these three ideas—that data flexibility can lead to a raft of false positives, that this process might occur without researchers themselves being aware, and the unknown size of the file drawer—explains why so many of our cherished results can’t replicate. These three ideas suggest we might have been fooling ourselves into thinking we were chasing things that are real and robust, when we were pursuing neither.
Some background: A dozen years ago I wrote in VDARE, in the essay in which I coined the term “Occam’s Butterknife”:
More sobering still: What other phenomena, which we now consider obviously real and true, will be revealed to be just as fragile?
As I said, I’m in a dark place. I feel like the ground is moving from underneath me and I no longer know what is real and what is not.
I edited an entire book on stereotype threat, I have signed my name to an amicus brief to the Supreme Court of the United States [in the Fisher v. U. of Texas affirmative action case] citing stereotype threat, yet now I am not as certain as I once was about the robustness of the effect. I feel like a traitor for having just written that; like, I’ve disrespected my parents, a no no according to Commandment number 5. But, a meta-analysis published just last year suggests that stereotype threat, at least for some populations and under some conditions, might not be so robust after all. P-curving some of the original papers is also not comforting. Now, stereotype threat is a politically charged topic and there is a lot of evidence supporting it. That said, I think a lot more painstaking work needs to be done on basic replications, and until then, I would be lying if I said that doubts have not crept in. Rumor has it that a RRR of stereotype threat is in the works.
A little experiment Claude [Steele, the identical twin of Shelby Steele] performed on some Stanford sophomores almost a decade ago has become wildly popular among liberals. They see it as the Rosetta Stone explaining the mystery of racial inequality. It supposedly proved that on standardized tests like the SAT college entrance exam, blacks would score the same as whites on average if only mean people like me wouldn’t ever mention the fact that they, uh, don’t score the same.
In 2010, I wrote in VDARE:
What Steele found was that when he told his black subjects that the little custom-made verbal test he was giving them would measure their intellectual ability, they scored worse than when he provided a less threatening description of the exam.
Here’s the logic behind this extrapolation: At some point back in the mists of time, a stereotype somehow emerged that blacks do less well on the SAT. So, now, blacks are seized by panic over the possibility they might mess up and score so poorly that they validate this stereotype.
And, indeed, this nervousness makes them score exactly as badly as the stereotype predicted they would.
It’s really a lovely theory. In its solipsistic circularity, it’s practically unfalsifiable.
Still, you might object that Occam’s Razor suggests a simpler explanation—that the arrow of causation runs in the opposite direction, with the stereotype being the result, not the cause, of decades of poor black performance on the SAT.
But that just shows you are a mean person, too.
If you were a nice person, then you would know that if we all just believe that everybody will score the same, then everybody will score the same!
Just like when we were children and all clapped at a performance of Peter Pan to show we had faith that Tinkerbell would recover.
Of course, to me as a former marketing executive, there’s an obvious alternative explanation of Steele’s findings: the students figured out what this prominent professor wanted to see, and, being nice kids, they delivered the results he longed for. This happens all the time in market research. After all, this was just a meaningless little test, unlike a real SAT where the students would all want to do as well as possible.
Nevertheless, countless commentators have claimed Steele's study proves the only reason blacks score worse on the SAT than whites is because of this “stereotype threat.”
But now, it turns out that the vaunted evidence for this wildly popular concept rests heavily upon another Effect, the File Drawer Effect—defined as “the practice of researchers filing away studies with negative outcomes”. We seem to have another Climate Research Unit scandal on our hands.
By the way, the difference between my 2004 and 2010 VDARE columns on stereotype threat is reflective of an ongoing discussion I’ve been having with crusading reformer statistics professor Andrew Gelman over the years in the comments section of his blog. I may seem like a cynical bastard, but in truth I’m a softy who likes to believe the best about everyone, or at least the nicest interpretation consistent with the facts.
A researcher, who doesn't want his name or any potentially identifying information mentioned, for unfortunately obvious career reasons, recently attended a presentation at a scientific conference. Here is his summary of what he heard:
“One talk presented a meta-analysis of stereotype threat. The presenter was able to find a ton of unpublished studies.
“The overall conclusion is that stereotype threat does not exist. The unpublished and published studies were compared on many indices of quality, including sample size, and the only variable predicting publication was whether a significant effect of stereotype threat was found. …
“This is quite embarrassing for psychology as a science.”
There are two general explanations for why a social science experiment doesn’t replicate:
- The original finding was just a fluke or a fraud and there was never any effect to be found. (Dr. Gelman’s usual point of view on studies he deems junk.)
- Or maybe it was real but something has since changed. (My usual predilection.)
Prof. Gelman’s skepticism usually turns out to be more accurate than my sympathetic excuses.
(By the way, this reflects the science vs. marketing research distinction. In science, you are looking for permanent, general truths that don’t wear off. In marketing research, you assume effects wear off. For example, a third of a century ago, I worked on a test market of Bill Cosby ads for Jell-O. Bill Cosby was a famously effective endorser back then. These days, he’s not.)