On the blog of AEI's magazine The American, Charles Murray writes a brief response to David Brooks' "Harlem Miracle" column about the Harlem Children's Zone charter school "eliminating" the white-black gap:

It will be wonderful if the results are as good as they sound, but hold the champagne.

I’m not being mindlessly pessimistic. The problem is that we have had 40 years of “Miracle in X”—the early Head Start results, the Milwaukee Project, Perry Preschool, the Abecedarian Project, Marva Collins’s schools, and the Infant Health Development Project, to name some of the most widely known stories—and the history is depressingly consistent: an initial research report gets ecstatic attention in the press, then a couple of years later it turns out that the miracle is, at best, a marginal success that is not close to the initial claims.

I haven’t seen the study by Roland Fryer and Will Dobbie that was the basis for Brooks’s column, but if I’m going to be such a grinch I might as well lay out the kinds of things I will be looking for (these are generic issues, not things that I necessarily think are problems with this particular study) when I get hold of a copy:

1. Selection factors among the students. Did the program deal with a representative sample? Was random assignment used?

2. Comparison group. Who’s in it? Are they comparable to the students in the experimental group?

3. Attrition. What about the students who started the program but dropped out? How many were there? How were they doing when they dropped out?

4. Teaching to the test. After seven years of No Child Left Behind, everybody knows about this one. Worse, there are the school officials who have rigged attendance on the day the test was taken or simply faked the scores—that’s been happening too with high stakes testing.

5. Cherry-picking. Do the reported test scores include all of the tests that the students took, or just the ones that make the program look good?

6. The tests. Do they meet ordinary standards for statistical reliability, predictive validity, etc.?

7. Fade-out. Large short-term test score improvements have, without exception to date, faded to modest ones within a few years.

Murray links to this pointed response on Gotham Schools, which cites data from this report:

Just How Gullible is David Brooks? by Aaron Pallas

Check out Pallas's graph of the NY State results here.

It’s true that eighth-graders in 2008 scored .20 standard deviations above the citywide average for white students. But it may also be apparent that this is a very unusual pattern relative to the other data represented in this figure, all of which show continuing and sizeable advantages for white students in New York City over HCZ students. The fact that HCZ seventh-graders in 2008 were only .3 standard deviations behind white students citywide in math is a real accomplishment, and represents a shrinkage of the gap of .42 standard deviations for these students in the preceding year. However, Fryer and Dobbie, and Brooks in turn, are putting an awful lot of faith in a single data point — the remarkable increase in math scores between seventh and eighth grade for the students at HCZ who entered sixth grade in 2006. If what HCZ is doing can routinely produce a .67 standard deviation shift in math test scores in the eighth grade, that would be great. But we’re certainly not seeing an effect of that magnitude in the seventh grade. And, of course, none of this speaks to the continuing large gaps in English performance.

But here’s the kicker. In the HCZ Annual Report for the 2007-08 school year submitted to the State Education Department, data are presented on not just the state ELA and math assessments, but also the Iowa Test of Basic Skills. Those eighth-graders who kicked ass on the state math test? They didn’t do so well on the low-stakes Iowa Tests. Curiously, only 2 of the 77 eighth-graders were absent on the ITBS reading test day in June, 2008, but 20 of these 77 were absent for the ITBS math test. For the 57 students who did take the ITBS math test, HCZ reported an average Normal Curve Equivalent (NCE) score of 41, which failed to meet the school’s objective of an average NCE of 50 for a cohort of students who have completed at least two consecutive years at HCZ Promise Academy. In fact, this same cohort had a slightly higher average NCE of 42 in June, 2007.

Normal Curve Equivalents (NCE’s) range from 1 to 99, and are scaled to have a mean of 50 and a standard deviation of 21.06. An NCE of 41 corresponds to roughly the 33rd percentile of the reference distribution, which for the ITBS would likely be a national sample of on-grade test-takers. Scoring at the 33rd percentile is no great success story.

How are we to make sense of this? One possibility is that the HCZ students didn’t take the Iowa tests seriously, and that their performance on that test doesn’t reflect their true mastery of eighth-grade mathematics.
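The NCE-to-percentile conversion Pallas quotes can be reproduced in a few lines. This is a minimal sketch that assumes only what the passage states (NCEs have a mean of 50 and a standard deviation of 21.06) plus the standard definition of NCEs as a linear rescaling of normal z-scores. The function names are hypothetical.

```python
from statistics import NormalDist

# Scaling constants as stated in the quoted passage.
NCE_MEAN = 50.0
NCE_SD = 21.06

# Standard normal distribution (mean 0, SD 1) used for the conversion.
_STD_NORMAL = NormalDist()


def nce_to_percentile(nce: float) -> float:
    """Convert a Normal Curve Equivalent score to a percentile rank (0-100)."""
    z = (nce - NCE_MEAN) / NCE_SD
    return 100.0 * _STD_NORMAL.cdf(z)


def percentile_to_nce(percentile: float) -> float:
    """Inverse conversion: percentile rank (strictly between 0 and 100) to NCE."""
    z = _STD_NORMAL.inv_cdf(percentile / 100.0)
    return NCE_MEAN + NCE_SD * z


# The cohort's reported average NCEs, and the school's stated objective.
for label, nce in [("June 2008", 41), ("June 2007", 42), ("objective", 50)]:
    print(f"{label}: NCE {nce} -> percentile {nce_to_percentile(nce):.1f}")
```

An NCE of 41 comes out between the 33rd and 34th percentile, matching Pallas's figure, and the school's objective of 50 is the 50th percentile by construction. One caveat: converting an average NCE this way gives the percentile of a hypothetical student scoring exactly at the group mean, not the average of the students' individual percentiles.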

Teaching to the test gets a bad rap, but it's only partially deserved. At least they're teaching something!

Also, scoring at the 33rd percentile nationwide in math on the national Iowa test isn't that bad (although it's not as good as scoring at the 33rd percentile of the white distribution of scores -- increasingly, NAM children are competing against other NAMs in the percentile rankings, so that makes the national grading increasingly easy relative to the white grading).