Has A 15-Year-Old Explained IQ's Flynn Effect?

10-year-old Elijah Armstrong wins 2008 Marin County Spelling Bee

The "Flynn Effect" is the name Richard Herrnstein and Charles Murray coined in The Bell Curve for the rise in raw scores on IQ tests, a phenomenon documented most thoroughly by James Flynn. It remains perhaps the most important (and technically daunting) conundrum in psychometrics.

Many worthy explanations have been offered, but we can use another one. And the brand new paper from Elijah Armstrong (pictured above) and Michael Woodley is a standout.

One clue might be that the Flynn Effect tends to be largest on those types of IQ tests that seem designed by Mr. Spock-like aliens or robots, such as the Raven's Matrices, that tour de force in minimalist test design from the late 1930s.

Raven's Matrices

The more broad-based Wechsler brand of IQ tests was introduced in the same era. On the Wechsler, the magnitude of the Flynn Effect varies widely from subtest to subtest.

I adapted the table below from Flynn's 2007 book What Is Intelligence? On Wechsler Intelligence Scale for Children subtests, the raw score gains from 1947 to 2002 on the general information, arithmetic, and vocabulary subtests were small. But they were quite large on the more Raven's-like subtests, along with the high-concept Similarities subtest:

Subtest              Gain (IQ points, 1947-2002)   Example item
Information          +2                            On what continent is Argentina?
Arithmetic           +2                            If a toy costs $6, how much do 7 cost?
Vocabulary           +4                            What does "debilitating" mean?
Comprehension        +11                           Why are streets usually numbered in order?
Picture Completion   +12                           Indicate the missing part from an incomplete picture.
Block Design         +16                           Use blocks to replicate a two-color design.
Object Assembly      +17                           Assemble puzzles depicting common objects.
Coding               +18                           Using a key, match symbols with shapes or numbers.
Picture Arrangement  +22                           Reorder a set of scrambled picture cards to tell a story.
Similarities         +24                           In what way are "dogs" and "rabbits" alike?

(Answer key: 2 points for "mammals," 1 point for "four-legged," and 0 points for "I wuv them.")

The last item deserves a separate explanation, but it's not hard to see that the first four subtests, on which the Flynn Effect has been restrained, are qualitatively different from the next five, on which it has been dramatic. All else being equal, more recent children, who grew up with an abundance of complex toys and electronic devices, would seem more likely to ace subtests five through nine. Robert Gordon said life is an IQ test, and life may well have become more like an IQ test, thus making it better training for taking IQ tests.
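To put rough numbers on that split, here is a small Python sketch that averages the gains in the table. The variable names are mine, and the grouping (first four subtests versus the next five, with Similarities set aside) simply follows the paragraph above. It is back-of-the-envelope arithmetic, not anything taken from Flynn's book.

```python
# IQ-point gains, 1947-2002, copied from the table above (adapted from Flynn 2007).
gains = [
    ("Information", 2),
    ("Arithmetic", 2),
    ("Vocabulary", 4),
    ("Comprehension", 11),
    ("Picture Completion", 12),
    ("Block Design", 16),
    ("Object Assembly", 17),
    ("Coding", 18),
    ("Picture Arrangement", 22),
    ("Similarities", 24),
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Subtests one through four versus subtests five through nine.
mean_first_four = mean([points for _, points in gains[:4]])
mean_next_five = mean([points for _, points in gains[4:9]])

print(f"First four subtests: {mean_first_four} points on average")
print(f"Next five subtests:  {mean_next_five} points on average")
```

The first group averages 4.75 points and the second 17, so the Raven's-like subtests gained roughly three and a half times as much.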

This pattern may help explain why kids these days don't seem all that hep when you try to talk to them about Grandma's debilitating hemorrhoids, but they are whizzes with their MyFace and Tweeter.

James Thompson blogs at Psychological Comments:

Flynn effect as a retesting, rule-based gain

It is very good to see a paper which takes a large scale effect, the secular rise in intelligence test results, and links it to an intriguing large scale explanation. A new contribution to understanding the Flynn Effect is to be found in the journal Learning and Individual Differences, which became available 30th October:

Elijah Armstrong and Michael Woodley

“The rule-dependence model explains the commonalities between the Flynn effect and IQ gains via retesting.”

And here is an uncorrected proof of Woodley and Armstrong's upcoming paper.

Woodley is a prominent young psychologist, now at the U. of Umeå in Sweden.

There's young and then there's young. Elijah Armstrong is a 15-year-old who lives in Marin County, California. Above is the Marin News' picture of him winning the county's spelling bee for elementary school students. In the picture he is a fifth-grader at age 10—that was slightly less than five years ago. He's been working on his rule-dependence model of the Flynn Effect since early 2012.

Here's Elijah's blog.

Thompson continues:

Armstrong and Woodley argue that the Flynn effect is partly driven by the retest effect, whereby familiarity with the test material means that if you can learn a rule of thumb you can solve those particular sorts of problems when you see them again, without having to use much intelligence.

Civilization is a system for conservation-of-cognition.

In very simple terms, the test wears out quickly once you get to learn how it works. Using implicit learning and working memory, test takers learn how to solve rule dependent problems, which leads to apparent IQ gains which are partly independent of general intelligence.

As readers of this blog will know, the ultimate IQ test is the one for which no-one knows the answers at the moment. Intelligence tests in the real world are more modest affairs. Raven’s Matrices is a test based on progressions: you need to find the rule which underlies the visible changes in the problem arrays, and a good enough memory to hold in mind how those changes are progressing, so that you can correctly choose the final missing picture. ... Carpenter et al. (1990) found that 5 rules covered all the items in the test. Once you know that, it is less of a test.

I suspect the more "culture fair" a test is (such as the Raven's Matrices), the more you can test prep for it. The less you can effectively test prep for an IQ subtest, such as vocabulary or information on the Wechsler Children's IQ test, the more culturally biased it is. For instance, I read a huge amount of William F. Buckley in 9th and 10th grades, which helped my vocabulary no end, but (pre-Internet) if your high school library didn't have a subscription to National Review like mine did, you'd be at a disadvantage compared to me.

Another aspect of being test savvy is the capacity to de-contextualise, that is, to be able to generalise about types of problem, without being confused by the particular context in which the specific example is presented to you.

For example, the "trolley problem" appeals to high IQ individuals good at de-contextualizing—i.e., not asking a lot of stupid questions about how, exactly, do you push a fat man to his death to stop a runaway trolley. Instead, you should recognize that it's a question about consequentialism v. deontology and therefore only focus upon the details that the questioner wants you to focus upon.

Personally, the older and dumber I get, the more I enjoy "re-contextualizing"—taking abstract ideas and considering them in light of empirical realities. But, re-contextualizing tends to drive smart people crazy.

Armstrong and Woodley assert that, from the point of view of intelligence, education amounts to a vast re-testing enterprise. There are modest gains from rules of thumb, mnemonics and being “taught to the test”. Indeed, the reliance on exam results makes teachers and pupils confederates in ensuring that nothing is taught which is not taught to be examined. Incidentally, this view does not exclude what James Flynn calls “scientific spectacles” which more people now adopt when solving problems.

On average, kids in 2002 had watched a lot more nature documentaries on TV than kids in 1947 had, so scientific concepts like "mammals" are more common.

Armstrong and Woodley rank tests according to how much “cognitive scaffolding” they have. Raven’s Matrices is Level IV: rules are very helpful, only a few of them are required; Cattell Culture Fair is Level III: rules help, but will not help you on many items; the majority of IQ tests [e.g., Wechsler, but I don't think they used it because it's an oral test and they stuck to paper-and-pencil tests; I may be wrong here] are Level II: very many rules are required, but working out which to use is difficult (and selecting the rule is what requires general intelligence); and the Draw A Man test is Level I: no rule is of much help.

They then simply correlate the vector of the position of any particular test in the rule dependence typology with the vector of the size of the Flynn effect on that test. A positive correlation would indicate that tests that were more dependent upon rules were yielding the larger Flynn effects. They tested it on 14 data sets, and found a correlation of 0.6.

r = 0.6 isn't huge, but it's a lot better than a sharp stick in the eye.
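For readers who want to see what "correlating two vectors" amounts to, here is a minimal Python sketch. It assumes a plain Pearson correlation, which may not be the exact statistic the paper uses, and the two lists are HYPOTHETICAL placeholders I made up for illustration. They are not Armstrong and Woodley's 14 data sets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# HYPOTHETICAL data, invented for illustration only.
# rule_level: a test's position in the typology (1 = Level I ... 4 = Level IV).
# flynn_gain: the size of the Flynn effect on that test, in arbitrary units.
rule_level = [1, 2, 2, 2, 3, 4]
flynn_gain = [3.0, 5.0, 2.0, 8.0, 6.0, 9.0]

r = pearson_r(rule_level, flynn_gain)
print(f"r = {r:.2f} on the made-up data")

# The reported r of 0.6 means rule dependence accounts for about
# 0.6 squared, or 36 percent, of the variance in Flynn effect size.
shared_variance = round(0.6 ** 2, 2)
print(f"r = 0.6 implies r squared = {shared_variance}")
```

A positive r on the real data is what the model predicts: the more a test leans on a few learnable rules, the bigger its Flynn effect.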

The authors say: “It is proposed that tests like the Raven’s are only highly g loaded when encountered initially — even basic familiarity with the rules and heuristics on a test, or familiarity with inductive reasoning itself, has the potential to radically diminish the g loading of this test over time, both under controlled conditions (such as in a retesting scenario) and over larger societal time scales (i.e., across generations in the case of the Flynn effect).”

To me, the Raven's looks as sinister as its Edgar Allan Poe-like name implies. But, with some practice I could probably get the hang of it. In contrast, if you tested me on a random sample of vocabulary words drawn from, say, Dr. Johnson's Dictionary, I'd jump right in, but would only get slightly better as I went along.

(There's a separate issue that many IQ tests have, in practice, a limited question bank. So, if you practiced enough on old tests you'd eventually hear all the words in, say, the Wechsler's vocabulary subtests. But, in theory, that wouldn't be a problem. As Bruce Charlton pointed out, the use of the Wechsler as an admission exam for Manhattan four-year-olds with $40k to burn annually on kindergarten has become increasingly gamed because the WISC is intended as a clinical test for diagnostic purposes, not as a gatekeeper exam to select among the children of the most ambitious parents this side of Seoul.)

They continue: “The increasing capacity of societies to detect and explicitly utilize rules as a function of the Flynn effect may be related to increasing rule exposure via mass education and to ‘ways of thinking’ endemic to cognitive modernity (Flynn, 2009).”

This is a good paper. It contains lots of ideas, proposes a theory and then tests it, and draws out the conclusions in a thoughtful way. Not content with linking the observed phenomenon with the Flynn Effect and life speed theory, it also includes 5 testable predictions, to encourage other researchers to test whether their proposal has merit. It is a notable debut for the first author, whose first paper this is, and whose ideas formed the basis for the eventual publication.

Postscript

Elijah lives in Marin County, California, and is interested in philosophy and intelligence research. He originated the rule-dependence model in early 2012 and worked on it for eighteen months thereafter. He claims his conscientiousness is below the 10th percentile. He is also prone to end all his emails saying “Excuse typos, I typed this with my feet”. If you imagine that he is a sad old man gathering up a lifetime of scholarship into a well-honed rant, your imagination would be wrong. Elijah is 15.

Here's Armstrong and Woodley's abstract:

We present a new model of the Flynn effect. To wit, we propose that Flynn effect gains are partly a function of the degree to which a test is dependent on rules or heuristics. This means that testees can become better at solving ‘rule-dependent’ problems over time in response to changing environments, which lead to the improvement of lower-order cognitive processes (such as implicit learning and aspects of working memory). These in turn lead to apparent IQ gains that are partially independent of general intelligence. We argue that the Flynn effect is directly analogous to IQ gains via retesting, noting that Raven's Progressive Matrices is particularly sensitive to both the effects of retesting and the Flynn effect. After an extensive review of the relevant supporting literature, we test our thesis by developing a rule - dependence typology and then correlate the vector of a test's position in the typology with the vector of the Flynn effect that it yields. We find a significant vector correlation of r = ~ .60 (N = 14). Finally, we make a number of novel and testable predictions based on our model.
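The abstract calls r = ~.60 with N = 14 "significant." As a quick sanity check, here is a sketch of the textbook t-test for a Pearson correlation. I am assuming an ordinary two-tailed test at the 0.05 level, which the abstract does not spell out, so treat this as my arithmetic rather than the authors' procedure.

```python
import math

r = 0.60  # correlation reported in the abstract
n = 14    # number of data sets reported in the abstract

# Standard t statistic for testing whether a Pearson r differs from zero.
degrees_of_freedom = n - 2
t_stat = r * math.sqrt(degrees_of_freedom) / math.sqrt(1 - r ** 2)

# Two-tailed critical value of t at the 0.05 level for 12 degrees of freedom,
# taken from a standard t table.
T_CRITICAL_05_DF12 = 2.179

print(f"t = {t_stat:.2f} with {degrees_of_freedom} degrees of freedom")
print("Clears the 0.05 threshold:", t_stat > T_CRITICAL_05_DF12)
```

Under those assumptions t comes out near 2.6, which clears the 2.179 cutoff, so the "significant" label holds up even with only 14 data points.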

For some readable background on the Flynn Effect, here's my 2007 review of Flynn's What Is Intelligence?