NYT: "Throw Out the College Application System"—Because It Keeps Getting The Wrong Results
October 10, 2014, 11:50 AM
An op-ed from the NYT:
Throw Out the College Application System

By ADAM GRANT OCT. 4, 2014

THE college admissions system is broken. When students submit applications, colleges learn a great deal about their competence from grades and test scores, but remain in the dark about their creativity and character. …

This leaves many colleges favoring achievement robots who excel at the memorization of rote knowledge, and overlooking talented C students. Those with less than perfect grades might go on to dream up blockbuster films like George Lucas and Steven Spielberg or become entrepreneurs like Steve Jobs, Barbara Corcoran and Richard Branson.

There is a better way for colleges to gather comprehensive information about candidates. It’s called an assessment center, and it’s been in use for more than half a century to screen candidates for business, government and military positions.

The roots of the assessment center in the United States can be traced back to 1942, when President Franklin D. Roosevelt created the Office of Strategic Services, a precursor to the C.I.A. The O.S.S. was responsible for secret intelligence, research and analysis, and special operations behind enemy lines, but there was a major problem: No one had any clue how to select a spy.

The O.S.S. engaged a team of psychologists to establish an assessment unit. In 1944, the psychologist Donald W. MacKinnon ran Station S, where for 15 months he oversaw the assessment of hundreds of recruits, putting them through exhaustive personality tests and field trials. Over three and a half days, each candidate had to build up and maintain a comprehensive cover story. The candidates falsified their names, ages, professions and residences, and Dr. MacKinnon’s team evaluated their effectiveness, sending the highest-scoring spies on covert missions.

And after having a blast screwing with the heads of applicants for 84 straight hours, the OSS guys probably just went ahead and hired their fellow old Bonesmen like they were planning to do anyway. After all, how much can you trust somebody who has just proven himself a world-class liar, unless you heard him confess some really embarrassing stuff in the Tomb back in 1939 that you can threaten to blackmail him with if he ever defects?
… Today, at a typical center, applicants spend a day completing a series of individual tasks, group activities and interviews. Some assessments are objectively scored for performance; others are observed by multiple trained evaluators looking for key behaviors.
Robert Heinlein’s 1948 novel Space Cadet starts out with all the Space Academy applicants arriving in Colorado from all over the Solar System for two days of excruciating admissions tests: e.g., while you are taking an exam, your chair turns upside down. Those who don’t pass have to take the two-year trip back to the Jovian moons (or wherever they came from) in shame. So you can see why real colleges tended to go for paper and pencil tests that could be taken anywhere.

Intensive testing was a major theme in post-War sci-fi. Heinlein uses it in several books, such as Time for the Stars. (Interestingly, my wife’s identical twin nephews participated in a two-day academic study of twins that was straight out of the one in Time for the Stars, except for the telepathy tests.)

Jerry Pournelle earned one of his Ph.D.’s in this kind of testing for astronauts.

And among the comic highlights of Tom Wolfe’s The Right Stuff are the scenes where the doctors and scientists go hog wild testing astronaut volunteers: the docs have never before gotten their hands on so many extremely healthy guinea pigs who would do anything they ask to be allowed to go into space.

A decade ago I spent a good amount of time on the phone with a couple of veteran psychometricians who had retired from jobs heading up all testing for their branches of the U.S. military. They came out of this turn-your-chair-upside-down era, but my impression is that they had come around to the view that what’s cost-effective is paper and pencil tests, with g predominating.

If you want to pick future pilots, you test them on g, on math skills, on verbal ability, on 3-d skills, on experience flying private planes, and on biographical information that correlates with masculinity and leadership (have you ever shot a deer, that kind of thing). For example, George W. Bush did pretty well on the IQ stuff, extremely well on the officer material biography questions, poorly on 3-d skills, and had no piloting experience.

It would have been fun to strap young Dubya into a flight simulator and suddenly turn him upside down (in fact, it would be even more fun today), but it wasn’t really worth it. (Bush turned out to be an okay pilot, never crashing his primitive supersonic flying death trap in the five years he bothered showing up for his weekend warrior job, but also not really having much of a knack or a passion for flying, either.)

… Sending student applicants to assessment centers would solve at least three problems for college admissions.
A lot of fire departments use “assessment centers” for promotions, but from reading up on the 2009 Ricci Supreme Court case, it’s clear they are hideously expensive since they need to fly in a racially balanced panel of fire chiefs from around the country, etc.

This op-ed recommends having multiple assessment centers all around the country that all do exactly the same thing, but that runs into the usual problems. For example, how do you make sure the human judges are equally rigorous everywhere?

A more subtle but bigger problem for fire departments is disparate impact. Since any kind of test, unless completely rigged, will bring about disparate impact, the fire testing business is based on the idea that every single previous system of fire department testing was botched, but this time they will get it right. So, instead of having a cheap national assessment center, every assessment center is supposed to be customized. (That’s why it was supposed to be such a scandal that the New Haven test that Ricci passed had a question about directions involving the word “downtown,” a word not used in New Haven.)

Things like the SAT and its massive disparate impact are grandfathered in because, hey, the Ivy League needs the SAT, but starting a new system from scratch raises massive disparate impact concerns.

First, colleges have traditionally relied on recommendation letters from different teachers and interviews with different alumni who evaluate students in different situations. These idiosyncrasies create a great deal of noise: Reports reveal as much about the teachers, interviewers and situations as they do about the students. In an assessment center, students answer standardized questions and are rated by multiple evaluators on a common standard.
Of course, a big reason assessment centers are popular in fire department employment decisions is that having all those judges with all their unconscious biases injects more noise into the results, and randomness tends to lessen disparate impact.

But, yeah, letters of recommendation and the like seem pretty useless at present.

For example, everybody says they want applicants who are good at class participation, who make classroom discussions better. This may sound like egomaniacal boasting, but regular readers can probably imagine I’m not being wholly delusional in asserting that I was really good at classroom discussion, that I frequently made classes better.

But I’ve never seen any studies of whether colleges know how to evaluate that skill or not from their current application process.

My guess would be that the single most accurate decision-making process for college admissions would be like the one we have in big league sports: minor leagues, either farm clubs or academic institutions, open to public evaluation. If you want to go to the Ivy League for college, in high school you go to Official Ivy League Summer School, where your class participation and the like are evaluated by wily old scouts.

Of course, that sounds wildly expensive and exclusionary. But that’s probably a good conceptual starting point for rethinking.