I’d never heard the expression “violin plot” before, but it’s a good way to show a point I’ve been trying to get across for 20 years: when analyzed with modern genetic assays, there is surprisingly little overlap between non-Hispanic Americans who identify as white versus black.
It’s commonly said that Race Does Not Exist because social science statistics are collected based on self-identification rather than on genetic testing.
Indeed, I once planned to write the definitive take-down of how standard U.S. racial categories are absurd when viewed through the rapidly improving lens of genetic science. But I wound up concluding that, to my surprise, the Census Bureau’s categories were good enough for government work.
It turns out that, due to the workings of the one-drop rule, most people who self-identify as white (red in the above graph) really are highly white genetically (% sub-Saharan ancestry on the vertical axis). In contrast, the majority of Americans who self-identify as black (blue) are in the range of 70%-90% black. (The modal person who self-identifies as white and black (green) is about 40% black due to having one self-identifying white parent of more or less 0% blackness and one self-identifying black parent of about 80% blackness.) These are non-Hispanics.
Genetic ancestry and social race are nearly interchangeable
OpenPsych, Dec. 22, 2021, ISSN: 2597-324X
Emil O. W. Kirkegaard
Ulster Institute for Social Research
It has been claimed that social race and genetic ancestry are at best weakly related. Here we test this claim by applying predictive modeling in both directions, i.e., predicting genetic ancestry from social race(s), and predicting social race(s) from genetic ancestry. We utilize the public Pediatric Imaging, Neurocognition, and Genetics (PING) dataset (n = 1,391), so that others may examine the data as well. In the simple scenario where we are only concerned with self-identified white, black, and mixed (black-white) race individuals (571 whites, 140 blacks, 25 mixed), model accuracy was very high. Predicting social race from genetic ancestry resulted in an area under curve (AUC) of .994, an overall accuracy (concordance) of 98.0%, and a pseudo-R² of .951. Conversely, predicting genetic ancestry from social race had a model R² adjusted of .992. Using the full dataset, there are 8 census-type categories of social race. Using cross-validated multinomial regression to predict social race from 6 genetic ancestry variables, we find that the AUC is .89. Using Dirichlet regression to predict ancestries from social race, we find an overall correlation of .94 (R² = 88.4%). Further analyses using more sophisticated methods (random forest, support vector machine) found similar results. In conclusion, social race and genetic ancestry are nearly interchangeable.
These results are similar to those found in other recent admixture studies of young people.
In any case, if you dropped out of the black sample the not all that black blacks like Barack Obama and just included the highly black blacks like Michelle Obama, you’d wind up with even bigger race gaps in behavior than you do with our current system of self-identification.