A new study shows that leading AI models are 1.5 times more likely to flag tweets written by African Americans as “offensive” compared to other tweets.
By Shirin Ghaffary Aug 15, 2019, 11:00am EDT
Platforms like Facebook, YouTube, and Twitter are banking on developing artificial intelligence technology to help stop the spread of hateful speech on their networks. The idea is that complex algorithms that use natural language processing will flag racist or violent speech faster and better than human beings possibly can. Doing this effectively is more urgent than ever in light of recent mass shootings and violence linked to hate speech online.
But two new studies show that AI trained to identify hate speech may actually end up amplifying racial bias. In one study, researchers found that leading AI models for detecting hate speech were 1.5 times more likely to flag tweets as offensive or hateful when they were written by African Americans, and 2.2 times more likely to flag tweets written in African American English (which is commonly spoken by black people in the US). Another study found similar widespread evidence of racial bias against black speech in five widely used academic data sets for studying hate speech that totaled around 155,800 Twitter posts.
This is in large part because what is considered offensive depends on social context. Terms that are slurs when used in some settings, like the “n-word” or “queer,” may not be in others. But algorithms, and the content moderators who label the training data that teaches these algorithms how to do their job, don’t usually know the context of the comments they’re reviewing.
Two lines of code could solve this problem of AI finding more hate speech in black people’s tweets: