How dated social beliefs enter modern news analysis through AI tools
In a world where artificial intelligence promises to revolutionize journalism, a new study warns that the future of news may still be haunted by its past. Researchers at Northeastern University, the University of Copenhagen, and Media Cloud have uncovered how decades-old racial biases embedded in training data can distort the performance of AI systems now entering newsrooms.
Their paper, “Impacts of Racial Bias in Historical Training Data for News AI,” dissects a multi-label classifier trained on the New York Times Annotated Corpus, a dataset of the newspaper’s articles published between 1987 and 2007. The model was designed to tag stories with thematic labels, but one particular tag, “blacks,” revealed just how powerfully the language of the past can linger in modern machines.
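The article doesn’t include the researchers’ code, but the basic shape of such a system is standard. The sketch below is a minimal, hypothetical multi-label news tagger built with scikit-learn; the toy articles, the tag set, and every pipeline choice here are illustrative assumptions, not the paper’s implementation.

```python
# Illustrative sketch of a multi-label news tagger, NOT the authors' model.
# The toy "articles" and editorial tags below are invented stand-ins for
# the New York Times Annotated Corpus and its historical label vocabulary.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

articles = [
    "City council debates school busing and racial integration plans.",
    "Stock markets rally as technology shares climb.",
    "Civil rights leaders protest discrimination in hiring.",
    "New stadium financing approved by the mayor.",
]
tags = [
    {"education", "blacks"},      # "blacks": the dated NYT-era label
    {"finance"},
    {"civil rights", "blacks"},
    {"sports", "politics"},
]

# Turn the tag sets into a binary indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# One binary classifier per tag, over TF-IDF features of the article text.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(articles, y)

# Tag an unseen story: each label fires independently of the others.
pred = model.predict(["Protesters decry racism in policing."])
print(mlb.inverse_transform(pred))
```

Because each label is learned as its own yes/no decision over surface word statistics, a tag like “blacks” can end up keyed to whatever vocabulary happened to co-occur with it in the training years rather than to the community a story is actually about.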
To contemporary readers, the term itself sounds antiquated, even jarring. The researchers found that the label “blacks” didn’t simply identify stories about Black Americans; it often acted as a general “racism detector.” Articles that mentioned words like “racial” or “minorities” were automatically flagged, regardless of the community being discussed. A Fox News story about anti-Asian hate during the COVID-19 pandemic, for instance, was tagged “blacks” simply because it contained the word “racism.”
When the researchers tested the model on coverage of the Black Lives Matter movement, they found similarly uneven results. Stories explicitly using the phrase “Black Lives Matter” were labeled correctly, but those referring only to “BLM,” a term that emerged after the model’s training period, were missed entirely. The model hadn’t learned contemporary understandings of race or justice; it had learned how people used to write about those subjects.
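Failure modes like these can be surfaced with simple black-box probes: feed the model short synthetic sentences, watch which surface tokens flip a tag on or off, and check whether a post-training term exists in its vocabulary at all. A minimal sketch, reusing the hypothetical `model` and `mlb` objects from the sketch above:

```python
# Probe sketch: which surface tokens trigger a tag? The probe sentences
# are invented; this assumes the toy model and mlb defined earlier.
probes = [
    "Report cites racism in mortgage lending.",         # generic 'racism' cue
    "Activists march against anti-Asian hate crimes.",  # different community
    "Black Lives Matter organizers plan a rally.",      # full phrase
    "BLM organizers plan a rally.",                     # post-2007 acronym
]

for text in probes:
    labels = mlb.inverse_transform(model.predict([text]))[0]
    print(f"{text!r:55} -> {labels or '(no tags)'}")

# A quick vocabulary check shows why 'BLM' is invisible to the model:
# a term absent from the 1987-2007 training text simply has no feature.
vocab = model.named_steps["tfidfvectorizer"].vocabulary_
print("'blm' in training vocabulary:", "blm" in vocab)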
“Any project considering integrating AI models into news analysis or production really needs to think about the potential impact of historical biases in the data that was used to build the models they integrate,” said Rahul Bhargava, co-author of the paper. “These technologies sound really futuristic, and are pitched as magical, but they’re really tied into our shared history.”
The findings expose a quiet but profound tension: as newsrooms race to embrace AI-driven tools, they may also be reviving the very stereotypes they’ve spent decades trying to dismantle. For journalists and technologists alike, the lesson is clear. Before we trust AI to help us tell the story of the present, we must first understand the stories it has already learned from the past.