Identifying topics in news, tracking their temporal dynamics, and understanding how different media sources cover them have important theoretical and practical implications for journalism researchers, producers, and consumers.
The explosive growth of online news sources, however, suggests that scalable approaches to topical analysis are needed. We introduce our ongoing efforts to enable large-scale topical analysis of the Media Cloud corpus, a repository of over 200 million online news articles.
Our initial experiments with 90 days of articles from 21 top media sources suggest that statistical topic modeling can identify reasonable news-related topics and produce interesting early insights into the online media ecosystem. We are currently examining mixed-initiative approaches to automate the process of topic extraction and increase the quality of the extracted topics.
Finally, we discuss our future research directions in large-scale news monitoring and measurement, as well as analysis tools for news consumers and producers.