Media Cloud Tools Update for 2023 - New Search Interface

Published
December 13, 2022

We’re writing to share some news about Media Cloud and to introduce important changes to our suite of online tools. The short story: new challenges and opportunities mean we’re deactivating Explorer, the current search tool, on Jan 1, 2023, and replacing it with a new cross-platform Search tool (https://search.mediacloud.org). There will be a period of adaptation, because it will take us a while to backfill our databases with our 10 years of content, and because the new search is still under construction, or “in BETA” as they used to say. Please help us kick the tires and try it out!

Users of Media Cloud are likely aware that over the past year, our small team has faced some challenges with the sheer size of our data and the decades-old nature of our codebase. To meet these challenges and evolve our methodologies, we decided to simplify rather than continue growing our platform. In a way, we decided to take a step back to take two steps forward.

As part of the step back, we decided a few months ago to discontinue our Topic Mapper tool, the most complex and brittle part of our platform. We are now discontinuing Explorer, the previous common entry point to Media Cloud. But over the past months, we’ve also created a leaner and simpler back-end database, and we are now ready to introduce the first tool that will work on top of it – Media Cloud Search. The tool’s functionality is still limited in some ways compared with Explorer, but it already points in the direction we would like to take:

  • First, it offers the ability to work with different data sources (such as Twitter, Reddit, and YouTube) in addition to Media Cloud’s online news database. We understand the digital environment as an ecosystem, and would like to work with a wide range of data in order to analyze it.
  • Second, it is designed to serve as a basic research tool on which to build additional features, like the ones that were present in Explorer and Topic Mapper, but with a more modular approach. Our plan is to add research and exploratory tools to it, such as word clouds, theme detection, or entity extraction (to name a few), and present those as options that users can select to run. This will be an ongoing process in which our team will work closely with our partners.

Over the past months, we have already worked closely with the Internet Archive’s Wayback Machine team to create a version of our database that will eventually live within the Archive and be freely and fully accessible through it. We have also worked closely with Code for Africa to improve our platform and support the creation of their own Media Cloud instance in Africa (Civic Signal). And we’re working with Pushshift.io to make their Reddit archive even more searchable with a simple web interface that can generate query syntax for you.

Finally, our directory of collections and sources is still active, and has improved due to extensive merging and pruning of duplicate media sources we had in our system. It is also gaining searchable collections for other platforms beyond online news - such as lists of Twitter users or subreddits.

In a year of changes, what has not changed is our ambition to maintain the best publicly accessible database of digital news out there, and to provide on top of it a set of tools to understand the extended digital media ecosystem. Please feel free to contact us to chat about your research needs and these new technical steps.