What Are Your Research Ideas?
The idea for Media Cloud emerged after a series of discussions between faculty and friends of the Berkman Center. The conversations would follow a predictable pattern: one person would ask a provocative question about what was happening in the media landscape, someone else would suggest interesting follow-on inquiries, and everyone would realize that a good answer would require some real number crunching. Nobody had the time to develop a huge infrastructure and download all the news just to answer their one question. However, eventually there were enough of these questions that we decided to build a tool for everyone to use.
So what are your ideas? What questions could Media Cloud help you answer? Leave them in the comments below, and we’ll talk about them together. It will also help us to build the system to support your ideas.





This is a wonderful project, and I look forward to more. I was wondering just this morning how much coverage there has been, and if so, where, of the fact that many low-income workers do not qualify for unemployment compensation (for a variety of reasons having to do with the legal conditions, set by the states, for coverage). Even in boom times, low-income workers are far more likely than their middle-class peers both to (i) experience unemployment and (ii) not qualify for UC. But the focus of many news stories I read is on the middle-class experience; from that perspective, the complaint is that UC doesn’t cover living expenses, so that most must live on savings. But imagine having a low income, no savings, and no UC. The coverage gap is well-known in the Washington policy wonk community, and there are a number of solutions.
Is this free, and if yes, how can it be?
Services like this already exist, such as Carma and Lexis Nexis. What difference can you offer?
Great project though, keep up the good work!
A fantastic project. As a grad student studying actions and events in the web space, I see great value in this analysis. Are you planning to add more sources focusing on web and startups? Perhaps some subcategories of sources?
I’d love to see more international English independent media such as
http://www.atlanticfreepress.com
http://www.pacificfreepress.com
http://www.chris-floyd.com
which are all Google News sources as well as syndicated to Lexis Nexis and Ebsco, and also available on Amazon Kindle.
I would like to see a flowchart of events such as the recent finger-pointing in the AIG benefits’ flap. Everyone has a different opinion, Congressmen and ladies are as guilty in this as are the insurance execs, and now the Republicans try and blame it all on the Democrats. It is the topic of daily and nightly talk shows and even the president is not immune–for Pete’s sake he just arrived on the scene! I think a good visual presentation of how the media has covered this would really be enlightening! Thanks for your efforts. I am beginning to wonder if too much communication could push us over the edge.
I have been researching school bus collisions and pedestrian or passenger injuries for five years. I notice a great deal of variation in the quality of the newspaper reports, video feeds “at the site of the accident” and follow-up articles and more video feeds. As a school bus accident reconstructionist I know that each of these media sources is virtually the “tip of the iceberg.” Many times the articles or the video coverage present more questions than they provide answers. There are many recurring questions the public has about how school bus accidents can be reduced or avoided–how student passengers can avoid injury–whether there should be three-way seatbelts on school buses. All of these issues I have studied in depth and in detail–none of these issues have been presented, however, in the newsprint or video media.
I suggest a protocol for thorough news article and video analysis for public consumption. I totally disagree with the media moguls who believe that the message via print or video has to be “dumbed down.” What a disservice to the vast and often silent majority of parents, grandparents and other interested citizens who want the feds and their state governments and local school districts to improve the safety standards for school bus accident prevention or reduction of passenger injuries.
I am tempted to review each of the newspaper or video feeds and ask the questions for each of these presentations that come to mind with the thought that if I can ask the right questions perhaps the presenters or the stakeholders in the issue can find those answers or ask them as I do and finally get a better, more responsive media coverage?
Any thoughts on how to get Media Cloud underway in this venue?
This idea is exactly what I conceptualised some time ago as a ‘metameme’ analysis.
Term: Metameme
Definition: The meta level information gleaned from assimilating many memes that includes evaluation of the frequency, importance value, variation in, and diversity of memes in a given discipline or geographic area.
The information unit or meme is related to cultural transmission of e.g. a concept, fact, incident, etc.
The burgeoning quantity of memes disseminated via internet makes it impossible to know or understand the vast number of memes.
Attaining knowledge has become, like news aggregators, a matter of assimilation of ‘headlines’, in order to gain information on
1. Trends in occurrence of memes over time and space
2. Importance values and acceptability levels of memes via the levels of discussion, contexts, language, degree of transmission
3. Ranges of variation and diversity of related or opposing memes in a given subject area to assess (1) and (2).
There is probably a lot to be gained in this analysis by using the tools and approaches that population biologists are using to study metapopulations and dispersal.
I would like to see the coverage of the aggregated media from each country. How much coverage does Canada get from the Chinese media? How much coverage does an event get from each country, or each city? It would also be nice to add a feature that could differentiate or judge whether the coverage is positive or negative. How does the world media view the United States, positive or negative?
Great idea to start with! I have started researching the Semantic Web and the very idea of extracting “meaning” out of the literally thousands of stories surfacing every hour. It would be an ambitious task, and I would be really excited to work on interfacing it with other programming platforms and possibly bringing it to Google’s cloud (on which I am currently working). There are a lot more ideas I would love to share!
My idea is to do a research paper on fantasy books and why humans love them even though they are not real.
Really interesting and pretty much my area of research, Semantic Web. I think you should move ahead and give meanings to the content published in thousands every minute! But crawling all of those feeds and then extracting key phrases and tagging them would be a pointless task if someone just wants a simple set of statistics… The system has to know what the client WANTS and not produce the same results for every client that queries the system! I would love to work on enriching the visualizations because I think that’s what the client will really be interested in… Google Visualizations are great, but in my opinion we should build our library on top of them.
Just a thought…
I’d like to be able to compare competitors in the news, for example to measure the quantity (number) and quality (tone as positive, negative, neutral) of mentions between two companies or products.
I would like to know who else is researching a topic I am working on: the impact of new media on the military-media relationship and indeed beyond military-media to wider audiences, and the impact on strategic communications.
One thing that would be very helpful to visualize is the relative increase and/or decrease in media coverage on topics of interest over time. So for example, as health care reform goes from being a presidential election topic to an actual policy agenda after the election, how has the dynamic of the media coverage been affected? Did it subside considerably on or around Nov 4 and again on Jan 20th? Did it increase around the time of the White House Forum on Health Care Reform? If so, how much, and how did the coverage accelerate in quantity and decelerate afterwards? I believe such data, layered over domestic and global events along an axis of time, could help build virtual relief maps of media coverage and show how the attention of the media leaves one news topic and shifts resources to another. For example, during the recent event of Michael Jackson’s death, we can presumably suspect that media coverage on the economy, global warming, the war in Iraq/Afghanistan, and health care all simultaneously dropped as resources were shifted to the announcement of his death. However, we have no way to definitively know how much the coverage shifted, nor can we definitively see how the coverage shifted differently between media outlets across different countries. A tool that could allow a user to amalgamate that kind of data would be phenomenal.
I am a journalist who is trying to carve out a niche in the area of improving intergenerational communications about the expanding digital world. It is astounding to me to see that so many voices and resources are available on this topic. But it is daunting to attempt to harness them all in some sort of coherent, organized way. It would seem that the Cloud approach could assist me with my work. Could you please keep me in the loop somehow?
Hello, This sounds like a great way to begin to pull news hubs and authorities together to begin to make sense–it’s weird out there now. Thank you! I would like the ability to designate particular RSS feeds and have them “merge and purge” according to set parameters–editing criteria for selecting certain news items over others at certain times.
I’d love to see an analysis of coverage of the Iraq weapons of mass destruction debate before the war, particularly comparisons of the media impact of skeptical reporting, particularly by Knight-Ridder’s Jonathan Landay and Warren Strobel (sp?), and how it compares with impact of reporting by NYT and Washington Post that supported Bush administration claims of an Iraqi weapons program.
People. I want to know about the people in the news, who is being written/blogged about; who is heating up and who is cooling off. Give me lots and lots of people.
Following the distribution of news/information would be useful, to see where stories are first found and how they spread.
I tend to reliably stick to a small number of aggregators when in search of new things to read about. They are aldaily.com, scitechdaily.com, thesituationist.wordpress.com, and http://www.stat.columbia.edu/~gelman/blog/ (secondarily, nybooks.com, nytimes.com, etc). It would be nice to somehow identify salient features (ideas?) in these sources, and broaden the scope of a search in a themed way. Perhaps the best results would come from collaborative filtering, but how to attract a critical mass of recommenders?
I’m a research assistant at Berkman and throughout the summer, I’ve been interested in seeing how much / how little of the content on blogs e.g. http://www.drudgereport.com ends up on mainstream media like ABC. Did the blog report on something first and then was it taken up by mainstream media or the other way round? And what are the differences/similarities in the “life spans” of stories?
I did my graduate thesis on the difference between the rhetorical visions of the “Plymouth rock” Pilgrims and the Puritans of the Massachusetts Bay colony next door. The scholar Ernest Bormann traced the language of the Puritans through “conservative” rhetoric right up to the modern discourse of Ronald Reagan and found a recurring “fantasy type” of “restoration” to a former golden age (through the method of “purification”, that is, removal of that which is seen as evil, according to my research). Using Bormann’s critical method of “fantasy theme analysis” I discovered that the Pilgrims had a far different fantasy type of “the common good” in which the goal was to uplift all people through the method of inner peace and outward love. I was able to find direct references to the phrase “the common good” in various “progressive” political discourse (most notably in Al Gore’s concession speech) and I am thinking that your media-cloud tool might be a way to potentially track these fantasy types through current political discourse to see a) whether these themes are still present and relevant as internal rhetorical divisions in American politics, b) whether these themes/types truly correspond to “conservative” and “progressive” discourse, c) whether these themes/types are promulgated by any particular media outlets consistently, and d) what sorts of media pick up on these themes/types (i.e. does the pattern of such media picking up and spreading such themes consistently provide us with a clear picture of separate rhetorical communities in America now?). It would be interesting to discover if the rhetorical divisions that were in existence at the founding of the current nation are still in play almost 400 years later.
I’ve just been playing around with this and as a tool it’s great. Seriously, very nice work.
I’m working on a few projects at the moment which rely on hyperlink analysis as a way to map online networks. What hyperlink analysis won’t tell you though is what is going on within the network and why sites relate to one another.
If you could develop this so it could look at content on any site, not just news sites, this might have some interesting applications as a way to explain why networks form.
Keep up the good work and thank you.
I use discourse analytics to study language in new media. I have looked at, for instance, transnational public debate in an Indian diaspora community blog as well as uptake of popular ideological discourse in English-only debates in the U.S. both in traditional and new media forms, such as the comment features of on-line newspapers. These genealogical snapshots would be strongly complemented with corpus data. The task of compiling such corpora, though, is massive. From the sounds of it (nice article in today’s NY Times), Media Cloud could help me, and others doing similar work, critically examine, as Mr. Zuckerman says in the Times article, how rhetoric changes over time and what role the Internet and the mainstream media play in it. So, I’d like to know more about the pivot term tool and what developments you have for it in the future as a means of charting language use and changes in rhetoric.
I really like this concept. Question: With all the dialogue and so-called “pundits” in cyberspace, how do you authenticate that the author is who they say they are? For example, recently Tony LaRussa was a victim of a perpetrator on Twitter. There appears to be no authenticity verification of people in cyberspace, so I am curious to hear your thoughts and comments on the topic.
@Zach
You should check out http://www.watchingamerica.com. The site aggregates and translates articles about America from hundreds of foreign news outlets. It’s fascinating and gives a great view of how the rest of the world thinks of us.
I’d like to understand better the “Chilling Effect”, i.e. how real world events affect media and citizens’ behavior in covering controversial stories.
Does the drift of discussion on the Internet affect legislators and policy makers? A correlation between much used terms on blogs, etc., and the words of politicians would be one way to begin to examine this key question about how democracy functions now.
Integration with the newly released Dispute Finder would be very interesting. This tool allows people to create a counter-bibliography with refuting evidence for news claims, or any internet claim. It was one of the more interesting talks at an OSCON presentation on computational journalism a few weeks ago.
Release announcement from Intel on June 20th:
http://stuff.techwhack.com/6808-dispute-finder
Home site at UC Berkeley:
http://disputefinder.cs.berkeley.edu/
OSCON Presentation covering this and other tools:
http://www.slideshare.net/bradstenger/oscon-presentationcomputational-journalism
I work on very different data — Chinese Chan (Zen) texts from the sixth to the eleventh century — and I wonder how adaptable your software tools, or the underlying code, might be. One of the major problems we’re facing is tracing the links between story lines, discussion topics, and doctrinal threads as they moved through the monastic grapevine. We have, or are developing, electronic texts. So, what kind of tagging would we have to do to use software such as this, what other sorts of data preparation would be necessary, what would be the problems working with non-English data (where words are not defined by intervening spaces, for one thing), and what might we expect to discover?