Do Russian Blogs Represent an Alternative Public Sphere? Early Results from Russian Media Cloud

Question: What role, then, is the Internet playing in Russian media?

Answer: Elena Vartanova (Moscow State University Journalism Faculty): It really is a new part of our media system. People are increasingly consuming online news, and online news often takes the first step in agenda-setting. Only then do consumers get more analysis and commentary from print sources.

One of the functions of online media is creating an alternative news agenda. If you watch big television channels you see distilled content, which is double-checked by company managers, by people in power - you won’t find problematic material. The alternative agenda on the Internet is helping Russians see pitfalls and problems. And the Internet has become a tool for people to create public opinion, to support the “man on the street.” In Russia, when mainstream media says something, you should double-check on the Internet. It provides a different point of view.

Interview by Josh Tapper, Nieman Journalism Lab

In the above quote, Elena Vartanova echoes two key research questions we have for Russian Media Cloud:

1. Do blogs and other online media provide an alternative public sphere; and
2. What role do they play in setting the news agenda?

To begin to test these hypotheses we have built off the hard work by Ethan Zuckerman, Hal Roberts, David Larochelle, Yochi Benkler and Zoe Fraade-Blanar on English Media Cloud, which collects data on different sets of English language blogs and popular traditional media available online (mostly newspapers). For the Russia effort we have an even larger and more varied set of feeds, including:

1. 1000 popular Russian blogs: The Yandex Top 1000 list

2. Over 11,000 Russian language blogs divided into link-based attentive clusters, based on the results of our previous Russian blog research

3. 1000 random, or long tail, blogs based on our own spider of the Russian blogosphere

4. Top 25 ‘mainstream media’: This is currently the Google Ad Planner list of the top 25 most popular news Web sites in Russia, which we filtered to remove any sites that are not news related or not primarily about Russia (*See list at bottom of this post)

5. Russian TV news transcripts: Channel 1, Vesti, REN TV, TV Tsentra, NTV, Channel 5, Mir, Zvezda, and TV Stolitsa

6. Russian government Web sites: President Medvedev’s official site, Putin’s official site, the Russian government portal government.ru, and sites of the Ministry of Emergency Situations, Ministry of Justice, Ministry of Defense, and the Ministry of Foreign Affairs

Using the same method that Ethan describes in his blog post on calculating cosine similarity among sources and sets of sources, we are able to draw a visual map that shows how similar these different sets of feeds are to one another, based on content (as opposed to links). What this method allows us to do, and what we have done with all of the examples below, is compare the similarity of bags of words in different media sets. Media Cloud outputs alone do not say anything about the meaning behind the differences between sources. However, with additional context about what we know of the political situation and media ownership in Russia, as well as qualitative analysis of sentences within queries, we can begin to hypothesize about the possible meaning behind similarity scores, word clouds, polar maps and other automated outputs.

As Ethan writes about cosine similarity:

This is a technique computer scientists use to detect a type of similarity between documents. Basically, a computer program counts the appearances of words in a document (in this case, a week’s worth of media coverage by 25 outlets) and compares that frequency list to that of another document. If those documents are identical in word frequency – both mention Obama 23 times, Libya 5 times and basketball twice – they score a 1. If they’ve got no words in common, they score a zero.

(The actual math behind this is wonderfully cool, if slightly mind-bending. Imagine a set of documents with only two words in them – “Obama” and “NCAA”. In source A, Obama is mentioned 8 times, NCAA 2 times. Put a point on a graph at (8,2) – Obama’s our X axis, NCAA our Y axis, and draw a line that passes through 0,0 and 8,2 – that’s the vector that represents set A. In source B, Obama gets mentioned twice, NCAA 8 times – put the point at 2,8 and draw the vector for source B. The angle between vectors A and B is a measure of how similar the sets are, and taking the cosine of that angle is a simple way to scale the value to be between 0 and 1 for angles between 0 and 90 degrees. The trick, of course, is that documents contain words other than Obama and NCAA, and cosine similarity adds a new dimension to our graph for each new term. So the vectors we’re measuring when we compare all the words in 25 media sources over a week to another comparable week exist in 3000-dimensional space. Don’t bother imagining 3000-dimensional space – it will make your head hurt. Just imagine three dimensional space and think about two vectors that each emerge from 0,0,0 and each pass through an arbitrary point in positive x,y,z space – it’s easy enough to imagine measuring the angle between those two vectors. Then take it on faith that, mathematically, you can do the same thing in many-dimensional space.)
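To make the two-word example concrete, here is a minimal Python sketch of the calculation. The function name and the dictionary-based representation are our own illustrative choices, not Media Cloud's code; the counts are the ones from Ethan's example (source A mentions Obama 8 times and NCAA twice, source B the reverse).

```python
import math
from collections import Counter

def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors stored as dicts."""
    # Only words present in both documents contribute to the dot product.
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Word counts taken from the quoted example.
source_a = Counter({"Obama": 8, "NCAA": 2})
source_b = Counter({"Obama": 2, "NCAA": 8})

similarity = cosine_similarity(source_a, source_b)
print(round(similarity, 3))  # 0.471
```

Because only the angle between the vectors matters, two sources whose word counts differ by a constant factor (say, one publishes twice as much as the other but in the same proportions) also score 1.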

Popular Blogs Compared to the Government and Traditional Media

As a first test of whether blogs are different from Russian traditional media and government information channels, in the first polar map we compare the Yandex Top 1000 popular blogs with the Russian government, TV news transcripts, and top 25 MSM over the period of December 15, 2010 to February 21, 2011. The center node, or pole around which the map is drawn, is the collective content of Russian government feeds over that same time period. The further a source is from the black dot in the center, the more different it is from Russian government feeds. Although the sheer number of blogs makes the map fairly overwhelming, what we see at first glance is that most blogs are located near the outer ring, while the government, MSM and TV sources are located closer to the center, showing that the media are more similar to the government than most blogs are. This is probably at least in part due to the fact that Russian popular blogs are not focused exclusively on politics, which we see from the content clustering (color) process.
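As a rough illustration of how a polar map of this kind can be laid out, the sketch below scores each source against the pooled word counts of the center node and converts the score to a distance. The mapping radius = 1 - similarity is our assumption for illustration, and the source names and word counts are invented toy data, not Media Cloud output.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count dicts."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical pooled counts for the center node (the government feeds).
center = Counter({"president": 5, "government": 4, "ministry": 3})

# Hypothetical sources to place around the center.
sources = {
    "tv_channel": Counter({"president": 4, "government": 3, "film": 1}),
    "news_site": Counter({"president": 3, "ministry": 2, "site": 2}),
    "popular_blog": Counter({"film": 6, "photograph": 4, "president": 1}),
}

# Assumed mapping: the less similar a source is to the center, the larger its radius.
radii = {name: 1.0 - cosine(counts, center) for name, counts in sources.items()}

# Sources ordered from closest to the center to furthest from it.
ranked = sorted(radii, key=radii.get)
for name in ranked:
    print(f"{name}: {radii[name]:.2f}")
```

In this toy data the blog that mostly discusses films and photographs lands furthest out, which mirrors the pattern described above without saying anything about why the real sources differ.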

Polar Map

Center Node: Russian Government

The color (and related title) of the nodes is determined by a slightly different process than the one that determines location (polar mapping). The clustering process is agnostic to the source of the feed, and splits the individual sources into different clusters based on the similarity of the words that each uses in a given query made by researchers. The clustering engine uses a simple k-means implementation based on the cosine similarity of the lists of the top 100 non-stopword query words of each media source. This approach returns a different, randomized solution each time, so we run clustering about 20 times and keep the clustering run with the highest sum of total similarity for each cluster. The title of a cluster is the most popular word within that cluster that is ranked lower in all of the other clusters (so if three clusters all have ‘Russia’ as the most popular word, none of them can use ‘Russia’ as the cluster title).
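The restart-and-keep-the-best idea can be sketched in a few lines of Python. This is not the Media Cloud clustering engine: the function names, the tiny four-word vocabulary, and the scoring rule (the sum of each source's cosine similarity to its own cluster centroid) are our assumptions about what "highest sum of total similarity" could mean.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def assign(vectors, centroids):
    """Label each vector with the index of its most similar centroid."""
    return [max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
            for v in vectors]

def kmeans_cosine(vectors, k, rng, iterations=20):
    """One randomized k-means run on unit vectors; returns (labels, score)."""
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    labels = assign(vectors, centroids)
    # Assumed quality score: total similarity of sources to their own centroid.
    score = sum(dot(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, score

def best_clustering(vectors, k, restarts=20, seed=0):
    """Run k-means several times and keep the run with the highest score."""
    rng = random.Random(seed)
    unit = [normalize(v) for v in vectors]
    runs = (kmeans_cosine(unit, k, rng) for _ in range(restarts))
    return max(runs, key=lambda run: run[1])

# Hypothetical counts over the vocabulary [russia, president, film, photograph].
toy_sources = [
    [9, 7, 0, 1], [8, 6, 1, 0], [7, 8, 0, 0],   # politics-heavy sources
    [0, 1, 9, 6], [1, 0, 8, 7], [0, 0, 7, 9],   # film and photo blogs
]
labels, score = best_clustering(toy_sources, k=2)
print(labels, round(score, 3))
```

The real engine works on each source's top 100 non-stopword words rather than four hand-picked ones, but the structure is the same: each run depends on its random starting centroids, so comparing scores across runs is what makes the final result stable.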

The main clusters that emerge from this query are Film (green), Russia (tan/light orange), Photograph (orange), Site (light blue), and Russian (dark blue). The Russian government, TV and MSM are still found primarily near the center of the map (which is centered around the Russian government feeds), and most of those nodes are colored tan, which represents the “Russia” cluster. Although the map is fairly overwhelming because of their numbers, we see that almost all of the blogs are located near the outer ring. This is probably at least in part due to the fact that Russian popular blogs are not focused exclusively on politics.

Oppositional Political Blogs

In the next experiment, we focused only on known political blogs (identified by links in our previous blog research) to see how different they are from the government and more traditional media sources. In the polar map below, we mapped the content of Russian democratic blogs, Russian nationalist blogs, Top 25 mainstream media, Russian TV channels and Russian government Web sites according to how similar each is to the Russian government feeds. The center node, or pole around which the map is drawn, is the collective content of Russian government feeds over a two-month period (in this case, from November 29, 2010 to January 31, 2011). Again, the further a source is from the black dot in the center, the less similar it is to Russian government feeds.

Center Node: Russian Government
1. Kremlin.ru (Kremlin Web site)
2. Government.ru (Government of Russia Portal)
3. Premier.ru (Vladimir Putin’s Web site)

On the map we see that Russian political blogs on both extremes of the Russian opposition (nationalist and democratic) are the least similar to the Russian government and are located in a zone almost completely separated from traditional and online news sources. TV and popular mainstream media are found close to the center of the map, and are also typically blue in color. The content clusters in this clustering run are ‘crowd,’ ‘Russian (russkaya),’ ‘country,’ ‘Russian (rossiskaya)’ and a very small cluster around the term ‘happy.’

An example of a democratic opposition blog is that of the Strategy 31 movement, which attempts to organize protests against the government on the 31st of each month that has 31 days; it is located in the outer ring of the map. In the above map we’ve also highlighted a typical nationalist blog. The two word clouds below show the terms used most often by each. The Strategy 31 blog preferentially uses the terms ‘freedom,’ ‘constitution’ and ‘rally (miting).’ The blog from the nationalist cluster includes nationalist language (e.g., using the word Rossiyanskovo instead of Rossiskovo), as well as the terms Chechens, Tadzhiks, pay, Lenin, and Domodedovo (the airport where a bombing blamed on Chechens took place).

Word Cloud: Democratic Opposition Blog

Popular words in a democratic opposition blog: Strategy, rally, gathering, Triumfal’noi, freedom, constitution, Nemtsov (an opposition politician arrested at a political protest)

Word Cloud: Nationalist Blog

Popular words in a nationalist blog include: Lenin, Domodedovo, Russian (Rossiyanskovo), Tadzhik, Chechen, Kavkaz, and pay

As one would expect, Russian government Web sites such as Kremlin.ru and Premier.ru are very close to the center. The official Russian government newspaper, Rossiskaya Gazeta, is the newspaper that is most similar to the government.

It is surprising that, according to our data, TV channels are not that different from other news media. Based on known ownership of and editorial influence over TV channels, one would have expected TV to be closer to the Russian government than it is, and online and offline newspapers to be further from the center than Russian TV. It is quite surprising to see Channel 1 as far from the center as it is, but looking at the stories coming through its news feed, this is likely due to a fair number of advertisements for entertainment and other programming highlights on the channel, unrelated to political or other news, that are included in its ‘news feed.’ It is worth investigating further whether our other news feeds capture similar promotional material for non-hard-news stories.

Among the government Web sites, the Ministry of Defense is the least similar to the collective government feeds, while the official Kremlin Web site (primarily about Medvedev) and the official Russian government Web portal Government.ru appear to be the most similar to all government feeds.

The mainstream news sites that are the least similar to the government are 3dnews (by a long shot; it is found in the outer blog ring) and Cnews.ru, which is explained by both sites’ heavy focus on technology news rather than Russian politics. The TV channels most similar to the government are TV Tsentra and Zvezda, a Russian military channel.

Further, we also see that clustering in this map according to content shows that the mainstream media and TV sources are all clustered together in dark blue. And the word frequency cloud also shows that this group is highly focused on Russian government and politics, with ‘Russia,’ ‘President,’ ‘government’ and ‘Putin’ among the most frequently used words.

These early findings seem to indicate that, for whatever reason, Russian TV channels and newspapers (traditional and Web native) cover topics similar to each other and to the Russian government. It will require more research to understand why this might be the case. However, a few theories are possible. This may be a reflection of the dominance of two individuals, Medvedev and Putin, over Russian politics. As they are the only two people whose decisions really matter in politics, stories about them may be the only political stories that ever get covered. Alternatively, it may support the theory of US media scholar Robert Entman, who argues that in the US the White House sets the news agenda, especially regarding international affairs, and that of Lance Bennett, who argues that the media simply index the opinion of elites, including government elites, as well as the more general theories around media gatekeepers. This effect may be amplified in semi-authoritarian settings like Russia, where sources of power and authority are more limited than in liberal democracies. It is also possible that we are detecting some level of self-censorship or even bias in the traditional media, caused by concerns over upsetting the Kremlin. Again, our research cannot yet say why traditional media are so similar to official Russian government information channels, simply that they are similar in the words they use, from which we infer the stories that they cover.

We are currently exploring whether word frequency counts are a good way of measuring the agenda of a given media set (what that set or individual media sources talk about). However, even if they are, this will likely not tell us what frame a given source employs (how it talks about a given issue). So, just because two sources both frequently talk about Putin and Medvedev does not necessarily mean they are talking about them in the same way; determining that would require human coding of blog posts or automated sentiment analysis.

Still, based on this early output from Russian Media Cloud, it seems that opposition blogs are indeed different from both government information channels and popular media, and that they are likely providing an alternative agenda to mainstream sources. More research is required to understand how these different sources talk about the same topic, and whether blogs in any way have a different agenda than other media. The recent events in Egypt provide an excellent example of the appearance of an agenda item in the blogosphere that is almost completely absent from official Russian government information channels. That will be the focus of my next Media Cloud post.

Cross posted on the Internet & Democracy Blog.

*“Top 25 Mainstream Media” currently in Media Cloud (we are updating this list based on analysis of additional rankings of Russian media besides Google Ad Planner):
RIA Novosti
Komsomolskaya Pravda
lenta.ru
gazeta.ru
3D News
Regnum
Vzglad
Newsru
Svobodnaya Pressa
Inosmi
Vedomosti
Argumenti i Fakti
Rossiskaya Gazeta
Pravda
Cnews
Dni.ru
Rosbalt
Interfax
Kommersant
Moskovskii Komsomolets
expert.ru
Izvestiya
bfm.ru
Trud
fontanka.ru

About Bruce Etling

Director of the Internet & Democracy Project at Harvard's Berkman Center for Internet & Society