Advanced Search Query Guide

This guide will help you structure queries using the ‘Advanced Search’ option in the Media Cloud Search tool. The default simple search automatically helps translate your query text into the proper syntax. Example queries are written in [brackets]; the brackets are not part of the query and should not be typed.

Online News
(via Media Cloud Legacy)

Keywords

Building a query consists of choosing the words or phrases to search for and entering them into the “search for” field. Media Cloud’s tools now search at the story level for the keywords you enter. 

  • Example: [Refugees] will return all stories that include the term refugees anywhere in the story content.

Boolean Connectors

OR

This is the default connector for queries. That is, if you enter a list of words without a Boolean connector, or a list of words connected by OR, the query will retrieve stories that contain any of the words in your list.

  • Example: [Donald Trump] will return the same results as [Donald OR Trump]. These results will include any stories that match the word “Donald” or the word “Trump.” For example, you may get a story about Donald Duck, or about Ivanka Trump, that does not mention Donald Trump.

AND

Using AND to connect terms in your query will allow you to find stories in which all the terms appear. This is particularly useful when the terms you are looking for do not always appear in the same order. If the terms always appear together in a fixed order, use quotation marks instead. If you want to find stories in which two terms are used close to each other, you can use a proximity search, described in the Proximity search section below.

  • Example: Searching for [child AND marriage] will return all stories that match the word “child” and the word “marriage,” which is likely to cover a broad range of subtopics (e.g., news of famous people having children, articles about contraception, lifestyle editorials). If you want to research the issue of child marriage specifically, you should search for [“child marriage”], to return only stories that have “marriage” immediately following “child.”

NOT

In order to refine a search, it is often useful to eliminate from your query results those stories in which a certain term appears. When you use NOT between your terms, the tool will return stories that match the first term but do not include the second term.  You can also use a minus sign to negate a given phrase as an alternative to NOT.

  • Example: If you are interested in finding information about the Zika virus, but not about its incidence in Brazil, you can search for [zika NOT Brazil] or [zika -Brazil].

Other Search Parameters

Capital letters

Queries are NOT case sensitive, so using lowercase or capital letters does not make a difference.

  • Example: Searching for [West] will produce the same results as [west]. If you are looking for articles about Kanye West and do not want articles about the direction west, you should search [“Kanye West”] or [Kanye AND West].

Quotations

When searching for a phrase (a series of words that always appear in the same order) it is necessary to use quotation marks. If you cut and paste a query that uses quotation marks from a program like Word, Media Cloud will not understand them. We recommend directly typing your query into the search, or pasting from a text editor like Notepad.

  • Example: To search about the topic of climate change, search [“climate change”] to find stories that match that phrase (i.e., match the words in that order, without any words in between them). Otherwise, if you only search [climate change], you will get stories that contain the word climate or the word change anywhere in the story.
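
The pasting problem described above can be worked around before a query is submitted. The sketch below is a minimal illustration in Python, assuming the trouble comes from typographic ("curly") quotation marks that word processors often substitute for straight ones. The helper name straighten_quotes is hypothetical and is not part of Media Cloud.

```python
# Typographic double quotes that word processors commonly insert.
# Assumption: these are the characters that a search field fails to recognize.
SMART_DOUBLE_QUOTES = "\u201c\u201d\u201e\u201f"


def straighten_quotes(query: str) -> str:
    """Replace curly double quotes with the plain double quote character."""
    table = str.maketrans({ch: '"' for ch in SMART_DOUBLE_QUOTES})
    return query.translate(table)


pasted = "\u201cclimate change\u201d AND policy"
print(straighten_quotes(pasted))  # "climate change" AND policy
```

Running pasted text through a helper like this has the same effect as retyping the quotation marks by hand.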

Hyphens

When searching for a multi-word term written with a hyphen (well-being, for instance), place the term between quotation marks. Otherwise, Media Cloud will convert the hyphen into a blank space and treat the term’s words as separate. Note, however, that searching for a hyphenated term between quotation marks will retrieve both the hyphenated form and consecutive occurrences of the two words.

  • Example: If you want to search on the term “well-being,” you should use the query [“well-being”], which will retrieve stories that have the term “well-being” or consecutive use of the terms “well” and “being.” If you were to instead search [well AND being], you would get stories that had the words “well” and “being” anywhere in the story.

Different forms of words

If you are interested in searching for all the different forms of a word, you should use the wildcard symbol *. This will return stories that match any word beginning with the stem keyword you searched. To match exactly one additional character instead of any number of characters, use the ? symbol, which represents any single character. Note that you can only use the wildcard symbol at the end of a word, not the beginning (i.e., you can search key* but not *key).

  • Example: Searching for [key*] will retrieve stories that contain a word of any length that begins with “key,” such as key, keys, keyboard, keystone, keynote, keynesian, keywords, etc. Searching for [key?] will return stories that match only 4-letter words that begin with “key,” such as keys. 
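
To get a feel for the difference between * and ?, the two wildcards can be tried against a small word list. The sketch below uses Python's standard fnmatch module, whose shell-style patterns happen to treat * and ? the same way for trailing wildcards. It is only an analogy for experimenting offline, not Media Cloud's actual matching engine, and the word list is made up for illustration.

```python
from fnmatch import fnmatchcase

# Made-up sample vocabulary for illustration only.
words = ["key", "keys", "keyboard", "keynote", "monkey"]


def matching(pattern: str) -> list[str]:
    """Return the sample words that match a trailing-wildcard pattern."""
    # Lowercase both sides to mirror case-insensitive queries.
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]


print(matching("key*"))  # ['key', 'keys', 'keyboard', 'keynote']
print(matching("key?"))  # ['keys']
```

Note that "monkey" matches neither pattern, because the stem must appear at the start of the word.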

Parentheses

If your query is somewhat complex you will probably need to use parentheses to structure it and nest search terms. All of the above rules and the Boolean connectors still apply within the parentheses.  

  • Example: A query such as [(“illegal immigration”) AND (politics OR economy OR campaign)] will retrieve any story containing the phrase “illegal immigration” AND any of the other three terms.
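
When a query has several groups, assembling it from small pieces can help keep the parentheses balanced. The following is a minimal sketch in Python that only builds the query text to paste into the search field. The helper names phrase, any_of, and all_of are hypothetical and not part of any Media Cloud library.

```python
def phrase(text: str) -> str:
    """Quote multi-word terms so they are searched as a phrase."""
    return f'"{text}"' if " " in text else text


def any_of(*terms: str) -> str:
    """Join raw terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(phrase(t) for t in terms) + ")"


def all_of(*clauses: str) -> str:
    """Join already-built clauses with AND."""
    return " AND ".join(clauses)


query = all_of(
    phrase("illegal immigration"),
    any_of("politics", "economy", "campaign"),
)
print(query)  # "illegal immigration" AND (politics OR economy OR campaign)
```

Wrapping every OR group in its own parentheses avoids relying on how the search engine orders AND and OR.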

Other characters

Media Cloud does not support searching for punctuation other than the characters listed above (parenthesis, hyphen, asterisk, and quotation mark). Any other punctuation, such as the @ symbol or the # symbol, will not be recognized by the search feature.

  • Example: Searching for [@POTUS] or [#POTUS] is equivalent to searching for [POTUS], even if you place the term between quotation marks.

Searching in another language

Media Cloud supports searching for stories by the primary language in which each story was written. To run a search query for stories written in a specific language, type your term and then use the Boolean connector AND to add the language search tag of “language:” followed by the two-character code for the language you want to search.

  • Example: The query [queso AND language:es] will retrieve stories that contain the word “queso” that have also been detected by our system as being written in Spanish.

Searching for headlines

You can search only in the headline or title of an article by using the “title:” search tag.

  • Example: To find articles that have “stem cell” or “stem cells” in the title, use the query [title:(“stem cell*”)].
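
The language: and title: tags described above can be added to an existing query mechanically. This is a small Python sketch of that string assembly. The function names in_language and in_title are hypothetical, and the extra parentheses around the original query are an assumption made so that the tag applies to the whole query rather than to its last term.

```python
def in_language(query: str, code: str) -> str:
    """Restrict a query to stories detected as written in one language."""
    # Parenthesize so the language tag applies to the whole query.
    return f"({query}) AND language:{code}"


def in_title(query: str) -> str:
    """Restrict a query to the headline or title of a story."""
    return f"title:({query})"


print(in_language("queso", "es"))  # (queso) AND language:es
print(in_title('"stem cell*"'))    # title:("stem cell*")
```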

Proximity search

A normal search in Media Cloud will return stories that match your query anywhere in the story. If you want to search at a narrower level and find only stories in which your keywords appear close together, such as in the same sentence, we recommend using a proximity search. Enter your keywords in quotation marks and then include a tilde ~, followed by the number of words within which the keywords must appear. For a typical sentence search, we recommend using the number 10. Please note that proximity search supports searching for different forms of a word through the use of the wildcard symbol *.

  • Example: To find stories that contain the words “Trump” and a reference to the Republican party in the same sentence (which could be “Republican,” or “Republicans”), use the query [“Trump republican*”~10]. This will return stories in which Trump and any word of any length beginning with “republican” are within 10 words of each other.

Frequency search

If you want to find stories that mention a certain keyword multiple times, you can search for frequency by adapting the proximity search method. Simply enter your keyword in quotation marks, repeating the keyword as many times as you want the story to contain the keyword. Close the quotation marks and then include a tilde ~, followed by the number 1000; this sets the proximity window to 1,000 words, which should cover most news stories.

  • Example: To find stories that mention children multiple times, use the query ["children children children"~1000].
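
Because a frequency search is a proximity search with one keyword repeated, both kinds of query can be generated by the same small helper. The Python sketch below only produces the query text. The function names proximity_query and frequency_query are hypothetical, and the default window of 1000 is taken from the recommendation above.

```python
def proximity_query(words: list[str], distance: int) -> str:
    """Quote the words together and append a tilde and the distance."""
    return '"' + " ".join(words) + f'"~{distance}'


def frequency_query(keyword: str, times: int, span: int = 1000) -> str:
    """Repeat one keyword `times` times inside a wide proximity window."""
    return proximity_query([keyword] * times, span)


print(proximity_query(["Trump", "republican*"], 10))  # "Trump republican*"~10
print(frequency_query("children", 3))  # "children children children"~1000
```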

Online News
(via Wayback Machine or Media Cloud)

Using Boolean Operators

Remember "order of operations"? Parentheses group clauses together. AND and OR create clauses. Example: (church OR faith) AND (lesbian OR gay OR LGBTQ)

Finding Phrases

Use quotes to look for multiple words used together. Example: "mass incarceration"

Word Stems

Match multiple conjugations or versions of a word. Example: learn* (this will match "learn", "learner", "learning", etc.)

Negations

Find stories that don't contain certain words that are messing up your results. Example: faith NOT "Faith Hill"

Proximity

Find sets of words that appear close to each other. Example: "girl education" ~100 (this will find them within 100 words of each other)

Reddit Submissions
(via Pushshift.io)

Keywords search syntax

  • Use "+" to find submissions with all of a set of phrases (AND): massachusetts+religion
  • Use "|" to find submissions with one of a set of phrases (OR): massachusetts|ma
  • Use "-" to find submissions with one phrase and not another (NOT): shooting-basketball
  • Use double quotes to find exact phrases: "mass shooting"
  • Use parentheses to build up more complicated queries: ((massachusetts|ma)+(religion|faith))-(moma|"good faith")
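
The operators in the list above can be combined programmatically when a query has many nested groups. The following Python sketch rebuilds the last example from smaller pieces. The helper names r_phrase, r_any, r_all, and r_not are hypothetical and only assemble text; they do not call Pushshift.

```python
def r_phrase(text: str) -> str:
    """Quote multi-word terms so they are matched as an exact phrase."""
    return f'"{text}"' if " " in text else text


def r_any(*terms: str) -> str:
    """OR group: at least one of the terms."""
    return "(" + "|".join(terms) + ")"


def r_all(*terms: str) -> str:
    """AND group: all of the terms."""
    return "(" + "+".join(terms) + ")"


def r_not(keep: str, drop: str) -> str:
    """NOT: submissions matching `keep` but not `drop`."""
    return f"{keep}-{drop}"


query = r_not(
    r_all(r_any("massachusetts", "ma"), r_any("religion", "faith")),
    r_any("moma", r_phrase("good faith")),
)
print(query)  # ((massachusetts|ma)+(religion|faith))-(moma|"good faith")
```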

Twitter
(via official academic API)

  • Supports historical Twitter search
  • Filters for tweets between the start and end dates
  • Attention chart only shows 30 days before end date

Keywords search syntax

  • Defaults to AND: massachusetts religion
  • Use OR for either: massachusetts OR ma
  • Use - for negations: shooting -basketball
  • Hashtags work normally: #soccer
  • Use double quotes to find exact phrases: "mass shooting"

Still have questions?

Send us an email at info@mediaecosystems.org or fill out our support form.