Google Trends for Campaigns

June 11, 2019

Over the past few years, Google Trends has become nearly ubiquitous in politics. Pundits use Google search trends as talking points, and it is not uncommon to hear news about a candidate’s search trends in the days following a town hall or significant rally. Google Trends seems to be becoming the go-to proxy for a candidate’s salience.

As a campaign, you are interested in the popularity of a candidate relative to the others. If candidate A has seen a gain from 50 to 70, that is all well and good. But how does that compare with candidates C and D? There are other potential use cases that may be less fraught with media interruptions. For example, you can keep track of the popularity of possible policy issues, e.g. healthcare, gun safety, and women’s rights.

Though the usefulness of Google Trends search popularity is still unclear, it may be something your campaign would like to track. This chapter describes how to acquire Google Trends data in R, compare candidate search popularity, and view related search terms. We will use the tidyverse, along with the trendyy package for accessing the data.

Google Trends Data

Relative Popularity

The key metric that Google Trends provides is the relative popularity of a search term in a given geography. Relative search popularity is scaled from 0 to 100, and the number is adjusted for population and geography size (for more information go here). This number may be useful on its own, but the strength of Google Trends is its ability to compare multiple terms. Using Google Trends we can compare up to 5 search terms, presumably candidates.
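To build some intuition for a 0 to 100 relative scale, here is a minimal sketch in base R. It assumes the scale is produced by dividing every value by the single largest value across all terms and days, which is a simplification and not a description of Google's actual procedure. The raw counts are made up for illustration.

```r
# Hypothetical raw search counts for three terms over four days (made-up numbers).
raw <- data.frame(
  term = rep(c("A", "B", "C"), each = 4),
  day  = rep(1:4, times = 3),
  n    = c(500, 700, 650, 400,
           120, 100, 150, 130,
           900, 1000, 800, 950)
)

# Assumed scaling: the largest value across all terms and days becomes 100,
# and everything else is expressed relative to that peak.
raw$hits <- round(100 * raw$n / max(raw$n))

# Peak relative popularity of each term
peaks <- tapply(raw$hits, raw$term, max)
peaks
```

On a scale like this, a score only has meaning relative to the other terms and dates in the same comparison, which is why comparing several candidates in one query is more informative than reading a single number.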

`trendyy`

Now that we have an intuition for how Google Trends may be used, we will look at how to actually acquire these data in R. To get started, install the package using install.packages("trendyy").

Once the package is installed, load the tidyverse and trendyy.

library(trendyy)
library(tidyverse)

## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0          ✔ purrr   0.3.0.9000
## ✔ tibble  2.1.1          ✔ dplyr   0.7.8     
## ✔ tidyr   0.8.2          ✔ stringr 1.4.0     
## ✔ readr   1.2.1          ✔ forcats 0.3.0

## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

In this example we will look at the top five polling candidates as of today (6/10/2019). These are, in no particular order, Joe Biden, Kamala Harris, Beto O’Rourke, Bernie Sanders, and Elizabeth Warren. Create a vector with the search terms that you will use (in this case the above candidates).

candidates <- c("Joe Biden", "Kamala Harris", "Beto O'Rourke", "Bernie Sanders", "Elizabeth Warren")

Next we will use the trendyy package to get search popularity. The function trendy() has three main arguments: search_terms, from, and to (in the form of "yyyy-mm-dd"). The first argument is the only mandatory one. Provide a vector of length 5 or less as the first argument. Here we will use the candidates vector and look at data from the past two weeks. I will create two variables for the beginning and end dates to demonstrate how functions can be used to search date ranges programmatically.

# to today
end <- Sys.Date()
# from 2 weeks ago
begin <- Sys.Date() - 14
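If you find yourself building date ranges often, the same arithmetic can be wrapped in a small function. The sketch below is one way to do it. date_window() is a hypothetical helper written for this example and is not part of trendyy.

```r
# Hypothetical helper (not part of trendyy): build a date window of `days`
# days ending on `end`, which defaults to today.
date_window <- function(days, end = Sys.Date()) {
  stopifnot(is.numeric(days), days >= 0)
  end <- as.Date(end)
  list(from = end - days, to = end)
}

# Two weeks ending on a fixed date, so the result is reproducible
window <- date_window(14, end = "2019-06-10")
window$from
window$to
```

The from and to elements are Date objects, just like begin and end above, so they could be used in the same way.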

Pass the candidates vector and the begin and end dates to trendy() and save the result to a variable.

candidate_trends <- trendy(search_terms = candidates, from = begin, to = end)

candidate_trends

## ~Trendy results~
## 
## Search Terms: Joe Biden, Kamala Harris, Beto O'Rourke, Bernie Sanders, Elizabeth Warren
## 
## (>^.^)> ~~~~~~~~~~~~~~~~~~~~ summary ~~~~~~~~~~~~~~~~~~~~ <(^.^<)
## # A tibble: 5 x 5
##   keyword          max_hits min_hits from       to        
##   <chr>               <dbl>    <dbl> <date>     <date>    
## 1 Bernie Sanders         42       33 2019-05-28 2019-06-09
## 2 Beto O'Rourke           3        1 2019-05-28 2019-06-09
## 3 Elizabeth Warren       65       20 2019-05-28 2019-06-09
## 4 Joe Biden              71       34 2019-05-28 2019-06-09
## 5 Kamala Harris         100       12 2019-05-28 2019-06-09
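Because a single comparison is limited to five terms, a longer list has to be broken into smaller groups before it can be searched. The following is a rough sketch of that splitting step in base R. chunk_terms() is a hypothetical helper, and the extra terms are the policy issues mentioned earlier.

```r
# A longer list of search terms (more than five).
terms <- c("Joe Biden", "Kamala Harris", "Beto O'Rourke", "Bernie Sanders",
           "Elizabeth Warren", "healthcare", "gun safety")

# Hypothetical helper: split a vector into consecutive groups of at most
# `size` elements each.
chunk_terms <- function(x, size = 5) {
  unname(split(x, ceiling(seq_along(x) / size)))
}

batches <- chunk_terms(terms)
lengths(batches)
```

Keep in mind that relative popularity is scaled within each comparison, so scores from one group are not directly comparable with scores from another.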

trendy() creates an object of class trendy; you can confirm this with class(candidate_trends). There are a number of accessor functions. We will use get_interest() and get_related_queries(). See the documentation for the others.

To access relative popularity, we will use get_interest(trendy).

popularity <- get_interest(candidate_trends)

popularity

## # A tibble: 65 x 7
##    date                 hits geo   time          keyword  gprop category   
##    <dttm>              <int> <chr> <chr>         <chr>    <chr> <chr>      
##  1 2019-05-28 00:00:00    58 world 2019-05-28 2… Joe Bid… web   All catego…
##  2 2019-05-29 00:00:00    71 world 2019-05-28 2… Joe Bid… web   All catego…
##  3 2019-05-30 00:00:00    61 world 2019-05-28 2… Joe Bid… web   All catego…
##  4 2019-05-31 00:00:00    43 world 2019-05-28 2… Joe Bid… web   All catego…
##  5 2019-06-01 00:00:00    34 world 2019-05-28 2… Joe Bid… web   All catego…
##  6 2019-06-02 00:00:00    36 world 2019-05-28 2… Joe Bid… web   All catego…
##  7 2019-06-03 00:00:00    35 world 2019-05-28 2… Joe Bid… web   All catego…
##  8 2019-06-04 00:00:00    43 world 2019-05-28 2… Joe Bid… web   All catego…
##  9 2019-06-05 00:00:00    53 world 2019-05-28 2… Joe Bid… web   All catego…
## 10 2019-06-06 00:00:00    49 world 2019-05-28 2… Joe Bid… web   All catego…
## # … with 55 more rows
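A tibble in this shape is easy to summarise with dplyr. As a minimal sketch, the code below computes the average and peak popularity per keyword. To keep it runnable without a network call, it uses a stand-in tibble built from the ten Joe Biden rows printed above; popularity_sample and interest_summary are names chosen for this example.

```r
library(dplyr)

# Stand-in for `popularity`: the ten Joe Biden rows printed above,
# keeping only the columns needed here.
popularity_sample <- tibble(
  date    = seq(as.Date("2019-05-28"), by = "day", length.out = 10),
  hits    = c(58, 71, 61, 43, 34, 36, 35, 43, 53, 49),
  keyword = "Joe Biden"
)

# Average popularity, peak popularity, and the date of the peak per keyword
interest_summary <- popularity_sample %>%
  group_by(keyword) %>%
  summarise(
    mean_hits = mean(hits),
    peak_hits = max(hits),
    peak_date = date[which.max(hits)]
  )

interest_summary
```

The same pipeline should work on the full popularity tibble, where it would return one row per candidate.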

For related queries we will use get_related_queries(trendy). Note that you can either pipe the object or pass it directly.

candidate_trends %>% 
  get_related_queries() %>% 
  # picking queries for a random candidate
  filter(keyword == sample(candidates, 1))

## # A tibble: 36 x 5
##    subject related_queries value                keyword   category      
##    <chr>   <chr>           <chr>                <chr>     <chr>         
##  1 100     top             trump                Joe Biden All categories
##  2 50      top             joe biden 2020       Joe Biden All categories
##  3 44      top             bernie sanders       Joe Biden All categories
##  4 39      top             donald trump         Joe Biden All categories
##  5 36      top             joe biden age        Joe Biden All categories
##  6 30      top             joe biden news       Joe Biden All categories
##  7 24      top             how old is joe biden Joe Biden All categories
##  8 17      top             joe biden abortion   Joe Biden All categories
##  9 17      top             joe biden polls      Joe Biden All categories
## 10 16      top             creepy joe biden     Joe Biden All categories
## # … with 26 more rows
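Note that the subject column in the output above is a character column, so it needs converting before you can sort or rank by it. The sketch below keeps the three highest-scoring queries per keyword. It uses a stand-in tibble built from five of the rows printed above, and it filters to the "top" rows before converting as a precaution, on the assumption that other row types may not hold plain numbers.

```r
library(dplyr)

# Stand-in for the get_related_queries() result: five of the rows printed above.
related_sample <- tibble(
  subject         = c("100", "50", "44", "39", "36"),
  related_queries = "top",
  value           = c("trump", "joe biden 2020", "bernie sanders",
                      "donald trump", "joe biden age"),
  keyword         = "Joe Biden"
)

top_related <- related_sample %>%
  filter(related_queries == "top") %>%
  mutate(score = as.numeric(subject)) %>%  # subject is stored as character
  arrange(keyword, desc(score)) %>%
  group_by(keyword) %>%
  slice(1:3) %>%                           # first three rows within each keyword
  ungroup()

top_related
```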

Visualizing Trends

I’m guessing your director enjoys charts—so do I. To make the data more accessible, use the popularity tibble to create a time series chart of popularity over the past two weeks. We will use ggplot2. Remember that time should be displayed on the x axis. We want to have a line for each candidate, so map the color aesthetic to the keyword.

ggplot(popularity, 
       aes(x = date, y = hits, color = keyword)) + 
  geom_line() +
  labs(x = "", y = "Search Popularity", 
       title = "Google popularity of top 5 polling candidates") + 
  theme_minimal() +
  theme(legend.position = "bottom", 
        legend.title = element_blank())
