• R packages needed in this section
library(gghighlight)
library(ggplot2) # {tidyverse} includes {ggplot2}
library(ggrepel)
library(magrittr) # {tidyverse} includes {magrittr}
library(patchwork)
library(plotly)
library(reactable)

1. Drawing a graph in R

  • The most basic syntax of R is called “base R.”
  • {ggplot2} is a package included in a collection of packages called the {tidyverse}.
  • Two ways of drawing a graph in R
1. Base R: the most basic syntax of R
2. {ggplot2}: draw a graph by adding layers
  • Examples of drawing a graph with base R and {ggplot2}

  • You can draw a graph with either base R or {ggplot2}
  • But drawing a graph with {ggplot2} is far more convenient
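As a minimal sketch of the two approaches, the code below draws the same scatter plot twice. It assumes R's built-in mtcars data and its columns wt and mpg, which are not part of these notes and are used only for illustration.

```r
library(ggplot2)

# Base R: a single call draws the whole scatter plot
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")

# {ggplot2}: the data, the mapping, and the geometry are added as layers
p <- ggplot(data = mtcars) +
  aes(x = wt, y = mpg) +
  geom_point()
p
```

The base R call is shorter here, but each later change in {ggplot2} (colors, labels, themes) is just one more layer added with +.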

2. What is {ggplot2}?

  • An R package made by Hadley Wickham for data visualization
  • gg in {ggplot2} stands for “the grammar of graphics”
  • Hadley Wickham distributes {ggplot2} for free

2.1 Two core features of {ggplot2}

1. {ggplot2} implements a graphical grammar proposed by Wilkinson, L. (2005)
2. {ggplot2} has multiple functions corresponding to the structural elements of a graph, which work like layers

2.2 Advantages of using {ggplot2}

  1. R code is neater and easier to read
  2. Easy to change the look of a graph
  3. Reproducibility of a graph is guaranteed
  4. You can write R code based on the grammar of graphics
  5. Graphical components work as layers
  6. Superior as a tool for academic exploration
  7. Abundant R packages
  8. Abundant {ggplot2} communities

2.3 Preparing {ggplot2}

  • You need to install and load {tidyverse} to start using {ggplot2} in an Rmd file
  • The {tidyverse} is a family of packages, including {ggplot2}, {dplyr}, {tidyr}, {readr}, {purrr}, {tibble}, and a few others, which are useful for data manipulation and data visualization
  • The {tidyverse} is under continuing development, meaning that some of the functions are subject to change, though many of the core functions are fairly stable.
  • If you encounter difficulties getting your R code to run, it may be because the {tidyverse} has changed.
  • If this is the case, you can look up the documentation of the package to learn how to update your code.

  • Since {tidyverse} contains {ggplot2}, all you have to do is load {tidyverse}
  • To use {tidyverse}, you need to take the following 2 steps:
1. Install {tidyverse}
  • Type install.packages("tidyverse") in the Console
    → Hit the Return key
install.packages("tidyverse")
  • You need to do this only once
  • install.packages("tidyverse") downloads the required package files to your computer.
2. Load {tidyverse}
  • After you have successfully installed {tidyverse}, load it for your current Rmd file with library(tidyverse)
  • Type library(tidyverse) in a chunk
    → Click knit
library(tidyverse)
  • Although you do not need to install {tidyverse} at the start of every R session (or R script), you do need to load {tidyverse} every time.
  • If you receive an error that a certain function is not found, it may mean that you forgot to run library(tidyverse), which loads the package that contains that function, such as ggplot() or read_csv().
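The “install once, load every time” rule can be sketched as follows. This is only one possible approach: the helper name missing_packages() is hypothetical, and the package vector is an assumption based on the library() calls at the top of this section.

```r
# Packages to check (assumed from the library() calls in this section)
pkgs <- c("tidyverse", "patchwork")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)

# Install only what is missing, and only when working in the Console
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Loading is still a separate step: you run library(tidyverse) in every new session, whether or not anything had to be installed.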

3. How does {ggplot2} work?

  • Here I introduce how {ggplot2} works
  • As an example, I will draw a scatter plot using {ggplot2} and show how to customize it
  • You need at least the following three components to draw a scatter plot using {ggplot2}
3 minimum components to draw a scatter plot using {ggplot2}
Item | {package}:function() | What the item does
1. Assign the data | ggplot(data = ) | Select the data
2. Assign the variables | aes() | Select the variables, such as x, y
3. Assign a graph | geom_*() | Select the type of graph
1. Select the data
  • To draw a graph using {ggplot2}, you need tidy data, which is easy for a computer to read but not necessarily for us
  • For details on tidy data, see 1. Data Handling(Advance)
2. Assign the variables
  • Selecting variables for the x axis and y axis is called “mapping”
  • “Mapping” means linking graphical elements (dots, lines, and areas) to variables in the data
  • In other words, “mapping” specifies how the data is visualized
  • The aes() function is used for mapping
  • aes stands for aesthetics
  • You can assign multiple arguments within the parentheses, ()
3. Assign a graph
  • You can choose the most appropriate graph for your data by selecting a geometric object, written as geom_*()
  • The following geometric objects are among the most frequently used

geom_*()  

Type of variables | Graph | Geometric object
Discrete | Bar chart | geom_bar()
Continuous | Histogram | geom_histogram()
Continuous | Boxplot | geom_boxplot()
Continuous | Scatter plot | geom_point()
Continuous | Line graph | geom_line()
Continuous | Dot plot | geom_point()
  • In this section, we use geom_point() to draw a scatter plot

How {ggplot2} completes a graph

  • {ggplot2} treats each graphical element as a layer

  • Functions such as ggplot(), geom_point(), and scale_*() generate the layers

  • Each layer is added with the + operator

  • First, you assign the three essential items: ggplot(), aes(), and geom_point()

  • Depending on what kind of graph you want to draw, you can customize the colors, shapes, and sizes in your graph using scale_*() functions

  • You can customize the x axis and y axis using coord_*() functions

  • Let me show you an image of how {ggplot2} completes a graph

[Figure: how {ggplot2} completes a graph. Panels: core items to draw a graph; layers customizing the look of a graph]

  • You don’t have to memorize all of the R code
  • It is enough to understand the mechanism of how {ggplot2} works and to write your R code while searching the web
  • For details on how to draw a graph using {ggplot2}, see 5. ggplot2 (Barchart) and ggplot2 (Dot plot)

Let’s draw a scatter plot using fake data (df1)

  • Suppose you want to use the data frame df1
  • Set the math score (math) as the x axis
  • Set the statistics score (stat) as the y axis
  • Change the color of the dots by gender
Two ways of assigning aes() for the entire graph:
Method | Range of assignment
(1) Assign aes() as an independent layer | Whole graph
(2) Assign the mapping argument within ggplot() | Whole graph
(1) Assign the mapping, aes(), as an independent layer
library(ggplot2)
ggplot(data = df1) +
  aes(x = math,
      y = stat,
      color = gender) +
  geom_point()

(2) Assign the mapping argument, aes(), within ggplot()
library(ggplot2)
ggplot(data = df1,
       mapping = aes(x = math,
                     y = stat,
                     color = gender)) +
  geom_point()
  • You get the same result from (1) and (2)

How to assign aes()
・If you type the following R code, you get a simple scatter plot
・You can get the same scatter plot in three ways:

(1) Assign the mapping, aes(), as an independent layer

ggplot(df1) +
  aes(x = math,  # aes() is an independent layer
      y = stat,
      color = gender) +
  geom_point()

(2) Assign the mapping argument, aes(), within ggplot()

ggplot(df1,
       mapping = aes(x = math, # mapping = can be omitted
                     y = stat,
                     color = gender)) +
  geom_point()

(3) Assign the mapping argument, aes(), within geom_point()

ggplot(df1) +
  geom_point(mapping = aes(x = math, # mapping = can be omitted
                           y = stat,
                           color = gender))

  • You can draw a simple scatter plot in these three ways
  • But you can make it better by customizing it
  • You can customize it by adding more layers

10 most commonly used layers:

Elements | {package}:function() | What you can do
1. Labels | labs() | Customize the labels on the legend and axes
2. Theme | theme(), theme_*() | Customize the look of a graph
3. Coordinate System | coord_*() | Customize the coordinate system
4. Statistics | stat_*() | Apply statistical treatment
5. Scale | scale_*() | Customize the graph’s colors and shapes
6. Identifying dots | {ggrepel} | Identify what dots represent in a scatter plot
7. Highlight | {gghighlight} | Emphasize particular data
8. Facet | facet_*() | Make multiple graphs
9. Patchwork | {patchwork} | Arrange multiple graphs
10. Interactive graph | {plotly} | Make an interactive graph
Note:
  • Regarding 3. Coordinate System, {ggplot2} automatically chooses suitable settings by default
  • You need to install {gghighlight} to use 7. Highlight

4. Let’s draw a scatter plot using {ggplot2}

4.1 A simple scatter plot

  • x axis: math score
  • y axis: stat score
  • Each dot is colored by gender

Step 1: Prepare data

  • Make four fake variables (name, math, stat, gender) and combine them into a data frame named df1
# Make 4 variables

name <- c("Joe", "Ze'ev", "David", "Mike", "Ross", "Woojin", "Inha", "Jih-wen",
          "Mark", "Dennis", "Carol", "Shira", "Mimi", "Amital", "Rachel", "Ariel",
          "Kelly", "RongRong", "Kathy", "Barbara")
math <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 4, 5, 6, 7, 9, 8, 9, 8, 10)
stat <- c(2, 4, 6, 5, 7, 9, 7, 10, 12, 15, 14, 13, 12, 13, 11, 10, 9, 8, 6, 4)
gender <- c("Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
            "Male", "Female", "Female", "Female", "Female", "Female", "Female",
            "Female", "Female", "Female", "Female")

# Combine the 4 variables into a data frame, df1 (tibble() is available after library(tidyverse))

df1 <- tibble(name, math, stat, gender)
  • Let’s check what df1 looks like
df1
# A tibble: 20 × 4
   name      math  stat gender
   <chr>    <dbl> <dbl> <chr> 
 1 Joe          1     2 Male  
 2 Ze'ev        2     4 Male  
 3 David        3     6 Male  
 4 Mike         4     5 Male  
 5 Ross         5     7 Male  
 6 Woojin       6     9 Male  
 7 Inha         7     7 Male  
 8 Jih-wen      8    10 Male  
 9 Mark         9    12 Male  
10 Dennis      10    15 Male  
11 Carol        2    14 Female
12 Shira        4    13 Female
13 Mimi         5    12 Female
14 Amital       6    13 Female
15 Rachel       7    11 Female
16 Ariel        9    10 Female
17 Kelly        8     9 Female
18 RongRong     9     8 Female
19 Kathy        8     6 Female
20 Barbara     10     4 Female
  • Looks good!

Assign data: ggplot()

  • You need to do mapping, meaning linking variables in the data to aesthetics within aes()
  • ggplot() has two main arguments: data and mapping
  • First, let’s assign df1 as the data to use
  • Then, we can see the canvas on which we draw a graph
ggplot(data = df1)

  • We see a blank canvas

Step 2: Assign the variables

  • You can add a new layer (in this case, aes()) by using +
  • What you do here is link aesthetic elements to the data
  • math is linked to the x axis
  • stat is linked to the y axis
    → aes(x = math, y = stat)
ggplot(data = df1) +
  aes(x = math, y = stat)

  • Now you see a canvas with labels on the x axis and y axis

Step 3: Select a graph

  • You choose a geometric object, geom_point(), to draw a scatter plot
  • You can add geom_point() by using +
ggplot(data = df1) +
  aes(x = math, y = stat) +
  geom_point()

  • Here aes(x = math, y = stat) is an independent layer
  • But you can also put aes(x = math, y = stat) within geom_point()
  • This means that you map aes() within geom_point()
  • Within geom_point(), you map by typing mapping =
  • Within aes(), you link each element (such as a dot, line, or area) to a particular variable
    → Here, the x axis is linked to math, the y axis is linked to stat, and each dot is colored by gender
ggplot(data = df1) +
  geom_point(mapping = aes(x = math,
                           y = stat,
                           color = gender))

  • data = is the first argument of ggplot() and mapping = is the first argument of geom_point(), so both argument names can be omitted
    → You can rewrite the R code above as follows:
ggplot(df1) +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender))
  • You can use the pipe operator (%>% or |>) to link commands together.
  • This tells R to do something and then do something else to the output of the first something.
  • Chaining functions together like this will become very useful as your tasks become more complicated.
  • By using the pipe operator (%>% or |>), you can rewrite the code above as follows:
df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender))
Mapping aes() and omission of R code

Summary on mapping aes()
・In Step 2: Assign the variables, aes(x = math, y = stat) is added with + after ggplot(data = df1)

ggplot(data = df1) +
  aes(x = math, y = stat) +
  geom_point()

・aes(x = math, y = stat) can be added not only within ggplot(), but also within geom_point()

  • In sum, you have two ways of mapping aes():
Mapping aes() within ggplot()
df1 %>%
  ggplot(aes(x = math,
             y = stat,
             color = gender)) +
  geom_point()
Mapping aes() within geom_point()
df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender))
  • Which way you choose depends on what you want to do in your analysis

How pipes (%>% or |>) are used
・The pipes (%>% or |>) allow you to express a sequence of multiple operations
・In the examples in this section, %>% and |> can be used interchangeably
・Pipes can greatly simplify your code and make your operations more intuitive
・The pipe operator (%>%) becomes available when you load {tidyverse}
・The pipe (%>%) is provided by the {magrittr} package
・{magrittr} is part of the {tidyverse}
→ You need to load either of the following packages to use the pipe operator (%>%)

library(magrittr)
library(tidyverse)

・The native pipe (|>) is built into R 4.1.0 or later and needs no package
・The pipe operator (%>%) automatically passes the output of one line to the next line as its input

Let’s take a look at an example of using the pipe
  • Generate a vector of integers from 1 to 10
1:10
 [1]  1  2  3  4  5  6  7  8  9 10
  • Calculate 1 + 2 + ... + 10
sum(1:10)
[1] 55
  • If you use the pipe (%>%), you write the R code as follows:
1:10 %>%  # Generate a vector from 1 to 10
  sum()   # Add them all
[1] 55
  • If you also want to calculate the square root:
1:10 %>%    # Generate a vector from 1 to 10
  sum() %>% # Add them all
  sqrt()    # Calculate the square root
[1] 7.416198
  • Generate a vector from 1 to 10 → Add them all → Calculate the square root
  • The sequence of operations is easier to interpret this way
If you don’t use pipes:
sqrt(sum(1:10))
[1] 7.416198
  • This is less intuitive because you have to read backward
  • Calculate the square root ← Add them all ← Generate a vector
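The same chain can be sketched with the native pipe (|>). This assumes R 4.1.0 or later, where |> is part of the language, so no package has to be loaded. The variable name result is only for illustration.

```r
# Native pipe: generate 1 to 10, add them all, then take the square root
result <- 1:10 |>
  sum() |>
  sqrt()

result
```

One difference worth knowing: with |> the function on the right-hand side must be written with parentheses, as in sum(), whereas %>% also accepts the bare function name.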

How to interpret R code with pipes (%>%)
・You can interpret the R code you wrote in 4.1 A simple scatter plot as follows:

df1 %>%                         # Use df1 as data
  ggplot(aes(x = math,          # Assign x = math
             y = stat,          # Assign y = stat
             color = gender)) + # Dots are colored by gender
  geom_point()                  # Draw a scatter plot

・df1 %>% ggplot() means that the first argument of ggplot() is df1
Interpretation of the R code:
Use df1 as data
Assign x = math
Assign y = stat
Dots are colored by gender
Draw a scatter plot
・You don’t have to go backward in interpreting the R code
・R code with pipes (%>%) is intuitive and easy to follow

5. Customizing a graph (basic)

  • Here, we customize the simple scatter plot we made above by assigning the following 9 commonly used elements:
Elements | Function
① Dot color | aes(color = ...)
② Dot shape | aes(shape = ...)
③ Dot size | size = ...
④ Background color | theme_bw()
⑤ Label the axis | labs(x = "...", y = "...")
⑥ Main title | ggtitle("...")
⑦ Show Japanese in a graph | theme_bw(base_family = "HiraKakuProN-W3")
⑧ Legend location | theme(legend.position = "bottom")
⑨ Show a regression line | geom_smooth(method = "lm")

5.1 Dot color, shape, size, and background

  • Let’s customize ①, ②, ③, and ④
  • Note that since size and shape here are applied to every point, you assign them outside aes()
df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender),
             size = 3,
             shape = 17) +
  theme_bw() # Use the black-and-white theme (white background)

  • You can select dot shape from the following list:

Dot color changes depending on whether color is placed inside or outside aes()

(1) color mapped within aes(): Dots are colored depending on the value of the variable
(2) color set outside aes(): Applied to all dots
(1) When color = gender is mapped within aes()
plt_1 <- df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender)) +   # mapped within aes()
  ggtitle("mapped within aes()") +
  theme_bw(base_family = "HiraKakuProN-W3")
plt_1

(2) When color = "magenta" is set outside aes(): Applied to all dots
plt_2 <- df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat),
             color = "magenta") +    # set outside aes()
  ggtitle("mapped out of aes()") +
  theme_bw(base_family = "HiraKakuProN-W3")
plt_2

library(patchwork)
plt_1 + plt_2

5.2 Label the axis

  • Customize ⑤

  • If you want to put labels on the x-axis and y-axis, use labs()

  • labs(x = "...", y = "...")

df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math,
                           y = stat,
                           color = gender)) +
  labs(x = "test score(math)", y = "test score(stat)")

  • If the labels are written in Japanese, you may see garbled characters on the x-axis and y-axis labels
    → For how to fix the garbled characters, see 5.4 Show Japanese in a graph

5.3 Main title

  • Customize ⑥
  • If you want to add a main title, use ggtitle("...")
df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math, # mapping = can be omitted
                           y = stat,
                           color = gender)) +
  labs(x = "test score(math)", y = "test score(stat)") +
  ggtitle("Scatter plot of the mathematics and statistics exam")

  • If the labels are written in Japanese, you may still see garbled characters on the x-axis and y-axis labels
    → For how to fix the garbled characters, see 5.4 Show Japanese in a graph

5.4 Show Japanese in a graph

  • Customize ⑦
  • To fix the garbled characters in RMarkdown, set the following:
How to fix the garbled characters in RMarkdown
1. In the RStudio menu, choose Tools → Global Options → Graphics
2. Choose AGG in Backend → Click Apply

Setting to use Japanese in a graph using theme_bw()

  • If you want to use Japanese in a variable name or main title, add the following R code after +
  • theme_bw(base_family = "HiraKakuProN-W3")
  • theme_bw() also switches the graph to the black-and-white theme with a white background
  • You can also change the font size with base_size = ...
df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math, # mapping = can be omitted
                           y = stat,
                           color = gender)) +
  labs(x = "test score (math)", y = "test score (stat)") +
  ggtitle("Scatter plot of the mathematics and statistics exam.") +
  theme_bw(base_family = "HiraKakuProN-W3",
           base_size = 12)

  • With this setting, Japanese text on the x-axis and y-axis is displayed correctly instead of as garbled characters

2 ways of adding a main title to a graph
1. ggtitle(".....")
2. labs(title = "....")

・You get the same result if you use the following R code:

df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math, # mapping = can be omitted
                           y = stat,
                           color = gender)) +
  labs(x = "test score (math)",
       y = "test score (stat)",
       title = "Scatter plot of the mathematics and statistics exam.") +
  theme_bw(base_family = "HiraKakuProN-W3",
           base_size = 12)

5.5 Legend Location

  • Customize ⑧

  • You can customize the location of the legend by adding the following R code: theme(legend.position = "...")

  • You have the following 5 options:
    "none"
    "top"
    "left"
    "right"
    "bottom"

  • Here, let’s put the legend at the bottom: theme(legend.position = "bottom")

df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math,
                           y = stat,
                           color = gender)) +
  labs(x = "test score (math)", y = "test score (stat)") +
  theme_bw(base_family = "HiraKakuProN-W3",
           base_size = 12) +
  theme(legend.position = "bottom") # Put the legend at the bottom

  • For details on the location of the legend using {ggplot2}, see this

5.6 Show a regression line

  • Customize ⑨
  • geom_smooth(method = "lm")

If you want to use the same color for dots and a line

  • You need to map color within aes()
  • Add geom_smooth(method = lm) to draw the line fitted by a linear model
  • Add geom_smooth() after the + operator
df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender)) +
  geom_smooth(method = lm) +   # Draw a regression line
  ggtitle("Scatter plot of the mathematics and statistics exam") +
  theme_bw(base_family = "HiraKakuProN-W3")
  • In this case, you get an error message

  • The message says that x and y are missing, so the regression line cannot be drawn
  • Why does this happen?: The two aesthetic elements (x and y) are linked only WITHIN geom_point()
  • Solution: You need to link the aesthetic elements (x, y, and color) within ggplot(), so that every layer, including geom_smooth(), inherits them
df1 %>%
  ggplot(aes(x = math,   # Map x, y, and color within ggplot()
             y = stat,
             color = gender)) +
  geom_point() +
  geom_smooth(method = lm) +
  ggtitle("Scatter plot of the mathematics and statistics exam") +
  theme_bw(base_family = "HiraKakuProN-W3")

If you want the dots in different colors but the lines in the same color

  • Here, let’s draw the lines in black
df1 %>%
  ggplot(aes(x = math,   # Map x and y within ggplot()
             y = stat)) +
  geom_point(aes(color = gender)) +
  geom_smooth(aes(group = gender),
              method = lm,
              color = "black") +
  ggtitle("Scatter plot of the mathematics and statistics exam") +
  theme_bw(base_family = "HiraKakuProN-W3")
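To see what the lines drawn by geom_smooth(method = lm) represent, the sketch below fits the same kind of linear model (stat regressed on math) separately for each gender with base R's lm(). The data frame name scores and the object names fit_male and fit_female are assumptions for illustration. The values are copied from df1 above.

```r
# Rebuild the example data with base R only (same values as df1)
scores <- data.frame(
  math   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 4, 5, 6, 7, 9, 8, 9, 8, 10),
  stat   = c(2, 4, 6, 5, 7, 9, 7, 10, 12, 15, 14, 13, 12, 13, 11, 10, 9, 8, 6, 4),
  gender = rep(c("Male", "Female"), each = 10)
)

# Fit stat ~ math separately for each gender, one model per line in the plot
fit_male   <- lm(stat ~ math, data = subset(scores, gender == "Male"))
fit_female <- lm(stat ~ math, data = subset(scores, gender == "Female"))

# Intercept and slope of each line
coef(fit_male)
coef(fit_female)
```

The sign of each math coefficient tells you whether that gender's line slopes upward or downward in the scatter plot.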

Color variations available for {ggplot2}
・As many as 657 named colors are available for {ggplot2}
・You can choose and assign colors, like "deeppink", "skyblue", "royalblue", OUTSIDE aes()

df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat),
             size = 3,
             color = "deeppink", # Assign colors outside aes()
             shape = 8) +
  theme_bw()

・You can check the names of the colors available for {ggplot2} by typing colors() in the Console
・Here is an example:

head(colors())
[1] "white"         "aliceblue"     "antiquewhite"  "antiquewhite1"
[5] "antiquewhite2" "antiquewhite3"

・If you use HEX codes, you can use as many as 16,777,216 colors!
・For instance, to assign red, type "#FF0000"
・The following is a part of the colors available as HEX codes
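The numbers above can be checked with a few base R calls. This is a small sketch using only functions from base R's grDevices. The variable name red_hex is an assumption for illustration.

```r
# Number of named colors built into R
length(colors())

# Red, green, and blue components (0 to 255) of a named color
col2rgb("deeppink")

# Build a HEX code from red, green, and blue components
red_hex <- rgb(255, 0, 0, maxColorValue = 255)
red_hex

# Each of the three channels has 256 levels, so HEX codes cover 256^3 colors
256^3
```

A HEX code such as red_hex can be passed to color = outside aes() in the same way as a color name.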

6. Customizing a graph (advanced)

  • Here I introduce how to customize a graph using the 10 most commonly used layers

10 most commonly used layers:

Elements | {package}:function() | What you can do
1. Labels | labs() | Customize the labels on the legend and axes
2. Theme | theme(), theme_*() | Customize the look of a graph
3. Coordinate System | coord_*() | Customize the coordinate system
4. Statistics | stat_*() | Apply statistical treatment
5. Scale | scale_*() | Customize the graph’s colors and shapes
6. Identifying dots | {ggrepel} | Identify what dots represent in a scatter plot
7. Highlight | {gghighlight} | Emphasize particular data
8. Facet | facet_*() | Make multiple graphs
9. Patchwork | {patchwork} | Arrange multiple graphs
10. Interactive graph | {plotly} | Make an interactive graph

Data preparation

  • Download the Japanese lower house election data (1996-2021): hr96-21.csv
  • In your RProject folder, make a new folder named data, and put hr96-21.csv in it
hr <- read_csv("data/hr96-21.csv", na = ".")
  • hr96-21.csv is a collection of Japanese lower house election data covering 9 national elections (1996, 2000, 2003, 2005, 2009, 2012, 2014, 2017, 2021)
  • Check the names of the variables hr contains
names(hr)
 [1] "year"          "pref"          "ku"            "kun"          
 [5] "wl"            "rank"          "nocand"        "seito"        
 [9] "j_name"        "gender"        "name"          "previous"     
[13] "age"           "exp"           "status"        "vote"         
[17] "voteshare"     "eligible"      "turnout"       "seshu_dummy"  
[21] "jiban_seshu"   "nojiban_seshu"
  • hr has the following variables
Variable | Detail
year | Election year (1996-2021)
pref | Prefecture
ku | Electoral district name
kun | Number of electoral district
rank | Ascending order of votes
wl | 0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner
nocand | Number of candidates in each district
seito | Candidate’s affiliated party (in Japanese)
j_name | Candidate’s name (Japanese)
name | Candidate’s name (English)
previous | Previous wins
gender | Candidate’s gender: “male”, “female”
age | Candidate’s age
exp | Election expenditure (yen) spent by each candidate
status | 0 = challenger / 1 = incumbent / 2 = former incumbent
vote | Votes each candidate garnered
voteshare | Vote share (%)
eligible | Eligible voters in each district
turnout | Turnout in each district (%)
castvote | Total votes cast in each district
seshu_dummy | 0 = non-hereditary candidate / 1 = hereditary candidate
jiban_seshu | Relationship between candidate and his predecessor
nojiban_seshu | Relationship between candidate and his predecessor
2 ways of checking the downloaded data:
(1) DT::datatable()
DT::datatable(hr)
(2) reactable::reactable()
  • Using reactable::reactable() enables us to search with multiple conditions
reactable::reactable(hr,
  filterable = TRUE,    # Enable column filters for searching
  defaultPageSize = 10) # Show 10 rows per page

6.1 Labels

  • You need to give each variable an appropriate label
  • It should be neither too long nor too short
  • It should be short, yet long enough to correctly convey what it means to you and others
  • You use labs() to set the labels
Elements | {package}:function() | What you can do
1. Labels | labs() | Customize the labels on the legend and axes
  • Let’s draw a scatter plot of exp and voteshare in the 2005 lower house election:
  • Suppose you are interested in the difference between LDP candidates and the other candidates on this scatter plot
  • LDP (Liberal Democratic Party): shown as 自民 in the variable seito
    x-axis: exp (election expenditure (yen) spent by each candidate)
    y-axis: voteshare (vote share (%))
The scatter plot without customizing labels
hr %>%
  filter(year == 2005) %>% # Select 2005 election data only
  mutate(ldp = ifelse(seito == "自民", 1, 0)) %>% # Dummy variable: 1 for LDP, 0 otherwise
  ggplot() +
  geom_point(aes(x = exp,
                 y = voteshare,
                 color = as.factor(ldp)),
             alpha = 0.5) + # alpha is a constant, so it is set outside aes()
  theme_bw()

The scatter plot with customizing labels
hr %>%
  filter(year == 2005) %>%
  mutate(ldp = ifelse(seito == "自民", 1, 0)) %>%
  ggplot() +
  geom_point(aes(x = exp,
                 y = voteshare,
                 color = as.factor(ldp)),
             alpha = 0.5) +
  ggtitle("Money and Vote Share in 2005 HR election: LDP vs Non-LDP(with labs)") +
  theme_bw() +
  labs(x = "Campaign money (yen)",
       y = "Vote Share (%)")

6.2 Theme

Elements | {package}:function() | What you can do
2. Theme | theme(), theme_*() | Customize the look of a graph
  • You can customize elements that are not tied to the data, such as the font, background color, and overall style

  • theme_*() has a base_family argument
    → You can assign the font applied to all text elements in a graph

  • If you want to use Japanese, add theme_bw(base_family = "HiraKakuProN-W3")

List of theme_*()
Function | What you can do
theme_grey() | Use a grey background (default)
theme_bw() | Use a white background
theme_minimal() | Use a minimal style
theme_classic() | Use a classic style

Without using theme_*()

plt_3 <- hr %>%
  filter(year == 2005) %>%
  mutate(ldp = ifelse(seito == "自民", 1, 0)) %>%
  ggplot() +
  geom_point(aes(x = exp,
                 y = voteshare,
                 color = as.factor(ldp)),
             alpha = 0.5) +
  ggtitle("Money and Vote Share in 2005 HR election: LDP vs Non-LDP(with labs)") +
  labs(x = "Campaign money (yen)",
       y = "Vote Share (%)")

plt_3

  • You see that the background color is grey (grey is the default)

If you want to use theme_classic():

plt_3 +
  theme_classic()

List of theme_*() and their looks

  • You can choose whichever you like

6.3 Coordinate System

Elements | {package}:function() | What you can do
3. Coordinate System | coord_*() | Customize the coordinate system

coord_*() has the following variations

Function | What you can do
coord_flip() | Swap the x-axis and y-axis
coord_cartesian() | Zoom in and out on the axes
coord_fixed() | Fix the aspect ratio of the axes

6.3.1 coord_flip(): Swap the x-axis and y-axis

  • Let’s show math test scores by gender using a boxplot

  • You use the geometric object geom_boxplot()

plt_4 <- df1 %>%
  ggplot() +
  geom_boxplot(aes(x = gender,
                   y = math,
                   fill = gender),
               show.legend = FALSE) +
  labs(x = "gender", y = "math score") +
  ggtitle("Without using coord_flip()")

plt_4

  • Swap the x-axis and y-axis by using coord_flip()
plt_5 <- df1 %>%
  ggplot() +
  geom_boxplot(aes(x = gender,
                   y = math,
                   fill = gender),
               show.legend = FALSE) +
  labs(x = "gender", y = "math score") +
  ggtitle("Using coord_flip()") +
  coord_flip()  # Swap the axes

plt_5

  • Using {patchwork}, let’s show the two graphs in one row
library(patchwork)
plt_4 + plt_5

6.3.2 coord_cartesian()

  • You use coord_cartesian() when you zoom in or out on the axes
hr <- read_csv("data/hr96-21.csv", na = ".")
  • Use the 2005 election data and name the data frame hr2005
hr2005 <- hr %>%
  filter(year == 2005)
  • Represent the relationship between exp and voteshare using a smoothed curve
plt_6 <- hr2005 %>%
  ggplot() +
  aes(exp, voteshare) +
  geom_point() +
  geom_smooth(se = FALSE) + # Hide the 95% confidence interval
  theme_bw()
  • Show the whole range
plt_6

Using coord_cartesian(), you can limit the range of output

plt_6 + 
  coord_cartesian(xlim = c(5.0e+6, 2.6e+07))

Using xlim(), you can limit the range of output

plt_6 + 
  xlim(5.0e+6, 2.6e+07)
Warning: Removed 353 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 353 rows containing missing values or values outside the scale range
(`geom_point()`).

・We see two warnings: 353 rows were removed by stat_smooth(), and 353 rows were removed by geom_point()

Why we should use coord_cartesian()
・If you use xlim(), R treats the data outside the range as missing data (NA)
→ The curve is re-estimated without those rows, so the graph differs from the one generated with the original data
→ If you just want to “zoom in” on a certain part of a graph, you should use coord_cartesian() instead of xlim()
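The difference can be reproduced on a small scale. The sketch below uses hypothetical toy data (not the election data) and a straight-line fit with method = "lm" instead of the default smoother, so that the effect is easy to see. The object names toy, base_plot, zoomed, and censored are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical toy data: y grows faster and faster as x increases
toy <- data.frame(x = 1:20, y = (1:20)^2)

base_plot <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Zoom only: the line is still fitted to all 20 rows
zoomed <- base_plot + coord_cartesian(xlim = c(1, 10))

# Limit the scale: rows with x > 10 are turned into NA before the line is fitted
censored <- base_plot + xlim(1, 10)

# Compare the x-range over which each line was computed
# (layer 2 is geom_smooth(); the second call warns about the removed rows)
range(layer_data(zoomed, 2)$x)
range(layer_data(censored, 2)$x)
```

Printing zoomed and censored side by side shows two different lines over the same visible range, which is the reason to prefer coord_cartesian() for zooming.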

6.3.3 coord_fixed()

  • You use coord_fixed() when you want to fix the aspect ratio of the axes
  • Let’s draw a scatter plot with fake data
ggplot() +
  geom_point(aes(x = 1:10,
                 y = 1:10))

  • Using coord_fixed(ratio = 2), let’s set the ratio of x:y = 1:2, so that one unit on the y-axis is drawn twice as long as one unit on the x-axis
ggplot() +
  geom_point(aes(x = 1:10,
                 y = 1:10)) +
  coord_fixed(ratio = 2)

  • x and y have exactly the same values
  • But the y-axis is drawn longer than the x-axis
  • Let’s type coord_fixed(ratio = 1) to draw both axes at the same scale
ggplot() +
  geom_point(aes(x = 1:10,
                 y = 1:10)) +
  coord_fixed(ratio = 1)

6.4 Statistics

Elements | {package}:function() | What you can do
4. Statistics | stat_*() | Statistical treatment
  • Let’s draw a bar chart of the number of candidates running in the 2021 lower house election in Japan using geom_bar()

geom_bar(stat = "count")

→ Map seito with aes(x = seito); the number of candidates in each party is displayed on the y-axis
- You don’t have to map y
→ stat = "count" is applied to each group
→ The number of candidates in each party is calculated → Displayed on the y-axis

hr %>%
  filter(year == 2021) %>%
  ggplot(aes(x = seito)) +
  geom_bar(stat = "count") +
  theme_bw(base_family = "HiraKakuProN-W3", # Needed because party names are written in Japanese
           base_size = 12) +
  ggtitle("The number of candidates in the 2021 HR election in Japan (by party)")

  • Let’s double-check whether the number of candidates by party in the 2021 HR election is correct by using group_by()
hr %>% 
  filter(year == 2021) %>% 
  group_by(seito) %>% 
  summarize(N = n())
# A tibble: 11 × 2
   seito     N
   <chr> <int>
 1 れい     12
 2 公明      9
 3 共産    105
 4 国民     21
 5 無所     80
 6 社民      9
 7 立憲    214
 8 維新     94
 9 自民    277
10 諸派      9
11 N党     27
  • Let’s draw a bar chart of the total number of candidates winning elections (by party) in lower house election in Japan (1996-2017) using geom_bar()

geom_bar(stat = “identity”)

→ This argument enables us to calculate the total number of candidates’ previous wins (previous) by party

hr %>% 
  filter(year == 2021) %>% 
  ggplot(aes(x = seito, y = previous)) + 
  geom_bar(stat = "identity") +
  theme_bw(base_family = "HiraKakuProN-W3", # You need to add this layer because party name is written in Japanese
   base_size = 12) +
  ggtitle("The total number of candidates' previous wins in the lower house elections (1996-2017) in Japan (by party)")

  • Let’s double-check the total number of candidates’ previous wins by party using group_by() and summarize()
hr %>% 
  filter(year == 2021) %>% 
  group_by(seito) %>% 
  summarize(sum(previous))
# A tibble: 11 × 2
   seito `sum(previous)`
   <chr>           <dbl>
 1 れい                6
 2 公明               51
 3 共産               28
 4 国民               30
 5 無所               60
 6 社民                0
 7 立憲              481
 8 維新               50
 9 自民             1142
10 諸派                5
11 N党                0
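The difference between stat = "count" and stat = "identity" can be checked on a small made-up data frame. This is only a sketch: toy, party, and the values in it are hypothetical and are not taken from hr. Base R's table() and tapply() are used for the check so that the example needs only {ggplot2}.

```r
library(ggplot2)

# Hypothetical data: one row per candidate
toy <- data.frame(
  party    = c("A", "A", "B", "C", "C", "C"),
  previous = c(1, 2, 0, 3, 1, 1)
)

# stat = "count" (the default of geom_bar()): bar height = number of rows
p_count <- ggplot(toy, aes(x = party)) +
  geom_bar(stat = "count")

# stat = "identity": the y values are used as they are and stacked within
# each party, so the bar height is the party total
p_sum <- ggplot(toy, aes(x = party, y = previous)) +
  geom_bar(stat = "identity")

# The same numbers computed directly, to compare with the bar heights
n_by_party   <- table(toy$party)
sum_by_party <- tapply(toy$previous, toy$party, sum)

n_by_party
sum_by_party
```

geom_col() is a shortcut for geom_bar(stat = "identity"), so p_sum could also be written with geom_col().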

6.5 Scale

  • scale_*() enables us to change the colors and shapes of dots
  • {ggplot2} has various geometric objects
  • You can draw various kinds of graphs by using these geometric objects
  • For example, in a scatter plot, data is represented as dots
  • In a line graph, data is represented as lines
  • In a bar chart or histogram, data is represented as areas
  • The information these dots, lines, and areas carry (such as position, color, and shape) is called a scale
  • We use scale_*() when we need colors or shapes that differ from the defaults of {ggplot2}
Elements | {package}::function() | What you can do
5. Scale | scale_*() | Customize a graph’s colors and shapes

List of scale_*()

scale_x_continuous() | Customize the scale of the x-axis (continuous variable)
scale_y_discrete() | Customize the scale of the y-axis (discrete variable)
scale_color_manual() | Customize the colors manually

The major reasons why you customize the scale of axes

  1. You want to change the breaks or the range of an axis
  2. You want to change the labels of an axis

6.5.1 Changing axes of continuous variables

scale_*_continuous()

  • When a continuous variable is mapped
  • You add scale_*_continuous() as a new layer
  • * can be either x or y
  • Let’s draw a scatter plot using the 2005 HR election data:

x-axis ・・・ expenditure (exp)
y-axis ・・・ vote share (voteshare)

hr <- read_csv("data/hr96-21.csv", na = ".")
  • Draw a scatter plot and save it as plt_7
plt_7 <- hr %>% 
  select(seito, exp, voteshare, year) %>% 
  filter(year == 2005) %>% 
  ggplot(aes(x = exp, 
y = voteshare, 
col = seito)) +
  geom_point(alpha = 0.5) +
  labs(x = "Campaign money (yen)", 
  y = "Vote share (%)",
  title = "Campaign money and vote share: 2005HR election in Japan") +
  geom_smooth(method = lm, se = FALSE) +
  theme_bw(base_family = "HiraKakuProN-W3")

plt_7

  • The minimum value of x-axis is 0, and the maximum value is 2.5e+07 (25 million yen).

  • The axis ticks are placed every 5,000,000 yen, but the labels are not easy to read.

  • Let’s make them much easier to read.

  • We need to use scale_*() to change the scale of the x-axis.

  • Let’s keep the 5-million-yen interval and use short labels: 0, 5M, 10M, 15M, 20M, 25M (million yen)

  • Since exp is a continuous variable, you use scale_x_continuous() to customize it.

plt_7 +
  scale_x_continuous(breaks = seq(0, 2.5e+07, by = 0.5e+07),
  labels = c("0", "5M", "10M", "15M", "20M", "25M"))

If you want to change the scale of y-axis…

  • You can use scale_*_continuous() and set limits as an argument
  • For instance, suppose you want to limit the range of y-axis between 0 and 60
    → Add scale_y_continuous(limits = c(0, 60))
plt_7 +
  scale_x_continuous(breaks = seq(0, 2.5e+07, by = 0.5e+07),
  labels = c("0", "5M", "10M", "15M", "20M", "25M")) +
  scale_y_continuous(limits = c(0, 60))
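One thing to keep in mind about limits: in {ggplot2}, observations outside the limits of a scale are treated as missing and are not drawn, whereas coord_cartesian() only zooms the view. The following is a minimal sketch with made-up data (toy is hypothetical, not the hr data) that counts how many y values become NA in each case.

```r
library(ggplot2)

# Made-up data: two of the five points lie above y = 60
toy <- data.frame(x = 1:5, y = c(10, 30, 50, 70, 90))

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point()

# Scale limits: values outside 0-60 are turned into NA and not drawn
p_limits <- p + scale_y_continuous(limits = c(0, 60))

# Coordinate limits: the view is zoomed to 0-60, the data are left intact
p_zoom <- p + coord_cartesian(ylim = c(0, 60))

# Number of y values set to NA in each version
sum(is.na(layer_data(p_limits)$y))
sum(is.na(layer_data(p_zoom)$y))
```

This matters for layers such as geom_smooth(), because the regression line is fitted only to the observations that remain after the scale limits are applied.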

How to change the location of the scale label

  • You can move the position of the y-axis scale labels to the right: scale_y_continuous(position = "right")
  • You can move the location of the legend to the left: theme(legend.position = "left")
plt_7 +
  scale_x_continuous(breaks = seq(0, 2.5e+07, by = 0.5e+07),
  labels = c("0", "5M", "10M", "15M", "20M", "25M")) +
  # Set limits and position in one call: a second scale_y_continuous() would replace the first one
  scale_y_continuous(limits = c(0, 60), position = "right") +
  theme(legend.position = "left")

6.5.2 Customize the scale (discrete variables)

scale_*_discrete()

  • When you want to change the labels of an x-axis to which a discrete variable is mapped
    → You use scale_*_discrete()
  • For instance, let’s draw a bar chart of the number of candidates by candidate status (challengers, incumbents, and former incumbents) in the HR elections (1996-2021).
plt_8 <- hr %>% 
  ggplot() +
  geom_bar(aes(x = status)) +
  labs(x = "Candidate's status", y = "The number of Candidates") + 
  ggtitle("The number of Candidates by status (1996-2021HR Elections in Japan)") +
  theme_bw(base_family = "HiraKakuProN-W3") 

plt_8

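  • You see the three values: 0, 1, and 2 on the x-axis
  • Because status is stored as a number here, {ggplot2} treats it as a continuous variable, so the labels are changed with scale_x_continuous(); scale_x_discrete() is used in the same way once the variable is a factor or a character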
plt_8 +
  scale_x_continuous(breaks = c(0, 1, 2),
  labels = c("Challengers", "Incumbents", "Former-incumbents"))
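For comparison, here is a hedged sketch of the discrete version. It assumes status is coded 0, 1, 2 as shown on the x-axis above, and it uses a small made-up data frame (toy) instead of hr. Converting status with as.factor() makes the x-axis discrete, so scale_x_discrete() applies.

```r
library(ggplot2)

# Hypothetical data: status coded 0, 1, 2 as in the text
toy <- data.frame(status = c(0, 0, 0, 1, 1, 2))

# Converting status to a factor makes the x-axis discrete
p_discrete <- ggplot(toy, aes(x = as.factor(status))) +
  geom_bar() +
  # Named labels are matched to the factor levels "0", "1", "2"
  scale_x_discrete(labels = c("0" = "Challengers",
                              "1" = "Incumbents",
                              "2" = "Former-incumbents")) +
  labs(x = "Candidate's status", y = "The number of candidates")

p_discrete
```

Using a named vector for labels avoids depending on the order of the levels.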

6.5.3 Customize the color of dots

scale_color_manual()

plt_9 <- hr %>% 
  filter(year == 2012) %>% 
  mutate(ldp = ifelse(seito == "自民", 1, 0)) %>% 
  ggplot() +
  geom_point(mapping = aes(x = exp,    # "mapping =" can be omitted
y = voteshare,
color = as.factor(ldp)),
alpha = 0.5) +    # a constant alpha is set outside aes()
  labs(x = "Campaign money (yen)", y = "Vote share (%)") +
  theme_bw(base_family = "HiraKakuProN-W3", 
   base_size = 12)

plt_9

  • Let’s change the colors of the dots by party affiliation from the default ones to different ones, such as deeppink (LDP) and aquamarine (others).
plt_9 +
  scale_color_manual(values = c("1" = "deeppink", "0" = "aquamarine"))
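The names inside values ("1" and "0") are matched to the levels of the mapped variable, not to their order. The sketch below uses a made-up data frame (toy and its values are hypothetical) and checks which color each row actually receives. The labels and name arguments are optional additions for a more readable legend.

```r
library(ggplot2)

# Hypothetical data: ldp is a dummy (1 = LDP, 0 = others)
toy <- data.frame(
  exp       = c(1, 2, 3, 4),
  voteshare = c(10, 20, 30, 40),
  ldp       = c(0, 1, 0, 1)
)

p_manual <- ggplot(toy, aes(x = exp, y = voteshare, color = as.factor(ldp))) +
  geom_point(size = 3) +
  # The names ("0", "1") are matched to the factor levels, not to their order
  scale_color_manual(values = c("1" = "deeppink", "0" = "aquamarine"),
                     labels = c("0" = "Others", "1" = "LDP"),
                     name   = "Party")

# Colors assigned to each row of the data
layer_data(p_manual)$colour
```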

6.6 Highlight

  • You need to install and load the {gghighlight} package
Elements | {package}::function() | What you can do
7. Highlight | gghighlight() | Emphasize particular data
  • Let’s draw a scatter plot using the 2005 HR election data:

x-axis ・・・ expenditure (exp)
y-axis ・・・ vote share (voteshare)

plt_10 <- hr %>% 
  select(seito, exp, voteshare, year) %>% 
  filter(year == 2005) %>% 
  ggplot(aes(x = exp, 
y = voteshare, 
col = seito)) +
  geom_point(alpha = 0.5) +
  labs(x = "Campaign money", 
  y = "Vote share (%)",
  title = "Money and vote share in the 2005 HR election: Highlighting LDP candidates") +
  geom_smooth(method = lm, se = FALSE) +
  theme_bw(base_family = "HiraKakuProN-W3")

plt_10

  • Using gghighlight(), you can highlight particular data
  • The other data are kept in grey
  • The data you want to highlight are drawn in conspicuous colors
library(gghighlight)
  • Let’s show the LDP and DPJ candidates in color and the other candidates in grey
plt_10 +
  gghighlight(seito == "自民" | seito == "民主")  

6.7 Identifying dots

{ggrepel}
  • You may want to label particular data for the following two purposes
  1. You want to show what is important in a graph
  2. You want to compare multiple graphs
  • You cannot do this well with {ggplot2} alone
    → You need an additional package, {ggrepel}
Elements | {package}::function() | What you can do
6. Identifying dots | {ggrepel} | Identify what the dots represent in a scatter plot

Function | What you can do
geom_text_repel() | Place text labels so that they do not overlap
geom_label_repel() | Place boxed labels so that they do not overlap
  • geom_text_repel() enables us to display labels without overlapping
  • Let’s put individual labels on the scatter plot we made with df1
library(ggrepel)

If you want to label every single dot…

plt_11 <- df1 %>% 
  ggplot() +
  aes(math, 
 stat,
 color = gender) +
  geom_point() + 
  theme_bw()

plt_11 + geom_text_repel(aes(label = name)) 

  • You can also put labels only on particular data

Display only the top 5 people in math score

plt_11 + 
  geom_text_repel(aes(label = name), 
    data = function(data) 
      dplyr::slice_max(data, math, n = 5)
  )

Display only the people whose statistics score is 12 or higher

plt_11 + 
  geom_text_repel(aes(label = name),  
    data = function(data) 
      dplyr::filter(data, stat >= 12)
  )
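The data = function(data) ... argument used above is a {ggplot2} feature: when a layer receives a function as data, the function is applied to the data of the plot and the result is used for that layer only. The sketch below shows the mechanism with plain geom_text(), so it needs only {ggplot2}. The data frame scores and the helper top_math() are hypothetical and stand in for df1, which is not reproduced here.

```r
library(ggplot2)

# Hypothetical scores (df1 in the text is not reproduced here)
scores <- data.frame(
  name = c("Ann", "Bob", "Cho", "Dan", "Eri"),
  math = c(55, 90, 72, 38, 81),
  stat = c(10, 14, 9, 12, 15)
)

# Illustrative helper: keep the n rows with the highest math score
top_math <- function(data, n = 2) {
  head(data[order(data$math, decreasing = TRUE), ], n)
}

p_labels <- ggplot(scores, aes(x = math, y = stat)) +
  geom_point() +
  # The function is applied to the plot data, so only two rows get a label
  geom_text(aes(label = name), data = top_math, vjust = -1)

p_labels
```

Replacing geom_text() with ggrepel::geom_text_repel() keeps the same subset of labels and additionally moves them apart when they overlap.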

Self-made function()

・Suppose you want to calculate \(3x^2 + 5\)
・You can calculate this in R like this:

3 * 1^2 + 5 # when x = 1
[1] 8
3 * 2^2 + 5 # when x = 2
[1] 17

・However, typing the calculation again for every value is inefficient and prone to human error.

A solution:

・ Make a self-made function() and use it.
→ You can repeat similar calculations efficiently with fewer errors

  • Define \(3x^2 + 5\) as f1
f1 <- function(x){
  3 * x^2 + 5
}
  • Calculate the value when x = 1
f1(x = 1)   # when x = 1
[1] 8
  • Calculate the value when x = 2
f1(x = 2)   # when x = 2
[1] 17
f1() | The name of the function
x in function(x) | Argument
R code within {} | What f1() does
  • In R, a function is an object.
    → If you pass a value to it, the function runs
  • The argument x works as a variable within {}
  • When the function runs, x is replaced by the value you passed.
  • The R code within {} can be more than one line
  • The value of the last line of the R code is returned as the result
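As a small illustration of a function body with more than one line, here is a sketch that generalizes the same polynomial. The name quad and the arguments a and b are illustrative and do not appear in the text. The default values a = 3 and b = 5 give \(3x^2 + 5\).

```r
# A hypothetical generalization: a * x^2 + b
quad <- function(x, a = 3, b = 5) {
  squared <- x^2        # intermediate step on its own line
  a * squared + b       # the last value is what the function returns
}

quad(1)                 # evaluates 3 * 1^2 + 5
quad(1:3)               # works on a whole vector at once
quad(2, a = 1, b = 0)   # other coefficients: 1 * 2^2 + 0
```

Giving arguments default values keeps the common case short while still allowing other coefficients.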

An Example of Self-made function():

  • Read the HR election data (1996-2021) in Japan
hr <- read_csv("data/hr96-21.csv", na = ".")
  • Extract the 2005 HR election data and name it hr2005
hr2005 <- hr %>% 
  filter(year == 2005)
  • Check the variable names hr2005 contains
names(hr2005)
 [1] "year"          "pref"          "ku"            "kun"          
 [5] "wl"            "rank"          "nocand"        "seito"        
 [9] "j_name"        "gender"        "name"          "previous"     
[13] "age"           "exp"           "status"        "vote"         
[17] "voteshare"     "eligible"      "turnout"       "seshu_dummy"  
[21] "jiban_seshu"   "nojiban_seshu"
  • Make a self-made function to draw a box plot with a variable in hr2005 and name the function bp_hr2005
bp_hr2005 <- function(x){
  ggplot(hr2005) +
    aes(.data[[x]]) + 
    geom_boxplot() +
    coord_flip()
}

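What does .data mean?
.data is a special pronoun provided by the {tidyverse} (it comes from the {rlang} package).
Inside aes(), .data stands for the data frame that the plot is currently using.
Example:
・If you are working on the data frame hr2005,
.data[["age"]] refers to hr2005[["age"]]
・This means that you select age from the data frame hr2005.
・Because the column name is given as a character string, it can be passed to a function as an argument.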

  • Let’s try age on x
bp_hr2005("age")
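The same pattern works with more than one column name. The following minimal sketch uses a made-up data frame (toy stands in for hr2005) and an illustrative helper, scatter_by(), that receives both column names as character strings.

```r
library(ggplot2)

# Hypothetical data standing in for hr2005
toy <- data.frame(
  age       = c(35, 48, 52, 61),
  voteshare = c(12.5, 40.1, 33.3, 55.0)
)

# Illustrative helper: the column names arrive as character strings
scatter_by <- function(xvar, yvar) {
  ggplot(toy) +
    aes(x = .data[[xvar]], y = .data[[yvar]]) +
    geom_point() +
    labs(x = xvar, y = yvar)   # reuse the strings as axis labels
}

p_age <- scatter_by("age", "voteshare")
p_age
```

Setting the axis labels with labs() is optional; without it the axis titles show the .data[[...]] expressions.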

If you want to produce lots of graphs…

  • By using for(), you can make your work more efficient.
  • For instance, suppose you want to make box plots of age, voteshare, and exp by party and display all of them at once
bp_hr2005_1 <- function(x){
  ggplot(hr2005) +
    aes(x = .data[[x]],  
y = .data[["seito"]],
color = .data[["seito"]]) + 
    geom_boxplot() +
    theme_bw(base_family = "HiraKakuProN-W3") +
    theme(legend.position = "none") 
}

for (x in c("age", "voteshare", "exp")){
  print(
    bp_hr2005_1(x) 
  )
}
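Inside for(), print() is needed because plots are not displayed automatically within a loop. An alternative, sketched below under the assumption that you want to keep the plots for later use, is to store them in a list with lapply(). The data frame toy and the helper hist_by() are hypothetical.

```r
library(ggplot2)

# Hypothetical data standing in for hr2005
toy <- data.frame(
  age       = c(35, 48, 52, 61, 44),
  voteshare = c(12.5, 40.1, 33.3, 55.0, 21.7),
  exp       = c(2e6, 8e6, 5e6, 1.2e7, 3e6)
)

# Illustrative helper: histogram of one column, chosen by name
hist_by <- function(x) {
  ggplot(toy) +
    aes(x = .data[[x]]) +
    geom_histogram(bins = 5) +
    ggtitle(x)
}

vars  <- c("age", "voteshare", "exp")
plots <- lapply(vars, hist_by)   # one plot per variable, kept in a list
names(plots) <- vars

plots[["exp"]]                   # display a single plot later
```

A list of plots can also be handed to other functions, which is convenient when several plots are arranged in one figure.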

Combination of gghighlight() and geom_text_repel()

  • You can simultaneously highlight and label a particular person or party
  • For example, suppose you want to draw a box plot of campaign money by party, highlighting the LDP and DPJ candidates.
# Create the box plot
plt_13 <- hr2005 %>% 
  ggplot() +
  aes(x = exp, 
y = seito,
color = seito) +  
  geom_boxplot() +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "none") + 
  labs(x = "Campaign money (yen)",
  y = "Party",
  title = "Distribution of Campaign Money by party: 2005HR Election in Japan")


plt_13 +
  gghighlight(seito == "自民" | seito == "民主",
label_key = seito, 
unhighlighted_params = list(
  color = "grey50") 
) 

6.8 Facet

Elements | {package}::function() | What you can do
8. Facet | facet_*() | Make multiple graphs
  • You can make multiple graphs, one for each level of a variable, by using facet_*()
  • Let’s make multiple scatter plots of campaign money and vote share by party in the 2005 HR election in Japan
    ・ Read hr96-21.csv
hr <- read_csv("data/hr96-21.csv", na = ".")
plt_14 <- hr %>% 
  select(seito, exp, voteshare, year) %>% 
  filter(year == 2005) %>%  # Select the 2005 HR election data 
  ggplot(aes(x = exp, 
y = voteshare, 
col = seito)) +
  geom_point(alpha = 0.5) +
  labs(x = "Campaign money", y = "vote share (%)") +
  geom_smooth(method = lm) +
  theme_bw(base_family = "HiraKakuProN-W3") +
  ggtitle("Campaign money and vote share by party:2005 HR election in Japan") 

plt_14

  • Adding facet_wrap(~seito), you get the following multiple scatter plots by party
plt_14 +
  facet_wrap(~seito) # Divide a scatter plot into several plots by party

  • Within facet_wrap(), you can set the variable by which you split the plot, the number of rows, and the number of columns
plt_14 +
  facet_wrap(
    vars(seito), # set the variable 
    nrow = 2,  # set the number of rows
    ncol = 5  # set the number of columns
  ) 
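facet_*() covers both facet_wrap() and facet_grid(). As a rough sketch on made-up data (toy and its columns are hypothetical), facet_wrap() splits the plot by one variable, while facet_grid() lays out the panels by two variables, one for rows and one for columns.

```r
library(ggplot2)

# Hypothetical data: 2 parties x 2 genders
toy <- data.frame(
  exp       = c(1, 2, 3, 4, 5, 6, 7, 8),
  voteshare = c(10, 15, 22, 30, 12, 18, 35, 41),
  party     = rep(c("A", "B"), each = 4),
  gender    = rep(c("female", "male"), times = 4)
)

p <- ggplot(toy, aes(x = exp, y = voteshare)) +
  geom_point()

# One panel per party, arranged in a single row
p_wrap <- p + facet_wrap(vars(party), nrow = 1)

# Rows by gender, columns by party
p_grid <- p + facet_grid(rows = vars(gender), cols = vars(party))

p_wrap
p_grid
```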

6.9 Patchwork

Elements | {package}::function() | What you can do
9. Patchwork | {patchwork} | Arrange multiple graphs
  • {patchwork} enables us to arrange different plots in one figure by using the operators | and /
  • You can also compare these multiple graphs
Operator What you can do
| Arrange in a row (side by side)
/ Arrange in a column (one above the other)
  • For instance, you make multiple scatter plots with different colors
    → You can see whether there is an interaction between variables and how strong it is
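The two operators can be tried on a few small plots. This is only a sketch with made-up data; p1, p2, and p3 are illustrative names. Parentheses control which plots are grouped first.

```r
library(ggplot2)
library(patchwork)

# Made-up data
toy <- data.frame(x = 1:10, y = (1:10)^2)

p1 <- ggplot(toy, aes(x, y)) + geom_point() + ggtitle("p1: points")
p2 <- ggplot(toy, aes(x, y)) + geom_line() + ggtitle("p2: line")
p3 <- ggplot(toy, aes(x)) + geom_histogram(bins = 5) + ggtitle("p3: histogram")

side_by_side <- p1 | p2         # one row, two columns
stacked      <- p1 / p2         # one column, two rows
mixed        <- (p1 | p2) / p3  # p1 and p2 on top, p3 below

mixed
```

When the plots are stored in a list, patchwork::wrap_plots() arranges them without writing the operators by hand.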

What we want to do here:

Check the relationship between campaign money and vote share by 3 variables in the 2005 HR election

  1. LDP vs. non-LDP candidates
  2. Gender
  3. Candidate status (challenger, incumbent, former incumbent)
  • Define a function to get this done
  • Name the function scatter_hr() (or whatever name you like)
  • Using patchwork::wrap_plots(nrow = 1), combine the list of plots into one figure arranged in a single row
library(patchwork)
  • Read the HR election data
hr <- read_csv("data/hr96-21.csv", na = ".")
  • Extract the 2005 HR election data from hr, make an LDP dummy (= 1 if a candidate’s party is “自民”, 0 otherwise), and name the data frame hr2005
hr2005 <- hr %>% 
  filter(year == 2005) %>% 
  mutate(ldp = ifelse(seito == "自民", 1, 0))  # "自民" means "LDP" in Japanese  
  • Change the class of the three variables (status, gender, ldp) to factor
hr2005$status <- as.factor(hr2005$status)
hr2005$gender <- as.factor(hr2005$gender)
hr2005$ldp <- as.factor(hr2005$ldp)
# This is the function to draw a scatter plot of campaign money and vote share in the 2005 HR election  

scatter_hr <- function(color = "seito"){ # Define scatter_hr()
  ggplot(hr2005) +
    aes(
 exp, voteshare,
 color = .data[[color]]
    ) +
    geom_point(size = 1, alpha = 0.5) +
    theme_bw(base_family = "HiraKakuProN-W3") + 
    labs(x = "Campaign money (yen)",
 y = "Vote share (%)",
 title = "2005 HR election in Japan") +
    theme(legend.position = "bottom")
}

hr2005 %>% 
  select(where(is.factor)) %>%
  names() %>%
  purrr::map(scatter_hr) %>%  
  patchwork::wrap_plots(nrow = 1) 

6.10 Interactive graph

Elements | {package}::function() | What you can do
10. Interactive graph | {plotly} | Make an interactive graph
  • plotly::ggplotly() enables us to interactively explore a scatter plot made with {ggplot2}

・Read the HR election data

hr <- read_csv("data/hr96-21.csv", na = ".")
  • Extract the 2005 HR election data from hr, keep only the winners, and name it hr2005
hr2005 <- hr %>% 
  filter(year == 2005) %>% 
  filter(wl > 0)
  • Make an interactive graph: plt_15
plt_15 <- ggplot(hr2005) +
  aes(x = exp,
 y = voteshare,
 color = seito) +
  geom_point(alpha = 0.5) +   # a constant alpha is set outside aes()
  theme_bw(base_family = "HiraKakuProN-W3") 

plotly::ggplotly(plt_15)
  • You can see the 8 party names in Japanese in the legend on the right side
  • If you click one of them, the dots corresponding to that party disappear from the scatter plot
  • Click the 6 party names other than LDP (“自民”) and DPJ (“民主”)

  • If you click the camera icon, you can capture the graph you see and save it as a .png file

  • If you want to save a scatter plot which only displays the LDP and DPJ candidates, you can also draw it without using {plotly} as follows:
plt_15 +
  gghighlight::gghighlight(seito == "民主" | seito == "自民")

6.11 Save a graph

  • If you use {plotly}, you can capture a graph and save it.
  • When you want to save it at a high resolution, use ggsave()
  • Make a new folder in your R Project, and name it fig (or whatever name you like)
  • Save the graph you made in the fig folder
ggsave("fig/plt_15.png", # Name of the file you want to save  
  plt_15,                # The plot object you made in R Markdown  
  width = 10,            # Set the width of the image
  height = 10,           # Set the height of the image
  units = "cm")          # Set the unit of the size: 
                         # "in", "cm", "mm", or "px"

7. Variation in R code

  • The same plot can be written in various ways
  • For instance, suppose you want to use the following R code:
ggplot(data = hr, mapping = aes(x = exp, y = voteshare))
  • There are 9 other ways of writing the same R code:
1. ggplot(hr, aes(x = exp, y = voteshare))
2. ggplot(hr, aes(exp, voteshare))
3. ggplot(hr) + aes(x = exp, y = voteshare)
4. ggplot(hr) + aes(exp, voteshare)
R code with pipe %>%
5. hr %>% ggplot(mapping = aes(x = exp, y = voteshare))
6. hr %>% ggplot(aes(x = exp, y = voteshare))
7. hr %>% ggplot(aes(exp, voteshare))
8. hr %>% ggplot() + aes(x = exp, y = voteshare)
9. hr %>% ggplot() + aes(exp, voteshare)

8. Useful Websites on {ggplot2}

From Data to Viz
  • This website shows how to visualize data not only in R but also in Python and D3.js
TidyTuesday
  • Every week a new dataset is posted for practicing data visualization.
  • Participants share their R code on Twitter

10. Exercise

Q10.1: Referring to 6.7 Identifying dots: Combination of gghighlight() and geom_text_repel(), draw a boxplot of campaign money (exp) in the 2009 HR election in Japan.

  • In drawing your graph, be sure to highlight the LDP (“自民”) and the DPJ (“民主”) candidates

  • Use hr96-21.csv in your analysis.

Q10.2: Referring to 6.6 Highlight, draw a scatter plot of vote share (voteshare) in the 2009 HR election in Japan.

  • In drawing your graph, be sure to highlight the LDP (“自民”) and the DPJ (“民主”) candidates
  • Use hr96-21.csv in your analysis.

Q10.3: Referring to 6.8 Facet, draw a scatter plot of campaign money (exp) and vote share (voteshare) for the LDP (“自民”) and the DPJ (“民主”) in the 2009 HR election in Japan.

  • In drawing your graph, use facet_*() and show the two regression lines for the LDP and the DPJ.

Q10.4: Referring to 6.7 Identifying dots: Self-made function(), simultaneously draw multiple boxplots of age, voteshare, and exp for each party.

Q10.5: Referring to 6.9 Patchwork, show how the relationship between campaign money (exp) and vote share (voteshare) changes in the 2009 HR election by the DPJ vs. non-DPJ, gender, and candidate status (status)

  • In drawing your scatter plots, use patchwork::wrap_plots().
