R packages needed in this section

library(gghighlight)
library(ggplot2) # {tidyverse} includes {ggplot2}
library(ggrepel)
library(magrittr) # {tidyverse} includes {magrittr}
library(patchwork)
library(plotly)
library(reactable)

1. Drawing a graph on R

The most basic syntax of R is called “base R.”
{ggplot2} is a package included in a collection of packages called the {tidyverse}.
Two ways of drawing a graph in R

1. Base R	The most basic syntax of `R`
2. ggplot2	Draw a graph by adding layers

Examples of drawing a graph on base R and {ggplot2}

You can draw a graph with either base R or {ggplot2}
However, drawing a graph with {ggplot2} is far more convenient
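
As a minimal sketch of the comparison, the following code draws the same scatter plot both ways. It assumes the built-in `mtcars` data set (not used elsewhere in this section) purely for illustration.

```r
library(ggplot2)

# Base R: one function call draws the scatter plot immediately
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")

# {ggplot2}: data, mapping, and geometric object are added as layers
p <- ggplot(data = mtcars) +
  aes(x = wt, y = mpg) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
p
```

The {ggplot2} version is longer here, but because the plot is stored as an object (`p`), further layers can be added later without rewriting the whole call.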

2. What is `{ggplot2}`?

An R package made by Hadley Wickham for data visualization
The gg in ggplot stands for the grammar of graphics
Hadley Wickham distributes {ggplot2} for free

2.1 Two core features on `{ggplot2}`

1. {ggplot2} implements a graphical grammar proposed by Wilkinson, L. (2005)

2. {ggplot2} has multiple functions corresponding to structural elements of a graph, which work like a layer

2.2 Advantages of using `{ggplot2}`

R code is neat and easier to read
Easy to change the appearance of a graph
Reproducibility of a graph is guaranteed
You can write R code based on graphical grammar
Graphical components work as layers
Superior as a tool for academic exploration
Abundant R packages
Abundant {ggplot2} communities

2.3 Preparing `{ggplot2}`

You need to install and load {tidyverse} to start using {ggplot2} in an Rmd file
The {tidyverse} is a family of packages, including {ggplot2}, {dplyr}, {tidyr}, {readr}, {purrr}, {tibble} and a few others, which are useful for data manipulation and data visualization
The {tidyverse} is in continuing development, meaning that some of the functions are subject to change, though many of the core functions are fairly stable.
If you encounter difficulties getting your R code to run, it may be because the {tidyverse} has changed.
If this is the case, then you can look up documentation about the package to learn how to update your code.

Since {tidyverse} contains {ggplot2}, all you have to do is to load {tidyverse}
To use {tidyverse}, you need to take the following 2 steps:

1. Install `{tidyverse}`

You type install.packages("tidyverse") in Console
→ Hit the Return key

install.packages("tidyverse")

You need to do this only once
install.packages("tidyverse") will download the required package materials to your computer.

2. Load `{tidyverse}`

After you have successfully downloaded {tidyverse}, you load it for your current Rmd file with library(tidyverse)
You type library(tidyverse) in a chunk
→ Click knit

library(tidyverse)

Although you do not need to install {tidyverse} at the start of every R session (or R script), you do need to load {tidyverse} every time.
If you receive an error that a certain function is not found, it may mean that you forgot to run library(tidyverse), which loads the packages containing that function, such as ggplot() or read_csv().
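
The two steps can be combined so that installation runs only when needed. This is a sketch, assuming you have an internet connection and a CRAN mirror configured; `requireNamespace()` is a base R function that checks whether a package is installed without loading it.

```r
# Install {tidyverse} only when it is not already available
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load it for the current session (needed every time)
library(tidyverse)

# Confirm that functions used in this section are now visible
exists("ggplot")
exists("read_csv")
```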

3. How does {ggplot2} work?

Here I introduce how {ggplot2} works
As an example, I will draw a scatter plot using {ggplot2} and show how to customize it
You need at least the following three components to draw a scatter plot using {ggplot2}

3 minimum components to draw a scatterplot using {ggplot2}

Item	`{package}:function()`	What the item does
1. Assign the data	`ggplot(data = )`	Select the data
2. Assign the variables	`aes()`	Select the variables, such as `x`, `y`
3. Assign a graph	`geom_*()`	Select the type of graph

1. Select the data

When drawing a graph using {ggplot2}, you need to use tidy data, which is easy for a computer to read but not necessarily for us
For details on tidy data, see 1. Data Handling (Advance)

2. Assign the variables

Selecting variables for the x axis and y axis is called “mapping”
“Mapping” means linking graphical elements such as dots, lines, and areas to the data and its variables
In other words, “mapping” specifies how the data is visualized
Mapping is done with the aes() function
aes() stands for aesthetics
You can assign multiple arguments within the parentheses, ()

3. Assign a graph

You can choose the most appropriate graph for your data by selecting a geometric object, written as geom_*()
The following geometric objects are the most frequently used

`geom_*()`

Type of variables	Graph	Geometric Object
Discrete	Bar chart	`geom_bar()`
Continuous	Histogram	`geom_histogram()`
Continuous	Boxplot	`geom_boxplot()`
Continuous	Scatter plot	`geom_point()`
Continuous	Line graph	`geom_line()`
Continuous	Dot plot	`geom_point()`

In this section, we use geom_point() to draw a scatter plot

Image of how {`ggplot2`} completes a graph

{ggplot2} treats each graphical element as a layer
Functions such as ggplot(), geom_point(), and scale_*() generate the layers
Each layer is added with the + operator
First, you assign the three essential items: ggplot(), aes(), geom_point()
Depending on what kind of graph you want to draw, you can customize the colors, shapes, and sizes in your graph using scale functions
You can customize the x axis and y axis using coord functions
The figure below shows how {ggplot2} completes a graph

The image of how {ggplot2} completes a graph

・Core items to draw a graph
・Layers customizing the appearance of a graph

You don’t have to memorize the entire R code
It is enough to understand the mechanism of how {ggplot2} works and to look things up on the web as you write your R code
For details on how to draw each type of graph using {ggplot2}, see “5. ggplot2 (Barchart) 〜 ggplot2 (Dot plot)”
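
To make the layer idea concrete, here is a small sketch that builds a plot step by step and counts the layers after each `+`. It assumes the built-in `mtcars` data set; the object names `p0`, `p1`, and `p2` are hypothetical.

```r
library(ggplot2)

# Start with the data and the mapping: no layer yet, just a blank canvas
p0 <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Add a geometric object: this creates the first layer
p1 <- p0 + geom_point()

# Add a second geometric object on top of the first
p2 <- p1 + geom_smooth(method = "lm", se = FALSE)

# Count the layers stored in each plot object
length(p0$layers)
length(p1$layers)
length(p2$layers)

p2
```

Note that functions such as `labs()`, `theme_bw()`, or `coord_flip()` are also added with `+`, but they modify the plot rather than adding an entry to `p$layers`.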

Let’s draw a scatter plot using fake data (`df1`)

Suppose you want to use the data frame (df1)
Set math score (math) as the x axis
Set statistics score (stat) as the y axis
Change the color of the dots by gender

Two ways of assigning `aes()` for entire graph:

	Range of assignment
(1) Assign `aes()` as an independent layer	Whole graph
(2) Assign mapping argument within `ggplot()`	Whole graph

(1) Assign mapping argument, `aes()`, as an independent layer

library(ggplot2)

ggplot(data = df1) +
  aes(x = math,
      y = stat,
      color = gender) +
  geom_point()

(2) Assign mapping argument, `aes()`, within `ggplot()`

library(ggplot2)

ggplot(data = df1,
       mapping = aes(x = math,
                     y = stat,
                     color = gender)) +
  geom_point()

You get the same results from (1) and (2)

How to assign aes()
・If you type the following R code, you can get a simple scatter plot
・You can get the same scatter plot in three ways:

(1) Assign mapping argument, `aes()`, as an independent layer

ggplot(df1) +
  aes(x = math,          # aes() is an independent layer
      y = stat,
      color = gender) +
  geom_point()

(2) Assign mapping argument, `aes()`, within `ggplot()`

ggplot(df1,
       mapping = aes(x = math,   # mapping = can be omitted
                     y = stat,
                     color = gender)) +
  geom_point()

(3) Assign mapping argument, `aes()`, within `geom_point()`

ggplot(df1) +
  geom_point(mapping = aes(x = math,   # mapping = can be omitted
                           y = stat,
                           color = gender))

You can draw a simple scatter plot in these three ways
However, you will usually want to improve it by customizing it
You can customize it by adding further layers
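
One way to check that the three placements of `aes()` really give the same plot is to compare the data that {ggplot2} prepares for drawing. The sketch below assumes a small hypothetical data frame `toy` standing in for `df1`, and uses `layer_data()`, which returns the computed data for a layer.

```r
library(ggplot2)

# Hypothetical toy data standing in for df1
toy <- data.frame(
  math   = c(1, 4, 7, 10),
  stat   = c(2, 5, 7, 15),
  gender = c("Male", "Male", "Female", "Female")
)

# (1) aes() as an independent layer
p1 <- ggplot(toy) + aes(x = math, y = stat, color = gender) + geom_point()
# (2) aes() within ggplot()
p2 <- ggplot(toy, aes(x = math, y = stat, color = gender)) + geom_point()
# (3) aes() within geom_point()
p3 <- ggplot(toy) + geom_point(aes(x = math, y = stat, color = gender))

# Compare the positions and colors that each plot will actually draw
cols <- c("x", "y", "colour")
d1 <- layer_data(p1)[cols]
d2 <- layer_data(p2)[cols]
d3 <- layer_data(p3)[cols]

isTRUE(all.equal(d1, d2))
isTRUE(all.equal(d2, d3))
```

The difference between the three only matters once you add more layers: a mapping given in (1) or (2) is shared by every layer, while a mapping given in (3) applies to that `geom_point()` alone.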

10 most commonly used layers:

Elements	`{package}:function()`	What you can do
1. Labels	`labs()`	Customize the labels on legend and axis
2. Theme	`theme()`, `theme_*()`	Customize the appearance of a graph
3. Coordinate System	`coord_*()`	Customize the appearance of the coordinate system
4. Statistics	`stat_*()`	Apply statistical transformations
5. Scale	`scale_*_*()`	Customize the graph’s colors and shapes
6. Identifying dots	`{ggrepel}`	Identify what dots represent in a scatter plot
7. Highlight	`{gghighlight}`	Emphasize particular data
8. Facet	`facet_*()`	Make multiple graphs
9. Patchwork	`{patchwork}`	Arrange multiple graphs
10. Interactive graph	`{plotly}`	Make an interactive graph

Note:

Regarding 3. Coordinate System, {ggplot2} automatically chooses suitable settings by default
You need to install {gghighlight} to use 7. Highlight

4. Let’s draw a scatter plot using `{ggplot2}`

4.1 A simple scatter plot

x axis・・・ math score
y axis・・・ stat score
Each dot colored by gender

Step 1: Prepare data

Make four fake variables (name, math, stat, gender) and merge them into data frame, named df1

# Make 4 variables  

name <- c("Joe", "Ze'ev", "David", "Mike", "Ross", "Woojin", "Inha", "Jih-wen", 
  "Mark", "Dennis", "Carol", "Shira",  "Mimi", "Amital", "Rachel", "Ariel", 
  "Kelly", "RongRong", "Kathy", "Barbara")
math <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 4, 5, 6, 7, 9, 8, 9, 8, 10)
stat <- c(2, 4, 6, 5, 7, 9, 7, 10, 12, 15, 14, 13, 12, 13, 11, 10, 9, 8, 6, 4)
gender <- c("Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Male", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female")

# Merge the 4 variables into data frame, df1   

df1 <- tibble(name, math, stat, gender)

Let’s check what df1 looks like

df1

# A tibble: 20 × 4
   name      math  stat gender
   <chr>    <dbl> <dbl> <chr> 
 1 Joe          1     2 Male  
 2 Ze'ev        2     4 Male  
 3 David        3     6 Male  
 4 Mike         4     5 Male  
 5 Ross         5     7 Male  
 6 Woojin       6     9 Male  
 7 Inha         7     7 Male  
 8 Jih-wen      8    10 Male  
 9 Mark         9    12 Male  
10 Dennis      10    15 Male  
11 Carol        2    14 Female
12 Shira        4    13 Female
13 Mimi         5    12 Female
14 Amital       6    13 Female
15 Rachel       7    11 Female
16 Ariel        9    10 Female
17 Kelly        8     9 Female
18 RongRong     9     8 Female
19 Kathy        8     6 Female
20 Barbara     10     4 Female

Looks good!

Assign data: `ggplot()`

You need to do mapping, meaning linking variables in the data to aesthetic elements within aes()
ggplot() has two main arguments: data and mapping
First, let’s assign df1 as the data to use
Then, we can see the canvas on which we draw a graph

ggplot(data = df1)

We can see a blank canvas

Step 2: Assign the variables

You can add a new layer (in this case, aes()) by using +
What you do here is link aesthetic elements to variables in the data
math is linked to the x axis
stat is linked to the y axis
→ aes(x = math, y = stat)

ggplot(data = df1) +
  aes(x = math, y = stat)

Now, you see the following canvas with two labels on x axis and y axis

Step 3: Select a graph

You choose a geometric object, geom_point() to draw a scatter plot
You can add geom_point() by using +

ggplot(data = df1) +
  aes(x = math, y = stat) +
  geom_point()

Here, aes(x = math, y = stat) is an independent layer
However, you can also put aes(x = math, y = stat) within geom_point()
This means that you map aes() within geom_point()
Within geom_point(), you map by typing mapping =
Within aes(), you link each aesthetic element (such as x, y, or color) to a particular variable
→ Here, the x axis is linked to math, the y axis is linked to stat, and the dots are colored by gender

ggplot(data = df1) +
  geom_point(mapping = aes(x = math,
                           y = stat,
                           color = gender))

data = is the first argument of ggplot() and mapping = is the first argument of geom_point(), so both argument names can be omitted
→ You can rewrite the R code above as follows:

ggplot(df1) +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender))

You can use the pipe operator (%>% or |>) to link commands together.
This tells R to do something and then do something else to the output of the first step.
Chaining functions together like this will become very useful as your tasks become more complicated.
Using the pipe operator, you can rewrite the code above as follows:

df1 %>% 
  ggplot() +
  geom_point(aes(x = math, 
                 y = stat, 
                 color = gender))

Mapping `aes()` and omission of R code

Summary on mapping aes()
・In Step 2: Assign the variables, aes(x = math, y = stat) is added with + after ggplot(data = df1)

ggplot(data = df1) +
  aes(x = math, y = stat) +
  geom_point()

aes(x = math, y = stat) can be placed not only within ggplot() but also within geom_point()

In sum, you have two ways of mapping aes():

Mapping `aes()` within `ggplot()`

df1 %>%
  ggplot(aes(x = math,
             y = stat,
             color = gender)) +
  geom_point()

Mapping `aes()` within `geom_point()`

df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender))

Which way you choose depends on what you want to do in your analysis

How pipes (%>% or |>) are used
・The pipes (%>% or |>) allow you to express a sequence of multiple operations
・For the examples in this section, %>% and |> can be used interchangeably
・Pipes can greatly simplify your code and make your operations more intuitive
・The pipe operator (%>%) is automatically imported when you load {tidyverse}
・The pipe (%>%) comes from the {magrittr} package
・The {magrittr} package is included in {tidyverse}
・The native pipe (|>) is built into R 4.1.0 and later and needs no package
→ You need to load either of the following packages to use the pipe operator (%>%)

library(magrittr)  
library(tidyverse)

The pipe operator (%>%) automatically passes the output from the first line into the next line as the input

Let’s take a look at an example of using the pipe

Generate a vector of integers from 1 to 10

1:10

 [1]  1  2  3  4  5  6  7  8  9 10

Calculate 1 + 2 + .... + 10

sum(1:10)

[1] 55

If you use the pipe (%>%), you write the R code as follows:

1:10 %>%   # Generate a vector from 1 to 10
  sum()    # Add them all

[1] 55

If you want to calculate the square root, …

1:10 %>%      # Generate a vector from 1 to 10
  sum() %>%   # Add them all
  sqrt()      # Calculate the square root

[1] 7.416198

Generate a vector from 1 to 10 → Add them all → Calculate the square root
Written this way, the sequence of operations is easy to follow

If you don’t use pipes,…

sqrt(sum(1:10))

[1] 7.416198

This is less intuitive because you have to read it backward
Calculate the square root ← Add them all ← Generate a vector
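
The same chain can be written with the native pipe (`|>`). This is a sketch that assumes R 4.1.0 or later, where `|>` is part of base R and no package needs to be loaded; the object names are hypothetical.

```r
# Native pipe: available in base R 4.1.0 or later, no package required
result_native <- 1:10 |>
  sum() |>
  sqrt()

# Nested call for comparison
result_nested <- sqrt(sum(1:10))

# Both approaches give the same value
identical(result_native, result_nested)
```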

How to interpret R code with pipes (%>%)
・You can interpret the R code you made in 4.1 A simple scatter plot as follows:

df1 %>%                          # Use df1 as data
  ggplot(aes(x = math,           # Assign x = math
             y = stat,           # Assign y = stat
             color = gender)) +  # Dots are colored by gender
  geom_point()                   # Draw a scatter plot

・df1 %>% ggplot() means the first argument of ggplot() is df1
Interpretation of the R code:
・Use df1 as data
→ Assign x = math
→ Assign y = stat
→ Dots are colored by gender
→ Draw a scatter plot
・You don’t have to go backward in interpreting the R code
・R code with pipes (%>%) is intuitive and easy to follow

5. Customizing a graph (basic)

Here, we customize the simple scatter plot we made above using the following 9 commonly used elements:

Elements	function
① Dot color	`aes(color = ...)`
② Dot shape	`aes(shape = ...)`
③ Dot size	`size = ...`
④ Background color	`theme_bw()`
⑤ Label the axis	`labs(x = "...", y = "...")`
⑥ Main title	`ggtitle("...")`
⑦ Show Japanese in a graph	`theme_bw(base_family = "HiraKakuProN-W3")`
⑧ Legend Location	`theme(legend.position = "bottom")`
⑨ Show a regression line	`geom_smooth(method = "lm")`

5.1 Dot color, shape, size, and background

Let’s customize ①, ②, ③, and ④
Note that since size and shape are applied to every point here, you assign them outside aes()

df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender),
             size = 3,
             shape = 17) +
  theme_bw() # Change the background to white (black-and-white theme)

You can select dot shape from the following list:

Dot color changes depending on whether `color` is placed inside or outside `aes()`

(1) `color` mapped within `aes()`	Dots are colored depending on the value of the variable
(2) `color` set outside `aes()`	The same color is applied to all dots

(1) When `color = gender` is mapped within `aes()`

plt_1 <- df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender)) +   # mapped within aes()
  ggtitle("mapped within aes()") +
  theme_bw(base_family = "HiraKakuProN-W3")
plt_1

(2) When `color = "magenta"` is set outside `aes()`: Applied to all dots

plt_2 <- df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat),
             color = "magenta") +     # set outside aes()
  ggtitle("mapped out of aes()") +
  theme_bw(base_family = "HiraKakuProN-W3")
plt_2

library(patchwork)

plt_1 + plt_2
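
A common pitfall follows from this rule: writing a constant such as `"magenta"` inside `aes()`. The sketch below, which assumes a hypothetical data frame `toy`, inspects the colors actually used with `layer_data()`.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(x = 1:3, y = c(2, 4, 3))

# Constant placed INSIDE aes(): "magenta" is treated as a one-level variable,
# so {ggplot2} picks a color from its default palette and adds a legend
p_inside <- ggplot(toy) +
  geom_point(aes(x = x, y = y, color = "magenta"))

# Constant placed OUTSIDE aes(): the color itself is applied to all dots
p_outside <- ggplot(toy) +
  geom_point(aes(x = x, y = y), color = "magenta")

# Colors actually used when drawing each layer
col_inside  <- unique(layer_data(p_inside)$colour)
col_outside <- unique(layer_data(p_outside)$colour)
col_inside
col_outside
```

Only the second plot is drawn in magenta; in the first, the string is just a label for a single group.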

5.2 Label the axis

Customize ⑤
If you want to put a label on the x-axis and y-axis, you use labs()
labs(x = "...", y = "...")

df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math,
                           y = stat,
                           color = gender)) +
  labs(x = "test score(math)", y = "test score(stat)")

If the labels are written in Japanese, you may see garbled characters on the labels of the x-axis and y-axis
→ For the solution to the garbled characters, see 5.4 Show Japanese in a graph

5.3 Main title

Customize ⑥
If you want to add a main title, use ggtitle("...")

df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math,   # mapping = can be omitted
                           y = stat,
                           color = gender)) +
  labs(x = "test score(math)", y = "test score(stat)") +
  ggtitle("Scatter plot of the mathematics and statistics exam")

If the labels are written in Japanese, you may still see garbled characters on the labels of the x-axis and y-axis
→ For the solution to the garbled characters, see 5.4 Show Japanese in a graph

5.4 Show Japanese in a graph

Customize ⑦
To fix the garbled characters in RMarkdown, set the following:

How to fix the garbled characters on `RMarkdown`

1. In RStudio menu, Tools → Global Options → Graphics

2. Choose AGG in Backend → Click Apply

Setting to use Japanese in a graph using `theme()`

If you want to use Japanese in variable names or the main title, you add the following R code after +
theme_bw(base_family = "HiraKakuProN-W3")
HiraKakuProN-W3 is a Japanese font available on macOS
theme_bw() changes the background color to white
You can also change the font size with base_size = ...

df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math,   # mapping = can be omitted
                           y = stat,
                           color = gender)) +
  labs(x = "test score (math)", y = "test score (stat)") +
  ggtitle("Scatter plot of the mathematics and statistics exam.") +
  theme_bw(base_family = "HiraKakuProN-W3",
           base_size = 12)

With this setting, Japanese text on the x-axis and y-axis is displayed correctly instead of as garbled characters

2 ways of adding a main title to a graph
1. ggtitle(".....")
2. labs(title = "....")

・You get the same result if you use the following R code:

df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math,   # mapping = can be omitted
                           y = stat,
                           color = gender)) +
  labs(x = "test score (math)",
       y = "test score (stat)",
       title = "Scatter plot of the mathematics and statistics exam.") +
  theme_bw(base_family = "HiraKakuProN-W3",
           base_size = 12)

5.5 Legend Location

Customize ⑧
You can customize the location of the legend by adding the following R code: theme(legend.position = "...")
You have the following 5 options:
・"none"
・"top"
・"left"
・"right"
・"bottom"
Here, let’s put the legend at the bottom: theme(legend.position = "bottom")

df1 %>%
  ggplot() +
  geom_point(mapping = aes(x = math,
                           y = stat,
                           color = gender)) +
  labs(x = "test score (math)", y = "test score (stat)") +
  theme_bw(base_family = "HiraKakuProN-W3",
           base_size = 12) +
  theme(legend.position = "bottom") # put the legend at the bottom

For details on the location of the legend in {ggplot2}, see this

5.6 Show a regression line

Customize ⑨
geom_smooth(method = "lm")

If you want to use the same color for dots and a line

You need to map within aes()
Add geom_smooth(method = lm) to draw the line fitted by a linear model
Add geom_smooth() after the + operator

df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender)) +
  geom_smooth(method = lm) +   # Draw a regression line
  ggtitle("Scatter plot of the mathematics and statistics exam") +
  theme_bw(base_family = "HiraKakuProN-W3")

In this case, you face the following error message:

This message says x and y are missing, so you cannot draw a regression line
Why does this happen?: The two aesthetic elements (x and y) are linked only WITHIN geom_point(), so geom_smooth() cannot see them
Solution: You need to link the two aesthetic elements (x and y) within ggplot(), so that every layer, including geom_smooth(), inherits them

df1 %>%
  ggplot(aes(x = math,   # Map x and y within ggplot()
             y = stat,
             color = gender)) +
  geom_point() +
  geom_smooth(method = lm) +
  ggtitle("Scatter plot of the mathematics and statistics exam") +
  theme_bw(base_family = "HiraKakuProN-W3")

If you want dots in different colors but lines in the same color

Here, let’s draw the lines in black

df1 %>%
  ggplot(aes(x = math,   # Map x and y within ggplot()
             y = stat)) +
  geom_point(aes(color = gender)) +
  geom_smooth(aes(group = gender),
              method = lm,
              color = "black") +
  ggtitle("Scatter plot of the mathematics and statistics exam") +
  theme_bw(base_family = "HiraKakuProN-W3")

Color variations available for {ggplot2}
・As many as 657 named colors are available for use in {ggplot2}
・You can choose and assign colors, like "deeppink", "skyblue", "royalblue", OUTSIDE aes()

df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat),
             size = 3,
             color = "deeppink", # Assign colors outside aes()
             shape = 8) +
  theme_bw()

・You can check the names of the colors you can use in {ggplot2} by typing colors() in Console
・Here, I will show you an example:

head(colors())

[1] "white"         "aliceblue"     "antiquewhite"  "antiquewhite1"
[5] "antiquewhite2" "antiquewhite3"

・If you use HEX codes, you can use as many as 16,777,216 colors!!!
・For instance, if you assign red, you type "#FF0000"
・The following are some of the colors available as HEX codes

Source: https://blog.albertkuo.me/post/point-shape-options-in-ggplot/
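
The numbers quoted above can be checked with a few base R functions. This is a small sketch using `colors()`, `rgb()`, and `col2rgb()` from {grDevices}, which is loaded by default; the object names are hypothetical.

```r
# Number of named colors known to R
n_named <- length(colors())
n_named

# Each of red, green, and blue takes 256 values, giving 256^3 HEX codes
n_hex <- 256^3
n_hex

# Build the HEX code for pure red from its RGB components
red_hex <- rgb(255, 0, 0, maxColorValue = 255)
red_hex

# Convert named colors to their HEX codes
named <- c("deeppink", "skyblue", "royalblue")
hex <- rgb(t(col2rgb(named)), maxColorValue = 255)
setNames(hex, named)
```

Either form can be passed to `color = ...` outside `aes()`, for example `color = "#FF1493"` instead of `color = "deeppink"`.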

6. Customizing a graph (advanced)

Here I introduce how to customize a graph using the 10 most commonly used layers

10 most commonly used layers:

Elements	`{package}:function()`	What you can do
1. Labels	`labs()`	Customize the labels on legend and axis
2. Theme	`theme()`, `theme_*()`	Customize the appearance of a graph
3. Coordinate System	`coord_*()`	Customize the appearance of the coordinate system
4. Statistics	`stat_*()`	Apply statistical transformations
5. Scale	`scale_*_*()`	Customize the graph’s colors and shapes
6. Identifying dots	`{ggrepel}`	Identify what dots represent in a scatter plot
7. Highlight	`{gghighlight}`	Emphasize particular data
8. Facet	`facet_*()`	Make multiple graphs
9. Patchwork	`{patchwork}`	Arrange multiple graphs
10. Interactive graph	`{plotly}`	Make an interactive graph

Data preparation

Download Japanese lower house election data (1996-2021): hr96-21.csv
In your RProject folder, make a new folder named data, and put hr96-21.csv in it

hr <- read_csv("data/hr96-21.csv", na = ".")

hr96-21.csv is a collection of Japanese lower house election data covering 9 national elections (1996, 2000, 2003, 2005, 2009, 2012, 2014, 2017, 2021)
Check the names of the variables hr contains

names(hr)

 [1] "year"          "pref"          "ku"            "kun"          
 [5] "wl"            "rank"          "nocand"        "seito"        
 [9] "j_name"        "gender"        "name"          "previous"     
[13] "age"           "exp"           "status"        "vote"         
[17] "voteshare"     "eligible"      "turnout"       "seshu_dummy"  
[21] "jiban_seshu"   "nojiban_seshu"

hr has the following 23 variables

variable	detail
year	Election year (1996-2021)
pref	Prefecture
ku	Electoral district name
kun	Number of electoral district
rank	Ascending order of votes
wl	0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner
nocand	Number of candidates in each district
seito	Candidate’s affiliated party (in Japanese)
j_name	Candidate’s name (Japanese)
name	Candidate’s name (English)
previous	Previous wins
gender	Candidate’s gender: “male”, “female”
age	Candidate’s age
exp	Election expenditure (yen) spent by each candidate
status	0 = challenger / 1 = incumbent / 2 = former incumbent
vote	votes each candidate garnered
voteshare	Vote share (%)
eligible	Eligible voters in each district
turnout	Turnout in each district (%)
castvote	Total votes cast in each district
seshu_dummy	0 = Not-hereditary candidates, 1 = hereditary candidate
jiban_seshu	Relationship between candidate and his predecessor
nojiban_seshu	Relationship between candidate and his predecessor

2 ways of checking the data downloaded:

(1) `DT::datatable()`

DT::datatable(hr)

(2) `reactable::reactable()`

Using reactable::reactable() enables us to search with multiple conditions

reactable::reactable(hr,
                     filterable = TRUE,    # Show a search box for each column
                     defaultPageSize = 10) # Set the number of rows shown per page

6.1 Labels

You need to give each axis an appropriate label
It should be neither too long nor too short
It should be just long enough to correctly convey its meaning to you and others
You use labs() to set the labels

Elements	`{package}:function()`	What you can do
1. Labels	`labs()`	Customize the labels on legend and axis

Let’s draw a scatter plot of exp and voteshare in the 2005 lower house election:
Suppose you are interested in the difference between LDP candidates and non-LDP candidates on this scatter plot
LDP (Liberal Democratic Party): shown as 自民 in the variable seito
・ x-axis: exp・・・Election expenditure (yen) spent by each candidate
・ y-axis: voteshare・・・Vote share (%)

The scatter plot without customizing labels

hr %>%
  filter(year == 2005) %>% # Select 2005 election data only
  mutate(ldp = ifelse(seito == "自民", 1, 0)) %>% # Make a dummy variable for LDP: ldp = 1, 0 otherwise
  ggplot() +
  geom_point(aes(x = exp,
                 y = voteshare,
                 color = as.factor(ldp)),
             alpha = 0.5) + # A constant is set outside aes()
  theme_bw()

The scatter plot with customized labels

hr %>%
  filter(year == 2005) %>%
  mutate(ldp = ifelse(seito == "自民", 1, 0)) %>%
  ggplot() +
  geom_point(aes(x = exp,
                 y = voteshare,
                 color = as.factor(ldp)),
             alpha = 0.5) + # A constant is set outside aes()
  ggtitle("Money and Vote Share in 2005 HR election: LDP vs Non-LDP(with labs)") +
  theme_bw() +
  labs(x = "Campaign money (yen)",
       y = "Vote Share (%)")

6.2 Theme

Elements	`{package}:function()`	What you can do
2. Theme	`theme()`, `theme_*()`	Customize the appearance of a graph

You can customize the elements that are not directly related to the data, such as the font, background color, and overall style
theme_*() has a base_family argument
→ You can assign the font family applied to all text elements in a graph
If you want to use Japanese, you should add theme_bw(base_family = "HiraKakuProN-W3")

List of `theme_*()`

Function	What you can do
`theme_grey()`	Set the background color to grey
`theme_bw()`	Set the background color to white
`theme_minimal()`	Use only the minimum elements
`theme_classic()`	Use the classic style

Without using `theme_*()`

plt_3 <- hr %>%
  filter(year == 2005) %>%
  mutate(ldp = ifelse(seito == "自民", 1, 0)) %>%
  ggplot() +
  geom_point(aes(x = exp,
                 y = voteshare,
                 color = as.factor(ldp)),
             alpha = 0.5) + # A constant is set outside aes()
  ggtitle("Money and Vote Share in 2005 HR election: LDP vs Non-LDP(with labs)") +
  labs(x = "Campaign money (yen)",
       y = "Vote Share (%)")

plt_3

You see that the background color is grey (grey is the default)

If you want to use `theme_classic()`…

plt_3 +
  theme_classic()

List of `theme_*()` and their outlook

You can choose whatever you like

6.3 Coordinate System

Elements	`{package}:function()`	What you can do
3. Coordinate System	`coord_*()`	Customize the appearance of the coordinate system

`coord_*()` has the following variations

`coord_flip()`	Exchange `x-axis` and `y-axis`
`coord_cartesian()`	Zoom in and out on `axes`
`coord_fixed()`	Fix the aspect ratio of `axes`

6.3.1 `coord_flip()`: Exchange `x-axis` and `y-axis`

Let’s show math test scores by gender using a boxplot
You use a geometric object, geom_boxplot()

plt_4 <- df1 %>%
  ggplot() +
  geom_boxplot(aes(x = gender,
                   y = math,
                   fill = gender),
               show.legend = FALSE) +
  labs(x = "gender", y = "math score") +
  ggtitle("Without using coord_flip()")

plt_4

Exchange x-axis and y-axis by using coord_flip()

plt_5 <- df1 %>%
  ggplot() +
  geom_boxplot(aes(x = gender,
                   y = math,
                   fill = gender),
               show.legend = FALSE) +
  labs(x = "gender", y = "math score") +
  ggtitle("Using coord_flip()") +
  coord_flip()  # Exchange axes

plt_5

Using {patchwork}, let’s show the two graphs in one row

library(patchwork)

plt_4 + plt_5

6.3.2 `coord_cartesian()`

You use coord_cartesian() when you want to zoom in or out on the axes

hr <- read_csv("data/hr96-21.csv", na = ".")

Use the 2005 election data and name the data frame hr2005

hr2005 <- hr %>% 
  filter(year == 2005)

Represent the relationship between exp and voteshare using a smooth curve

plt_6 <- hr2005 %>%
  ggplot() +
  aes(exp, voteshare) +
  geom_point() +
  geom_smooth(se = FALSE) + # Hide the 95% confidence interval
  theme_bw()

Show the full range

plt_6

Using `coord_cartesian()`, you can limit the range of output

plt_6 + 
  coord_cartesian(xlim = c(5.0e+6, 2.6e+07))

Using `xlim()`, you can limit the range of output

plt_6 + 
  xlim(5.0e+6, 2.6e+07)

Warning: Removed 353 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 353 rows containing missing values or values outside the scale range
(`geom_point()`).

・We see two warnings: 353 rows were removed in stat_smooth() and 353 rows were removed in geom_point()

Why we should use coord_cartesian()
・If you use xlim(), then R treats the data outside the range as missing data (NA)
→ The smooth curve is fitted only to the remaining data, so you see a graph different from the one generated with the original data
→ If you just want to “enlarge” a certain part of a graph, you should use coord_cartesian() instead of xlim()
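
The difference can be seen directly in the data that {ggplot2} prepares for drawing. The following sketch assumes a hypothetical data frame `toy` with x running from 1 to 10, and counts how many x values have been turned into NA under each approach.

```r
library(ggplot2)

# Hypothetical toy data: x runs from 1 to 10
toy <- data.frame(x = 1:10, y = (1:10)^2)
p <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# Zoom with coord_cartesian(): the underlying data is untouched
d_zoom <- layer_data(p + coord_cartesian(xlim = c(3, 8)))

# Restrict with xlim(): values outside the limits are replaced with NA
d_xlim <- layer_data(p + xlim(3, 8))

n_lost_zoom <- sum(is.na(d_zoom$x))  # points lost when zooming
n_lost_xlim <- sum(is.na(d_xlim$x))  # points lost with xlim()
n_lost_zoom
n_lost_xlim
```

Because statistics such as `geom_smooth()` are computed after the NA values are dropped, a plot restricted with `xlim()` can show a different curve from the zoomed plot.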

6.3.3 `coord_fixed()`

You use coord_fixed() when you want to fix the aspect ratio of the axes
Let’s draw a scatter plot with fake data

ggplot() +
  geom_point(aes(x = 1:10,
   y = 1:10))

Using coord_fixed(), let’s set the ratio x:y = 1:2, so that one unit on the y-axis is drawn twice as long as one unit on the x-axis

ggplot() +
  geom_point(aes(x = 1:10,
   y = 1:10)) +
  coord_fixed(ratio = 2)

The values of x and y are exactly the same
However, the y-axis looks stretched compared with the x-axis
Let’s type coord_fixed(ratio = 1) so that one unit has the same length on both axes

ggplot() +
  geom_point(aes(x = 1:10,
   y = 1:10)) +
  coord_fixed(ratio = 1)

6.4 Statistics

Elements	`{package}:function()`	What you can do
4. Statistics	`stat_*()`	Statistical Treatment

Let’s draw a bar chart of the number of candidates running in the 2021 lower house election in Japan using geom_bar()

`geom_bar(stat = "count")`

→ Map seito as aes(x = seito); the number of candidates in each party is counted and displayed on the y-axis
- You don’t have to map y
→ stat = "count" is applied to each group
→ The number of candidates in each party is calculated → Displayed on the y-axis

hr %>%
  filter(year == 2021) %>%
  ggplot(aes(x = seito)) +
  geom_bar(stat = "count") +
  theme_bw(base_family = "HiraKakuProN-W3", # You need to add this layer because party names are written in Japanese
           base_size = 12) +
  ggtitle("The number of candidates in the 2021 HR election in Japan (by party)")

Let’s double-check whether the number of candidates by party in the 2021 HR election is correct by using group_by()

hr %>% 
  filter(year == 2021) %>% 
  group_by(seito) %>% 
  summarize(N = n())

# A tibble: 11 × 2
   seito     N
   <chr> <int>
 1 れい     12
 2 公明      9
 3 共産    105
 4 国民     21
 5 無所     80
 6 社民      9
 7 立憲    214
 8 維新     94
 9 自民    277
10 諸派      9
11 Ｎ党     27

Let’s draw a bar chart of the total number of previous wins (by party) of the candidates running in the 2021 lower house election in Japan using geom_bar()

geom_bar(stat = "identity")

→ With this argument, the values mapped to y (previous) are used as they are and stacked for each party, so each bar shows the total number of candidates’ previous wins by party

hr %>%
  filter(year == 2021) %>%
  ggplot(aes(x = seito, y = previous)) +
  geom_bar(stat = "identity") +
  theme_bw(base_family = "HiraKakuProN-W3", # You need to add this layer because party names are written in Japanese
           base_size = 12) +
  ggtitle("The total number of candidates' previous wins in the lower house elections (1996-2017) in Japan (by party)")

Let’s double-check the total number of candidates’ previous wins by party using group_by()

hr %>% 
  filter(year == 2021) %>% 
  group_by(seito) %>% 
  summarize(sum(previous))

# A tibble: 11 × 2
   seito `sum(previous)`
   <chr>           <dbl>
 1 れい                6
 2 公明               51
 3 共産               28
 4 国民               30
 5 無所               60
 6 社民                0
 7 立憲              481
 8 維新               50
 9 自民             1142
10 諸派                5
11 Ｎ党                0
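The following sketch illustrates why a bar drawn with stat = "identity" ends at the group total: each row becomes one rectangle, and rectangles with the same x value are stacked. The data frame toy is made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data: previous wins of three candidates in two parties
toy <- data.frame(party    = c("A", "A", "B"),
                  previous = c(2, 3, 4))

p_identity <- ggplot(toy, aes(x = party, y = previous)) +
  geom_bar(stat = "identity")   # geom_col() is shorthand for the same thing

# Each candidate is one rectangle; rectangles of the same party are stacked,
# so the top of the highest rectangle is the party total
built <- layer_data(p_identity)
bar_tops <- tapply(built$ymax, as.numeric(built$x), max)

# Party totals computed by hand
party_totals <- tapply(toy$previous, toy$party, sum)

print(bar_tops)
print(party_totals)
```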

For details on bar charts, see 5. ggplot2 (Barchart)

6.5 Scale

scale_*() enables us to change how data values are displayed, such as the colors and shapes of dots
{ggplot2} has various geometric objects
You can draw various kinds of graphs by using these geometric objects
For example, in a scatter plot, data is drawn as dots
In a line graph, data is drawn as lines
In a bar chart or histogram, data is drawn as areas
The information that these dots, lines, and areas carry (position, color, shape, size) is called a scale
We use scale_*() when we need colors, shapes, or axis settings that differ from the defaults of {ggplot2}

Elements	`{package}:function()`	What you can do
5. Scale	`scale_*()`	Customize a graph’s axes, colors, and shapes

List of `scale_*()`

`scale_x_continuous()`	Customize the scale of the `x-axis` (continuous variable)
`scale_y_discrete()`	Customize the scale of the `y-axis` (discrete variable)
`scale_color_manual()`	Set the colors manually

The major reasons why you customize the scale of axes

You want to change the breaks or the range of an axis
You want to change the labels of an axis

6.5.1 Changing axes of continuous variables

`scale_*_continuous()`

When a continuous variable is mapped
You add scale_*_continuous() as a new layer
* can be either x or y
Let’s draw a scatter plot using the 2005 HR election data:

・x-axis ・・・ expenditure (exp)
・y-axis ・・・ vote share (voteshare)

hr <- read_csv("data/hr96-21.csv", na = ".")

Draw a scatter plot and save it as plt_7

plt_7 <- hr %>% 
  select(seito, exp, voteshare, year) %>% 
  filter(year == 2005) %>% 
  ggplot(aes(x = exp, 
y = voteshare, 
col = seito)) +
  geom_point(alpha = 0.5) +
  labs(x = "Campaign money (yen)", 
  y = "Vote share (%)",
  title = "Campaign money and vote share: 2005HR election in Japan") +
  geom_smooth(method = lm, se = FALSE) +
  theme_bw(base_family = "HiraKakuProN-W3")

plt_7

The minimum value of the x-axis is 0, and the maximum value is 2.5e+07 (25 million yen).
The axis is divided at intervals of 5,000,000 yen, but labels such as 2.5e+07 are not easy to read.
Let’s make it much easier to read.
We need to use scale_*() to change the scale of the x-axis.
Let’s keep the 5-million-yen interval and put labels like 0, 5M, 10M, 15M, 20M, 25M (M = million yen)
Since exp is a continuous variable, you use scale_x_continuous() to customize it.

plt_7 +
  scale_x_continuous(breaks = seq(0, 2.5e+07, by = 0.5e+07),
  labels = c("0", "5M", "10M", "15M", "20M", "25M"))
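Instead of typing the labels by hand, the labels argument of scale_x_continuous() also accepts a function that receives the break values and returns the labels. The sketch below assumes a made-up helper, label_millions(), and made-up data (toy); neither is part of the original material.

```r
library(ggplot2)

# Hypothetical helper: format yen amounts in millions, e.g. 5e6 -> "5M"
label_millions <- function(x) {
  ifelse(x == 0, "0", paste0(x / 1e6, "M"))
}

# Hypothetical toy data standing in for exp and voteshare
toy <- data.frame(exp = c(0, 4e6, 1.2e7, 2.5e7),
                  voteshare = c(5, 20, 35, 50))

p_money <- ggplot(toy, aes(x = exp, y = voteshare)) +
  geom_point() +
  scale_x_continuous(breaks = seq(0, 2.5e7, by = 5e6),
                     labels = label_millions)  # a function instead of a vector

# The labels the helper produces for the six breaks
money_labels <- label_millions(seq(0, 2.5e7, by = 5e6))
print(money_labels)
```

With a function, the labels stay correct even if you later change the breaks.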

If you want to change the scale of y-axis…

You can use scale_*_continuous() and set limits as an argument
For instance, suppose you want to limit the range of y-axis between 0 and 60
→ Add scale_y_continuous(limits = c(0, 60))

plt_7 +
  scale_x_continuous(breaks = seq(0, 2.5e+07, by = 0.5e+07),
  labels = c("0", "5M", "10M", "15M", "20M", "25M")) +
  scale_y_continuous(limits = c(0, 60))

How to change the location of the axis and the legend

You can move the y-axis to the right: scale_y_continuous(position = "right")
You can move the legend to the left: theme(legend.position = "left")
Put limits and position in a single scale_y_continuous(): if you add a second scale_y_continuous(), it replaces the first one and the limits are lost

plt_7 +
  scale_x_continuous(breaks = seq(0, 2.5e+07, by = 0.5e+07),
  labels = c("0", "5M", "10M", "15M", "20M", "25M")) +
  scale_y_continuous(limits = c(0, 60), position = "right") +
  theme(legend.position = "left")

6.5.2 Customize the scale (discrete variables)

`scale_*_discrete()`

When you want to change the labels of an x-axis to which a discrete variable is mapped
→　You use scale_*_discrete()
For instance, let’s draw a bar chart of the number of candidates by candidate status (challengers, incumbents, and former incumbents) in the HR elections (1996-2021).

plt_8 <- hr %>% 
  ggplot() +
  geom_bar(aes(x = status)) +
  labs(x = "Candidate's status", y = "The number of Candidates") + 
  ggtitle("The number of Candidates by status (1996-2021HR Elections in Japan)") +
  theme_bw(base_family = "HiraKakuProN-W3") 

plt_8

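You see the three values 0, 1, and 2 on the x-axis
Because status is stored as a number in hr (it is converted to a factor only later, in 6.9), {ggplot2} treats this axis as continuous, so the code below relabels it with scale_x_continuous()
If you convert status to a factor first, e.g., aes(x = factor(status)), use scale_x_discrete(labels = ...) instead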

plt_8 +
  scale_x_continuous(breaks = c(0, 1, 2),
  labels = c("Challengers", "Incumbents", "Former-incumbents"))
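When the variable on the x-axis is a factor, the axis is discrete and scale_x_discrete() applies. Here is a minimal sketch with made-up data (toy is hypothetical; its status column is coded 0, 1, 2 as in the text). A named vector in labels maps each value to its label.

```r
library(ggplot2)

# Hypothetical toy data: status coded 0/1/2 as in the text
toy <- data.frame(status = c(0, 0, 1, 2))

p_status <- ggplot(toy, aes(x = factor(status))) +   # factor() makes the axis discrete
  geom_bar() +
  scale_x_discrete(labels = c("0" = "Challengers",
                              "1" = "Incumbents",
                              "2" = "Former incumbents")) +
  labs(x = "Candidate's status", y = "The number of candidates")

# Number of rows behind each bar
status_counts <- layer_data(p_status)$count
print(status_counts)
```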

6.5.3 Customize the color of dots

`scale_color_manual()`

plt_9 <- hr %>% 
  filter(year == 2012) %>% 
  mutate(ldp = ifelse(seito == "自民", 1, 0)) %>% # LDP dummy: 1 = LDP ("自民"), 0 = others
  ggplot() +
  geom_point(mapping = aes(x = exp,    # "mapping =" can be omitted
y = voteshare,
color = as.factor(ldp)),
alpha = 0.5) +                         # a fixed value goes outside aes()
  labs(x = "Campaign money (yen)", y = "Vote share (%)") +
  theme_bw(base_family = "HiraKakuProN-W3",
   base_size = 12)

plt_9

Let’s change the colors of the dots, which distinguish LDP candidates from the others, from the default colors to different ones, such as deeppink and aquamarine.

plt_9 +
  scale_color_manual(values = c("1" = "deeppink", "0" = "aquamarine"))
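scale_color_manual() can also set the legend title and the legend labels at the same time. The sketch below uses made-up data (toy is hypothetical); the names "Party", "LDP", and "Others" are chosen only for illustration.

```r
library(ggplot2)

# Hypothetical toy data: ldp is 1 for LDP candidates, 0 otherwise
toy <- data.frame(exp = c(1, 2, 3),
                  voteshare = c(10, 20, 30),
                  ldp = factor(c(1, 0, 1)))

p_color <- ggplot(toy, aes(x = exp, y = voteshare, color = ldp)) +
  geom_point() +
  scale_color_manual(
    name   = "Party",                                   # legend title
    values = c("1" = "deeppink", "0" = "aquamarine"),   # value -> color
    labels = c("1" = "LDP", "0" = "Others")             # value -> legend label
  )

# Colors actually assigned to the three dots
used_colors <- layer_data(p_color)$colour
print(used_colors)
```

Naming the elements of values ties each color to a data value, so the result does not depend on the order of the factor levels.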

6.6 Highlight

You need to install and load the {gghighlight} package

Elements	`{package}:function()`	What you can do
7. Highlight	`gghighlight()`	Emphasize particular data

Let’s draw a scatter plot using the 2005 HR election data:

・x-axis ・・・ expenditure (exp)
・y-axis ・・・ vote share (voteshare)

plt_10 <- hr %>% 
  select(seito, exp, voteshare, year) %>% 
  filter(year == 2005) %>% 
  ggplot(aes(x = exp, 
y = voteshare, 
col = seito)) +
  geom_point(alpha = 0.5) +
  labs(x = "Campaign money", 
  y = "Vote share (%)",
  title = "Money and vote share in the 2005 HR election: Highlighting LDP candidates") +
  geom_smooth(method = lm, se = FALSE) +
  theme_bw(base_family = "HiraKakuProN-W3")

plt_10

Using gghighlight(), you can highlight particular data
The other data are kept in grey
The data you want to highlight are drawn in conspicuous colors

library(gghighlight)

Let’s show the LDP and DPJ candidates in color and the other candidates in grey

plt_10 +
  gghighlight(seito == "自民" | seito == "民主")

6.7 Identifying dots

`{ggrepel}`

There are two typical reasons to emphasize particular data

You want to show what is important in a graph
You want to compare multiple graphs

Labeling individual dots without overlap is hard to do with {ggplot2} alone
→　You need an additional package, {ggrepel}

Elements	`{package}:function()`	What you can do
6. Identifying dots	`{ggrepel}`	Identify what the dots represent in a scatter plot

Function	What it does
`geom_text_repel()`	Add text labels that repel each other so that they do not overlap
`geom_label_repel()`	Add boxed labels that repel each other so that they do not overlap

geom_text_repel() enables us to display labels without overlapping
Let’s put individual labels on the scatter plot we made with df1

library(ggrepel)

If you want to label every single dot…

plt_11 <- df1 %>% 
  ggplot() +
  aes(math, 
 stat,
 color = gender) +
  geom_point() + 
  theme_bw()

plt_11 + geom_text_repel(aes(label = name))

You can also label only particular data points

Label only the top 5 people in math score

plt_11 + 
  geom_text_repel(aes(label = name), 
    data = function(data) 
 dplyr::slice_max(data, 
 math,
 n = 5)
    )

Label only the people whose statistics score is 12 or higher

plt_11 + 
  geom_text_repel(aes(label = name), 
    data = function(data) 
 dplyr::filter(data, 
 stat >= 12)
    )
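The data = function(data) ... argument used above is a general {ggplot2} mechanism: the function receives the plot's data and returns the rows that this layer should use. The sketch below shows it with plain geom_text() and base R so that it runs without {ggrepel} or {dplyr}; toy and its values are made up.

```r
library(ggplot2)

# Hypothetical toy data standing in for df1
toy <- data.frame(name = c("Ann", "Bob", "Cat", "Dan"),
                  math = c(90, 60, 75, 85),
                  stat = c(70, 65, 80, 95))

p_labels <- ggplot(toy, aes(x = math, y = stat)) +
  geom_point() +                       # layer 1: all four dots
  geom_text(aes(label = name),
            vjust = -1,
            # layer 2: the function receives the plot data and
            # returns only the rows to be labeled (top 2 in math)
            data = function(d) head(d[order(-d$math), ], 2))

# Labels that layer 2 will draw
labeled <- layer_data(p_labels, 2)$label
print(labeled)
```

Swapping geom_text() for geom_text_repel() should work in the same way once {ggrepel} is loaded.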

Self-made function()

・Suppose you want to calculate \(3x^2 + 5\)
・You can calculate this on R like this:

3 * 1^2 + 5 # when x = 1

[1] 8

3 * 2^2 + 5 # when x = 2

[1] 17

・However, typing the formula again for every value of x is inefficient and prone to human error.

A solution:

・ Make a self-made function() and use it.
→　You can repeat similar calculations efficiently with fewer errors

Define \(3x^2 + 5\) as f1

f1 <- function(x){
  3 * x^2 + 5
}

Calculate the value when x = 1

f1(x = 1)   # when x = 1

[1] 8

Calculate the value when x = 2

f1(x = 2)   # when x = 2

[1] 17

`f1()`	The name of the function
`x in function(x)`	Argument
`R code within {}`	What `f1()` does

In R, a function is an object.
→　When you call the function with a value, it runs the code within {}
The argument x works as a variable within {}
When the function runs, x is replaced by the value you passed in.
The R code within {} can be more than one line
The value of the last expression within {} is returned as the result
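The points above can be sketched with a slightly longer function. The function quad() and its arguments a and b are made up for illustration: the body has more than one line, the arguments have default values, and the last expression is the result.

```r
# Hypothetical function: evaluates a * x^2 + b
quad <- function(x, a = 2, b = 1) {   # a and b have default values
  squared <- x^2                      # an intermediate step
  a * squared + b                     # the last expression is the result
}

result_default <- quad(1:3)             # uses a = 2, b = 1 for x = 1, 2, 3
result_custom  <- quad(2, a = 1, b = 0) # overrides the defaults

print(result_default)
print(result_custom)
```

Because R arithmetic is vectorized, one call handles several values of x at once.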

An Example of `Self-made function()`:

Read the HR election data (1996-2021) in Japan

hr <- read_csv("data/hr96-21.csv", na = ".")

Extract the 2005 HR election data and name it hr2005

hr2005 <- hr %>% 
  filter(year == 2005)

Check the variable names hr2005 contains

names(hr2005)

 [1] "year"          "pref"          "ku"            "kun"          
 [5] "wl"            "rank"          "nocand"        "seito"        
 [9] "j_name"        "gender"        "name"          "previous"     
[13] "age"           "exp"           "status"        "vote"         
[17] "voteshare"     "eligible"      "turnout"       "seshu_dummy"  
[21] "jiban_seshu"   "nojiban_seshu"

Make a self-made function to draw a box plot with a variable in hr2005 and name the function bp_hr2005

bp_hr2005 <- function(x){
  ggplot(hr2005) +
    aes(.data[[x]]) + 
    geom_boxplot() +
    coord_flip()
}

What does .data mean?

・.data is a special pronoun provided by the {tidyverse} (tidy evaluation).
・.data works as an alias for the data frame currently being used.
Example:
・If you are working on the data frame hr2005
→ .data[["age"]] means hr2005[["age"]]
・This means that you select age from the data frame hr2005.
・Because the variable name is given as a character string, you can pass it as a function argument, as in .data[[x]]
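Here is a minimal sketch of .data[[ ]] inside a function, using made-up data (toy and scatter_toy() are hypothetical). The column names arrive as character strings, and .data[[ ]] looks them up in the data frame given to ggplot().

```r
library(ggplot2)

# Hypothetical toy data standing in for hr2005
toy <- data.frame(age = c(30, 45, 60), exp = c(5, 8, 12))

# The variable names arrive as character strings in xvar and yvar
scatter_toy <- function(xvar, yvar) {
  ggplot(toy) +
    aes(x = .data[[xvar]], y = .data[[yvar]]) +  # same columns as toy[[xvar]], toy[[yvar]]
    geom_point()
}

p_data <- scatter_toy("age", "exp")

# The x values that end up in the plot are the age column
plotted_x <- layer_data(p_data)$x
print(plotted_x)
```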

Let’s try age on x

bp_hr2005("age")

If you want to produce lots of graphs…

By using for(), you can make your work more efficient.
For instance, suppose you want to make box plots of age, voteshare, and exp by party in one go and display the results:

bp_hr2005_1 <- function(x){
  ggplot(hr2005) +
    aes(x = .data[[x]],  
y = .data[["seito"]],
color = .data[["seito"]]) + 
    geom_boxplot() +
    theme_bw(base_family = "HiraKakuProN-W3") +
    theme(legend.position = "none") 
}

for (x in c("age", "voteshare", "exp")){
  print(
    bp_hr2005_1(x)
  )
}
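If you want to keep the graphs for later use instead of only printing them, one option is lapply(), which returns a list of plots. This is a sketch with made-up data; toy, bp_toy(), and the object names are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for hr2005
toy <- data.frame(party = c("A", "A", "B", "B"),
                  age = c(40, 55, 38, 62),
                  voteshare = c(30, 45, 20, 50),
                  exp = c(5, 9, 4, 11))

# Box plot of one variable by party
bp_toy <- function(x) {
  ggplot(toy) +
    aes(x = .data[[x]], y = .data[["party"]]) +
    geom_boxplot()
}

vars <- c("age", "voteshare", "exp")
plots <- lapply(vars, bp_toy)    # one plot per variable, kept in a list
names(plots) <- vars             # so that plots$age etc. work

# print(plots$age) would display the first plot
```

A list like this can also be handed to patchwork::wrap_plots(), which is used in 6.9.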

Combination of `gghighlight()` and `geom_text_repel()`

You can simultaneously highlight and label a particular person or party
For example, suppose you want to draw a box plot of campaign money by party, highlighting the LDP and DPJ candidates.

# Draw the box plot
plt_13 <- hr2005 %>% 
  ggplot() +
  aes(x = exp, 
y = seito,
color = seito) +  
  geom_boxplot() +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "none") + 
  labs(x = "Campaign money (yen)",
  y = "Party",
  title = "Distribution of Campaign Money by party: 2005HR Election in Japan")


plt_13 +
  gghighlight(seito == "自民" | seito == "民主",
label_key = seito, 
unhighlighted_params = list(
  color = "grey50") 
)

6.8 Facet

Elements	`{package}:function()`	What you can do
8. Facet	`facet_*()`	Making multiple graphs

You can make multiple graphs by the levels of a variable by using facet_*()
Let’s make multiple scatter plots on campaign money and vote share by party in the 2005 HR election in Japan
・ Read hr96-21.csv

hr <- read_csv("data/hr96-21.csv", na = ".")

plt_14 <- hr %>% 
  select(seito, exp, voteshare, year) %>% 
  filter(year == 2005) %>%  # Select the 2005 HR election data 
  ggplot(aes(x = exp, 
y = voteshare, 
col = seito)) +
  geom_point(alpha = 0.5) +
  labs(x = "Campaign money", y = "vote share (%)") +
  geom_smooth(method = lm) +
  theme_bw(base_family = "HiraKakuProN-W3") +
  ggtitle("Campaign money and vote share by party:2005 HR election in Japan") 

plt_14

Adding facet_wrap(~seito), you get the following multiple scatter plots by party

plt_14 +
  facet_wrap(~seito) # Divide a scatter plot into several plots by party

Within facet_wrap(), you can set the variable by which the panels are split, the number of rows, and the number of columns

plt_14 +
  facet_wrap(
    vars(seito), # set the variable 
    nrow = 2, # set the number of rows
    ncol = 5 # set the number of columns
  )
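By default all panels share the same axes. If the groups have very different ranges, the scales argument of facet_wrap() lets each panel use its own range. The sketch below uses made-up data (toy is hypothetical).

```r
library(ggplot2)

# Hypothetical toy data: three parties with very different spending levels
toy <- data.frame(party = rep(c("A", "B", "C"), each = 3),
                  exp = c(1, 2, 3, 10, 20, 30, 100, 200, 300),
                  voteshare = c(10, 20, 30, 15, 25, 35, 20, 30, 40))

p_facet <- ggplot(toy, aes(x = exp, y = voteshare)) +
  geom_point() +
  facet_wrap(vars(party),
             ncol = 3,
             scales = "free_x")   # each panel gets its own x-axis range

# Number of panels that were created
n_panels <- length(unique(layer_data(p_facet)$PANEL))
print(n_panels)
```

Free scales make each panel easier to read but harder to compare with the others, so choose according to your purpose.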

6.9 Patchwork

Elements	`{package}:function()`	What you can do
9. Patchwork	`{patchwork}`	Arranging multiple graphs

{patchwork} enables us to arrange different figures in one graph by using the operators | and /
This makes it easy to compare multiple graphs

Operator	What you can do
`|`	Arrange in a row
`/`	Arrange in a column
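The two operators can be sketched as follows with made-up data (toy and the plot names are hypothetical). Parentheses control which plots are grouped first.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

p_a <- ggplot(toy, aes(x, y)) + geom_point() + ggtitle("A")
p_b <- ggplot(toy, aes(x, y)) + geom_line()  + ggtitle("B")
p_c <- ggplot(toy, aes(x))    + geom_histogram(bins = 3) + ggtitle("C")

side_by_side <- p_a | p_b          # A and B in one row
stacked      <- p_a / p_b          # A above B
combined     <- (p_a | p_b) / p_c  # A and B on top, C below

# print(combined) would draw the three plots in one figure
```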

For instance, you can make multiple scatter plots colored by different variables
→ You can see whether the relationship between two variables differs across groups (an interaction) and how strong the difference is

What we want to do here:

Check how the relationship between campaign money and vote share differs by the following 3 variables in the 2005 HR election

LDP vs. non-LDP candidates
Gender
Candidate status (challenger, incumbent, former incumbent)

Define a function to get this done
Name the function scatter_hr() (or whatever name you like)
Using patchwork::wrap_plots(nrow = 1), arrange the list of resulting plots in a single row

library(patchwork)

Read the HR election data

hr <- read_csv("data/hr96-21.csv", na = ".")

Extract the 2005 HR election data from hr, make an LDP dummy (= 1 if a candidate’s party is “自民”, 0 otherwise), and name the data frame hr2005

hr2005 <- hr %>% 
  filter(year == 2005) %>% 
  mutate(ldp = ifelse(seito == "自民", 1, 0))  # "自民" means "LDP" in Japanese

Change class of the three variables (status, gender, ldp) to factor

hr2005$status <- as.factor(hr2005$status)
hr2005$gender <- as.factor(hr2005$gender)
hr2005$ldp <- as.factor(hr2005$ldp)

# This is the function to draw a scatter plot of campaign money and vote share in the 2005 HR election  

scatter_hr <- function(color = "seito"){ # Define scatter_hr()
  ggplot(hr2005) +
    aes(
 exp, voteshare,
 color = .data[[color]]
    ) +
    geom_point(size = 1, alpha = 0.5) +
    theme_bw(base_family = "HiraKakuProN-W3") + 
    labs(x = "Campaign money (yen)",
 y = "Vote share (%)",
 title = "2005 HR election in Japan") +
    theme(legend.position = "bottom")
}

hr2005 %>% 
  select(where(is.factor)) %>%
  names() %>%
  purrr::map(scatter_hr) %>%  
  patchwork::wrap_plots(nrow = 1)

6.10 Interactive graph

Elements	`{package}:function()`	What you can do
10. Interactive graph	`{plotly}`	Making an interactive graph

plotly::ggplotly() turns a scatter plot made with {ggplot2} into an interactive graph, which lets us explore the relationship shown in it

・Read the HR election data

hr <- read_csv("data/hr96-21.csv", na = ".")

Extract the 2005 HR election data from hr, keep only the winners, and name it hr2005

hr2005 <- hr %>% 
  filter(year == 2005) %>% 
  filter(wl > 0)

Make an interactive graph: plt_15

plt_15 <- ggplot(hr2005) +
  aes(x = exp,
 y = voteshare,
 color = seito) +
  geom_point(alpha = 0.5) +   # a fixed value goes outside aes()
  theme_bw(base_family = "HiraKakuProN-W3") 

plotly::ggplotly(plt_15)

You can see the 8 party names in Japanese in the legend on the right side
If you click one of them, the dots for that party disappear from the scatter plot
Click the 6 party names other than LDP (“自民”) and DPJ (“民主”) so that only these two parties remain

If you click the camera icon in the plot’s toolbar, you can capture the graph you see and save it as a .png file

If you want to save a scatter plot that shows only the LDP and DPJ candidates in color, you can make it without using {plotly} as follows:

plt_15 +
  gghighlight::gghighlight(seito == "民主" | seito == "自民")

6.11 Save a graph

If you use {plotly}, you can capture a graph and save it.
When you want to save it with a high resolution, use ggsave()
Make a new folder in your R Project and name it fig (or whatever name you like)
Save the graph you made in the fig folder

ggsave("fig/plt_15.png", # Name of the file you want to save  
  plt_15,                # The plot object you want to save  
  width = 10,            # Set the width of the image
  height = 10,           # Set the height of the image
  units = "cm")          # Set the unit of the size: 
                             # "in", "cm", "mm", or "px"
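The resolution itself is controlled by the dpi argument of ggsave(). Here is a minimal sketch that saves a made-up plot into a temporary folder; toy, p_save, and the file location are hypothetical, and in your own project you would use the fig folder instead.

```r
library(ggplot2)

# Hypothetical toy plot
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))
p_save <- ggplot(toy, aes(x, y)) + geom_point()

# Hypothetical output location: a "fig" folder inside a temporary directory
fig_dir <- file.path(tempdir(), "fig")
dir.create(fig_dir, showWarnings = FALSE)   # create the folder before saving
out_file <- file.path(fig_dir, "p_save.png")

ggsave(out_file,
       plot   = p_save,
       width  = 10,
       height = 10,
       units  = "cm",
       dpi    = 300)    # dots per inch; higher values give a sharper image
```

A value of 300 is a common choice for print; larger values increase the file size.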

7. Variation in R code

The same R code can be written in various ways
For instance, suppose you want to use the following R code:

ggplot(data = hr, mapping = aes(x = exp, y = voteshare))

There are 9 other ways of writing this R code:

1. ggplot(hr, aes(x = exp, y = voteshare))

2. ggplot(hr, aes(exp, voteshare))

3. ggplot(hr) + aes(x = exp, y = voteshare)

4. ggplot(hr) + aes(exp, voteshare)

R code with pipe %>%

5. hr %>% ggplot(mapping = aes(x = exp, y = voteshare))

6. hr %>% ggplot(aes(x = exp, y = voteshare))

7. hr %>% ggplot(aes(exp, voteshare))

8. hr %>% ggplot() + aes(x = exp, y = voteshare)

9. hr %>% ggplot() + aes(exp, voteshare)
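You can check that these variants give the same result. The sketch below uses made-up data (toy is hypothetical) and, to avoid loading extra packages, the base R pipe |> (R 4.1 or later) in place of %>%.

```r
library(ggplot2)

# Hypothetical toy data standing in for hr
toy <- data.frame(exp = c(1, 2, 3), voteshare = c(10, 20, 30))

p_full  <- ggplot(data = toy, mapping = aes(x = exp, y = voteshare)) + geom_point()
p_short <- ggplot(toy, aes(exp, voteshare)) + geom_point()
p_layer <- ggplot(toy) + aes(exp, voteshare) + geom_point()
p_pipe  <- toy |> ggplot(aes(exp, voteshare)) + geom_point()  # base R pipe

# The plotted positions are identical across the variants
same_x <- identical(layer_data(p_full)$x, layer_data(p_layer)$x)
same_y <- identical(layer_data(p_short)$y, layer_data(p_pipe)$y)
print(c(same_x, same_y))
```

Which style to use is a matter of readability; being consistent within one document matters more than the choice itself.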

8. Useful Websites on `{ggplot2}`

ggplot2 extensions-gallery

This website visually shows you what you can do with {ggplot2}

From Data to Viz

This website shows how to visualize data not only in R but also in Python and D3.js

TidyTuesday

Every week a new dataset is posted as a data-visualization challenge.
Participants share their R code on Twitter

9. CHEAT SHEET

This cheat sheet is very useful

file:///Users/asanomasahiko/Dropbox/statistics/class_materials/graphs_tables/data-visualization-2.1.pdf

10. Exercise

・Q10.1: Referring to 6.7 Identifying dots: Combination of gghighlight() and geom_text_repel(), draw a box plot of campaign money (exp) in the 2009 HR election in Japan.

In drawing your graph, be sure to highlight the LDP (“自民”) and the DPJ (“民主”) candidates
Use hr96-21.csv in your analysis.

・Q10.2: Referring to 6.6 Highlight, draw a scatter plot of campaign money (exp) and vote share (voteshare) in the 2009 HR election in Japan.

In drawing your graph, be sure to highlight the LDP (“自民”) and the DPJ (“民主”) candidates
Use hr96-21.csv in your analysis.

・Q10.3: Referring to 6.8 Facet, draw a scatter plot of campaign money (exp) and vote share (voteshare) for the LDP (“自民”) and the DPJ (“民主”) in the 2009 HR election in Japan.

In drawing your graph, use facet_*() and show the two regression lines for the LDP and the DPJ.

・Q10.4: Referring to 6.7 Identifying dots: Self-made function(), simultaneously draw multiple box plots of age, voteshare, and exp for each party.

・Q10.5: Referring to 6.9 Patchwork, show how the relationship between campaign money (exp) and vote share (voteshare) changes in the 2009 HR election by DPJ vs. non-DPJ, gender, and candidate status (status)

In drawing your scatter plots, use patchwork::wrap_plots().

4. {ggplot2} (basic)

Masahiko Asano

2024-11-06
