• R packages used in this section
library(DT)
library(gapminder)
library(gghighlight)
library(ggrepel)
library(stargazer)
library(tidyverse)

1. What is a bar chart?

  • A histogram and a bar chart look very similar
  • A bar chart visualizes a discrete variable
  • A histogram visualizes a continuous variable
  • A bar chart has gaps between bars
  • A histogram does not have gaps between bars
Variable    Graph      Feature
Discrete    Bar chart  Gaps between bars
Continuous  Histogram  No gaps between bars

Bar chart:
  • x-axis: Election year (1996-2021)
  • y-axis: The average number of candidates in each electoral district
  • For instance, no election was held between 2017 and 2021
  • Election year looks like a number, but it is not treated as a number in a bar chart
  • Election year is a number, but here it only labels an election; the value itself carries no quantitative meaning
Histogram:
  • x-axis: Vote share (%) in lower house election in Japan (1996-2021)
  • y-axis: The number of candidates
  • Vote share ranges from 0% to 100% and can take many values in between, such as 0.1%, 0.01%, …
  • You need to draw the bars without gaps between them
    → A histogram does not have gaps between bars
    → But each bar carries the following two pieces of information:
  1. location on the x-axis
  2. height on the y-axis
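
  • The contrast above can be sketched with a small made-up data frame (toy is hypothetical and is not taken from hr96-21.csv)
  • geom_bar() counts the rows in each category of a discrete variable, while geom_histogram() counts the rows falling into adjacent bins of a continuous variable; the binwidth of 10 is an arbitrary choice for this sketch

```r
library(ggplot2)

# Hypothetical toy data, not taken from hr96-21.csv
toy <- data.frame(
  year      = factor(c(1996, 1996, 2000, 2000, 2000, 2003)),
  voteshare = c(40.0, 25.7, 20.1, 13.3, 0.4, 32.9)
)

# Discrete variable -> bar chart: one bar per category, with gaps
p_bar <- ggplot(toy) +
  geom_bar(aes(x = year))

# Continuous variable -> histogram: adjacent bins, no gaps
p_hist <- ggplot(toy) +
  geom_histogram(aes(x = voteshare), binwidth = 10)

# Inspect what each layer computed
layer_data(p_bar)[, c("x", "count")]
layer_data(p_hist)[, c("xmin", "xmax", "count")]
```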

2. Data preparation

The lower house election data in Japan: 1996-2021 (hr96-21.csv)
・Click hr96-21.csv and download it to your computer
・Read the election data and name it df

df <- read_csv("data/hr96-21.csv",
na = ".")  
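
  • The role of na = "." can be checked on a tiny CSV typed inline (toy_csv is a hypothetical example, and I() for literal data assumes {readr} 2.0 or later)
  • Without na = ".", a column that contains "." is read as character; with it, "." becomes NA and the column stays numeric

```r
library(readr)

# Hypothetical two-row CSV in which "." marks a missing value
toy_csv <- I("name,exp\nA,9828097\nB,.\n")

# Default: "." is kept as text, so exp becomes a character column
toy_default <- read_csv(toy_csv, show_col_types = FALSE)

# na = ".": "." is read as NA, so exp is parsed as a number
toy_na <- read_csv(toy_csv, na = ".", show_col_types = FALSE)

class(toy_default$exp)
class(toy_na$exp)
```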

Check that the data were read correctly

  • hr96-21.csv is a collection of Japanese lower house election data covering 9 national elections (1996, 2000, 2003, 2005, 2009, 2012, 2014, 2017, 2021)
  • Check the names of the variables df contains
names(df)
 [1] "year"          "pref"          "ku"            "kun"          
 [5] "wl"            "rank"          "nocand"        "seito"        
 [9] "j_name"        "gender"        "name"          "previous"     
[13] "age"           "exp"           "status"        "vote"         
[17] "voteshare"     "eligible"      "turnout"       "seshu_dummy"  
[21] "jiban_seshu"   "nojiban_seshu"
  • df has the following 22 variables (castvote is described below but does not appear in the names(df) output above)
variable       detail
year           Election year (1996-2021)
pref           Prefecture
ku             Electoral district name
kun            Number of electoral district
rank           Rank by number of votes (1 = most votes)
wl             0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner
nocand         Number of candidates in each district
seito          Candidate’s affiliated party (in Japanese)
j_name         Candidate’s name (Japanese)
name           Candidate’s name (English)
previous       Previous wins
gender         Candidate’s gender: “male”, “female”
age            Candidate’s age
exp            Election expenditure (yen) spent by each candidate
status         0 = challenger / 1 = incumbent / 2 = former incumbent
vote           Votes each candidate garnered
voteshare      Vote share (%)
eligible       Eligible voters in each district
turnout        Turnout in each district (%)
castvote       Total votes cast in each district
seshu_dummy    0 = non-hereditary candidate / 1 = hereditary candidate
jiban_seshu    Relationship between candidate and his predecessor
nojiban_seshu  Relationship between candidate and his predecessor
  • Check the class of variables included in df
str(df)
spec_tbl_df [9,660 × 22] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ year         : num [1:9660] 1996 1996 1996 1996 1996 ...
 $ pref         : chr [1:9660] "愛知" "愛知" "愛知" "愛知" ...
 $ ku           : chr [1:9660] "aichi" "aichi" "aichi" "aichi" ...
 $ kun          : num [1:9660] 1 1 1 1 1 1 1 2 2 2 ...
 $ wl           : num [1:9660] 1 0 0 0 0 0 0 1 0 2 ...
 $ rank         : num [1:9660] 1 2 3 4 5 6 7 1 2 3 ...
 $ nocand       : num [1:9660] 7 7 7 7 7 7 7 8 8 8 ...
 $ seito        : chr [1:9660] "新進" "自民" "民主" "共産" ...
 $ j_name       : chr [1:9660] "河村たかし" "今枝敬雄" "佐藤泰介" "岩中美保子" ...
 $ gender       : chr [1:9660] "male" "male" "male" "female" ...
 $ name         : chr [1:9660] "KAWAMURA, TAKASHI" "IMAEDA, NORIO" "SATO, TAISUKE" "IWANAKA, MIHOKO" ...
 $ previous     : num [1:9660] 2 2 2 0 0 0 0 2 0 0 ...
 $ age          : num [1:9660] 47 72 53 43 51 51 45 51 71 30 ...
 $ exp          : num [1:9660] 9828097 9311555 9231284 2177203 NA ...
 $ status       : num [1:9660] 1 2 1 0 0 0 0 1 2 0 ...
 $ vote         : num [1:9660] 66876 42969 33503 22209 616 ...
 $ voteshare    : num [1:9660] 40 25.7 20.1 13.3 0.4 0.3 0.2 32.9 26.4 25.7 ...
 $ eligible     : num [1:9660] 346774 346774 346774 346774 346774 ...
 $ turnout      : num [1:9660] 49.2 49.2 49.2 49.2 49.2 49.2 49.2 51.8 51.8 51.8 ...
 $ seshu_dummy  : num [1:9660] 0 0 0 0 0 0 0 0 1 0 ...
 $ jiban_seshu  : chr [1:9660] NA NA NA NA ...
 $ nojiban_seshu: chr [1:9660] NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   year = col_double(),
  ..   pref = col_character(),
  ..   ku = col_character(),
  ..   kun = col_double(),
  ..   wl = col_double(),
  ..   rank = col_double(),
  ..   nocand = col_double(),
  ..   seito = col_character(),
  ..   j_name = col_character(),
  ..   gender = col_character(),
  ..   name = col_character(),
  ..   previous = col_double(),
  ..   age = col_double(),
  ..   exp = col_double(),
  ..   status = col_double(),
  ..   vote = col_double(),
  ..   voteshare = col_double(),
  ..   eligible = col_double(),
  ..   turnout = col_double(),
  ..   seshu_dummy = col_double(),
  ..   jiban_seshu = col_character(),
  ..   nojiban_seshu = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 
  • chr means character and num means numeric
    → No problem

Descriptive statistics: df

library(stargazer)
  • Set the chunk option to {r, results = "asis"} so that the HTML table is rendered
stargazer(as.data.frame(df), 
type ="html",
digits = 2)
Statistic N Mean St. Dev. Min Max
year 9,660 2,007.88 7.68 1,996 2,021
kun 9,660 5.75 5.07 1 25
wl 9,660 0.48 0.67 0 2
rank 9,660 2.65 20.39 1 2,003
nocand 9,660 3.89 1.09 2 9
previous 9,660 1.48 2.46 0 19
age 9,656 51.22 11.13 25 94
exp 6,829 7,551,393.00 5,482,684.00 535 27,462,362
status 9,660 0.49 0.62 0 2
vote 9,660 55,987.87 40,626.34 177 210,515
voteshare 9,660 27.67 19.34 0.10 95.30
eligible 8,785 330,268.30 80,058.87 115,013 495,212
turnout 7,849 62.09 6.53 44.71 83.80
seshu_dummy 8,875 0.12 0.33 0 1

3. Draw a bar chart

3.1 Draw a simple bar chart (1)

  • Let’s draw a simple bar chart of the number of candidates in Japan’s lower house elections between 1996 and 2021 (9 elections)
  • You use two functions: ggplot() and the geometric object geom_bar()
Function    What it does
ggplot()    Prepares a canvas
geom_bar()  Draws a bar chart
  • You need the following two pieces of data:
  1. year (1996, 2000, 2003,…, 2021)
  2. the number of candidates in each election
  • You use the data frame df
  • Using table(), check the number of candidates in each election
table(df$year)

1996 2000 2003 2005 2009 2012 2014 2017 2021 
1261 1199 1026  989 1139 1294  959  936  857 
  • You now have the two pieces of data needed to draw a bar chart
  • Assign year to the x-axis and the number of candidates to the y-axis
  • df contains year, but it has no variable representing the number of candidates
  • To draw a bar chart, you map x in geom_bar(aes())
  • You do not have to map y in geom_bar(aes())
  • {ggplot2} automatically counts the rows for each value of x and shows the count as the height of the bar
    → You map as geom_bar(aes(x = year))
  • Adding a labs() layer lets you label the x-axis and y-axis
df %>% 
  ggplot() +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3") 

  • The chart does not look good because the gaps between bars differ in width
  • This is because the class of year is numeric
  • Check the class of year by using str()
str(df$year)
 num [1:9660] 1996 1996 1996 1996 1996 ...
  • As expected, the class of year is numeric
    → R treats year as a continuous variable
    → Lower house elections in Japan are held at irregular intervals of at most four years
    → That is why we get this ugly bar chart

Solution

  • Change the class of year from numeric to factor
df$year <- factor(df$year) 
  • Check the class of year
str(df$year)
 Factor w/ 9 levels "1996","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
  • The class of year is changed to factor
df %>% 
  ggplot() +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3") 
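
  • If you would rather leave df$year numeric, one alternative (an assumption of this sketch, not the approach used in this section) is to convert year inside aes() only
  • The toy data frame below is made up; layer_data() shows where {ggplot2} places the bars in each case

```r
library(ggplot2)

# Hypothetical toy data: election years stored as numbers
toy <- data.frame(year = c(1996, 1996, 2000, 2003, 2003, 2003))

# Numeric year: bars sit at the year values, so gaps are uneven
p_num <- ggplot(toy) +
  geom_bar(aes(x = year))

# factor(year) inside aes(): bars sit at evenly spaced positions
p_fct <- ggplot(toy) +
  geom_bar(aes(x = factor(year)))

as.numeric(layer_data(p_num)$x)  # bar positions on a continuous axis
as.numeric(layer_data(p_fct)$x)  # bar positions on a discrete axis
```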

  • You can save this bar chart as an object with an appropriate name, like bar_plot1
  • To do so, you type as follows:

bar_plot1 <- df %>% 
  ggplot() +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3") 
  • See 5. How to save a graph for how to save the graph to a file

  • If you want to output the graph, just type the object name, in this case bar_plot1, in a chunk and knit it

What we can see from this graph
  • The number of candidates in Japan’s lower house elections has been decreasing (except in 2009 and 2012)

3.2 Draw a simple bar chart (2)

  • Next, let’s draw a simple bar chart of the average number of candidates per single-member district in Japanese lower house elections between 1996 and 2021
  • You need the following two pieces of data:
  1. year (1996, 2000, 2003,…, 2021)
  2. the number of candidates in each electoral district and in each election
  • You use the data frame df

  • Assign year to the x-axis and the average number of candidates per electoral district in each election to the y-axis

  • df contains year, but it has no variable representing the average number of candidates per electoral district in each election

  • You can calculate this variable by using group_by() and summarise(); first, save the mean of nocand within each year and ku as ave_nocand1

df1 <- df %>% 
  group_by(year, ku) %>% 
  summarise(ave_nocand1 = mean(nocand, na.rm = TRUE)) 

df1
# A tibble: 423 × 3
# Groups:   year [9]
   year  ku        ave_nocand1
   <fct> <chr>           <dbl>
 1 1996  aichi            6.33
 2 1996  akita            3.4 
 3 1996  aomori           3.93
 4 1996  chiba            4.75
 5 1996  ehime            3.57
 6 1996  fukui            4.17
 7 1996  fukuoka          3.90
 8 1996  fukushima        3.67
 9 1996  gifu             3.78
10 1996  gunma            3.84
# … with 413 more rows
  • Each row of df1 is the mean of nocand within one ku (for example, aichi) in one election
  • What you need here is a single average for each election
  • You calculate the mean of ave_nocand1 in each election and save it as ave_nocand2
df1 <- df1 %>% 
  group_by(year) %>% 
  summarise(ave_nocand2 = mean(ave_nocand1, na.rm = TRUE))

df1
# A tibble: 9 × 2
  year  ave_nocand2
  <fct>       <dbl>
1 1996         4.11
2 2000         3.93
3 2003         3.45
4 2005         3.40
5 2009         3.86
6 2012         4.20
7 2014         3.23
8 2017         3.27
9 2021         3.00
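
  • The two-step calculation above can be traced by hand on a made-up data frame (toy and its values are hypothetical)
  • .groups = "drop" is an optional argument of summarise() that removes the leftover grouping and silences the grouping message; it is added here as a suggestion, not as part of the original code

```r
library(dplyr)

# Hypothetical toy data: two elections, two ku
toy <- tibble(
  year   = c(1996, 1996, 1996, 1996, 2000, 2000),
  ku     = c("aichi", "aichi", "akita", "akita", "aichi", "akita"),
  nocand = c(7, 7, 3, 3, 4, 2)
)

# Step 1: mean of nocand within each year and ku
step1 <- toy %>%
  group_by(year, ku) %>%
  summarise(ave_nocand1 = mean(nocand, na.rm = TRUE), .groups = "drop")

# Step 2: mean of the step-1 means within each year
step2 <- step1 %>%
  group_by(year) %>%
  summarise(ave_nocand2 = mean(ave_nocand1, na.rm = TRUE))

step2
# 1996: (7 + 3) / 2 = 5, 2000: (4 + 2) / 2 = 3
```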
  • You use a geometric object, geom_bar(), and map aes() as follows:
    x-axis: x = year
    y-axis: y = ave_nocand2
    → map geom_bar(aes(x = year, y = ave_nocand2))
  • When you map both x and y, you need to set stat = "identity" inside geom_bar() but outside aes()
  • Adding a labs() layer lets you label the x-axis and y-axis
df1 %>% 
  ggplot() +
  geom_bar(aes(x = year,
               y = ave_nocand2),
           stat = "identity") +
  labs(x = "Election Year", y = "The average number of candidates per district") + 
  ggtitle("The average number of candidates in each district (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3") 
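
  • {ggplot2} also provides geom_col(), which draws bars from y values you supply and so does the same job as geom_bar(stat = "identity")
  • A minimal sketch using the first three rows of the df1 output shown above; the object names toy, p_identity and p_col are made up for this example

```r
library(ggplot2)

# First three rows of the df1 output shown above
toy <- data.frame(year        = factor(c(1996, 2000, 2003)),
                  ave_nocand2 = c(4.11, 3.93, 3.45))

# Version used in this section
p_identity <- ggplot(toy) +
  geom_bar(aes(x = year, y = ave_nocand2), stat = "identity")

# Shorthand: geom_col() uses the y values as bar heights
p_col <- ggplot(toy) +
  geom_col(aes(x = year, y = ave_nocand2))

# Both layers compute the same bar heights
all.equal(layer_data(p_identity)$y, layer_data(p_col)$y)
```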

What you can see from this bar chart
  • The average number of candidates per single-member district has been decreasing over time since 1996 (except in 2009 and 2012)
  • Japanese lower house elections seem to be getting closer to two-candidate contests

3.3 Modify numeric values into characters

  • Let’s draw a bar chart of the number of candidates by status (incumbent, challenger, and former-incumbent) in Japanese lower house elections (1996-2021)
  • You need the following two pieces of data:
  1. status: (challenger = 0, incumbent = 1, former-incumbent = 2)
  2. N: The number of candidates by status
  • You use the data frame df
  • Let’s check status by using table()
table(df$status)

   0    1    2 
5517 3510  633 
  • You have all the data you need to draw a bar chart
  • x-axis: status
  • y-axis: N
  • df contains status, but it does not contain N
  • To draw a bar chart, you map x in geom_bar(aes())
  • You do not have to map y in geom_bar(aes())
  • {ggplot2} automatically counts the rows for each value of x and shows the count as the height of the bar
    → You map as geom_bar(aes(x = status))
  • Adding a labs() layer lets you label the x-axis and y-axis
df %>% 
  ggplot() +
  geom_bar(aes(x = status)) +
  labs(x = "Candidate status", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates by status in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3") 

  • The x-axis shows 0, 1, and 2 for the candidate’s status
  • You need to change these to challengers, incumbents, and former-incumbents
  • Using case_when(), you recode status
  • Add status_J as a new variable to df and save the result as df2
df2 <- df %>%
  mutate(status_J = case_when(status == 0 ~ "challengers",
                              status == 1 ~ "incumbents",
                              TRUE ~ "former-incumbents"))
  • Instead of typing x = status, you need to type x = status_J in mapping
df2 %>% 
  ggplot() +
  geom_bar(aes(x = status_J)) +
  labs(x = "Candidate's status", 
  y = "The number of candidates") + 
  ggtitle("The number of candidates by status in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3") 
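
  • Note that the TRUE ~ fallback labels every value other than 0 and 1 as former-incumbents, which would include missing values if there were any
  • A more defensive variant, sketched on a made-up vector (toy is hypothetical), spells out all three codes so that anything unexpected stays NA

```r
library(dplyr)

# Hypothetical status codes, including a missing value
toy <- tibble(status = c(0, 1, 2, NA))

toy2 <- toy %>%
  mutate(status_J = case_when(status == 0 ~ "challengers",
                              status == 1 ~ "incumbents",
                              status == 2 ~ "former-incumbents"))
# Values that match none of the conditions are left as NA

toy2$status_J
```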

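How to fix the garbled characters in a graph
  • base_family must name a font installed on your computer; HiraKakuProN-W3 is a Japanese font that ships with macOS
For Windows users: Add the following R code (if HiraKakuProN-W3 is not installed, replace it with a Japanese font available on your computer)
theme_bw(base_family = "HiraKakuProN-W3")

For Mac users: Add the following R code
theme_bw(base_family = "HiraKakuProN-W3")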

3.4 How to change the order of bars

3.4.1 factor()   

  • The bar chart you made in the previous sub-section shows challengers, former-incumbents, and incumbents in this (alphabetical) order
  • Suppose you want to change the order to incumbents, challengers, former-incumbents
  • To do this, you need to change the class of status_J to factor
  • Using mutate(), you change the class of status_J to factor and then pass the result to ggplot()
    → Set the order by hand: levels = c("incumbents", "challengers", "former-incumbents")
df2 %>%
  mutate(status_J = factor(status_J,
                           levels = c("incumbents", "challengers", "former-incumbents"))) %>% 
  ggplot() +
  geom_bar(aes(x = status_J)) +
  labs(x = "Candidate status", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates by status in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3") 

  • Now, you get what you want!

3.4.2 fct_inorder()   

  • To use fct_inorder(), you need to install {forcats}   
  • {tidyverse} includes {forcats}
  • If you install {tidyverse}, then you do not have to install {forcats}.
  • fct_inorder() converts the variable in the parentheses to a factor whose levels follow the order in which the values first appear in the data
  • Let’s check the order of the values in status_J
head(df2$status_J)
[1] "incumbents"        "former-incumbents" "incumbents"       
[4] "challengers"       "challengers"       "challengers"      
  • The values in status_J first appear in the order incumbents, former-incumbents, challengers
  • If you want the bars to follow this order, use fct_inorder()
df2 %>%
  mutate(status_J = fct_inorder(status_J)) %>% 
  ggplot() +
  geom_bar(aes(x = status_J)) +
  labs(x = "Candidate status", y = "The number of candidates") + 
  ggtitle("The number of candidates by status in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3") 
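
  • The three ways of ordering the bars can be compared directly by looking at the factor levels
  • The sketch below uses the six values printed by head(df2$status_J) above; the object name x is made up for this example

```r
library(forcats)

# The six values printed by head(df2$status_J) above
x <- c("incumbents", "former-incumbents", "incumbents",
       "challengers", "challengers", "challengers")

# Default: levels sorted alphabetically
lv_default <- levels(factor(x))

# fct_inorder(): levels in order of first appearance
lv_inorder <- levels(fct_inorder(x))

# By hand: any order you like
lv_manual <- levels(factor(x, levels = c("incumbents",
                                         "challengers",
                                         "former-incumbents")))

lv_default
lv_inorder
lv_manual
```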

3.5 Add another dimension

3.5.1 The number of candidates by party in the 2021 HR election in Japan

  • Usually, a bar chart has an x-axis and a y-axis

  • Each bar carries two pieces of information: an HR (House of Representatives) election (x-axis) and the number of candidates (y-axis)

  • The number of dimensions means the amount of information a graph carries

  • It corresponds to the number of arguments mapped within aes()

  • Let’s draw a bar chart of the number of candidates by party (seito) in the 2021 HR election

df21 <- df %>% 
  filter(year == 2021) 
  • Check the list of parties running for the 2021 HR election in Japan.
unique(df21$seito)
 [1] "自民" "立憲" "N党" "国民" "維新" "共産" "れい" "無所" "社民" "諸派"
[11] "公明"
  • The number of parties is 11
  • Let’s change the class of seito to factor and set the order of its values as you like

How to set the order of the values in seito
・For instance, to arrange 自民, 立憲, and 共産 in this order, you write as follows:

mutate(seito = factor(seito,
                      levels = c("自民", "立憲", "共産")))

df21 <- df21 %>%
  drop_na(seito) %>%
  mutate(seito = factor(seito, 
                        levels = c("自民", 
                                   "立憲", 
                                   "共産", 
                                   "維新",
                                   "無所", 
                                   "N党", 
                                   "国民",
                                   "れい",
                                   "公明",  
                                   "諸派", 
                                   "社民")))

3.5.2 Calculate the number of candidates (N)

  • Calculate the number of candidates (N) by party, gender, and win/lose (wl)
    → Name the result res_1
res_1 <- df21 %>%
  group_by(seito, gender, wl) %>%  
  summarise(N = n())
DT::datatable(res_1)
  • Let’s visualize the results
res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N), 
           stat = "identity") + 
  ggtitle("The number of candidates by party in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") 

  • This bar chart carries the following two pieces of information:
  • x-axis: Party names (seito)
  • y-axis: The number of candidates (N)

3.5.2 Add another dimension

fill = gender
  • Let’s add another dimension (gender) to the bar chart above
  • To add another dimension, you add an argument within aes()
  • You can customize the surface color with fill
  • You can customize the color of the border with color
  • You can customize the line type of the border with linetype
  • You can customize the transparency of a bar with alpha
  • In the case of a bar chart, you can customize both the surface color and the border color
  • Let’s map the surface color with fill
  • Add fill = gender within aes()
res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N, 
               fill = gender), 
           stat = "identity") + 
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") 

3.5.3 Dodge bars

position = "dodge"
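  • The stacked bar chart above is hard to read
  • Let’s place the bars side by side by adding position = "dodge" within geom_bar()
  • Note: Set position = "dodge" OUTSIDE aes()
  • res_1 has a separate row for each value of wl, and dodged bars that share the same party and gender are drawn on top of each other, so sum N by party and gender first
res_1 %>%
  group_by(seito, gender) %>%
  summarise(N = sum(N)) %>%  # one row per party and gender
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N,
               fill = gender), 
           stat = "identity",
           position = "dodge") + # position = "dodge" is set outside aes()
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") 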
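
  • With position = "dodge", rows that share the same x value and the same fill value are not added up; their bars are drawn on top of each other
  • The sketch below uses made-up counts (toy and the party names A and B are hypothetical) to show the aggregation step that gives one row, and so one bar, per party and gender

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts: one row per party, gender and wl
toy <- tibble(
  seito  = c("A", "A", "A", "B"),
  gender = c("male", "male", "female", "male"),
  wl     = c(0, 1, 0, 0),
  N      = c(10, 5, 4, 8)
)

# Collapse wl so that each party-gender pair has exactly one row
toy_sum <- toy %>%
  group_by(seito, gender) %>%
  summarise(N = sum(N), .groups = "drop")

# One dodged bar per party-gender pair
p_dodge <- ggplot(toy_sum) +
  geom_bar(aes(x = seito, y = N, fill = gender),
           stat = "identity",
           position = "dodge")

toy_sum
```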

3.5.4 Customize the location of legend

theme(legend.position = "bottom")
  • Let’s change the location of the legend
  • theme() enables us to customize the look of a graph in detail
  • To change the location of the legend, you set legend.position = "bottom" within theme()
    → The legend moves to the bottom
  • The default is "right"
  • You have three other options: "top", "left", and "none"
res_1 %>%
  group_by(seito, gender) %>%
  summarise(N = sum(N)) %>%  # one row per party and gender
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N, 
               fill = gender), 
           stat = "identity",
           position = "dodge") + 
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom") 

3.6 facet_wrap()

  • If you want to avoid using too many colors in a graph, you can use facet_wrap()
  • You add facet_wrap(~ variable name) as a layer
  • Suppose you want to draw a bar chart of the number of candidates (1) by party and (2) by gender
res_1 %>%
  ggplot() +
  geom_bar(aes(x = gender, 
               y = N), 
           stat = "identity") +
  ggtitle("The number of candidates by party & gender in the 2021 HR election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  facet_wrap(~ seito) 

  • You can also swap the two variables: map x = seito in aes() and facet by gender with facet_wrap(~ gender) to draw the following bar chart
res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N), 
           stat = "identity") +
  ggtitle("The number of candidates by party & gender in the 2021 HR election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  facet_wrap(~ gender) 
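
  • facet_wrap() also accepts a layout argument such as ncol, which may help when the panels are wide
  • A minimal sketch with made-up counts (toy is hypothetical); ncol = 1 stacks the two gender panels vertically

```r
library(ggplot2)

# Hypothetical counts by party and gender
toy <- data.frame(seito  = c("A", "B", "A", "B"),
                  gender = c("female", "female", "male", "male"),
                  N      = c(4, 2, 15, 8))

p_facet <- ggplot(toy) +
  geom_bar(aes(x = seito, y = N), stat = "identity") +
  facet_wrap(~ gender, ncol = 1)  # one panel per gender, in one column

# Each row of the layer data records the panel it belongs to
table(layer_data(p_facet)$PANEL)
```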

3.7 Customize the labels

theme(axis.text.x = element_text(angle = XX, vjust = X, hjust = X))
  • Sometimes the labels on the x-axis are so long that they overlap one another
  • In that case, you can rotate the labels on the x-axis so that they no longer overlap
  • You add theme(axis.text.x = element_text()) as a new layer to do this
  • You set the rotation angle in degrees with angle = ...
  • Here, I assign angle = 30
res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N), 
           stat = "identity") + 
  facet_wrap(~ gender) + 
  labs(x = "Political Party", 
       y = "The number of candidates") +
  ggtitle("The number of candidates by party & gender in the 2021 HR election") +
  theme_bw(base_family = "HiraKakuProN-W3") + 
  theme(axis.text.x = element_text(angle = 30, vjust = 1, hjust = 1),
        panel.grid.major.x = element_blank()) 

3.8 Add a new layer to facet

  • If you map fill = variable name within aes(), you can add another dimension to a faceted bar chart
  • First, make a variable elec_result, and then draw a bar chart with ggplot()
res_1 %>%
  mutate(elec_result = if_else(wl == 0, "lose", "win")) %>% 
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N,
               fill = elec_result),  
           stat = "identity") + 
  facet_wrap(~ gender) +
  labs(x = "Political party", y = "The number of candidates") +  
  ggtitle("The number of candidates by party & win/lose in the 2021 HR election") +
  theme_bw(base_family = "HiraKakuProN-W3") +  
  theme(axis.text.x = element_text(angle = 40, 
                                   vjust = 1,
                                   hjust = 1),
        panel.grid.major.x = element_blank())  +
  theme(legend.position = "bottom")

3.9 How to switch the x-axis and y-axis

  • When the labels on the x-axis are so long that they overlap one another, you can simply switch the x-axis and y-axis
res_1 %>%
  ggplot() +
  geom_bar(aes(x = N, 
               y = seito), 
           stat = "identity") + 
  facet_wrap(~ gender) + 
  labs(x = "The number of candidates", y = "Political party") + # Fix the labels
  ggtitle("The number of candidates by party & gender in the 2021 HR election") +
  theme_bw(base_family = "HiraKakuProN-W3") 

Another way to switch the x-axis and y-axis: coord_flip()

  • You can switch the axes by adding coord_flip()
  • In labs(), x and y still refer to the aesthetics mapped in aes(), not to the flipped positions
res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N), 
           stat = "identity") + 
  facet_wrap(~ gender) +
  labs(x = "Political party", y = "The number of candidates") + 
  theme_bw(base_family = "HiraKakuProN-W3") +
  coord_flip()

4. Display numbers on bars

geom_text(aes(label = variable name))

4.1 A bar chart with numbers displayed

  • Suppose you want to display numbers on the following bar chart

  • You want to display 11 values on the bars

  • The 11 values are the number of candidates each of the 11 parties nominated

  • Check the data frame res_1

Check how you made res_1

df <- read_csv("data/hr96-21.csv", 
               na = ".")
df21 <- df %>% 
  filter(year == 2021) %>%      # Select the 2021 HR election data
  drop_na(seito) %>%            # Drop missing values in seito
  mutate(seito = factor(seito,  # Change the class of seito to factor and set the order you want
                        levels = c("自民", 
                                   "立憲", 
                                   "共産", 
                                   "維新",
                                   "無所", 
                                   "N党", 
                                   "国民",
                                   "れい",
                                   "公明",  
                                   "諸派", 
                                   "社民")))
  • Calculate the number of candidates by party, gender and win/lose and name it res_1
res_1 <- df21 %>%  
group_by(seito, gender, wl) %>%
summarise(N = n()) 
  • Display res_1
DT::datatable(res_1)
  • Using res_1, you calculate the number of candidates each party nominated (= party_cand) and name it res_2
res_2 <- res_1 %>%
  group_by(seito) %>%
  summarize(party_cand = sum(N))
  • Display res_2
DT::datatable(res_2)
  • By adding geom_text(aes(label = party_cand)), you can display the numbers on the bars
res_2 %>%
  ggplot(aes(x = seito, 
             y = party_cand)) +
  geom_bar(aes(), 
           stat = "identity",
           position = position_dodge(width = 0.9)) +  
  geom_text(aes(label = party_cand), 
            vjust = 1.2, 
            colour = "yellow", 
            position = position_dodge(width = 0.9),
            size = 3) +
  ggtitle("The number of candidates by party in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom") 
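
  • vjust controls where the number sits relative to the top of the bar: values above 1 (such as 1.2 above) place it inside the bar, and negative values place it above the bar
  • A minimal sketch with made-up party totals (toy is hypothetical); expand_limits() is added here, as an assumption about what the plot needs, to leave room for the label above the tallest bar

```r
library(ggplot2)

# Hypothetical party totals (made-up numbers)
toy <- data.frame(seito      = c("A", "B", "C"),
                  party_cand = c(30, 12, 5))

p_label <- ggplot(toy, aes(x = seito, y = party_cand)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = party_cand),
            vjust = -0.5,  # negative vjust moves the label above the bar top
            size = 3) +
  expand_limits(y = 33)    # headroom for the label on the tallest bar

# The second layer (geom_text) holds the labels and their positions
layer_data(p_label, 2)[, c("x", "y", "label")]
```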

4.2 A bar chart with numbers displayed (dodge)

  • Suppose you want to display numbers on the following bar chart

  • You want to display 22 values on the bars
  • The 22 values are the number of candidates the 11 parties nominated in the 2021 HR election, by gender
  • Check the data frame res_1
DT::datatable(res_1)
  • Using res_1, you calculate the number of candidates the 11 parties nominated by gender (= total) and name it res_3
res_3 <- res_1 %>%
  group_by(seito, gender) %>%
  summarize(total = sum(N))
  • Display res_3
DT::datatable(res_3)
  • By adding geom_text(aes(label = total)), you can display the numbers on the bars
res_3 %>%
  ggplot(aes(x = seito, 
             y = total, 
             fill = gender)) +
  geom_bar(aes(),
           stat = "identity",
           position = position_dodge(width = 0.9)) +
  geom_text(aes(label = total), 
            vjust = 1.2, 
            colour = "white", 
            position = position_dodge(width = 0.9),
            size = 3) +
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom") 

  • The numbers on the left side of the bar chart are displayed properly
  • But the numbers on the right side of the bar chart are not displayed properly
  • You need to fix this
  • Delete vjust = 1.2
  • Change colour = "white" to colour = "black"
res_3 %>%
  ggplot(aes(x = seito, 
             y = total, 
             fill = gender)) +
  geom_bar(aes(), 
           stat = "identity",
           position = position_dodge(width = 0.9)) + 
  geom_text(aes(label = total), 
            colour = "black", # Change the color of the number  
            position = position_dodge(width = 0.9),
            size = 3) +
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom") 

4.3 A bar chart with numbers displayed (stacked)

position = "stack"

  • Suppose you want to display numbers on the following bar chart

  • You want to display 22 values on the bars

  • The 22 values are the number of candidates the 11 parties nominated in the 2021 HR election, by gender

  • Check the data frame res_1

  • Using res_1, you calculate the number of candidates the 11 parties nominated by gender (= total) and name it res_4

res_4 <- res_1 %>%
  group_by(seito, gender) %>%
  summarize(total = sum(N))
  • Display res_4
DT::datatable(res_4)
  • By adding geom_text(aes(label = total)), you can display the numbers on the bars
res_4 %>%
  ggplot(aes(x = seito, 
             y = total, 
             fill = gender)) +
  geom_bar(aes(), 
           stat = "identity",
           position = "stack") +  # Change to [position = "stack"]  
  geom_text(aes(label = total),
            colour = "black", 
            position = "stack", # Change to [position = "stack"]  
            size = 3) +
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom") 
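
  • With position = "stack", each number is placed at the top edge of its segment
  • If you prefer the number in the middle of each segment, one option is position_stack(vjust = 0.5); the sketch below uses made-up totals (toy is hypothetical)

```r
library(ggplot2)

# Hypothetical totals by party and gender
toy <- data.frame(seito  = c("A", "A", "B", "B"),
                  gender = c("female", "male", "female", "male"),
                  total  = c(4, 15, 2, 8))

p_stack <- ggplot(toy, aes(x = seito, y = total, fill = gender)) +
  geom_bar(stat = "identity", position = "stack") +
  geom_text(aes(label = total),
            colour = "black",
            position = position_stack(vjust = 0.5),  # middle of each segment
            size = 3)

# Label positions after stacking
layer_data(p_stack, 2)[, c("x", "y", "label")]
```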

5. How to save a graph

  • There are two types of file format for saving a graph
Type of format  Extension                              File size  Quality
Vector          .pdf (recommended), .svg               large      Good
Bitmap          .png (recommended), .bmp, .jpg, .jpeg  small      OK
  • Let’s save the graph we made (bar_plot1) in png format
  • Using ggsave(), you save a graph made with {ggplot2}
Step 1:
  • Name the graph you made, like bar_plot1 (whatever name you like)
  • Make a folder for your graphs, like Figs, in your RProject folder
Step 2:
  • Load {ragg}
library(ragg)
ggsave(filename = "Figs/plot1.png",  # Where to save the graph and its file name
       plot = bar_plot1,             # The name of the graph you made
       width = 6,                    # Inches
       height = 3,                   # Inches
       dpi = 400,                    # Graph resolution
       device = ragg::agg_png)       # Avoid garbled characters in the graph
  • The name of the graph is bar_plot1
  • It is saved as plot1.png in the Figs folder
  • You can customize the size of the graph
  • You can also customize the graph resolution, like 400
  • {ragg} enables us to avoid the garbled characters in a graph
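
  • For the vector format, ggsave() picks the device from the file extension, so a .pdf file name is enough
  • A minimal sketch with a made-up plot (toy and toy_plot are hypothetical) saved to a temporary folder; labels in Japanese may need a different PDF device depending on your system, which is not covered here

```r
library(ggplot2)

# Hypothetical toy plot with ASCII-only labels
toy <- data.frame(year = factor(c(1996, 2000, 2003)),
                  n    = c(3, 2, 1))
toy_plot <- ggplot(toy) +
  geom_bar(aes(x = year, y = n), stat = "identity")

# Save as a vector file; the .pdf extension selects the PDF device
out_file <- file.path(tempdir(), "plot1.pdf")
ggsave(filename = out_file,
       plot = toy_plot,
       width = 6,   # Inches
       height = 3)  # Inches

file.exists(out_file)
```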

6. Exercise

  • Q6.1:
    Referring to 3.1 Draw a simple bar chart (1), draw a bar chart of the number of male candidates in Japan’s lower house elections between 1996 and 2021.

  • Q6.2:
    Referring to 3.1 Draw a simple bar chart (1), draw a bar chart of the number of female candidates in Japan’s lower house elections between 1996 and 2021.

  • Q6.3:
    Referring to 3.5.3 Dodge bars, draw a bar chart of the number of candidates, dodged by gender, in Japan’s lower house elections between 1996 and 2021.

  • Q6.4 (Advanced):
    Referring to 3.8 Add a new layer to facet, draw a bar chart of the number of winners (wl = 1 & 2) and losers (wl = 0) by party, faceted by candidate status (status), in the 2021 HR election in Japan.
    ・Winners include both those who won in a single-member district (wl = 1) and zombie winners (wl = 2)

  • Q6.5 (Advanced):
    Referring to 4.2 A bar chart with numbers displayed (dodge), draw a bar chart of the number of single-member-district winners (wl = 1), zombie winners (wl = 2), and losers (wl = 0) by party, faceted by candidate status (status), in the 2021 HR election in Japan.
