R packages used in this section

library(DT)
library(gapminder)
library(gghighlight)
library(ggrepel)
library(stargazer)
library(tidyverse)

1. What is a bar chart?

A histogram and a bar chart look very similar
A bar chart visualizes a discrete variable
A histogram visualizes a continuous variable
A bar chart has gaps between bars
A histogram does not have gaps between bars

Variable	Graph	Feature
Discrete	Bar chart	Gaps between bars
Continuous	Histogram	No gaps between bars

Bar chart:

x-axis: Election year (1996-2021)
y-axis: The average number of candidates in each electoral district
For instance, no election was held between 2017 and 2021, so the election years are not evenly spaced
Election year looks like a number, but a bar chart does not treat it as a number
Election year is recorded as a number, but it works as a label: the numeric value itself does not represent a quantity

Histogram:

x-axis: Vote share (%) in lower house election in Japan (1996-2021)
y-axis: The number of candidates
Vote share ranges from 0% to 100% and can take many values in between, such as 0.1%, 0.01%, ...
You need to draw the bars without gaps between them
→ A histogram does not have gaps between bars
→ But each bar carries the following two pieces of information:

location in x-axis
height in y-axis
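The contrast between the two graphs can be sketched in code. This is a minimal illustration on a made-up data frame (toy, with hypothetical values, not the election data): geom_bar() draws one bar per category, while geom_histogram() groups a continuous variable into adjacent bins. The choice of binwidth = 10 is an assumption made only for this example.

```r
library(ggplot2)

# Hypothetical toy data (not the election data used in this section)
toy <- data.frame(
  year      = factor(c(1996, 1996, 2000, 2000, 2000, 2003)),
  voteshare = c(40.0, 25.7, 20.1, 13.3, 0.4, 32.9)
)

# Discrete variable -> bar chart: one bar per election year
p_bar <- ggplot(toy) +
  geom_bar(aes(x = year))

# Continuous variable -> histogram: vote share grouped into 10-point bins
p_hist <- ggplot(toy) +
  geom_histogram(aes(x = voteshare), binwidth = 10)
```

Typing p_bar or p_hist in a chunk displays the corresponding graph.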

2. Data preparation

The lower house election data in Japan: 1996-2021 (hr96-21.csv)
・Click hr96-21.csv and download it to your computer
・Read the election data and name it df

df <- read_csv("data/hr96-21.csv",
na = ".")
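What the na = "." argument does can be checked on a tiny file. The sketch below writes a hypothetical three-row CSV to a temporary file (the column names and values are made up) and reads it with and without na = ".".

```r
library(readr)

# Write a tiny hypothetical CSV in which "." marks a missing value
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,exp", "A,100", "B,.", "C,250"), tmp)

# Without na = ".", the "." stays as text and exp is read as character
raw <- read_csv(tmp)

# With na = ".", the "." becomes NA and exp is read as numeric
clean <- read_csv(tmp, na = ".")
```

This is why exp can be treated as a number in the election data even though some candidates have no recorded expenditure.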

Check that the data were read correctly

hr96-21.csv is a collection of Japanese lower house election data covering 9 national elections (1996, 2000, 2003, 2005, 2009, 2012, 2014, 2017, 2021)
Check the names of the variables df contains

names(df)

 [1] "year"          "pref"          "ku"            "kun"          
 [5] "wl"            "rank"          "nocand"        "seito"        
 [9] "j_name"        "gender"        "name"          "previous"     
[13] "age"           "exp"           "status"        "vote"         
[17] "voteshare"     "eligible"      "turnout"       "seshu_dummy"  
[21] "jiban_seshu"   "nojiban_seshu"

df has the following 22 variables (the table below also lists castvote, which does not appear in the names(df) output above)

variable	detail
year	Election year (1996-2021)
pref	Prefecture
ku	Electoral district name
kun	Number of electoral district
rank	Rank by number of votes within each district (1 = most votes)
wl	0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner
nocand	Number of candidates in each district
seito	Candidate’s affiliated party (in Japanese)
j_name	Candidate’s name (Japanese)
name	Candidate’s name (English)
previous	Previous wins
gender	Candidate’s gender:“male”, “female”
age	Candidate’s age
exp	Election expenditure (yen) spent by each candidate
status	0 = challenger / 1 = incumbent / 2 = former incumbent
vote	votes each candidate garnered
voteshare	Vote share (%)
eligible	Eligible voters in each district
turnout	Turnout in each district (%)
castvote	Total votes cast in each district
seshu_dummy	0 = Not-hereditary candidates, 1 = hereditary candidate
jiban_seshu	Relationship between candidate and his predecessor
nojiban_seshu	Relationship between candidate and his predecessor

Check the class of variables included in df

str(df)

spec_tbl_df [9,660 × 22] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ year         : num [1:9660] 1996 1996 1996 1996 1996 ...
 $ pref         : chr [1:9660] "愛知" "愛知" "愛知" "愛知" ...
 $ ku           : chr [1:9660] "aichi" "aichi" "aichi" "aichi" ...
 $ kun          : num [1:9660] 1 1 1 1 1 1 1 2 2 2 ...
 $ wl           : num [1:9660] 1 0 0 0 0 0 0 1 0 2 ...
 $ rank         : num [1:9660] 1 2 3 4 5 6 7 1 2 3 ...
 $ nocand       : num [1:9660] 7 7 7 7 7 7 7 8 8 8 ...
 $ seito        : chr [1:9660] "新進" "自民" "民主" "共産" ...
 $ j_name       : chr [1:9660] "河村たかし" "今枝敬雄" "佐藤泰介" "岩中美保子" ...
 $ gender       : chr [1:9660] "male" "male" "male" "female" ...
 $ name         : chr [1:9660] "KAWAMURA, TAKASHI" "IMAEDA, NORIO" "SATO, TAISUKE" "IWANAKA, MIHOKO" ...
 $ previous     : num [1:9660] 2 2 2 0 0 0 0 2 0 0 ...
 $ age          : num [1:9660] 47 72 53 43 51 51 45 51 71 30 ...
 $ exp          : num [1:9660] 9828097 9311555 9231284 2177203 NA ...
 $ status       : num [1:9660] 1 2 1 0 0 0 0 1 2 0 ...
 $ vote         : num [1:9660] 66876 42969 33503 22209 616 ...
 $ voteshare    : num [1:9660] 40 25.7 20.1 13.3 0.4 0.3 0.2 32.9 26.4 25.7 ...
 $ eligible     : num [1:9660] 346774 346774 346774 346774 346774 ...
 $ turnout      : num [1:9660] 49.2 49.2 49.2 49.2 49.2 49.2 49.2 51.8 51.8 51.8 ...
 $ seshu_dummy  : num [1:9660] 0 0 0 0 0 0 0 0 1 0 ...
 $ jiban_seshu  : chr [1:9660] NA NA NA NA ...
 $ nojiban_seshu: chr [1:9660] NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   year = col_double(),
  ..   pref = col_character(),
  ..   ku = col_character(),
  ..   kun = col_double(),
  ..   wl = col_double(),
  ..   rank = col_double(),
  ..   nocand = col_double(),
  ..   seito = col_character(),
  ..   j_name = col_character(),
  ..   gender = col_character(),
  ..   name = col_character(),
  ..   previous = col_double(),
  ..   age = col_double(),
  ..   exp = col_double(),
  ..   status = col_double(),
  ..   vote = col_double(),
  ..   voteshare = col_double(),
  ..   eligible = col_double(),
  ..   turnout = col_double(),
  ..   seshu_dummy = col_double(),
  ..   jiban_seshu = col_character(),
  ..   nojiban_seshu = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

chr means character and num means numeric
→ No problem

Descriptive statistics: `df`

library(stargazer)

Type {r, results = "asis"} as chunk option

stargazer(as.data.frame(df), 
type ="html",
digits = 2)


Statistic	N	Mean	St. Dev.	Min	Max

year	9,660	2,007.88	7.68	1,996	2,021
kun	9,660	5.75	5.07	1	25
wl	9,660	0.48	0.67	0	2
rank	9,660	2.65	20.39	1	2,003
nocand	9,660	3.89	1.09	2	9
previous	9,660	1.48	2.46	0	19
age	9,656	51.22	11.13	25	94
exp	6,829	7,551,393.00	5,482,684.00	535	27,462,362
status	9,660	0.49	0.62	0	2
vote	9,660	55,987.87	40,626.34	177	210,515
voteshare	9,660	27.67	19.34	0.10	95.30
eligible	8,785	330,268.30	80,058.87	115,013	495,212
turnout	7,849	62.09	6.53	44.71	83.80
seshu_dummy	8,875	0.12	0.33	0	1

3. Draw a bar chart

3.1 Draw a simple bar chart (1)

Let’s draw a simple bar chart of the number of candidates between 1996 and 2021 in Japan’s lower house elections (9 elections)
You use two functions, ggplot() and geom_bar()

Function	What you can do
`ggplot()`	Prepare a canvas
`geom_bar()`	Draw a bar chart

You need the following two pieces of information:

year (1996, 2000, 2003,…, 2021)
the number of candidates in each election

You use the data frame, df
Using table(), check the number of candidates in each election

table(df$year)


1996 2000 2003 2005 2009 2012 2014 2017 2021 
1261 1199 1026  989 1139 1294  959  936  857

You have the two pieces of information necessary to draw a bar chart
Assign year to the x-axis and the number of candidates to the y-axis
df contains year, but df does not have a variable representing the number of candidates
In drawing a bar chart, you need to map x in geom_bar(aes())
You do not have to map y in geom_bar(aes())
{ggplot2} automatically counts the rows for each value of x and shows the count as the bar height
→ You map as geom_bar(aes(x = year))
If you add a labs() layer, you can put labels on the x-axis and y-axis

df %>% 
  ggplot() +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3")

It does not look good because the intervals between bars differ in size
This is because the class of year is numeric
Check the class of year by using str()

str(df$year)

 num [1:9660] 1996 1996 1996 1996 1996 ...

As expected, the class of year is numeric
→ R treats year as a continuous variable
→ The lower house elections in Japan are held at irregular intervals of up to 4 years
→ That is why we get this ugly bar chart

Solution

Change the class of year from numeric to factor

df$year <- factor(df$year)

Check the class of year

str(df$year)

 Factor w/ 9 levels "1996","2000",..: 1 1 1 1 1 1 1 1 1 1 ...

The class of year is changed to factor

df %>% 
  ggplot() +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3")
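Why the conversion to factor evens out the spacing can be checked by looking at where {ggplot2} places the bars. The sketch below uses a hypothetical toy data frame with three election years and layer_data() to extract the computed x positions. It is only an illustration, not part of the analysis.

```r
library(ggplot2)

# Hypothetical toy data: three election years with uneven gaps
toy <- data.frame(year = c(1996, 1996, 2000, 2003, 2003, 2003))

# Numeric year: bars sit at 1996, 2000, and 2003 on a continuous axis
p_num <- ggplot(toy) +
  geom_bar(aes(x = year))

# Factor year: bars sit at evenly spaced positions 1, 2, and 3
toy$year_f <- factor(toy$year)
p_fct <- ggplot(toy) +
  geom_bar(aes(x = year_f))

x_num <- as.numeric(layer_data(p_num)$x)
x_fct <- as.numeric(layer_data(p_fct)$x)
```

With a numeric year the gap between 1996 and 2000 is wider than the gap between 2000 and 2003. With a factor every bar is one step away from the next.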

You can save this bar chart as an object with an appropriate name, such as bar_plot1
To do so, type as follows:

bar_plot1 <- df %>% 
  ggplot() +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3")

See 5. How to save a graph for how to save graphs
If you want to output the graph, just type the object name (in this case, bar_plot1) in a chunk and knit it

What we can see from this graph
・The number of candidates in Japan’s lower house elections has been decreasing (except in 2009 and 2012)

3.2 Draw a simple bar chart (2)

Next, let’s draw a simple bar chart of the average number of candidates in each single-member district in Japanese lower house elections between 1996 and 2021
You need the following two pieces of information:

year (1996, 2000, 2003,…, 2021)
the number of candidates in each electoral district and in each election

You use the data frame, df
Using table(), check the number of candidates in each election
Assign year to the x-axis and the average number of candidates in each electoral district and in each election to the y-axis
df contains year, but df does not have a variable representing the average number of candidates in each electoral district and in each election
You can calculate this variable in two steps by using group_by() and summarise(); first, save the average for each combination of year and ku as ave_nocand1

df1 <- df %>% 
  group_by(year, ku) %>% 
  summarise(ave_nocand1 = mean(nocand, na.rm = TRUE)) 

df1

# A tibble: 423 × 3
# Groups:   year [9]
   year  ku        ave_nocand1
   <fct> <chr>           <dbl>
 1 1996  aichi            6.33
 2 1996  akita            3.4 
 3 1996  aomori           3.93
 4 1996  chiba            4.75
 5 1996  ehime            3.57
 6 1996  fukui            4.17
 7 1996  fukuoka          3.90
 8 1996  fukushima        3.67
 9 1996  gifu             3.78
10 1996  gunma            3.84
# … with 413 more rows

What you need here is the average number of candidates per district in each election
You calculate the average of ave_nocand1 for each election and save it as ave_nocand2

df1 <- df1 %>% 
  group_by(year) %>% 
  summarise(ave_nocand2 = mean(ave_nocand1, na.rm = TRUE))

df1

# A tibble: 9 × 2
  year  ave_nocand2
  <fct>       <dbl>
1 1996         4.11
2 2000         3.93
3 2003         3.45
4 2005         3.40
5 2009         3.86
6 2012         4.20
7 2014         3.23
8 2017         3.27
9 2021         3.00
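The two-step averaging above can be traced by hand on a small example. The sketch below uses a hypothetical tibble (toy) with made-up values. The .groups = "drop" argument is an addition that is not in the original code and only removes the leftover grouping after the first step.

```r
library(dplyr)

# Hypothetical toy data: two elections, two areas (ku), made-up candidate counts
toy <- tibble(
  year   = c(1996, 1996, 1996, 2000, 2000, 2000),
  ku     = c("a", "a", "b", "a", "b", "b"),
  nocand = c(2, 2, 4, 3, 5, 5)
)

# Step 1: average within each combination of year and ku
step1 <- toy %>%
  group_by(year, ku) %>%
  summarise(ave_nocand1 = mean(nocand), .groups = "drop")

# Step 2: average the step-1 averages within each year
step2 <- step1 %>%
  group_by(year) %>%
  summarise(ave_nocand2 = mean(ave_nocand1))
```

Here step1 holds 2 and 4 for 1996 and 3 and 5 for 2000, so step2 holds 3 for 1996 and 4 for 2000.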

You use a geometric object, geom_bar(), and map aes() as follows:
・x-axis: x = year
・y-axis: y = ave_nocand2
→ map geom_bar(aes(x = year, y = ave_nocand2))
When you map both x and y, you need to set stat = "identity" inside geom_bar() but outside aes()
If you add a labs() layer, you can put labels on the x-axis and y-axis

df1 %>% 
  ggplot() +
  geom_bar(aes(x = year,
               y = ave_nocand2),
           stat = "identity") +
  labs(x = "Election Year", y = "The average number of candidates in each district and in each election") + 
  ggtitle("The average number of candidates in each district (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3")
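As a side note, {ggplot2} also provides geom_col(), which draws bars whose heights are taken directly from y. The sketch below rebuilds the first three rows of the summary table shown above and compares the two approaches. Treat it as an optional alternative, not as the method used in this section.

```r
library(ggplot2)

# First three rows of the summary table shown above
toy <- data.frame(
  year        = factor(c(1996, 2000, 2003)),
  ave_nocand2 = c(4.11, 3.93, 3.45)
)

# geom_bar() needs stat = "identity" when y is mapped
p_bar <- ggplot(toy) +
  geom_bar(aes(x = year, y = ave_nocand2), stat = "identity")

# geom_col() uses the y values as bar heights by default
p_col <- ggplot(toy) +
  geom_col(aes(x = year, y = ave_nocand2))
```

Both objects produce bars of the same heights, so the choice is a matter of taste.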

What you can see from this bar chart
・The average number of candidates in each single-member district has been decreasing over time since 1996 (except in 2009 and 2012)
・Japanese lower house elections seem to be getting closer to two-candidate races

3.3 Modify numeric values into characters

Let’s draw a bar chart of the number of candidates by status (incumbent, challenger, and former-incumbent) in Japanese lower house elections (1996-2021)
You need the following two pieces of information:

status: (challenger = 0, incumbent = 1, former-incumbent = 2)
N : The number of candidates by status

You use the data frame, df
Let’s check status, by using table()

table(df$status)


   0    1    2 
5517 3510  633

You have all the information you need to draw a bar chart
x-axis: status
y-axis: N
df contains status, but df does not have N
In drawing a bar chart, you need to map x in geom_bar(aes())
You do not have to map y in geom_bar(aes())
{ggplot2} automatically counts the rows for each value of x and shows the count as the bar height
→ You map as geom_bar(aes(x = status))
If you add a labs() layer, you can put labels on the x-axis and y-axis

df %>% 
  ggplot() +
  geom_bar(aes(x = status)) +
  labs(x = "Candidate status", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates by status in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3")

The x-axis shows 0, 1, and 2 for the candidate’s status
You need to change these to challengers, incumbents, and former-incumbents
Using case_when(), you recode status
Add status_J as a new variable to df and save the result as df2

df2 <- df %>%
  mutate(status_J = case_when(status == 0 ~ "challengers",        # status is numeric
                              status == 1 ~ "incumbents",
                              TRUE        ~ "former-incumbents")) # everything else (here, status == 2)

Instead of typing x = status, you need to type x = status_J in mapping

df2 %>% 
  ggplot() +
  geom_bar(aes(x = status_J)) +
  labs(x = "Candidate's status", 
  y = "The number of candidates") + 
  ggtitle("The number of candidates by status in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3")
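The recoding step with case_when() can be checked on a toy vector. One point worth knowing is that a final TRUE ~ ... branch also catches missing values. The sketch below, on a hypothetical tibble, compares the catch-all version with a version that spells out status == 2. The column names status_catch_all and status_explicit are made up for this illustration.

```r
library(dplyr)

# Hypothetical toy data, including one missing status
toy <- tibble(status = c(0, 1, 2, NA))

toy <- toy %>%
  mutate(
    # Catch-all: anything that is not 0 or 1 (including NA) becomes "former-incumbents"
    status_catch_all = case_when(status == 0 ~ "challengers",
                                 status == 1 ~ "incumbents",
                                 TRUE        ~ "former-incumbents"),
    # Explicit: values that match no condition (here, NA) stay NA
    status_explicit  = case_when(status == 0 ~ "challengers",
                                 status == 1 ~ "incumbents",
                                 status == 2 ~ "former-incumbents")
  )
```

In the election data, status has no missing values according to the descriptive statistics above, so both versions would give the same result there.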

How to fix the garbled characters in a graph
For Windows users: Add the following R code
theme_bw(base_family = "HiraKakuProN-W3")

For Mac users: Add the following R code
theme_bw(base_family = "HiraKakuProN-W3")

3.4 How to change the order of bars

3.4.1 `factor()`

The bar chart you made in the previous sub-section shows challengers, former-incumbents, and incumbents, in that (alphabetical) order
Suppose you want to change the order to incumbents, challengers, former-incumbents
To do this, you need to change the class of status_J to factor
Using mutate(), you change the class of status_J to factor and then pass the result to ggplot()
→ Set the order by hand: levels = c("incumbents", "challengers", "former-incumbents")

df2 %>%
  mutate(status_J = factor(status_J,
                           levels = c("incumbents", "challengers", "former-incumbents"))) %>% 
  ggplot() +
  geom_bar(aes(x = status_J)) +
  labs(x = "Candidate status", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates by status in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3")

Now, you get what you want!

3.4.2 `fct_inorder()`

fct_inorder() is a function in {forcats}
{tidyverse} includes {forcats}
If you have installed and loaded {tidyverse}, then you do not have to install {forcats} separately
fct_inorder() changes the class of the variable in the parentheses to factor and orders its levels by first appearance in the data
Let’s check the order of the values in status_J

head(df2$status_J)

[1] "incumbents"        "former-incumbents" "incumbents"       
[4] "challengers"       "challengers"       "challengers"

The values first appear in status_J in the order incumbents, former-incumbents, challengers
If you want the bars to follow this order of first appearance, then you use fct_inorder()

df2 %>%
  mutate(status_J = fct_inorder(status_J)) %>% 
  ggplot() +
  geom_bar(aes(x = status_J)) +
  labs(x = "Candidate status", y = "The number of candidates") + 
  ggtitle("The number of candidates by status in Japan's lower house elections (1996-2021)") +
  theme_bw(base_family = "HiraKakuProN-W3")
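The three ways of ordering discussed in 3.4 can be compared directly by looking at the factor levels, since {ggplot2} draws the bars in the order of the levels. The sketch below uses the six values printed by head(df2$status_J) above as a small stand-alone vector.

```r
library(forcats)

# The six values shown by head(df2$status_J) above
status_J <- c("incumbents", "former-incumbents", "incumbents",
              "challengers", "challengers", "challengers")

# Default: levels are sorted alphabetically
lv_default <- levels(factor(status_J))

# By hand: levels follow the order given in levels =
lv_manual <- levels(factor(status_J,
                           levels = c("incumbents", "challengers", "former-incumbents")))

# fct_inorder(): levels follow the order of first appearance
lv_inorder <- levels(fct_inorder(status_J))
```

lv_default is challengers, former-incumbents, incumbents. lv_manual is incumbents, challengers, former-incumbents. lv_inorder is incumbents, former-incumbents, challengers.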

3.5 Add another dimension

3.5.1 The number of candidates by party in the 2021 HR election in Japan

Usually, a bar chart has an x-axis and a y-axis
A bar carries two pieces of information: an HR election (x-axis) and the number of candidates (y-axis)
The number of dimensions means the amount of information
It corresponds to the number of arguments within aes()
Let’s draw a bar chart of the number of candidates by party (seito) in the 2021 HR election

df21 <- df %>% 
  filter(year == 2021)

Check the list of parties running for the 2021 HR election in Japan.

unique(df21$seito)

 [1] "自民" "立憲" "Ｎ党" "国民" "維新" "共産" "れい" "無所" "社民" "諸派"
[11] "公明"

The number of parties is 11
Let’s change the class of seito to factor and assign the order of the values of seito as you like

How to assign the order of the values in seito
・For instance, if you want to arrange 自民, 立憲, 共産 in this order, you assign as follows:

mutate(seito = factor(seito,
levels = c("自民", "立憲", "共産")))

df21 <- df21 %>%
  drop_na(seito) %>%
  mutate(seito = factor(seito, 
                        levels = c("自民", 
                                   "立憲", 
                                   "共産", 
                                   "維新",
                                   "無所", 
                                   "Ｎ党", 
                                   "国民",
                                   "れい",
                                   "公明",  
                                   "諸派", 
                                   "社民")))

3.5.2 Calculate the number of candidates (`N`)

Calculate the number of candidates (N) by party, gender, and election result (wl)
→ Name the result res_1

res_1 <- df21 %>%
  group_by(seito, gender, wl) %>%  
  summarise(N = n())

DT::datatable(res_1)
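The counting step can be tried on a toy example. The sketch below uses a hypothetical tibble with made-up parties "A" and "B", and also shows dplyr's count(), which is a shorthand for the same group_by() + summarise(N = n()) pattern. The .groups = "drop" argument is an addition that only removes the leftover grouping.

```r
library(dplyr)

# Hypothetical toy data: five candidates from two made-up parties
toy <- tibble(
  seito  = c("A", "A", "A", "B", "B"),
  gender = c("male", "male", "female", "male", "female")
)

# The pattern used in this section
by_summarise <- toy %>%
  group_by(seito, gender) %>%
  summarise(N = n(), .groups = "drop")

# Shorthand that gives the same counts
by_count <- toy %>%
  count(seito, gender, name = "N")
```

Both results have one row per combination of seito and gender, with N holding the number of candidates in that combination.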

Let’s visualize the results

res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N), 
           stat = "identity") + 
  ggtitle("The number of candidates by party in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3")

This bar chart carries the following two pieces of information:
x-axis: Party names (seito)
y-axis: The number of candidates (N)

3.5.2 Add another dimension

`fill = gender`

Let’s add another dimension (gender) to the bar chart above
When you add another dimension, you add an argument within aes()
You can customize the fill color of the bars with fill
You can customize the color of the border with color
You can customize the line type of the border with linetype
You can customize the transparency of the bars with alpha
In the case of a bar chart, you can customize both the fill color and the border color
Let’s map the fill color with fill
Add fill = gender within aes()

res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N, 
               fill = gender), 
           stat = "identity") + 
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3")

3.5.3 Dodge bars

`position = "dodge"`

The stacked bar chart above makes it hard to compare the genders within each party
Let’s place the bars side by side by adding position = "dodge" within geom_bar()
Note: Set position = "dodge" OUTSIDE aes()

res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N,
               fill = gender), 
           stat = "identity",
           position = "dodge") + # position = "dodge" is set outside aes()
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3")
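The difference between the default stacking and position = "dodge" can be seen in the bar geometry that {ggplot2} computes. The sketch below uses a hypothetical data frame with two made-up parties and inspects the tops of the bars with layer_data().

```r
library(ggplot2)

# Hypothetical toy data: counts by party and gender
toy <- data.frame(
  seito  = c("A", "A", "B", "B"),
  gender = c("female", "male", "female", "male"),
  N      = c(1, 3, 2, 2)
)

base <- ggplot(toy, aes(x = seito, y = N, fill = gender))

# Default: bars of the same party are stacked on top of each other
p_stack <- base + geom_bar(stat = "identity")

# Dodge: bars of the same party are placed side by side
p_dodge <- base + geom_bar(stat = "identity", position = "dodge")

stack_top <- max(layer_data(p_stack)$ymax)  # height of the tallest stack
dodge_top <- max(layer_data(p_dodge)$ymax)  # height of the tallest single bar
```

In the stacked chart the tallest stack reaches 4 (1 + 3 for party A), while in the dodged chart the tallest bar reaches 3, because every bar starts from zero.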

3.5.4 Customize the location of legend

`theme(legend.position = "bottom")`

Let’s change the location of the legend
theme() enables us to customize the appearance of a graph in detail
To change the location of the legend, you set legend.position = "bottom" within theme()
→ The legend moves to the bottom
The default is "right"
You have three other options: "top", "left", and "none"

res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N, 
               fill = gender), 
           stat = "identity",
           position = "dodge") + 
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom")

3.6 `facet_wrap()`

If you want to avoid using too many colors in a graph, you can use facet_wrap()
You add facet_wrap(~ variable name) as a layer
Suppose you want to draw a bar chart of the number of candidates (1) by party and (2) by gender

res_1 %>%
  ggplot() +
  geom_bar(aes(x = gender, 
               y = N), 
           stat = "identity") +
  ggtitle("The number of candidates by party & gender in the 2021 HR election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  facet_wrap(~ seito)

You can replace gender with seito in mapping aes() and replace seito with gender in mapping facet_wrap() and draw the following bar chart.

res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N), 
           stat = "identity") +
  ggtitle("The number of candidates by party & gender in the 2021 HR election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  facet_wrap(~ gender)
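What facet_wrap() does can be confirmed on a small example: it splits the data by the faceting variable and draws one panel per value. The sketch below uses a hypothetical data frame with two made-up parties and counts the panels with layer_data().

```r
library(ggplot2)

# Hypothetical toy data: counts by party and gender
toy <- data.frame(
  seito  = c("A", "A", "B", "B"),
  gender = c("female", "male", "female", "male"),
  N      = c(1, 3, 2, 2)
)

# One panel per gender, parties on the x-axis inside each panel
p_facet <- ggplot(toy) +
  geom_bar(aes(x = seito, y = N), stat = "identity") +
  facet_wrap(~ gender)

n_panels <- length(unique(layer_data(p_facet)$PANEL))
```

Because gender takes two values here, n_panels is 2. Swapping the roles of seito and gender would give one panel per party instead.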

3.7 Customize the labels

`theme(axis.text.x = element_text(angle = XX, vjust = X, hjust = X))`

Sometimes, the labels on the x-axis are so long that they overlap one another
If this is the case, then you can rotate the labels on the x-axis so that they no longer overlap
You add theme(axis.text.x = element_text()) as a new layer to get this done
You can adjust the angle of rotation with angle = ...
Here, I assign angle = 30

res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N), 
           stat = "identity") + 
  facet_wrap(~ gender) + 
  labs(x = "Political Party", 
       y = "The number of candidates") +
  ggtitle("The number of candidates by party & gender in the 2021 HR election") +
  theme_minimal() + # overridden by theme_bw() on the next line
  theme_bw(base_family = "HiraKakuProN-W3") + 
  theme(axis.text.x = element_text(angle = 30, vjust = 1, hjust = 1),
        panel.grid.major.x = element_blank())

3.8 Add a new layer to facet

If you map fill = variable name within aes(), you can add another dimension to a faceted bar chart
First, make a variable elec_result, and then draw a bar chart with ggplot()

res_1 %>%
  mutate(elec_result = if_else(wl == 0, "lose", "win")) %>% 
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N,
               fill = elec_result),  
           stat = "identity") + 
  facet_wrap(~ gender) +
  labs(x = "Political party", y = "The number of candidates") +  
  ggtitle("The number of candidates by party & win/lose in the 2021 HR election") +
  theme_minimal() +
  theme_bw(base_family = "HiraKakuProN-W3") +  
  theme(axis.text.x = element_text(angle = 40, 
                                   vjust = 1,
                                   hjust = 1),
        panel.grid.major.x = element_blank())  +
  theme(legend.position = "bottom")

3.9 How to switch `x-axis` and `y-axis`

When the labels on the x-axis are so long that they overlap one another, you can simply switch the x-axis and y-axis

res_1 %>%
  ggplot() +
  geom_bar(aes(x = N, 
               y = seito), 
           stat = "identity") + 
  facet_wrap(~ gender) + 
  labs(x = "The number of candidates", y = "Political party") + # Fix the labels
  ggtitle("The number of candidates by party & gender in the 2021 HR election") +
  theme_minimal() + 
  theme_bw(base_family = "HiraKakuProN-W3")

Another way to switch `x-axis` and `y-axis`: `coord_flip()`

You can switch the axes by adding coord_flip()

res_1 %>%
  ggplot() +
  geom_bar(aes(x = seito, 
               y = N), 
           stat = "identity") + 
  facet_wrap(~ gender) +
  labs(x = "Political party", y = "The number of candidates") + # labels follow the aesthetics, not the flipped axes
  theme_minimal() + 
  theme_bw(base_family = "HiraKakuProN-W3") +
  coord_flip()
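The two ways of switching the axes differ in where the switch happens. The sketch below, on a hypothetical data frame, checks this with layer_data(): when x and y are swapped inside aes(), the bars themselves extend along x, whereas with coord_flip() the bars are still computed along y and only the drawing is rotated.

```r
library(ggplot2)

# Hypothetical toy data: counts for two made-up parties
toy <- data.frame(seito = c("A", "B"), N = c(3, 2))

# Way 1: swap x and y inside aes()
p_swap <- ggplot(toy) +
  geom_bar(aes(x = N, y = seito), stat = "identity")

# Way 2: keep x = seito and y = N, then flip the coordinates
p_flip <- ggplot(toy) +
  geom_bar(aes(x = seito, y = N), stat = "identity") +
  coord_flip()

swap_extent <- max(layer_data(p_swap)$xmax)  # bars run along x
flip_extent <- max(layer_data(p_flip)$ymax)  # bars still run along y
```

Both graphs look the same, but with coord_flip() the arguments of labs() still refer to the original aesthetics: x labels the party axis and y labels the count axis.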

4. Display numbers on bars

`geom_text(aes(label = variable name))`

4.1 A bar chart with numbers displayed

Suppose you want to display numbers on the following bar chart
The number of values you want to display on the bars is 11
They are the numbers of candidates these 11 parties nominated
Check the data frame res_1

Check how you made res_1

df <- read_csv("data/hr96-21.csv", 
               na = ".")

df21 <- df %>% 
  filter(year == 2021) %>%      # Select the 2021 HR election data
  drop_na(seito) %>%            # Drop missing values in seito
  mutate(seito = factor(seito,  # Change the class of seito to factor and arrange the levels in the order you want
                        levels = c("自民", 
                                   "立憲", 
                                   "共産", 
                                   "維新",
                                   "無所", 
                                   "Ｎ党", 
                                   "国民",
                                   "れい",
                                   "公明",  
                                   "諸派", 
                                   "社民")))

Calculate the number of candidates by party, gender and win/lose and name it res_1

res_1 <- df21 %>%  
  group_by(seito, gender, wl) %>%
  summarise(N = n())

Display res_1

DT::datatable(res_1)

Using res_1, you calculate the number of candidates each party nominated (= party_cand) and name it res_2

res_2 <- res_1 %>%
  group_by(seito) %>%
  summarize(party_cand = sum(N))

Display res_2

DT::datatable(res_2)

Adding geom_text(aes(label = party_cand)), you can display the numbers on the bars

res_2 %>%
  ggplot(aes(x = seito, 
             y = party_cand)) +
  geom_bar(aes(), 
           stat = "identity",
           position = position_dodge(width = 0.9)) +  
  geom_text(aes(label = party_cand), 
            vjust = 1.2, 
            colour = "yellow", 
            position = position_dodge(width = 0.9),
            size = 3) +
  ggtitle("The number of candidates by party in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom")
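The vjust argument of geom_text() controls where the number sits relative to the top of the bar: a value above 1 (such as 1.2 in the code above) pushes the text down into the bar, while a negative value lifts it above the bar. The sketch below, on a hypothetical data frame with made-up counts, places the labels just above the bars. The value -0.5 is an assumption chosen for illustration.

```r
library(ggplot2)

# Hypothetical toy data: made-up counts for three parties
toy <- data.frame(
  seito      = c("A", "B", "C"),
  party_cand = c(30, 12, 5)
)

p_labels <- ggplot(toy, aes(x = seito, y = party_cand)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = party_cand),
            vjust = -0.5,  # negative vjust: text sits just above the bar top
            size = 3)

# The text layer is the second layer; it is anchored at the bar heights
label_data <- layer_data(p_labels, 2)
```

Placing the labels above the bars keeps them readable even when a bar is too short to hold the text.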

4.2 A bar chart with numbers displayed (dodge)

Suppose you want to display numbers on the following bar chart

The number of values you want to display on the bars is 22
The 22 values are the numbers of candidates the 11 parties nominated in the 2021 HR election, by gender
Check the data frame res_1

DT::datatable(res_1)

Using res_1, you calculate the number of candidates 11 parties nominated by gender (= total) and name it res_3

res_3 <- res_1 %>%
  group_by(seito, gender) %>%
  summarize(total = sum(N))

Display res_3

DT::datatable(res_3)

Adding geom_text(aes(label = total)), you can display the numbers on the bars

res_3 %>%
  ggplot(aes(x = seito, 
             y = total, 
             fill = gender)) +
  geom_bar(aes(),
           stat = "identity",
           position = position_dodge(width = 0.9)) +
  geom_text(aes(label = total), 
            vjust = 1.2, 
            colour = "white", 
            position = position_dodge(width = 0.9),
            size = 3) +
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom")

The numbers are displayed properly on the bars on the left side of the chart
But they are not displayed properly on the bars on the right side
You need to fix this
Delete vjust = 1.2
Change colour = "white" to colour = "black"

res_3 %>%
  ggplot(aes(x = seito, 
             y = total, 
             fill = gender)) +
  geom_bar(aes(), 
           stat = "identity",
           position = position_dodge(width = 0.9)) + 
  geom_text(aes(label = total), 
            colour = "black", # Change the color of the number  
            position = position_dodge(width = 0.9),
            size = 3) +
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom")

4.3 A bar chart with numbers displayed (stacked)

`position = "stack"`

Suppose you want to display numbers on the following bar chart
The number of values you want to display on the bars is 22
The 22 values are the numbers of candidates the 11 parties nominated in the 2021 HR election, by gender
Check the data frame res_1
Using res_1, you calculate the number of candidates 11 parties nominated by gender (= total) and name it res_4

res_4 <- res_1 %>%
  group_by(seito, gender) %>%
  summarize(total = sum(N))

Display res_4

DT::datatable(res_4)

Adding geom_text(aes(label = total)), you can display the numbers on the bars

res_4 %>%
  ggplot(aes(x = seito, 
             y = total, 
             fill = gender)) +
  geom_bar(aes(), 
           stat = "identity",
           position = "stack") +  # Change to [position = "stack"]  
  geom_text(aes(label = total),
            colour = "black", 
            position = "stack", # Change to [position = "stack"]  
            size = 3) +
  ggtitle("The number of candidates by party & gender in the 2021 HR Election") +
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "bottom")
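With position = "stack", each number is anchored at the top edge of its segment. If you prefer the numbers in the middle of each segment, one option is position_stack(vjust = 0.5), sketched below on a hypothetical data frame with one made-up party. This is an optional variation, not the code used in this section.

```r
library(ggplot2)

# Hypothetical toy data: one party, counts by gender
toy <- data.frame(
  seito  = c("A", "A"),
  gender = c("female", "male"),
  total  = c(1, 3)
)

p_mid <- ggplot(toy, aes(x = seito, y = total, fill = gender)) +
  geom_bar(stat = "identity", position = "stack") +
  geom_text(aes(label = total),
            position = position_stack(vjust = 0.5),  # centre each label in its segment
            size = 3)

# Compare the label positions with the segment boundaries, matched by group
bars   <- layer_data(p_mid, 1)
labels <- layer_data(p_mid, 2)
bars   <- bars[order(bars$group), ]
labels <- labels[order(labels$group), ]
```

Each label's y value equals the midpoint between ymin and ymax of the corresponding bar segment.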

5. How to save a graph

You can save a graph you made in two types of format

Type of format	Extension	File size	Quality
Vector	`.pdf` (recommended), `.svg`	large	Good
Bitmap	`.png` (recommended), `.bmp`, `.jpg`, `.jpeg`	small	OK

Let’s save the graph we made (bar_plot1) in PNG format
Using ggsave(), you save a graph made with {ggplot2}

Step 1:

Name the graph you made, such as bar_plot1 (any name you like)
Make a folder for your graphs, such as Figs, in your RProject folder

Step 2:

Load {ragg}

library(ragg)

ggsave(filename = "Figs/plot1.png", # Where to save the graph and what to name it
       plot = bar_plot1,            # The name of the graph you made
       width = 6,                   # Inches
       height = 3,                  # Inches
       dpi = 400,                   # Graph resolution
       device = ragg::agg_png)      # Avoid garbled characters in the graph

The name of the graph is bar_plot1
You save it under the file name plot1.png
Make a folder for your graphs and name it Figs in your RProject folder
You can customize the size of the graph
You can also customize the graph resolution, such as 400
{ragg} enables us to avoid garbled characters in a graph
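Saving in a vector format works the same way: ggsave() picks the output format from the file extension. The sketch below saves a small hypothetical graph (bar_plot_toy, made from made-up data) as a PDF in a temporary folder. The path out_file is an assumption for illustration, and in practice you would point it at your Figs folder.

```r
library(ggplot2)

# Hypothetical toy graph (not the election data)
toy <- data.frame(year = factor(c(1996, 1996, 2000)))
bar_plot_toy <- ggplot(toy) +
  geom_bar(aes(x = year))

# Hypothetical output path in a temporary folder
out_file <- file.path(tempdir(), "plot1.pdf")

ggsave(filename = out_file,   # the .pdf extension selects a vector format
       plot = bar_plot_toy,
       width = 6,             # Inches
       height = 3)            # Inches
```

The dpi argument is left out here because resolution matters for bitmap formats such as PNG, not for vector formats.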

6. Exercise

Q6.1:
In reference to 3.1 Draw a simple bar chart (1), draw a bar chart of the number of male candidates in the lower house elections in Japan between 1996 and 2021.
Q6.2:
In reference to 3.1 Draw a simple bar chart (1), draw a bar chart of the number of female candidates in the lower house elections in Japan between 1996 and 2021.
Q6.3:
In reference to 3.5.3 Dodge bars, draw a bar chart of the number of candidates, dodged by gender, in the lower house elections in Japan between 1996 and 2021.
Q6.4 (Advanced):
In reference to 3.8 Add a new layer to facet, draw a bar chart of the number of winners (wl = 1 & 2) and losers (wl = 0) by party, faceted by candidate status (status), in the 2021 HR election in Japan.
・Winners include both those who won in a single-member district (wl = 1) and zombie winners (wl = 2)
Q6.5 (Advanced):
In reference to 4.2 A bar chart with numbers displayed (dodge), draw a bar chart of the number of single-member-district winners (wl = 1), zombie winners (wl = 2), and losers (wl = 0) by party, faceted by candidate status (status), in the 2021 HR election in Japan.


5. ggplot2 (Bar Chart)

Masahiko Asano

2022-09-29