• R packages used in this section
library(DT)
library(gapminder)
library(gghighlight)
library(ggrepel)
library(stargazer)
library(tidyverse)

1. What is a bar chart?

  • A histogram and a bar chart look very similar
  • A bar chart visualizes a discrete variable
  • A histogram visualizes a continuous variable
  • A bar chart has gaps between bars
  • A histogram has no gaps between bars
Variable     Graph       Feature
Discrete     Bar chart   Gaps between bars
Continuous   Histogram   No gaps between bars

Bar chart:
  • x-axis: Election year (1996-2021)
  • y-axis: The average number of candidates in each electoral district
  • For instance, no election was held between 2017 and 2021, so the chart simply shows one bar for each election that was held
  • Election year looks like a number, but a bar chart does not treat it as one
  • The value itself (e.g., 2017) carries no numeric meaning here; it only labels each election
Histogram:
  • x-axis: Vote share (%) in lower house elections in Japan (1996-2021)
  • y-axis: The number of candidates
  • Vote share can take any value from 0% to 100%, such as 0.1% or 0.01%, so it is a continuous variable
  • A histogram groups these values into ranges (bins) and draws one bar per bin, with no gaps between the bars
    → Each bar carries two pieces of information:
  1. its location on the x-axis (the range of vote shares it covers)
  2. its height on the y-axis (the number of candidates in that range)
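To make the discrete/continuous distinction concrete, the sketch below counts values in the two ways using base R. The election-year vector is rebuilt from the per-election candidate counts that table(df$year) reports in Section 3.1, and the vote shares are the first ten values shown by str(df) in Section 2; the 20-point bin width is an arbitrary choice for illustration only.

```r
# Election years, rebuilt from the candidate counts per election
# reported by table(df$year) in Section 3.1
years  <- c(1996, 2000, 2003, 2005, 2009, 2012, 2014, 2017, 2021)
counts <- c(1261, 1199, 1026,  989, 1139, 1294,  959,  936,  857)
year <- rep(years, times = counts)

# Discrete variable (bar chart): each distinct year gets its own bar,
# and years without an election (e.g., 2018-2020) simply do not appear
year_counts <- table(factor(year))
print(year_counts)

# Continuous variable (histogram): the first ten vote shares (%) from str(df)
voteshare <- c(40, 25.7, 20.1, 13.3, 0.4, 0.3, 0.2, 32.9, 26.4, 25.7)

# A histogram first groups values into adjacent bins (20 points wide here,
# an arbitrary choice), then counts how many values fall into each bin
bins <- cut(voteshare, breaks = seq(0, 100, by = 20), include.lowest = TRUE)
bin_counts <- table(bins)
print(bin_counts)
```

The bar chart counts exact values, while the histogram counts ranges; changing the breaks changes the histogram's bars, but nothing comparable exists for the election years.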

2. Data preparation

The lower house election data in Japan: 1996-2021 (hr96-21.csv)
・Click hr96-21.csv and download it to your computer
・Read the election data and name it df

# "." marks missing values in hr96-21.csv, so read it as NA
df <- read_csv("data/hr96-21.csv",
               na = ".")
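The na = "." argument matters because a column that mixes numbers with "." would otherwise be read as text. The sketch below shows the effect with base R's read.csv() and its na.strings argument, used here as the base-R counterpart of read_csv()'s na. The three rows copy values shown by str(df) below, and the inline text= input stands in for the real file.

```r
# Three rows shaped like hr96-21.csv; "." marks a missing expenditure
csv_text <- "year,ku,kun,exp
1996,aichi,1,9828097
1996,aichi,1,2177203
1996,aichi,1,."

# Without declaring "." as missing, exp cannot be parsed as numbers
raw <- read.csv(text = csv_text)
print(class(raw$exp))

# With na.strings = ".", exp becomes numeric and "." becomes NA
clean <- read.csv(text = csv_text, na.strings = ".")
print(class(clean$exp))
print(clean$exp)
```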

Check whether the data were read correctly

  • hr96-21.csv is a collection of Japanese lower house election data covering 9 general elections (1996, 2000, 2003, 2005, 2009, 2012, 2014, 2017, 2021)
  • Check the names of the variables in df
names(df)
 [1] "year"          "pref"          "ku"            "kun"          
 [5] "wl"            "rank"          "nocand"        "seito"        
 [9] "j_name"        "gender"        "name"          "previous"     
[13] "age"           "exp"           "status"        "vote"         
[17] "voteshare"     "eligible"      "turnout"       "seshu_dummy"  
[21] "jiban_seshu"   "nojiban_seshu"
  • df has the following 22 variables
variable       detail
year           Election year (1996-2021)
pref           Prefecture (in Japanese)
ku             Electoral district name
kun            Electoral district number within the prefecture
rank           Rank by votes within the district (1 = most votes)
wl             0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner (lost the smd race but won a proportional representation seat)
nocand         Number of candidates in each district
seito          Candidate’s party affiliation (in Japanese)
j_name         Candidate’s name (Japanese)
name           Candidate’s name (English)
previous       Number of previous wins
gender         Candidate’s gender: “male”, “female”
age            Candidate’s age
exp            Election expenditure (yen) spent by each candidate
status         0 = challenger / 1 = incumbent / 2 = former incumbent
vote           Votes each candidate received
voteshare      Vote share (%)
eligible       Eligible voters in each district
turnout        Turnout in each district (%)
castvote       Total votes cast in each district (not among the 22 columns returned by names(df))
seshu_dummy    0 = non-hereditary candidate / 1 = hereditary candidate
jiban_seshu    Relationship between candidate and his predecessor
nojiban_seshu  Relationship between candidate and his predecessor
  • Check the class of each variable in df
str(df)
spec_tbl_df [9,660 × 22] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ year         : num [1:9660] 1996 1996 1996 1996 1996 ...
 $ pref         : chr [1:9660] "愛知" "愛知" "愛知" "愛知" ...
 $ ku           : chr [1:9660] "aichi" "aichi" "aichi" "aichi" ...
 $ kun          : num [1:9660] 1 1 1 1 1 1 1 2 2 2 ...
 $ wl           : num [1:9660] 1 0 0 0 0 0 0 1 0 2 ...
 $ rank         : num [1:9660] 1 2 3 4 5 6 7 1 2 3 ...
 $ nocand       : num [1:9660] 7 7 7 7 7 7 7 8 8 8 ...
 $ seito        : chr [1:9660] "新進" "自民" "民主" "共産" ...
 $ j_name       : chr [1:9660] "河村たかし" "今枝敬雄" "佐藤泰介" "岩中美保子" ...
 $ gender       : chr [1:9660] "male" "male" "male" "female" ...
 $ name         : chr [1:9660] "KAWAMURA, TAKASHI" "IMAEDA, NORIO" "SATO, TAISUKE" "IWANAKA, MIHOKO" ...
 $ previous     : num [1:9660] 2 2 2 0 0 0 0 2 0 0 ...
 $ age          : num [1:9660] 47 72 53 43 51 51 45 51 71 30 ...
 $ exp          : num [1:9660] 9828097 9311555 9231284 2177203 NA ...
 $ status       : num [1:9660] 1 2 1 0 0 0 0 1 2 0 ...
 $ vote         : num [1:9660] 66876 42969 33503 22209 616 ...
 $ voteshare    : num [1:9660] 40 25.7 20.1 13.3 0.4 0.3 0.2 32.9 26.4 25.7 ...
 $ eligible     : num [1:9660] 346774 346774 346774 346774 346774 ...
 $ turnout      : num [1:9660] 49.2 49.2 49.2 49.2 49.2 49.2 49.2 51.8 51.8 51.8 ...
 $ seshu_dummy  : num [1:9660] 0 0 0 0 0 0 0 0 1 0 ...
 $ jiban_seshu  : chr [1:9660] NA NA NA NA ...
 $ nojiban_seshu: chr [1:9660] NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   year = col_double(),
  ..   pref = col_character(),
  ..   ku = col_character(),
  ..   kun = col_double(),
  ..   wl = col_double(),
  ..   rank = col_double(),
  ..   nocand = col_double(),
  ..   seito = col_character(),
  ..   j_name = col_character(),
  ..   gender = col_character(),
  ..   name = col_character(),
  ..   previous = col_double(),
  ..   age = col_double(),
  ..   exp = col_double(),
  ..   status = col_double(),
  ..   vote = col_double(),
  ..   voteshare = col_double(),
  ..   eligible = col_double(),
  ..   turnout = col_double(),
  ..   seshu_dummy = col_double(),
  ..   jiban_seshu = col_character(),
  ..   nojiban_seshu = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 
  • chr means character and num means numeric
    → Every variable has the expected class, so the data were read without problems
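str() prints a lot of detail. For a quicker per-column check, vapply() with class() returns one class name per variable. The sketch below uses a small hand-built data frame, holding the first four rows of four columns copied from str(df), as a stand-in for df; with the real data, the same vapply() line would be called on df.

```r
# Stand-in for df: first four rows of four columns, copied from str(df)
small <- data.frame(
  year      = c(1996, 1996, 1996, 1996),
  ku        = c("aichi", "aichi", "aichi", "aichi"),
  gender    = c("male", "male", "male", "female"),
  voteshare = c(40, 25.7, 20.1, 13.3),
  stringsAsFactors = FALSE  # keep text as character, as read_csv() does
)

# One class name per column: "numeric" corresponds to num, "character" to chr
classes <- vapply(small, class, character(1))
print(classes)
```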

Descriptive statistics: df

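library(stargazer)
  • In R Markdown, set results = "asis" in the chunk options, i.e., {r, results = "asis"}, so that the HTML produced by stargazer() is rendered as a table instead of being printed as raw text
# stargazer() may not summarize a tibble correctly, so convert df to a plain data frame first
stargazer(as.data.frame(df),
          type = "html",  # output an HTML table
          digits = 2)     # round statistics to two decimal places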
Statistic     N      Mean          St. Dev.      Min       Max
year          9,660  2,007.88      7.68          1,996     2,021
kun           9,660  5.75          5.07          1         25
wl            9,660  0.48          0.67          0         2
rank          9,660  2.65          20.39         1         2,003
nocand        9,660  3.89          1.09          2         9
previous      9,660  1.48          2.46          0         19
age           9,656  51.22         11.13         25        94
exp           6,829  7,551,393.00  5,482,684.00  535       27,462,362
status        9,660  0.49          0.62          0         2
vote          9,660  55,987.87     40,626.34     177       210,515
voteshare     9,660  27.67         19.34         0.10      95.30
eligible      8,785  330,268.30    80,058.87     115,013   495,212
turnout       7,849  62.09         6.53          44.71     83.80
seshu_dummy   8,875  0.12          0.33          0         1
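The N column differs across variables (e.g., 6,829 for exp versus 9,660 for year) because each statistic is computed over non-missing values only. The base-R sketch below reproduces the five statistics for a single variable. The helper name describe() is hypothetical, and the input is the first five exp values from str(df), one of which is NA.

```r
# Hypothetical helper: the five statistics shown in the summary table,
# each computed after dropping missing values
describe <- function(x) {
  x <- x[!is.na(x)]
  c(N = length(x), Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
}

# First five values of exp (yen) from str(df); the fifth is missing
exp_head <- c(9828097, 9311555, 9231284, 2177203, NA)
stats <- describe(exp_head)
print(round(stats, 2))
```

Here N is 4, not 5, for the same reason that exp has a smaller N than year in the full table.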

3. Draw a bar chart

3.1 Draw a simple bar chart (1)

  • Let’s draw a simple bar chart of the number of candidates in each of Japan’s lower house elections between 1996 and 2021 (9 elections)
  • You use two functions, ggplot() and geom_bar()
Function     What it does
ggplot()     Prepare a canvas
geom_bar()   Draw a bar chart
  • You need the following two pieces of information:
  1. year (1996, 2000, 2003,…, 2021)
  2. the number of candidates in each election
  • You use the data frame df
  • Using table(), check the number of candidates in each election
table(df$year)

1996 2000 2003 2005 2009 2012 2014 2017 2021 
1261 1199 1026  989 1139 1294  959  936  857 
  • You now have the two pieces of information needed to draw a bar chart
  • Map year to the x-axis and the number of candidates to the y-axis
  • df contains year, but it has no variable for the number of candidates
  • In drawing a bar chart, you only need to map x in geom_bar(aes())
  • You do not have to map y: {ggplot2} counts the rows (candidates) for each value of x and uses that count as the bar height
    → So you write geom_bar(aes(x = year))
  • Adding a labs() layer lets you label the x-axis and y-axis
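df %>% 
  ggplot() +
  # geom_bar() counts the candidates (rows) in each year, so only x is mapped
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", 
       y = "The number of candidates") + 
  ggtitle("The number of candidates in Japan's lower house elections (1996-2021)") +
  # HiraKakuProN-W3 is a Japanese font bundled with macOS;
  # on other systems, set base_family to a font installed there
  theme_bw(base_family = "HiraKakuProN-W3")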
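Because year is stored as a number (num), the code above puts the bars on a continuous x-axis, which is typically labelled with round numbers rather than with the election years themselves. If you want one evenly spaced, labelled bar per election, consistent with treating year as discrete, one option is to map factor(year). The sketch below rebuilds a one-row-per-candidate data frame from the counts printed by table(df$year), so it runs without the CSV file, and uses ggplot2's layer_data() to confirm that geom_bar() computed the same counts. It uses plain theme_bw() to avoid depending on a macOS font.

```r
library(ggplot2)

# Stand-in for df: one row per candidate, rebuilt from table(df$year)
years  <- c(1996, 2000, 2003, 2005, 2009, 2012, 2014, 2017, 2021)
counts <- c(1261, 1199, 1026,  989, 1139, 1294,  959,  936,  857)
cand <- data.frame(year = rep(years, times = counts))

# factor(year) makes x discrete: one labelled bar per election
p <- ggplot(cand) +
  geom_bar(aes(x = factor(year))) +
  labs(x = "Election Year",
       y = "The number of candidates") +
  theme_bw()

# geom_bar() counts rows per x value; inspect the counts it computed
bar_counts <- layer_data(p)$count
print(bar_counts)
```

The computed bar heights match table(df$year), which is why no y mapping is needed.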