Load the following packages
library(DT)
library(gapminder)
library(gghighlight)
library(stargazer)
library(tidyverse)
Line Graph
We are going to deal with the following four cases:
gapminder
)・gapminder
is a built-in data set in R
(1) year |
: 1952-2007 (every 5 years) |
(2) lifeExp |
: Life expectancy |
(3) country |
: Country name |
(4) continent |
: Continent name |
(5) pop |
: Population |
(6) gdpPercap |
: GDP per capita |
gapminder
library(gapminder)
glimpse
functiondata(gapminder)
glimpse(gapminder)
Rows: 1,704
Columns: 6
$ country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
$ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
$ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
$ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
・Show the summary of gapminder
summary(gapminder)
country continent year lifeExp
Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
Algeria : 12 Asia :396 Median :1980 Median :60.71
Angola : 12 Europe :360 Mean :1980 Mean :59.47
Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
Australia : 12 Max. :2007 Max. :82.60
(Other) :1632
pop gdpPercap
Min. :6.001e+04 Min. : 241.2
1st Qu.:2.794e+06 1st Qu.: 1202.1
Median :7.024e+06 Median : 3531.8
Mean :2.960e+07 Mean : 7215.3
3rd Qu.:1.959e+07 3rd Qu.: 9325.5
Max. :1.319e+09 Max. :113523.1
・Show the list of country
unique(gapminder$country)
[1] Afghanistan Albania Algeria
[4] Angola Argentina Australia
[7] Austria Bahrain Bangladesh
[10] Belgium Benin Bolivia
[13] Bosnia and Herzegovina Botswana Brazil
[16] Bulgaria Burkina Faso Burundi
[19] Cambodia Cameroon Canada
[22] Central African Republic Chad Chile
[25] China Colombia Comoros
[28] Congo, Dem. Rep. Congo, Rep. Costa Rica
[31] Cote d'Ivoire Croatia Cuba
[34] Czech Republic Denmark Djibouti
[37] Dominican Republic Ecuador Egypt
[40] El Salvador Equatorial Guinea Eritrea
[43] Ethiopia Finland France
[46] Gabon Gambia Germany
[49] Ghana Greece Guatemala
[52] Guinea Guinea-Bissau Haiti
[55] Honduras Hong Kong, China Hungary
[58] Iceland India Indonesia
[61] Iran Iraq Ireland
[64] Israel Italy Jamaica
[67] Japan Jordan Kenya
[70] Korea, Dem. Rep. Korea, Rep. Kuwait
[73] Lebanon Lesotho Liberia
[76] Libya Madagascar Malawi
[79] Malaysia Mali Mauritania
[82] Mauritius Mexico Mongolia
[85] Montenegro Morocco Mozambique
[88] Myanmar Namibia Nepal
[91] Netherlands New Zealand Nicaragua
[94] Niger Nigeria Norway
[97] Oman Pakistan Panama
[100] Paraguay Peru Philippines
[103] Poland Portugal Puerto Rico
[106] Reunion Romania Rwanda
[109] Sao Tome and Principe Saudi Arabia Senegal
[112] Serbia Sierra Leone Singapore
[115] Slovak Republic Slovenia Somalia
[118] South Africa Spain Sri Lanka
[121] Sudan Swaziland Sweden
[124] Switzerland Syria Taiwan
[127] Tanzania Thailand Togo
[130] Trinidad and Tobago Tunisia Turkey
[133] Uganda United Kingdom United States
[136] Uruguay Venezuela Vietnam
[139] West Bank and Gaza Yemen, Rep. Zambia
[142] Zimbabwe
142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe
・We need the following three variables to draw the graph
(1) year |
: 1952-2007 (every 5 years) |
(2) lifeExp |
: Life expectancy |
(3) country |
: Country name |
・Select the data for Japanese and name the data frame Japan
<- gapminder %>%
Japan ::filter(country == "Japan") %>%
dplyr::select(year, lifeExp)
dplyr
Japan
# A tibble: 12 x 2
year lifeExp
<int> <dbl>
1 1952 63.0
2 1957 65.5
3 1962 68.7
4 1967 71.4
5 1972 73.4
6 1977 75.4
7 1982 77.1
8 1987 78.7
9 1992 79.4
10 1997 80.7
11 2002 82
12 2007 82.6
・Draw the graph for Japanese life expectancy
ggplot(Japan, aes(x = year, y = lifeExp)) +
geom_point() +
geom_line()
・Select the data for Japanese and Chinese and name the data frame jpn.chi
<- gapminder %>%
jpn.chi ::filter(country == "China" | country == "Japan") %>%
dplyr::select(year, country, lifeExp)
dplyr
jpn.chi
# A tibble: 24 x 3
year country lifeExp
<int> <fct> <dbl>
1 1952 China 44
2 1957 China 50.5
3 1962 China 44.5
4 1967 China 58.4
5 1972 China 63.1
6 1977 China 64.0
7 1982 China 65.5
8 1987 China 67.3
9 1992 China 68.7
10 1997 China 70.4
# … with 14 more rows
jpn.chi
::datatable(jpn.chi) DT
ggplot(jpn.chi, aes(x = year, y = lifeExp, color = country)) +
geom_point() +
geom_line() +
labs(x = "Year", y = "Life Expectanct",
title = "Averaged Life Expectanct (1952-2007): Japanese and Chinese")
Question:
Use the bulit-in data setgapminder
, select the two countries you like and draw a line graph showing life expectancy for the two countries.
hr96-17.csv
)data
within your RProjct folderhr96-17.csv
into data
tidyverse
package to read csv.filetidyverse
library(tidyverse)
hr
<- read_csv("data/hr96-17.csv",
hr na = ".") # replace missing data with "."
hr
names(hr)
[1] "year" "pref" "ku" "kun"
[5] "mag" "rank" "wl" "nocand"
[9] "seito" "j_name" "name" "term"
[13] "gender" "age" "exp" "status"
[17] "vote" "voteshare" "eligible" "turnout"
[21] "castvotes" "seshu_dummy" "jiban_seshu" "nojiban_seshu"
variable | detail |
---|---|
year | Election year (1996-2017) |
age | Candidate’s age |
wl | 0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner |
seito | Candidate’s affiliated party |
age.median
age.median
: Winner’s median age for the LDP and DPJ by electionsが必要(1) year |
: Election year (1996-2017) |
(2) age.median |
: Candidate’s median age |
(3) wl | 0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner |
(4) seito | Candidate’s affiliated party |
age.median
is not included in the date frame hr
# | year | age.median | seito |
1 | 1996 | numeric | LDP(自民) |
・ | ・ | ・ | ・ |
・ | ・ | ・ | ・ |
7 | 2017 | numeric | LDP(自民) |
8 | 1996 | numeric | DPJ(民主) |
・ | ・ | ・ | ・ |
・ | ・ | ・ | ・ |
14 | 2014 | numeric | DPJ(民主) |
・Note: the DPJ was dissolved just prior to the 2017 lower house election
age.median
Using dplyr
package
・Check the variable, age prior to calculation
summary(hr$age)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
25.0 43.0 51.0 50.9 59.0 94.0 4
<- hr %>% # Save as seito.median
seito.median ::filter(seito == "自民" | seito == "民主") %>% # 自民党と民主党だけを残す
dplyr::filter(wl == 1) %>% # we only need winners(wl = 1)
dplyrdrop_na(age) %>% # Delete the 5 missing valus (NA's)
group_by(year, seito) %>% # Calculate the median by year & by seito
summarise(age.median = median(age))
::datatable(seito.median) DT
・We got the data needed to draw the line graph
ggplot(seito.median, aes(x = year, y = age.median,
color = seito, linetype = seito, shape = seito)) +
geom_point() +
geom_line() +
lims(y = c(40, 60)) + # y 軸の範囲を指定
ggtitle("Winner's Median Age by Party: 1996-2017") +
scale_color_discrete(name ="Party",
labels = c("DPJ", "LDP")) +
scale_linetype_discrete(name ="Party",
labels = c("DPJ", "LDP")) +
scale_shape_discrete(name ="Party",
labels = c("DPJ", "LDP")) +
ggtitle("Winner's Median Age by Party: 1996-2017") +
labs(x = "Election Year", y = "Median Age") +
theme_bw(base_family = "HiraKakuProN-W3")
Question 1:
Using the Japanese election data hr96-17.csv), draw a line graph of Winner’s voteshare (median) between 1996 and 2017 for the LDP and the DPJ.
Question 2
: What could you say about the line graph?
(1) year |
: Election year (1996-2017) |
(2) voteshare | Voteshare (%) |
(3) wl | 0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner |
(4) seito | Candidate’s affiliated party |
congress.csv
)DW-NOMINATE score
dwnom1 |
: Economic Issue ・・・ -1 (liberal) 〜 1 (consevative) |
dwnom2 |
: Race Issue ・・・ -1 (liberal) 〜 1 (conservative) |
Using Economic dimention (dwnom1), draw a graph of median by party
econ.median
(1) congress |
: Congressional session number (80-112) |
(2) econ.median |
: Politician’s most ideal median points on economy |
(3) party |
: Party of the congressional representative |
econ.median
is not included in the date framedata
congress.csv
in the data
foldercongress.csv
, we need to load readr
package, which is included in tidyverse
packagelibrary(tidyverse)
US
<- read_csv("data/congress.csv") US
summary(US$congress)
Min. 1st Qu. Median Mean 3rd Qu. Max.
80.00 88.00 96.00 96.01 104.00 112.00
NA
)<- US %>%
US ::filter(party == "Republican" | party == "Democrat") %>%
dplyrgroup_by(congress, party) %>%
summarise(econ.median = median(dwnom1))
::datatable(US) DT
Draw a line graph
ggplot(US, aes(x = congress, y = econ.median, color = party)) +
geom_point() +
geom_line() +
theme_bw() +
ggtitle("Political Polarization: ECON Dimention(US Congress:1947-2012)") +
labs(x = "Congress",
y = "DW-NOMINATE score (economic dimention)")
Conclusion ・We can see how the economy score changes over time
・ We can see a clear polarization between two parties
・ The political polarization started around the 95th Congress session
・ Recently, the Democrat becomes more liberal (-) and the Republican becomes more conservative (+)
Question 1:
Using the variable dwnom2
, draw a line graph to check whether we can see any Race polarization in the US Congress (the 80th - the 112nd) between the Democrat and the Republican.
Question 2:
What could you say about the US polarization in Race issue?
DW-NOMINATE score
dwnom1 |
: Economic Issue ・・・ -1 (liberal) 〜 1 (consevative) |
dwnom2 |
: Race Issue ・・・ -1 (liberal) 〜 1 (conservative) |
COVID19_Worldwide.csv
)Variables | Details |
---|---|
ID | ID |
Country | Country name |
Date | Date survey conducted |
Confirmed_Day | The number of people infected COVID-19 per day |
Confirmed_Total | The cummulated number of people infected COVID-19 (total) |
Death_Day | The number of death of COVID-19 per day |
Death_Total | The cummulated number of people died of COVID-19 (total) |
Test_Day | The number of people newly tested COVID-19 per day |
Test_Total | The cummulated number of people tested COVID-19 (total) |
<- read_csv("data/COVID19_Worldwide.csv",
COVID19_df guess_max = 10000)
# Set to judge the class after reading the first 10000 cases
str(COVID19_df)
spec_tbl_df [31,806 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ID : num [1:31806] 1 2 3 4 5 6 7 8 9 10 ...
$ Country : chr [1:31806] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
$ Date : chr [1:31806] "2020/1/22" "2020/1/23" "2020/1/24" "2020/1/25" ...
$ Confirmed_Day : num [1:31806] 0 0 0 0 0 0 0 0 0 0 ...
$ Confirmed_Total: num [1:31806] 0 0 0 0 0 0 0 0 0 0 ...
$ Death_Day : num [1:31806] 0 0 0 0 0 0 0 0 0 0 ...
$ Death_Total : num [1:31806] 0 0 0 0 0 0 0 0 0 0 ...
$ Test_Day : num [1:31806] NA NA NA NA NA NA NA NA NA NA ...
$ Test_Total : num [1:31806] NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "spec")=
.. cols(
.. ID = col_double(),
.. Country = col_character(),
.. Date = col_character(),
.. Confirmed_Day = col_double(),
.. Confirmed_Total = col_double(),
.. Death_Day = col_double(),
.. Death_Total = col_double(),
.. Test_Day = col_double(),
.. Test_Total = col_double()
.. )
chr
, we need to change it into Date
<- COVID19_df %>%
COVID19_df mutate(Date = as.Date(Date))
library(stargazer)
{r, results = "asis"}
as chunk optionstargazer(as.data.frame(COVID19_df),
type ="html",
digits = 2)
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
ID | 31,806 | 16,124.51 | 9,379.38 | 1 | 7,952.2 | 24,235.8 | 32,413 |
Confirmed_Day | 31,806 | 392.96 | 2,430.30 | -10,034 | 0 | 58 | 66,627 |
Confirmed_Total | 31,806 | 18,250.14 | 115,471.60 | 0 | 0 | 2,505.8 | 3,184,582 |
Death_Day | 31,806 | 17.61 | 112.88 | -1,918 | 0 | 1 | 2,614 |
Death_Total | 31,806 | 1,039.01 | 6,565.51 | 0 | 0 | 55 | 134,094 |
Test_Day | 8,281 | 14,332.71 | 61,911.94 | -3,743.00 | 839.00 | 9,041.00 | 2,022,722.00 |
Test_Total | 8,686 | 559,831.30 | 2,146,252.00 | 1.00 | 21,545.25 | 353,916.50 | 39,011,749.00 |
::datatable(COVID19_df) DT
Date
Death_Total
)Death_Total
%>%
COVID19_df ggplot() +
geom_line(aes(x = Date,
y = Death_Total)) +
scale_y_continuous(breaks = c(10, 100, 1000, 10000, 100000, 1000000),
labels = c("10", "100", "1000", "10000",
"100000", "1000000"),
trans = "log10") +
labs(x = "Month", y = "Cummurated number death", color = "Country")
Date
(x-axis) has overlappping valuesunique(COVID19_df$Country)
[1] "Afghanistan" "Albania"
[3] "Algeria" "Andorra"
[5] "Angola" "Antigua and Barbuda"
[7] "Argentina" "Armenia"
[9] "Australia" "Austria"
[11] "Azerbaijan" "Bahamas"
[13] "Bahrain" "Bangladesh"
[15] "Barbados" "Belarus"
[17] "Belgium" "Belize"
[19] "Benin" "Bhutan"
[21] "Bolivia" "Bosnia and Herzegovina"
[23] "Botswana" "Brazil"
[25] "Brunei" "Bulgaria"
[27] "Burkina Faso" "Burma"
[29] "Burundi" "Cabo Verde"
[31] "Cambodia" "Cameroon"
[33] "Canada" "Central African Republic"
[35] "Chad" "Chile"
[37] "China" "Colombia"
[39] "Comoros" "Congo (Brazzaville)"
[41] "Congo (Kinshasa)" "Costa Rica"
[43] "Cote d'Ivoire" "Croatia"
[45] "Cuba" "Cyprus"
[47] "Czechia" "Denmark"
[49] "Djibouti" "Dominica"
[51] "Dominican Republic" "Ecuador"
[53] "Egypt" "El Salvador"
[55] "Equatorial Guinea" "Eritrea"
[57] "Estonia" "Eswatini"
[59] "Ethiopia" "Fiji"
[61] "Finland" "France"
[63] "Gabon" "Gambia"
[65] "Georgia" "Germany"
[67] "Ghana" "Greece"
[69] "Grenada" "Guatemala"
[71] "Guinea" "Guinea-Bissau"
[73] "Guyana" "Haiti"
[75] "Holy See" "Honduras"
[77] "Hungary" "Iceland"
[79] "India" "Indonesia"
[81] "Iran" "Iraq"
[83] "Ireland" "Israel"
[85] "Italy" "Jamaica"
[87] "Japan" "Jordan"
[89] "Kazakhstan" "Kenya"
[91] "South Korea" "Kosovo"
[93] "Kuwait" "Kyrgyzstan"
[95] "Laos" "Latvia"
[97] "Lebanon" "Lesotho"
[99] "Liberia" "Libya"
[101] "Liechtenstein" "Lithuania"
[103] "Luxembourg" "Madagascar"
[105] "Malawi" "Malaysia"
[107] "Maldives" "Mali"
[109] "Malta" "Mauritania"
[111] "Mauritius" "Mexico"
[113] "Moldova" "Monaco"
[115] "Mongolia" "Montenegro"
[117] "Morocco" "Mozambique"
[119] "Namibia" "Nepal"
[121] "Netherlands" "New Zealand"
[123] "Nicaragua" "Niger"
[125] "Nigeria" "North Macedonia"
[127] "Norway" "Oman"
[129] "Pakistan" "Panama"
[131] "Papua New Guinea" "Paraguay"
[133] "Peru" "Philippines"
[135] "Poland" "Portugal"
[137] "Qatar" "Romania"
[139] "Russia" "Rwanda"
[141] "Saint Kitts and Nevis" "Saint Lucia"
[143] "Saint Vincent and the Grenadines" "San Marino"
[145] "Sao Tome and Principe" "Saudi Arabia"
[147] "Senegal" "Serbia"
[149] "Seychelles" "Sierra Leone"
[151] "Singapore" "Slovakia"
[153] "Slovenia" "Somalia"
[155] "South Africa" "South Sudan"
[157] "Spain" "Sri Lanka"
[159] "Sudan" "Suriname"
[161] "Sweden" "Switzerland"
[163] "Syria" "Taiwan"
[165] "Tajikistan" "Tanzania"
[167] "Thailand" "Timor-Leste"
[169] "Togo" "Trinidad and Tobago"
[171] "Tunisia" "Turkey"
[173] "Uganda" "Ukraine"
[175] "United Arab Emirates" "United Kingdom"
[177] "United States" "Uruguay"
[179] "Uzbekistan" "Venezuela"
[181] "Vietnam" "West Bank and Gaza"
[183] "Western Sahara" "Yemen"
[185] "Zambia" "Zimbabwe"
<- c("Cananda", "France", "Germany", "Italy", "Japan",
G7 "United Kingdom", "United States")
%>%
COVID19_df filter(Country %in% G7) %>%
ggplot() +
geom_line(aes(x = Date,
y = Death_Total,
color = Country)) +
scale_y_continuous(breaks = c(10, 100, 1000, 10000, 100000, 1000000),
labels = c("10", "100", "1000", "10000",
"100000", "1000000"),
trans = "log10") +
labs(x = "Month", y = "Cummurated number death", color = "Country")
fct_reorder2()
function%>%
COVID19_df filter(Country %in% G7) %>%
mutate(Country = fct_reorder2(Country,
Date,
Death_Total, %>%
last2)) ggplot() +
geom_line(aes(x = Date,
y = Death_Total,
color = Country)) +
scale_y_continuous(breaks = c(10, 100, 1000, 10000, 100000, 1000000),
labels = c("10", "100", "1000", "10000",
"100000", "1000000"),
trans = "log10") +
labs(x = "Month", y = "Cummurated number death", color = "Country")
- It gets much better!
gglighlight
package, you can put emphasis on certain coutrieslibrary(gghighlight)
%>%
COVID19_df ggplot() +
geom_line(aes(x = Date,
y = Death_Total,
color = Country)) +
gghighlight(Country %in% c("Japan", "China", "South Korea",
"United States", "Taiwan")) +
scale_y_continuous(breaks = c(10, 100, 1000, 10000, 100000, 1000000),
labels = c("10", "100", "1000", "10000",
"100000", "1000000"),
trans = "log10") +
labs(x = "Month", y = "Cummurated number death", color = "Country")
Question 1:
Pick up 5 countries you like and draw a line graph of cummurated number of death and date, highliting them.
Question 2:
What could you say about the ling graph you get?