R packages we use in this section
library(broom)
library(ggstance)
library(interplot)
library(margins)
library(msm)
library(patchwork)
library(stargazer)
library(tidyverse)
Types of moderator variable: categorical or continuous
Does the impact of campaign expenditure on vote share differ depending on the number of eligible voters?
Marginal Effects
・Marginal effects are often calculated when analyzing regression results.
・Marginal effects are equivalent to slopes in regression analysis.
・A marginal effect tells us how a dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold age, nocand, and previous constant.
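Put hr96-21.csv in the data folder in your RProject folder and read it as hr.
The variable names in hr and the election years it covers are: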
[1] "year" "pref" "ku" "kun"
[5] "wl" "rank" "nocand" "seito"
[9] "j_name" "gender" "seshu_dummy" "jiban_seshu"
[13] "name" "previous" "...15" "age"
[17] "exp" "status" "vote" "voteshare"
[21] "eligible" "turnout" "castvotes" "...24"
[25] "...25" "nojiban_seshu" "mag"
[1] 1996 2000 2003 2005 2009 2012 2014 2017 2021
df1 <- hr %>%
dplyr::filter(year == 2005) %>%
dplyr::select(year, age, voteshare, exp, eligible, nocand, previous)
df1 contains the following 7 variables
variable | detail |
---|---|
year | Election Year |
age | Candidate’s age |
voteshare | Voteshare (%) |
exp | Election expenditure (yen) spent by each candidate |
eligible | Eligible voters in each district |
nocand | Number of candidates in each district |
previous | Number of previous terms served as a lower house member |
We add two new variables to df1:
・exppv: campaign expenditure spent by each candidate per voter in their electoral district (yen)
・eligible.t: the number of eligible voters in thousands
Descriptive statistics of df1:
Statistic | N | Mean | St. Dev. | Min | Max |
---|---|---|---|---|---|
year | 989 | 2,005.000 | 0.000 | 2,005 | 2,005 |
age | 989 | 50.292 | 10.871 | 25 | 81 |
voteshare | 989 | 30.333 | 19.230 | 0.600 | 73.600 |
exp | 985 | 8,142,244.000 | 5,569,641.000 | 62,710 | 24,649,710 |
eligible | 989 | 344,654.300 | 63,898.230 | 214,235 | 465,181 |
nocand | 989 | 3.435 | 0.740 | 2 | 6 |
previous | 989 | 1.550 | 2.412 | 0 | 15 |
exppv | 985 | 24.627 | 17.907 | 0.148 | 89.332 |
eligible.t | 989 | 344.654 | 63.898 | 214.235 | 465.181 |
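The code that creates exppv and eligible.t is not shown in this section. A minimal sketch, assuming exppv is exp divided by eligible and eligible.t is eligible divided by 1,000 (which matches the means in the table above); the toy data frame and its values are hypothetical:

```r
library(dplyr)

# Toy rows for illustration only (hypothetical values, not from hr96-21.csv)
toy <- data.frame(
  exp      = c(9000000, 3000000),  # campaign expenditure (yen)
  eligible = c(300000, 400000)     # eligible voters in the district
)

toy <- toy %>%
  mutate(exppv      = exp / eligible,   # yen spent per eligible voter
         eligible.t = eligible / 1000)  # eligible voters in thousands

toy
```

Applying the same mutate() call to df1 would presumably add the two columns summarized above.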
# Scatter plot of campaign expenditure per voter against vote share
F1 <- ggplot(df1, aes(exppv, voteshare)) +
  geom_point() +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)",
       title = "Campaign expenditure and vote share") +
  stat_smooth(method = "lm", se = FALSE)
# Scatter plot of eligible voters (thousands) against vote share
F2 <- ggplot(df1, aes(eligible.t, voteshare)) +
  geom_point() +
  labs(x = "Eligible voters (thousands)", y = "Vote share (%)",
       title = "Eligible voters and vote share") +
  stat_smooth(method = "lm", se = FALSE)
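F1 and F2 are stored but the line that draws them is not shown. Since patchwork is loaded at the top of this section, one plausible way to display two plots side by side is the + operator. A small sketch on hypothetical toy data (p1, p2, and combined are illustrative names):

```r
library(ggplot2)
library(patchwork)

# Toy data for illustration only (hypothetical values)
toy <- data.frame(exppv      = c(5, 20, 40, 60),
                  eligible.t = c(250, 300, 350, 400),
                  voteshare  = c(10, 25, 40, 55))

p1 <- ggplot(toy, aes(exppv, voteshare)) + geom_point()
p2 <- ggplot(toy, aes(eligible.t, voteshare)) + geom_point()

# patchwork overloads + so that two ggplot objects are placed side by side
combined <- p1 + p2
combined
```

With the objects from this section, the equivalent call would be F1 + F2.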
Scatter plots:
・F1: exppv and voteshare
・F2: eligible.t and voteshare
age, nocand, and previous are control variables.
Note
・A control variable is anything that is held constant or limited in a research study.
・It is a variable that is not of interest to our research aims, but is controlled because it could influence the outcome.
・In actual research, we need to add more control variables, because it is reasonable to assume that other variables (such as the candidate's age, campaign expenditure, or gender) could also influence vote share.
・Here I add three control variables: age, nocand, and previous.
To see interaction effects, we make an interaction term by multiplying the major independent variable (in this case, exppv) and a numerical moderator variable (in this case, eligible.t): eligible.t:exppv
We make an interaction term by multiplying the following two independent variables:
・Major independent variable (exppv)
・Moderator variable (eligible.t)
Note
・A moderator variable can be one of two types: categorical or continuous.
・In this section, I will deal with a continuous (numeric) moderator variable.
・In 20. Multiple Regression 3 (Interaction Effects 1), I will deal with a categorical moderator variable.
Model_1
A moderator variable (eligible.t) influences the relationship between an explanatory variable and an outcome variable.
The slope (exppv → voteshare) differs depending on the number of eligible voters.
This slope is equivalent to the marginal effect.
We estimate the following model:
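\[\begin{aligned} \textrm{voteshare} = {} & \alpha_1 + \alpha_2 \textrm{exppv} + \alpha_3 \textrm{eligible.t} + \alpha_4 \textrm{eligible.t} \times \textrm{exppv} \\ & + \alpha_5 \textrm{age} + \alpha_6 \textrm{nocand} + \alpha_7 \textrm{previous} + \varepsilon \end{aligned}\]
\[\begin{aligned} \textrm{voteshare} = {} & \alpha_1 + (\alpha_2 + \alpha_4 \textrm{eligible.t}) \textrm{exppv} + \alpha_3 \textrm{eligible.t} \\ & + \alpha_5 \textrm{age} + \alpha_6 \textrm{nocand} + \alpha_7 \textrm{previous} + \varepsilon \end{aligned}\]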
\(\alpha_2\) | slope (exppv → voteshare) when eligible.t = 0 |
\((\alpha_2 + \alpha_4 \textrm{eligible.t})\) | slope (exppv → voteshare) = Marginal Effect |
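Why the bracketed term is the marginal effect can be seen by differentiating the model with respect to exppv. A short worked step, using the same symbols as the equations above and treating the other variables as fixed:

```latex
\[
\frac{\partial\, \textrm{voteshare}}{\partial\, \textrm{exppv}}
  = \alpha_2 + \alpha_4\, \textrm{eligible.t}
\]
```

Setting eligible.t = 0 reduces this to \(\alpha_2\), which is why the coefficient on exppv alone describes only that special case.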
Does the impact of campaign expenditure on vote share differ depending on the number of eligible voters?
Marginal Effects
・Marginal effects are often calculated when analyzing regression results.
・Marginal effects are equivalent to slopes in regression analysis.
・A marginal effect tells us how a dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold age, nocand, and previous constant.
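The lm() calls that produce model_1 and model_2 are not shown in this section. The sketch below is an assumption based on the term order in the regression output and on the pattern used for model_5 and model_6 in the exercise; it runs on simulated data (sim, m_int, and m_noint are hypothetical names), not on the 2005 election data:

```r
set.seed(1)

# Simulated data for illustration only (not the 2005 election data)
n <- 200
sim <- data.frame(
  exppv      = runif(n, 0, 90),
  eligible.t = runif(n, 214, 465),
  age        = round(runif(n, 25, 81)),
  nocand     = sample(2:6, n, replace = TRUE),
  previous   = sample(0:15, n, replace = TRUE)
)
sim$voteshare <- 30 + 0.1 * sim$exppv +
  0.002 * sim$exppv * sim$eligible.t + rnorm(n, sd = 5)

# exppv * eligible.t expands to exppv + eligible.t + exppv:eligible.t
m_int <- lm(voteshare ~ exppv * eligible.t + age + nocand + previous,
            data = sim)

# The same specification without the interaction term
m_noint <- lm(voteshare ~ exppv + eligible.t + age + nocand + previous,
              data = sim)

# The interaction term is listed last, after all main effects
names(coef(m_int))
```

Because R places interaction terms after the main effects, exppv is the 2nd coefficient and exppv:eligible.t the 7th, which is the indexing used later when the marginal effects are calculated.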
tidy()
# A tibble: 7 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 38.7 3.95 9.80 1.12e-21
2 exppv 0.0682 0.0972 0.702 4.83e- 1
3 eligible.t 0.00536 0.00914 0.586 5.58e- 1
4 age -0.277 0.0355 -7.81 1.50e-14
5 nocand -4.88 0.470 -10.4 4.86e-24
6 previous 2.96 0.184 16.1 5.28e-52
7 exppv:eligible.t 0.00175 0.000302 5.78 9.93e- 9
stargazer()
stargazer(model_1,
digits = 3,
style = "ajps",
title = "Results of model_1 (2005HR election)",
type ="html")
Term | voteshare |
---|---|
exppv | 0.068 (0.097) |
eligible.t | 0.005 (0.009) |
age | -0.277*** (0.035) |
nocand | -4.881*** (0.470) |
previous | 2.960*** (0.184) |
exppv:eligible.t | 0.002*** (0.0003) |
Constant | 38.680*** (3.949) |
N | 985 |
R-squared | 0.691 |
Adj. R-squared | 0.689 |
Residual Std. Error | 10.706 (df = 978) |
F Statistic | 364.721*** (df = 6; 978) |
Standard errors in parentheses. ***p < .01; **p < .05; *p < .1
In addition to model_1, let's make another model which does not include the interaction term (model_2) and compare them to better understand the results.
stargazer()
stargazer(model_1, model_2,
digits = 3,
style = "ajps",
title = "Results of model_1 and model_2(2005HR election)",
type ="html")
Dependent variable: voteshare
Term | Model 1 | Model 2 |
---|---|---|
exppv | 0.068 (0.097) | 0.614*** (0.024) |
eligible.t | 0.005 (0.009) | 0.047*** (0.006) |
age | -0.277*** (0.035) | -0.279*** (0.036) |
nocand | -4.881*** (0.470) | -4.909*** (0.478) |
previous | 2.960*** (0.184) | 3.130*** (0.184) |
exppv:eligible.t | 0.002*** (0.0003) | |
Constant | 38.680*** (3.949) | 25.169*** (3.235) |
N | 985 | 985 |
R-squared | 0.691 | 0.681 |
Adj. R-squared | 0.689 | 0.679 |
Residual Std. Error | 10.706 (df = 978) | 10.882 (df = 979) |
F Statistic | 364.721*** (df = 6; 978) | 417.161*** (df = 5; 979) |
Standard errors in parentheses. ***p < .01; **p < .05; *p < .1
Important point
The coefficient of exppv in model_1 is the slope of exppv when eligible.t = 0.
→ Since eligible.t = 0 is unrealistic, we need to check the marginal effects as the number of eligible voters varies.
The coefficient of exppv (0.068) in model_1
→ When exppv increases by one yen, voteshare increases by 0.068 percentage points when eligible.t = 0.
To confirm this, let’s check the Sample Regression Function equations for model_1.
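\[\begin{aligned} \widehat{\textrm{voteshare}} = {} & 38.7 + 0.068\, \textrm{exppv} + 0.002\, \textrm{eligible.t} \times \textrm{exppv} + 0.005\, \textrm{eligible.t} \\ & - 0.28\, \textrm{age} - 4.9\, \textrm{nocand} + 2.96\, \textrm{previous} \end{aligned}\]
\[\begin{aligned} \phantom{\widehat{\textrm{voteshare}}} = {} & 38.7 + (0.068 + 0.002\, \textrm{eligible.t})\, \textrm{exppv} + 0.005\, \textrm{eligible.t} \\ & - 0.28\, \textrm{age} - 4.9\, \textrm{nocand} + 2.96\, \textrm{previous} \end{aligned}\]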
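The marginal effect of exppv on voteshare, \((\alpha_2 + \alpha_4 \textrm{eligible.t})\), is:
\[0.068 + 0.002\, \textrm{eligible.t}\]
Slope (exppv → voteshare):
When eligible.t is one unit (which means one thousand voters), a one-yen increase in exppv raises vote share by 0.07 percentage points (0.068 + 0.002*1 = 0.07). Each additional thousand eligible voters raises this slope by 0.002.
Next, we calculate the marginal effect at several values of eligible.t.
First, check the intercept and coefficients of model_1: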
(Intercept) exppv eligible.t age
38.680300268 0.068168095 0.005359147 -0.276786422
nocand previous exppv:eligible.t
-4.880536533 2.959739246 0.001746856
Summary of the moderator variable (eligible.t):
 Min. 1st Qu. Median Mean 3rd Qu. Max.
214.2 297.4 347.9 344.7 397.2 465.2
We calculate the marginal effect at values of eligible.t between its min (214.2) and max (465.2) at intervals of 50.
We use the 2nd coefficient (exppv) and the 7th coefficient (exppv:eligible.t) to calculate these marginal effects.
at.eligible.t <- seq(214.2, 465.2, 50)
slopes <- model_1$coef[2] + model_1$coef[7]*at.eligible.t
slopes
[1] 0.4423446 0.5296874 0.6170302 0.7043730 0.7917158 0.8790586
Using the delta method, we calculate the standard error and the 95% confidence interval for each marginal effect.
estmean <- coef(model_1)
var <- vcov(model_1)
SEs <- rep(NA, length(at.eligible.t))
for (i in seq_along(at.eligible.t)){
  j <- at.eligible.t[i]
  # x2 = coefficient of exppv, x7 = coefficient of exppv:eligible.t
  SEs[i] <- deltamethod(~ (x2) + (x7)*j, estmean, var) # deltamethod() is from the msm package
}
upper <- slopes + 1.96*SEs # upper bound of the 95% confidence interval
lower <- slopes - 1.96*SEs # lower bound of the 95% confidence interval
cbind(at.eligible.t, slopes, upper, lower)
at.eligible.t slopes upper lower
[1,] 214.2 0.4423446 0.5161208 0.3685684
[2,] 264.2 0.5296874 0.5833488 0.4760261
[3,] 314.2 0.6170302 0.6625217 0.5715388
[4,] 364.2 0.7043730 0.7592613 0.6494847
[5,] 414.2 0.7917158 0.8672757 0.7161559
[6,] 464.2 0.8790586 0.9798515 0.7782658
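Because the marginal effect is a linear combination of two coefficients, the delta-method standard error coincides with the usual variance formula for a linear combination. The sketch below checks this with base R only, on simulated data; sim, fit, at, se_analytic, and se_quadratic are hypothetical names and the numbers are not those of model_1:

```r
set.seed(1)

# Simulated data for illustration only (not the 2005 election data)
n <- 200
sim <- data.frame(exppv = runif(n, 0, 90),
                  eligible.t = runif(n, 214, 465))
sim$voteshare <- 30 + 0.1 * sim$exppv +
  0.002 * sim$exppv * sim$eligible.t + rnorm(n, sd = 5)
fit <- lm(voteshare ~ exppv * eligible.t, data = sim)

V  <- vcov(fit)                 # variance-covariance matrix of the estimates
at <- seq(214.2, 465.2, 50)     # moderator values, as in the text

# Var(b_exppv + z * b_int)
#   = Var(b_exppv) + z^2 * Var(b_int) + 2 * z * Cov(b_exppv, b_int)
se_analytic <- sqrt(V["exppv", "exppv"] +
                    at^2 * V["exppv:eligible.t", "exppv:eligible.t"] +
                    2 * at * V["exppv", "exppv:eligible.t"])

# The same quantity as a quadratic form g' V g with gradient g = (1, z),
# which is what the delta method computes for a linear function
idx <- c("exppv", "exppv:eligible.t")
se_quadratic <- sapply(at, function(z) {
  g <- c(1, z)
  sqrt(drop(t(g) %*% V[idx, idx] %*% g))
})

all.equal(se_analytic, se_quadratic)
```

The covariance term matters: the two coefficients are usually strongly correlated, so ignoring it would give misleading intervals.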
・at.eligible.t shows the 6 values starting from the min (214.2) and moving toward the max (465.2) at intervals of 50 (the last value is 464.2)
・slopes is the marginal effect of exppv on voteshare
・The value 0.4423446, located at the intersection of [1, ] and slopes
→ The marginal effect of exppv on voteshare when eligible.t = 214.2
・The value 0.6170302, located at the intersection of [3, ] and slopes
→ The marginal effect of exppv on voteshare when eligible.t = 314.2
・upper and lower show the upper and lower bounds of the 95% confidence intervals.
→ This is important information when you check their statistical significance.
We save these results as a data frame named msm_1:
at.eligible.t slopes upper lower
1 214.2 0.4423446 0.5161208 0.3685684
2 264.2 0.5296874 0.5833488 0.4760261
3 314.2 0.6170302 0.6625217 0.5715388
4 364.2 0.7043730 0.7592613 0.6494847
5 414.2 0.7917158 0.8672757 0.7161559
6 464.2 0.8790586 0.9798515 0.7782658
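The step that turns the cbind() output into msm_1 is not shown. One way it could be done, sketched here with the values copied from the printed table (the use of data.frame() is an assumption):

```r
# Values copied from the printed table above
at.eligible.t <- seq(214.2, 465.2, 50)
slopes <- c(0.4423446, 0.5296874, 0.6170302, 0.7043730, 0.7917158, 0.8790586)
upper  <- c(0.5161208, 0.5833488, 0.6625217, 0.7592613, 0.8672757, 0.9798515)
lower  <- c(0.3685684, 0.4760261, 0.5715388, 0.6494847, 0.7161559, 0.7782658)

# cbind() returns a matrix, but ggplot() expects a data frame
msm_1 <- data.frame(at.eligible.t, slopes, upper, lower)

# TRUE when every lower bound is above zero,
# i.e. every 95% confidence interval excludes zero
all(msm_1$lower > 0)
```

The final check is a quick numerical way to read statistical significance off the table before plotting it.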
We visualize the marginal effects of exppv on voteshare across the number of eligible voters.
msm_1 <- msm_1 %>%
ggplot(aes(x = at.eligible.t,
y = slopes,
ymin = lower,
ymax = upper)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_ribbon(alpha = 0.5, fill = "gray") +
geom_line() +
labs(x = "Eligible voters (thousands)",
y = "Marginal Effects of exppv on voteshare (Model_1)") +
ggtitle('Marginal Effects') +
ylim(0, 1)
msm_1
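Conclusion
・As the number of eligible voters increases, the impact of campaign expenditure on vote share increases.
・This is statistically significant at the 5% significance level: the interaction term is significant, and the 95% confidence interval of the marginal effect stays above zero at every value of eligible.t shown.
Using the msm package is one way to visualize the results on marginal effects.
interplot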
interplot_1 <- interplot(m = model_1,
var1 = "exppv", # Major independent variable
var2 = "eligible.t") + # Moderator variable
labs(x = "Eligible voters (thousands)",
y = "Marginal Effects of exppv on voteshare (Model_1)") +
ggtitle('Marginal Effects') +
geom_hline(yintercept = 0, linetype = 2) +
ylim(0, 1)
interplot_1
margins
margins_1 <- cplot(model_1,
x = "eligible.t", # Moderator variable
dx = "exppv", # Major independent variable
what = "effect", # Marginal Effects
n = 6, # Number assigned
draw = FALSE)
margins_1
xvals yvals upper lower factor
214.2350 0.4424 0.5162 0.3686 exppv
264.4242 0.5301 0.5837 0.4765 exppv
314.6134 0.6178 0.6632 0.5723 exppv
364.8026 0.7054 0.7605 0.6503 exppv
414.9918 0.7931 0.8690 0.7172 exppv
465.1810 0.8808 0.9821 0.7795 exppv
margins_1 <- margins_1 %>%
ggplot(aes(x = xvals, y = yvals, ymin = lower, ymax = upper)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_ribbon(alpha = 0.5, fill = "gray") +
geom_line() +
labs(x = "Eligible voters (thousands)",
y = "Marginal Effects of exppv on voteshare (Model_1)") +
ggtitle('Marginal Effects') +
ylim(0, 1)
margins_1
Exercise
Examine whether the impact of campaign expenditure per voter (exppv) on vote share (voteshare) differs depending on the number of previous terms (previous) for candidates in the 2012 lower house election in Japan.
Use hr96-21.csv, select the following 5 variables from this dataset, and analyze them.
variable | detail |
---|---|
voteshare | Voteshare (%) |
exppv | Campaign expenditure spent by each candidate per voter in their electoral district (yen) |
age | Candidate’s age |
nocand | Number of candidates in each district |
previous | Number of previous terms served as a lower house member |
exppv is not included in the dataset, so create it yourself using the two variables exp and eligible in hr96-21.csv.
Q1: Display the descriptive statistics for the above variables.
Q2: Display a scatter plot of election expenses and vote share, and briefly comment on it.
Q3: Display a scatter plot of the number of times elected and vote share, and briefly comment on it.
Q4: State your hypothesis regarding whether the impact of election expenses on vote share in the House of Representatives election is related to the number of times elected. Also, briefly state the reasons for your hypothesis.
Q5: Display the results of the following two multiple regression analyses using the stargazer package.
model_5 <- lm(voteshare ~ exppv*previous + age + nocand,
data = df3)
model_6 <- lm(voteshare ~ exppv + previous + age + nocand,
data = df3)
Q6: Using the interplot
package,
illustrate the impact of election expenses on vote share. Display the
results in an easily understandable graph and describe the analysis
results.