R packages we use in this section
library(tidyverse)
library(stargazer)
library(margins)
library(interplot)
library(msm)
library(patchwork)
library(jtools)
When we include a dummy variable (ldp), we can see the difference in the outcome variable (voteshare) between LDP candidates (ldp = 1) and non-LDP candidates (ldp = 0), as shown in the graph on the left.
Interaction effects indicate that a third variable (in this example, ldp) influences the relationship between an explanatory variable and an outcome variable.
When we include an interaction term (ldp:exppv), we can see the difference in slope (exppv → voteshare) between LDP candidates (ldp = 1) and non-LDP candidates (ldp = 0), as shown in the graph on the right.
This type of effect makes the model more complex, but if the real world actually behaves this way, it is critical to incorporate it in your model.
nocand is a control variable.
Note
・A control variable is anything that is held constant or limited in a research study.
・It is a variable that is not of interest to our research aims, but is controlled because it could influence the outcomes.
・In actual research, we need to add more control variables, because it is reasonable to assume that other variables (such as age of candidates, campaign expenditure, candidate’s gender, etc.) could influence vote share.
・Here I only add nocand as a control variable.
To see interaction effects, we make an interaction term by multiplying the major independent variable (in this case, exppv) and a categorical moderator variable (in this case, ldp): ldp:exppv
We make an interaction term by multiplying the following two independent variables:
・the major independent variable (exppv)
・the moderator variable (ldp)
Note
・A moderator variable can be one of two types: categorical or continuous.
・In this section, I deal with a categorical moderator variable.
・In 21. Multiple Regression 4 (Interaction Effects 2), I deal with a continuous moderator variable.
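The multiplication described above can be checked directly. The sketch below uses a small made-up data frame (the values are illustrative and not taken from hr) and shows that the exppv:ldp column R builds from a model formula is simply exppv multiplied by ldp.

```r
# Toy data: illustrative values only, not taken from the hr dataset
toy <- data.frame(
  exppv = c(5, 12, 30, 45, 60, 8),
  ldp   = c(0, 1, 0, 1, 1, 0)
)

# Design matrix R builds for a model with both main effects and their interaction
X <- model.matrix(~ exppv * ldp, data = toy)
colnames(X)

# Interaction term built by hand
manual <- toy$exppv * toy$ldp

# The two columns agree: the term is 0 for non-LDP rows and equals exppv for LDP rows
all.equal(unname(X[, "exppv:ldp"]), manual)
```

Because ldp only takes the values 0 and 1, the interaction column switches the extra slope on for LDP candidates and off for everyone else.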
A third variable (ldp) influences the relationship between an explanatory variable and an outcome variable.
The slope (exppv → voteshare) differs between LDP and non-LDP candidates.
This slope is equivalent to the marginal effect.
We estimate the following model:
\[\textrm{voteshare} = \alpha_0 + \alpha_1 \cdot \textrm{exppv} + \alpha_2 \cdot \textrm{ldp} + \alpha_3 \cdot \textrm{ldp:exppv} + \alpha_4 \cdot \textrm{nocand} + \varepsilon\]
We can rewrite the equation as follows:
\[\textrm{voteshare} = \alpha_0 + (\alpha_1 + \alpha_3 \cdot \textrm{ldp}) \cdot \textrm{exppv} + \alpha_2 \cdot \textrm{ldp} + \alpha_4 \cdot \textrm{nocand} + \varepsilon\]
parameter | meaning |
---|---|
\(\alpha_0\) | Vote share (%) of a non-LDP candidate (ldp = 0) when exppv = 0 and nocand = 0 |
\((\alpha_1 + \alpha_3 \textrm{ldp})\) | Slope (exppv → voteshare) = marginal effect |
Does the impact of campaign expenditure on vote share differ between LDP candidates and non-LDP candidates?
Marginal Effects
・Marginal effects are often calculated when analyzing regression results.
・Marginal effects are equivalent to slopes in regression analysis.
・Marginal effects tell us how a dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold nocand constant.
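To make the link between marginal effects and slopes concrete, the sketch below computes the change in predicted vote share when exppv rises by one unit while nocand is held fixed. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from this section.

```r
# Hypothetical coefficients (alpha_0 ... alpha_4), chosen only for illustration
alpha <- c(a0 = 20, a1 = 0.8, a2 = 40, a3 = -0.7, a4 = -4)

# Predicted vote share from the interaction model
predict_share <- function(exppv, ldp, nocand) {
  unname(alpha["a0"] + alpha["a1"] * exppv + alpha["a2"] * ldp +
           alpha["a3"] * ldp * exppv + alpha["a4"] * nocand)
}

# Change in the prediction when exppv rises from 10 to 11, holding nocand at 3
me_non_ldp <- predict_share(11, ldp = 0, nocand = 3) - predict_share(10, ldp = 0, nocand = 3)
me_ldp     <- predict_share(11, ldp = 1, nocand = 3) - predict_share(10, ldp = 1, nocand = 3)

# The two changes equal a1 and a1 + a3, respectively
c(non_ldp = me_non_ldp, ldp = me_ldp)
```

The value chosen for nocand does not matter here, because it cancels out when the two predictions are subtracted.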
Save the dataset in the data folder in your RProject folder and read it into R as hr.
Variable names in hr:
[1] "year" "pref" "ku" "kun"
[5] "wl" "rank" "nocand" "seito"
[9] "j_name" "gender" "seshu_dummy" "jiban_seshu"
[13] "name" "previous" "...15" "age"
[17] "exp" "status" "vote" "voteshare"
[21] "eligible" "turnout" "castvotes" "...24"
[25] "...25" "nojiban_seshu" "mag"
Election years included in hr:
[1] 1996 2000 2003 2005 2009 2012 2014 2017 2021
df1 <- hr %>%
  dplyr::filter(year == 2005) %>%
  dplyr::select(year, voteshare, exp, eligible, seito, nocand)
df1 contains the following 6 variables:
variable | detail |
---|---|
year | Election Year |
voteshare | Voteshare (%) |
exp | Election expenditure (yen) spent by each candidate |
eligible | Eligible voters in each district |
seito | Candidate’s affiliated party |
nocand | Number of candidates in each district |
We add two variables to df1:
・ldp: a dummy variable equal to 1 if a candidate is an LDP candidate, 0 otherwise
・exppv: campaign expenditure spent by each candidate per voter in their electoral district
Descriptive statistics of df1:
Statistic | N | Mean | St. Dev. | Min | Max |
year | 989 | 2,005.000 | 0.000 | 2,005 | 2,005 |
voteshare | 989 | 30.333 | 19.230 | 0.600 | 73.600 |
exp | 985 | 8,142,244.000 | 5,569,641.000 | 62,710 | 24,649,710 |
eligible | 989 | 344,654.300 | 63,898.230 | 214,235 | 465,181 |
nocand | 989 | 3.435 | 0.740 | 2 | 6 |
ldp | 989 | 0.293 | 0.455 | 0 | 1 |
exppv | 985 | 24.627 | 17.907 | 0.148 | 89.332 |
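The two added variables, ldp and exppv, could be built along the following lines. This is a minimal sketch on made-up rows: the party label "LDP" is a placeholder (seito stores party names in Japanese, so the actual label in hr must be substituted), and exppv is assumed to be exp divided by eligible.

```r
# Toy rows standing in for df1; values and party labels are made up
toy <- data.frame(
  seito    = c("LDP", "DPJ", "LDP", "JCP"),
  exp      = c(9000000, 7000000, 12000000, 500000),
  eligible = c(300000, 350000, 400000, 250000)
)

# Dummy variable: 1 for LDP candidates, 0 otherwise
# ("LDP" is a placeholder label, not the string used in hr)
toy$ldp <- ifelse(toy$seito == "LDP", 1, 0)

# Campaign expenditure per eligible voter (yen), assumed to be exp / eligible
toy$exppv <- toy$exp / toy$eligible

toy
```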
We estimate model_1 and show the results using tidy():
# A tibble: 5 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 23.5 1.70 13.8 1.27e- 39
2 exppv 0.770 0.0250 30.7 7.15e-146
3 ldp 39.8 1.69 23.6 1.08e- 97
4 nocand -4.45 0.444 -10.0 1.41e- 22
5 exppv:ldp -0.749 0.0453 -16.5 2.72e- 54
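A model with this set of terms can be fitted with lm(). The sketch below assumes the formula voteshare ~ exppv * ldp + nocand and runs on simulated data, since hr is not bundled here; the data-generating values are assumptions chosen to resemble the estimates above, not the real data.

```r
set.seed(123)
n <- 500

# Simulated stand-in for df1 (assumed data-generating process)
sim <- data.frame(
  exppv  = runif(n, 0, 90),
  ldp    = rbinom(n, 1, 0.3),
  nocand = sample(2:6, n, replace = TRUE)
)
sim$voteshare <- 23 + 0.77 * sim$exppv + 40 * sim$ldp -
  0.75 * sim$exppv * sim$ldp - 4.5 * sim$nocand + rnorm(n, sd = 10)

# exppv * ldp expands to exppv + ldp + exppv:ldp
fit <- lm(voteshare ~ exppv * ldp + nocand, data = sim)
coef(fit)
```

R lists the interaction last, after all main effects, which is why exppv:ldp is the 5th coefficient.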
Show the results using stargazer():
stargazer(model_1,
digits = 3,
style = "ajps",
title = "Results of model_1 (2005HR election)",
type ="html")
 | voteshare |
---|---|
exppv | 0.770*** |
 | (0.025) |
ldp | 39.830*** |
 | (1.690) |
nocand | -4.453*** |
 | (0.444) |
exppv:ldp | -0.749*** |
 | (0.045) |
Constant | 23.459*** |
 | (1.702) |
N | 985 |
R-squared | 0.723 |
Adj. R-squared | 0.722 |
Residual Std. Error | 10.130 (df = 980) |
F Statistic | 639.244*** (df = 4; 980) |
***p < .01; **p < .05; *p < .1
Alongside model_1, let’s fit another model that does not include the interaction term (model_2) and compare the two to better understand the results.
Show the results of both models using stargazer():
stargazer(model_1, model_2,
digits = 3,
style = "ajps",
title = "Results of model_1 and model_2(2005HR election)",
type ="html")
 | voteshare | |
---|---|---|
 | Model 1 | Model 2 |
exppv | 0.770*** | 0.543*** |
 | (0.025) | (0.024) |
ldp | 39.830*** | 15.423*** |
 | (1.690) | (0.927) |
nocand | -4.453*** | -4.403*** |
 | (0.444) | (0.502) |
exppv:ldp | -0.749*** | |
 | (0.045) | |
Constant | 23.459*** | 27.558*** |
 | (1.702) | (1.903) |
N | 985 | 985 |
R-squared | 0.723 | 0.646 |
Adj. R-squared | 0.722 | 0.645 |
Residual Std. Error | 10.130 (df = 980) | 11.448 (df = 981) |
F Statistic | 639.244*** (df = 4; 980) | 596.035*** (df = 3; 981) |
***p < .01; **p < .05; *p < .1
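The comparison between a model with and without the interaction term can be sketched as follows. The formulas are assumptions inferred from the terms in the table, and the data are simulated with group-specific slopes, so the numbers will not match the table.

```r
set.seed(42)
n <- 500

# Simulated stand-in for df1 (assumed data-generating process with two slopes)
sim <- data.frame(
  exppv  = runif(n, 0, 90),
  ldp    = rbinom(n, 1, 0.3),
  nocand = sample(2:6, n, replace = TRUE)
)
sim$voteshare <- 23 + 0.77 * sim$exppv + 40 * sim$ldp -
  0.75 * sim$exppv * sim$ldp - 4.5 * sim$nocand + rnorm(n, sd = 10)

m_int   <- lm(voteshare ~ exppv * ldp + nocand, data = sim)  # with interaction
m_noint <- lm(voteshare ~ exppv + ldp + nocand, data = sim)  # without interaction

# Adjusted R-squared of each model
c(with_interaction    = summary(m_int)$adj.r.squared,
  without_interaction = summary(m_noint)$adj.r.squared)
```

Without the interaction term, the model forces a single exppv slope on both groups, so its estimate falls between the two group-specific slopes and the fit is worse when the slopes truly differ.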
Important point
The coefficient of exppv in model_1 is the effect of exppv when ldp = 0.
The coefficient of exppv (0.770) in model_1
→ When exppv increases by one yen, voteshare increases by 0.77 percentage points (when the candidate is not an LDP candidate).
To confirm this, let’s check the Sample Regression Function equations for model_1.
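\[\widehat{\textrm{voteshare}} = 23.459 + 0.770 \cdot \textrm{exppv} + 39.830 \cdot \textrm{ldp} - 0.749 \cdot \textrm{ldp:exppv} - 4.453 \cdot \textrm{nocand}\]
\[= 23.459 + (0.770 - 0.749 \cdot \textrm{ldp}) \cdot \textrm{exppv} + 39.830 \cdot \textrm{ldp} - 4.453 \cdot \textrm{nocand}\]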
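The marginal effect of exppv on voteshare, \((\alpha_1 + \alpha_3 \textrm{ldp})\), is:
\[0.770 - 0.749 \cdot \textrm{ldp}\]
This is the change in vote share (exppv → voteshare) when a candidate increases his/her campaign expenditure by 1 yen per voter, and it depends on the value of ldp.
Substituting each value of ldp gives the following two equations.
Non-LDP candidates (ldp = 0):
\[\textrm{voteshare}= 23.46 + 0.77 \cdot \textrm{exppv} - 4.45 \cdot \textrm{nocand} + \varepsilon\]
LDP candidates (ldp = 1):
\[\textrm{voteshare}= 63.29 + 0.02 \cdot \textrm{exppv} - 4.45 \cdot \textrm{nocand} + \varepsilon\]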
plot1 <- ggplot(df1, aes(x = exppv, y = voteshare)) +
  geom_point(aes(color = as.factor(ldp))) +
  # fitted lines from model_1, evaluated at nocand = 0
  geom_abline(intercept = 23.46, slope = 0.77, linetype = "dashed", color = "red") +
  geom_abline(intercept = 63.29, slope = 0.02, color = "blue") +
  ylim(0, 100) +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)") +
  # annotate() draws each label once; geom_text() would redraw it for every row of df1
  annotate("text", label = "voteshare = 23.46 + 0.77exppv - 4.45nocand\n(non-LDP candidates)",
           x = 60, y = 95, color = "red") +
  annotate("text", label = "voteshare = 63.29 + 0.02exppv - 4.45nocand\n(LDP candidates)",
           x = 30, y = 80, color = "blue")
plot1
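The Sample Regression Function for model_2 is:
\[\widehat{\textrm{voteshare}} = 27.558 + 0.543 \cdot \textrm{exppv} + 15.423 \cdot \textrm{ldp} - 4.403 \cdot \textrm{nocand}\]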
plot2 <- ggplot(df1, aes(x = exppv, y = voteshare)) +
  geom_point() +
  # fitted line from model_2, evaluated at ldp = 0 and nocand = 0
  geom_abline(intercept = 27.558, slope = 0.543, color = "black") +
  ylim(0, 100) +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)") +
  # annotate() draws the label once; geom_text() would redraw it for every row of df1
  annotate("text", label = "voteshare = 27.558 + 0.543exppv - 4.403nocand\n(Model_2)",
           x = 60, y = 80, color = "black")
plot2
msm package
The Intercept and the four coefficients in model_1:
(Intercept) exppv ldp nocand exppv:ldp
23.4586397 0.7701405 39.8297455 -4.4525774 -0.7493132
We need to calculate the marginal effects (= slopes) of exppv on voteshare when ldp = 0 and ldp = 1.
To calculate these two marginal effects, we use the 2nd value (exppv) and the 5th value (exppv:ldp):
[1] 0.77014053 0.02082736
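The two values above follow from adding the 5th coefficient to the 2nd one, once for each value of ldp. A minimal sketch that reproduces the calculation from the reported coefficients (typed in by hand here rather than taken from a fitted model object):

```r
# Coefficients of model_1 as reported in this section
b <- c("(Intercept)" = 23.4586397, exppv = 0.7701405, ldp = 39.8297455,
       nocand = -4.4525774, "exppv:ldp" = -0.7493132)

at.ldp <- c(0, 1)  # 0 = non-LDP candidate, 1 = LDP candidate

# Marginal effect of exppv at each value of ldp: b_exppv + b_exppv:ldp * ldp
slopes <- unname(b["exppv"] + b["exppv:ldp"] * at.ldp)
slopes
```

With a fitted model, the same two coefficients can be pulled from coef(model_1) by position (2 and 5) or by name.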
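Using the delta method, we calculate the standard errors of these two marginal effects and their 95% confidence intervals.
estmean <- coef(model_1)                      # estimated coefficients
var <- vcov(model_1)                          # variance-covariance matrix of the estimates
at.ldp <- c(0, 1)                             # 0 = non-LDP, 1 = LDP
slopes <- estmean[2] + estmean[5] * at.ldp    # marginal effects calculated above
SEs <- rep(NA, length(at.ldp))
for (i in seq_along(at.ldp)){
  j <- at.ldp[i]
  SEs[i] <- deltamethod(~ (x2) + (x5)*j, estmean, var) # standard error (x2 = exppv, x5 = exppv:ldp)
}
upper <- slopes + 1.96*SEs
lower <- slopes - 1.96*SEs
cbind(at.ldp, slopes, upper, lower)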
at.ldp slopes upper lower
[1,] 0 0.77014053 0.81923536 0.72104569
[2,] 1 0.02082736 0.09518719 -0.05353247
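Because the marginal effect is a linear combination of two coefficients, the delta method reduces in this case to the usual variance formula for a sum: Var(b2) + ldp² · Var(b5) + 2 · ldp · Cov(b2, b5). The base-R sketch below illustrates this on simulated data (an assumed data-generating process, not hr) without using msm.

```r
set.seed(2005)
n <- 400

# Simulated stand-in for df1 (assumed data-generating process)
sim <- data.frame(
  exppv  = runif(n, 0, 90),
  ldp    = rbinom(n, 1, 0.3),
  nocand = sample(2:6, n, replace = TRUE)
)
sim$voteshare <- 23 + 0.77 * sim$exppv + 40 * sim$ldp -
  0.75 * sim$exppv * sim$ldp - 4.5 * sim$nocand + rnorm(n, sd = 10)

fit <- lm(voteshare ~ exppv * ldp + nocand, data = sim)
V   <- vcov(fit)

# SE of (b_exppv + j * b_exppv:ldp) from the variance of a linear combination
se_slope <- function(j) {
  sqrt(V["exppv", "exppv"] + j^2 * V["exppv:ldp", "exppv:ldp"] +
         2 * j * V["exppv", "exppv:ldp"])
}
SEs <- sapply(c(0, 1), se_slope)
SEs
```

At ldp = 0 the formula collapses to the ordinary standard error of the exppv coefficient, which is consistent with the interval for non-LDP candidates above being 0.770 ± 1.96 × 0.025.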
Let me explain what this means.
・at.ldp shows whether a candidate belongs to the LDP (= 1) or not (= 0).
・slopes shows the marginal effects of exppv on voteshare.
・The value (0.77014053) in row [1, ] of the slopes column → the marginal effect of exppv on voteshare for a non-LDP candidate
・The value (0.02082736) in row [2, ] of the slopes column → the marginal effect of exppv on voteshare for an LDP candidate
・upper and lower show the upper and lower bounds of the 95% confidence intervals.
To visualize this result, we convert it into a data frame and name it msm_1.
at.ldp slopes upper lower
1 0 0.77014053 0.81923536 0.72104569
2 1 0.02082736 0.09518719 -0.05353247
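One way to do this conversion is as.data.frame(). The sketch below rebuilds the matrix from the values printed above (typed in by hand, so it runs without model_1) and converts it.

```r
# Values copied from the output above
res <- cbind(at.ldp = c(0, 1),
             slopes = c(0.77014053, 0.02082736),
             upper  = c(0.81923536, 0.09518719),
             lower  = c(0.72104569, -0.05353247))

# Matrix -> data frame; the column names are kept
msm_1 <- as.data.frame(res)
msm_1
```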
Visualize the marginal effects of exppv on voteshare for a non-LDP and an LDP candidate.
plot_msm_1 <- msm_1 %>%   # stored under a new name so the msm_1 data frame is not overwritten
  ggplot(aes(at.ldp, slopes, ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 0, linetype = 2, col = "red") +
  geom_pointrange(size = 1) +
  geom_errorbar(aes(x = at.ldp, ymin = lower, ymax = upper),
                width = 0.1) +
  labs(x = "Candidate's affiliated party", y = "Marginal Effects") +
  scale_x_continuous(breaks = c(1, 0),
                     labels = c("LDP", "Non-LDP")) +
  ggtitle("Marginal Effects of exppv on voteshare (Model_1)") +
  theme(axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        axis.title.y = element_text(size = 14),
        plot.title = element_text(size = 18))
plot_msm_1
Results on model_1
When ldp = 0
・When a non-LDP candidate increases campaign spending by 1 yen per voter, his/her vote share increases by 0.77 percentage points.
・This is statistically significant at the 1% significance level (p-value = 7.15e-146).
When ldp = 1
・When an LDP candidate increases campaign spending by 1 yen per voter, his/her vote share increases by 0.02 percentage points.
・This is not statistically significant: the 95% confidence interval (-0.054 to 0.095) includes 0.
The coefficient of exppv:ldp, -0.749
・The impact of campaign expenditure on vote share differs between LDP candidates and non-LDP candidates.
・The marginal effect (slope) of exppv on voteshare is 0.749 percentage points larger for a non-LDP candidate than for an LDP candidate.
・This difference is statistically significant at the 1% significance level (p-value = 2.72e-54).
The p-values can be checked with tidy() or summary():
# A tibble: 5 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 23.5 1.70 13.8 1.27e- 39
2 exppv 0.770 0.0250 30.7 7.15e-146
3 ldp 39.8 1.69 23.6 1.08e- 97
4 nocand -4.45 0.444 -10.0 1.41e- 22
5 exppv:ldp -0.749 0.0453 -16.5 2.72e- 54
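The significance statements for the two marginal effects can also be read off their 95% confidence intervals. A small sketch using the bounds reported earlier in this section (typed in by hand):

```r
# 95% confidence bounds of the two marginal effects, copied from the delta-method output
ci <- data.frame(
  group = c("Non-LDP", "LDP"),
  lower = c(0.72104569, -0.05353247),
  upper = c(0.81923536, 0.09518719)
)

# An effect is distinguishable from zero at the 5% level when its 95% CI excludes 0
ci$excludes_zero <- ci$lower > 0 | ci$upper < 0
ci
```

The interval for non-LDP candidates lies entirely above 0, while the interval for LDP candidates straddles 0.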
variable | detail |
---|---|
year | Election year (1996-2021) |
pref | Prefecture |
ku | Electoral district name |
kun | Number of electoral district |
rank | Ascending order of votes |
wl | 0 = loser / 1 = single-member district (smd) winner / 2 = zombie winner |
nocand | Number of candidates in each district |
seito | Candidate’s affiliated party (in Japanese) |
j_name | Candidate’s name (Japanese) |
name | Candidate’s name (English) |
previous | Previous wins |
gender | Candidate’s gender:“male”, “female” |
age | Candidate’s age |
exp | Election expenditure (yen) spent by each candidate |
status | 0 = challenger / 1 = incumbent / 2 = former incumbent |
vote | votes each candidate garnered |
voteshare | Voteshare (%) |
eligible | Eligible voters in each district |
turnout | Turnout in each district (%) |
castvotes | Total votes cast in each district |
seshu_dummy | 0 = Not-hereditary candidates, 1 = hereditary candidate |
jiban_seshu | Relationship between candidate and his predecessor |
nojiban_seshu | Relationship between candidate and his predecessor |
Variable Names | Description |
---|---|
voteshare | Vote Percentage (%) |
exppv | Campaign Expenses per Eligible Voter (yen) |
dpj | Democratic Party Dummy (Democratic Party Candidate = 1, Other Candidates = 0) |
previous | Number of Previous Wins by the Candidate |
age | Age of the Candidate |
nocand | Number of Candidates in the Election |
Note 1: The variable dpj is not included in the dataset, so you should create it yourself using the seito or party variable.
Note 2: The variable exppv is not included in the dataset, so you should create it yourself using the exp and eligible variables.
Q1: Display descriptive statistics for the above three variables.
Q2: Display a scatter plot of campaign expenses and vote percentage and provide a brief comment.
Q3: State your hypothesis regarding whether the impact of campaign expenses on vote percentage differs between Democratic Party candidates and candidates from other parties in House of Representatives elections. Also, briefly explain the reasoning behind your hypothesis.
Q4: Can we say that the impact of campaign expenses on vote percentage is different between Democratic Party candidates and candidates from other parties? Use the msm package to visualize the marginal effect of campaign expenses on vote percentage for both Democratic Party candidates and other candidates, and explain the results succinctly.