library(corrplot)
library(jtools)
library(margins)
library(patchwork)
library(prediction)
library(ROCR)
library(survival)
library(survminer)
library(stargazer)
library(tidyverse)
Engineering | Reliability Analysis |
Economics | Duration Analysis |
Sociology | Event History Analysis |
Research Question | Appropriate Method | Overview |
Overall trends in treatment duration | Kaplan–Meier (KM) Survival Curve | Visualizes the number of patients surviving until the end of treatment using survival curves. |
Differences in mortality rates between men and women | Log-rank Test | Tests whether survival curves differ statistically across two or more groups. |
Impact of factors such as age on mortality | Cox Proportional Hazards Model | Allows simultaneous analysis of factors affecting the hazard (risk of death). |
Significance of explanatory variables in the Cox model | Likelihood Ratio Test | Evaluates the significance of Cox regression coefficients. The likelihood ratio test assesses the overall model, while the Wald test examines the effect of each variable individually. |
Event | Death, onset of disease, relapse, recovery, or any other outcome of interest |
Time | The duration from the start of observation (e.g., surgery or treatment) until (i) the occurrence of the event, (ii) the end of the study, or (iii) loss to follow-up/withdrawal from the study |
Censoring | Occurs when some information on survival time is available, but the exact survival time is unknown |
Survival Function: S(t) | The probability that a subject survives longer than time t |
aml.csv
aml.csv
dataaml
datasettime | Survival time or censoring time (weeks) |
status | Cancer relapse: 0 = no event (censored), 1 = event occurred (relapse) |
x | Treatment group: Nonmaintained (no maintenance chemotherapy), Maintained (with maintenance chemotherapy) |
# of risk
= n_risk
\[Cumulative.survival.probability (day\_5) \\= s.prob (day\_0)× s.prob (day\_5)」\\ = 1×0.91 = 0.91\]
\[Cumulative.survival.probability (Day\_8)\\= s.prob (day\_0)× s.prob (day\_5) × s.prob (day\_8)」\\ = 1×0.91 × 0.905= 0.823\]
# KM estimate (overall)
fit_all <- survfit(Surv(time, status) ~ 1, data = aml)
# Show summary (survival probability at each time)
summary(fit_all)
Call: survfit(formula = Surv(time, status) ~ 1, data = aml)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
5 23 2 0.9130 0.0588 0.8049 1.000
8 21 2 0.8261 0.0790 0.6848 0.996
9 19 1 0.7826 0.0860 0.6310 0.971
12 18 1 0.7391 0.0916 0.5798 0.942
13 17 1 0.6957 0.0959 0.5309 0.912
18 14 1 0.6460 0.1011 0.4753 0.878
23 13 2 0.5466 0.1073 0.3721 0.803
27 11 1 0.4969 0.1084 0.3240 0.762
30 9 1 0.4417 0.1095 0.2717 0.718
31 8 1 0.3865 0.1089 0.2225 0.671
33 7 1 0.3313 0.1064 0.1765 0.622
34 6 1 0.2761 0.1020 0.1338 0.569
43 5 1 0.2208 0.0954 0.0947 0.515
45 4 1 0.1656 0.0860 0.0598 0.458
48 2 1 0.0828 0.0727 0.0148 0.462
Key points of the Kaplan–Meier curve
・When calculating survival probability, individuals who are
censored are excluded from the denominator.
→ This allows us to estimate how many events would have occurred if
those censored cases had remained under observation.
→ This approach avoids underestimating the risk.
Research question ・Whether patients who received maintenance therapy experienced delayed mortality compared to those who did not receive maintenance therapy.
library(survival)
library(survminer)
# Load data
aml <- read_csv("data/aml.csv") # Load the uploaded aml.csv file
# Create survival object
# Time to relapse: time, Event occurrence: status, Treatment: x (Maintained or Nonmaintained)
fit <- survfit(Surv(time, status) ~ x, data = aml)
# Plot KM curve
km_1 <- ggsurvplot(fit, data = aml,
pval = TRUE, # p-value (log-rank test)
conf.int = TRUE, # Show confidence interval
risk.table = TRUE, # Show risk table
xlab = "Time to Relapse",
ylab = "Survival Probability",
legend.title = "Therapy",
legend.labs = c("Maintained", "Non-maintained"),
palette = c("blue", "red"),
ggtheme = theme_minimal())
km_1
cumrative survival probability
).aml
dataset, five subjects were censored at
weeks 13, 16, 28, 45, and 161, respectively.survdiff()
function to
examine the log-rank test results in more detail.Call:
survdiff(formula = Surv(time, status) ~ x, data = aml)
N Observed Expected (O-E)^2/E (O-E)^2/V
x=Maintained 11 7 10.69 1.27 3.4
x=Nonmaintained 12 11 7.31 1.86 3.4
Chisq= 3.4 on 1 degrees of freedom, p= 0.07
N | Sample size |
Observed | Number of relapses |
(O–E)² / E | Expected value (Expected) |
(O–E)² / V | Chi-squared test statistic |
p-value = 0.07
is slightly above the 0.05 significance
level, and therefore not statistically significant.Note:
library(survival)
library(survminer)
library(tidyverse)
# --- KM estimation (by group) ---
fit <- survfit(Surv(time, status) ~ x, data = aml)
# --- Survival probability at 40 weeks ---
s40 <- summary(fit, times = 40)
df40 <- data.frame(
group = sub("^x=", "", s40$strata), # "x=Maintained" → "Maintained"
surv = s40$surv
)
# Labels (percent format) and colors (to match ggsurvplot palette)
df40$label <- paste0(df40$group, ": ", sprintf("%.1f%%", df40$surv * 100))
df40$col <- ifelse(df40$group == "Maintained", "blue", "red")
# --- Plot ---
p <- ggsurvplot(
fit, data = aml,
conf.int = TRUE,
risk.table = TRUE,
xlab = "Time to Relapse (weeks)",
ylab = "Survival Probability",
legend.title = "Therapy",
legend.labs = c("Maintained", "Non-maintained"),
palette = c("blue", "red")
)
# Add vertical line, points, and labels at 40 weeks
p$plot <- p$plot +
geom_vline(xintercept = 40, linetype = 2) +
geom_point(
data = df40,
aes(x = 40, y = surv),
color = df40$col, size = 3, inherit.aes = FALSE
) +
geom_text(
data = df40,
aes(x = 40, y = surv, label = label),
vjust = -0.8, color = df40$col, fontface = "bold",
inherit.aes = FALSE
)
p
Number at risk
: the number of patients who have not yet
experienced the event and remain alive at each time pointsurvdiff()
function, we can check the results
of the log-rank test up to 40 weeks(Note: survdiff() does not take a data frame as the first argument, so the formula and data = … must be explicitly specified.)
Call:
survdiff(formula = Surv(time, status) ~ x, data = aml_40)
N Observed Expected (O-E)^2/E (O-E)^2/V
x=Maintained 8 6 8.3 0.639 1.6
x=Nonmaintained 10 9 6.7 0.793 1.6
Chisq= 1.6 on 1 degrees of freedom, p= 0.2
N | Number of samples | → Total sample size up to 40 weeks: 18 |
Observed | Number of deaths | → Total number of deaths up to 40 weeks: 15 |
(O-E)^2/E | Expected value (Expected) | |
(O-E)^2/V | Chi-square test statistic | |
p-value = 0.2
is slightly above the significance
level of 0.05, and thus cannot be considered statistically
significant.library(survival)
library(survminer)
library(ggplot2)
# --- Kaplan–Meier estimation ---
fit <- survfit(Surv(time, status) ~ x, data = aml)
# --- Extract median survival times ---
tb <- summary(fit)$table
df_med <- data.frame(
group = gsub("^x=", "", rownames(tb)), # → "Maintained" / "Nonmaintained"
median = as.numeric(tb[, "median"]),
stringsAsFactors = FALSE
)
# Exclude groups whose median is not reached (NA)
df_med2 <- subset(df_med, !is.na(median))
# Split by group
df_m <- subset(df_med2, group == "Maintained")
df_n <- subset(df_med2, group == "Nonmaintained") # ← Note: no hyphen
# --- Draw KM curves ---
p <- ggsurvplot(
fit, data = aml,
conf.int = TRUE, risk.table = TRUE,
xlab = "Time to Relapse (weeks)", ylab = "Survival Probability",
legend.title = "M.S.Time",
legend.labs = c("Maintained", "Non-maintained"),
palette = c("blue", "red"),
xlim = c(0, 50)
)
# --- Horizontal line at y=0.5, intersection dots & numeric labels ---
p$plot <- p$plot +
geom_hline(yintercept = 0.5, linetype = 2) +
# Maintained: dot & number (blue)
geom_point(data = df_m, aes(x = median, y = 0.5),
inherit.aes = FALSE, color = "blue", size = 3) +
geom_text(data = df_m, aes(x = median, y = 0.55, label = sprintf("%.0f", median)),
inherit.aes = FALSE, color = "blue", fontface = "bold", hjust = -0.3) +
# Nonmaintained: dot & number (red)
geom_point(data = df_n, aes(x = median, y = 0.5),
inherit.aes = FALSE, color = "red", size = 3) +
geom_text(data = df_n, aes(x = median, y = 0.55, label = sprintf("%.0f", median)),
inherit.aes = FALSE, color = "red", fontface = "bold", hjust = -0.3)
p
Call:
survdiff(formula = Surv(time, status) ~ x, data = aml)
N Observed Expected (O-E)^2/E (O-E)^2/V
x=Maintained 11 7 10.69 1.27 3.4
x=Nonmaintained 12 11 7.31 1.86 3.4
Chisq= 3.4 on 1 degrees of freedom, p= 0.07
p-value = 0.07
is slightly above the
0.05 significance level and therefore cannot be regarded as
statistically significant.What you want | Why KM Is Insufficient | Why Use the Cox Model |
🔸 Analyze multiple factors (covariates) simultaneously | Can only handle one variable | Allows multivariable analysis |
🔸 Include quantitative variables | Can only handle categorical variables | Can handle continuous variables |
🔸 Control for confounders (e.g., gender, age) | Cannot adjust for them | The Can adjust for confounders |
🔸 Know the hazard ratio (risk magnitude) | Provides only survival probabilities | Outputs hazard ratios |
🔸 Build a predictive model | Not well-suited for prediction | Can be applied to prediction |
[1] "inst" "time" "status" "age" "sex" "ph.ecog"
[7] "ph.karno" "pat.karno" "meal.cal" "wt.loss"
inst | : Medical institution code |
time | : Survival time (days) |
status | : Censoring status (1 = censored, 2 = dead) |
age | : Age (years) |
sex | : Gender (1 = male, 2 = female) |
ph.ecog | : Physician-rated ECOG performance score (0 = asymptomatic, 1 = symptomatic but fully ambulatory, 2 = in bed less than 50% of the day, 3 = in bed more than 50% of the day but not bedridden, 4 = completely bedridden) |
ph.karno | : Physician-rated Karnofsky performance score (0 = worst, 100 = best) |
pat.karno | : Patient-rated Karnofsky performance score |
meal.cal | : Calories consumed from meals |
wt.loss | : Weight loss in the past 6 months (pounds) |
lung <- lung |>
mutate(death = if_else(status == 2, 1, 0)) |>
mutate(female = if_else(sex == 2, 1, 0))
◎Note: By using the following command, you can check and filter multiple variables (e.g., time, status, x) simultaneously.
# Load packages
library(survival)
library(survminer)
# Load data
aml <- read_csv("data/lung.csv") # Load the uploaded lung.csv file
# Create survival object
# Time to relapse: time, Event occurrence: death, Treatment: x (Maintained or Nonmaintained)
fit <- survfit(Surv(time, death) ~ sex,
data = lung)
# Plot KM curve
km_1 <- ggsurvplot(fit, data = lung,
pval = TRUE, # p-value (log-rank test)
conf.int = TRUE, # show confidence interval
risk.table = TRUE, # show risk table
xlab = "Time to Relapse",
ylab = "Survival Probability",
legend.title = "Therapy",
legend.labs = c("Male Patient", "Female Patient"),
palette = c("blue", "red"),
ggtheme = theme_minimal())
km_1
Findings from the Kaplan–Meier Survival Curve
・The female curve lies above the male curve.
⇒ Compared to the male group, the female group tends to survive
longer.
・Log-rank test (p = 0.0013)
⇒ The difference between men and women in survival time and survival
probability is statistically significant.
cox_model <- coxph(Surv(time, death) ~ age +
female +
ph.ecog +
ph.karno +
pat.karno +
meal.cal +
wt.loss,
data = lung)
Call:
coxph(formula = Surv(time, death) ~ age + female + ph.ecog +
ph.karno + pat.karno + meal.cal + wt.loss, data = lung)
n= 168, number of events= 121
(60 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
age 1.065e-02 1.011e+00 1.161e-02 0.917 0.35906
female -5.509e-01 5.765e-01 2.008e-01 -2.743 0.00609 **
ph.ecog 7.342e-01 2.084e+00 2.233e-01 3.288 0.00101 **
ph.karno 2.246e-02 1.023e+00 1.124e-02 1.998 0.04574 *
pat.karno -1.242e-02 9.877e-01 8.054e-03 -1.542 0.12316
meal.cal 3.329e-05 1.000e+00 2.595e-04 0.128 0.89791
wt.loss -1.433e-02 9.858e-01 7.771e-03 -1.844 0.06518 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
age 1.0107 0.9894 0.9880 1.0340
female 0.5765 1.7347 0.3889 0.8545
ph.ecog 2.0838 0.4799 1.3452 3.2277
ph.karno 1.0227 0.9778 1.0004 1.0455
pat.karno 0.9877 1.0125 0.9722 1.0034
meal.cal 1.0000 1.0000 0.9995 1.0005
wt.loss 0.9858 1.0144 0.9709 1.0009
Concordance= 0.651 (se = 0.029 )
Likelihood ratio test= 28.33 on 7 df, p=2e-04
Wald test = 27.58 on 7 df, p=3e-04
Score (logrank) test = 28.41 on 7 df, p=2e-04
Indicator | Value | Interpretation |
female coef | -5.509e-01 | log hazard ratio (effect of being female) |
female exp(coef) | 0.5765 | Hazard ratio (interpretable) |
female p-value | 0.00609 | Since it is below the 5% significance level, the difference is statistically significant |
Findings from the Cox Proportional Hazards Model (female)
・exp(coef) = 0.5765 → The mortality risk for
women is about 57.6% of that for men
・Compared to men, women have about a 42.4%
lower risk of death
・A hazard ratio below 1 indicates lower risk
・p-value = 0.00609 → Statistically
significant
Indicator | Value | Interpretation |
age coef (coef) | 1.065e-02 | log hazard ratio (effect of age) |
age exp(coef) | 1.0107 | Hazard ratio (interpretable) |
age p-value | 0.35906 | Since it is above the 5% significance level, it is not statistically significant |
Findings from the Cox Proportional Hazards Model (age)
・exp(coef) = 1.0107 → Each additional year of age is estimated to
increase the risk of death by about 1.07%.
・However, this difference is not statistically significant (p-value =
0.359).
・Therefore, in this model controlling for other
variables, we cannot conclude that “higher age leads to a higher risk of
death.”
・A hazard ratio below 1 indicates reduced risk.
・p-value = 0.35906 → Not statistically
significant
lung
dataStatistic | N | Mean | St. Dev. | Min | Max |
inst | 227 | 11.088 | 8.303 | 1 | 33 |
time | 228 | 305.232 | 210.646 | 5 | 1,022 |
status | 228 | 1.724 | 0.448 | 1 | 2 |
age | 228 | 62.447 | 9.073 | 39 | 82 |
sex | 228 | 1.395 | 0.490 | 1 | 2 |
ph.ecog | 227 | 0.952 | 0.718 | 0 | 3 |
ph.karno | 227 | 81.938 | 12.328 | 50 | 100 |
pat.karno | 225 | 79.956 | 14.623 | 30 | 100 |
meal.cal | 181 | 928.779 | 402.175 | 96 | 2,600 |
wt.loss | 214 | 9.832 | 13.140 | -24 | 68 |
death | 228 | 0.724 | 0.448 | 0 | 1 |
female | 228 | 0.395 | 0.490 | 0 | 1 |
p_age <- ggplot(lung, aes(x = age, y = death)) +
geom_jitter(size = 1,
alpha = 1/3,
width = 0,
height = 0.05) +
geom_smooth(method = "glm",
color = "red",
method.args = list(family = binomial(link = "logit"))) +
labs(x = "Age",
y = "Probability of Death")
p_female <- ggplot(lung, aes(x = female, y = death)) +
geom_jitter(size = 1,
alpha = 1/3,
width = 0.05,
height = 0.05) +
geom_smooth(method = "glm",
color = "red",
method.args = list(family = binomial(link = "logit"))) +
labs(x = "Gender",
y = "Probability of Death")
model_lung <- glm(death ~ age + female + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss,
data = lung,
family = binomial(link = "logit")) # 係数を「オッズの対数」に指定
tidy()
to check the estimation results.# A tibble: 8 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -2.28 3.30 -0.692 0.489 -8.80 4.24
2 age 0.0210 0.0219 0.958 0.338 -0.0221 0.0643
3 female -0.940 0.392 -2.40 0.0164 -1.72 -0.177
4 ph.ecog 1.05 0.480 2.19 0.0287 0.120 2.02
5 ph.karno 0.0293 0.0271 1.08 0.279 -0.0245 0.0827
6 pat.karno -0.0148 0.0161 -0.914 0.361 -0.0471 0.0165
7 meal.cal 0.000250 0.000502 0.498 0.618 -0.000700 0.00129
8 wt.loss -0.00534 0.0147 -0.364 0.716 -0.0336 0.0247
→ The regression coefficient estimate displayed is the log of the odds, i.e., log(odds) = also called the
log-odds.
- For example, the estimated coefficient for female is -0.940.
- This represents the log-odds
= log(odds)
=
logit
- If this were the result of a multiple linear regression, it would
mean:
“For every one-unit (1 million yen) increase in campaign expenditure, the probability of winning decreases by 0.940 points.”
log-odds
are difficult to
interpret → Convert them into odds ratios
Display the results of model_lung in terms of odds ratios.
# A tibble: 8 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.102 3.30 -0.692 0.489 0.000151 69.4
2 age 1.02 0.0219 0.958 0.338 0.978 1.07
3 female 0.390 0.392 -2.40 0.0164 0.179 0.837
4 ph.ecog 2.86 0.480 2.19 0.0287 1.13 7.51
5 ph.karno 1.03 0.0271 1.08 0.279 0.976 1.09
6 pat.karno 0.985 0.0161 -0.914 0.361 0.954 1.02
7 meal.cal 1.00 0.000502 0.498 0.618 0.999 1.00
8 wt.loss 0.995 0.0147 -0.364 0.716 0.967 1.02
# Format results (convert to odds ratios)
results <- tidy(model_lung, conf.int = TRUE, exponentiate = TRUE) %>%
filter(term != "(Intercept)") %>%
mutate(
term = recode(term,
"female" = "Female dummy",
"age" = "Age",
"ph.ecog" = "ECOG Performance Score",
"ph.karno" = "Karnofsky Score (Physician)",
"pat.karno" = "Karnofsky Score (Patient)",
"meal.cal" = "Calories from Meals",
"wt.loss" = "Weight Loss (pounds)"),
OR_label = sprintf("%.2f", estimate) # Convert OR to string
)
# Forest plot + numeric labels
ggplot(results, aes(x = estimate, y = term)) +
geom_point(size = 3, color = "blue") +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
geom_vline(xintercept = 1, linetype = "dashed", color = "red") +
geom_text(aes(label = OR_label), # Display OR
hjust = 0.5,
vjust = -2,
size = 4) +
labs(
title = "Odds Ratios from Logistic Regression",
x = "Odds Ratio (95% CI)",
y = ""
) +
theme_minimal(base_size = 14) + # Add margin so labels are not cut off
xlim(0, max(results$conf.high) * 1.2) +
theme_bw(base_family = "HiraKakuProN-W3")
p-value = 0.0164
)\[OddRatios = \frac{odds_{female=1}}{odds_{female = 0}} = 0.39\]
→ The odds for \(odds_{female = 1}\) are 0.39 times the odds for \(odds_{female = 0}\)
→ For females, the probability of death decreases
p-value
for female is 0.0164.
log-odds
are difficult to
interpret → Convert to probabilities
predict()
.→ Useful when we want to illustrate “gender differences for a representative patient.”
library(ggplot2)
library(dplyr)
# Fix average values and set female to 0 / 1
newdata <- data.frame(
female = c(0, 1),
age = mean(lung$age, na.rm = TRUE),
ph.ecog = mean(lung$ph.ecog, na.rm = TRUE),
ph.karno = mean(lung$ph.karno, na.rm = TRUE),
pat.karno = mean(lung$pat.karno, na.rm = TRUE),
meal.cal = mean(lung$meal.cal, na.rm = TRUE),
wt.loss = mean(lung$wt.loss, na.rm = TRUE)
)
# Predicted values and standard errors
pred <- predict(model_lung,
newdata = newdata,
type = "response",
se.fit = TRUE)
# Data for plotting
plot_df <- newdata %>%
mutate(
fit = pred$fit,
se = pred$se.fit,
lower = fit - 1.96 * se,
upper = fit + 1.96 * se,
sex = factor(female, labels = c("male", "female"))
)
# Plot: add p-value as text inside the figure
plot_prediction <- ggplot(plot_df, aes(x = sex, y = fit)) +
geom_point(size = 5, color = "black") +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1) +
geom_text(aes(label = paste0(round(fit * 100, 1), "%")),
hjust = -0.5, size = 5) +
geom_text(
data = data.frame(x = 1.5, y = max(plot_df$upper) + 0.05),
aes(x = x, y = y, label = "p = 0.0164"),
inherit.aes = FALSE,
size = 5
) +
labs(
title = "Predicted Probability of Death by Gender",
x = "Patient Gender",
y = "Predicted Probability of Death"
) +
theme_bw(base_family = "HiraKakuProN-W3")
plot_prediction
Results (Comparison of Predicted
Probability of Death)
・The figure shows the difference in predicted probability of death by
gender.
・Six control variables are fixed at their mean values.
・Predicted probability of death for males: 80.7%
・Predicted probability of death for females: 61.9%
・Since p-value = 0.024
, at the 5% significance level we
conclude that the difference between men and women is statistically
significant.
・The difference in predicted probabilities is about 18.8 percentage
points (80.7 − 61.9 = 18.8).
Under the same conditions, women have a
significantly lower probability of death compared to men.
margins()
→ Useful when you want to show the “overall average gender
difference.”
For the categorical variable (female), compute for each observation how
much the probability of death changes when switching from male →
female.
→ Take the average of these changes → this is the AME: Average Marginal Effect.
→ This allows you to state, for example:
“On average, women have a death probability that is X percentage points
higher (or lower) than men.”
→ The interpretation of the coefficient thus becomes much clearer.
# --- Function to format p-values ---
format_p <- function(x) {
ifelse(x < 0.001,
"p < 0.001",
formatC(x, format = "e", digits = 3))
}
# --- Calculation of marginal effects ---
mfx <- margins(model_lung, variables = "female")
factor AME SE z p lower upper
female -0.1660 0.0650 -2.5543 0.0106 -0.2933 -0.0386
Results (Comparison of Marginal
Effects)
・Understanding the “effect of gender (difference)”
= Clarifies the “average impact of gender change on the probability of
death.
・AME = -0.1660
→ On average, women have a death probability
about 16.6 percentage points lower
・The gender effect is statistically significant (p-value = 0.0106).
Summary of Results
predict()
・Compares predicted probabilities for men and women by assuming an
“average patient.”
→ “Women have a probability of death that is
18.8 percentage points lower than men.”
margins()
・Respects the conditions of each sample observation and shows the
average change in probability for women (AME, with 95% CI).
→ “On average, women have a death probability
about 16.6 percentage points lower.”
- Since logistic regression is nonlinear, it is normal for the numerical
values not to match exactly.
Objective | Estimate the risk of death (hazard ratio) based on “survival time.” |
Result | Women have about a 42% lower risk of death compared to men; ph.ecog and ph.karno are significant. |
Strength | Provides the probability of surviving up to a certain point in time (survival curve), rather than just the probability of death itself. |
Validity | If survival data (time + censoring) are available, this is generally the most valid approach. |
Objective | Treat death as a binary outcome (0/1) and estimate “probability of death” from explanatory variables. |
Result | Predicted probability of death: Women = 61.9%, Men = 80.7%. |
Weakness | Treats those censored during the observation period (status = 1) as “not dead (death = 0).” |
Validity | Inferior to the Cox proportional hazards model because it ignores survival time data. |