library(tidyverse)
library(ggforce)

1. Monte Carlo experiments

  • A broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
  • The underlying concept is to use randomness to solve problems that might be deterministic in principle.
  • They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches.
  • Monte Carlo methods are mainly used in the following three problem classes:
  1. optimization
  2. numerical integration
  3. generating draws from a probability distribution

1.1 Generating random numbers: sample()

  • Two ways of generating random numbers
  1. Generate random numbers by drawing values from a group of numbers
  2. Generate random numbers by drawing values from a probability distribution, such as the normal distribution
(1) Generate random numbers by drawing values from a group of numbers

How to use sample()

sample(x = a source vector,
       size = the number of values to draw,
       replace = whether or not to put a drawn value back,
       prob = the probability that each value is drawn)

  • For instance, suppose x is a source vector containing the values from 1 to 6
x <- c(1, 2, 3, 4, 5, 6)
  • Randomly draw 5 numbers from x
sample(x,
       size = 5)
[1] 3 5 6 1 2
  • In this case, once a number is drawn (say, the number 1), it cannot be drawn again within the same call: this is sampling without replacement
  • But if you add replace = TRUE, which means sampling with replacement, then the number 1 can be drawn more than once
sample(x,
       size = 5, 
       replace = TRUE)
[1] 6 2 2 6 2
  • prob means the probability that each number is drawn
  • Since there are 6 numbers, R automatically assigns each number a probability of one-sixth
  • Suppose you want to randomly draw the three numbers (4, 5, 6) and these three numbers are twice as likely as the other three numbers (1, 2, 3) to be drawn
  • If this is the case, then you should use the following R code:
sample(x,
       size = 5, 
       replace = TRUE,
       prob = c(1, 1, 1, 2, 2, 2))
[1] 1 6 4 5 4
  • You can also use prob = c(1/9, 1/9, 1/9, 2/9, 2/9, 2/9) and get the same result
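  • A minimal check (set.seed() is explained in 1.3 below): because sample() normalizes the prob weights internally, c(1, 1, 1, 2, 2, 2) and c(1/9, 1/9, 1/9, 2/9, 2/9, 2/9) should produce identical draws under the same seed
set.seed(1)
sample(x, size = 5, replace = TRUE, prob = c(1, 1, 1, 2, 2, 2))
set.seed(1)
sample(x, size = 5, replace = TRUE, prob = c(1/9, 1/9, 1/9, 2/9, 2/9, 2/9))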
  • If you want to roll a die three times, then you should specify the number of draws: size = 3
  • If you roll the die three times, a given number may not appear at all or may appear multiple times
    → You need to add replace = TRUE, which means sampling with replacement
  • The probability of getting each of the 6 numbers is one-sixth
  • You do not need to specify this probability
  • R assigns it automatically
sample(1:6,
       3,
       replace = TRUE)
[1] 6 5 5
  • Suppose you have a loaded die that shows the odd numbers (1, 3, 5) twice as likely as the even numbers (2, 4, 6)
  • If this is the case, then you need to add: prob = c(2, 1, 2, 1, 2, 1)
sample(x,
       size = 5, 
       replace = TRUE,
       prob = c(2, 1, 2, 1, 2, 1))
[1] 1 5 3 5 1
  • Now suppose you roll this loaded die 10,000 times and record the numbers you get
  • Then visualize the result using a bar chart
  • Using table(), you can show the result
dice <- sample(1:6, 
               10^4, 
               replace = TRUE,
               prob = c(2, 1, 2, 1, 2, 1))
table(dice)
dice
   1    2    3    4    5    6 
2187 1126 2264 1156 2209 1058 
  • Using barplot(), let’s visualize this result
table(dice) |> 
  barplot()

  • You see that the odd numbers (1, 3, 5) are roughly twice as likely as the even numbers (2, 4, 6) to be drawn in this simulation
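  • As an optional numerical check, prop.table() turns the counts from table() into shares, so you can confirm that each odd number appears roughly twice as often as each even number
table(dice) |> 
  prop.table() |> 
  round(2)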

1.2 Generating random numbers from probability distributions

Function             Probability distribution          Parameters
rbeta()              beta distribution                 n, shape1, shape2
rbinom()             binomial distribution             n, size, prob
rcauchy()            Cauchy distribution               n, location, scale
rchisq()             chi-square distribution           n, df
rexp()               exponential distribution         n, rate
rf()                 F distribution                    n, df1, df2
rgamma()             gamma distribution                n, shape, scale
rgeom()              geometric distribution            n, prob
rhyper()             hypergeometric distribution       nn, m, n, k
rlnorm()             lognormal distribution            n, meanlog, sdlog
rmultinom()          multinomial distribution          n, size, prob
rnbinom()            negative binomial distribution    n, size, prob
rnorm()              normal distribution               n, mean, sd
rpois()              Poisson distribution              n, lambda
rt()                 t distribution                    n, df
runif()              uniform distribution              n, min, max
rweibull()           Weibull distribution              n, shape, scale
mvtnorm::rmvnorm()   multivariate normal distribution  n, mean, sigma
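  • For instance, here are a few of these functions in action (the parameter values below are arbitrary choices for illustration)
rnorm(3, mean = 0, sd = 1)        # 3 draws from the standard normal distribution
rbinom(3, size = 10, prob = 0.5)  # 3 draws from a binomial distribution (10 trials, p = 0.5)
rpois(3, lambda = 2)              # 3 draws from a Poisson distribution with mean 2
rbeta(3, shape1 = 2, shape2 = 5)  # 3 draws from a beta distribution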

1.3 Set seed in R

  • When you generate “random numbers” in R, you are actually generating pseudorandom numbers

  • These numbers are generated with an algorithm that requires a seed to initialize

  • Being pseudorandom instead of pure random means that, if you know the seed and the generator, you can predict (and reproduce) the output

  • Setting a seed in R means to initialize a pseudorandom number generator

  • Most of the simulation methods in Statistics require the possibility to generate pseudorandom numbers that mimic the properties of independent generations of a uniform distribution in the interval (0, 1)
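  • For instance, runif() gives you such draws from the uniform distribution on (0, 1)
runif(5) |> 
        round(2)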

  • If you draw random numbers in R, you receive different numbers every time you do

  • Suppose you want to draw three numbers from a normal distribution with mean = 0 and standard deviation = 1

  • Round the numbers to 2 decimal places, and do this three times

rnorm(3) |> 
        round(2)
[1] -1.52  0.39 -0.52
rnorm(3) |> 
        round(2)
[1] -1.32  0.27 -0.06
rnorm(3) |> 
        round(2)
[1] -0.12  0.02  0.19
  • You see that every time you draw random numbers, you receive different numbers

  • If you run a simulation only once, this is not a problem

  • But if you want to reproduce the same simulation result for some reason, you need a way to fix the random numbers you draw

  • This is when you use set.seed()

  • If you use the same set.seed(), then you get the same random number

  • In the parentheses, you set a numeric scalar (that is, any number you like)

  • Let’s set a numeric scalar, 20220826 within set.seed()

set.seed(20220826)
rnorm(3) |> 
        round(2)
[1] -0.14 -0.87 -0.29
set.seed(20220826)
rnorm(3) |> 
        round(2)
[1] -0.14 -0.87 -0.29
  • Once you set the seed, you fix the random numbers and get the same result in your simulation

2. Birthday Paradox

  • In probability theory, the birthday problem asks for the probability that, in a set of N randomly chosen people, at least two will share a birthday
  • Let’s think about the question: how many people do we need in a group to have a shared birthday?

Answer The probability of a shared birthday exceeds 50% in a group of only 23 people.

  • The birthday paradox is counterintuitive, and a veridical paradox: it appears wrong, but is in fact true
  • While it may seem surprising that only 23 individuals are required to reach a 50% probability of a shared birthday, this result is made more intuitive by considering that the comparisons of birthdays will be made between every possible pair of individuals.
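  • To see why, count the distinct pairs among 23 people with choose() (a small side calculation): 253 pairs, each of which could share a birthday
choose(23, 2)
[1] 253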

Let’s confirm Birthday Paradox using R

  • Set the number of students: n_student
  • Draw that many birthdays from the numbers 1 to 365
  • Suppose we have 10 students: n_student = 10
n_student <- 10
Birth_vec <- sample(1:365,
                    n_student,
                    replace = TRUE)
Birth_vec
 [1] 140  55 363 170 292 335 345   6 179   6
  • If you see a duplicated number in this group
    → There is a shared birthday
duplicated()
  • We can check whether there is a shared birthday by using duplicated()
  • If there is a shared birthday, then we receive TRUE, FALSE otherwise
duplicated(Birth_vec)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
  • We see 9 FALSE and 1 TRUE
  • TRUE means that there is a shared birthday
    → Actually, the 8th student and the 10th student have the same birthday (day 6 of the 365 possible days)
    → We have one shared birthday in this group
any()
  • any() enables us to check whether or not there is at least one TRUE
    TRUE means that there is at least one shared birthday
    FALSE means that there is no shared birthday
any(duplicated(Birth_vec))
[1] TRUE
  • You get TRUE because day 6 appears twice, i.e., there is a shared birthday
  • Let’s repeat this 100 times
n_student <- 10   # the number of students
n_trials  <- 100  # the number of trials  


Result_vec <- rep(NA, n_trials) # Prepare a place to store the simulation results: Result_vec

# iteration  
for (i in 1:n_trials) {
    Birth_vec     <- sample(1:365, n_student, replace = TRUE)
    Result_vec[i] <- any(duplicated(Birth_vec))
}

Result_vec
  [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
 [13] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
 [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
 [61] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [85] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
 [97] FALSE FALSE FALSE FALSE
sum(Result_vec)
[1] 12
  • We get 12 TRUE
    → In 12 out of 100 trials (12%), at least two students share a birthday
  • Let’s repeat this 1,000 times
    → We can approximate the probability more precisely
n_student <- 10    # the number of students
n_trials  <- 1000 # the number of trials  

# Prepare a place to store the simulation results: Result_vec
Result_vec <- rep(NA, n_trials)

# iteration
for (i in 1:n_trials) {
    Birth_vec     <- sample(1:365, n_student, replace = TRUE)
    Result_vec[i] <- any(duplicated(Birth_vec))
}

sum(Result_vec)
[1] 110
  • We get 110 TRUE
    → In 110 out of 1,000 trials (11%), at least two students share a birthday
    → With 10 students, the probability of having at least one shared birthday is about 11%
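  • A small shortcut: since Result_vec is a logical vector, mean() returns this proportion directly
mean(Result_vec)
[1] 0.11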

2.1 Comparison between simulation result and theoretical value

2.1.1 Simulation of birthday paradox

  • Let’s fix the number of trials at 100
  • Changing the number of students from 2 to 100, we want to see the probability that we have more than one shared birthday
Prob_df <- tibble(Students = 2:100, # change the number of students from 2 to 100
                  Probs    = NA) 

for (i in 1:nrow(Prob_df)) {
    
    Result_vec <- rep(NA, 100) # Assign the number of trials = 100
    
    for (j in 1:100) {        # Assign the number of trials = 100
        Birth_vec     <- sample(1:365, Prob_df$Students[i], replace = TRUE)
        Result_vec[j] <- any(duplicated(Birth_vec))
    }
    
    Prob_df$Probs[i] <- mean(Result_vec)
}
  • Check the simulation result
DT::datatable(Prob_df)
  • When we have 2 students, the simulated probability of a shared birthday is essentially zero (this makes sense: the theoretical value is only 1/365 ≈ 0.003)
  • Let’s check the number of students (Students) when the probability that we have at least one shared birthday (Probs) is 0.5 (=50%)
Prob_df |> 
  filter(Probs >= 0.4 & Probs <= 0.6)
# A tibble: 7 × 2
  Students Probs
     <int> <dbl>
1       19  0.43
2       20  0.43
3       21  0.5 
4       22  0.51
5       23  0.59
6       24  0.58
7       25  0.59
  • We see that the number of students (Students) when the probability that we have at least one shared birthday (Probs) is 0.5 (=50%) is around 21 or 22
  • Visualize the result by setting x-axis as the number of students (Students) and y-axis as the probability that we have at least one shared birthday (Probs)
Prob_df %>%
    ggplot() +
    geom_line(aes(x = Students, 
                  y = Probs * 100), 
              size = 1) +
    geom_hline(yintercept = 50, # Draw a red dotted line where Probs = 50%
               color = "red", 
               linetype = 2) +
    labs(x = "The number of Students",
         y = "The prob. that we have at least one shared birthday (Probs)  \n(Trials = 100)") 

Simulation result (Trials = 100): The probability of a shared birthday exceeds 50% in a group of only 23 people

2.1.2 Theoretical value of birthday paradox

  • The probability \(P(n)\) that there is at least one shared birthday when there are \(n\) people

\[P(n) = 1 - \frac{365!}{365^n(365-n)!}\]

  • \(3! = 3 × 2 × 1 = 6\)
    → \(365! = 365 × 364 ×....× 1\)

  • In R, you can calculate this value by using factorial()

  • For instance, if you have 50 people, then \(P(n)\) is calculated as follows:

1 - factorial(365) / (365^50 * factorial(365 - 50))
[1] NaN
  • NaN means “Not a Number”, i.e., the result could not be calculated
  • Because 365! is too big for R to represent, the calculation fails

Inf
  • Let’s calculate \(100!\) in R

factorial(100)
[1] 9.332622e+157
  • This is a gigantic number, on the order of \(10^{157}\)!

  • Let’s calculate \(200!\) on R

factorial(200)
[1] Inf
  • Inf means that R fails to calculate the value because the number is too large
  • Let’s rewrite the mathematical formula into a form R can calculate

\[P(n) = 1 - \frac{n! \times {}_{365}C_n}{365^n}\]

  • \(C\) denotes the binomial coefficient

\[_nC_k = \frac{n!}{k!(n-k)!}\]

  • In R, you can calculate C by using choose(n, k)  
1 - ((factorial(50) * choose(365, 50)) / 365^50)
[1] 0.9703736
  • We see that when there are 50 students, the probability of having at least one shared birthday is 0.97 (= 97%)
  • Let’s calculate the probability that we have at least one shared birthday when there are 22 students and 23 students respectively
Birth_Expect <- function (n) {
  1 -  ((factorial(n) * choose(365, n)) / 365^n)
}
Birth_Expect(22)
[1] 0.4756953
Birth_Expect(23)
[1] 0.5072972
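  • As an optional cross-check, R’s built-in stats::pbirthday() computes the same theoretical probability and should reproduce the value above
pbirthday(23)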
  • We see that the probability of having at least one shared birthday first exceeds 50% with 23 students (0.507), compared with 0.476 for 22 students

  • Let’s add this theoretical value to the data frame (Prob_df) we made in 2.1.1 Simulation of birthday paradox and name it Expect

Prob_df <- Prob_df |> 
  mutate(Expect = map_dbl(Students,
                          ~Birth_Expect(.x)))
DT::datatable(Prob_df)
  • Using pivot_longer(), let’s reshape this data into a tidy (long) format
Prob_tidy <- Prob_df |> 
  pivot_longer(cols = Probs:Expect,
               names_to = "Type",
               values_to = "Prob")
DT::datatable(Prob_tidy)
  • Let’s visualize the result gained from Monte Carlo simulation (Probs) and the result gained as theoretical value (Expect)
Prob_tidy %>%
    mutate(Type = ifelse(Type == "Probs", "simulation", "theory")) %>%
    ggplot() +
    geom_line(aes(x = Students, 
                  y = Prob * 100, 
                  color = Type), 
              size = 1) +
    labs(x     = "The number of students",
        y = "The prob. that we have at least one shared birthday (Probs)  \n(Trials = 100)", 
         color = "") +
    theme(legend.position = "bottom")

  • Let’s increase the number of trials to 1,000 and visualize it
n_trials  <- 1000   # the number of trials per group size
Prob_df <- tibble(Students = 2:100,
                  Probs    = NA)

for (i in 1:nrow(Prob_df)) {
    
    Result_vec <- rep(NA, n_trials)
    
    for (j in 1:n_trials) {
        Birth_vec     <- sample(1:365, Prob_df$Students[i], replace = TRUE)
        Result_vec[j] <- any(duplicated(Birth_vec))
    }
    
    Prob_df$Probs[i] <- mean(Result_vec)
}
Prob_df %>%
    mutate(Expect = map_dbl(Students, ~Birth_Expect(.x))) %>%
    pivot_longer(cols      = Probs:Expect,
                 names_to  = "Type",
                 values_to = "Prob") |> 
    mutate(Type = ifelse(Type == "Probs", "Simulation", "Theoretical value")) %>%
    ggplot() +
    geom_line(aes(x = Students, y = Prob * 100, color = Type), size = 1) +
    labs(x     = "The number of Students",
          y = "The prob. that we have at least one shared birthday (Probs)  \n(Trials = 1000)", 
         color = "") +
    theme_minimal(base_family = "HiraKakuProN-W3") +
    theme(legend.position = "bottom")

  • We see that the result gained from our Monte Carlo simulation and the theoretical value look almost the same

3. Probability and Area

3.1 How to calculate the Area of a Circle without using \(π\)

  • Let’s consider a circle of radius 1 centered at the origin and find its area without using \(π\)

  • Step 1: Draw 4 lines around the circle and make a square

  • Step 2: Randomly add points inside of the square

  • Step 3: Calculate the share of dots that fall inside the circle
    → This approximates the ratio of the area of the circle to the area of the square
    → From that ratio, we can calculate the area of the circle
    → We can estimate the value of \(\pi \approx 3.14\), as explained below
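  • Why this works: the circle has area \(\pi × 1^2 = \pi\) and the square has area \(2 × 2 = 4\), so for uniformly scattered dots,

\[\pi = 4 × \frac{\text{area of the circle}}{\text{area of the square}} \approx 4 × \frac{\text{dots inside the circle}}{\text{all dots}}\]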

Using a Monte Carlo simulation, let’s calculate \(\pi\)

  • The area of the square = 2 × 2 = 4
  • Using R, randomly add 100 dots inside of the square
set.seed(2022)
pi_df <- tibble(x = runif(100, -1, 1),
                y = runif(100, -1, 1))
pi_df
# A tibble: 100 × 2
         x      y
     <dbl>  <dbl>
 1  0.632   0.910
 2  0.295  -0.594
 3 -0.759   0.681
 4  0.0876  0.622
 5 -0.631   0.148
 6  0.272  -0.585
 7 -0.851   0.784
 8 -0.916  -0.831
 9 -0.259   0.243
10  0.515   0.968
# … with 90 more rows
pi_df |> 
  ggplot() +
    geom_rect(aes(xmin = -1, 
                  ymin = -1, 
                  xmax = 1, 
                  ymax = 1),
              fill = "white", color = "black") +
    geom_point(aes(x = x, y = y)) +
    coord_fixed(ratio = 1) +
    theme_minimal()

  • Add a circle to the square
  • When drawing a circle, you can use geom_circle() which comes with {ggforce}
  • Assign x0 = 0, y0 = 0, and the radius of the circle: \(r = 1\)
pi_df %>%
    ggplot() +
    geom_rect(aes(xmin = -1, 
                  ymin = -1, 
                  xmax = 1, 
                  ymax = 1),
              fill = "white", color = "black") +
    geom_circle(aes(x0 = 0, 
                    y0 = 0, 
                    r = 1)) +
    geom_point(aes(x = x, y = y)) +
    coord_fixed(ratio = 1) +
    theme_minimal()

  • Differently color the dots inside and outside of the circle
  • In order to do so, we make a vector, in_circle
  • The following inequality describes the points inside a circle of radius \(r\) centered at \((x', y')\):

\[(x-x')^2 + (y - y')^2 < r^2\]

  • Since our circle has radius 1 and is centered at the origin, the condition becomes:

\[x^2 + y^2 < 1^2\]

  • The new variable in_circle records whether or not each random dot satisfies this inequality
pi_df <- pi_df %>%
    mutate(in_circle = if_else(x^2 + y^2 < 1^2, "Inside", "Outside"))

3.2 The probability of a dot being inside the circle equals the ratio of the areas

  • Let’s display the dots being inside and outside of the circle
pi_df %>%
    ggplot() +
    geom_rect(aes(xmin = -1, 
                  ymin = -1, 
                  xmax = 1, 
                  ymax = 1),
              fill = "white", 
              color = "black") +
    geom_point(aes(x = x, 
                   y = y, 
                   color = in_circle), 
               size = 2) +
    geom_circle(aes(x0 = 0, 
                    y0 = 0, 
                    r = 1)) +
    labs(x = "X", y = "Y", color = "") +
    coord_fixed(ratio = 1) +
    theme_void(base_size = 12) 

  • Using group_by(), you can count the number of dots inside and outside of the circle
pi_df |> 
  group_by(in_circle) |> 
  summarise(N = n())
# A tibble: 2 × 2
  in_circle     N
  <chr>     <int>
1 Inside       82
2 Outside      18
  • You see that 82 of the 100 dots (82%) are inside the circle
  • The area of the square is 2 × 2 = 4
    → The area of the circle ≈ \(4 × \frac{82}{100} = 3.28\)
  • Since the radius of the circle is 1, this area is an estimate of \(π\)
4 * 0.82
[1] 3.28
  • If you increase the number of dots, you can approximate the value of \(π\) more accurately
  • Let’s approximate the value of \(π\) with 2000 dots
set.seed(2022)
pi_df2 <- tibble(x = runif(2000, -1, 1),
                 y = runif(2000, -1, 1))

pi_df2 <- pi_df2 %>%
    mutate(in_circle = if_else(x^2 + y^2 < 1^2, "Inside", "Outside"))

pi_df2 %>%
    ggplot() +
    geom_rect(aes(xmin = -1, 
                  ymin = -1, 
                  xmax = 1, 
                  ymax = 1),
              fill = "white", color = "black") +
    geom_point(aes(x = x, 
                   y = y, 
                   color = in_circle,
                   alpha = 0.5), 
               size = 2) +
    geom_circle(aes(x0 = 0, 
                    y0 = 0, 
                    r = 1)) +
    labs(x = "X", y = "Y", color = "") +
    coord_fixed(ratio = 1) +
    theme_void(base_size = 12) 

pi_df2 %>%
    group_by(in_circle) %>%
    summarise(N = n())
# A tibble: 2 × 2
  in_circle     N
  <chr>     <int>
1 Inside     1562
2 Outside     438
  • You see that 1,562 of the 2,000 dots are inside the circle and 438 are outside
  • The area of the square is 4
    → The area of the circle ≈ \(4 × \frac{1562}{2000} = 3.124\)
  • Since the radius of the circle is 1, this area is an estimate of \(π\)
4 * 1562 / 2000
[1] 3.124
  • The estimate is getting closer to 3.14

3.3 Monte Carlo Simulation on \(\pi\)

  • Changing the number of dots from 2 to 2,000, let’s estimate the value of \(\pi\)
dots <- 2:2000   # Change the number of dots from 2 to 2000
n <- length(dots)
results <- rep(NA, n)

for (i in 1:n) {
  results[i] <- tibble(x = runif(dots[i], -1, 1),
                       y = runif(dots[i], -1, 1)) |> 
    mutate(in_circle = if_else(x^2 + y^2 < 1^2, "Inside", "Outside")) |> 
    group_by(in_circle) |> 
    summarise(N = n()) |> 
    filter(in_circle == "Inside") |>  # Select the dots insides of the circle
    summarise(mean(N/dots[i]*4))  # Estimate pi  
  } 

pi <- unlist(results)     # Transform the list of results into a numeric vector (note: this masks R's built-in pi)
df_pi <- tibble(dots, pi) # Make a data frame, df_pi  

df_pi %>%
  ggplot() +
  geom_line(aes(x = dots, 
                y = pi), 
            size = 0.2,
            color = "blue",
            alpha = 0.8) +
  geom_hline(yintercept = 3.14, # Draw the line = 3.14
             color = "red", 
             linetype = 2) +
  labs(x = "The number of dots",
       y = "Estimated pi") +
  ggtitle("Estimated pi gained from Monte Carlo Simulation") 

  • As the number of dots increases, you get the value closer to 3.14

5. Monty Hall Problem

  • Let’s think about Monty Hall problem

  • The Monty Hall problem is a brain teaser, in the form of a probability puzzle, loosely based on the American television game show Let’s Make a Deal and named after its original host, Monty Hall.

【Rules】

  • Suppose you’re on a game show, and you’re given the choice of three doors, 1, 2, and 3.
  • Behind one door is a car; behind the others, goats.
  • You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat.
  • He then says to you, “Do you want to stay at door No. 1 or pick door No.2?”
  • Is it to your advantage to switch your choice?
  • Here you have the following two choices:

Stay:
You should stay, because there is no difference between picking door No.1 and door No.2 in terms of winning a car.

Move:
You should pick door No.2 because it enhances your chance of winning a car.

  • Watch this video and think about which choice sounds right.

【Simulation on R】

  • Let’s suppose a car is behind door No.1.
  • You, the player, randomly choose one of the three doors.
  • Monty, who knows where the car is, then opens one of the remaining doors with a goat behind it.
  • You see a goat behind that door.
  • Now you have the following two options:
  1. Stay at the door you initially picked
  2. Move to the other unopened door
  • The player plays this game 200 times
  • Calculate the probability of winning a car when you choose to stay and when you choose to move.

Script for Monty Hall Problem on R

set.seed(1838 - 3 - 11) # Seed based on Okuma's DOB (note: R evaluates this as 1838 - 3 - 11 = 1824)

# The player plays this game 200 times 
trials <- 200  
prize <- sample(1:3, trials, replace = TRUE) # The player's randomly chosen door in each game (the car is behind door No.1)
prize
stay <- prize == 1 # You win by staying iff your initial pick is door No.1
head(stay)
move <- prize != 1 # You win by moving iff your initial pick is door No.2 or No.3  
head(move)
prob.stay <- prob.move <- rep(NA, trials) 
for (i in 1:trials) {
  prob.stay[i] <- sum(stay[1:i]) / i # Prob. of winning a car if you stay
  prob.move[i] <- sum(move[1:i]) / i # Prob. of winning a car if you move
}

# Plot your simulation results
plot(1:trials, prob.move, ylim = c(0, 1),  
     type = "l", col = "red", 
     xlab = "number of trials",
     ylab = "prob. of getting a car",
     main = "Simulation of Monty Hall Problem")
lines(1:trials, prob.stay, type = "l", col = "blue")
abline(h = c(0.33, 0.66), lty = "dotted") # Draw two lines at prob = 33% and prob = 66%
legend("topright", lty = 1, col = c("red", "blue"),
       legend = c("move", "stay"))
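  • A quick vectorized check (reusing prize from the script above): the long-run winning probabilities can also be read off directly, without the loop
mean(prize == 1)   # prob. of winning a car if you stay (close to 1/3)
mean(prize != 1)   # prob. of winning a car if you move (close to 2/3)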

  • How to run this simulation:
  1. Go to File on R (not RStudio)
  2. Click New File
  3. Paste the R code above (Script for Monty Hall Problem on R)
  4. Hit the return key or enter key and you get the following results:

References