Step 1: Prepare data
- Make four fake variables (name, math, stat, gender) and merge them into a data frame named df1
# Load the tidyverse, which provides tibble(), ggplot(), and %>%
library(tidyverse)
# Make 4 variables
name <- c("Joe", "Ze'ev", "David", "Mike", "Ross", "Woojin", "Inha", "Jih-wen",
          "Mark", "Dennis", "Carol", "Shira", "Mimi", "Amital", "Rachel", "Ariel",
          "Kelly", "RongRong", "Kathy", "Barbara")
math <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 4, 5, 6, 7, 9, 8, 9, 8, 10)
stat <- c(2, 4, 6, 5, 7, 9, 7, 10, 12, 15, 14, 13, 12, 13, 11, 10, 9, 8, 6, 4)
gender <- c("Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
            "Male", "Female", "Female", "Female", "Female", "Female", "Female",
            "Female", "Female", "Female", "Female")
# Merge the 4 variables into a data frame, df1
df1 <- tibble(name, math, stat, gender)
- Let’s check what df1 looks like
df1
# A tibble: 20 × 4
name math stat gender
<chr> <dbl> <dbl> <chr>
1 Joe 1 2 Male
2 Ze'ev 2 4 Male
3 David 3 6 Male
4 Mike 4 5 Male
5 Ross 5 7 Male
6 Woojin 6 9 Male
7 Inha 7 7 Male
8 Jih-wen 8 10 Male
9 Mark 9 12 Male
10 Dennis 10 15 Male
11 Carol 2 14 Female
12 Shira 4 13 Female
13 Mimi 5 12 Female
14 Amital 6 13 Female
15 Rachel 7 11 Female
16 Ariel 9 10 Female
17 Kelly 8 9 Female
18 RongRong 9 8 Female
19 Kathy 8 6 Female
20 Barbara 10 4 Female
Assign data: ggplot()
- You need to do mapping, which means assigning variables within aes()
- ggplot() has two main arguments: data and mapping
- First, let’s assign df1 as the data to use
- Then, we can see the canvas on which we will draw a graph
ggplot(data = df1)
- The canvas is blank because no geometric object has been added yet
Step 3: Select a graph
- You choose a geometric object, geom_point(), to draw a scatter plot
- You can add geom_point() by using +
ggplot(data = df1) +
aes(x = math, y = stat) +
geom_point()
- aes(x = math, y = stat) is an independent layer
- But you can also put aes(x = math, y = stat) within geom_point()
- This means that you map aes() within geom_point()
- Within geom_point(), you map by typing mapping =
- Within aes(), you link each element (such as a dot, line, or surface) to a particular variable
→ Here, the x axis is linked to math, the y axis is linked to stat, and each dot is colored by gender
ggplot(data = df1) +
geom_point(mapping = aes(x = math,
y = stat,
color = gender))
- data = is the first argument of ggplot() and mapping = is the first argument of geom_point(), so both argument names can be omitted
→ You can rewrite the R code above as follows:
ggplot(df1) +
geom_point(aes(x = math,
y = stat,
color = gender))
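The claim that both argument names can be dropped rests on argument order. A minimal sketch for checking this, assuming ggplot2 is installed; the data frame `toy` and its values are hypothetical stand-ins for df1, used only so the block runs on its own:

```r
library(ggplot2)

# Hypothetical stand-in for df1 (not the tutorial's data)
toy <- data.frame(
  math   = c(1, 2, 3),
  stat   = c(2, 4, 6),
  gender = c("Male", "Male", "Female")
)

# Look up the name of the first declared argument of each function
first_arg_ggplot <- names(formals(ggplot))[1]
first_arg_geom   <- names(formals(geom_point))[1]
print(c(first_arg_ggplot, first_arg_geom))

# Because those arguments come first, the named and unnamed forms
# describe the same plot
p_named <- ggplot(data = toy) +
  geom_point(mapping = aes(x = math, y = stat, color = gender))
p_short <- ggplot(toy) +
  geom_point(aes(x = math, y = stat, color = gender))
```

Arguments that are not first in the signature still need their names, which is why x =, y =, and color = stay inside aes().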
- You can use the pipe operator (%>% or |>) to link commands together.
- This tells R to do something and then do something else to the output of the first something.
- Chaining functions together like this will become very useful as your tasks become more complicated.
- Further, by using the pipe operator (%>% or |>), you can rewrite the code above as follows:
df1 %>%
ggplot() +
geom_point(aes(x = math,
y = stat,
color = gender))
Mapping aes() and omission of R code
Summary on mapping aes()
・In Step 2: Assign the variables, aes(x = math, y = stat) is added with + after ggplot(data = df1)
ggplot(data = df1) +
aes(x = math, y = stat) +
geom_point()
・aes(x = math, y = stat) can be added not only within ggplot() but also within geom_point()
- In sum, you have two ways of mapping aes():
Mapping aes() within ggplot()
df1 %>%
  ggplot(aes(x = math,
             y = stat,
             color = gender)) +
  geom_point()
Mapping aes() within geom_point()
df1 %>%
ggplot() +
geom_point(aes(x = math,
y = stat,
color = gender))
- Which way you choose depends on what you want to do in your analysis
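One case where the choice matters is a plot with more than one layer. The sketch below is an illustration rather than part of the original steps: it adds geom_smooth(), a layer the tutorial has not introduced, to show that a mapping placed in ggplot() is inherited by every layer, while a mapping placed in geom_point() applies to that layer only. It rebuilds only the math, stat, and gender columns of df1 so that it runs on its own.

```r
library(ggplot2)

# Rebuild the numeric and gender columns of df1 (name is not needed here)
df1 <- data.frame(
  math   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 4, 5, 6, 7, 9, 8, 9, 8, 10),
  stat   = c(2, 4, 6, 5, 7, 9, 7, 10, 12, 15, 14, 13, 12, 13, 11, 10, 9, 8, 6, 4),
  gender = rep(c("Male", "Female"), each = 10)
)

# color mapped inside ggplot(): both layers inherit it,
# so geom_smooth() fits one line per gender
p_global <- ggplot(df1, aes(x = math, y = stat, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)

# color mapped inside geom_point() only: the points are colored by gender,
# but geom_smooth() sees no color mapping and fits a single line
p_local <- ggplot(df1, aes(x = math, y = stat)) +
  geom_point(aes(color = gender)) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)
```

Printing p_global and p_local side by side shows the difference: two fitted lines in the first plot, one in the second.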
How pipes (%>% or |>) are used
・The pipes (%>% or |>) allow you to express a sequence of multiple operations
・In most everyday cases, %>% and |> can be used interchangeably
・Pipes can greatly simplify your code and make your operations more intuitive
・The pipe operator (%>%) is automatically imported as part of the {tidyverse} library
・The pipe operator (%>%) is included in the {magrittr} package
・The {magrittr} package is included in the {tidyverse} package
・The native pipe (|>) is built into R 4.1.0 and later and needs no package
→ You need to load either of the following packages to use the pipe operator (%>%)
library(magrittr)
library(tidyverse)
The pipe operator (%>%) automatically passes the output from the first line into the next line as the input
- You can use the pipe operator (%>%) to link commands together.
- This tells R to do something and then do something else to the output of the first something.
- Chaining functions together like this will become very useful as your tasks become more complicated.
Let’s take a look at an example of using the pipe
- Generate a vector of integers from 1 to 10
1:10
[1] 1 2 3 4 5 6 7 8 9 10
- Calculate 1 + 2 + ... + 10
sum(1:10)
[1] 55
- If you use the pipe (%>%), you write the R code as follows:
1:10 %>%   # Generate a vector from 1 to 10
  sum()    # Add them all
[1] 55
- If you want to calculate the square root, …
1:10 %>%     # Generate a vector from 1 to 10
  sum() %>%  # Add them all
  sqrt()     # Calculate the square root
[1] 7.416198
- Generate a vector from 1 to 10 → Add them all → Calculate the square root
- This makes the sequence of operations easier to interpret
If you don’t use pipes, …
sqrt(sum(1:10))
[1] 7.416198
- This is less intuitive because you have to read the code from the inside out
- Calculate the square root ← Add them all ← Generate a vector
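The same comparison can be written with the native pipe (|>), which needs no package in R 4.1.0 or later. This is a minimal sketch using base R only; the variable names nested and piped are chosen here for illustration.

```r
# Nested call: read from the inside out
nested <- sqrt(sum(1:10))

# Native pipe: read from top to bottom
piped <- 1:10 |>
  sum() |>
  sqrt()

# Both forms compute the same value
identical(nested, piped)
```

The pipe changes only how the code reads, not what it computes.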
How to interpret R code with pipes (%>%)
・You can interpret the R code you made in 4.1 A simple scatter plot as follows:
df1 %>% # Use df1 as data
ggplot(aes(x = math, # Assign x = math
y = stat, # Assign y = stat
color = gender)) + # Dots are colored by gender
geom_point() # Draw a scatter plot
・df1 %>% ggplot() means that the first argument of ggplot() is df1
Interpretation of the R code:
・Use df1 as data
→ Assign x = math
→ Assign y = stat
→ Dots are colored by gender
→ Draw a scatter plot
・You don’t have to go backward in interpreting the R code
・R code with pipes (%>%) is intuitive and easy to follow
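The rule that the left-hand side of %>% becomes the first argument of the function on the right can be checked on a small case. This is a minimal sketch assuming the {magrittr} package is installed; describe() and scores are hypothetical names defined only for this illustration.

```r
library(magrittr)

# Hypothetical helper: reports how many values a vector holds
describe <- function(data, label) {
  paste(label, "has", length(data), "values")
}

scores <- c(2, 4, 6)

# Ordinary call: scores is written as the first argument
direct <- describe(scores, "stat")

# Piped call: %>% inserts scores as the first argument
piped <- scores %>% describe("stat")

# The two calls return the same string
identical(direct, piped)
```

In the same way, df1 %>% ggplot(aes(...)) is read by R as ggplot(df1, aes(...)).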