A script is a series of commands run in sequence. They build on each other to produce a desired outcome. Like this:
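Here is a minimal sketch of such a script. The variable names and numbers are made up for illustration. Each line uses the result of the line before it:

```r
# Step 1: store some (hypothetical) test scores
scores <- c(12, 18, 15, 20)
# Step 2: add them up
total <- sum(scores)
# Step 3: divide by how many scores there are to get the average
average <- total / length(scores)
# Step 4: show the result
print(average)   # 16.25
```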
Before we get too far, let’s start by creating an R Notebook for you to practice with. Remember how this works? Check the previous unit on how to start a new Notebook. Be sure you do the following:
- Update/edit the header
- Create a setup chunk
- Create a new empty R code chunk
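When making a script, be generous with comments describing your work. There are two ways to make comments: write plain text in the text part of your Notebook (outside the code chunks), or start a line inside a code chunk with #, which tells R to ignore everything after it on that line.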
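A small sketch of # comments inside a code chunk. The variable x is made up for illustration:

```r
# This whole line is a comment, so R skips it
x <- 10  # A comment can also follow code on the same line
# x <- 20   (this line is "commented out", so it never runs)
print(x)  # prints 10, because the x <- 20 line was skipped
```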
What is a variable? How would you define it?
Right! A variable is something that varies - it can take on different values.
In science, we measure variables, such as scores on a depression
inventory. This is a variable because the scores can be different from
person to person.
In programming we create a variable, and then assign it a value. A
variable is like a Tupperware container where we can store something to
use later. It’s called a variable because it can hold different things -
the content can vary.
There are 3 main types of variables: numeric (numbers), string (text, which R calls "character"), and logical (TRUE or FALSE).
Note the use of the arrow symbol <- to ‘assign’ a value to a variable.
To make this, you can simply type < and -.
Don’t want to type that? Use the shortcut ALT + - instead (Option + - on a Mac).
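A small sketch of one variable of each kind. This assumes the three main types are numeric, string (R calls it "character"), and logical, which are the vector types used later in this unit. The variable names are made up:

```r
# Numeric: a number
my_age <- 25
# String (R calls this "character"): text inside quotes
my_name <- "Ada"
# Logical: TRUE or FALSE, with no quotes
likes_r <- TRUE
# class() reports the type of each variable
class(my_age)   # "numeric"
class(my_name)  # "character"
class(likes_r)  # "logical"
```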
In the text part of your markdown document, do the following:
In a new code chunk, do the following:
In a separate code chunk, do the following:
A variable stores a single value. A vector stores many! The different items in a vector are called elements.
# Numeric Vector
MyFirstVector <- c(1, 2, 3, 4, 5)
# String Vector
DisneyPrinces <- c("Aladdin", "Hans", "Tarzan", "Captain America")
Note the c before the parentheses. To make a vector, put a list of elements, separated by commas, in parentheses with a c in front. The c stands for "combine".
In a code chunk or chunks, do the following:
# How many elements in our vector?
length(MyFirstVector) # 5
# What kind of vector is it?
class(DisneyPrinces) # "character" means string
typeof(MyFirstVector) # "double"; typeof() is more specific than class()
# 'double' means a real number (e.g., 3.14159, 217, 0.43)
# Naming a vector's elements
names(DisneyPrinces) <- c("Aladdin", "Frozen", "Tarzan", "Avengers")
DisneyPrinces
names(MyFirstVector) <- c("Hamburger", "Hot Dog", "Fries", "Pizza", "Steak and Lobster")
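To see what "more specific" means for typeof(), compare numbers typed with c() against numbers made with the colon operator. Both count as numeric, but R stores them differently. The two vector names are made up for illustration:

```r
typed_numbers <- c(1, 2, 3)   # numbers typed with c() are stored as doubles
colon_numbers <- 1:3          # the colon operator produces whole-number integers
is.numeric(typed_numbers)     # TRUE
is.numeric(colon_numbers)     # TRUE
typeof(typed_numbers)         # "double"
typeof(colon_numbers)         # "integer"
```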
To extract elements from a vector, put their positions or names in square brackets:
# Extracting an element by number
MyFirstVector[1] # 1
MyFirstVector[4] # 4
DisneyPrinces[3] # Tarzan
# Extracting multiple elements by number
MyFirstVector[c(1,2)] #1 2
DisneyPrinces[c(4, 1)] # "Captain America" "Aladdin"
MyFirstVector[1:3] # 1 2 3
# 1:3 means 'from 1 to 3'. It's the same as c(1, 2, 3)
DisneyPrinces[3:1] # "Tarzan" "Hans" "Aladdin"
# 3:1 means 'from 3 to 1'. It's the same as c(3, 2, 1).
# Extracting an element from a named vector
DisneyPrinces["Avengers"] # Captain America
# Note the quotes around the name
# Replacing an element
MyFirstVector[1] <- 5 # vector is now 5 2 3 4 5
MyFirstVector[4] <- 99 # vector is now 5 2 3 99 5
# Edit the vector in a pop-out window
edit(DisneyPrinces) # Not best practice. edit() returns an edited copy;
# the original vector is unchanged unless you assign the result
#Let's start over with this
MyFirstVector <- c(1, 2, 3, 4, 5)
# A mathematical operation will apply to all elements in the vector
MyFirstVector + 5 # 6 7 8 9 10
# If one vector operates on another, the operation proceeds element by element
MyFirstVector * MyFirstVector #1 4 9 16 25
# 1 * 1, 2 * 2, 3 * 3, etc.
# What if one vector is shorter than another?
MyFirstVector * c(2, 3, 4) # 2 6 12 8 15
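R handles the shorter vector by recycling it: it repeats c(2, 3, 4) from the start until it is as long as the longer vector, and it prints a warning because 5 is not a multiple of 3. This sketch makes the recycling explicit with base R's rep_len(), which repeats a vector to a given length. It redefines MyFirstVector so it runs on its own; short_vector and recycled are made-up names:

```r
MyFirstVector <- c(1, 2, 3, 4, 5)
short_vector <- c(2, 3, 4)
# rep_len() repeats short_vector until it has 5 elements: 2 3 4 2 3
recycled <- rep_len(short_vector, length(MyFirstVector))
recycled
# Same answer as MyFirstVector * c(2, 3, 4), but with no warning
MyFirstVector * recycled  # 2 6 12 8 15
```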
keep <- c(TRUE, FALSE, TRUE, TRUE) # Create a logical vector
# Now use the logical vector to filter the string vector
DisneyPrinces[keep] #"Aladdin" "Tarzan" "Captain America"
# Create a logical vector that shows what values are greater than 3
expensive <- MyFirstVector > 3 # Other comparators: > < >= <= == !=
expensive #FALSE FALSE FALSE TRUE TRUE
# Now filter
MyFirstVector[expensive] # 4 5
Use minus signs to remove elements:
# Temporarily remove Tarzan
DisneyPrinces[-3] # "Aladdin" "Hans" "Captain America"
# Or use logic
DisneyPrinces[DisneyPrinces != "Tarzan"]
# the != means NOT equal, so DisneyPrinces != "Tarzan" means
# keep everything that is not Tarzan
# To make this change permanent, we have to re-assign the vector to itself:
DisneyPrinces <- DisneyPrinces[-3] # Makes this change permanent
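A quick way to check that removing an element is temporary until you re-assign: look at length() before and after. This sketch rebuilds the four-element DisneyPrinces vector (without names) so it runs on its own:

```r
DisneyPrinces <- c("Aladdin", "Hans", "Tarzan", "Captain America")
DisneyPrinces[-3]                   # shows only 3 elements...
length(DisneyPrinces)               # ...but the vector still has 4
DisneyPrinces <- DisneyPrinces[-3]  # re-assign to keep the change
length(DisneyPrinces)               # now 3
DisneyPrinces                       # "Aladdin" "Hans" "Captain America"
```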
In a code chunk(s), do the following:
A loop is a block of code (i.e., a sequence of commands) that R executes over and over again until some termination criterion is met.
When do you think we would want a loop?
When we need to do the same thing repeatedly, we use a loop.
There are 2 kinds of loops: while loops and for loops.
While_Variable <- 0 #You need some kind of variable that will signal when to exit the loop
#And you usually want to (re)set this value just before the loop starts
while(While_Variable < 1000) {
#Some kind of comparator that produces a logical (T/F) value
print(While_Variable)
While_Variable <- While_Variable + 1 #What does this do?
#What would happen if we didn’t have it (or something like it)?
} # Notice the curly brackets enclosing the code to be repeated
while ( CONDITION ) {
STATEMENT1
STATEMENT2
ETC
}
for(For_Variable in 1:1000) { #Some kind of vector that determines how long the loop will run
# Code inside brackets will be repeated, like the line below:
print(For_Variable)
}
#Instead of 1:1000, we could use any vector
words <- c("it", "was", "the", "dirty", "end", "of", "winter")
for(this_word in words) {
n_letters <- nchar(this_word) #what does this do?
block_word <- toupper(this_word) #and this?
cat(block_word, "has", n_letters, "letters\n") #What about this?
}
for ( VAR in VECTOR ) {
STATEMENT1
STATEMENT2
ETC
}
Use a while loop when you don’t know how long you’ll need the loop to run.
Use a for loop when you know exactly how long the loop should run (or R does).
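For example, suppose we want to know how many times we must double a number, starting from 1, before it passes 1000. We don't know the count in advance, so a while loop fits. The variable names are made up for illustration:

```r
value <- 1      # starting number
doublings <- 0  # counts how many times the loop body runs
while (value <= 1000) {
  value <- value * 2          # moves value toward the exit condition
  doublings <- doublings + 1
}
cat("Passed 1000 after", doublings, "doublings; value is", value, "\n")
```

Without the value <- value * 2 line, value would stay at 1 and the loop would never end.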
If you’re not careful, you can make a loop that keeps going forever, with no termination condition. This is bad. Which kind of loop is more likely to do this?
Try to avoid loops when you can (without just repeating code). Instead, use vectorized code: apply a function to the entire vector at once, instead of one element at a time. For example, nchar(words) creates a new vector with the lengths of all the words in the words vector, no loop required.
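Here is the word example from the for loop above, rewritten in vectorized form. nchar() and toupper() each take a whole vector and return one result per element, and paste() combines its arguments element by element:

```r
words <- c("it", "was", "the", "dirty", "end", "of", "winter")
n_letters <- nchar(words)     # 2 3 3 5 3 2 6
block_words <- toupper(words) # "IT" "WAS" "THE" ...
# paste() builds one sentence per word, with no loop
sentences <- paste(block_words, "has", n_letters, "letters")
cat(sentences, sep = "\n")
```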
In text, do the following:
In separate code chunks, do the following:
A branch statement executes a set of code only if certain conditions are met. This allows your code to make choices – treating different data values differently.
There are two kinds of branches: if branches and if-else branches.
Before we jump further into branches like monkeys, we need to know about conditional statements. A conditional statement is a short sentence that can be true or false. Every branch has a conditional statement: if the statement is true, the code in the branch will run. If the conditional statement is false, the code in the branch will not run.
Here are some examples of conditional statements, which can be true or false:
The simplest condition we can use is a logical variable that has a TRUE or FALSE value, like these:
CONDITIONAL_VARIABLE <- TRUE
THIS_VARIABLE_IS_FALSE <- FALSE
But more often, we will set up a condition by comparing two things in some way. Here are some examples.
To see if two things are equal, we use a double equal sign, ==. The code below compares the string “Blue” to whatever is in the variable SKY_VARIABLE and returns TRUE if they ARE THE SAME. Will it return TRUE or FALSE?
SKY_VARIABLE <- "Blue"
# The sky is blue
SKY_VARIABLE == "Blue"
To see if two things are not equal, we use an exclamation mark followed by an equal sign, !=. The code below compares the string “Planet” to whatever is in the variable PLUTO_VARIABLE and returns TRUE if they are NOT THE SAME. Will it return TRUE or FALSE?
PLUTO_VARIABLE <- "Not a Planet"
# Pluto is not a planet
PLUTO_VARIABLE != "Planet"
Greater than and less than compare one number to another. If the number on the open side of the ‘greater than’ sign is bigger (remember, the alligator wants to eat the bigger number), the statement is true.
Lebron_Height <- 81 # inches
Average_Baby_Height <- 20 # inches
# Lebron height is greater than baby height
Lebron_Height > Average_Baby_Height
We can also make ‘greater than or equal to’ or ‘less than or equal to’ by adding an equal sign after the greater than/less than sign, like so:
Human_Legs <- 2 # number of legs
Dog_Legs <- 4 # number of legs
# Humans have no more legs than dogs (less than or equal to)
Human_Legs <= Dog_Legs
We can link multiple conditional statements together into a ‘super-conditional’ using AND and OR, like this:
An AND statement is only true if BOTH parts of the statement are true. If one is false, the whole statement is false. An AND is represented by an ampersand, &. An OR statement is true if even one of the statements is true. If one is true, the whole statement is true. An OR is represented by a vertical bar, |.
Here is an AND statement in R code:
# The sky is blue AND Pluto is not a planet.
SKY_VARIABLE == "Blue" & PLUTO_VARIABLE != "Planet"
# Lebron James is taller than a baby OR humans have no more legs than dogs.
Lebron_Height > Average_Baby_Height | Human_Legs <= Dog_Legs
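Because & and | work element by element on vectors, you can print the whole truth table for AND and OR in two lines. The vectors a and b are made up to list every combination of TRUE and FALSE:

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a & b  # TRUE FALSE FALSE FALSE  (AND: true only when both are true)
a | b  # TRUE TRUE TRUE FALSE    (OR: true when at least one is true)
```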
Now that we know about conditional statements, let’s see how branches use conditionals to selectively run code. We’ll start with the simplest branch, the if branch.
An if branch runs the statements inside the brackets if the condition is true. Otherwise it does nothing. It looks like this.
if ( CONDITION ) {
#CODE THAT WILL RUN IF CONDITION IS TRUE
}
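For example, in the if below, if the variable COVID is TRUE, R will print “Stay home!” If COVID is FALSE, R skips the print command. COVID has to be defined as TRUE or FALSE before the if runs; if it is missing or NA, R stops with an error.
COVID <- TRUE # try changing this to FALSE
if (COVID) {
print("Stay home!")
}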
An if-else branch has code that runs if the conditional is true, as well as different code that runs if the statement is false. Like this:
if ( CONDITION ) {
#CODE THAT WILL RUN IF CONDITION IS TRUE
} else {
#CODE THAT WILL RUN IF CONDITION IS FALSE
}
if (COVID) {
print("Stay home!")
} else {
print("No Worries!")
}
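Branches become especially useful inside loops, where each value gets its own decision. A sketch with made-up temperature readings; the names temps_F, labels, and reading are hypothetical:

```r
temps_F <- c(45, 72, 95)   # hypothetical temperature readings
labels <- character(0)     # empty vector to collect the results
for (reading in temps_F) {
  if (reading > 80) {
    labels <- c(labels, "hot")      # runs when the condition is TRUE
  } else {
    labels <- c(labels, "not hot")  # runs when the condition is FALSE
  }
}
labels  # "not hot" "not hot" "hot"
```

In line with the advice on vectorized code, base R's ifelse(temps_F > 80, "hot", "not hot") produces the same vector without a loop.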
Dungeons and Dragons characters have 6 ‘stats’: “Strength”, “Dexterity”, “Constitution”, “Intelligence”, “Wisdom”, and “Charisma”. Each stat is determined by doing the following:
We’re going to use variables, vectors, and loops to make a character. Do the following:
What is a function? (Hint - we’ve seen LOTS of these already).
Right! A function is an R command that does something.
Did you know that you can make your own functions in R?
When might you want to? I suggest that you make your own function when you are using a set of code repeatedly in your script and you want to make things simpler and easier to remember.
FNAME <- function( ARG1, ARG2, ETC ) {
STATEMENT1
STATEMENT2
ETC
return( VALUE )
}
# give the function a name - fahrenheit_to_celsius
# call "function" and give the argument variable a name - temp_F
fahrenheit_to_celsius <- function(temp_F) {
# Do the math, creating a new variable temp_C
temp_C <- (temp_F - 32) * 5 / 9
# return temp_C as the output
return(temp_C)
}
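To use the function, call it by name with a temperature in the parentheses. The block repeats the definition so it runs on its own:

```r
fahrenheit_to_celsius <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
fahrenheit_to_celsius(212)  # 100 (water boils)
fahrenheit_to_celsius(32)   # 0 (water freezes)
# The math inside is vectorized, so one call converts a whole vector
fahrenheit_to_celsius(c(32, 212, -40))  # 0 100 -40
```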