Chapter 1 The Basics

This section will go over the different types of data that we will use in R. Before we get into that, we need to know the basic workings of R. We can use R like a calculator. To be honest, everything that follows might be better learned from DataCamp’s Free Introduction to R. But if you’re still with me, I’m going to go over this stuff kind of quickly.

1.1 Arithmetic Operators

  • ^ : exponentiation (exponents) [E]
  • * : multiplication [M]
  • / : division [D]
  • + : addition [A]
  • - : subtraction [S]

These can be combined, and parentheses [P] ( ) control the order of operations (PEMDAS).

2 ^ 2
## [1] 4
2 * 3
## [1] 6
4 / 2
## [1] 2
1 + 3
## [1] 4
3 - 2
## [1] 1
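
To see the parentheses part of PEMDAS at work, here is a quick sketch. The numbers are made up for illustration. Multiplication happens before addition unless parentheses say otherwise.

```r
# multiplication is evaluated before addition
2 + 3 * 4
## [1] 14

# parentheses force the addition to happen first
(2 + 3) * 4
## [1] 20
```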

1.2 Creating Variables

So, as I said, knowledge of substitution is a prerequisite. We can create new variables in our R environment (what’s an environment you ask?) that can be used to reference things. There are many types of variables, but we will get into that later. Let’s work with something we’re all familiar with: x. Note: I will use the words variable and object interchangeably throughout this.

Say we want to solve y = 3x + 2 for when x = 1. We can create a variable for x and then make one for y. We create variables with the assignment operator/arrow, <-. Mr. Jared Lander taught me that x <- 1 formally reads as “x gets 1”. To me, it just means they are set equal, like an equation.

# This is a comment. This won't be run by R. 
x <- 1

# You can print the variable by just writing it 
x
## [1] 1

Now let’s try some math with the variable.

x * 3
## [1] 3
x + 2
## [1] 3

Alright, you’re getting the idea. How about using our formula above?

3 * x + 2
## [1] 5

Okay, now let’s create a new object called y, which is the output of the above operation.

y <- 3 * x + 2

Here, y takes on the value of whatever is to the right of the arrow. This is the case whenever you assign any object.
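
One thing that might trip you up: y stores the result of the calculation, not the formula itself. Here is a small sketch of that idea. The value 5 for x is just picked for illustration.

```r
x <- 1
y <- 3 * x + 2
y
## [1] 5

# changing x afterwards does not change y
x <- 5
y
## [1] 5

# run the assignment again to update y
y <- 3 * x + 2
y
## [1] 17
```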

1.3 Logical operators

Alright, this might get a little tricky, and if you struggle, try checking out section 5.2.2 of R for Data Science.

  • < : less than
  • > : greater than
  • <= : less than or equal to
  • >= : greater than or equal to
  • == : exactly equal (I like to think of it as “are these the same thing?”)
  • != : not equal (“are these things not the same?”)

Let’s bring it back to early algebra. Let’s say x = 3 and y = 5.

Is x less than y?

# set variables.
x <- 3
y <- 5

x < y
## [1] TRUE
# greater than?
x > y
## [1] FALSE
# less than or equal
x <= y
## [1] TRUE
# greater or equal
x >= y
## [1] FALSE
# exactly equal?
x == y
## [1] FALSE
# not equal
x != y
## [1] TRUE

Alright, so now I want to talk about the ! (called the bang operator) and its nuance. See how we put ! in front of our =? The bang operator essentially checks the opposite of a thing. So in this case it checked the opposite of equals. If we put ! in front of a logical statement, it will reverse the outcome.

1 == 1
## [1] TRUE
!(1 == 1)
## [1] FALSE
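
To tie this back to x and y, here is a short sketch showing that != and wrapping == in !( ) ask the same question. The values are the same ones used earlier in this section.

```r
x <- 3
y <- 5

# these two lines ask the same question in two ways
x != y
## [1] TRUE
!(x == y)
## [1] TRUE

# ! also flips a plain TRUE or FALSE
!TRUE
## [1] FALSE
```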

Now I want to introduce two more logical operators: and (&) and or (|). & checks multiple conditions and will return TRUE only if they are both TRUE. | will return TRUE when at least one of the conditions is TRUE.

Now for some illustrative & statements:

# We have TRUE and FALSE, this should be FALSE because they aren't both TRUE
TRUE & FALSE 
## [1] FALSE
# both are TRUE, we expect TRUE
TRUE & TRUE
## [1] TRUE
# The first statement is TRUE, but the second is not TRUE, expect FALSE
(1 == 1) & (1 < 1)
## [1] FALSE
# The first statement is TRUE and the second is TRUE, expect TRUE
(1 == 1) & (1 <= 1)
## [1] TRUE

Illustrative | statements:

Remember, only one condition needs to be TRUE.

TRUE | TRUE
## [1] TRUE
TRUE | FALSE
## [1] TRUE
FALSE | FALSE
## [1] FALSE
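
In practice you will mostly combine & and | with comparisons on variables. Here is a rough sketch using x and y again. The cutoffs 1, 4, 5, and 10 are arbitrary numbers chosen for the example.

```r
x <- 3
y <- 5

# is x between 1 and 5? both comparisons must be TRUE
(x > 1) & (x < 5)
## [1] TRUE

# is either value bigger than 4? only y is, but one is enough
(x > 4) | (y > 4)
## [1] TRUE

# neither value is bigger than 10
(x > 10) | (y > 10)
## [1] FALSE
```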

1.4 Data Types!

Okay! Now that you’ve got an idea of the basics of R, maybe you’ve even caught on to some types of data. There are 4 types of data that you will encounter on a regular basis. You’ve encountered 2 of them already.

So, think back to your Stat 101 class. You dealt with discrete, continuous, and nominal data. Discrete data was whole numbers, or integers. Continuous data had decimal points. And nominal data was essentially just words.

In R we will use:

  • integer: discrete
  • numeric: continuous
  • character: nominal
  • logical

To indicate to R that a number is an integer we put an L after the number (why?). We can always check an object’s type by using the function class(). I’ll go over functions briefly later, but you should be able to pick up the intuition—tl;dr, functions take an input and make an output.

# numbers are by default "numeric"
class(2)
## [1] "numeric"
# to specify an integer we can use the `L`
class(2L)
## [1] "integer"

character data is anything surrounded by quotations. Generally, these are things like words and sentences.

my_characters <- "this is a character"
class(my_characters)
## [1] "character"

You can’t add character objects to numerics (integers will be counted as a numeric, because they are a number). Most data types have a function to check whether an object is indeed that data type. You can check to see if a variable is a character by using the is.character() function. It will check the object’s class, and tell you if it matches with character. The output is a logical.
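
Here is a small sketch of those checks. The tryCatch() bit is my own addition, not something we have covered yet. It catches the error R throws when you try to add a character to a number, so the rest of the code keeps running.

```r
my_characters <- "this is a character"

is.character(my_characters)
## [1] TRUE
is.character(2)
## [1] FALSE

# integers count as numbers, so is.numeric() says TRUE
is.numeric(2L)
## [1] TRUE

# adding a character to a number is an error; tryCatch() catches it
result <- tryCatch("2" + 2, error = function(e) "that didn't work")
result
## [1] "that didn't work"
```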

logical data is either TRUE or FALSE. What is interesting is that R treats TRUE as the value 1 and FALSE as 0 whenever you do math with them. So you can add TRUE to a number. Funny, huh?

TRUE + 5
## [1] 6
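
A few more lines to convince yourself, as a quick sketch. as.numeric() is a function that converts its input to a number.

```r
# a logical has its own class
class(TRUE)
## [1] "logical"

# TRUE counts as 1 and FALSE counts as 0 in arithmetic
TRUE + TRUE
## [1] 2
FALSE * 10
## [1] 0

# as.numeric() shows the number hiding behind a logical
as.numeric(TRUE)
## [1] 1
```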

Well, that just about covers the data types we will use. Now, it’s not often that we’ll just want to use 1 number or value for our work. Let’s get into data structures.

1.5 Data Structures

1.5.1 Vectors

The simplest data structure in R is the vector. Vectors are “one-dimensional arrays”, that’s fancy speak for an object that holds many elements of the same type. So a vector will [almost] always be numeric (integer counts as numeric, remember?), character, or logical.

We create vectors by using the function c(), which stands for combine. Each element of the vector is separated by a comma.

c(1, 2)
## [1] 1 2

If you put a character (or a string, I will use both interchangeably from here on), in a numeric vector, R will coerce (force) the numbers to be treated like a character (meaning you can’t do math with them).

c(1, 2, "characters go here")
## [1] "1"                  "2"                  "characters go here"

Notice how the numbers are in quotes now. So what happens when you put a logical inside a numeric vector?

c(FALSE, 1, TRUE)
## [1] 0 1 1

See how that vector contains 3 elements, 0 1 1? Remember that FALSE is actually a 0 and TRUE is a 1!
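
If you are ever unsure what R did to your vector, class() will tell you. A quick sketch using the same vectors as above, plus one extra made-up mix of a logical and a character:

```r
class(c(1, 2, "characters go here"))
## [1] "character"
class(c(FALSE, 1, TRUE))
## [1] "numeric"

# mixing a logical with a character gives characters too
class(c(TRUE, "a"))
## [1] "character"
```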

Now, what if we want to tell R that there is missing data? We use NA to indicate missing data.

c(1, 2, NA, 4)
## [1]  1  2 NA  4
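
Missing values tend to spread through calculations, so it helps to know how to spot them. Below is a minimal sketch. The name my_values is made up for this example, and is.na() and the na.rm argument of sum() are things we haven't covered yet.

```r
my_values <- c(1, 2, NA, 4)

# is.na() flags which elements are missing
is.na(my_values)
## [1] FALSE FALSE  TRUE FALSE

# math with a missing value gives a missing answer
sum(my_values)
## [1] NA

# na.rm = TRUE tells sum() to drop the NA first
sum(my_values, na.rm = TRUE)
## [1] 7
```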

Here is a fun shortcut: if you want to create a sequence of integers, you can use a :. If I wanted the numbers 1 through 20, I could write 1:20.

my_ints <- 1:20
my_ints # this prints it, remember?
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

We know that there are 20 elements in this vector, which means that the vector has a length of 20. We can always find out how long (or how many elements are in) a vector by supplying the vector to the length() function.

# how long is `my_ints`?
length(my_ints)
## [1] 20

Ah, just as anticipated! Say we wanted to add 10 to each value, we could supply the variable name in a basic arithmetic statement.

my_ints + 10
##  [1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
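
The same thing works for the other operators, including the logical ones from earlier. Here is a short sketch. The numbers 2 and 3 are arbitrary.

```r
my_ints <- 1:20

# multiply every element by 2
my_ints * 2
##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

# comparisons run on every element too, giving a logical vector
(1:5) > 3
## [1] FALSE FALSE FALSE  TRUE  TRUE
```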

Say we wanted to only have the 10th element from my_ints, how would we get it? I mean, we know it’s 10, but we can’t write out every value we want every time. We can access it through an index. Indexes are accessed using brackets immediately after the object that we want to access.

my_ints[10]
## [1] 10

We can access multiple values in this vector by supplying a vector to our vector’s index (think about that one for a second). To grab the 1st, 10th, and 20th element of our vector we would put a vector with the values 1, 10, 20 in our index.

my_ints[c(1, 10, 20)]
## [1]  1 10 20

But since vectors can be stored in an object, we can create a vector called index and substitute that!

# create an index vector
index <- c(1, 10, 20)

# use that index to select from our vector
my_ints[index]
## [1]  1 10 20
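
An index does not have to be a list of positions. As a sketch of two other options: a sequence made with : works, and so does a logical vector, which keeps only the elements where it is TRUE. The cutoff of 15 is arbitrary.

```r
my_ints <- 1:20

# a sequence of positions works as an index
my_ints[1:3]
## [1] 1 2 3

# a logical vector keeps only the elements where it is TRUE
my_ints[my_ints > 15]
## [1] 16 17 18 19 20
```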

Many of you are probably quite familiar with Excel for doing your number crunching. The table look with columns and rows is a very useful way of visualizing and handling data. In R, we use something called data frames to do this.

1.5.2 Data Frames

Unlike vectors, data frames are two-dimensional. I like to think of making a graph back in school where I’d know the x and y coordinates. We can think of our rows as x coordinates, and columns as y coordinates. Each coordinate pair makes a value pair in our data frame. Our two-dimensional data frames are actually made up of one-dimensional objects (vectors)!

Say we have two vectors x and y.

x <- 1:10
y <- 10:1

x
##  [1]  1  2  3  4  5  6  7  8  9 10
y
##  [1] 10  9  8  7  6  5  4  3  2  1

Notice how they are the same length! We can think of each element (observation) as one row. These vectors will become our columns in our data frame. Data frames can be constructed with data.frame(). The first argument (value) will be the vector which will become a column, and each following argument (separated with a comma) will be another column.

df <- data.frame(x, y)
df
##     x  y
## 1   1 10
## 2   2  9
## 3   3  8
## 4   4  7
## 5   5  6
## 6   6  5
## 7   7  4
## 8   8  3
## 9   9  2
## 10 10  1
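
Just like length() for vectors, there are functions for checking the size and column names of a data frame. Here is a quick sketch. The column names up and down are made up to show that you can name columns yourself inside data.frame().

```r
x <- 1:10
y <- 10:1
df <- data.frame(x, y)

# how many rows and columns?
nrow(df)
## [1] 10
ncol(df)
## [1] 2

# what are the columns called?
names(df)
## [1] "x" "y"

# you can pick your own column names (these are made up)
df_named <- data.frame(up = x, down = y)
names(df_named)
## [1] "up"   "down"
```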

Look at how nice that is!

So, remember how we accessed values from vectors using [] and a value(s)? To get values from a data frame, we need to supply the brackets with x, y coordinates! This is because a data frame is two-dimensional, whereas the vector was one-dimensional.

If we want to get the first value from our x column, we need to specify that we want row (x) 1, and column (y) 1. Generally, this looks like df[x,y].

df[1,1]
## [1] 1

If we want to get whole rows, we leave the value in the column area empty (but remember the comma!).

df[1,]
##   x  y
## 1 1 10

Now the same rule of thumb applies for columns, just leave the x value empty, but remember the comma.

df[,1]
##  [1]  1  2  3  4  5  6  7  8  9 10
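
The row and column slots can each take more than a single number. As a sketch, you can use a column's name in quotes instead of its position, or a vector of row numbers to grab several rows at once.

```r
x <- 1:10
y <- 10:1
df <- data.frame(x, y)

# row 3, column 2
df[3, 2]
## [1] 8

# the same value, using the column name instead of its position
df[3, "y"]
## [1] 8

# the first three rows, all columns
df[1:3, ]
##   x  y
## 1 1 10
## 2 2  9
## 3 3  8
```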

Additionally, we can access the underlying columns another way, by using the $ operator. To do this we would write the data frame name, followed by $ and the column name, i.e. data_frame$col_name.

To grab the x column of df we can write:

df$x
##  [1]  1  2  3  4  5  6  7  8  9 10

We can do math with our columns too!

df$x * df$y
##  [1] 10 18 24 28 30 30 28 24 18 10
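
The $ operator also works on the left side of the arrow, which lets you save a result as a new column. Here is a minimal sketch. The column name xy is made up, and head() just prints the first few rows.

```r
x <- 1:10
y <- 10:1
df <- data.frame(x, y)

# store the product as a new column called xy
df$xy <- df$x * df$y

# look at the first 3 rows
head(df, 3)
##   x  y xy
## 1 1 10 10
## 2 2  9 18
## 3 3  8 24
```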

1.6 Recap

Alright, there are a few key takeaways here.

  1. We can use R to do some math
  2. We can create variables which take on values
    • we can even do math on those variables
  3. We can compare objects and values using logical statements
  4. We can store many values of the same type in one-dimensional vectors
  5. We can make two-dimensional tables from multiple one-dimensional vectors

1.7 HW

  • Create a vector called z which is the square root of each element of the vector y.
  • Create a data frame called xyz which has 3 columns made from the vectors x, y, and z.
  • Check to see if each value of the column x is greater than or equal to the corresponding value in the y column.

1.7.1 HW Solutions

# Create a vector z
z <- sqrt(y)

# create a new data frame
xyz <- data.frame(x, y, z)

# compare x and y
xyz$x >= xyz$y
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE