1. Practical Introduction to R
1 Introduction
This introduction covers the basics of R programming. As you progress, you’ll discover more advanced features and packages that extend R’s capabilities even further. Remember to use the help function (?function_name) to learn more about specific functions and their usage.
2 Executing R Scripts: Full vs. Partial Execution
2.1 Full Execution of an R Script
Running a script from top to bottom is useful when you want to execute all the code at once. This is typically done after you’ve written and verified the entire script.
Steps:
- Save your script with a .R extension, such as my_script.R.
- Use the source() function in R to execute the entire script:
  source("my_script.R")
- Alternatively, in RStudio:
  - Click the Source button at the top of the script editor.
  - Use the shortcut Ctrl + Shift + S (Windows) or Cmd + Shift + S (Mac).
Example script:
# Define a function
add_numbers <- function(x, y) {
  return(x + y)
}

# Perform calculations
result <- add_numbers(5, 3)
print(result)

# Generate a sequence
seq_data <- seq(1, 10, by = 2)
print(seq_data)
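A minimal sketch of full execution, for when you would rather not create my_script.R by hand: the code writes a three-line script to a temporary file with tempfile() and writeLines(), then runs it with source(). The script contents are illustrative only. The echo = TRUE argument prints each expression as it runs, which makes the top-to-bottom order visible.

```r
# Write a small script to a temporary file (a stand-in for my_script.R)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "add_numbers <- function(x, y) x + y",
  "result <- add_numbers(5, 3)",
  "print(result)"
), script_path)

# Run the whole file from top to bottom; echo = TRUE shows each line as it runs
source(script_path, echo = TRUE)

# Objects created by the script are now available in the workspace
result
```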
2.2 Partial Execution of an R Script
Running parts of your script manually is useful during the development and debugging process. This allows you to test specific sections without executing the entire script.
Steps in RStudio:
- Highlight the portion of the script you want to run.
- Press Ctrl + Enter (Windows) or Cmd + Enter (Mac) to run the selected lines.
- To run only the current line, place the cursor on that line and press the same shortcut.
# Define a function
add_numbers <- function(x, y) {
  return(x + y)
}

# Highlight and run the following line:
result <- add_numbers(5, 3)

# Debugging this line separately:
print(result)

# Highlight and run this block to test sequences:
seq_data <- seq(1, 10, by = 2)
print(seq_data)
2.3 Best Practices for Script Organization
Divide your script into sections using comments:
# Section 1: Load libraries
library(ggplot2)

# Section 2: Load data
data <- read.csv("data.csv")
# Section 3: Analysis
summary(data)
Use descriptive variable and function names to make scripts easier to understand.
Keep your script modular—write functions for reusable code blocks.
Include comments to explain complex logic or calculations.
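As a sketch of these practices, the script below uses the built-in mtcars dataset in place of data.csv and a hypothetical helper, describe_column(), to keep the analysis modular. In RStudio, comment lines ending in four or more dashes (----) are also recognized as sections that can be folded and navigated from the editor's outline.

```r
# Section 1: Load libraries ----
# (this sketch uses base R only, so there is nothing to load)

# Section 2: Load data ----
# mtcars ships with R and stands in for data.csv here
cars_data <- mtcars

# Section 3: Helper functions ----
# Hypothetical helper: mean and standard deviation of a numeric vector
describe_column <- function(values) {
  c(mean = mean(values), sd = sd(values))
}

# Section 4: Analysis ----
mpg_stats <- describe_column(cars_data$mpg)
print(round(mpg_stats, 2))
```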
3 Basic R Syntax
R is case-sensitive and uses the <- operator for assignment (though = can also be used). Comments start with #.
# This is a comment
x <- 5  # Assign the value 5 to x
y = 10  # This also works, but <- is more common in R
4 Objects and Variable Assignment
In R, you can assign values to variables using the assignment operator <-:
my_variable <- 42
my_name <- "Alice"
You can view the contents of a variable by typing its name:
my_variable
my_name
R is case-sensitive, meaning that uppercase and lowercase letters are treated as different characters. In the example above, typing My_variable would result in an error. Below is another example:
x <- 10
x  # will print the number 10
X  # will return an error
5 Data Types in R
Understanding data types is fundamental when working with R. Data types define the kind of data you can store and manipulate. In R, there are six primary data types:
- Numeric: Decimal (real) numbers.
- Integer: Whole numbers.
- Character: Text or string data.
- Logical: TRUE or FALSE values (boolean).
- Complex: Complex numbers (rarely used in business applications).
- Raw: Raw bytes (advanced usage, rarely needed).
Examples of the fundamental data types are provided below. You can also convert data types by using functions like as.numeric(), as.character(), or as.logical():
# Converting a numeric to a character
numeric_value <- 123
character_value <- as.character(numeric_value)
typeof(character_value) # Output: "character"
Converting values is important when working with mixed data (e.g., reading in text files, user inputs, categorical data). The function as.character(numeric_value) converts the numeric value 123 into the character string “123”. R now treats this number as text, so you can no longer use this variable in calculations. This is useful when numbers serve as labels (e.g., phone numbers, IDs) rather than mathematical values, or when working with functions that operate on text (e.g., paste(), which builds character strings), among other situations.
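Conversions do not always succeed, and the sketch below shows a few behaviors worth knowing. The postal codes are made-up example data; suppressWarnings() is used only to hide the warning that as.numeric() normally prints for text it cannot convert.

```r
# Text that looks like a number converts cleanly
as.numeric("123")                  # 123

# Text that is not a number becomes NA (R normally also prints a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)                         # TRUE

# Made-up postal codes: as numbers they lose their leading zero
zip_codes <- c("02139", "10001")
as.numeric(zip_codes)              # 2139 10001
nchar(zip_codes)                   # 5 5 (the text version keeps all digits)

# Logical values convert to 1 and 0
as.numeric(c(TRUE, FALSE, TRUE))   # 1 0 1
```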
In R, class() and typeof() are both used to inspect the properties of an object, but they serve different purposes.
- class() tells you what kind of object it is in terms of R’s functionality (often human-readable and associated with the object’s use, like data.frame, matrix, or factor).
- typeof() tells you the low-level data type the object is stored as in memory (usually in more technical terms like double, integer, or list).
# Example 1:
x <- data.frame(a = 1:3, b = 4:6)
class(x)  # Output: "data.frame"
typeof(x) # Output: "list" (because data frames are stored as lists internally)

# Example 2:
x <- c(1, 2, 3) # A numeric vector
class(x)  # Output: "numeric" (functional description)
typeof(x) # Output: "double" (internal storage type)
You can also check specific data types by using is.numeric(), is.character(), is.logical(), etc., which return TRUE or FALSE.
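A short sketch of the is.*() checks, using made-up values. Note that is.numeric() returns TRUE for both doubles and integers, while is.integer() is TRUE only for values stored as integers.

```r
price <- 19.99
quantity <- 3L
label <- "SKU-001"

is.numeric(price)     # TRUE
is.numeric(quantity)  # TRUE: integers also count as numeric
is.integer(price)     # FALSE: 19.99 is stored as a double
is.character(label)   # TRUE
is.logical(label)     # FALSE

# A common pattern: confirm the types before calculating
if (is.numeric(price) && is.numeric(quantity)) {
  total <- price * quantity
  print(total)
}
```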
5.1 Numeric
Numeric data includes real numbers (decimal values).
# Assigning a numeric value
salary <- 55000.75

# Displaying the type of data
typeof(salary) # Output: "double"
5.2 Integer
Integers are whole numbers. To explicitly define an integer, add an L after the number.
# Assigning an integer value
age <- 30L

# Displaying the type of data
typeof(age) # Output: "integer"
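The L suffix matters because R stores plain numbers as doubles. The sketch below shows how arithmetic affects the type: ordinary division always produces a double, while integer division (%/%) keeps integers as integers.

```r
typeof(30)       # "double": numbers without L are stored as doubles
typeof(30L)      # "integer"

# Arithmetic can change the type
typeof(5L + 2L)  # "integer"
typeof(5L / 2L)  # "double": division always returns a double
5L / 2L          # 2.5
5L %/% 2L        # 2 (integer division keeps the integer type)

# Converting a double to an integer drops the decimal part
as.integer(3.9)  # 3
```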
5.3 Character
Character data represents text or strings. In R, strings are enclosed in double (") or single (') quotes.
# Assigning a character string
name <- "John Doe"

# Displaying the type of data
typeof(name) # Output: "character"
5.4 Logical
Logical values represent TRUE or FALSE. They are useful for decision-making and logical comparisons.
# Assigning logical values
is_graduate <- TRUE

# Displaying the type of data
typeof(is_graduate) # Output: "logical"
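Logical values are most often produced by comparisons rather than typed by hand. In the sketch below (with made-up sales figures and a target of 100), the comparison yields a logical vector, and because TRUE counts as 1 and FALSE as 0, sum() and mean() give a count and a proportion.

```r
sales <- c(120, 80, 200, 50)

# A comparison returns one TRUE/FALSE per element
above_target <- sales > 100
above_target          # TRUE FALSE TRUE FALSE

# TRUE counts as 1 and FALSE as 0
sum(above_target)     # 2 months above target
mean(above_target)    # 0.5, i.e. half of the months

# Logical vectors can also select elements
sales[above_target]   # 120 200
```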
5.5 Complex
Complex numbers consist of a real and an imaginary part. Business students will rarely use them.
# Assigning a complex number
z <- 2 + 3i

# Displaying the type of data
typeof(z) # Output: "complex"
5.6 Raw
Raw data represents bytes. This is an advanced data type, generally not required for business applications.
# Creating raw data
r <- charToRaw("ABC")

# Displaying the type of data
typeof(r) # Output: "raw"
6 Operators in R
R supports various types of operators:
- Arithmetic: +, -, *, /, ^ (exponent), %% (modulus)
- Relational: <, >, <=, >=, ==, !=
- Logical: & (and), | (or), ! (not)
x <- 10
y <- 3
x + y
x > y
(x > 5) & (y < 5)
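The sketch below continues with x and y and adds two points not covered in the list above: %/% performs integer division, and the double forms && and || compare single values. The single forms & and | are vectorized, so they compare vectors element by element.

```r
x <- 10
y <- 3

x %% y    # 1: the remainder of 10 / 3
x %/% y   # 3: integer division
x ^ 2     # 100
x != y    # TRUE

# & and | work element by element on vectors
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)  # TRUE FALSE FALSE

# && and || expect single TRUE/FALSE values and stop evaluating once the answer
# is known, which is why they are the usual choice inside if () conditions
if (x > 5 && y < 5) print("both conditions hold")
```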
7 Basic Data Structures
In addition to the fundamental data types, R also has several important structured data types:
- Vectors: One-dimensional arrays that can hold data of the same type
- Matrices: Two-dimensional arrays with data of the same type
- Lists: Can hold elements of different types
- Data Frames: Two-dimensional tables whose columns can hold different types of data
7.1 Vectors
Vectors are one of the most fundamental data structures in R, designed to store a sequence of elements of the same type (e.g., numeric, character, logical). They are extensively used in R for data manipulation, analysis, and computations due to their simplicity and efficiency. A vector can be created using the c() function, which combines individual elements into a single structure. For example, c(1, 2, 3) creates a numeric vector containing three elements. R performs operations on vectors element-wise, making it easy to perform tasks such as arithmetic, logical comparisons, and indexing. Vectors serve as building blocks for more complex data structures like matrices, lists, and data frames, making them indispensable in R programming for both basic and advanced applications.
# Numeric vector
prices <- c(100.5, 200.75, 300.25) # c() is used to create both numeric and character vectors

# Character vector
products <- c("Laptop", "Tablet", "Smartphone")

# Logical vector
available <- c(TRUE, FALSE, TRUE)

# Mixed vectors
mixed <- c("g", 1, 3, "m")

# Checking the type of a vector
typeof(prices) # Output: "double"
typeof(mixed)  # Output: "character"; when strings and numbers are mixed, the numbers are converted to character
We can easily create vectors containing regular sequences or repeated elements:
seq_vector1 <- seq(1, 4, by = 0.5)         # creates a vector going from 1 to 4 in steps of 0.5
seq_vector2 <- seq(1, 4, length.out = 10)  # creates a vector of 10 evenly spaced numbers from 1 to 4
seq_vector3 <- 1:4                          # creates a vector from 1 to 4 in steps of 1

d <- c(1, 2, 3)                             # an example vector to repeat
rep_vector1 <- rep(1, times = 4)            # repeats the value 1 four times
rep_vector2 <- rep(d, times = 4)            # repeats the whole vector d four times
rep_vector3 <- rep(d, each = 4)             # repeats each value in d four times
We can also access individual values of a vector using single square brackets. Positions in R are counted from 1.
# Define vector
seq_vector1 <- seq(1, 4, by = 0.5)
seq_vector1[2] # Output: 1.5
seq_vector1[3] # Output: 2
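Square brackets accept more than a single position. The sketch below, reusing the prices vector from the start of this section, selects several positions at once, drops an element with a negative index, filters with a logical condition, and replaces a value.

```r
prices <- c(100.5, 200.75, 300.25)

prices[1]              # 100.5
prices[c(1, 3)]        # 100.50 300.25
prices[-1]             # everything except the first element: 200.75 300.25
prices[prices > 150]   # only values above 150: 200.75 300.25
length(prices)         # 3

# Square brackets on the left of <- replace values
prices[2] <- 199.99
prices                 # 100.50 199.99 300.25
```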
When we perform arithmetic on two numeric vectors in R, such as a + b, the operation is applied element-wise: each element of a is added to the corresponding element of b. If the vectors are of unequal length, R recycles the elements of the shorter vector until it matches the length of the longer one, with a warning if the longer length is not a multiple of the shorter.
a <- c(1, 2, 3)
b <- c(4, 5, 6)
a + b # Output: c(5, 7, 9) (1+4, 2+5, 3+6)
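Recycling is easiest to see with vectors of different lengths. In the sketch below the prices and the 20% markup are made-up values; the last line triggers R's warning because a length of 3 is not a multiple of 2.

```r
# A single number is recycled across the whole vector (a 20% markup)
prices <- c(100, 250, 80)
prices * 1.2                # 120 300 96

# A length-2 vector is recycled against a length-4 vector (4 is a multiple of 2)
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# Lengths 3 and 2 do not divide evenly: R still recycles, but warns
c(1, 2, 3) + c(10, 20)      # 11 22 13, with a warning
```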
When working with vectors of strings, the paste() function in R is a versatile tool. It is primarily used to concatenate elements of one or more vectors into strings, offering flexibility to combine text data either component-wise or by collapsing all elements into a single string. This makes paste() particularly useful for tasks such as creating descriptive labels, formatting output, or preparing data for presentation.
Key Arguments:
- sep: Specifies the separator placed between concatenated elements. The default is a single space (" ").
- collapse: Joins all elements of the result into one string, using the specified delimiter. If not provided, each concatenated result remains a separate string.
When you have two or more vectors and want to combine corresponding elements (component-wise concatenation):
# Example vectors
products <- c("Laptop", "Tablet", "Smartphone")
prices <- c("1000", "500", "750")

# Combine product names and prices
result <- paste(products, prices, sep = " - $")
print(result) # Output: "Laptop - $1000" "Tablet - $500" "Smartphone - $750"
When you want to combine all elements of a single vector into one string:
# Example vector
items <- c("Apple", "Banana", "Cherry")

# Collapse all items into a single string
result <- paste(items, collapse = ", ")
print(result) # Output: "Apple, Banana, Cherry"
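Two related patterns, sketched with made-up product data: paste0() is shorthand for paste() with no separator, and sep and collapse can be used together, first pairing elements and then joining the pairs into one string.

```r
# paste0() joins without any separator
ids <- paste0("ID", 1:3)
ids                                                   # "ID1" "ID2" "ID3"

# sep pairs up the elements, then collapse joins the pairs into one string
products <- c("Laptop", "Tablet")
stock <- c(4, 0)
paste(products, stock, sep = ": ", collapse = "; ")   # "Laptop: 4; Tablet: 0"

# Numbers are converted to text automatically
paste("Total:", 1250)                                 # "Total: 1250"
```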
7.2 Matrices
Matrices in R are two-dimensional data structures that store elements of the same type (e.g., numeric, character, or logical) in a grid format with rows and columns. They are ideal for mathematical computations, linear algebra operations, and organizing data with fixed dimensions. Matrices are created using the matrix() function, where you specify the data, number of rows (nrow), and columns (ncol). Operations like addition, subtraction, or multiplication can be applied element-wise or across rows and columns, and advanced matrix-specific operations (e.g., transposition, matrix multiplication) are supported. Matrices are often used in business applications like modeling financial data, performing statistical analyses, or visualizing multidimensional datasets.
# Creating a matrix
sales <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)

# Displaying the matrix
print(sales)

# Checking the type
typeof(sales) # Output: "double"
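By default matrix() fills values column by column. The sketch below fills the same numbers row by row with byrow = TRUE and attaches row and column names through dimnames; the quarter and region labels are illustrative assumptions. Elements are then selected with [row, column], either by position or by name.

```r
# byrow = TRUE fills the matrix row by row instead of column by column
sales <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2, byrow = TRUE,
                dimnames = list(c("Q1", "Q2"), c("North", "South")))
print(sales)
#    North South
# Q1    10    20
# Q2    30    40

sales[1, 2]           # row 1, column 2: 20
sales["Q2", "North"]  # rows and columns can also be selected by name: 30
sales[, "South"]      # the whole South column: 20 40 (named Q1, Q2)
dim(sales)            # 2 2
```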
Element-wise multiplication performs operations on corresponding elements of two matrices of the same dimensions.
# Create two matrices
A <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)

# Element-wise multiplication
C <- A * B
print(C)
# Output:
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32
In the example above, the vector c(1, 2, 3, 4) is filled column-wise by default into a 2 × 2 matrix. This results in:
A =
[,1] [,2]
[1,] 1 3
[2,] 2 4
The expression C <- A * B performs element-wise multiplication, meaning each corresponding element in A and B is multiplied together.
C =
[,1] [,2]
[1,] 1*5 3*7 => [1,] 5 21
[2,] 2*6 4*8 => [2,] 12 32
You can also apply operations across rows or columns using functions like apply() or colSums()/rowSums().
# Create a matrix
M <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Sum across rows
row_totals <- rowSums(M)
print(row_totals)
# Output: [1] 9 12

# Sum across columns
col_totals <- colSums(M)
print(col_totals)
# Output: [1] 3 7 11
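rowSums() and colSums() cover sums only; apply() generalizes the idea to any function. Its second argument (MARGIN) is 1 for rows and 2 for columns. The sketch recreates the same M.

```r
M <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column
apply(M, 1, max)    # 5 6
apply(M, 2, mean)   # 1.5 3.5 5.5

# Any function works, including one written on the spot: the range of each column
apply(M, 2, function(col) max(col) - min(col))  # 1 1 1
```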
Transposition swaps the rows and columns of a matrix. Use the t() function for this operation.
# Transpose a matrix
M <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
print(M)
# Output:
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4

transposed <- t(M)
print(transposed)
# Output:
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
Matrix multiplication is performed using the %*% operator: each entry of the result is the dot product of a row of the first matrix with a column of the second. This is different from element-wise multiplication and follows the rules of linear algebra.
# Create two matrices
A <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)

# Matrix multiplication
C <- A %*% B
print(C)
# Output:
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
There are further advanced operations, including the determinant of a square matrix, which can be calculated using det().
det_A <- det(A)
print(det_A)
# Output: -2 (for the given matrix A)
The inverse of a square matrix (if it exists) can be calculated using solve().
# Inverse of a matrix
inverse_A <- solve(A)
print(inverse_A)
# Output:
#      [,1] [,2]
# [1,]   -2  1.5
# [2,]    1 -0.5
Eigenvalues and eigenvectors can be calculated using the eigen() function.
eigen_A <- eigen(A)
print(eigen_A$values)  # Eigenvalues
print(eigen_A$vectors) # Eigenvectors
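Two quick checks on these results, sketched with the same A: multiplying A by its inverse should return the identity matrix (up to floating-point rounding, hence all.equal()), and solve() called with a second argument solves the linear system A %*% x = b directly. The right-hand side b is a made-up example.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

# A times its inverse should be the 2 x 2 identity matrix, up to rounding
all.equal(A %*% solve(A), diag(2))   # TRUE

# solve(A, b) finds x such that A %*% x equals b, without forming the inverse
b <- c(7, 10)
x <- solve(A, b)
x                                    # 1 2
A %*% x                              # 7 10 (as a one-column matrix)
```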
7.3 Lists
Lists in R are highly versatile data structures that can store elements of different types, such as vectors, matrices, data frames, and even other lists. This flexibility makes lists ideal for organizing and managing complex data, such as combining related datasets or storing model outputs. Lists are created using the list() function, where each element can be named for easy reference. They are particularly useful in business analytics for grouping data with varied structures, such as customer demographics, sales figures, and statistical results. Accessing elements in a list is done using $ (by name) or double square brackets [[ ]] (by index). Lists are widely used in R for functions that return multiple results, such as regression models or simulation outputs, providing a structured yet flexible way to handle diverse data.
# Creating a list
employee <- list(name = "John Doe", age = 30L, salary = 55000.75, active = TRUE)

# Accessing elements
employee$name # Output: "John Doe"
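The difference between single and double brackets is a frequent source of confusion. The sketch below, using the employee list from above, shows that [[ ]] returns the element itself while [ ] returns a smaller list, and that elements can be added or changed with $. The department and skills values are hypothetical.

```r
employee <- list(name = "John Doe", age = 30L, salary = 55000.75, active = TRUE)

employee[["salary"]]       # double brackets return the element itself: 55000.75
employee[[2]]              # elements can also be selected by position: 30
class(employee["salary"])  # single brackets return a smaller list: "list"

# Adding a new element and changing an existing one
employee$department <- "Finance"   # hypothetical value
employee$salary <- 57000
names(employee)  # "name" "age" "salary" "active" "department"

# Elements can themselves be vectors (or other lists)
employee$skills <- c("Excel", "R")  # hypothetical value
length(employee)                    # 6
```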
7.4 Data Frames
Data frames in R are one of the most widely used data structures for storing and analyzing tabular data. They organize data into rows and columns, where each column can have a different data type (e.g., numeric, character, or logical), making them ideal for real-world datasets like business transactions or survey results. Data frames are created using the data.frame() function or imported from external files like CSVs or Excel sheets. They support powerful indexing and manipulation capabilities, allowing users to filter, summarize, and transform data efficiently. Data frames are fundamental for business analytics and statistical modeling, providing a structured and intuitive way to handle datasets in R.
# Creating a data frame
df <- data.frame(
  Product = c("Laptop", "Tablet", "Smartphone"),
  Price = c(1000, 500, 750),
  InStock = c(TRUE, TRUE, FALSE)
)

# Displaying the data frame
print(df)
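A few functions for getting oriented in a data frame, sketched with the df created above. The last line assumes R 4.0 or later, where data.frame() keeps text columns as character rather than converting them to factors.

```r
df <- data.frame(
  Product = c("Laptop", "Tablet", "Smartphone"),
  Price = c(1000, 500, 750),
  InStock = c(TRUE, TRUE, FALSE)
)

nrow(df)                   # 3 rows
ncol(df)                   # 3 columns
str(df)                    # compact overview: column names, types, first values
df[2, "Price"]             # one cell, by row number and column name: 500
mean(df$Price)             # 750
df[df$InStock, "Product"]  # products in stock: "Laptop" "Tablet"
```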
8 Functions in R
Functions in R are reusable blocks of code designed to perform specific tasks, making programming more efficient and organized. They help reduce repetition, improve readability, and facilitate debugging by encapsulating logic into a single unit. Functions are essential for scalable and modular programming in R. R comes with many built-in functions for tasks like data manipulation, statistical analysis, and visualization (e.g., mean(), summary()).
# Using a built-in function
mean(c(1, 2, 3, 4, 5))
Users can also create custom functions using the function() keyword, specifying input arguments and defining the operations to perform. Custom functions are especially useful for automating repetitive tasks or implementing specialized calculations. To create a function, assign it to a name and define its arguments and body, returning the desired output. For example:
# Creating a custom function
square <- function(x) {
  return(x^2)
}
square(4)
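Custom functions become more flexible with default argument values, and a list lets a function return several results at once. Both functions below, apply_discount() and price_summary(), are hypothetical examples.

```r
# Hypothetical example: rate has a default, so the caller may omit it
apply_discount <- function(price, rate = 0.1) {
  price * (1 - rate)
}

apply_discount(200)               # default 10% discount: 180
apply_discount(200, rate = 0.25)  # overriding the default: 150
apply_discount(c(100, 200))       # works on whole vectors: 90 180

# Hypothetical example: return several results bundled in a named list
price_summary <- function(prices) {
  list(min = min(prices), max = max(prices), mean = mean(prices))
}
price_summary(c(1000, 500, 750))$mean  # 750
```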
9 Control Structures
Control structures like if-else statements and loops in R are fundamental tools for directing the flow of a program based on conditions and for performing repetitive tasks.
If-else statements allow conditional execution of code, enabling decisions to be made based on logical tests.
# If-else statement
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}
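if () expects a single TRUE or FALSE value. To apply a condition to every element of a vector, the vectorized ifelse() is the usual tool; the sketch below shows it alongside an else if chain for more than two outcomes. The price threshold of 600 is an arbitrary example.

```r
# ifelse() tests every element and returns one result per element
prices <- c(1000, 500, 750)
ifelse(prices > 600, "High", "Low")   # "High" "Low" "High"

# if () handles a single value; else if adds further branches
x <- 3
if (x > 5) {
  print("x is greater than 5")
} else if (x > 0) {
  print("x is positive but not greater than 5")
} else {
  print("x is zero or negative")
}
```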
Loops, such as for and while, are used for repetitive operations. For loops allow iteration over a sequence, such as a vector, list, or range. They are used for tasks where operations need to be applied to each element in a sequence. Examples include iterating over rows in a dataset, applying calculations to each column, or generating repeated output.
# Example of a for loop
for (i in 1:5) {
print(i^2)
}
While loops continue to execute a block of code as long as a specified condition is true. They are used for tasks where the number of iterations is not predefined but depends on a dynamic condition. Examples include repeatedly performing calculations until a threshold is reached or a condition changes.
# While loop
i <- 1
while (i <= 5) {
  print(i^2)
  i <- i + 1
}
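Two slightly more practical loops, with made-up numbers: a for loop that accumulates a running total over a vector, and a while loop that counts how many years a 1000 investment needs to double at 7% annual growth. Because the number of years is not known in advance, while is the natural fit for the second case.

```r
# for loop: add up each month's sales
monthly_sales <- c(120, 95, 140)
total <- 0
for (amount in monthly_sales) {
  total <- total + amount
}
total   # 355

# while loop: grow the balance until it reaches 2000
balance <- 1000
years <- 0
while (balance < 2000) {
  balance <- balance * 1.07
  years <- years + 1
}
years   # 11
```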
10 Reading and Writing Data
R can read data from various file formats. Here’s an example with CSV:
# Reading a CSV file
# Assuming you have a file named "data.csv" in your working directory
data <- read.csv("data.csv")

# Writing a CSV file (df is the data frame created in the Data Frames section)
write.csv(df, "output.csv", row.names = FALSE)
For this example, you will need to create a “data.csv” file in your working directory or adjust the file path accordingly.
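If you do not have a data.csv at hand, a self-contained way to try both functions is to write a small data frame to a temporary file and read it back. The sketch uses tempfile() so nothing is written to your working directory; the data are illustrative.

```r
df <- data.frame(Product = c("Laptop", "Tablet"), Price = c(999.99, 499.5))

# tempfile() returns a throwaway path inside R's temporary directory
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Read the file back and compare it with the original
df_back <- read.csv(path)
str(df_back)
all.equal(df, df_back)   # TRUE
```

Without row.names = FALSE, write.csv() adds an unnamed first column of row numbers, which read.csv() then reads back as an extra column.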
11 Basic Data Manipulation
R provides many functions for manipulating data:
# Assuming we're using the 'df' data frame from earlier
# Selecting a column
df$Product

# Filtering rows
df[df$Price > 600, ]

# Adding a new column and printing the new dataset
df$HighEnd <- df$Price >= 999
print(df)

# Summarizing data
summary(df)
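Three more base R tools for the same df, shown as a sketch: order() to sort rows, subset() to filter rows and pick columns in one call, and aggregate() to summarize a column by group (here, the total price of in-stock versus out-of-stock products).

```r
df <- data.frame(
  Product = c("Laptop", "Tablet", "Smartphone"),
  Price = c(1000, 500, 750),
  InStock = c(TRUE, TRUE, FALSE)
)

# Sort rows by price, most expensive first
sorted <- df[order(df$Price, decreasing = TRUE), ]
print(sorted)

# Keep in-stock rows and only the Product and Price columns
in_stock <- subset(df, InStock, select = c(Product, Price))
print(in_stock)

# Total price per group: FALSE (out of stock) = 750, TRUE (in stock) = 1500
totals <- aggregate(Price ~ InStock, data = df, FUN = sum)
print(totals)
```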
12 Introduction to Basic Plotting
R has powerful plotting capabilities. Here’s a simple example:
# Create some data
x <- 1:10
y <- x^2

# Create a scatter plot
plot(x, y, main = "Square Function", xlab = "x", ylab = "y")

# Add a line
lines(x, y, col = "red")
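Two small extensions, sketched below: type = "b" draws points and connecting lines in a single plot() call, and wrapping the plotting code between a device function such as pdf() and dev.off() saves the figure to a file instead of the screen. The temporary file path is an assumption so the example does not write into your project; png() works the same way on systems where it is supported.

```r
x <- 1:10
y <- x^2

# Open a PDF device pointing at a temporary file
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path, width = 6, height = 4)   # size in inches

# type = "b" draws both points and lines
plot(x, y, type = "b", col = "red",
     main = "Square Function", xlab = "x", ylab = "y")

# Close the device so the file is written to disk
invisible(dev.off())

file.exists(plot_path)   # TRUE
```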
We will explore more advanced plotting with the ggplot2 package later.