Subset Data Frame In R With Examples
To subset a data frame by rows and columns in R, you can use the base R subset() function, square bracket notation df[], or filter() from the dplyr package. subset() is a versatile function that lets you subset a data frame based on conditions for the rows and columns (in R terms, observations and variables). In this article, I will explain different ways of subsetting a data frame based on rows and columns.
Related: In R, you can also subset vectors and matrices.
Key Points
- The subset() function allows you to extract rows and columns based on specified conditions.
- You can use logical operators like ==, &, |, and %in% to define conditions for row subsetting.
- df[] notation uses square brackets to extract specific rows and columns by index or name.
- filter() from the dplyr package subsets the rows of a data frame based on one or more conditions.
- Both subset() and df[] allow combining conditions with logical operators.
1. Create DataFrame
Let's create a data frame in R and run the examples to subset it by rows and columns.
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
Yields below output.
# Output:
#    id    name gender        dob state
# r1 10     sai      M 1990-10-02    CA
# r2 11     ram      M 1981-03-24    NY
# r3 12 deepika   <NA> 1987-06-14  <NA>
# r4 13 sahithi      F 1985-08-16  <NA>
# r5 14   kumar      M 1995-03-02    DC
# r6 15   scott      M 1991-06-21    DW
# r7 16     Don      M 1986-03-24    AZ
# r8 17     Lin      F 1990-08-26    PH
2. Subset DataFrame by Rows
In R, the subset() function subsets a data frame by observations (rows) and variables (columns). It can also be used to get a subset of a vector or a matrix.
2.1 Syntax of subset()
Below is the syntax of the subset() function.
# Syntax of the subset() function
subset(x, subset, select, drop = FALSE, ...)
This function takes four main arguments: x is the input object, subset is a logical expression that indicates which rows to keep, select specifies which variables (columns) to keep, and drop is passed on to the [ indexing operator. Rows for which the subset expression evaluates to NA are treated as FALSE and are left out.
This function returns a subset of a data frame by rows and columns based on a specific condition.
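As a small illustration of how the subset and select arguments work together, here is a sketch on a hypothetical data frame (people is made up for this example and is not the df used in this article). Inside select, column names behave like positions, so a range such as id:gender or a negative selection such as -dob can be used.

```r
# Hypothetical data frame used only for this sketch
people <- data.frame(
  id     = c(1, 2, 3),
  name   = c('ann', 'bob', 'cy'),
  gender = c('F', 'M', 'M'),
  dob    = as.Date(c('1990-01-01', '1985-05-05', '1992-09-09'))
)

# Rows where gender is 'M', keeping the columns from id through gender
males <- subset(people, subset = gender == 'M', select = id:gender)

# All rows, dropping the dob column
no_dob <- subset(people, select = -dob)

print(males)
print(no_dob)
```

When the subset argument is omitted, all rows are kept, so the second call only changes the columns.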
# Subset a data frame by specified row name
subset(df, subset=rownames(df) == 'r1')
# Output:
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA

# Subset a data frame by vector of row names (multiple rows)
subset(df, rownames(df) %in% c('r1','r2','r3'))
# Output:
#    id    name gender        dob state
# r1 10     sai      M 1990-10-02    CA
# r2 11     ram      M 1981-03-24    NY
# r3 12 deepika   <NA> 1987-06-14  <NA>

# Subset a data frame based on condition
subset(df, gender == 'M')
# Output:
#    id  name gender        dob state
# r1 10   sai      M 1990-10-02    CA
# r2 11   ram      M 1981-03-24    NY
# r5 14 kumar      M 1995-03-02    DC
# r6 15 scott      M 1991-06-21    DW
# r7 16   Don      M 1986-03-24    AZ

# Subset a data frame by condition with %in%
subset(df, state %in% c('CA','DC'))
# Output:
#    id  name gender        dob state
# r1 10   sai      M 1990-10-02    CA
# r5 14 kumar      M 1995-03-02    DC

# Subset a data frame by multiple conditions using |
subset(df, gender == 'M' | state == 'PH')
# Output:
#    id  name gender        dob state
# r1 10   sai      M 1990-10-02    CA
# r2 11   ram      M 1981-03-24    NY
# r5 14 kumar      M 1995-03-02    DC
# r6 15 scott      M 1991-06-21    DW
# r7 16   Don      M 1986-03-24    AZ
# r8 17   Lin      F 1990-08-26    PH

# Subset a data frame by multiple conditions using &
subset(df, gender == 'M' & state %in% c('CA','NY'))
# Output:
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA
# r2 11  ram      M 1981-03-24    NY
2.2 Using df[] Notation
By using bracket notation on an R data frame, you can subset rows by a single row index, a vector of indexes, a range of indexes, a column value, or one or more conditions.
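One caveat is worth knowing before using conditions inside brackets: a comparison against NA yields NA, and df[] returns an all-NA row for each such element, whereas subset() leaves those rows out. The sketch below shows the difference on a hypothetical data frame (pets is made up for this example), with which() as one possible way to keep only the rows where the condition is TRUE.

```r
# Hypothetical data frame with a missing value in the condition column
pets <- data.frame(
  name = c('rex', 'tom', 'kit'),
  kind = c('dog', NA, 'dog')
)

# The comparison is NA for the second row, so df[] emits an all-NA row
with_na <- pets[pets$kind == 'dog', ]

# which() returns only the positions where the condition is TRUE
no_na <- pets[which(pets$kind == 'dog'), ]

# subset() treats NA in the condition as FALSE
via_subset <- subset(pets, kind == 'dog')

print(with_na)
print(no_na)
print(via_subset)
```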
# Subset a data frame by row index
df[3,]
# Output:
#    id    name gender        dob state
# r3 12 deepika   <NA> 1987-06-14  <NA>

# Subset a data frame by vector of row indexes
df[c(3,4,6),]
# Output:
#    id    name gender        dob state
# r3 12 deepika   <NA> 1987-06-14  <NA>
# r4 13 sahithi      F 1985-08-16  <NA>
# r6 15   scott      M 1991-06-21    DW

# Select rows by index range
df[3:6,]
# Output:
#    id    name gender        dob state
# r3 12 deepika   <NA> 1987-06-14  <NA>
# r4 13 sahithi      F 1985-08-16  <NA>
# r5 14   kumar      M 1995-03-02    DC
# r6 15   scott      M 1991-06-21    DW

# Subset a data frame by column value
# (gender is NA for r3, so that row comes back as an all-NA row)
df[df$gender == 'M',]
# Output:
#    id  name gender        dob state
# r1 10   sai      M 1990-10-02    CA
# r2 11   ram      M 1981-03-24    NY
# NA NA  <NA>   <NA>       <NA>  <NA>
# r5 14 kumar      M 1995-03-02    DC
# r6 15 scott      M 1991-06-21    DW
# r7 16   Don      M 1986-03-24    AZ

# Subset a data frame by vector of column values
df[df$state %in% c('CA','AZ','PH'),]
# Output:
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA
# r7 16  Don      M 1986-03-24    AZ
# r8 17  Lin      F 1990-08-26    PH

# Subset a data frame by rows based on multiple conditions
df[df$gender == 'M' & df$id > 15,]
# Output:
#    id name gender        dob state
# r7 16  Don      M 1986-03-24    AZ
3. Subset DataFrame Columns
In this section, I will cover how to subset DataFrame (data.frame) columns by using the subset() function and df[] notation.
3.1 Using subset() Function
The examples below subset DataFrame (data.frame) columns by name and by index.
# Subset a data frame by column names
subset(df,gender=='M',select=c('id','name','gender'))
# Output:
#    id  name gender
# r1 10   sai      M
# r2 11   ram      M
# r5 14 kumar      M
# r6 15 scott      M
# r7 16   Don      M

# Subset a data frame by column indexes
subset(df,gender=='M',select=c(1,2,3))
# Output:
# The output is the same as the above
3.2 Using df[] Notation
By using df[] notation you can also subset the columns. In the following, the first example gets the columns with indices 2 and 3, and the second gets the same result by using the column names.
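When only one column is selected this way, base R simplifies the result to a vector unless drop = FALSE is given. A minimal sketch on a hypothetical data frame (scores is made up for this example):

```r
# Hypothetical data frame used only for this sketch
scores <- data.frame(
  name  = c('ann', 'bob'),
  score = c(90, 85)
)

# A single column selected with df[, col] is simplified to a vector
as_vector <- scores[, 'score']

# drop = FALSE keeps the result as a one-column data frame
as_frame <- scores[, 'score', drop = FALSE]

# Indexing with a single argument, df[col], also returns a data frame
also_frame <- scores['score']

print(class(as_vector))
print(class(as_frame))
print(class(also_frame))
```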
# Subset a data frame by vector of columns with indices 2 & 3
df[,c(2,3)]

# Subset a data frame by vector of columns with name and gender
df[,c('name','gender')]
# Output:
#       name gender
# r1     sai      M
# r2     ram      M
# r3 deepika   <NA>
# r4 sahithi      F
# r5   kumar      M
# r6   scott      M
# r7     Don      M
# r8     Lin      F
4. Using filter() Function
Alternatively, you can use the filter() function from the dplyr package to select the specific rows from the data frame. To use this package, you must first install it with the command install.packages('dplyr') and then load it into your environment with library(dplyr).
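As a rough sketch of how filter() treats several conditions and missing values, the example below uses a hypothetical data frame (staff is made up for this example) and assumes dplyr is already installed. Conditions separated by commas are combined with AND, and rows where a condition evaluates to NA are dropped.

```r
library(dplyr)

# Hypothetical data frame used only for this sketch
staff <- data.frame(
  id     = c(1, 2, 3, 4),
  gender = c('M', NA, 'M', 'F'),
  state  = c('CA', 'NY', 'AZ', 'CA')
)

# Conditions separated by commas are combined with AND
m_ca <- filter(staff, gender == 'M', state == 'CA')

# Rows where the condition evaluates to NA are dropped
males <- filter(staff, gender == 'M')

print(m_ca)
print(males)
```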
# Subset a data frame using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))
# Output:
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA
# r7 16  Don      M 1986-03-24    AZ
# r8 17  Lin      F 1990-08-26    PH
5. Complete Example of R Subset DataFrame
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

# Subset by row name
subset(df, subset=rownames(df) == 'r1')

# Subset rows by vector of row names
subset(df, rownames(df) %in% c('r1','r2','r3'))

# Subset by condition
subset(df, gender == 'M')

# Subset by condition with %in%
subset(df, state %in% c('CA','DC'))

# Subset by multiple conditions using |
subset(df, gender == 'M' | state == 'PH')

# Subset by multiple conditions using &
subset(df, gender == 'M' & state %in% c('CA','NY'))

# Subset rows by index
df[3,]

# Subset rows by vector of index values
df[c(3,4,6),]

# Subset rows by index range
df[3:6,]

# Subset rows by column value
df[df$gender == 'M',]

# Subset rows by vector of values
df[df$state %in% c('CA','AZ','PH'),]

# Subset rows by checking multiple conditions
df[df$gender == 'M' & df$id > 15,]

# Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

# Subset columns by name
subset(df,gender=='M',select=c('id','name','gender'))

# Subset columns by index
subset(df,gender=='M',select=c(1,2,3))

# Subset columns with indices 2 & 3
df[,c(2,3)]

# Subset columns name and gender
df[,c('name','gender')]
6. Conclusion
In this article, I have explained how to subset a data frame in R by rows and columns using subset(), square bracket notation df[], and filter() from the dplyr package. Subsetting is a common task, and knowing these methods is essential for effective data manipulation and analysis in R.