How Can I Subset A Data Set? | R FAQ - Statistical Consulting
The R program (as a text file) for all the code on this page.
Subsetting is a very important component of data management and there are several ways that one can subset data in R. This page aims to give a fairly exhaustive list of the ways in which it is possible to subset a data set in R.
First we will create the data frame that will be used in all the examples. We will call this data frame x.df. It is composed of five variables (V1 to V5) whose values come from a normal distribution with a mean of 1 and a standard deviation of 1 (the second argument to rnorm is the mean), as well as one variable (y) containing the integers 1, 1, 2, 3, 4 and 5.

    set.seed(1234)
    x <- matrix(rnorm(30, 1), ncol = 5)
    y <- c(1, seq(5))

    # combining x and y into one matrix
    x <- cbind(x, y)

    # converting x into a data frame called x.df
    x.df <- data.frame(x)
    x.df

              V1          V2         V3        V4          V5 y
    1 -0.2070657 0.425260040 0.22374611 0.1628283  0.30627975 1
    2  1.2774292 0.453368144 1.06445882 3.4158352 -0.44820491 1
    3  2.0844412 0.435548001 1.95949406 1.1340882  1.57475572 2
    4 -1.3456977 0.109962171 0.88971451 0.5093141 -0.02365572 3
    5  1.4291247 0.522807300 0.48899049 0.5594521  0.98486170 4
    6  1.5060559 0.001613555 0.08880458 1.4595894  0.06405140 5
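As a quick check on how the example data are generated, the sketch below spells out the positional argument of rnorm and the contents of y. The names a and b are introduced here purely for illustration and are not part of the original page.

```r
# rnorm(30, 1) passes 1 positionally as the mean; sd keeps its default of 1
set.seed(1234)
a <- rnorm(30, 1)

# Same draw with the arguments named explicitly
set.seed(1234)
b <- rnorm(30, mean = 1, sd = 1)

identical(a, b)  # TRUE

# c(1, seq(5)) prepends an extra 1 to the sequence 1..5, giving six values
y <- c(1, seq(5))
y                # 1 1 2 3 4 5
length(y)        # 6
```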
To verify which names are used for the variables in the data frame, we use the names function.

    names(x.df)

    [1] "V1" "V2" "V3" "V4" "V5" "y"
Subsetting rows using the subset function

The subset function with a logical statement will let you subset the data frame by observations. In the following example the x.sub data frame contains only the observations for which the value of the variable y is greater than 2.

    x.sub <- subset(x.df, y > 2)
    x.sub

             V1          V2         V3        V4          V5 y
    4 -1.345698 0.109962171 0.88971451 0.5093141 -0.02365572 3
    5  1.429125 0.522807300 0.48899049 0.5594521  0.98486170 4
    6  1.506056 0.001613555 0.08880458 1.4595894  0.06405140 5
Subsetting rows using multiple conditional statements

There is no limit to how many logical statements may be combined to achieve the desired subset. The data frame x.sub1 contains only the observations for which the value of the variable y is greater than 2 and the value of the variable V1 is greater than 0.6.

    x.sub1 <- subset(x.df, y > 2 & V1 > 0.6)
    x.sub1

            V1          V2         V3        V4        V5 y
    5 1.429125 0.522807300 0.48899049 0.5594521 0.9848617 4
    6 1.506056 0.001613555 0.08880458 1.4595894 0.0640514 5
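Conditions can also be combined with | (or) and negated with !, not only joined with &. The following is a minimal sketch that rebuilds x.df as above; the object names either and not_gt2 are made up for this illustration.

```r
# Rebuild the example data frame so the block runs on its own
set.seed(1234)
x <- matrix(rnorm(30, 1), ncol = 5)
y <- c(1, seq(5))
x.df <- data.frame(cbind(x, y))

# Rows where y equals 1 OR y is at least 5
either <- subset(x.df, y == 1 | y >= 5)

# Rows where y is NOT greater than 2
not_gt2 <- subset(x.df, !(y > 2))

either$y   # 1 1 5
not_gt2$y  # 1 1 2
```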
Subsetting both rows and columns

It is possible to subset both rows and columns using the subset function. The select argument lets you subset variables (columns). The data frame x.sub2 contains only the variables V1 and V4 and then only the observations of these two variables where the values of variable y are greater than 2 and the values of variable V2 are greater than 0.4.

    x.sub2 <- subset(x.df, y > 2 & V2 > 0.4, select = c(V1, V4))
    x.sub2

            V1        V4
    5 1.429125 0.5594521
The data frame x.sub3 contains only the observations in variables V2 through V5 for which the values in variable y are greater than 3.

    x.sub3 <- subset(x.df, y > 3, select = V2:V5)
    x.sub3

               V2         V3        V4        V5
    5 0.522807300 0.48899049 0.5594521 0.9848617
    6 0.001613555 0.08880458 1.4595894 0.0640514
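The select argument can also be used to exclude columns by putting a minus sign in front of them. This is a small sketch rather than part of the original page; the name dropped is hypothetical.

```r
# Rebuild the example data frame so the block runs on its own
set.seed(1234)
x <- matrix(rnorm(30, 1), ncol = 5)
y <- c(1, seq(5))
x.df <- data.frame(cbind(x, y))

# Keep rows with y > 2 and every column except V1 and V4
dropped <- subset(x.df, y > 2, select = -c(V1, V4))

names(dropped)  # "V2" "V3" "V5" "y"
nrow(dropped)   # 3
```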
Subsetting rows using indices

Another method for subsetting data sets is by using the bracket notation which designates the indices of the data set. The first index is for the rows and the second for the columns. The x.sub4 data frame contains only the observations for which the values of variable y are equal to 1. Note that leaving the index for the columns blank indicates that we want x.sub4 to contain all the variables (columns) of the original data frame.

    x.sub4 <- x.df[x.df$y == 1, ]
    x.sub4

              V1        V2        V3        V4         V5 y
    1 -0.2070657 0.4252600 0.2237461 0.1628283  0.3062798 1
    2  1.2774292 0.4533681 1.0644588 3.4158352 -0.4482049 1
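One difference between bracket notation and the subset function shows up when the condition contains missing values. The sketch below uses a hypothetical toy data frame (toy, with made-up columns id and y) rather than x.df, because x.df has no missing values.

```r
# Hypothetical toy data with one missing value in y
toy <- data.frame(id = 1:4, y = c(1, NA, 3, 5))

# subset() treats an NA condition as FALSE, so that row is dropped
by_subset <- subset(toy, y > 2)

# Bracket indexing returns a row of NAs wherever the condition is NA
by_bracket <- toy[toy$y > 2, ]

# which() keeps only the positions where the condition is TRUE
by_which <- toy[which(toy$y > 2), ]

nrow(by_subset)   # 2
nrow(by_bracket)  # 3
by_which$id       # 3 4
```

If the data may contain missing values, wrapping the condition in which() is one way to make bracket indexing drop those rows, as subset does.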
Subsetting rows selecting on more than one value

We use the %in% notation when we want to subset on multiple values of y. The x.sub5 data frame contains only the observations for which the values of variable y are equal to either 1 or 4.

    x.sub5 <- x.df[x.df$y %in% c(1, 4), ]
    x.sub5

              V1        V2        V3        V4         V5 y
    1 -0.2070657 0.4252600 0.2237461 0.1628283  0.3062798 1
    2  1.2774292 0.4533681 1.0644588 3.4158352 -0.4482049 1
    5  1.4291247 0.5228073 0.4889905 0.5594521  0.9848617 4
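To make the meaning of %in% concrete, the sketch below compares it with the equivalent chain of == comparisons and shows how to negate it. The names keep, same and rest are introduced only for this illustration.

```r
# Rebuild the example data frame so the block runs on its own
set.seed(1234)
x <- matrix(rnorm(30, 1), ncol = 5)
y <- c(1, seq(5))
x.df <- data.frame(cbind(x, y))

# Rows where y is 1 or 4, written with %in%
keep <- x.df[x.df$y %in% c(1, 4), ]

# The same rows written as two comparisons joined by |
same <- x.df[x.df$y == 1 | x.df$y == 4, ]

# Negating %in% keeps all the other rows
rest <- x.df[!(x.df$y %in% c(1, 4)), ]

identical(keep, same)  # TRUE
rest$y                 # 2 3 5
```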
Subsetting columns using indices

We can also use the indices to subset the variables (columns) of the data set. The x.sub6 data frame contains only the first two variables of the x.df data frame. Note that leaving the index for the rows blank indicates that we want x.sub6 to contain all the rows of the original data frame.

    x.sub6 <- x.df[, 1:2]
    x.sub6

              V1          V2
    1 -0.2070657 0.425260040
    2  1.2774292 0.453368144
    3  2.0844412 0.435548001
    4 -1.3456977 0.109962171
    5  1.4291247 0.522807300
    6  1.5060559 0.001613555
The x.sub7 data frame contains all the rows but only the 1st, 3rd and 5th variables (columns) of the x.df data set.

    x.sub7 <- x.df[, c(1, 3, 5)]
    x.sub7

              V1         V3          V5
    1 -0.2070657 0.22374611  0.30627975
    2  1.2774292 1.06445882 -0.44820491
    3  2.0844412 1.95949406  1.57475572
    4 -1.3456977 0.88971451 -0.02365572
    5  1.4291247 0.48899049  0.98486170
    6  1.5060559 0.08880458  0.06405140
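Two details of column indexing are worth a short sketch: selecting a single column with brackets returns a plain vector unless drop = FALSE is given, and columns can be selected by name as well as by position. The object names below are hypothetical.

```r
# Rebuild the example data frame so the block runs on its own
set.seed(1234)
x <- matrix(rnorm(30, 1), ncol = 5)
y <- c(1, seq(5))
x.df <- data.frame(cbind(x, y))

# A single column index collapses the result to a numeric vector
one_col <- x.df[, 1]

# drop = FALSE keeps the result as a one-column data frame
one_col_df <- x.df[, 1, drop = FALSE]

# Columns can also be picked by name instead of position
by_name <- x.df[, c("V1", "V2")]

is.data.frame(one_col)     # FALSE
is.data.frame(one_col_df)  # TRUE
names(by_name)             # "V1" "V2"
```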
Subsetting both rows and columns using indices

The x.sub8 data frame contains the 3rd through 6th variables of x.df and only observations number 1 and 3.

    x.sub8 <- x.df[c(1, 3), 3:6]
    x.sub8

             V3        V4        V5 y
    1 0.2237461 0.1628283 0.3062798 1
    3 1.9594941 1.1340882 1.5747557 2
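The row and column indices do not have to be numeric positions. As a sketch, a logical condition can be used for the rows and column names for the columns, which gives the same result as the corresponding subset call. The names by_bracket and by_subset are made up for this example.

```r
# Rebuild the example data frame so the block runs on its own
set.seed(1234)
x <- matrix(rnorm(30, 1), ncol = 5)
y <- c(1, seq(5))
x.df <- data.frame(cbind(x, y))

# Logical condition for the rows, column names for the columns
by_bracket <- x.df[x.df$y > 2, c("V3", "y")]

# The same selection written with subset()
by_subset <- subset(x.df, y > 2, select = c(V3, y))

dim(by_bracket)                          # 3 2
isTRUE(all.equal(by_bracket, by_subset)) # TRUE
```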
UCLA OARC- © 2024 UC REGENTS