Split - MAKE ME ANALYST


R Programming

split function in R

The split() function takes a vector or other object and splits it into groups determined by a factor or a list of factors. The basic idea is that you can take a data structure, split it into subsets defined by another variable, and then apply a function over those subsets (typically with lapply() or sapply()).

You can get the help file by typing ?split in your R console.

?split

The arguments of split() can be shown by just typing split in your R console.

split

Output:

function (x, f, drop = FALSE, ...)

Here,

  1. x is a vector (or list) or a data frame
  2. f is a factor (or an object that can be coerced to one) or a list of factors
  3. drop indicates whether empty factor levels should be dropped
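
As a rough illustration of the f and drop arguments, here is a minimal sketch. The names values, grp, f1 and f2 are made up for this example and do not appear elsewhere in this tutorial.

```r
# Hypothetical data: six values and a factor with an unused level "c"
values <- c(10, 20, 30, 40, 50, 60)
grp <- factor(c("a", "b", "a", "b", "a", "b"), levels = c("a", "b", "c"))

# Default (drop = FALSE): the empty level "c" is kept as an empty group
keep_empty <- split(values, grp)
names(keep_empty)

# drop = TRUE: groups for levels that do not occur are removed
drop_empty <- split(values, grp, drop = TRUE)
names(drop_empty)

# f may also be a list of factors; groups come from their interaction
f1 <- gl(2, 3, labels = c("x", "y"))          # x x x y y y
f2 <- gl(3, 1, 6, labels = c("p", "q", "r"))  # p q r p q r
by_both <- split(values, list(f1, f2))
names(by_both)
```

With a list of factors, each group name combines one level from each factor, so drop = TRUE can be useful when many combinations never occur.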

Example:

Here we will simulate some data and split it according to a factor variable. Note that the gl() function is used to “generate levels” in a factor variable.

 

set.seed(1)
x <- runif(20, min = 155, max = 180)  # simulate 20 random heights
y <- gl(2, 10, labels = c("Male", "Female"))  # generate factors by specifying the pattern of their levels
s <- split(x, y)
s
lapply(s, mean)

Output:

> s
$Male
 [1] 161.6377 164.3031 169.3213 177.7052 160.0420 177.4597 178.6169 171.5199 170.7279 156.5447

$Female
 [1] 160.1494 159.4139 172.1756 164.6026 174.2460 167.4425 172.9405 179.7977 164.5009 174.4361

> lapply(s, mean)
$Male
[1] 168.7878

$Female
[1] 168.9705
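
The split-then-lapply pattern above returns a list. As a small sketch of two common alternatives, sapply() simplifies the result to a named vector, and tapply() does the splitting and applying in one call. The variable names means_sapply and means_tapply are chosen here only for illustration.

```r
set.seed(1)
x <- runif(20, min = 155, max = 180)          # same simulated heights as above
y <- gl(2, 10, labels = c("Male", "Female"))  # same grouping factor as above

# sapply() simplifies the list of group means to a named numeric vector
means_sapply <- sapply(split(x, y), mean)

# tapply() performs the split and the apply in a single call
means_tapply <- tapply(x, y, mean)

means_sapply
means_tapply
```

Both approaches give the same group means; split() is the more flexible choice when you want to keep the subsets around for further work.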

Split a Data Frame:

Here we will use a dataset called airquality. To get the help file just type ?airquality. Check the structure of the data set using str(airquality).

 

?airquality
library(datasets)
str(airquality)

Output:

'data.frame': 153 obs. of 6 variables:
 $ Ozone  : int 41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int 67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int 5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int 1 2 3 4 5 6 7 8 9 10 ...

You can split the airquality data frame by the Month variable using the following code.

mydata <- split(airquality, airquality$Month)
str(mydata)

Output:

List of 5
 $ 5:'data.frame': 31 obs. of 6 variables:
  ..$ Ozone  : int [1:31] 41 36 12 18 NA 28 23 19 8 NA ...
  ..$ Solar.R: int [1:31] 190 118 149 313 NA NA 299 99 19 194 ...
  ..$ Wind   : num [1:31] 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
  ..$ Temp   : int [1:31] 67 72 74 62 56 66 65 59 61 69 ...
  ..$ Month  : int [1:31] 5 5 5 5 5 5 5 5 5 5 ...
  ..$ Day    : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
 $ 6:'data.frame': 30 obs. of 6 variables:
  ..$ Ozone  : int [1:30] NA NA NA NA NA NA 29 NA 71 39 ...
  ..$ Solar.R: int [1:30] 286 287 242 186 220 264 127 273 291 323 ...
  ..$ Wind   : num [1:30] 8.6 9.7 16.1 9.2 8.6 14.3 9.7 6.9 13.8 11.5 ...
  ..$ Temp   : int [1:30] 78 74 67 84 85 79 82 87 90 87 ...
  ..$ Month  : int [1:30] 6 6 6 6 6 6 6 6 6 6 ...
  ..$ Day    : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
 $ 7:'data.frame': 31 obs. of 6 variables:
  ..$ Ozone  : int [1:31] 135 49 32 NA 64 40 77 97 97 85 ...
  ..$ Solar.R: int [1:31] 269 248 236 101 175 314 276 267 272 175 ...
  ..$ Wind   : num [1:31] 4.1 9.2 9.2 10.9 4.6 10.9 5.1 6.3 5.7 7.4 ...
  ..$ Temp   : int [1:31] 84 85 81 84 83 83 88 92 92 89 ...
  ..$ Month  : int [1:31] 7 7 7 7 7 7 7 7 7 7 ...
  ..$ Day    : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
 $ 8:'data.frame': 31 obs. of 6 variables:
  ..$ Ozone  : int [1:31] 39 9 16 78 35 66 122 89 110 NA ...
  ..$ Solar.R: int [1:31] 83 24 77 NA NA NA 255 229 207 222 ...
  ..$ Wind   : num [1:31] 6.9 13.8 7.4 6.9 7.4 4.6 4 10.3 8 8.6 ...
  ..$ Temp   : int [1:31] 81 81 82 86 85 87 89 90 90 92 ...
  ..$ Month  : int [1:31] 8 8 8 8 8 8 8 8 8 8 ...
  ..$ Day    : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
 $ 9:'data.frame': 30 obs. of 6 variables:
  ..$ Ozone  : int [1:30] 96 78 73 91 47 32 20 23 21 24 ...
  ..$ Solar.R: int [1:30] 167 197 183 189 95 92 252 220 230 259 ...
  ..$ Wind   : num [1:30] 6.9 5.1 2.8 4.6 7.4 15.5 10.9 10.3 10.9 9.7 ...
  ..$ Temp   : int [1:30] 91 92 93 93 87 84 80 78 75 73 ...
  ..$ Month  : int [1:30] 9 9 9 9 9 9 9 9 9 9 ...
  ..$ Day    : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
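
The result is a named list with one data frame per month. As a short sketch of how the pieces might be inspected, the following counts the rows in each group and pulls out a single month by name. The names rows_per_month and may are introduced here only for illustration.

```r
library(datasets)

mydata <- split(airquality, airquality$Month)

# Group names are the Month values converted to character strings
names(mydata)

# Number of observations in each monthly sub-data frame
rows_per_month <- sapply(mydata, nrow)
rows_per_month

# Extract one sub-data frame by its name (note the quotes around "5")
may <- mydata[["5"]]
head(may, 3)
```

Because the list names are character strings, mydata[["5"]] selects the group for month 5, whereas mydata[[5]] would select the fifth element of the list.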

Then, you can take the column means for Ozone, Solar.R, and Wind for each sub-data frame using the following code.

sapply(mydata, function(x) {colMeans(x[, c("Ozone", "Solar.R", "Wind")])})

Output:

               5         6          7        8        9
Ozone         NA        NA         NA       NA       NA
Solar.R       NA 190.16667 216.483871       NA 167.4333
Wind    11.62258  10.26667   8.941935 8.793548  10.1800
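
The NA entries appear because colMeans() returns NA for any column that contains missing values. One possible way around this, sketched below, is to pass na.rm = TRUE so that missing observations are ignored within each month. The name means_by_month is an illustrative choice, not part of the original code.

```r
library(datasets)

mydata <- split(airquality, airquality$Month)

# Column means per month, ignoring missing values in each column
means_by_month <- sapply(mydata, function(x) {
  colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)
})

means_by_month
```

Keep in mind that with na.rm = TRUE each mean is computed only from the non-missing observations, so months with many missing values rest on fewer data points.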
