How To Add A Column To A DataFrame In R (with 18 Code Examples)
In this tutorial, we'll consider one of the most common operations for manipulating dataframes in R: how to add a column to a dataframe in base R.
A dataframe is one of the basic data structures of the R programming language. It is also a very versatile data structure since it can store multiple data types, be easily modified, and easily updated.
What is a Dataframe in R?
Technically speaking, a dataframe in R is a specific case of a list of vectors of the same length, where different vectors can be (and usually are) of different data types. Since a dataframe has a tabular, 2-dimensional form, it has columns (variables) and rows (data entries). If you're new to creating dataframes in R, you may want to read How to Create a Dataframe in R before continuing with this post.
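To make the "list of equal-length vectors" description concrete, here is a minimal sketch. The dataframe `df` and its columns are made up for illustration and are not part of the examples that follow.

```r
# A small illustrative dataframe (hypothetical data, not from the tutorial)
df <- data.frame(id = 1:3, name = c('a', 'b', 'c'), stringsAsFactors = FALSE)

# A dataframe is a list under the hood ...
is_list <- is.list(df)

# ... whose elements (the columns) may have different data types ...
column_types <- sapply(df, class)

# ... but always share the same length, equal to the number of rows
column_lengths <- lengths(df)

print(is_list)
print(column_types)
print(column_lengths)
```

Because each column is just a list element, adding a column amounts to adding one more element of the right length, which is what the methods below do.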
Adding a Column to a Dataframe in R
We may want to add a new column to an R dataframe for various reasons: to calculate a new variable based on the existing ones, to add a new column based on an existing one but in a different format (thus keeping both columns), to append an empty or placeholder column to be filled in later, or to add a column containing completely new information.
Let's explore different ways of adding a new column to a dataframe in R. For our experiments, we'll be mostly using the same dataframe called super_sleepers which we'll reconstruct each time from the following initial dataframe:
    super_sleepers_initial <- data.frame(
      rating = 1:4,
      animal = c('koala', 'hedgehog', 'sloth', 'panda'),
      country = c('Australia', 'Italy', 'Peru', 'China')
    )
    print(super_sleepers_initial)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

Our task will be to add to this dataframe a new column called avg_sleep_hours representing the average time in hours that each of the above animals sleeps per day, according to the following scheme:
| Animal | Avg hrs of sleep per day |
|---|---|
| koala | 21 |
| hedgehog | 18 |
| sloth | 17 |
| panda | 10 |
For some examples, we'll experiment with adding two other columns: avg_sleep_hours_per_year and has_tail.
Now, let's dive in.
Adding a Column to a Dataframe in R Using the $ Symbol
Since a dataframe in R is a list of vectors where each vector represents an individual column of that dataframe, we can add a column to a dataframe just by adding the corresponding new vector to this "list". The syntax is as follows:
    dataframe_name$new_column_name <- vector

Let's reconstruct our super_sleepers dataframe from the initial super_sleepers_initial dataframe (we'll do so for each subsequent experiment) and add to it a column called avg_sleep_hours represented by the vector c(21, 18, 17, 10):
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')  # printing an empty line

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers$avg_sleep_hours <- c(21, 18, 17, 10)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

Note that the number of items in the vector must equal the current number of rows in the dataframe; otherwise, R throws an error:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Attempting to add a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    # with the number of items in the vector NOT EQUAL to the number of rows in the dataframe
    super_sleepers$avg_sleep_hours <- c(21, 18, 17)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

    Error in `$<-.data.frame`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17)): replacement has 3 rows, data has 4
    Traceback:

    1. `$<-`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17))
    2. `$<-.data.frame`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17))
    3. stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
     .     "replacement has %d rows, data has %d"), N, nrows), domain = NA)

Instead of assigning a vector, we can assign a single value, whether numeric or character, for all the rows of a new column:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe and setting it to 0
    super_sleepers$avg_sleep_hours <- 0
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia               0
    2      2 hedgehog     Italy               0
    3      3    sloth      Peru               0
    4      4    panda     China               0

In this case, the new column plays the role of a placeholder for the real values of the specified data type (in the above case, numeric) that we can insert later.
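A common variation on the placeholder idea is to fill the column with NA instead of 0, so that rows not yet filled in stay visibly missing. The sketch below assumes that preference; `NA_real_` is base R's missing value of numeric type, and the two values filled in afterwards come from the table above.

```r
super_sleepers <- data.frame(
  rating = 1:4,
  animal = c('koala', 'hedgehog', 'sloth', 'panda'),
  country = c('Australia', 'Italy', 'Peru', 'China')
)

# Placeholder column: numeric type, but every value is missing for now
super_sleepers$avg_sleep_hours <- NA_real_

# Fill in the real values later, here for two animals only
super_sleepers$avg_sleep_hours[super_sleepers$animal == 'koala'] <- 21
super_sleepers$avg_sleep_hours[super_sleepers$animal == 'panda'] <- 10

print(super_sleepers)
```

The rows for the hedgehog and the sloth remain NA until they are assigned, which makes unfinished data easier to spot than a column of zeros.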
Alternatively, we can calculate a new column based on the existing ones. Let's first add the avg_sleep_hours column to our dataframe and then calculate a new column avg_sleep_hours_per_year from it. We want to know how many hours these animals sleep on average per year:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers$avg_sleep_hours <- c(21, 18, 17, 10)
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
    super_sleepers$avg_sleep_hours_per_year <- super_sleepers$avg_sleep_hours * 365
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

      rating   animal   country avg_sleep_hours avg_sleep_hours_per_year
    1      1    koala Australia              21                     7665
    2      2 hedgehog     Italy              18                     6570
    3      3    sloth      Peru              17                     6205
    4      4    panda     China              10                     3650

Also, it's possible to copy a column from one dataframe to another using the following syntax:
    df1$new_col <- df2$existing_col

Let's replicate such a situation:
    # Creating the `super_sleepers_1` dataframe with `rating` as its only column
    super_sleepers_1 <- data.frame(rating=1:4)
    print(super_sleepers_1)
    cat('\n\n')

    # Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
    # Note that in the new dataframe, the column is called `ANIMAL` instead of `animal`
    super_sleepers_1$ANIMAL <- super_sleepers_initial$animal
    print(super_sleepers_1)

Output:

      rating
    1      1
    2      2
    3      3
    4      4

      rating   ANIMAL
    1      1    koala
    2      2 hedgehog
    3      3    sloth
    4      4    panda

The drawback of this approach (i.e., using the $ operator to append a column to a dataframe) is that a column name containing white spaces or special symbols can't be written directly after the $. An unquoted name may contain only letters (upper- or lowercase), numbers, dots, and underscores, so any other name has to be wrapped in backticks. Also, this approach adds only one column per assignment.
Adding a Column to a Dataframe in R Using Square Brackets
Another way of adding a new column to an R dataframe is more "dataframe-style" rather than "list-style": by using bracket notation. Let's see how it works:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

In the piece of code above, we can substitute this line:
    super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)

with either of the following two lines:

    super_sleepers[['avg_sleep_hours']] <- c(21, 18, 17, 10)
    super_sleepers[, 'avg_sleep_hours'] <- c(21, 18, 17, 10)

The result will be identical; these are just three versions of the same syntax.
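The equivalence of the bracket spellings can be checked directly. The sketch below is illustrative only: the variable names are hypothetical, and the `[, 'name']` form is included as an assumption of this sketch, being a third bracket spelling that base R accepts.

```r
initial <- data.frame(
  rating = 1:4,
  animal = c('koala', 'hedgehog', 'sloth', 'panda')
)
hours <- c(21, 18, 17, 10)

# Single brackets: dataframe-style assignment by column name
single_bracket <- initial
single_bracket['avg_sleep_hours'] <- hours

# Double brackets: list-style assignment of one element
double_bracket <- initial
double_bracket[['avg_sleep_hours']] <- hours

# Row/column indexing with the row index left empty
matrix_style <- initial
matrix_style[, 'avg_sleep_hours'] <- hours

# Compare column names and the values of the new column
same_names <- identical(names(single_bracket), names(double_bracket)) &&
  identical(names(single_bracket), names(matrix_style))
same_values <- identical(single_bracket$avg_sleep_hours, double_bracket$avg_sleep_hours) &&
  identical(single_bracket$avg_sleep_hours, matrix_style$avg_sleep_hours)

print(same_names)
print(same_values)
```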
As with the previous method, we can assign a single value instead of a vector to the new column:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe and assigning it to 'Unknown'
    super_sleepers['avg_sleep_hours'] <- 'Unknown'
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia         Unknown
    2      2 hedgehog     Italy         Unknown
    3      3    sloth      Peru         Unknown
    4      4    panda     China         Unknown

As an alternative, we can calculate a new column based on the existing ones:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
    super_sleepers['avg_sleep_hours_per_year'] <- super_sleepers['avg_sleep_hours'] * 365
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

      rating   animal   country avg_sleep_hours avg_sleep_hours_per_year
    1      1    koala Australia              21                     7665
    2      2 hedgehog     Italy              18                     6570
    3      3    sloth      Peru              17                     6205
    4      4    panda     China              10                     3650

We can also copy a column from another dataframe:
    # Creating the `super_sleepers_1` dataframe with the only column `rating`
    super_sleepers_1 <- data.frame(rating=1:4)
    print(super_sleepers_1)
    cat('\n\n')

    # Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
    # Note that in the new dataframe, the column is called `ANIMAL` instead of `animal`
    super_sleepers_1['ANIMAL'] <- super_sleepers_initial['animal']
    print(super_sleepers_1)

Output:

      rating
    1      1
    2      2
    3      3
    4      4

      rating   ANIMAL
    1      1    koala
    2      2 hedgehog
    3      3    sloth
    4      4    panda

The advantage of using square brackets over the $ operator to append a column to a dataframe is that we can add a column whose name contains white spaces or any special symbols.
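As a brief illustration of that advantage, the sketch below adds a column whose name contains white spaces. The column name `avg sleep hours` is a hypothetical variant chosen for this example.

```r
super_sleepers <- data.frame(
  animal = c('koala', 'hedgehog', 'sloth', 'panda')
)

# A column name with white spaces is fine inside square brackets
super_sleepers['avg sleep hours'] <- c(21, 18, 17, 10)

# Reading it back: by quoted name in brackets, or with backticks after `$`
by_brackets <- super_sleepers[['avg sleep hours']]
by_dollar <- super_sleepers$`avg sleep hours`

print(names(super_sleepers))
print(identical(by_brackets, by_dollar))
```

Names like this are legal but need quoting every time they are used, so plain names with underscores are usually easier to work with.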
Adding a Column to a Dataframe in R Using the cbind() Function
The third way of adding a new column to an R dataframe is by applying the cbind() function that stands for "column-bind" and can also be used for combining two or more dataframes. Using this function is a more universal approach than the previous two since it allows adding several columns at once. Its basic syntax is as follows:
    df <- cbind(df, new_col_1, new_col_2, ..., new_col_N)

The piece of code below adds the avg_sleep_hours column to the super_sleepers dataframe:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers <- cbind(super_sleepers, avg_sleep_hours=c(21, 18, 17, 10))
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

The next piece of code adds two new columns – avg_sleep_hours and has_tail – to the super_sleepers dataframe at once:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding two new columns `avg_sleep_hours` and `has_tail` to the `super_sleepers` dataframe
    super_sleepers <- cbind(super_sleepers,
                            avg_sleep_hours=c(21, 18, 17, 10),
                            has_tail=c('no', 'yes', 'yes', 'yes'))
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours has_tail
    1      1    koala Australia              21       no
    2      2 hedgehog     Italy              18      yes
    3      3    sloth      Peru              17      yes
    4      4    panda     China              10      yes

Apart from adding multiple columns at once, another advantage of using the cbind() function is that it allows assigning the result of this operation (i.e., adding one or more columns to an R dataframe) to a new dataframe, leaving the initial one unchanged:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Creating a new dataframe `super_sleepers_new` based on `super_sleepers`
    # with two new columns, `avg_sleep_hours` and `has_tail`
    super_sleepers_new <- cbind(super_sleepers,
                                avg_sleep_hours=c(21, 18, 17, 10),
                                has_tail=c('no', 'yes', 'yes', 'yes'))
    print(super_sleepers_new)
    cat('\n\n')
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours has_tail
    1      1    koala Australia              21       no
    2      2 hedgehog     Italy              18      yes
    3      3    sloth      Peru              17      yes
    4      4    panda     China              10      yes

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

As with the previous two approaches, inside the cbind() function, we can assign a single value to the whole new column:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe and setting it to 0.999
    super_sleepers <- cbind(super_sleepers, avg_sleep_hours=0.999)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia           0.999
    2      2 hedgehog     Italy           0.999
    3      3    sloth      Peru           0.999
    4      4    panda     China           0.999

We can also calculate the new column based on the existing ones:
    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers <- cbind(super_sleepers, avg_sleep_hours=c(21, 18, 17, 10))
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
    # Note that the new name is ignored here: the calculated column keeps the name
    # `avg_sleep_hours`, so the output shows two columns with that name
    super_sleepers <- cbind(super_sleepers,
                            avg_sleep_hours_per_year=super_sleepers['avg_sleep_hours'] * 365)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

      rating   animal   country avg_sleep_hours avg_sleep_hours
    1      1    koala Australia              21            7665
    2      2 hedgehog     Italy              18            6570
    3      3    sloth      Peru              17            6205
    4      4    panda     China              10            3650

We can also copy a column from another dataframe:
    # Creating the `super_sleepers_1` dataframe with the only column `rating`
    super_sleepers_1 <- data.frame(rating=1:4)
    print(super_sleepers_1)
    cat('\n\n')

    # Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
    # Note that in the new dataframe, the column is still called `animal` despite setting the new name `ANIMAL`
    super_sleepers_1 <- cbind(super_sleepers_1, ANIMAL=super_sleepers_initial['animal'])
    print(super_sleepers_1)

Output:

      rating
    1      1
    2      2
    3      3
    4      4

      rating   animal
    1      1    koala
    2      2 hedgehog
    3      3    sloth
    4      4    panda

Unlike the $ operator and square bracket approaches, however, cbind() has two nuances to keep in mind:
- We can't create a new column and calculate one more column based on the new one inside the same cbind() function. For example, a single cbind() call that adds avg_sleep_hours and also computes avg_sleep_hours_per_year from it will throw an error, because avg_sleep_hours doesn't yet exist in the dataframe when the arguments are evaluated.
- When we copy a column from another dataframe and try to give it a new name inside the cbind() function, this new name will be ignored, and the new column will be called exactly as it was called in the original dataframe. For example, in the last piece of code above, the new name ANIMAL was ignored, and the new column was called animal, just as in the dataframe from which it was copied.
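Both nuances can be reproduced in a short sketch. The use of tryCatch() to capture the error and the variable names `attempt`, `ratings_only`, and `renamed` are choices made for this illustration. The second part shows one possible workaround, which is an addition of this sketch and not part of the approaches above: passing a plain vector (extracted with `$`) instead of a one-column dataframe lets the new name take effect.

```r
super_sleepers <- data.frame(
  rating = 1:4,
  animal = c('koala', 'hedgehog', 'sloth', 'panda')
)

# Nuance 1: referring to a column created in the same cbind() call fails,
# because `avg_sleep_hours` is not yet part of `super_sleepers`
attempt <- tryCatch(
  cbind(super_sleepers,
        avg_sleep_hours = c(21, 18, 17, 10),
        avg_sleep_hours_per_year = super_sleepers['avg_sleep_hours'] * 365),
  error = function(e) conditionMessage(e)
)
print(attempt)  # an error message rather than a dataframe

# Nuance 2 (possible workaround): a plain vector extracted with `$`,
# unlike a one-column dataframe extracted with `[ ]`, takes the new name
ratings_only <- data.frame(rating = 1:4)
renamed <- cbind(ratings_only, ANIMAL = super_sleepers$animal)
print(names(renamed))
```

For the first nuance, the simplest fix is the one used earlier in this section: add the columns in two separate cbind() calls.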
Conclusion
In this tutorial, we discussed the various reasons why we may need to add a new column to an R dataframe and what kind of information it can store. Then, we explored the three different ways of doing so: using the $ symbol, square brackets, and the cbind() function. We considered the syntax of each of those approaches and its possible variations, the pros and cons of each method, possible additional functionalities, the most common pitfalls and errors, and how to avoid them. Also, we learned how to add multiple columns to an R dataframe at once.
It's worth noting that the discussed approaches are not the only ways to add a column to a dataframe in R. For example, for the same purpose, we can use the mutate() or add_column() functions. However, to apply these functions, we need to install and load specific R packages (dplyr and tibble, respectively), and for the operation of interest they don't add functionality beyond what we discussed in this tutorial. By contrast, the $ symbol, square brackets, and the cbind() function are available in base R without any installation.
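For comparison, here is a minimal sketch of the mutate() route. It assumes the dplyr package may or may not be installed, so it falls back to the base R `$` assignment when dplyr is missing; that fallback is a convenience of this sketch, not something mutate() requires.

```r
super_sleepers <- data.frame(
  animal = c('koala', 'hedgehog', 'sloth', 'panda'),
  avg_sleep_hours = c(21, 18, 17, 10)
)

if (requireNamespace('dplyr', quietly = TRUE)) {
  # dplyr route: columns are referred to by bare name inside mutate()
  super_sleepers <- dplyr::mutate(
    super_sleepers,
    avg_sleep_hours_per_year = avg_sleep_hours * 365
  )
} else {
  # Base R fallback, equivalent for this purpose
  super_sleepers$avg_sleep_hours_per_year <- super_sleepers$avg_sleep_hours * 365
}

print(super_sleepers)
```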
If you'd like to learn more about working with dataframes in R, check out How to Append Rows to a Dataframe in R (with 7 Code Examples)