How To Add A Column To A DataFrame In R (with 18 Code Examples)


In this tutorial, we'll consider one of the most common operations used for manipulating dataframes in R: how to add a column to a dataframe in base R.

A dataframe is one of the basic data structures of the R programming language. It is also a very versatile data structure since it can store multiple data types, be easily modified, and easily updated.

What is a Dataframe in R?

Technically speaking, a dataframe in R is a specific case of a list of vectors of the same length, where different vectors can be (and usually are) of different data types. Since a dataframe has a tabular, 2-dimensional form, it has columns (variables) and rows (data entries). If you're new to creating DataFrames in R, you may want to read How to Create a Dataframe in R before continuing with this post.
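To make the "list of vectors" idea concrete, here is a minimal sketch in base R. The small dataframe df and its values are made up for illustration and are not the tutorial's dataset; stringsAsFactors = FALSE is set explicitly as an assumption so that the text column stays a character vector regardless of the R version.

```r
# Illustrative dataframe (hypothetical values, not the tutorial's dataset)
df <- data.frame(rating = 1:2,
                 animal = c('koala', 'hedgehog'),
                 stringsAsFactors = FALSE)

# A dataframe is a list under the hood, with one element per column
print(is.list(df))
print(length(df))

# Each column is a vector with its own data type
col_types <- sapply(df, class)
print(col_types)
```

Because each column is simply a list element, adding a column amounts to adding one more element of the same length.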

Adding a Column to a Dataframe in R

We may want to add a new column to an R dataframe for various reasons: to calculate a new variable from the existing ones, to store an existing column in a different format (keeping both versions), to append an empty or placeholder column to fill in later, or to add a column containing completely new information.

Let's explore different ways of adding a new column to a dataframe in R. For our experiments, we'll be mostly using the same dataframe called super_sleepers which we'll reconstruct each time from the following initial dataframe:

    super_sleepers_initial <- data.frame(rating=1:4,
                                         animal=c('koala', 'hedgehog', 'sloth', 'panda'),
                                         country=c('Australia', 'Italy', 'Peru', 'China'))
    print(super_sleepers_initial)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

Our task will be to add to this dataframe a new column called avg_sleep_hours representing the average time in hours that each of the above animals sleeps per day, according to the following scheme:

Animal	Avg hrs of sleep per day
koala	21
hedgehog	18
sloth	17
panda	10

For some examples, we'll experiment with adding two other columns: avg_sleep_hours_per_year and has_tail.

Now, let's dive in.

Adding a Column to a Dataframe in R Using the $ Symbol

Since a dataframe in R is a list of vectors where each vector represents an individual column of that dataframe, we can add a column to a dataframe just by adding the corresponding new vector to this "list". The syntax is as follows:

dataframe_name$new_column_name <- vector

Let's reconstruct our super_sleepers dataframe from the initial super_sleepers_initial dataframe (we'll do so for each subsequent experiment) and add to it a column called avg_sleep_hours represented by the vector c(21, 18, 17, 10):

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')  # printing an empty line

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers$avg_sleep_hours <- c(21, 18, 17, 10)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

Note that the number of items in the vector must equal the current number of rows in the dataframe; otherwise, R throws an error:

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Attempting to add a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    # with the number of items in the vector NOT EQUAL to the number of rows in the dataframe
    super_sleepers$avg_sleep_hours <- c(21, 18, 17)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

    Error in `$<-.data.frame`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17)): replacement has 3 rows, data has 4
    Traceback:
    1. `$<-`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17))
    2. `$<-.data.frame`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17))
    3. stop(sprintf(ngettext(N, "replacement has %d row, data has %d", "replacement has %d rows, data has %d"), N, nrows), domain = NA)

Instead of assigning a vector, we can assign a single value, whether numeric or character, for all the rows of a new column:

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe and setting it to 0
    super_sleepers$avg_sleep_hours <- 0
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia               0
    2      2 hedgehog     Italy               0
    3      3    sloth      Peru               0
    4      4    panda     China               0

In this case, the new column plays the role of a placeholder for the real values of the specified data type (in the above case, numeric) that we can insert later.
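As a variation on the placeholder idea, the sketch below uses a missing value instead of 0, so that unfilled rows are not mistaken for real measurements. Using NA_real_ here is an assumption made for illustration rather than something the tutorial prescribes; it keeps the column numeric while marking every row as not yet known. The dataframe is rebuilt inside the block so it runs on its own.

```r
# Rebuilding a small version of the tutorial's dataframe so the block is self-contained
super_sleepers <- data.frame(rating = 1:4,
                             animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                             stringsAsFactors = FALSE)

# Adding a numeric placeholder column filled with missing values
super_sleepers$avg_sleep_hours <- NA_real_

# Filling in one real value later, selecting the row by animal name
super_sleepers$avg_sleep_hours[super_sleepers$animal == 'koala'] <- 21
print(super_sleepers)
```

The remaining rows stay NA until they are filled, which makes incomplete data easy to spot with is.na().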

Alternatively, we can calculate a new column based on the existing ones. Let's first add the avg_sleep_hours column to our dataframe and then calculate a new column avg_sleep_hours_per_year from it. We want to know how many hours these animals sleep on average per year:

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers$avg_sleep_hours <- c(21, 18, 17, 10)
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
    super_sleepers$avg_sleep_hours_per_year <- super_sleepers$avg_sleep_hours * 365
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

      rating   animal   country avg_sleep_hours avg_sleep_hours_per_year
    1      1    koala Australia              21                     7665
    2      2 hedgehog     Italy              18                     6570
    3      3    sloth      Peru              17                     6205
    4      4    panda     China              10                     3650

Also, it's possible to copy a column from one dataframe to another using the following syntax:

df1$new_col <- df2$existing_col

Let's replicate such a situation:

    # Creating the `super_sleepers_1` dataframe with the only column `rating`
    super_sleepers_1 <- data.frame(rating=1:4)
    print(super_sleepers_1)
    cat('\n\n')

    # Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
    # Note that in the new dataframe, the column is called `ANIMAL` instead of `animal`
    super_sleepers_1$ANIMAL <- super_sleepers_initial$animal
    print(super_sleepers_1)

Output:

      rating
    1      1
    2      2
    3      3
    4      4

      rating   ANIMAL
    1      1    koala
    2      2 hedgehog
    3      3    sloth
    4      4    panda

The drawback of this approach (i.e., using the $ operator to append a column to a dataframe) is that a column name containing white spaces or special symbols can't be written directly after the $ operator. A bare name may contain only letters (upper- or lowercase), numbers, dots, and underscores; any other name would have to be wrapped in backticks. Also, this approach adds only one column per assignment, so it doesn't work for adding multiple columns at once.

Adding a Column to a Dataframe in R Using Square Brackets

Another way of adding a new column to an R dataframe is more "dataframe-style" rather than "list-style": by using bracket notation. Let's see how it works:

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

In the piece of code above, we can substitute this line:

    super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)

with this line:

    super_sleepers[['avg_sleep_hours']] <- c(21, 18, 17, 10)

or with this one:

    super_sleepers[, 'avg_sleep_hours'] <- c(21, 18, 17, 10)

The result will be identical; these are just three different versions of the syntax.

As it was for the previous method, we can assign a single value instead of a vector to the new column:
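A minimal sketch of this with bracket notation, assuming the same initial dataframe as in the earlier examples (it is rebuilt here so the block runs on its own):

```r
# Rebuilding the initial dataframe used throughout the tutorial
super_sleepers_initial <- data.frame(rating = 1:4,
                                     animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                                     country = c('Australia', 'Italy', 'Peru', 'China'))

# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial

# Adding a new column `avg_sleep_hours` and setting it to 0 for every row
super_sleepers['avg_sleep_hours'] <- 0
print(super_sleepers)
```

The single value is recycled across all four rows, just as with the $ operator.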

As an alternative, we can calculate a new column based on the existing ones:

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
    super_sleepers['avg_sleep_hours_per_year'] <- super_sleepers['avg_sleep_hours'] * 365
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

      rating   animal   country avg_sleep_hours avg_sleep_hours_per_year
    1      1    koala Australia              21                     7665
    2      2 hedgehog     Italy              18                     6570
    3      3    sloth      Peru              17                     6205
    4      4    panda     China              10                     3650

We can also copy a column from another dataframe:

    # Creating the `super_sleepers_1` dataframe with the only column `rating`
    super_sleepers_1 <- data.frame(rating=1:4)
    print(super_sleepers_1)
    cat('\n\n')

    # Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
    # Note that in the new dataframe, the column is called `ANIMAL` instead of `animal`
    super_sleepers_1['ANIMAL'] <- super_sleepers_initial['animal']
    print(super_sleepers_1)

Output:

      rating
    1      1
    2      2
    3      3
    4      4

      rating   ANIMAL
    1      1    koala
    2      2 hedgehog
    3      3    sloth
    4      4    panda

The advantage of using square brackets over the $ operator to append a column to a dataframe is that we can add a column whose name contains white spaces or any special symbols.
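As a quick illustration of that advantage, the sketch below adds a column whose name contains spaces and parentheses. The column name 'avg sleep hours (per day)' is a hypothetical one chosen for this example, and the dataframe is rebuilt so the block runs on its own.

```r
# Rebuilding a small version of the tutorial's dataframe
super_sleepers <- data.frame(rating = 1:4,
                             animal = c('koala', 'hedgehog', 'sloth', 'panda'))

# Adding a column whose name contains white spaces and special symbols
super_sleepers['avg sleep hours (per day)'] <- c(21, 18, 17, 10)
print(names(super_sleepers))

# The same quoted name retrieves the column again
sleep_values <- super_sleepers[['avg sleep hours (per day)']]
print(sleep_values)
```

Such names are convenient for display, although they have to be quoted every time the column is accessed.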

Adding a Column to a Dataframe in R Using the cbind() Function

The third way of adding a new column to an R dataframe is by applying the cbind() function that stands for "column-bind" and can also be used for combining two or more dataframes. Using this function is a more universal approach than the previous two since it allows adding several columns at once. Its basic syntax is as follows:

df <- cbind(df, new_col_1, new_col_2, ..., new_col_N)

The piece of code below adds the avg_sleep_hours column to the super_sleepers dataframe:
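A minimal sketch of that single-column case, assuming the same initial dataframe as in the earlier examples (it is rebuilt here so the block runs on its own):

```r
# Rebuilding the initial dataframe used throughout the tutorial
super_sleepers_initial <- data.frame(rating = 1:4,
                                     animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                                     country = c('Australia', 'Italy', 'Peru', 'China'))

# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial

# Adding a new column `avg_sleep_hours` with cbind()
super_sleepers <- cbind(super_sleepers, avg_sleep_hours = c(21, 18, 17, 10))
print(super_sleepers)
```

Note that cbind() returns a new dataframe, so the result has to be assigned back to super_sleepers for the change to persist.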

The next piece of code adds two new columns – avg_sleep_hours and has_tail – to the super_sleepers dataframe at once:

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding two new columns `avg_sleep_hours` and `has_tail` to the `super_sleepers` dataframe
    super_sleepers <- cbind(super_sleepers,
                            avg_sleep_hours=c(21, 18, 17, 10),
                            has_tail=c('no', 'yes', 'yes', 'yes'))
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours has_tail
    1      1    koala Australia              21       no
    2      2 hedgehog     Italy              18      yes
    3      3    sloth      Peru              17      yes
    4      4    panda     China              10      yes

Apart from adding multiple columns at once, another advantage of using the cbind() function is that it allows assigning the result of this operation (i.e., adding one or more columns to an R dataframe) to a new dataframe leaving the initial one unchanged:

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Creating a new dataframe `super_sleepers_new` based on `super_sleepers`
    # with two new columns `avg_sleep_hours` and `has_tail`
    super_sleepers_new <- cbind(super_sleepers,
                                avg_sleep_hours=c(21, 18, 17, 10),
                                has_tail=c('no', 'yes', 'yes', 'yes'))
    print(super_sleepers_new)
    cat('\n\n')

    # The original dataframe is unchanged
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours has_tail
    1      1    koala Australia              21       no
    2      2 hedgehog     Italy              18      yes
    3      3    sloth      Peru              17      yes
    4      4    panda     China              10      yes

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

As it was for the previous two approaches, inside the cbind() function, we can assign a single value to the whole new column:
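A minimal sketch of the single-value case with cbind(), assuming the same initial dataframe as in the earlier examples (it is rebuilt here so the block runs on its own):

```r
# Rebuilding the initial dataframe used throughout the tutorial
super_sleepers_initial <- data.frame(rating = 1:4,
                                     animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                                     country = c('Australia', 'Italy', 'Peru', 'China'))

# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial

# Adding a new column `avg_sleep_hours` and setting it to 0 for every row
super_sleepers <- cbind(super_sleepers, avg_sleep_hours = 0)
print(super_sleepers)
```

The single value is recycled to match the four rows of the dataframe.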

We can also calculate the new column based on the existing columns:

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    super_sleepers <- cbind(super_sleepers, avg_sleep_hours=c(21, 18, 17, 10))
    print(super_sleepers)
    cat('\n\n')

    # Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
    # Note that the new column keeps the name `avg_sleep_hours` in the output because
    # a one-column dataframe is passed to cbind() (see the second nuance below)
    super_sleepers <- cbind(super_sleepers,
                            avg_sleep_hours_per_year=super_sleepers['avg_sleep_hours'] * 365)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

      rating   animal   country avg_sleep_hours
    1      1    koala Australia              21
    2      2 hedgehog     Italy              18
    3      3    sloth      Peru              17
    4      4    panda     China              10

      rating   animal   country avg_sleep_hours avg_sleep_hours
    1      1    koala Australia              21            7665
    2      2 hedgehog     Italy              18            6570
    3      3    sloth      Peru              17            6205
    4      4    panda     China              10            3650

And we can copy a column from another dataframe:

    # Creating the `super_sleepers_1` dataframe with the only column `rating`
    super_sleepers_1 <- data.frame(rating=1:4)
    print(super_sleepers_1)
    cat('\n\n')

    # Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
    # Note that in the new dataframe, the column is still called `animal` despite setting the new name `ANIMAL`
    super_sleepers_1 <- cbind(super_sleepers_1, ANIMAL=super_sleepers_initial['animal'])
    print(super_sleepers_1)

Output:

      rating
    1      1
    2      2
    3      3
    4      4

      rating   animal
    1      1    koala
    2      2 hedgehog
    3      3    sloth
    4      4    panda

However, unlike with the $ operator and square brackets, there are two nuances to keep in mind when using cbind():

We can't create a new column and calculate one more column based on the new one inside the same cbind() function. For example, the piece of code below will throw an error.

    # Reconstructing the `super_sleepers` dataframe
    super_sleepers <- super_sleepers_initial
    print(super_sleepers)
    cat('\n\n')

    # Attempting to add a new column `avg_sleep_hours` to the `super_sleepers` dataframe
    # AND another new column `avg_sleep_hours_per_year` based on it
    super_sleepers <- cbind(super_sleepers,
                            avg_sleep_hours=c(21, 18, 17, 10),
                            avg_sleep_hours_per_year=super_sleepers['avg_sleep_hours'] * 365)
    print(super_sleepers)

Output:

      rating   animal   country
    1      1    koala Australia
    2      2 hedgehog     Italy
    3      3    sloth      Peru
    4      4    panda     China

    Error in `[.data.frame`(super_sleepers, "avg_sleep_hours"): undefined columns selected
    Traceback:
    1. cbind(super_sleepers, avg_sleep_hours = c(21, 18, 17, 10), avg_sleep_hours_per_year = super_sleepers["avg_sleep_hours"] * 365)
    2. super_sleepers["avg_sleep_hours"]
    3. `[.data.frame`(super_sleepers, "avg_sleep_hours")
    4. stop("undefined columns selected")

When we copy a column from another dataframe and try to give it a new name inside the cbind() function, this new name will be ignored, and the new column will be called exactly as it was called in the original dataframe. For example, in the piece of code below, the new name ANIMAL was ignored, and the new column was called animal, just as in the dataframe from which it was copied:

    # Creating the `super_sleepers_1` dataframe with the only column `rating`
    super_sleepers_1 <- data.frame(rating=1:4)
    print(super_sleepers_1)
    cat('\n\n')

    # Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
    # Note that in the new dataframe, the column is still called `animal` despite setting the new name `ANIMAL`
    super_sleepers_1 <- cbind(super_sleepers_1, ANIMAL=super_sleepers_initial['animal'])
    print(super_sleepers_1)

Output:

      rating
    1      1
    2      2
    3      3
    4      4

      rating   animal
    1      1    koala
    2      2 hedgehog
    3      3    sloth
    4      4    panda
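Both nuances can be worked around in base R. The sketch below is one possible approach rather than the only one, and it is not taken from the examples above: for the first nuance, the values are stored in a standalone vector (here also named avg_sleep_hours, an illustrative choice) before calling cbind(); for the second, the column is extracted as a plain vector with double square brackets, so that cbind() uses the name we supply.

```r
# Rebuilding the initial dataframe used throughout the tutorial
super_sleepers_initial <- data.frame(rating = 1:4,
                                     animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                                     country = c('Australia', 'Italy', 'Peru', 'China'))

# Nuance 1: keep the new values in a vector first, then derive the second
# column from that vector inside a single cbind() call
avg_sleep_hours <- c(21, 18, 17, 10)
super_sleepers <- cbind(super_sleepers_initial,
                        avg_sleep_hours = avg_sleep_hours,
                        avg_sleep_hours_per_year = avg_sleep_hours * 365)
print(super_sleepers)

# Nuance 2: extract the column as a vector with [[ ]] instead of a
# one-column dataframe with [ ], so the new name `ANIMAL` is kept
super_sleepers_1 <- data.frame(rating = 1:4)
super_sleepers_1 <- cbind(super_sleepers_1, ANIMAL = super_sleepers_initial[['animal']])
print(names(super_sleepers_1))
```

The difference in the second case is that single brackets return a one-column dataframe that carries its own column name, while double brackets return the bare vector.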

Conclusion

In this tutorial, we discussed the various reasons why we may need to add a new column to an R dataframe and what kind of information it can store. Then, we explored the three different ways of doing so: using the $ symbol, square brackets, and the cbind() function. We considered the syntax of each of those approaches and its possible variations, the pros and cons of each method, possible additional functionalities, the most common pitfalls and errors, and how to avoid them. Also, we learned how to add multiple columns to an R dataframe at once.

It's worth noting that the discussed approaches are not the only ways to add a column to a dataframe in R. For example, we can use the mutate() or add_column() functions for the same purpose. However, these functions require installing and loading additional R packages (dplyr and tibble, respectively), and for the basic operation covered in this tutorial, they don't add functionality beyond what we discussed. In contrast, the $ symbol, square brackets, and the cbind() function are all available in base R without any installation.

If you'd like to learn more about working with dataframes in R, check out How to Append Rows to a Dataframe in R (with 7 Code Examples)
