How To Add A Column To A DataFrame In Python Pandas

DOWNLOAD 51 PYTHON PROGRAMS PDF FREE

When working with multiple datasets in Python, I often need to combine information from different dataframes. One common task is adding a column from one dataframe to another.

This might seem simple, but there are several methods to accomplish this, each with its advantages.

In this article, I’ll walk you through six practical methods to add a column from another dataframe in Pandas. So let’s get in..

Table of Contents

Toggle
  • Methods to Add a Column From Another Dataframe In Pandas Python
    • 1: Use pandas.DataFrame.join() Method
    • 2: Use pandas.concat() Method
    • 3: Use pandas.DataFrame.merge() Method
    • 4: Use pandas.Series.map() Method
    • 5: Use DataFrame.assign() Method
    • 6: Use DataFrame.loc[] Method
  • Real-World Example: Combine Customer and Order Data
  • Performance Considerations of All the Methods

Methods to Add a Column From Another Dataframe In Pandas Python

Now, I will explain how to add a column from another Dataframe in Pandas Python.

1: Use pandas.DataFrame.join() Method

The join() method in Python is one of my go-to approaches when I need to add columns from another dataframe. It’s particularly useful when you want to join dataframes based on their indices.

Let’s start with a practical example. Imagine we have two dataframes containing information about sales representatives and their performance:

import pandas as pd # Sales team dataframe sales_team = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'], 'Region': ['East', 'West', 'North', 'South', 'Central'] }) # Sales performance dataframe sales_performance = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Monthly_Sales': [15000, 22000, 18000, 25000, 19500], 'Targets_Met': [3, 4, 3, 5, 4] }) # Set Rep_ID as index for both dataframes sales_team.set_index('Rep_ID', inplace=True) sales_performance.set_index('Rep_ID', inplace=True) # Add Monthly_Sales column from sales_performance to sales_team result = sales_team.join(sales_performance['Monthly_Sales']) print(result)

Output:

Name Region Monthly_Sales Rep_ID 101 John Smith East 15000 102 Sarah Johnson West 22000 103 Mike Davis North 18000 104 Emma Wilson South 25000 105 Robert Brown Central 19500

I executed the above example code and added the screenshot below

pandas add column from another dataframe

The join() method works beautifully when both dataframes share the same index. It’s clean and efficient for this specific scenario.

2: Use pandas.concat() Method

Python’s concat() method is extremely versatile for combining dataframes. When I need to add columns from one dataframe to another, especially when they share the same structure and index, concat() is often my choice.

Here’s how to use it:

import pandas as pd # Reset indices to work with concat sales_team = sales_team.reset_index() sales_performance = sales_performance.reset_index() # Using concat to add Monthly_Sales column result = pd.concat([sales_team, sales_performance['Monthly_Sales']], axis=1) print(result)

Output:

Rep_ID Name Region Monthly_Sales 0 101 John Smith East 15000 1 102 Sarah Johnson West 22000 2 103 Mike Davis North 18000 3 104 Emma Wilson South 25000 4 105 Robert Brown Central 19500

I executed the above example code and added the screenshot below

add column from one dataframe to another

The concat() method combined our dataframes along axis=1 (columns). This works well when dataframes have the same length and order, which is common in many data analysis scenarios.

3: Use pandas.DataFrame.merge() Method

The merge() method in Python is my preferred approach when I need more control over how dataframes are combined. It’s particularly useful for scenarios involving different types of joins (inner, outer, left, right).

Let’s see it in action:

import pandas as pd # Create our sample dataframes again sales_team = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'], 'Region': ['East', 'West', 'North', 'South', 'Central'] }) sales_performance = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Monthly_Sales': [15000, 22000, 18000, 25000, 19500], 'Targets_Met': [3, 4, 3, 5, 4] }) # Merge dataframes on Rep_ID to add Monthly_Sales column result = sales_team.merge(sales_performance[['Rep_ID', 'Monthly_Sales']], on='Rep_ID', how='left') print(result)

Output:

Rep_ID Name Region Monthly_Sales 0 101 John Smith East 15000 1 102 Sarah Johnson West 22000 2 103 Mike Davis North 18000 3 104 Emma Wilson South 25000 4 105 Robert Brown Central 19500

I executed the above example code and added the screenshot below

add columns from one dataframe to another

The merge() method is incredibly flexible. In this example, I used a left join to ensure all records from sales_team are preserved. This is especially useful when dealing with datasets that might not have perfect one-to-one relationships.

4: Use pandas.Series.map() Method

The map() method in Python is elegant when you need to map values from one dataframe to another based on a key. I frequently use this approach for simple column additions.

Here’s how it works:

import pandas as pd # Create our sample dataframes sales_team = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'], 'Region': ['East', 'West', 'North', 'South', 'Central'] }) sales_performance = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Monthly_Sales': [15000, 22000, 18000, 25000, 19500], 'Targets_Met': [3, 4, 3, 5, 4] }) # Create a mapping dictionary from sales_performance sales_map = dict(zip(sales_performance.Rep_ID, sales_performance.Monthly_Sales)) # Add Monthly_Sales to sales_team using map sales_team['Monthly_Sales'] = sales_team['Rep_ID'].map(sales_map) print(sales_team)

Output:

Rep_ID Name Region Monthly_Sales 0 101 John Smith East 15000 1 102 Sarah Johnson West 22000 2 103 Mike Davis North 18000 3 104 Emma Wilson South 25000 4 105 Robert Brown Central 19500

The map() method is straightforward and efficient, especially for simple one-to-one mappings. It’s also very readable, making code maintenance easier.

Read Python Pandas Write to Excel

5: Use DataFrame.assign() Method

The assign() method provides a clean, functional approach to adding columns. I often use it when I want to maintain immutability in my data processing pipeline.

Here’s how to use assign():

import pandas as pd # Create our sample dataframes sales_team = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'], 'Region': ['East', 'West', 'North', 'South', 'Central'] }) sales_performance = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Monthly_Sales': [15000, 22000, 18000, 25000, 19500], 'Targets_Met': [3, 4, 3, 5, 4] }) # Create a mapping dictionary sales_map = dict(zip(sales_performance.Rep_ID, sales_performance.Monthly_Sales)) # Use assign to add the column result = sales_team.assign(Monthly_Sales=lambda x: x.Rep_ID.map(sales_map)) print(result)

Output:

Rep_ID Name Region Monthly_Sales 0 101 John Smith East 15000 1 102 Sarah Johnson West 22000 2 103 Mike Davis North 18000 3 104 Emma Wilson South 25000 4 105 Robert Brown Central 19500

The assign() method doesn’t modify the original dataframe, which is great for maintaining data integrity throughout your analysis process.

Check out Create Plots Using Pandas crosstab() in Python

6: Use DataFrame.loc[] Method

The loc[] method gives you direct, intuitive access to add columns. I find it particularly useful when I need fine-grained control over how data is added.

Let’s see it in action:

import pandas as pd # Create our sample dataframes sales_team = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'], 'Region': ['East', 'West', 'North', 'South', 'Central'] }) sales_performance = pd.DataFrame({ 'Rep_ID': [101, 102, 103, 104, 105], 'Monthly_Sales': [15000, 22000, 18000, 25000, 19500], 'Targets_Met': [3, 4, 3, 5, 4] }) # Set Rep_ID as index for both dataframes for easier alignment sales_team.set_index('Rep_ID', inplace=True) sales_performance.set_index('Rep_ID', inplace=True) # Add Monthly_Sales column using loc sales_team.loc[:, 'Monthly_Sales'] = sales_performance['Monthly_Sales'] print(sales_team.reset_index()) # Reset index for display

Output:

Rep_ID Name Region Monthly_Sales 0 101 John Smith East 15000 1 102 Sarah Johnson West 22000 2 103 Mike Davis North 18000 3 104 Emma Wilson South 25000 4 105 Robert Brown Central 19500

The loc[] method provides a clear, direct way to add columns, especially when working with indexed dataframes.

Read Drop the Header Row of Pandas DataFrame

Real-World Example: Combine Customer and Order Data

Let’s look at a more realistic scenario. Imagine we’re analyzing an e-commerce dataset with customer information in one dataframe and their order history in another:

import pandas as pd # Customer information dataframe customers = pd.DataFrame({ 'Customer_ID': [1001, 1002, 1003, 1004, 1005], 'Name': ['Alice Cooper', 'Bob Dylan', 'Charlie Parker', 'Diana Ross', 'Elton John'], 'State': ['California', 'New York', 'Texas', 'Florida', 'Illinois'], 'Signup_Date': ['2022-01-15', '2022-02-20', '2022-01-30', '2022-03-05', '2022-02-10'] }) # Order history dataframe orders = pd.DataFrame({ 'Customer_ID': [1001, 1002, 1003, 1004, 1005], 'Total_Orders': [12, 8, 15, 6, 10], 'Avg_Order_Value': [85.50, 120.75, 65.30, 210.15, 95.80], 'Last_Order_Date': ['2023-04-10', '2023-04-05', '2023-04-12', '2023-03-28', '2023-04-08'] }) # Let's add the Average Order Value to our customer data using merge enhanced_customers = customers.merge( orders[['Customer_ID', 'Avg_Order_Value', 'Total_Orders']], on='Customer_ID', how='left' ) print(enhanced_customers)

Output:

Customer_ID Name State Total_Orders 0 1001 Alice Cooper California 12.0 1 1002 Bob Dylan New York 8.0 2 1003 Charlie Parker Texas 0.0 3 1004 Diana Ross Florida 6.0 4 1005 Elton John Illinois 10.0

Using a left join ensures all customers are included, and fillna() helps handle missing values appropriately.

Check out np.where in Pandas Python

Performance Considerations of All the Methods

When working with large datasets, performance becomes crucial. Here’s a quick comparison of methods based on my experience:

  1. For small to medium datasets:
    • merge() and join() are generally most intuitive and flexible
    • map() is excellent for simple key-based lookups
  2. For large datasets:
    • merge() with specified columns (rather than entire dataframes) helps reduce memory usage
    • Consider using the pd.options.mode.chained_assignment = None setting for complex operations
  3. For very large datasets:
    • Consider using dask or vaex libraries for out-of-memory processing
    • Chunking operations can help manage memory constraints

Remember that clean, well-structured data is the foundation of good analysis. Taking the time to properly combine your dataframes will pay dividends when you move to the modeling and visualization stages of your data science workflow.

By mastering these techniques, you’ll be able to integrate data from multiple sources efficiently, creating rich, unified datasets that reveal deeper insights.

Related tutorials:

  • Pandas GroupBy Without Aggregation Function in Python
  • Pandas Merge Fill NAN with 0 in Python
  • Pandas Find Duplicates in Python
Bijay - Python ExpertBijay Kumar

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I started working on Python, Machine learning, and artificial intelligence for the last 5 years. During this time I got expertise in various Python libraries also like Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, Scikit-Learn, etc… for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.

enjoysharepoint.com/

Tag » Add Column From One Dataframe To Another Based On Condition R