How To Create A New Column Based on Values from Other Columns in Pandas – Definitive Guide

Pandas allow you to store values as rows and columns.

You can create a new column based on values from other columns in Pandas using the other columns using df[‘New Column‘] = df[‘Old column1‘] * df[‘Old column 2‘].

This tutorial teaches you the different methods to create a new column based on values from other columns in Pandas and when it is appropriate to use them.

If You’re in Hurry…

You can create a new column by assigning new values based on other columns.

The following code demonstrates how to create a total_price column using the multiplication of Unit_price and the no_of_units columns.

df['Total Price'] = df['Unit_Price'] * df['No_Of_Units'] 

df

The new column Total Price is added to the dataframe based on the values from the other two columns.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsTotal Price
0Keyboard500.0005.02500.00
1Mouse200.0005.01000.00
2Monitor5000.23510.050002.35
3CPU10000.55020.0200011.00
4CPU10000.55020.0200011.00
5Speakers250.5008.02004.00
6TrackpadNaNNaNNaN

If You Want to Understand Details, Read on…

Different methods are available to create a new column based on values from other columns.

To know the column names in the dataframe, read: How to Get Column Name in Pandas

You’ll learn how to create

  • Total_price column by multiplying the unit_price and the No_of_units column
  • Discounted_price column to add discounted price based on the number of units purchased

Sample Dataframe

You’ll use the following dataframe to create a new column based on values from the Unit_price column and the no_of_units column.

Code

import pandas as pd

import numpy as np

data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers","Trackpad"],

        "Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],

        "No_Of_Units":[5,5, 10, 20, 20, 8,np.nan]
}

df = pd.DataFrame(data)

df

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_Units
0Keyboard500.0005.0
1Mouse200.0005.0
2Monitor5000.23510.0
3CPU10000.55020.0
4CPU10000.55020.0
5Speakers250.5008.0
6TrackpadNaNNaN

Using Columns Directly

This is the simplest method to create a new column based on values from other columns.

Perform the desired operation on any two or more columns in the dataframe and assign it to the new column of the dataframe.

For example, you can multiply two number columns, or you can also combine two columns of a dataframe using this method.

Code

The following code demonstrates how to multiply two columns and assign the result to a new column.

df['Total Price'] = df['Unit_Price'] * df['No_Of_Units'] 

df

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsTotal Price
0Keyboard500.0005.02500.00
1Mouse200.0005.01000.00
2Monitor5000.23510.050002.35
3CPU10000.55020.0200011.00
4CPU10000.55020.0200011.00
5Speakers250.5008.02004.00
6TrackpadNaNNaNNaN

Using Apply

This section teaches you how to apply a function of multiple columns, row-wise and create a new column based on the result of the apply function.

You must apply the function to the column axis using the parameter axis=1.

You can use this method when you want to apply a complex function or specify multiple conditions for the calculations.

Code

The following code demonstrates how to apply the lambda function that multiples the unit price and the no of units column and calculates the total price value.

df['Total Price'] = df.apply(lambda row: (row.Unit_Price * row.No_Of_Units ), axis=1)

df

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsTotal Price
0Keyboard500.0005.02500.00
1Mouse200.0005.01000.00
2Monitor5000.23510.050002.35
3CPU10000.55020.0200011.00
4CPU10000.55020.0200011.00
5Speakers250.5008.02004.00
6TrackpadNaNNaNNaN

Using LOC

This section demonstrates how to use the loc attribute of the pandas dataframe to create a new column based on other columns.

The loc attribute allows you to access a group of rows and columns. Using this loc attribute, you can also assign value to a new column.

You can use this method when you want to specify simple conditions while performing the calculations—for example, performing the operation only on the Non-NaN columns.

Code

The following code calculates the total price for the rows with the non-empty no of units columns.

df.loc[df['No_Of_Units']!=np.nan,'Total_Price'] = (df['Unit_Price'] * df['No_Of_Units'])

df

The Total price is calculated only for the columns where the no of units is not empty.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsTotal PriceTotal_Price
0Keyboard500.0005.02500.002500.00
1Mouse200.0005.01000.001000.00
2Monitor5000.23510.050002.3550002.35
3CPU10000.55020.0200011.00200011.00
4CPU10000.55020.0200011.00200011.00
5Speakers250.5008.02004.002004.00
6TrackpadNaNNaNNaNNaN

Using If Else While Creating New Column Based on Other Column

To use an If-else statement while creating a new column based on other columns, you need to define a function with the if-else statements. And apply this function using the apply method and lambda.

Code

The following code demonstrates using the if and else while creating new columns.

  • If the No of units is less than 10, then apply the discount of 5%
  • If the No of units is equal to or greater than 10, then apply the discount of 10%
def calculate_discount(row):
  
    if row['No_Of_Units'] < 10:
        return (row['Unit_Price'] * row['No_Of_Units']) - ((5/100)*(row['Unit_Price'] * row['No_Of_Units']))

    elif row['No_Of_Units'] >= 10:
        return (row['Unit_Price'] * row['No_Of_Units']) - ((10/100)*(row['Unit_Price'] * row['No_Of_Units']))

    return (row['Unit_Price'] * row['No_Of_Units'])

df['Discounted Price'] = df.apply(lambda row: calculate_discount(row), axis=1)

df

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsTotal PriceTotal_PriceDiscounted_Price
0Keyboard500.0005.02500.002500.002375.000
1Mouse200.0005.01000.001000.00950.000
2Monitor5000.23510.050002.3550002.3545002.115
3CPU10000.55020.0200011.00200011.00180009.900
4CPU10000.55020.0200011.00200011.00180009.900
5Speakers250.5008.02004.002004.001903.800
6TrackpadNaNNaNNaNNaNNaN

You can use the same code to specify multiple conditions while creating a new column based on values from other columns in the pandas dataframe.

Sum All Columns and Create New Column

If you want to sum all columns of the dataframe and create a new column, you can use the df.sum() method.

To apply the sun() function only to the numeric column, use the parameter numeric_only=True.

Code

df['C'] = df.sum(axis=1, numeric_only=True)

Conclusion

You’ve learned how to create a new column based on values from other columns in the Pandas dataframe using the direct assignment method and the apply() methods.

If you have any questions, please feel free to comment below.

You May Also Like

Leave a Comment