Pandas allow you to store values as rows and columns.
You can create a new column based on values from other columns in Pandas using the other columns using df[‘New Column‘] = df[‘Old column1‘] * df[‘Old column 2‘].
Basic Example
The following code demonstrates how to create a total_price column using the multiplication of Unit_price and the no_of_units columns.
df['Total Price'] = df['Unit_Price'] * df['No_Of_Units']
df
The new column Total Price is added to the dataframe based on the values from the other two columns.
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Total Price | |
---|---|---|---|---|
0 | Keyboard | 500.000 | 5.0 | 2500.00 |
1 | Mouse | 200.000 | 5.0 | 1000.00 |
2 | Monitor | 5000.235 | 10.0 | 50002.35 |
3 | CPU | 10000.550 | 20.0 | 200011.00 |
4 | CPU | 10000.550 | 20.0 | 200011.00 |
5 | Speakers | 250.500 | 8.0 | 2004.00 |
6 | Trackpad | NaN | NaN | NaN |
This tutorial teaches you the different methods to create a new column based on values from other columns in Pandas and when it is appropriate to use them.
To know the column names in the dataframe, read: How to Get Column Name in Pandas
You’ll learn how to create
- Total_price column by multiplying the unit_price and the No_of_units column
- Discounted_price column to add discounted price based on the number of units purchased
Table of Contents
Sample Dataframe
You’ll use the following dataframe to create a new column based on values from the Unit_price column and the no_of_units column.
Code
import pandas as pd
import numpy as np
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers","Trackpad"],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
"No_Of_Units":[5,5, 10, 20, 20, 8,np.nan]
}
df = pd.DataFrame(data)
df
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | |
---|---|---|---|
0 | Keyboard | 500.000 | 5.0 |
1 | Mouse | 200.000 | 5.0 |
2 | Monitor | 5000.235 | 10.0 |
3 | CPU | 10000.550 | 20.0 |
4 | CPU | 10000.550 | 20.0 |
5 | Speakers | 250.500 | 8.0 |
6 | Trackpad | NaN | NaN |
Using Columns Directly
This is the simplest method to create a new column based on values from other columns.
- Perform the desired operation on any two or more columns in the dataframe
- Assign it to the new column of the dataframe.
Code
The following code demonstrates how to multiply two columns and assign the result to a new column.
df['Total Price'] = df['Unit_Price'] * df['No_Of_Units']
df
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Total Price | |
---|---|---|---|---|
0 | Keyboard | 500.000 | 5.0 | 2500.00 |
1 | Mouse | 200.000 | 5.0 | 1000.00 |
2 | Monitor | 5000.235 | 10.0 | 50002.35 |
3 | CPU | 10000.550 | 20.0 | 200011.00 |
4 | CPU | 10000.550 | 20.0 | 200011.00 |
5 | Speakers | 250.500 | 8.0 | 2004.00 |
6 | Trackpad | NaN | NaN | NaN |
Using Apply
This section teaches you how to apply a function of multiple columns, row-wise and create a new column based on the result of the apply function.
- Apply the function to the column axis using the parameter axis=1.
Use this method when you want to apply a complex function or specify multiple conditions for the calculations.
Code
The following code demonstrates how to calculate the Total Price using the Unit price and the No of units columns using the lambda function.
df['Total Price'] = df.apply(lambda row: (row.Unit_Price * row.No_Of_Units ), axis=1)
df
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Total Price | |
---|---|---|---|---|
0 | Keyboard | 500.000 | 5.0 | 2500.00 |
1 | Mouse | 200.000 | 5.0 | 1000.00 |
2 | Monitor | 5000.235 | 10.0 | 50002.35 |
3 | CPU | 10000.550 | 20.0 | 200011.00 |
4 | CPU | 10000.550 | 20.0 | 200011.00 |
5 | Speakers | 250.500 | 8.0 | 2004.00 |
6 | Trackpad | NaN | NaN | NaN |
Using LOC
The loc
attribute allows you to access a group of rows and columns.
To create a new column based on other columns,
- Use this
loc
attribute, and assign value to a new column.
Use this method when you want to specify simple conditions while performing the calculations—for example, performing the operation only on the Non-NaN columns.
Code
The following code calculates the total price for the rows with the non-empty no of units columns.
df.loc[df['No_Of_Units']!=np.nan,'Total_Price'] = (df['Unit_Price'] * df['No_Of_Units'])
df
The Total price is calculated only for the columns where the no of units is not empty.
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Total Price | Total_Price | |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5.0 | 2500.00 | 2500.00 |
1 | Mouse | 200.000 | 5.0 | 1000.00 | 1000.00 |
2 | Monitor | 5000.235 | 10.0 | 50002.35 | 50002.35 |
3 | CPU | 10000.550 | 20.0 | 200011.00 | 200011.00 |
4 | CPU | 10000.550 | 20.0 | 200011.00 | 200011.00 |
5 | Speakers | 250.500 | 8.0 | 2004.00 | 2004.00 |
6 | Trackpad | NaN | NaN | NaN | NaN |
Using If Else While Creating New Column Based on Other Column
To use an If-else
statement while creating a new column based on other columns
- Define a function with the if-else statements.
- Apply this function using the apply method and lambda expression.
Code
The following code demonstrates using the if
and else
while creating new columns.
- If the No of units is less than 10, then apply the discount of 5%
- If the No of units is equal to or greater than 10, then apply the discount of 10%
def calculate_discount(row):
if row['No_Of_Units'] < 10:
return (row['Unit_Price'] * row['No_Of_Units']) - ((5/100)*(row['Unit_Price'] * row['No_Of_Units']))
elif row['No_Of_Units'] >= 10:
return (row['Unit_Price'] * row['No_Of_Units']) - ((10/100)*(row['Unit_Price'] * row['No_Of_Units']))
return (row['Unit_Price'] * row['No_Of_Units'])
df['Discounted Price'] = df.apply(lambda row: calculate_discount(row), axis=1)
df
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Total Price | Total_Price | Discounted_Price | |
---|---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5.0 | 2500.00 | 2500.00 | 2375.000 |
1 | Mouse | 200.000 | 5.0 | 1000.00 | 1000.00 | 950.000 |
2 | Monitor | 5000.235 | 10.0 | 50002.35 | 50002.35 | 45002.115 |
3 | CPU | 10000.550 | 20.0 | 200011.00 | 200011.00 | 180009.900 |
4 | CPU | 10000.550 | 20.0 | 200011.00 | 200011.00 | 180009.900 |
5 | Speakers | 250.500 | 8.0 | 2004.00 | 2004.00 | 1903.800 |
6 | Trackpad | NaN | NaN | NaN | NaN | NaN |
You can use the same code to specify multiple conditions while creating a new column based on values from other columns.
Sum All Columns and Create New Column
To sum all columns of the dataframe and create a new column, use the df.sum() method.
- To apply the
sum()
function only to the numeric column, use the parameternumeric_only=True
.
Code
df['C'] = df.sum(axis=1, numeric_only=True)