A pandas DataFrame is a data structure that stores values in a tabular format. During data analysis, you may need to drop a column from a dataframe.
You can drop a column from a pandas dataframe using the df.drop("column_name", axis=1, inplace=True) statement.
If You’re in a Hurry
You can use the following code to drop the column from the pandas dataframe.
df.drop("column_name", axis=1, inplace=True)
where
- column_name – Name of the column to be deleted
- axis=1 – Specifies the axis to drop from. 1 means columns and 0 means rows.
- inplace=True – Performs the drop on the same dataframe rather than returning a modified copy of it.
If more than one column has the same name, all columns with that name will be dropped.
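The behaviour with duplicate column names can be checked with a small sketch. The frame dup_df and its labels are hypothetical and are not part of the tutorial's sample data. The sketch also leaves out inplace=True so you can see that drop() then returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

# Hypothetical frame with two columns that share the label "A".
dup_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "A"])

# Without inplace=True, drop() returns a new dataframe.
# Dropping by label removes every column that carries that label.
result = dup_df.drop("A", axis=1)

print(list(result.columns))  # only "B" is left
print(dup_df.shape)          # the original still has all three columns
```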
If You Want to Understand Details, Read on…
In this tutorial, you’ll learn how to use methods such as drop() and pop() to delete columns in pandas in various scenarios.
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
import pandas as pd
import numpy as np
data = {
"Lang":["Java","Python","Cobol","Javascript"],
"Difficulty":["Medium","Easy","Hard","Medium"],
"Difficulty_Score":[5,2,10,8],
"Type":["Statically Typed","Dynamically Typed",pd.NaT,"Dynamically typed"],
}
df = pd.DataFrame(data)
df
Dataframe Looks Like
| | Lang | Difficulty | Difficulty_Score | Type |
---|---|---|---|---|
0 | Java | Medium | 5 | Statically Typed |
1 | Python | Easy | 2 | Dynamically Typed |
2 | Cobol | Hard | 10 | NaT |
3 | Javascript | Medium | 8 | Dynamically typed |
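Now you’ll see the various methods to drop columns in pandas. Each example starts from the original sample dataframe, so recreate it before running the next example.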
Drop Column Using Drop and Index
To drop a column by index in a pandas dataframe,
- Use df.columns[index] to identify the name of the column at that index position
- Pass that name to the drop() method
The index is 0 based. Use 0 to delete the first column, 1 to delete the second column, and so on.
Code
df.drop(df.columns[2], axis=1, inplace=True)
df
where
- df.columns[2] – specifies the third column to be deleted
- axis=1 – specifies that a column is to be deleted
- inplace=True – specifies that the drop should happen in the same dataframe, with no copy of the dataframe created
Dataframe Looks Like
You’ll see the third column, Difficulty_Score, is deleted from the dataframe.
| | Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
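Since df.columns behaves like a sequence, a negative position should also work for counting from the end. The following is a minimal sketch under that assumption, using a shortened two-row version of the sample dataframe. The names last_label and trimmed are illustrative only.

```python
import pandas as pd

# Shortened version of the sample dataframe (two rows only).
df = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
    "Type": ["Statically Typed", "Dynamically Typed"],
})

# Position -1 resolves to the label of the last column.
last_label = df.columns[-1]

# No inplace=True here, so a new dataframe is returned.
trimmed = df.drop(last_label, axis=1)

print(last_label)
print(list(trimmed.columns))
```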
Drop Column Using Drop and Name
To drop a column by name in a pandas dataframe,
- Use the column name directly in the drop() method
- If the column exists, it’ll be dropped from the dataframe
- If the column doesn’t exist, a KeyError will be raised
You can suppress the error by passing errors='ignore'.
Code
df.drop("Difficulty_Score", axis=1, inplace=True)
df
where
- Difficulty_Score – specifies the name of the column to be deleted
- axis=1 – specifies that a column is to be deleted
Dataframe Looks Like
The column Difficulty_Score is deleted from the dataframe, as shown below.
| | Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
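The error handling mentioned in this section can be sketched as follows. The column name Missing_Column is hypothetical and was chosen because it does not exist in the frame. The small frame is a cut-down stand-in for the sample dataframe.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty_Score": [5, 2],
})

# By default, dropping a label that does not exist raises a KeyError.
try:
    df.drop("Missing_Column", axis=1)
    raised = False
except KeyError:
    raised = True

# With errors="ignore", the missing label is skipped silently
# and the returned dataframe keeps all of its columns.
unchanged = df.drop("Missing_Column", axis=1, errors="ignore")

print(raised)
print(list(unchanged.columns))
```

Ignoring errors is convenient in scripts that may run more than once, but it can also hide a misspelled column name, so use it deliberately.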
Drop Column Using Pop
In this section, you’ll drop a column using pop().
Use this method when you want to remove a column from the dataframe and keep it in a separate object to perform some temporary operations.
Use df.pop("Difficulty_Score") to pop the Difficulty_Score column out of the dataframe. It returns the column as a pandas Series, which is stored in the popped_df object, as shown below.
Code
popped_df = df.pop("Difficulty_Score")
popped_df
df
Popped Series Looks Like
0 5
1 2
2 10
3 8
Name: Difficulty_Score, dtype: int64
Dataframe Looks Like
| | Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
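One possible "temporary operation" on a popped column is to transform it and put it back in a different position. This is only a sketch of that idea. The multiplication by 10 and the target position 0 are arbitrary choices for illustration, not something the tutorial prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
})

# pop() removes the column from df and returns it as a Series.
scores = df.pop("Difficulty_Score")
is_series = isinstance(scores, pd.Series)
remaining = list(df.columns)

# Re-insert a transformed copy of the column at position 0.
df.insert(0, "Difficulty_Score", scores * 10)

print(is_series)
print(remaining)
print(list(df.columns))
```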
Drop Column Using iloc
In this section, you’ll learn how to drop columns using iloc.
The iloc indexer selects rows or columns by their integer position.
- Use df.iloc[:, 1:3] to select the columns at positions 1 and 2. The end of the slice is exclusive.
- The index is 0 based. Hence it’ll select the second and third columns.
When you pass this selection to the drop() method, the columns at positions 1 and 2 will be dropped. This works because iterating over a dataframe yields its column labels.
df.drop(df.iloc[:, 1:3], inplace = True, axis = 1)
df
Dataframe Looks Like
The second and third columns, Difficulty and Difficulty_Score, are dropped from the dataframe.
| | Lang | Type |
---|---|---|
0 | Java | Statically Typed |
1 | Python | Dynamically Typed |
2 | Cobol | NaT |
3 | Javascript | Dynamically typed |
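To see which columns a positional slice covers before dropping anything, you can inspect the labels first. The sketch below uses a two-row version of the sample dataframe. It passes df.columns[1:3] through the columns keyword of drop(), which is an alternative spelling to the tutorial's df.iloc[:, 1:3] form.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
    "Type": ["Statically Typed", "Dynamically Typed"],
})

# The slice 1:3 covers positions 1 and 2; position 3 is excluded.
selected = list(df.iloc[:, 1:3].columns)

# Drop the same columns by slicing the column index directly.
result = df.drop(columns=df.columns[1:3])

print(selected)
print(list(result.columns))
```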
Drop Multiple Columns by Index
To drop multiple columns by index,
- Use df.columns[[index1, index2, indexN]] to identify the list of column names at those index positions
- Pass that list to the drop() method
The index is 0 based. Use 0 to delete the first column, 1 to delete the second column, and so on.
Code
df.drop(df.columns[[1, 2]], axis = 1, inplace = True)
df
where
- df.columns[[1, 2]] – specifies the index positions of the columns to be deleted
- axis=1 – specifies that columns are to be deleted
Dataframe Looks Like
The columns Difficulty and Difficulty_Score, at indexes 1 and 2, are deleted.
| | Lang | Type |
---|---|---|
0 | Java | Statically Typed |
1 | Python | Dynamically Typed |
2 | Cobol | NaT |
3 | Javascript | Dynamically typed |
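Because the approach above converts positions to names, it drops every column that shares a name with the chosen position. If you need to remove exactly one position from a frame with duplicate names, a boolean mask with iloc is one option. This is a sketch of an alternative technique, not the tutorial's method, and the frame dup_df is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where positions 0 and 2 share the label "A".
dup_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "A"])

positions_to_drop = [0]

# Build a mask that is True for every position we want to keep.
keep = np.ones(dup_df.shape[1], dtype=bool)
keep[positions_to_drop] = False

# iloc selects purely by position, so the second "A" column survives.
result = dup_df.iloc[:, keep]

print(list(result.columns))
print(result.iloc[0].tolist())
```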
Drop Columns By List of Names
To drop columns by a list of names,
- Pass the column names as a list, such as ["column 1", "column 2"], to the drop() method
Code
df.drop(["Difficulty_Score", "Type"], axis = 1, inplace = True)
df
where
- ["Difficulty_Score", "Type"] – specifies the names of the columns to be deleted
- axis=1 – specifies that columns are to be deleted
Dataframe Looks Like
The columns Difficulty_Score and Type are deleted.
| | Lang | Difficulty |
---|---|---|
0 | Java | Medium |
1 | Python | Easy |
2 | Cobol | Hard |
3 | Javascript | Medium |
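The drop() method also accepts a columns keyword, which avoids having to remember which axis number means columns. A minimal sketch on a two-row version of the sample dataframe, returning a new frame instead of modifying df in place:

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
    "Type": ["Statically Typed", "Dynamically Typed"],
})

# columns=[...] is equivalent to passing the list with axis=1.
result = df.drop(columns=["Difficulty_Score", "Type"])

print(list(result.columns))
```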
Drop Column With NaN
In this section, you’ll learn how to drop columns with NaN.
NaN means missing data, and it denotes that the value of a cell in the dataframe is unknown. The sample dataframe uses pd.NaT, which pandas also treats as missing.
In the sample dataframe, the column Type has missing data at row index 2, as shown below.
| | Lang | Difficulty | Difficulty_Score | Type |
---|---|---|---|---|
2 | Cobol | Hard | 10 | NaT |
Use the dropna() method to drop such columns.
It accepts a parameter how, which you can set to any or all.
- any – the column will be deleted if it has at least one missing value
- all – the column will be deleted only if all of its cells are missing. You can use this to drop a column that is entirely NaN.
Use the following code to delete a column that has at least one missing value.
df.dropna(axis=1, how='any', inplace=True)
df
Dataframe Looks Like
The column Type will be dropped from the dataframe, as shown below.
| | Lang | Difficulty | Difficulty_Score |
---|---|---|---|
0 | Java | Medium | 5 |
1 | Python | Easy | 2 |
2 | Cobol | Hard | 10 |
3 | Javascript | Medium | 8 |
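The difference between how='any' and how='all' is easier to see on a frame with varying amounts of missing data. The sketch below uses hypothetical Notes and Empty columns that are not part of the tutorial's sample data. It also shows the thresh parameter of dropna(), which keeps only columns with at least the given number of non-missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],           # 4 values present
    "Type": ["Statically Typed", "Dynamically Typed", None,
             "Dynamically typed"],                               # 3 values present
    "Notes": [np.nan, "legacy", np.nan, np.nan],                 # 1 value present
    "Empty": [np.nan, np.nan, np.nan, np.nan],                   # nothing present
})

# how="all" removes only the column that is entirely missing.
drop_all = df.dropna(axis=1, how="all")

# how="any" removes every column with at least one missing value.
drop_any = df.dropna(axis=1, how="any")

# thresh=3 keeps columns that have at least 3 non-missing values.
drop_thresh = df.dropna(axis=1, thresh=3)

print(list(drop_all.columns))
print(list(drop_any.columns))
print(list(drop_thresh.columns))
```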
Drop Column Based on Row Value
To drop a column based on row values, evaluate the values using an if statement.
In the if statement, you can pass the condition that needs to be evaluated.
For example,
- (df["Difficulty_Score"] > 7).any() checks if any value of Difficulty_Score is greater than 7. If yes, it returns True. Else False.
- (df["Difficulty_Score"] > 7).all() checks if all values of Difficulty_Score are greater than 7. If yes, it returns True. Else False.
To drop the column if it has any value greater than 7, use the following code.
# Drop the column only if at least one of its values is greater than 7
if (df["Difficulty_Score"] > 7).any():
    df.drop("Difficulty_Score", inplace=True, axis=1)
else:
    print("No row exists with difficulty value greater than 7. Hence this column will NOT be dropped")
df
Dataframe Looks Like
Since the column Difficulty_Score has values greater than 7, it is deleted from the dataframe.
| | Lang | Difficulty | Type |
---|---|---|---|
0 | Java | Medium | Statically Typed |
1 | Python | Easy | Dynamically Typed |
2 | Cobol | Hard | NaT |
3 | Javascript | Medium | Dynamically typed |
This is how you can drop columns based on row values using if statements.
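The same check can be applied to several columns at once instead of writing one if statement per column. The following is a sketch of that idea, not part of the original tutorial. The Popularity_Score column is hypothetical and was added so that one numeric column passes the check and one does not.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Popularity_Score": [6, 7, 1, 7],  # hypothetical column
})

# Compare only the numeric columns against the threshold.
numeric = df.select_dtypes(include="number")

# One boolean per column: True if any value in it is greater than 7.
exceeds = (numeric > 7).any()

# Keep the labels whose flag is True and drop those columns.
to_drop = exceeds[exceeds].index
result = df.drop(columns=to_drop)

print(list(to_drop))
print(list(result.columns))
```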
Additional Resources
- How to Add Column to Dataframe in Pandas
- How to rename a column in pandas