How To Drop Column in Pandas Dataframe – Definitive Guide

A pandas DataFrame is a data structure that stores values in a tabular format. During data analysis, you may need to drop a column from a dataframe.

You can drop a column in a pandas dataframe using the df.drop("column_name", axis=1, inplace=True) statement.

If You’re in a Hurry

You can use the following code to drop the column from the pandas dataframe.

df.drop("column_name", axis=1, inplace=True)

where

  • column_name – Name of the column to be deleted.
  • axis=1 – Specifies the axis to drop from. Axis 1 means columns and axis 0 means rows.
  • inplace=True – Performs the drop on the same dataframe rather than returning a modified copy. Without it, drop() leaves the original dataframe unchanged and returns a new one.

If more than one column exists with the same name, then all the columns with this name will be dropped.
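
To illustrate the duplicate-name behaviour, here is a minimal sketch. The two-row frame and its labels are made up for this example and are not part of the tutorial's sample data.

```python
import pandas as pd

# Illustrative frame with the label "A" used for two different columns.
dup_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "A"])

# Dropping the label "A" removes every column that carries that label.
# No inplace=True here, so dup_df itself is left unchanged.
result = dup_df.drop("A", axis=1)

print(list(result.columns))
print(dup_df.shape)
```

Only the column B should remain in result, while dup_df keeps all three columns.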

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn how to use the drop(), pop(), and dropna() methods to delete columns in pandas in various scenarios.

Sample Dataframe

This is the sample dataframe used throughout the tutorial. Because the examples modify the dataframe in place, recreate it before running each section.

import pandas as pd
import numpy as np

data = {
    "Lang":["Java","Python","Cobol","Javascript"],        
    "Difficulty":["Medium","Easy","Hard","Medium"],
    "Difficulty_Score":[5,2,10,8],
    "Type":["Statically Typed","Dynamically Typed",pd.NaT,"Dynamically typed"],

}

df = pd.DataFrame(data)

df

Dataframe Looks Like

             Lang  Difficulty  Difficulty_Score               Type
    0        Java      Medium                 5   Statically Typed
    1      Python        Easy                 2  Dynamically Typed
    2       Cobol        Hard                10                NaT
    3  Javascript      Medium                 8  Dynamically typed

Now you’ll see the various methods to drop columns in pandas.

Drop Column Using Drop and Index

To drop a column by index in a pandas dataframe:

  • Use df.columns[index] to identify the column name at that index position.
  • Pass that name to the drop() method.

The index is 0-based. Use 0 to delete the first column, 1 to delete the second column, and so on.

Code

df.drop(df.columns[2], axis=1, inplace=True)

df

where

  • df.columns[2] – specifies the third column as the one to be deleted
  • axis=1 – specifies that the label refers to the column axis
  • inplace=True – specifies that the drop should happen in the same dataframe, with no modified copy returned

Dataframe Looks Like

You’ll see that the third column, Difficulty_Score, is deleted from the dataframe.

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed
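
If you would rather keep the original dataframe intact, a minimal sketch of the same index-based drop without inplace=True could look like this. It uses a shortened two-row version of the sample data, and the names label and trimmed are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
})

# df.columns[2] resolves position 2 to the label "Difficulty_Score".
label = df.columns[2]

# Without inplace=True, drop() returns a new dataframe and leaves df untouched.
trimmed = df.drop(label, axis=1)

print(list(df.columns))       # the original still has all three columns
print(list(trimmed.columns))  # the returned dataframe lacks the dropped column
```

This form is handy when you want to compare the dataframe before and after the drop.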

Drop Column Using Drop and Name

To drop a column by name in a pandas dataframe:

  • Use the column name directly in the drop() method.
  • If the column exists, it’ll be dropped from the dataframe.
  • If the column doesn’t exist, a KeyError will be raised.

You can suppress this error by passing errors='ignore' to drop().

Code

df.drop("Difficulty_Score", axis=1, inplace=True)

df

where

  • Difficulty_Score – specifies the name of the column to be deleted
  • axis=1 – specifies the column axis to be deleted

Dataframe Looks Like

The column Difficulty_Score is deleted from the dataframe, as shown below.

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed
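
The errors parameter mentioned in this section can be sketched as follows. "Missing_Column" is a hypothetical label that does not exist in the dataframe, and the flag raised is only there to make the outcome visible.

```python
import pandas as pd

df = pd.DataFrame({"Lang": ["Java", "Python"], "Difficulty_Score": [5, 2]})

# By default, dropping a label that does not exist raises a KeyError.
try:
    df.drop("Missing_Column", axis=1)
    raised = False
except KeyError:
    raised = True

# With errors="ignore", the missing label is skipped and nothing is dropped.
unchanged = df.drop("Missing_Column", axis=1, errors="ignore")

print(raised)
print(list(unchanged.columns))
```

Ignoring the error is useful in scripts that may run more than once against a dataframe whose column has already been removed.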

Drop Column Using pop()

In this section, you’ll drop a column using pop().

Use this method when you want to remove a column from the dataframe and keep it in a separate object to perform some temporary operations. Unlike drop(), pop() always modifies the dataframe in place and accepts only a single column name.

Use df.pop("Difficulty_Score") to pop the Difficulty_Score column out of the dataframe. It returns the column as a pandas Series, which is stored in the popped_df object, as shown below.

Code

popped_df = df.pop("Difficulty_Score")

popped_df

df

Popped Column Looks Like

    0     5
    1     2
    2    10
    3     8
    Name: Difficulty_Score, dtype: int64

Dataframe Looks Like

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed
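
As a rough sketch of the "temporary operation" use case, the snippet below pops a column, transforms it, and attaches the result again. The multiplication by 10 and the variable name scores are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Lang": ["Java", "Python"], "Difficulty_Score": [5, 2]})

# pop() removes the column from df and returns it as a Series.
scores = df.pop("Difficulty_Score")
is_series = isinstance(scores, pd.Series)
columns_after_pop = list(df.columns)

# Work on the popped values separately, then attach the result again.
df["Difficulty_Score"] = scores * 10

print(is_series)
print(columns_after_pop)
print(df)
```

Note that a column assigned back this way is appended as the last column, which may differ from its original position.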

Drop Column Using iloc

In this section, you’ll learn how to drop columns using iloc.

The iloc indexer selects rows or columns by their integer position.

  • Use df.iloc[:, 1:3] to select the columns at positions 1 and 2. The end of the slice, 3, is excluded.
  • The index is 0-based. Hence, it’ll select the second and third columns.

When you use this selection in the drop() method, the columns at positions 1 and 2 will be dropped.

df.drop(df.iloc[:, 1:3].columns, axis=1, inplace=True)

df

Dataframe Looks Like

The second and third columns, Difficulty and Difficulty_Score, are dropped from the dataframe.

             Lang               Type
    0        Java   Statically Typed
    1      Python  Dynamically Typed
    2       Cobol                NaT
    3  Javascript  Dynamically typed
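
To check which columns a positional slice covers before dropping anything, you could run a small sketch like this one. It uses a one-row version of the sample data, and slicing df.columns directly is shown as an alternative way to get the same labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java"],
    "Difficulty": ["Medium"],
    "Difficulty_Score": [5],
    "Type": ["Statically Typed"],
})

# The slice 1:3 covers positions 1 and 2; position 3 is excluded.
selected = list(df.iloc[:, 1:3].columns)

# Slicing the column index directly yields the same labels.
same_labels = list(df.columns[1:3])

# Drop those labels and keep everything else.
remaining = df.drop(df.columns[1:3], axis=1)

print(selected)
print(list(remaining.columns))
```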

Drop Multiple Columns by Index

To drop multiple columns by index:

  • Use df.columns[[index1, index2, ...]] to identify the column names at those index positions. Note the double brackets: the positions are passed as a list.
  • Pass the resulting names to the drop() method.

The index is 0-based. Use 0 to delete the first column, 1 to delete the second column, and so on.

Code

df.drop(df.columns[[1, 2]], axis = 1, inplace = True)

df

where

  • df.columns[[1, 2]] – specifies multiple column indexes to be deleted.
  • axis=1 – specifies the column axis to be deleted

Dataframe Looks Like

The columns Difficulty and Difficulty_Score, at indexes 1 and 2, are deleted.

             Lang               Type
    0        Java   Statically Typed
    1      Python  Dynamically Typed
    2       Cobol                NaT
    3  Javascript  Dynamically typed
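
The positions do not have to be adjacent. A minimal sketch, assuming you want to drop the second and fourth columns of a one-row version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java"],
    "Difficulty": ["Medium"],
    "Difficulty_Score": [5],
    "Type": ["Statically Typed"],
})

# Positions 1 and 3 are the second and fourth columns.
positions = [1, 3]

# Indexing df.columns with a list of positions returns the matching labels.
labels = df.columns[positions]

result = df.drop(labels, axis=1)

print(list(labels))
print(list(result.columns))
```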

Drop Columns By List of Names

To drop columns by a list of names,

  • Pass the columns as a list ["column 1", "column 2"] to the drop() method

Code

df.drop(["Difficulty_Score", "Type"], axis = 1, inplace = True)

df

where

  • ["Difficulty_Score", "Type"] – specifies names of columns to be deleted
  • axis=1 – specifies the column axis to be deleted

Dataframe Looks Like

The columns Difficulty_Score and Type are deleted.

             Lang  Difficulty
    0        Java      Medium
    1      Python        Easy
    2       Cobol        Hard
    3  Javascript      Medium
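
drop() also accepts a columns keyword, which can be used in place of the axis=1 argument. The sketch below compares the two spellings on a one-row version of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java"],
    "Difficulty": ["Medium"],
    "Difficulty_Score": [5],
    "Type": ["Statically Typed"],
})

# Positional labels plus axis=1, as used in this tutorial.
result_axis = df.drop(["Difficulty_Score", "Type"], axis=1)

# The columns keyword expresses the same drop without an axis argument.
result_columns = df.drop(columns=["Difficulty_Score", "Type"])

print(result_axis.equals(result_columns))
print(list(result_columns.columns))
```

Which form to use is mostly a matter of readability; the columns keyword makes it explicit that the labels are column names.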

Drop Column With NaN

In this section, you’ll learn how to drop columns with NaN.

NaN means missing data, and it can be used to denote that you don’t know the value of a cell in the dataframe. The sample dataframe stores its missing value as pd.NaT, which pandas also treats as missing data.

In the sample dataframe, the column Type has missing data at row index 2, as shown below.

        Lang  Difficulty  Difficulty_Score  Type
    2  Cobol        Hard                10   NaT

Use the dropna() method with axis=1 to drop such columns.

It accepts a parameter, how, which you can set to 'any' or 'all'.

  • any – the column will be deleted if it has at least one missing value.
  • all – the column will be deleted only if all the cells of the column are missing. You can use this to drop columns that consist entirely of NaN.

Use the following code to delete every column that has at least one missing value.

df.dropna(axis=1, how='any', inplace=True)

df

Dataframe Looks Like

The column Type will be dropped from the dataframe, as shown below.

             Lang  Difficulty  Difficulty_Score
    0        Java      Medium                 5
    1      Python        Easy                 2
    2       Cobol        Hard                10
    3  Javascript      Medium                 8
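
To see the difference between the two how settings side by side, here is a small sketch. The Notes column is a hypothetical, entirely empty column added only for this comparison, and np.nan is used as the missing-value marker.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol"],
    "Type": ["Statically Typed", "Dynamically Typed", np.nan],  # one missing value
    "Notes": [np.nan, np.nan, np.nan],  # hypothetical column, entirely missing
})

# how="any" drops every column that contains at least one missing value.
any_dropped = df.dropna(axis=1, how="any")

# how="all" drops only the columns in which every value is missing.
all_dropped = df.dropna(axis=1, how="all")

print(list(any_dropped.columns))
print(list(all_dropped.columns))
```

With how="any" only Lang should survive, while how="all" keeps both Lang and Type.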

Drop Column Based on Row Value

To drop a column based on its row values, evaluate the values in an if statement.

In the if statement, you pass the condition that needs to be evaluated.

For example,

  • (df["Difficulty_Score"] > 7).any() checks whether any value of Difficulty_Score is greater than 7. If so, it returns True; otherwise, False.
  • (df["Difficulty_Score"] > 7).all() checks whether all values of Difficulty_Score are greater than 7. If so, it returns True; otherwise, False.

To drop the column if it has any value greater than 7, use the following code.

# Drop the column only if at least one of its values is greater than 7
if (df["Difficulty_Score"] > 7).any():
    df.drop("Difficulty_Score", axis=1, inplace=True)
else:
    print("No row exists with difficulty value greater than 7. Hence this column will NOT be dropped")

df

Dataframe Looks Like

Since the column Difficulty_Score contains values greater than 7 (10 and 8), the column is deleted from the dataframe.

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed

This is how you can drop columns based on row values using if statements.
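
If you need to apply the same condition to several columns at once, one possible approach is to build a boolean mask over the numeric columns instead of writing one if statement per column. This is only a sketch: the Popularity column and the threshold variable are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Popularity": [3, 5, 1, 4],  # hypothetical extra column for illustration
})

threshold = 7

# Restrict the comparison to numeric columns so that strings are not compared.
numeric = df.select_dtypes(include="number")

# Boolean Series indexed by column name: True where any value exceeds the threshold.
exceeds = (numeric > threshold).any()

# Keep only the labels whose flag is True and drop them in a single call.
to_drop = exceeds[exceeds].index
result = df.drop(columns=to_drop)

print(list(to_drop))
print(list(result.columns))
```

Here only Difficulty_Score has values above the threshold, so Lang and Popularity remain in result.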
