How To Drop Column in Pandas Dataframe – Definitive Guide

A pandas DataFrame is a data structure that stores values in a tabular format. During data analysis, you may need to drop a column from a dataframe.

You can drop a column in a pandas dataframe using the df.drop("column_name", axis=1, inplace=True) statement.

If You’re in a Hurry…

You can use the code snippet below to drop a column from a pandas dataframe.

df.drop("column_name", axis=1, inplace=True)

where

  • column_name – Name of the column to be deleted
  • axis=1 – Specifies the axis to drop from. Axis 1 means columns and 0 means rows.
  • inplace=True – Specifies that the drop operation modifies the same dataframe rather than returning a new dataframe with the column removed.

Note: If more than one column exists with the same name, then all the columns with this name will be dropped.
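The duplicate-name behavior can be checked with a small sketch. The dataframe below is hypothetical (it is not the tutorial's sample) and exists only to show two columns sharing the label "a":

```python
import pandas as pd

# Hypothetical dataframe with the label "a" used for two different columns
dup_demo = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# Dropping by label removes every column that carries that label
dup_result = dup_demo.drop("a", axis=1)

print(list(dup_result.columns))
```

If only one of the duplicated columns should go, selecting the columns to keep by position is likely the safer route.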

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn how to use the drop() and pop() methods to delete columns in pandas in various scenarios.

By default, the drop() method returns a copy of the dataframe with the column removed. With inplace=True, it modifies the dataframe itself and returns None. Use drop() when you want to remove the column from the dataframe and no operation needs to be performed on the deleted column.
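A minimal sketch of what drop() hands back with and without inplace=True. The two-column dataframe and the variable names are assumptions made for illustration:

```python
import pandas as pd

# Small hypothetical dataframe, separate from the tutorial's sample
demo = pd.DataFrame({"Lang": ["Java", "Python"], "Difficulty_Score": [5, 2]})

# Without inplace, drop() returns a new dataframe and leaves demo untouched
trimmed = demo.drop("Difficulty_Score", axis=1)
columns_before = list(demo.columns)

# With inplace=True, demo itself is modified and the call returns None
returned = demo.drop("Difficulty_Score", axis=1, inplace=True)

print(columns_before, list(demo.columns), returned)
```

Assigning the result of an inplace=True call back to a variable is a common mistake, because it replaces the dataframe with None.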

The pop() method removes the column from the dataframe and returns it as a Series. Use pop() when you still need the removed column, for example to use it temporarily in another operation.

Sample Dataframe

This is the sample dataframe used throughout the tutorial.

import pandas as pd
import numpy as np

data = {
    "Lang":["Java","Python","Cobol","Javascript"],        
    "Difficulty":["Medium","Easy","Hard","Medium"],
    "Difficulty_Score":[5,2,10,8],
    "Type":["Statically Typed","Dynamically Typed",pd.NaT,"Dynamically typed"],

}

df = pd.DataFrame(data)

df

Dataframe Looks Like

             Lang  Difficulty  Difficulty_Score               Type
    0        Java      Medium                 5   Statically Typed
    1      Python        Easy                 2  Dynamically Typed
    2       Cobol        Hard                10                NaT
    3  Javascript      Medium                 8  Dynamically typed

Now you’ll see the various methods to drop columns in pandas.

Drop Column By Index

In this section, you’ll learn how to drop a column by index in a pandas dataframe.

You can use df.columns[index] to look up the name of the column at that index position and pass that name to the drop method.

The index is 0-based. Use 0 to delete the first column, 1 to delete the second column, and so on.

df.drop(df.columns[2], axis=1, inplace=True)

df

where

  • df.columns[2] – specifies the third column to be deleted
  • axis=1 – specifies that the deletion applies to the column axis
  • inplace=True – specifies that the drop happens in the same dataframe and no copy of the dataframe is created.

After the drop operation, you can print the dataframe using df as shown in the snippet. You’ll see that the third column, Difficulty_Score, is deleted from the dataframe as shown below.

Dataframe Looks Like

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed

You’ve removed the column using its index.
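Positions can also be counted from the end. The sketch below assumes a two-row version of the sample dataframe and uses df.columns[-1] to look up the last column's label; it returns a new dataframe instead of using inplace=True:

```python
import pandas as pd

# Two-row stand-in for the tutorial's sample dataframe
demo = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
    "Type": ["Statically Typed", "Dynamically Typed"],
})

# A negative position counts from the right, so -1 is the last column
last_label = demo.columns[-1]

# drop() still works on the label that the position resolved to
without_last = demo.drop(last_label, axis=1)

print(last_label, list(without_last.columns))
```

Because the position is translated into a label before dropping, a dataframe with duplicated column names would lose every column sharing that label.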

Next, you’ll remove the column by name.

Drop Column By Name

In this section, you’ll learn how to drop columns by name in a pandas dataframe.

You can pass the column name directly to the drop method.

If the column exists, it’ll be dropped from the dataframe. If the column doesn’t exist, an error will be raised. You can control this behavior using the errors='ignore' parameter. You’ll see the error handling in detail at a later point in this tutorial.

df.drop("Difficulty_Score", axis=1, inplace=True)

df

where

  • Difficulty_Score – specifies the name of the column to be deleted
  • axis=1 – specifies that the deletion applies to the column axis
  • inplace=True – specifies that the drop happens in the same dataframe and no copy of the dataframe is created.

After the drop operation, you can print the dataframe using df as shown in the snippet. You’ll see that the column Difficulty_Score is deleted from the dataframe as shown below.

Dataframe Looks Like

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed

You’ve removed the column from the dataframe using its name.
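drop() also accepts a columns keyword, which avoids having to remember which axis number means columns. A short sketch on a two-row stand-in for the sample dataframe, comparing the two spellings:

```python
import pandas as pd

# Two-row stand-in for the tutorial's sample dataframe
demo = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
})

# Label plus axis=1, as used throughout this tutorial
by_axis = demo.drop("Difficulty_Score", axis=1)

# The columns keyword names the axis explicitly
by_keyword = demo.drop(columns="Difficulty_Score")

print(by_axis.equals(by_keyword))
```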

Next, you’ll learn how to drop multiple columns by index.

Drop Multiple Columns by Index

In this section, you’ll learn how to drop multiple columns by index.

You can use df.columns[[index1, index2, indexn]] to look up the column names at those index positions and pass the resulting list to the drop method.

The index is 0-based. Use 0 to delete the first column, 1 to delete the second column, and so on.

df.drop(df.columns[[1, 2]], axis = 1, inplace = True)

df

where

  • df.columns[[1, 2]] – specifies the index positions of the columns to be deleted.
  • axis=1 – specifies that the deletion applies to the column axis
  • inplace=True – specifies that the drop happens in the same dataframe and no copy of the dataframe is created.

After the drop operation, you can print the dataframe using df as shown in the snippet. You’ll see that the columns Difficulty and Difficulty_Score, at indexes 1 and 2, are deleted.

Dataframe Looks Like

             Lang               Type
    0        Java   Statically Typed
    1      Python  Dynamically Typed
    2       Cobol                NaT
    3  Javascript  Dynamically typed

You’ve deleted multiple columns using index in pandas dataframe.

Next, you’ll learn how to drop columns by list of names.

Drop Columns By List of Names

In this section, you’ll learn how to drop columns by a list of names.

You can do this by passing the columns as list ["column 1", "column 2"] to the drop method as shown below.

df.drop(["Difficulty_Score", "Type"], axis = 1, inplace = True)

df

where

  • ["Difficulty_Score", "Type"] – specifies the names of the columns to be deleted
  • axis=1 – specifies that the deletion applies to the column axis
  • inplace=True – specifies that the drop happens in the same dataframe and no copy of the dataframe is created.

After the drop operation, you can print the dataframe using df as shown in the snippet. You’ll see that the columns Difficulty_Score and Type are deleted.

Dataframe Looks Like

             Lang  Difficulty
    0        Java      Medium
    1      Python        Easy
    2       Cobol        Hard
    3  Javascript      Medium

You’ve deleted multiple columns by a list of names of columns.

Next, you’ll see how to delete a column only if it exists.

Drop Column If Exists

In this section, you’ll learn how to drop a column only if it exists in the dataframe.

Here, you’ll control the error behavior during the delete operation by using the errors='ignore' parameter.

By default, if a column doesn’t exist in the dataframe during the drop operation, an error such as KeyError: "['Difficulty_Score' 'Type'] not found in axis" is raised.

To drop a column only if it exists, without raising any error, specify errors='ignore' in the drop method as shown below.

df.drop(["Difficulty_Score", "Type"], axis=1, inplace= True, errors='ignore')

df

where

  • ["Difficulty_Score", "Type"] – specifies the names of the columns to be deleted
  • axis=1 – specifies that the deletion applies to the column axis
  • inplace=True – specifies that the drop happens in the same dataframe and no copy of the dataframe is created.
  • errors='ignore' – ignores the error raised when deleting non-existent columns.

Here, you’re deleting the columns Difficulty_Score and Type. They’ll be deleted, and the dataframe will consist of only two columns, Lang and Difficulty.

Dataframe Looks Like

             Lang  Difficulty
    0        Java      Medium
    1      Python        Easy
    2       Cobol        Hard
    3  Javascript      Medium

Now, try again to delete the two columns Difficulty_Score and Type, which no longer exist, this time with the parameter errors='raise'.

df.drop(["Difficulty_Score", "Type"], axis=1, inplace= True, errors='raise')

df

where

  • ["Difficulty_Score", "Type"] – specifies the names of the columns to be deleted
  • axis=1 – specifies that the deletion applies to the column axis
  • inplace=True – specifies that the drop happens in the same dataframe and no copy of the dataframe is created.
  • errors='raise' – raises an error when deleting non-existent columns. This is the default.

You’ll see the KeyError exception being raised as shown below.

    ---------------------------------------------------------------------------

    KeyError                                  Traceback (most recent call last)

    <ipython-input-38-7b9b2d7d9dba> in <module>
          3 #If 'ignore', suppress error and only existing labels are dropped.
          4 
    ----> 5 df.drop(["Difficulty_Score", "Type"], axis=1, inplace= True, errors='raise')
          6 
          7 df  

    KeyError: "['Difficulty_Score' 'Type'] not found in axis"

This is how you can delete a column only if it exists and ignore the error if it doesn’t exist in the dataframe.
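The sketch below puts the three behaviors side by side on a two-row stand-in for the sample dataframe. The label "Not_A_Column" is a made-up name that is deliberately absent, and filtering the requested labels against df.columns is shown only as one possible explicit alternative to errors='ignore':

```python
import pandas as pd

# Two-row stand-in for the tutorial's sample dataframe
demo = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Type": ["Statically Typed", "Dynamically Typed"],
})

# "Not_A_Column" is a hypothetical label that does not exist in demo
wanted = ["Type", "Not_A_Column"]

# errors="ignore" drops the labels that exist and silently skips the rest
ignored = demo.drop(columns=wanted, errors="ignore")

# Explicit alternative: keep only the requested labels that are present
existing = [name for name in wanted if name in demo.columns]
filtered = demo.drop(columns=existing)

# The default (errors="raise") fails as soon as any label is missing
try:
    demo.drop(columns=wanted)
    message = ""
except KeyError as exc:
    message = str(exc)

print(list(ignored.columns), message)
```

Note that with the default setting nothing is dropped when any one label is missing, even if the other labels exist.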

Next, you’ll see how to drop a column that doesn’t have a name.

Drop Column No Name

In this section, you’ll see how to drop a column with no name.

A pandas dataframe can contain a column that has a blank name, in other words, a column without a name.

Assume that the column at index 2 of the sample dataframe doesn’t have a name. See the guide on renaming a column in pandas to learn more about renaming or removing the name of a column in a pandas dataframe.

Now, you can drop such a column by using its index, df.columns[2], as shown below.

df.drop(df.columns[2], axis=1, inplace=True)

df

where

  • df.columns[2] – specifies the third column to be deleted
  • axis=1 – specifies that the deletion applies to the column axis
  • inplace=True – specifies that the drop happens in the same dataframe and no copy of the dataframe is created.

The column at index 2 is deleted.

Dataframe Looks Like

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed

This is how you can delete columns without names.
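To try this end to end, the sketch below first creates a nameless column by renaming Difficulty_Score to an empty string (an assumption about what "no name" looks like in practice) and then drops it by position:

```python
import pandas as pd

# Two-row stand-in for the tutorial's sample dataframe
demo = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
    "Type": ["Statically Typed", "Dynamically Typed"],
})

# Simulate a column without a name by giving it an empty-string label
demo = demo.rename(columns={"Difficulty_Score": ""})
blank_label = demo.columns[2]

# Look the label up by position and drop it like any other column
demo = demo.drop(blank_label, axis=1)

print(list(demo.columns))
```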

Next, you’ll see how to drop columns with NaN values.

Drop Column With NaN

In this section, you’ll learn how to drop columns with NaN.

NaN means missing data, and it can be used when you don’t know the value for a cell in the dataframe. The sample dataframe uses pd.NaT, which pandas also treats as a missing value.

When working with dataframes, you may need to delete a column that has this type of missing data.

In the sample dataframe, the column Type has missing data at row index 2 as shown below.

        Lang  Difficulty  Difficulty_Score  Type
    2  Cobol        Hard                10   NaT

You can use the dropna method to drop such columns.

It accepts a parameter how, for which you can specify 'any' or 'all'.

  • any means the column will be deleted if it has at least one missing value.
  • all means the column will be deleted only if all the cells of the column have missing data. You can use this to drop a column that is entirely NaN.

Use the below snippet to delete a column that has at least one missing value.

df.dropna(axis=1, how='any', inplace=True)

df

where

  • axis=1 – specifies that the deletion applies to the column axis
  • how='any' – specifies that the entire column is deleted even if it has only one missing value
  • inplace=True – specifies that the drop happens in the same dataframe and no copy of the dataframe is created.

Now, the column Type will be dropped from the dataframe as shown below.

Dataframe Looks Like

             Lang  Difficulty  Difficulty_Score
    0        Java      Medium                 5
    1      Python        Easy                 2
    2       Cobol        Hard                10
    3  Javascript      Medium                 8

This is how you can delete columns that have missing data.
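The difference between how='any' and how='all' is easiest to see side by side. The sketch below uses a small dataframe with an extra, hypothetical Notes column that is entirely missing:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol"],
    "Type": ["Statically Typed", np.nan, "Dynamically Typed"],  # one missing value
    "Notes": [np.nan, np.nan, np.nan],  # hypothetical column, every value missing
})

# how="all" removes only the columns in which every value is missing
all_missing_removed = demo.dropna(axis=1, how="all")

# how="any" removes every column that has at least one missing value
any_missing_removed = demo.dropna(axis=1, how="any")

print(list(all_missing_removed.columns), list(any_missing_removed.columns))
```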

Next, you’ll learn how to drop all columns after a specific column.

Drop All Columns After Specific Column

In this section, you’ll learn how to drop all columns after a specific column.

For example, you may need to do this when you want to perform an operation on only the first few columns.

You can achieve this by using the df.loc indexer. loc is used to select rows or columns by their labels.

Use df.loc[:, :'specific_column'] to create a dataframe with all the rows and the columns up to and including specific_column.

where

  • The first : in loc selects all the rows of the dataframe.
  • :'specific_column' – selects the columns from the first column up to and including specific_column.

df = df.loc[:, :'Difficulty']

df

Now, df.loc[] returns a dataframe with all the rows and the columns up to and including Difficulty.

Dataframe Looks Like

             Lang  Difficulty
    0        Java      Medium
    1      Python        Easy
    2       Cobol        Hard
    3  Javascript      Medium

This is how you can drop all columns after a specific column.
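The same result can be reached with drop() by working out which labels lie to the right of the chosen column. This is a sketch of one possible alternative, assuming the column label is unique so that get_loc returns a single integer position:

```python
import pandas as pd

# Two-row stand-in for the tutorial's sample dataframe
demo = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
    "Type": ["Statically Typed", "Dynamically Typed"],
})

# Integer position of the last column to keep (an int when the label is unique)
position = demo.columns.get_loc("Difficulty")

# Labels of every column to the right of that position
to_drop = demo.columns[position + 1:]

trimmed = demo.drop(columns=to_drop)

print(list(to_drop), list(trimmed.columns))
```

Unlike the loc slice, this form makes the list of removed columns available, which can be handy for logging what was dropped.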

Next, you’ll learn how to drop columns based on row value.

Drop Column Based on Row Value

In this section, you’ll learn how to drop columns based on row value.

You may want to use this to delete a column that contains a specific value, so that you can exclude that column from the data analysis.

You can evaluate the row values by using an if statement.

In the if statement, you pass the condition that needs to be evaluated.

For example,

  • (df["Difficulty_Score"] > 7).any() checks if any value of Difficulty_Score is greater than 7. If yes, it returns True. Else False.
  • (df["Difficulty_Score"] > 7).all() checks if all values of Difficulty_Score are greater than 7. If yes, it returns True. Else False.

Now, to drop the column if it has any value greater than 7, use the below snippet.

if (df["Difficulty_Score"] > 7).any():
    # At least one value exceeds 7, so drop the column
    df.drop("Difficulty_Score", inplace=True, axis=1)
else:
    print("No row exists with difficulty value greater than 7. Hence this column will NOT be dropped")

df

Since the column Difficulty_Score has at least one value greater than 7, the column is deleted from the dataframe. If it had no such value, the column would not be removed.

Dataframe Looks Like

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed

This is how you can drop columns based on row values using if statements.
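When the same check has to run over several columns, the condition can be evaluated for all numeric columns at once instead of writing one if statement per column. A sketch under the assumption that only numeric columns are compared against the threshold; the Popularity column is made up for illustration:

```python
import pandas as pd

demo = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Popularity": [3, 4, 1, 5],  # hypothetical column, not in the sample data
})

# Restrict the comparison to numeric columns so strings are never compared to 7
numeric = demo.select_dtypes(include="number")

# One boolean per numeric column: True if any value in it is greater than 7
exceeds = (numeric > 7).any()

# Keep only the labels flagged True and drop those columns
to_drop = exceeds[exceeds].index
result = demo.drop(columns=to_drop)

print(list(to_drop), list(result.columns))
```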

Next, you’ll learn how to delete columns from a pandas dataframe using the pop() method.

Drop Column Using pop()

In this section, you’ll drop a column using pop().

You can use this method when you want to pop a column out of the dataframe and store it in a separate object to perform some temporary operations.

You can use df.pop("Difficulty_Score") to pop the Difficulty_Score column out of the dataframe. It returns the column as a Series, which is stored in the popped_df object as shown below.

popped_df = df.pop("Difficulty_Score")

popped_df

df

Popped Column Looks Like

    0     5
    1     2
    2    10
    3     8
    Name: Difficulty_Score, dtype: int64

Dataframe Looks Like

             Lang  Difficulty               Type
    0        Java      Medium   Statically Typed
    1      Python        Easy  Dynamically Typed
    2       Cobol        Hard                NaT
    3  Javascript      Medium  Dynamically typed

This is how you can drop columns from a pandas dataframe using the pop() method.
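One way the popped column might be used afterwards is sketched below: compute something on the returned Series, then reattach it, which places it at the end of the dataframe. The variable names are illustrative only:

```python
import pandas as pd

demo = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", None, "Dynamically typed"],
})

# pop() removes the column from demo in place and returns it as a Series
scores = demo.pop("Difficulty_Score")
columns_after_pop = list(demo.columns)

# The Series can be used independently of the dataframe
average = scores.mean()

# Assigning it back appends the column at the end
demo["Difficulty_Score"] = scores

print(columns_after_pop, average, list(demo.columns))
```

pop() takes a single column label, so removing several columns at once is still a job for drop().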

Next, you’ll learn how to drop columns using iloc.

Pandas Drop Column Using iloc

In this section, you’ll learn how to drop columns using iloc.

You can achieve this by using the df.iloc indexer. iloc is used to select rows or columns by their integer position.

Use df.iloc[:, 1:3] to select the columns at positions 1 and 2. The index is 0-based and the end of the slice is excluded. Hence it’ll select the second and third columns.

When you pass this selection to the drop method, the second and third columns will be dropped.

df.drop(df.iloc[:, 1:3], inplace = True, axis = 1)

df

Now, the second and third columns, Difficulty and Difficulty_Score, are dropped from the dataframe.

Dataframe Looks Like

             Lang               Type
    0        Java   Statically Typed
    1      Python  Dynamically Typed
    2       Cobol                NaT
    3  Javascript  Dynamically typed

This is how you can delete columns from pandas dataframe using iloc.
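The slice boundaries are easy to get wrong, so here is a short check of what 1:3 actually selects, along with an equivalent formulation that keeps columns by position instead of dropping them. It uses a two-row stand-in for the sample dataframe:

```python
import pandas as pd

# Two-row stand-in for the tutorial's sample dataframe
demo = pd.DataFrame({
    "Lang": ["Java", "Python"],
    "Difficulty": ["Medium", "Easy"],
    "Difficulty_Score": [5, 2],
    "Type": ["Statically Typed", "Dynamically Typed"],
})

# 1:3 covers positions 1 and 2; position 3 is excluded
selected = demo.iloc[:, 1:3]
selected_labels = list(selected.columns)

# Drop the selected labels explicitly
dropped = demo.drop(columns=selected.columns)

# Equivalent: keep the remaining positions directly
kept = demo.iloc[:, [0, 3]]

print(selected_labels, dropped.equals(kept))
```

Passing selected.columns rather than the selection itself makes it explicit that drop() works on labels.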

Conclusion

To summarize, you’ve learned how to drop columns from pandas dataframe with various methods available. You’ve also learned about the sample use-cases when each of these methods will be useful.

If you’ve any questions feel free to comment below.
