How To Select Columns From Pandas Dataframe – Definitive Guide

Pandas Dataframe stores data in a two-dimensional format. You need to select columns from Dataframe for various data analysis purposes. In this tutorial, you’ll select columns from pandas Dataframe using different methods. Selecting columns is also known as selecting a subset of columns from the dataframe.

You can select columns from Pandas Dataframe using the df.loc[:,’column_name’] statement.

If You’re in Hurry…

You can use the below code snippet to select columns from the dataframe.

Snippet

df.loc[:,'Product_Name']

Where,

  • : – denotes all the rows that must be selected
  • 'Product_Name' – Denotes the name of the column that must be selected

Output

    0    Keyboard
    1       Mouse
    2     Monitor
    3         CPU
    4         CPU
    5    Speakers
    6     Headset
    Name: Product_Name, dtype: object

This is how you can select columns from the dataframe.

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn the different methods available to select columns from the dataframe and how they can be used in different scenarios.

Sample Dataframe

This is the sample dataframe used throughout the tutorial.

It contains,

  • Rows with values for all columns
  • Rows with Empty or Missing Data for each column
  • One Duplicate row
  • One column in the sample dataframe is of float type
  • One Empty column which will be used to select the empty columns

Snippet

import pandas as pd

data = {"Product_Name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers","Headset"],
        "Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
        "No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],
        "Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],
        "Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT],
        "Remarks":[pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT]
       }

df = pd.DataFrame(data)

df = df.astype({"Unit_Price": float})

df

The sample dataframe will look like follows.

DataFrame Will Look Like

Product_NameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_DateBuffer Column
0Keyboard500.0005511/5/2021NaT
1Mouse200.000564/23/2021NaT
2Monitor5000.235101008/21/2021NaT
3CPU10000.55020Not Available09/18/2021NaT
4CPU10000.55020Not Available09/18/2021NaT
5Speakers250.5008NaT01/05/2021NaT
6HeadsetNaNNaTNaTNaTNaT

Now, you’ll learn how to select columns from the dataframe using different methods.

Select Columns from Pandas Dataframe

You can select columns from the pandas dataframe using three different methods.

  • Using Loc
  • Using iLoc
  • Using df.columns

Using Loc pandas

You can select a column from the dataframe using the loc property available in the dataframe.

It is used to locate the rows or columns from the dataframe based on the name passed. It is also called slicing the columns based on the column names.

It accepts row index and column names to be selected.

  • First, select only columns, you can just use : in place of rows which will select all rows.
  • Second, you can pass the column names to be selected.

Note: Column names are case-sensitive. Hence, pass the column names as the same case in the dataframe. Otherwise, you’ll face the key error.

Use the below snippet to select the column from the dataframe using loc.

Snippet

df.loc[:,'Product_Name']

The Product_Name column is selected from the dataframe and displayed as shown below. You can also assign it to a new dataframe if you want to store it for further activities.

Output

    0    Keyboard
    1       Mouse
    2     Monitor
    3         CPU
    4         CPU
    5    Speakers
    6     Headset
    Name: Product_Name, dtype: object

This is how you can use the loc property to select columns.

Using iloc Pandas

You can select columns from the dataframe using iloc property available in the dataframe.

It is used to locate the rows or columns from the dataframe based on the index passed. It is also called slicing the columns based on the indexes.

It accepts row index and column index to be selected.

  • First, select only columns, you can just use : in place of rows which will select all rows.
  • Second, you can pass the column indexes to be selected.

Use the below snippet to select the column from the dataframe using iloc.

Snippet

df.iloc[:,0]

The first column at the index 0Product_Name is selected from the dataframe and displayed as shown below.

You can also assign it to a new dataframe if you want to store it for further activities.

Output

    0    Keyboard
    1       Mouse
    2     Monitor
    3         CPU
    4         CPU
    5    Speakers
    6     Headset
    Name: Product_Name, dtype: object

This is also known as indexing and selecting data from pandas dataframe.

Using DF.Columns

You can also select columns using the columns[] property. This method returns the list of columns for the indexes passed.

For example, if you pass, df.columns[0]then it’ll return the first column.

Use the below snippet to select the columns from the dataframe using the df.columns attribute.

Snippet

df[df.columns[0]]

The first column is Product_Name. Hence it’ll be selected and displayed.

Output

    0    Keyboard
    1       Mouse
    2     Monitor
    3         CPU
    4         CPU
    5    Speakers
    6     Headset
    Name: Product_Name, dtype: object

This is how you can select columns using the df.columns attribute.

You’ve seen the methods available to select columns. Now, you’ll see how it can be used in the different use-cases.

Select Columns by Name

You can select columns by name from the dataframe by passing the column names as a list to the dataframe object.

Use the below snippet to select only two columns Product_Name and Unit_Price as a subset from the dataframe.

Snippet

df[['Product_Name', 'Unit_Price']]

When you execute the script, you’ll see the subset dataframe as follows.

DataFrame Will Look Like

Product_NameUnit_Price
0Keyboard500.000
1Mouse200.000
2Monitor5000.235
3CPU10000.550
4CPU10000.550
5Speakers250.500
6HeadsetNaN

This is how you can get columns by name from the pandas dataframe.

Select Columns by Index

You can select columns by index from the dataframe by using the iloc attribute of the dataframe.

Use the below snippet to select only the column at index position 0.

Snippet

df.iloc[:,0]

You’ll see the column at the index position 0 selected and displayed as shown below.

Output

    0    Keyboard
    1       Mouse
    2     Monitor
    3         CPU
    4         CPU
    5    Speakers
    6     Headset
    Name: Product_Name, dtype: object

You can also get column by index using df.columns by passing the index as shown below.

Snippet

df[df.columns[0]]

You’ll see the column at the index position 0 selected and displayed as shown below.

Output

    0    Keyboard
    1       Mouse
    2     Monitor
    3         CPU
    4         CPU
    5    Speakers
    6     Headset
    Name: Product_Name, dtype: object

This is how you can select columns by index from pandas dataframe.

Select Columns by Condition

You can select columns by condition by using the df.loc[] attribute and specifying the condition for selecting the columns.

Use the below snippet to select columns that have a value 5 in any row.

(df == 5).any() evaluates each cell and finds the columns which have a value 5 in any of the cells.

Snippet

df.loc[: , (df == 5).any()]

Since the two columns, No_Of_Units and Available_Quantity have value 5 in them, it is displayed as follows.

You can also assign it to a dataframe if you want this subset to be saved for later use.

DataFrame Will Look Like

No_Of_UnitsAvailable_Quantity
055
156
21010
320Not Available
420Not Available
58NaT
6NaTNaT

This is how you can select columns based on a condition.

Select Multiple Columns

You can select multiple columns by labels by passing the column labels as a list to the dataframe object.

Use the below snippet to select multiple columns from the dataframe.

Snippet

df[['No_Of_Units','Available_Quantity']]

You’ll see the columns No_Of_Units, Available_Quantity selected as subsets as shown below.

DataFrame Will Look Like

No_Of_UnitsAvailable_Quantity
055
156
21010
320Not Available
420Not Available
58NaT
6NaTNaT

This is how you can get multiple columns from the dataframe as a subset.

Select Range of Columns Using Name

You can select a range of columns using the name by passing the columns range separated by : in the loc attribute.

Use the below snippet to select columns from No_Of_Units to Available_Since_Date.

Snippet

df.loc[: , 'No_Of_Units':'Available_Since_Date']

There are three columns in the specified range. Hence, those three columns will be selected as a subset from the dataframe.

DataFrame Will Look Like

No_Of_UnitsAvailable_QuantityAvailable_Since_Date
05511/5/2021
1564/23/2021
2101008/21/2021
320Not Available09/18/2021
420Not Available09/18/2021
58NaT01/05/2021
6NaTNaTNaT

This is how you can get a range of columns using names.

Select Range of Columns Using Index

You can select a range of columns using the index by passing the index range separated by : in the iloc attribute.

Use the below snippet to select columns from 2 to 4. The beginning index is inclusive and the end index is exclusive. Hence, you’ll see the columns at the index 2 and 3.

Snippet

df.iloc[: , 2:4]

You’ll see the columns at the index position 2 and 3 as shown below.

DataFrame Will Look Like

No_Of_UnitsAvailable_Quantity
055
156
21010
320Not Available
420Not Available
58NaT
6NaTNaT

This is how you can get a range of columns using the index.

Select First Column

You can select the first column from the dataframe using df.iloc[:,0].

Use the below snippet to select the first column from the dataframe.

  • : – Denotes all rows that must be selected
  • 0 – Denotes only the first column must be selected.

Snippet

df.iloc[:,0]

You’ll see the values of the first column printed as shown below.

Output

    0    Keyboard
    1       Mouse
    2     Monitor
    3         CPU
    4         CPU
    5    Speakers
    6     Headset
    Name: Product_Name, dtype: object

This is how you can select the first column from the pandas dataframe.

Select Last Column

You can select the last column from the dataframe using df.iloc[:,-1:].

Use the below snippet to select the first column from the dataframe.

  • : – Denotes all rows that must be selected
  • -1: – Denotes only the last column must be selected.

Snippet

df.iloc[:,-1:]

You’ll see the last column displayed as a dataframe as shown below.

DataFrame Will Look Like

Remarks
0NaT
1NaT
2NaT
3NaT
4NaT
5NaT
6NaT

This is how you can get the last column of the dataframe.

Select Columns With Missing Values

You can select columns with missing values by using the df.isnull().any() in the df.columns[] attribute.

This will return the list of columns which has a missing value in it to df[]. Then, it’ll return the columns with the missing value.

You can use it to print columns with missing values or impute values for the missing values.

Snippet

df[df.columns[df.isnull().any()]]

In the sample dataframe, all the columns except Product_Name is having missing values. Hence, all those columns will be printed as shown below.

DataFrame Will Look Like

Unit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_DateBuffer Column
0500.0005511/5/2021NaT
1200.000564/23/2021NaT
25000.235101008/21/2021NaT
310000.55020Not Available09/18/2021NaT
410000.55020Not Available09/18/2021NaT
5250.5008NaT01/05/2021NaT
6NaNNaTNaTNaTNaT

This is how you can get columns with missing values to impute new values to them.

Select Columns Without Missing Values

You can select columns without missing values using the ~df.isnull().any() in the df.columns attribute.

In this way, you can select all columns without NaN or missing values.

The tilde ~ symbol inverses the flag which is returned after checking if any of the values is null.

If any of the value is null, df.isnull().any() returns True. The tilde symbol inverses this and makes it False. Hence columns with missing values will not be added.

Snippet

df[df.columns[~df.isnull().any()]]

The Product_Name is the only column that doesn’t have any missing value of null value. Hence that column will be displayed as shown below.

DataFrame Will Look Like

Product_Name
0Keyboard
1Mouse
2Monitor
3CPU
4CPU
5Speakers
6Headset

This is how you can get columns without any missing values.

Select Empty Columns

You can select empty columns by using df.isnull().all() in the df.columns attribute.

isnull().all() checks if all the cells of the column is null. If yes, then it returns True.

Use the below snippet to get the empty columns.

Snippet

df[df.columns[df.isnull().all()]]

Remarks is the only column without values for any of the cells in it. Hence, it’ll be printed as below.

DataFrame Will Look Like

Remarks
0NaT
1NaT
2NaT
3NaT
4NaT
5NaT
6NaT

This is how you can get the empty columns.

Conclusion

To summarize, you’ve learned how to select columns from pandas dataframe using loc, iloc or df.columns.

You’ve learned how this can be used to select various subsets of columns from the dataframe such as selecting the first column, selecting last columns, selecting columns by name or index, and so on.

If you have any questions, comment below.

You May Also Like

Leave a Comment