How to Get Column Names in Pandas Dataframe – Definitive Guide

A pandas dataframe is a two-dimensional data structure that stores data in rows and columns. Each column has a header (name) that identifies it.

You can get the column names of a pandas dataframe using the df.columns attribute.

Use case: This is useful when you want to show all the columns of a dataframe in the output console (e.g. in a Jupyter notebook).

In this tutorial, you’ll learn the different methods available to get column names from the pandas dataframe.

If You’re in a Hurry…

You can use the below code snippet to get column names from pandas dataframe.

Snippet

df.columns

You’ll see all the column names from the dataframe printed as Index. The index is an immutable sequence used for indexing.

Output

    Index(['product_name', 'Unit_Price', 'No_Of_Units', 'Available_Quantity',
           'Available_Since_Date'],
          dtype='object')

To get the column headers as a list, use the below snippet.

It converts the column labels to a NumPy array and then converts that array to a list using the tolist() method.

Snippet

df.columns.values.tolist()

You’ll see the column names printed as a list as shown below.

Output

    ['product_name',
     'Unit_Price',
     'No_Of_Units',
     'Available_Quantity',
     'Available_Since_Date']

This is how you can get the column headers of the pandas dataframe as a list.

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn the different methods available to get the pandas dataframe column headers for various purposes.

Sample Dataframe

This is the sample dataframe used throughout the tutorial.

import pandas as pd

# pd.NaT marks the missing values in the sample data
data = {
    "product_name": ["Keyboard", "Mouse", "Monitor", "CPU", "Speakers", pd.NaT],
    "Unit_Price": [500, 200, 5000, 10000, 250.50, 350],
    "No_Of_Units": [5, 5, 10, 20, 8, pd.NaT],
    "Available_Quantity": [5, 6, 10, "Not Available", pd.NaT, pd.NaT],
    "Available_Since_Date": ["11/5/2021", "4/23/2021", "08/21/2021", "09/18/2021", "01/05/2021", pd.NaT],
}

df = pd.DataFrame(data)

# Convert one column to float to demonstrate dtypes
df = df.astype({"Unit_Price": float})

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                   6             4/23/2021
    2       Monitor      5000.0           10                  10            08/21/2021
    3           CPU     10000.0           20       Not Available            09/18/2021
    4      Speakers       250.5            8                 NaT            01/05/2021
    5           NaT       350.0          NaT                 NaT                   NaT

Now, let’s see how to get the column headers.

Pandas Get Column Names

In this section, you’ll see how to get column names using different methods.

Using Columns

The columns attribute of the dataframe returns the column labels as an Index object.

Snippet

df.columns

Output

    Index(['product_name', 'Unit_Price', 'No_Of_Units', 'Available_Quantity',
           'Available_Since_Date'],
          dtype='object')

Get Column Names as Array

You can get the column names as an array by using the .columns.values property of the dataframe.

Snippet

df.columns.values

You’ll see the column headers returned as a NumPy array.

Output

    array(['product_name', 'Unit_Price', 'No_Of_Units', 'Available_Quantity',
           'Available_Since_Date'], dtype=object)

This is how you can get all the column headers from the pandas dataframe.

Next, you’ll learn how to get a list from dataframe column headers.

Pandas Get List From Dataframe Columns Headers

You can get column names as a list by using the .columns.values property of the dataframe and converting it to a list using the tolist() method as shown below.

Snippet

df.columns.values.tolist()

You’ll see the column headers returned as a list.

Output

    ['product_name',
     'Unit_Price',
     'No_Of_Units',
     'Available_Quantity',
     'Available_Since_Date']

Another way to get column headers as a list is by using the built-in list() function.

You can pass the dataframe object to list(). Iterating over a dataframe yields its column labels, so it returns the column headers as a list.

Snippet

columns_list = list(df)

columns_list

You’ll see the column headers displayed as a list.

Output

    ['product_name',
     'Unit_Price',
     'No_Of_Units',
     'Available_Quantity',
     'Available_Since_Date']

This is how you can get pandas column names as a list.
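
If you'd rather skip the intermediate array, the Index object itself also has a tolist() method, and list(df.columns) works as well. The sketch below uses a trimmed-down copy of the sample dataframe (an assumption made only to keep the example short) and checks that all four approaches produce the same list.

```python
import pandas as pd

# Trimmed-down copy of the sample dataframe (illustrative only)
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": [500.0, 200.0],
    "No_Of_Units": [5, 5],
})

via_values = df.columns.values.tolist()  # Index -> NumPy array -> list
via_index = df.columns.tolist()          # Index -> list directly
via_list_df = list(df)                   # iterating a dataframe yields its column labels
via_list_columns = list(df.columns)      # iterating the Index yields the same labels

print(via_index)  # ['product_name', 'Unit_Price', 'No_Of_Units']
```

Which one you pick is mostly a matter of taste, since they all return a plain Python list.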

Next, you’ll learn how to get column names and types.

Pandas List Column Names and Types

In this section, you’ll learn how to list column names and types of each column of the dataframe.

You can do this by using the dtypes attribute. It returns a Series with the data type of each column in the dataframe.

Snippet

df.dtypes

You’ll see the column name and the data type of each column printed as a Series.

Output

    product_name             object
    Unit_Price              float64
    No_Of_Units              object
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object
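
If you need the names and types as plain Python objects rather than a printed Series, one option is to loop over df.dtypes.items(). The sketch below assumes a trimmed-down copy of the sample dataframe, and the variable name names_and_types is only illustrative.

```python
import pandas as pd

# Trimmed-down copy of the sample dataframe (illustrative only)
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": [500.0, 200.0],
    "No_Of_Units": [5, 5],
})

# Series.items() yields (column name, dtype) pairs
names_and_types = {name: str(dtype) for name, dtype in df.dtypes.items()}

for name, dtype_name in names_and_types.items():
    print(f"{name}: {dtype_name}")
```

The dictionary keeps the columns in their original order, so it can also be used to look up the type of a single column by name.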

Next, you’ll learn how to get a list from dataframe columns based on datatype.

Pandas Get List From Dataframe Columns Headers based on Data Type

In this section, you’ll learn how to get a list from dataframe column headers based on the data type of the column.

For example, this can be used when you want to identify all the numeric columns available in the dataframe.

You can do this by using the select_dtypes() method available in the dataframe. It returns a subset of the dataframe containing only the columns of the given data types. Then you can use the columns attribute on the subset to get the column names.

You can pass Python’s built-in types, or data types available in packages such as pandas or NumPy.

Snippet

list(df.select_dtypes(['float64']).columns)

where,

  • df.select_dtypes – Invokes the select_dtypes() method of the dataframe to select columns of specific data types
  • ['float64'] – Data type of the columns to be selected
  • .columns – Gets the headers of the columns selected by select_dtypes(). This value is passed to list() to get the column names as a list.

In the sample dataframe, only the Unit_Price column is a float column. Hence only this column will be displayed.

Output

    ['Unit_Price']

This is how you can get column headers based on data types.
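
To pick up every numeric column regardless of whether it is an integer or a float, select_dtypes() also accepts the generic "number" selector, and its exclude parameter gives you the opposite set. The following is a minimal sketch on a trimmed-down copy of the sample dataframe (an assumption to keep it short).

```python
import pandas as pd

# Trimmed-down copy of the sample dataframe (illustrative only)
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": [500.0, 200.0],
    "No_Of_Units": [5, 5],
})

# "number" matches all integer and float columns
numeric_columns = df.select_dtypes(include="number").columns.tolist()

# exclude returns every column that is NOT of the given type
non_numeric_columns = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_columns)      # ['Unit_Price', 'No_Of_Units']
print(non_numeric_columns)  # ['product_name']
```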

Next, you’ll learn how to get column names by using the index.

Pandas Get Column Names by Index

In this section, you’ll learn how to get a column name by using its index (position).

This can be useful when you want to know which column exists at a specific position.

You can get the name at a specific position by indexing the columns attribute of the dataframe as shown below.

The index is 0-based. Hence, if you use 2, you’ll get the column from the third position.

Snippet

df.columns[2]

You’ll see the column header at the third position.

Output

    'No_Of_Units'

This is how you can get a single column header using the index.
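
The reverse lookup, from a column name to its position, can be done with the get_loc() method of the Index. The sketch below builds an empty dataframe with the sample column names (an assumption, since only the headers matter here) and also shows that negative positions count from the end.

```python
import pandas as pd

# Empty dataframe with the same headers as the sample dataframe
df = pd.DataFrame(columns=[
    "product_name", "Unit_Price", "No_Of_Units",
    "Available_Quantity", "Available_Since_Date",
])

position = df.columns.get_loc("No_Of_Units")  # name -> 0-based position
name = df.columns[position]                   # position -> name
last_name = df.columns[-1]                    # negative positions count from the end

print(position)   # 2
print(name)       # No_Of_Units
print(last_name)  # Available_Since_Date
```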

Next, you’ll learn how to get multiple column names by index.

Pandas Get Multiple Column Names by Index

In this section, you’ll learn how to get multiple column names at once by passing a list of positions.

Note that this is different from the pandas MultiIndex class, which represents hierarchical (multi-level) labels.

The selected column headers are returned as an Index, an immutable sequence used for indexing.

As said before, positions are 0-based. Hence, if you pass [1, 2], you’ll get the columns from the second and third positions.

Snippet

df.columns[[1,2]]

You’ll see the column headers at the second and third positions.

Output

    Index(['Unit_Price', 'No_Of_Units'], dtype='object')

This is how you can get multiple column headers using the index.
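
When the positions you want are contiguous, a slice is a shorter alternative to a list of positions. The sketch below assumes an empty dataframe with the sample column names, since only the headers matter here.

```python
import pandas as pd

# Empty dataframe with the same headers as the sample dataframe
df = pd.DataFrame(columns=[
    "product_name", "Unit_Price", "No_Of_Units",
    "Available_Quantity", "Available_Since_Date",
])

by_slice = df.columns[1:3].tolist()            # positions 1 and 2 (the end is exclusive)
first_and_last = df.columns[[0, -1]].tolist()  # a list of positions may include negatives

print(by_slice)        # ['Unit_Price', 'No_Of_Units']
print(first_and_last)  # ['product_name', 'Available_Since_Date']
```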

Next, you’ll learn to get columns starting with a specific string.

Pandas Get Column Names Starting With

In this section, you’ll learn how to get column names starting with a specific string.

You can use the startswith() method available through the .str accessor of df.columns.

df.columns.str.startswith('A') returns a boolean array that is True for each column name starting with A. The match is case-sensitive. df.loc[:, mask] then selects those columns, and the columns attribute of the result gives their names.

Snippet

df.loc[:, df.columns.str.startswith('A')].columns

All the columns starting with A will be displayed as an index.

Output

    Index(['Available_Quantity', 'Available_Since_Date'], dtype='object')

This is how you can get column names starting with a specific string.
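
Since only the names are needed, you can also apply the boolean mask to df.columns directly instead of going through df.loc. The same .str accessor offers endswith() and contains() for other kinds of matches. The following is a sketch on an empty dataframe with the sample column names (an assumption, as the cell values play no role).

```python
import pandas as pd

# Empty dataframe with the same headers as the sample dataframe
df = pd.DataFrame(columns=[
    "product_name", "Unit_Price", "No_Of_Units",
    "Available_Quantity", "Available_Since_Date",
])

# Apply each boolean mask to the Index itself; all matches are case-sensitive
starts_with_a = df.columns[df.columns.str.startswith("A")].tolist()
ends_with_date = df.columns[df.columns.str.endswith("_Date")].tolist()
contains_unit = df.columns[df.columns.str.contains("Unit")].tolist()

print(starts_with_a)   # ['Available_Quantity', 'Available_Since_Date']
print(ends_with_date)  # ['Available_Since_Date']
print(contains_unit)   # ['Unit_Price', 'No_Of_Units']
```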

Next, you’ll learn how to get column names based on conditions.

Pandas Get Column Names Based on Condition

In this section, you’ll learn how to get column names based on conditions.

This can be useful when you want to identify columns that contain specific values. It is also known as getting column names by value.

For example, if you need to get column names which have the value 5 in any cell, then you can use the below example.

Snippet

df.columns[
    (df == 5)        # boolean dataframe: True where a cell equals 5
    .any(axis=0)     # reduce each column to True if any of its cells matched
]

In the sample dataframe, the columns No_Of_Units and Available_Quantity contain the value 5. Hence, you’ll see the two columns printed as an Index.

Output

    Index(['No_Of_Units', 'Available_Quantity'], dtype='object')

This is how you can get column names based on value.
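
The same masking pattern works for other conditions, such as a comparison instead of an equality check. The sketch below assumes a small all-numeric dataframe loosely based on the sample data, because ordering comparisons like > are not meaningful for mixed text and number columns.

```python
import pandas as pd

# Small all-numeric dataframe based on the sample data (illustrative only)
df = pd.DataFrame({
    "Unit_Price": [500.0, 200.0, 5000.0],
    "No_Of_Units": [5, 5, 10],
    "Available_Quantity": [5, 6, 10],
})

# Columns where at least one cell equals 5
has_five = df.columns[(df == 5).any(axis=0)].tolist()

# Columns where at least one cell is greater than 1000
over_1000 = df.columns[(df > 1000).any(axis=0)].tolist()

print(has_five)   # ['No_Of_Units', 'Available_Quantity']
print(over_1000)  # ['Unit_Price']
```

Replacing any() with all() would instead return the columns where every cell satisfies the condition.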

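Next, you’ll learn about the column names with NaN.

Pandas Get Column Names With NaN

In this section, you’ll learn how to get column names with NaN.

NaN is a value used to denote missing data. The sample dataframe uses pd.NaT for its missing values, which isna() and isnull() also treat as missing.

You can identify the columns with missing data using the isna() method or the isnull() method. isnull() is an alias of isna(), so both return the same result.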

Snippet for isna()

df.isna().any()

Output

    product_name             True
    Unit_Price              False
    No_Of_Units              True
    Available_Quantity       True
    Available_Since_Date     True
    dtype: bool

Snippet for isnull()

df.isnull().any()

Output

    product_name             True
    Unit_Price              False
    No_Of_Units              True
    Available_Quantity       True
    Available_Since_Date     True
    dtype: bool

This is how you can identify column headers with missing values.
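
The snippets above return a True/False flag per column. To get only the names of the columns that contain missing values, you can use that boolean Series as a mask on df.columns. The following is a minimal sketch on a trimmed-down copy of the sample dataframe (an assumption to keep it short).

```python
import pandas as pd

# Trimmed-down copy of the sample dataframe (illustrative only)
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse", pd.NaT],
    "Unit_Price": [500.0, 200.0, 350.0],
    "No_Of_Units": [5, 5, pd.NaT],
})

has_missing = df.isna().any()                       # one True/False flag per column
columns_with_nan = df.columns[has_missing].tolist()

print(columns_with_nan)  # ['product_name', 'No_Of_Units']
```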

Next, let’s discuss the columns with the duplicate values.

Pandas Get Column Names with Duplicate Values

In this section, you’ll learn how to get column names with duplicate values. This can be useful when you want to identify the columns which have duplicates.

You can do this by applying the duplicated() method to each column.

The lambda function returns True if any value in the column is duplicated, and False otherwise. Repeated missing values count as duplicates too. Passing axis=0 applies the function to each column.

Snippet

df.apply(lambda x: x.duplicated().any(), axis=0)

Output

    product_name            False
    Unit_Price              False
    No_Of_Units              True
    Available_Quantity       True
    Available_Since_Date    False
    dtype: bool

This is how you can get the column headers that contain duplicated values.
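
As with missing values, the result is a boolean Series, so you can use it as a mask on df.columns to get just the names. The sketch below assumes a trimmed-down copy of the sample dataframe and also shows an equivalent list comprehension.

```python
import pandas as pd

# Trimmed-down copy of the sample dataframe (illustrative only)
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse", "Monitor"],
    "Unit_Price": [500.0, 200.0, 5000.0],
    "No_Of_Units": [5, 5, 10],
})

# One True/False flag per column
has_duplicates = df.apply(lambda column: column.duplicated().any(), axis=0)
columns_with_duplicates = df.columns[has_duplicates].tolist()

# Equivalent list comprehension without apply()
same_result = [name for name in df.columns if df[name].duplicated().any()]

print(columns_with_duplicates)  # ['No_Of_Units']
```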

Next, you’ll learn how to get column names in a sorted way.

Pandas Get Column Names Sorted

In this section, you’ll learn how to get column names sorted alphabetically.

You can do this by using the built-in sorted() function.

sorted() sorts the items of any iterable passed to it. Iterating over a dataframe yields its column headers, so sorted(df) returns them as an alphabetically sorted list. The comparison is case-sensitive, so uppercase names sort before lowercase ones.

Snippet

sorted(df)

The dataframe column headers are sorted in an alphabetical way and listed as below.

Output

    ['Available_Quantity',
     'Available_Since_Date',
     'No_Of_Units',
     'Unit_Price',
     'product_name']

This is how you can get column headers in an alphabetical way.
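
Because the default comparison is case-sensitive, product_name ends up after every capitalized name. If you want an ordering that ignores case, one option is to pass key=str.lower to sorted(). The sketch below assumes an empty dataframe with the sample column names, since only the headers matter here.

```python
import pandas as pd

# Empty dataframe with the same headers as the sample dataframe
df = pd.DataFrame(columns=[
    "product_name", "Unit_Price", "No_Of_Units",
    "Available_Quantity", "Available_Since_Date",
])

default_order = sorted(df)                             # uppercase letters sort before lowercase
case_insensitive = sorted(df.columns, key=str.lower)   # compare lowercased names instead
descending = sorted(df, reverse=True)                  # reverse of the default order

print(case_insensitive)
# ['Available_Quantity', 'Available_Since_Date', 'No_Of_Units', 'product_name', 'Unit_Price']
```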

Conclusion

To summarize, you’ve learned how to get column names from a pandas dataframe in different scenarios: as a list, by index, based on a condition, and for columns that have duplicate or missing values.

This also answers how to show all columns of a dataframe in the output console.

If you have any questions, comment below.
