How To Select Column Names Starting With a Particular String in Pandas Dataframe -Definitive Guide

Large datasets might have more columns, and you may need to find columns starting with a specific name.

You can select column names starting with a particular string in the pandas dataframe using df[df.columns[pd.Series(df.columns).str.startswith(‘STR’)]] statement.

This tutorial teaches you how to select column names starting with a particular string in the Pandas dataframe and when it is appropriate to use each method.

If you’re in Hurry

Use the following code to select column names starting with a particular string.

df[df.columns[pd.Series(df.columns).str.startswith('Product')]]

If You Want to Understand Details, Read on…

Read on in detail about finding column names starting with a specific string.

Sample Dataframe

Create a sample dataframe with different columns.

import pandas as pd

import numpy as np

data = {"Product_name":["Keyboard","Mouse", "Monitor", "CPU", "Speakers"],
        "Product_Unit_Price":[500,200, 5000, np.nan, 250.50],
        "No_Of_Units":[5,5, 10, np.nan, 8],
        "Available_Quantity":[5,10,12,15, "Not Available"]
       }

df = pd.DataFrame(data)

df

DataFrame Will Look like

Product_nameProduct_Unit_PriceNo_Of_UnitsAvailable_Quantity
0Keyboard500.05.05
1Mouse200.05.010
2Monitor5000.010.012
3CPUNaNNaN15
4Speakers250.58.0Not Available

Using Columns and Startswith

To select column names starting with a specific string,

  • Create a series with the columns starting with your desired string
  • Pass it to the df.columns[ ] to select those specific columns
  • Pass the result to df[ ] to select subset of that specific columns

Whentouse: Use this method when you want to select column names starting with a particular string.

Code

df[df.columns[pd.Series(df.columns).str.startswith('Product')]]

DataFrame Will Look Like

Product_nameProduct_Unit_Price
0Keyboard500.0
1Mouse200.0
2Monitor5000.0
3CPUNaN
4Speakers250.5

Using Filter

The filter() method filters the dataframe based on the condition specified and returns the subset of the data.

  • Use a regular expression to select column names based on a condition
  • To select columns starting with a specific string, use str*, which means the name should start with str; after that, it can contain any String.

Whentouse: You can use this method when you want to select using a regular expression to select columns matching a specific condition.

Code

df.filter(regex='Product*')

DataFrame Will Look Like

Product_nameProduct_Unit_Price
0Keyboard500.0
1Mouse200.0
2Monitor5000.0
3CPUNaN
4Speakers250.5

Using the Like Keyword with Filter

You can also use the like keyword to select using the SQL-like expressions.

df.filter(like='Product')

DataFrame Will Look Like

Product_nameProduct_Unit_Price
0Keyboard500.0
1Mouse200.0
2Monitor5000.0
3CPUNaN
4Speakers250.5

Using Loc and StartsWith

The loc attribute is used to select a subset of the dataframe using labels.

  • Filter the column names using the startswith() method.
  • Specify a range of rows to be selected from those columns. You can also select all rows.

Whentouse: Use this method to select columns starting with a specific string and select a range of rows from it.

Code

df.loc[0:2, df.columns.str.startswith('Product')]

DataFrame Will Look Like

Product_nameProduct_Unit_Price
0Keyboard500.0
1Mouse200.0
2Monitor5000.0

Additional Resources

Leave a Comment