Large datasets might have more columns, and you may need to find columns starting with a specific name.
You can select column names starting with a particular string in the pandas dataframe using df[df.columns[pd.Series(df.columns).str.startswith(‘STR’)]] statement.
This tutorial teaches you how to select column names starting with a particular string in the Pandas dataframe and when it is appropriate to use each method.
If you’re in Hurry
Use the following code to select column names starting with a particular string.
df[df.columns[pd.Series(df.columns).str.startswith('Product')]]
If You Want to Understand Details, Read on…
Read on in detail about finding column names starting with a specific string.
Table of Contents
Sample Dataframe
Create a sample dataframe with different columns.
import pandas as pd
import numpy as np
data = {"Product_name":["Keyboard","Mouse", "Monitor", "CPU", "Speakers"],
"Product_Unit_Price":[500,200, 5000, np.nan, 250.50],
"No_Of_Units":[5,5, 10, np.nan, 8],
"Available_Quantity":[5,10,12,15, "Not Available"]
}
df = pd.DataFrame(data)
df
DataFrame Will Look like
Product_name | Product_Unit_Price | No_Of_Units | Available_Quantity | |
---|---|---|---|---|
0 | Keyboard | 500.0 | 5.0 | 5 |
1 | Mouse | 200.0 | 5.0 | 10 |
2 | Monitor | 5000.0 | 10.0 | 12 |
3 | CPU | NaN | NaN | 15 |
4 | Speakers | 250.5 | 8.0 | Not Available |
Using Columns and Startswith
To select column names starting with a specific string,
- Create a series with the columns starting with your desired string
- Pass it to the
df.columns[ ]
to select those specific columns - Pass the result to
df[ ]
to select subset of that specific columns
Whentouse: Use this method when you want to select column names starting with a particular string.
Code
df[df.columns[pd.Series(df.columns).str.startswith('Product')]]
DataFrame Will Look Like
Product_name | Product_Unit_Price | |
---|---|---|
0 | Keyboard | 500.0 |
1 | Mouse | 200.0 |
2 | Monitor | 5000.0 |
3 | CPU | NaN |
4 | Speakers | 250.5 |
Using Filter
The filter() method filters the dataframe based on the condition specified and returns the subset of the data.
- Use a regular expression to select column names based on a condition
- To select columns starting with a specific string, use str*, which means the name should start with
str
; after that, it can contain any String.
Whentouse: You can use this method when you want to select using a regular expression to select columns matching a specific condition.
Code
df.filter(regex='Product*')
DataFrame Will Look Like
Product_name | Product_Unit_Price | |
---|---|---|
0 | Keyboard | 500.0 |
1 | Mouse | 200.0 |
2 | Monitor | 5000.0 |
3 | CPU | NaN |
4 | Speakers | 250.5 |
Using the Like Keyword with Filter
You can also use the like
keyword to select using the SQL-like expressions.
df.filter(like='Product')
DataFrame Will Look Like
Product_name | Product_Unit_Price | |
---|---|---|
0 | Keyboard | 500.0 |
1 | Mouse | 200.0 |
2 | Monitor | 5000.0 |
3 | CPU | NaN |
4 | Speakers | 250.5 |
Using Loc and StartsWith
The loc attribute is used to select a subset of the dataframe using labels.
- Filter the column names using the
startswith()
method. - Specify a range of rows to be selected from those columns. You can also select all rows.
Whentouse: Use this method to select columns starting with a specific string and select a range of rows from it.
Code
df.loc[0:2, df.columns.str.startswith('Product')]
DataFrame Will Look Like
Product_name | Product_Unit_Price | |
---|---|---|
0 | Keyboard | 500.0 |
1 | Mouse | 200.0 |
2 | Monitor | 5000.0 |