Pandas dataframe allows you to store data in rows and column format.
You can select rows from Pandas Dataframe based on Column values using df.loc[df[‘No_Of_Units’] == 5] statement.
In this tutorial, you’ll learn how to select rows from Pandas dataframe based on column values. Using single condition, multiple conditions or selecting rows with column values starting with a specific string.
If You’re in Hurry…
You can use the loc attribute of the dataframe to select rows based on the column value.
Code
The code below demonstrates how to select rows with the value 5 for the column No_Of_Units.
df.loc[df['No_Of_Units'] == 5]
If You Want to Understand Details, Read on…
You’ll see how to select rows based on column value in detail.
Table of Contents
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
It contains,
- Rows with values for all columns
- Rows with Empty or Missing Data for each column
- Rows with Empty or Missing data for all columns
- One Duplicate row
- One column in the sample dataframe is of the
float
type.
import pandas as pd
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers",pd.NaT],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
"No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],
"Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],
"Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT]
}
df = pd.DataFrame(data)
df = df.astype({"Unit_Price": float})
df
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
6 | NaT | NaN | NaT | NaT | NaT |
Select Rows Where Column value is equal to Specific Value
In this section, you’ll learn how to select rows where a column value is equal to a specific value using the loc attribute.
The loc attribute allows you to select a subset of rows based on the row or column labels.
The condition df['No_Of_Units'] == 5
creates a Mask
for each row with True
and False
values if the condition pass. Based on this mask, the loc
attribute will select the rows from the dataframe where the Mask
is True
.
Code
The code below demonstrates how to select rows from the dataframe with the value 5 for the column No_Of_Units.
df.loc[df['No_Of_Units'] == 5]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 6 | 4/23/2021 |
This is how you can select rows from pandas dataframe based on a single condition.
Select Rows based on Column value in a List
In this section, you’ll learn how to select rows where a column value is in a list of values using the isin() method and the loc
attribute.
The condition df['No_Of_Units'].isin([5,10])]
creates a Mask
for each row with True
and False
values where the column is 5
or 10
. Based on this mask, the loc
attribute will select the rows from the dataframe where the Mask
is True
.
Code
The code below demonstrates how to select rows from the dataframe where the column values are in the list [5,10]
.
df.loc[df['No_Of_Units'].isin([5,10])]
Dataframe Will Look like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
This is how you can select rows from Pandas dataframe based on values in the list of values.
Select Rows based on Multiple Conditions With OR
In this section, you’ll learn how to select rows based on Multiple conditions with OR
conditions.
The OR
operator is denoted by the single pipe |
symbol.
The rows that pass any one of the multiple conditions are masked to True
and selected by the loc
attribute.
Code
The code below demonstrates how to select rows from Pandas dataframe based on multiple conditions and the OR
operator.
df.loc[(df['No_Of_Units'] ==8) | (df['No_Of_Units'] ==10)]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
This is how you can select rows from pandas dataframe with multiple conditions and an OR
operator.
Select Rows based on Multiple Conditions With AND
In this section, you’ll learn how to select rows based on Multiple conditions with AND
conditions.
The AND
operator is denoted by the single &
symbol.
The rows that pass all the conditions specified are masked to True
and selected by the loc
attribute.
Code
The code below demonstrates how to select rows from Pandas dataframe based on multiple conditions and the AND
operator.
df.loc[(df['No_Of_Units'] >= 8) & (df['No_Of_Units'] <= 15)]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
This is how you can select rows from pandas dataframe with multiple conditions and an OR
operator.
Select Rows where column Value Starts with
In this section, you’ll learn how to select rows from the pandas dataframe where column values start with a specific string.
You can use the pandas series startswith() method to select rows where column values start with a specific value.
It also accepts another mandatory parameter, na
. You can assign a boolean value that needs to be used for elements that are not a proper String. For example, NaN
or None
type missing Values.
Code
The code below demonstrates selecting the rows with the product name starting with the capital M
.
df[df.product_name.str.startswith('M', na=False)]
na=False
denotes that elements that are not properString
must be ignored.
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
Conclusion
You’ve learned how to select rows from Pandas dataframe based on column values.
Also, you’ve learned how to use single conditions and multiple conditions and select rows with values based on a list of values.