Pandas dataframe allows you to store data in rows and column format.
You can select rows from Pandas Dataframe based on Column values using df.loc[df[‘No_Of_Units’] == 5] statement.
Basic Example
The code below demonstrates how to select rows with the value 5 for the column No_Of_Units.
df.loc[df['No_Of_Units'] == 5]
In this tutorial, you’ll learn how to select rows from the Pandas dataframe based on column values. Using single condition, multiple conditions or selecting rows with column values starting with a specific string.
Table of Contents
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
It contains,
- Rows with values for all columns
- Rows with Empty or Missing Data for each column
- Rows with Empty or Missing data for all columns
- One Duplicate row
- One column in the sample dataframe is of the
float
type.
import pandas as pd
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers",pd.NaT],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
"No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],
"Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],
"Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT]
}
df = pd.DataFrame(data)
df = df.astype({"Unit_Price": float})
df
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
6 | NaT | NaN | NaT | NaT | NaT |
Select Rows Where Column value is equal to the Specific Value
The loc attribute allows you to select a subset of rows based on the row or column labels.
- Condition
df['No_Of_Units'] == 5
creates aMask
for each row withTrue
andFalse
values if the condition pass - The
loc
attribute will select the rows from the dataframe where theMask
isTrue
.
Code
The code below demonstrates how to select rows from the dataframe with the value 5 for column No_Of_Units.
df.loc[df['No_Of_Units'] == 5]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 6 | 4/23/2021 |
This is how you can select rows from pandas dataframe based on a single condition.
Select Rows based on Column value in a List
In this section, you’ll learn how to select rows where a column value is in a list of values using the isin() method and the loc
attribute.
- Condition
df['No_Of_Units'].isin([5,10])]
creates aMask
for each row withTrue
where the column is5
or10
andFalse
for other values. - The
loc
attribute will select the rows from the dataframe where theMask
isTrue
.
Code
The code below demonstrates how to select rows from the dataframe where the column values are in the list [5,10]
.
df.loc[df['No_Of_Units'].isin([5,10])]
Dataframe Will Look like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
This is how you can select rows from Pandas dataframe based on values in the list of values.
Select Rows based on Multiple Conditions With OR
To select rows based on Multiple conditions,
- Combine conditions using OR operator
- The
OR
operator is denoted by the single pipe|
symbol.
The rows that pass any one of the multiple conditions are masked to True
and selected by the loc
attribute.
Code
The code below demonstrates how to select rows from Pandas dataframe based on multiple conditions and the OR
operator.
df.loc[(df['No_Of_Units'] ==8) | (df['No_Of_Units'] ==10)]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
This is how you can select rows from pandas dataframe with multiple conditions and an OR
operator.
Select Rows based on Multiple Conditions With AND
To select rows based on Multiple conditions
- Combine multiple conditions using AND operator
- The
AND
operator is denoted by the single&
symbol.
The rows that pass all the conditions specified are masked to True
and selected by the loc
attribute.
Code
The code below demonstrates how to select rows from Pandas dataframe based on multiple conditions and the AND
operator.
df.loc[(df['No_Of_Units'] >= 8) & (df['No_Of_Units'] <= 15)]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
This is how you can select rows from pandas dataframe with multiple conditions and an OR
operator.
Select Rows where column Value Starts with
In this section, you’ll learn how to select rows from the pandas dataframe where column values start with a specific string.
- Use the pandas series startswith() method to select rows where column values start with a specific value.
It also accepts another mandatory parameter, na
.
- Assign a boolean value to be used for elements that are not proper strings
- For example,
NaN
orNone
type missing Values.
Code
The code below demonstrates selecting the rows with the product name starting with the capital M
.
df[df.product_name.str.startswith('M', na=False)]
na=False
denotes that elements that are not properString
must be ignored.
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |