Pandas allow you to store data as rows and columns.
You can select rows from Pandas dataframe based on conditions using df.loc[df[‘No_Of_Units’] == 5] statement.
Basic Example
df.loc[df['No_Of_Units'] == 5]
- The above code selects the rows from the dataframe where the column No_Of_Units equals Five.
In this tutorial, you’ll learn how to select rows from Pandas Dataframe based on conditions using loc
, isin()
and query()
methods.
Table of Contents
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
It contains,
- Rows with values for all columns
- Rows with Empty or Missing Data for each column
- Rows with Empty or Missing data for all columns
- One Duplicate row
- One column in the sample dataframe is of the
float
type.
Code
import pandas as pd
data = {
"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers",pd.NaT],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
"No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],
"Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],
"Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT]
}
df = pd.DataFrame(data)
df = df.astype({"Unit_Price": float})
df
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
6 | NaT | NaN | NaT | NaT | NaT |
You’ll use this dataframe to select rows based on different conditions.
Select Rows Based on Condition Using Loc
The loc
attribute selects a subset of rows based on the row and column labels.
- Pass the conditions to return the row labels/indexes to select rows based on the condition
Code
The code below demonstrates how to select rows from the dataframe where the column no_of_units equals five.
df.loc[df['No_Of_Units'] == 5]
The rows with the column value 5 will be retrieved and displayed.
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 6 | 4/23/2021 |
This is how you can use loc
to select rows based on a single condition.
Select Rows Based on Multiple Conditions
In this section, you’ll select rows based on multiple conditions.
- Use the
&
operator to combine multiple conditions
Code
The following code demonstrates how to select rows based on multiple conditions where No_Of_Units >=8
and <=15
.
df.loc[(df['No_Of_Units'] >= 8) & (df['No_Of_Units'] <= 15)]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
This is how you can select rows based on multiple conditions.
Select Rows Based on Condition using Query Method
The query()
method queries the dataframe with a boolean expression.
- Pass the condition to the query() method
- It checks each row to see if the expression is evaluated to
True
. - If yes, it selects that row. Else, it ignores the row.
It also accepts another parameter, inplace
.
inplace = True
– modifies the data in the same dataframeinplace = False
– Returns a new dataframe object based on the query condition
Be cautious when using inplace = True
because the data will permanently be modified in the dataframe.
Code
The code below demonstrates how to select rows that have Unit_Price>1000
. Also, a new dataframe will be created based on the result.
df2 = df.query('Unit_Price>1000', inplace=False)
df2
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
The source dataframe df
looks intact. There is no change to the data because of inplace=False
.
Code
df
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
6 | NaT | NaN | NaT | NaT | NaT |
This is how you can use the query()
method to select data from the dataframe.
Using Loc and isin()
In this section, you’ll select rows from the dataframe with values in a list.
- Use the
loc
attribute and the isin() method - Pass the list of values to the
isin()
method - Rows with these list of values will be selected
Code
The below code demonstrates how to select rows from the dataframe where the No_Of_Units have the value of 5
or 10
.
df.loc[df['No_Of_Units'].isin([5,10])]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
This is how you can select rows with a list of values.
Select Rows where column Value Starts with
In this section, you’ll learn how to select rows where a column value starts with a specific String
.
- Use the column name and str.startswith() method to evaluate the value if it starts with a specific string.
- Pass the value for
na=False
. This is to ignore the rows that don’t start with the specific String.
The startswith()
comparison is a case sensitive comparison.
Code
The code below demonstrates how to select rows with the product_name starting with M
.
df[df.product_name.str.startswith('M', na=False)]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
This is how you can select rows that start with a specific value.
Select Rows Containing String
In this section, you’ll learn how to select rows that contain a specific string.
- Use the column name and str.contains() method to evaluate the value if it contains a specific string.
- Pass the value
na=False
. This is to ignore the rows that don’t contain a specific string.
The contains()
comparison is a case sensitive comparison.
Code
The code below demonstrates how to select rows with the product_name containing U
.
df[df.product_name.str.contains('U', na=False)]
Dataframe Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
3 | CPU | 10000.55 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.55 | 20 | Not Available | 09/18/2021 |
This is how you can select rows that contain a specific string from the pandas dataframe.