How to Select Rows from Pandas Dataframe Based on Condition – Definitive Guide

Pandas allow you to store data as rows and columns.

You can select rows from Pandas dataframe based on conditions using df.loc[df[‘No_Of_Units’] == 5] statement.

In this tutorial, you’ll learn how to select rows from Pandas Dataframe based on conditions using loc, isin() and query() methods.

If You’re in Hurry…

df.loc[df['No_Of_Units'] == 5]

The above code selects the rows from the dataframe where the column No_Of_Units equals Five.

If You Want to Understand Details, Read on…

You’ll learn in detail about how to select rows from Pandas dataframe based on conditions using loc and other methods.

To know the different methods to select rows from the dataframe without conditions, read How To Select Rows From Pandas Dataframe.

Sample Dataframe

This is the sample dataframe used throughout the tutorial.

It contains,

  • Rows with values for all columns
  • Rows with Empty or Missing Data for each column
  • Rows with Empty or Missing data for all columns
  • One Duplicate row
  • One column in the sample dataframe is of the float type.

Code

import pandas as pd

data = {

"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers",pd.NaT],

        "Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],

        "No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],

        "Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],

        "Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT]

       }


df = pd.DataFrame(data)

df = df.astype({"Unit_Price": float})

df

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
0Keyboard500.0005511/5/2021
1Mouse200.000564/23/2021
2Monitor5000.235101008/21/2021
3CPU10000.55020Not Available09/18/2021
4CPU10000.55020Not Available09/18/2021
5Speakers250.5008NaT01/05/2021
6NaTNaNNaTNaTNaT

You’ll use this dataframe to select rows based on different conditions.

Select Rows Based on Condition Using Loc

In this section, you’ll use the loc attribute to select rows based on condition.

The loc attribute allows you to select a subset of rows based on the row and column labels.

You can pass the conditions to return the row labels/indexes to select rows based on condition.

Code

The code below demonstrates how to select rows from the dataframe where the column no_of_units equals five.

df.loc[df['No_Of_Units'] == 5]

The rows with the column value 5 will be retrieved and displayed.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
0Keyboard500.05511/5/2021
1Mouse200.0564/23/2021

This is how you can use loc to select rows based on a single condition.

Select Rows Based on Multiple Conditions

In this section, you’ll select rows based on multiple conditions.

You can use the & operator to combine multiple conditions.

Code

The below code demonstrates how to select rows based on multiple conditions where No_Of_Units >=8 and <=15.

df.loc[(df['No_Of_Units'] >= 8) & (df['No_Of_Units'] <= 15)]

The rows with column values No_Of_Units >=8 and <=15 will be retrieved and displayed.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
2Monitor5000.235101008/21/2021
5Speakers250.5008NaT01/05/2021

This is how you can select rows based on multiple conditions.

Select Rows Based on Condition using Query Method

In this section, you’ll use the query() method to select rows based on condition.

The query() method queries the dataframe with a boolean expression.

When you pass a condition, it checks each row if the expression is evaluated as True. If yes, then it selects that row. Else it ignores that Rows.

It also accepts another parameter, inplace.

  • inplace = True – modifies the data in the same dataframe
  • inplace = False – Returns a new dataframe object based on the query condition

Be cautious when using inplace = True because the data will permanently be modified in the dataframe.

Code

The code below demonstrates how to select rows that have Unit_Price>1000. Also, a new dataframe will be created based on the result.

df2 = df.query('Unit_Price>1000', inplace=False)

df2

The rows with the unit_price greater than 1000 will be retrieved and assigned to the new dataframe df2.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
2Monitor5000.235101008/21/2021
3CPU10000.55020Not Available09/18/2021
4CPU10000.55020Not Available09/18/2021

The source dataframe df looks intact. There is no change to the data because of inplace=False.

Code

df

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
0Keyboard500.0005511/5/2021
1Mouse200.000564/23/2021
2Monitor5000.235101008/21/2021
3CPU10000.55020Not Available09/18/2021
4CPU10000.55020Not Available09/18/2021
5Speakers250.5008NaT01/05/2021
6NaTNaNNaTNaTNaT

This is how you can use the query() method to select data from the dataframe.

Using Loc and isin()

In this section, you’ll select rows from the dataframe with values in a list.

You can use the loc attribute and the isin() method to select rows from the dataframe based on values in a list.

You can pass the list of values to the isin() method.

Code

The below code demonstrates how to select rows from the dataframe where the No_Of_Units have the value of 5 or 10.

df.loc[df['No_Of_Units'].isin([5,10])]

The rows with the value for column No_Of_Units as 5 or 10 are retrieved and displayed.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
0Keyboard500.0005511/5/2021
1Mouse200.000564/23/2021
2Monitor5000.235101008/21/2021

This is how you can select rows with a list of values.

Select Rows where column Value Starts with

In this section, you’ll learn how to select rows where a column value starts with a specific String.

You can use the column name and str.startswith() method to evaluate the value if it starts with a specific string.

Also, you need to pass the value for na=False. This is to ignore the rows that don’t start with the specific String.

The startswith() comparison is a case sensitive comparison.

Code

The code below demonstrates how to select rows with the product_name starting with M.

df[df.product_name.str.startswith('M', na=False)]

The rows with product_name starting with M are selected and displayed.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
1Mouse200.000564/23/2021
2Monitor5000.235101008/21/2021

This is how you can select rows that start with a specific value.

Select Rows Containing String

In this section, you’ll learn how to select rows that contain a specific string.

You can use the column name and str.contains() method to evaluate the value if it contains a specific string.

Also, you need to pass the value for na=False. This is to ignore the rows that don’t contain a specific string.

The contains() comparison is a case sensitive comparison.

Code

The code below demonstrates how to select rows with the product_name containing U.

df[df.product_name.str.contains('U', na=False)]

The rows that contain String ‘U’ in the product name column are selected and displayed.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
3CPU10000.5520Not Available09/18/2021
4CPU10000.5520Not Available09/18/2021

This is how you can select rows that contain a specific string from the pandas dataframe.

Conclusion

You’ve learned how to select rows from Pandas Dataframe based on conditions.

Also, you’ve learned how to select rows based on multiple conditions and how to select rows that have a starting value with a specific String.

You May Also Like

Leave a Comment