How to Select Rows from Pandas Dataframe Based on Condition – Definitive Guide

Pandas allow you to store data as rows and columns.

You can select rows from Pandas dataframe based on conditions using df.loc[df[‘No_Of_Units’] == 5] statement.

Basic Example

df.loc[df['No_Of_Units'] == 5]
  • The above code selects the rows from the dataframe where the column No_Of_Units equals Five.

In this tutorial, you’ll learn how to select rows from Pandas Dataframe based on conditions using loc, isin() and query() methods.

Sample Dataframe

This is the sample dataframe used throughout the tutorial.

It contains,

  • Rows with values for all columns
  • Rows with Empty or Missing Data for each column
  • Rows with Empty or Missing data for all columns
  • One Duplicate row
  • One column in the sample dataframe is of the float type.

Code

import pandas as pd

data = {

"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers",pd.NaT],

        "Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],

        "No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],

        "Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],

        "Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT]

       }


df = pd.DataFrame(data)

df = df.astype({"Unit_Price": float})

df

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
0Keyboard500.0005511/5/2021
1Mouse200.000564/23/2021
2Monitor5000.235101008/21/2021
3CPU10000.55020Not Available09/18/2021
4CPU10000.55020Not Available09/18/2021
5Speakers250.5008NaT01/05/2021
6NaTNaNNaTNaTNaT

You’ll use this dataframe to select rows based on different conditions.

Select Rows Based on Condition Using Loc

The loc attribute selects a subset of rows based on the row and column labels.

  • Pass the conditions to return the row labels/indexes to select rows based on the condition

Code

The code below demonstrates how to select rows from the dataframe where the column no_of_units equals five.

df.loc[df['No_Of_Units'] == 5]

The rows with the column value 5 will be retrieved and displayed.

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
0Keyboard500.05511/5/2021
1Mouse200.0564/23/2021

This is how you can use loc to select rows based on a single condition.

Select Rows Based on Multiple Conditions

In this section, you’ll select rows based on multiple conditions.

  • Use the & operator to combine multiple conditions

Code

The following code demonstrates how to select rows based on multiple conditions where No_Of_Units >=8 and <=15.

df.loc[(df['No_Of_Units'] >= 8) & (df['No_Of_Units'] <= 15)]

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
2Monitor5000.235101008/21/2021
5Speakers250.5008NaT01/05/2021

This is how you can select rows based on multiple conditions.

Select Rows Based on Condition using Query Method

The query() method queries the dataframe with a boolean expression.

  • Pass the condition to the query() method
  • It checks each row to see if the expression is evaluated to True.
  • If yes, it selects that row. Else, it ignores the row.

It also accepts another parameter, inplace.

  • inplace = True – modifies the data in the same dataframe
  • inplace = False – Returns a new dataframe object based on the query condition

Be cautious when using inplace = True because the data will permanently be modified in the dataframe.

Code

The code below demonstrates how to select rows that have Unit_Price>1000. Also, a new dataframe will be created based on the result.

df2 = df.query('Unit_Price>1000', inplace=False)

df2

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
2Monitor5000.235101008/21/2021
3CPU10000.55020Not Available09/18/2021
4CPU10000.55020Not Available09/18/2021

The source dataframe df looks intact. There is no change to the data because of inplace=False.

Code

df

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
0Keyboard500.0005511/5/2021
1Mouse200.000564/23/2021
2Monitor5000.235101008/21/2021
3CPU10000.55020Not Available09/18/2021
4CPU10000.55020Not Available09/18/2021
5Speakers250.5008NaT01/05/2021
6NaTNaNNaTNaTNaT

This is how you can use the query() method to select data from the dataframe.

Using Loc and isin()

In this section, you’ll select rows from the dataframe with values in a list.

  • Use the loc attribute and the isin() method
  • Pass the list of values to the isin() method
  • Rows with these list of values will be selected

Code

The below code demonstrates how to select rows from the dataframe where the No_Of_Units have the value of 5 or 10.

df.loc[df['No_Of_Units'].isin([5,10])]

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
0Keyboard500.0005511/5/2021
1Mouse200.000564/23/2021
2Monitor5000.235101008/21/2021

This is how you can select rows with a list of values.

Select Rows where column Value Starts with

In this section, you’ll learn how to select rows where a column value starts with a specific String.

  • Use the column name and str.startswith() method to evaluate the value if it starts with a specific string.
  • Pass the value for na=False. This is to ignore the rows that don’t start with the specific String.

The startswith() comparison is a case sensitive comparison.

Code

The code below demonstrates how to select rows with the product_name starting with M.

df[df.product_name.str.startswith('M', na=False)]

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
1Mouse200.000564/23/2021
2Monitor5000.235101008/21/2021

This is how you can select rows that start with a specific value.

Select Rows Containing String

In this section, you’ll learn how to select rows that contain a specific string.

  • Use the column name and str.contains() method to evaluate the value if it contains a specific string.
  • Pass the value na=False. This is to ignore the rows that don’t contain a specific string.

The contains() comparison is a case sensitive comparison.

Code

The code below demonstrates how to select rows with the product_name containing U.

df[df.product_name.str.contains('U', na=False)]

Dataframe Will Look Like

product_nameUnit_PriceNo_Of_UnitsAvailable_QuantityAvailable_Since_Date
3CPU10000.5520Not Available09/18/2021
4CPU10000.5520Not Available09/18/2021

This is how you can select rows that contain a specific string from the pandas dataframe.

Additional Resources

Leave a Comment