How To Filter Pandas Dataframe Using In And NOT IN like SQL – Definitive Guide

Pandas dataframe store data in a row and column format.

You can filter pandas dataframe using In and Not in like SQL using the df.isin() method.

This tutorial teaches you how to filter data using In and Not In, like SQL.

If you’re in Hurry

Use the following code to filter rows using In. To use NOT IN, prefix the tilde ~ symbol.

items_to_select=["Country", "India"]

df2=df[df.Country.isin(items_to_select)]

df2

If You Want to Understand Details, Read on…

The isin() method and the query() method allows you to filter data with IN and NOT IN like SQL.

Let us learn these methods in detail.

Creating Sample Dataframe

Create a sample dataframe to learn how to filter the records.

import pandas as pd 

# List of Tuples
users = [ ('Shivam', 'Pandey', 'India', 1),
             ('Kumar', 'Ram' , 'US', 2 ),
         ('Felix','John' , 'Germany', 3 ),
         ('Michael','John' , 'India', 1 ),
              ]

#Create a DataFrame object
df = pd.DataFrame(  users, 
                    columns = ['First Name' , 'Last Name', 'Country', 'Country_Code']
                    ) 

df

DataFrame Will Look Like

First NameLast NameCountryCountry_Code
0ShivamPandeyIndia1
1KumarRamUS2
2FelixJohnGermany3
3MichaelJohnIndia1

Filter Using ISIN (Similar to IN SQL)

To filter using the IN like SQL, use the isin() method.

  • Invoke the isin() method in the specific column with the list of items to select.
  • Pass the subset of the data to df[ ]. So only the rows with Truth values will be filtered.

Code

items_to_select=["Country", "India"]

df2=df[df.Country.isin(items_to_select)]

df2

DataFrame Will Look Like

First NameLast NameCountryCountry_Code
0ShivamPandeyIndia1
3MichaelJohnIndia1

Filter Using ~ISIN (Similar to NOT IN SQL)

To filter the rows using NOT IN, similar to SQL,

  • Prefix the ~ symbol and invoke the isin() method in the specific column with the list of items to select. It will inverse the result of the isin() method.
  • Pass the subset of the data to df[ ]. Only the rows with Truth values will be filtered.

Code

items_to_select=["Country", "India"]

df2=df[~df.Country.isin(items_to_select)]

df2

DataFrame Will Look Like

First NameLast NameCountryCountry_Code
1KumarRamUS2
2FelixJohnGermany3

Filter Using ISIN with Multiple Columns

To use multiple columns with isin() and filter rows,

  • Pass the columns as list in df[ ] to apply the isin() method for those columns
  • Pass the values for each column as a list to the isin() method
  • Use the any() method to select the matching rows

Code

df2 = df[df[['Country', 'Country_Code']].isin(['India',1]).any(axis=1)]

df2

DataFrame Will Look Like

First NameLast NameCountryCountry_Code
0ShivamPandeyIndia1
3MichaelJohnIndia1

Filter Using Query Method

The query() method is another method to filter rows from the pandas dataframe using an SQL-like query.

  • Create an SQL-like where clause query. For example, Country_Code in [1, 2]
  • Invoke the query method and pass the SQL expression to filter the rows based on the where clause

Code

df.query('Country_Code in [1, 2]')

DataFrame Will Look Like

First NameLast NameCountryCountry_Code
0ShivamPandeyIndia1
1KumarRamUS2
3MichaelJohnIndia1

Additional Resources

Leave a Comment