Pandas dataframe store data in a row and column format.
You can filter pandas dataframe using In and Not in like SQL using the df.isin()
method.
This tutorial teaches you how to filter data using In and Not In, like SQL.
If you’re in Hurry
Use the following code to filter rows using In
. To use NOT IN
, prefix the tilde ~
symbol.
items_to_select=["Country", "India"]
df2=df[df.Country.isin(items_to_select)]
df2
If You Want to Understand Details, Read on…
The isin()
method and the query()
method allows you to filter data with IN
and NOT IN
like SQL.
Let us learn these methods in detail.
Creating Sample Dataframe
Create a sample dataframe to learn how to filter the records.
import pandas as pd
# List of Tuples
users = [ ('Shivam', 'Pandey', 'India', 1),
('Kumar', 'Ram' , 'US', 2 ),
('Felix','John' , 'Germany', 3 ),
('Michael','John' , 'India', 1 ),
]
#Create a DataFrame object
df = pd.DataFrame( users,
columns = ['First Name' , 'Last Name', 'Country', 'Country_Code']
)
df
DataFrame Will Look Like
First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|
0 | Shivam | Pandey | India | 1 |
1 | Kumar | Ram | US | 2 |
2 | Felix | John | Germany | 3 |
3 | Michael | John | India | 1 |
Filter Using ISIN (Similar to IN SQL)
To filter using the IN
like SQL, use the isin() method.
- Invoke the
isin()
method in the specific column with the list of items to select. - Pass the subset of the data to
df[ ]
. So only the rows with Truth values will be filtered.
Code
items_to_select=["Country", "India"]
df2=df[df.Country.isin(items_to_select)]
df2
DataFrame Will Look Like
First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|
0 | Shivam | Pandey | India | 1 |
3 | Michael | John | India | 1 |
Filter Using ~ISIN (Similar to NOT IN SQL)
To filter the rows using NOT IN, similar to SQL,
- Prefix the
~
symbol and invoke theisin()
method in the specific column with the list of items to select. It will inverse the result of theisin()
method. - Pass the subset of the data to
df[ ]
. Only the rows with Truth values will be filtered.
Code
items_to_select=["Country", "India"]
df2=df[~df.Country.isin(items_to_select)]
df2
DataFrame Will Look Like
First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|
1 | Kumar | Ram | US | 2 |
2 | Felix | John | Germany | 3 |
Filter Using ISIN with Multiple Columns
To use multiple columns with isin()
and filter rows,
- Pass the columns as list in
df[ ]
to apply theisin()
method for those columns - Pass the values for each column as a list to the
isin()
method - Use the
any()
method to select the matching rows
Code
df2 = df[df[['Country', 'Country_Code']].isin(['India',1]).any(axis=1)]
df2
DataFrame Will Look Like
First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|
0 | Shivam | Pandey | India | 1 |
3 | Michael | John | India | 1 |
Filter Using Query Method
The query() method is another method to filter rows from the pandas dataframe using an SQL-like query.
- Create an SQL-like where clause query. For example,
Country_Code in [1, 2]
- Invoke the query method and pass the SQL expression to filter the rows based on the
where
clause
Code
df.query('Country_Code in [1, 2]')
DataFrame Will Look Like
First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|
0 | Shivam | Pandey | India | 1 |
1 | Kumar | Ram | US | 2 |
3 | Michael | John | India | 1 |