During data wrangling, you may need to filter rows by column values in a list.
Answer: df[df[‘Ccolumn_name‘].isin([‘V1’,’V2’])]
This tutorial teaches you the different methods to filter rows by column values in a list in Pandas Dataframe and their appropriate use cases.
Sample Dataframe
Create the following dataframe with four columns, namely Reg_no, First Name, Last Name and Country_Code.
import pandas as pd
# List of Tuples
users = [ (1,'Shivam', 'Pandey', 'India', 1),
(2, 'Kumar', 'Ram' , 'India', 1 ),
(3, 'Felix','John' , 'Germany', 2 ),
(4, 'Michael','Arndtz' , 'Germany', 2 ),
(5, 'Praveen','Dinakar' , 'USA', 3 )
]
#Create a DataFrame object
df = pd.DataFrame(users,
columns = ['RegNo.','First Name' , 'Last Name', 'Country', 'Country_Code']
)
df
DataFrame Will Look Like
RegNo. | First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|---|
0 | 1 | Shivam | Pandey | India | 1 |
1 | 2 | Kumar | Ram | India | 1 |
2 | 3 | Felix | John | Germany | 2 |
3 | 4 | Michael | Arndtz | Germany | 2 |
4 | 5 | Praveen | Dinakar | USA | 3 |
Using isin
The isin()
function checks each element in the dataframe is available in the list of values and returns a dataframe with booleans.
To filter rows by column values in a list,
- Use the column name and
isin()
function along with the list of values. - Pass the resultant dataframe again to the
df
object. The rows that contain aTrue
boolean value will be filtered and displayed.
Use this method when you want to select rows using a list of values.
Code
The following code filters the rows with country_code values 1 and 3.
df[df['Country_Code'].isin([1,3])]
DataFrame Will Look Like
RegNo. | First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|---|
0 | 1 | Shivam | Pandey | India | 1 |
1 | 2 | Kumar | Ram | India | 1 |
4 | 5 | Praveen | Dinakar | USA | 3 |
Using Query
The query()
function queries the dataframe using a boolean expression. You can use the query()
to filter rows from the dataframe using an SQL-like query.
To filter rows by column values in a list, you can pass the list of values using the in
clause in the query method. For example, column_name in (list_of_values)
.
Use this method when you want to select rows using the SQL like expression.
Code
The following code demonstrates how to select the rows with country_code in 1 and 3 and the query method.
df.query('Country_Code in (1,3)')
DataFrame Will Look Like
RegNo. | First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|---|
0 | 1 | Shivam | Pandey | India | 1 |
1 | 2 | Kumar | Ram | India | 1 |
4 | 5 | Praveen | Dinakar | USA | 3 |
Using Query and @List object
In some cases, you might have the list object with values in it.
In this case, you can specify the list name in the query method using the @
annotation instead of explicitly specifying the list elements.
Use this method when you want to use the already available list object in the query method.
Code
The following code demonstrates how to use the list_name directly in the query()
method.
lst = [1,3]
df.query('Country_Code in @lst')
DataFrame Will Look Like
RegNo. | First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|---|
0 | 1 | Shivam | Pandey | India | 1 |
1 | 2 | Kumar | Ram | India | 1 |
4 | 5 | Praveen | Dinakar | USA | 3 |
Using LOC
The pandas dataframes are indexed by default. It allows you to identify/select rows easily and get and set subsets of data in the dataframe.
If you want to filter rows from the dataframe multiple times based on a single column, then it is useful and efficient to set that specific column as an index of the dataframe.
- You can set the index of the dataframe using the set_index() method.
- Once the index is set, you can select/filter rows from the dataframe using the
loc[]
attribute and list of values. It selects the rows with that specific indexes.
Use this method when you want to select rows from the dataframe multiple times using the same column name.
Code
The following codes set the Country_code as the index and select the rows with country_code 1
and 2
as an index.
df.set_index('Country_Code', inplace=True)
df.loc[[1,2]]
DataFrame Will Look Like
RegNo. | First Name | Last Name | Country | |
---|---|---|---|---|
Country_Code | ||||
1 | 1 | Shivam | Pandey | India |
1 | 2 | Kumar | Ram | India |
2 | 3 | Felix | John | Germany |
2 | 4 | Michael | Arndtz | Germany |
Filter Rows by Column Values Not in A list
To filter rows by column value that is not in the list, you can use the isin()
method and the ~
tilde symbol that will inverse the result by filtering the values that are not in the list.
Code
The following code demonstrates how to use the ~
tilde symbol to filter rows by column value that is not in the list.
df[~df['Country_Code'].isin([1,3])]
DataFrame Will Look Like
RegNo. | First Name | Last Name | Country | Country_Code | |
---|---|---|---|---|---|
2 | 3 | Felix | John | Germany | 2 |
3 | 4 | Michael | Arndtz | Germany | 2 |