There may be some missing values in the dataset that need a cleanup.
You can find columns with NaN values in the pandas Dataframe using df.isna().any() statement.
If you’re in Hurry
Use the following code to find columns with NaN
values.
df.isna().any()
If You Want to Understand Details, Read on…
This tutorial teaches you how to get columns with NaN values in the Pandas dataframe.
The missing values are denoted using Np.NaN
or None
.
Table of Contents
Sample Dataframe
The following dataframe is created with missing values in a few columns.
import pandas as pd
import numpy as np
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU", "Speakers"],
"Unit_Price":[500,200, 5000, np.nan, 250.50],
"No_Of_Units":[5,5, 10, np.nan, 8],
"Available_Quantity":[5,10,12,15, "Not Available"],
"Available_Since_Date":['11/5/2021', '4/23/2021', np.nan,'09/18/2021','01/05/2021']
}
df = pd.DataFrame(data)
df
DataFrame Will Look Like
product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5.0 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5.0 | 10 | 4/23/2021 |
2 | Monitor | 5000.0 | 10.0 | 12 | NaN |
3 | CPU | NaN | NaN | 15 | 09/18/2021 |
4 | Speakers | 250.5 | 8.0 | Not Available | 01/05/2021 |
Using ISNA
You can find the columns that contain a missing value in any of the cells using df.isna() and any().
- The
isna()
method checks if a value is missing in a column - The
any()
function will check if one or more values are missing and returnsTrue
orFalse
based on the result.
Code
df.isna().any()
Output
True
denotes that the specific columns contain one or more missing values.
product_name False
Unit_Price True
No_Of_Units True
Available_Quantity False
Available_Since_Date True
dtype: bool
Using IsNull
Using the isnull()
method is an alternate method to check whether the value is missing.
Technically there is no difference between isna and isnull. It is just an alias. It uses the isna()
method internally.
Code
df.isnull().any()
Output
product_name False
Unit_Price True
No_Of_Units True
Available_Quantity False
Available_Since_Date True
dtype: bool
Get Columns With NaN as List
To get the pandas dataframe columns with NaN as a list,
- Use the
df.columns
anddf.isna().any()
to get the column names - Use the
tolist()
method to convert the names into the list.
Code
df.columns[df.isna().any()].tolist()
Output
The columns with NaN values are returned as a list.
[‘Unit_Price', 'No_Of_Units', 'Available_Since_Date']
Find Columns With All NaN Values
Some columns in the pandas dataframe might contain missing values in all cells.
To find the columns in which all values are missing, you can use the isna()
and the all() function.
Code
df.isna().all()
Output
None of the columns in the sample dataframe contains missing values in all cells. Hence, everything is False
in the output.
product_name False
Unit_Price False
No_Of_Units False
Available_Quantity False
Available_Since_Date False
dtype: bool
Select Columns With NaN values
To select the columns with NaN values, instead of just selecting the column names, you can pass the list of columns to the df[ ]
.
In this way, you can see all the values in the columns and identify which cells contain missing values.
Code
df[df.columns[df.isna().any()]]
DataFrame Will Look Like
The columns with NaN values are selected from the dataframe and displayed.
Unit_Price | No_Of_Units | Available_Since_Date | |
---|---|---|---|
0 | 500.0 | 5.0 | 11/5/2021 |
1 | 200.0 | 5.0 | 4/23/2021 |
2 | 5000.0 | 10.0 | NaN |
3 | NaN | NaN | 09/18/2021 |
4 | 250.5 | 8.0 | 01/05/2021 |