When cleaning data for machine learning, you need to know whether the dataset contains any NaN values.
You can check whether any value is NaN in a pandas dataframe using the df.isna().values.any() statement.
Basic Example
df.isna().values.any()
Output
True
In this tutorial, you’ll learn how to check if any value is NaN in a pandas dataframe.
A dataset may contain missing values. In pandas, missing values are denoted by NaN, None, or pd.NaT.
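As a quick illustration of which values pandas treats as missing, the sketch below applies pd.isna() to a few scalars. The variable names are chosen only for this example. Note that a placeholder string such as "Not Available" is ordinary text, so it is not reported as missing.

```python
import numpy as np
import pandas as pd

# Three common missing-value markers that pandas recognizes.
markers = [None, np.nan, pd.NaT]
results = [pd.isna(marker) for marker in markers]
print(results)

# A placeholder string is regular data, not a missing value.
text_is_missing = pd.isna("Not Available")
print(text_is_missing)
```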
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
It contains:
- Rows with values for all columns
- Rows with Empty or Missing Data for each column
- Rows with Empty or Missing data for all columns
- One Duplicate row
- One column in the sample dataframe is of the float type.
Code
import pandas as pd
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers",pd.NaT],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
"No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],
"Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],
"Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT]
}
df = pd.DataFrame(data)
df = df.astype({"Unit_Price": float})
df
Dataframe Will Look Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
6 | NaT | NaN | NaT | NaT | NaT |
You’ll use this dataframe to check if any value is missing.
Using isna()
The isna() method checks each value in the dataframe and reports whether it is missing.
It returns a mask of True or False for each cell of the dataframe:
- True denotes a missing value
- False denotes an available value
Code
df.isna()
Each cell will have a value of True or False.
Dataframe Will Look Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | False | False | False | False | False |
1 | False | False | False | False | False |
2 | False | False | False | False | False |
3 | False | False | False | False | False |
4 | False | False | False | False | False |
5 | False | False | False | True | False |
6 | True | True | True | True | True |
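Because True counts as 1 when summed, the mask can also be used to count the missing values in each column. The following is a minimal sketch on a small hypothetical dataframe (small and its columns are made up for this example, not the tutorial's sample dataframe).

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with one missing name and two missing prices.
small = pd.DataFrame({
    "name": ["Keyboard", "Mouse", None],
    "price": [500.0, np.nan, np.nan],
})

# Summing the boolean mask column by column counts the missing cells.
missing_per_column = small.isna().sum()
print(missing_per_column)
```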
Using isnull()
The isnull() method is an alias of the isna() method and behaves identically: it checks each value in the dataframe and reports whether it is missing.
It returns a mask of True or False for each cell of the dataframe:
- True denotes a missing value
- False denotes an available value
Code
df.isnull()
Each cell will have a value of True or False.
Dataframe Will Look Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | False | False | False | False | False |
1 | False | False | False | False | False |
2 | False | False | False | False | False |
3 | False | False | False | False | False |
4 | False | False | False | False | False |
5 | False | False | False | True | False |
6 | True | True | True | True | True |
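To confirm that the two methods produce the same mask, you can compare their results with equals(). This is a small sketch on a hypothetical dataframe built only for this check.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing None, NaN, and regular values.
small = pd.DataFrame({
    "name": ["Keyboard", None, "Monitor"],
    "price": [500.0, 200.0, np.nan],
})

# equals() returns True only if both masks match cell by cell.
same_mask = small.isnull().equals(small.isna())
print(same_mask)
```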
The following use cases show how to apply the isna() method or the isnull() method.
Check if Any Value is NaN in Single Column
Use the isnull() method with the any() method to check whether any value in a specific column is missing.
- If ANY of the values is missing, it returns a single True.
Code
The code below demonstrates how to check if there are any missing values in the column Unit_Price.
df['Unit_Price'].isnull().values.any()
Since the Unit_Price column contains a missing value, you’ll see the output True.
Output
True
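For a single column, the .values step is optional, since a pandas Series has its own any() method and a hasnans attribute. The sketch below compares the three spellings on a hypothetical Series named prices; all three should agree.

```python
import numpy as np
import pandas as pd

# Hypothetical price column with one missing entry.
prices = pd.Series([500.0, 200.0, np.nan], name="Unit_Price")

via_values = prices.isna().values.any()  # NumPy any() on the underlying array
via_series = prices.isna().any()         # pandas any() on the boolean Series
via_hasnans = prices.hasnans             # attribute that reports missing values

print(via_values, via_series, via_hasnans)
```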
Check if Any Value is NaN in Multiple Columns
Use the isna() method with the any() method to check whether any value in multiple columns is missing.
- Pass the columns as a list to select the subset of those specific columns.
- Use the isna() method to check if any value is missing in those particular columns.
If ANY of the values is missing, it returns a single True.
Code
df[['Unit_Price','product_name']].isna().values.any()
Since the columns Unit_Price and product_name contain missing values, you’ll see the output True.
Output
True
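If you want to know which of the selected columns holds the missing value, calling any() directly on the mask (without .values) gives one result per column. This is a minimal sketch on a hypothetical dataframe named frame, where only product_name has a gap.

```python
import pandas as pd

# Hypothetical frame: product_name has a missing value, Unit_Price does not.
frame = pd.DataFrame({
    "product_name": ["Keyboard", None],
    "Unit_Price": [500.0, 200.0],
    "No_Of_Units": [5, 5],
})

# any() on a dataframe reduces each column separately and returns a Series.
per_column = frame[["Unit_Price", "product_name"]].isna().any()
print(per_column)
```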
Check if Any Value is NaN in Entire Dataframe
To check if any value is NaN in the entire dataframe:
- Apply the isna() and the any() methods to the dataframe df
Code
df.isna().values.any()
Output
True
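An equivalent spelling chains any() twice (once across columns, once across the resulting Series), and sum() can be chained the same way to get the total number of missing cells. The sketch below assumes a small hypothetical dataframe with two gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing cells in total.
frame = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse", None],
    "Unit_Price": [500.0, np.nan, 250.5],
})

has_missing = frame.isna().any().any()         # True if at least one cell is missing
total_missing = int(frame.isna().sum().sum())  # number of missing cells overall

print(has_missing, total_missing)
```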
Find Rows with NaN in a Column
In this section, you’ll learn how to select rows with missing values in a specific column.
- Select the subset of the specific column and apply the isna() method.
- It will return a mask that denotes the rows with missing values.
- Using the mask, retrieve the rows.
Code
The code below demonstrates how to select rows with missing values in the column Available_Quantity.
df[df['Available_Quantity'].isna()]
There are two rows where the Available_Quantity column has missing values. Those two rows will be selected and displayed.
Dataframe Will Look Like
| | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
5 | Speakers | 250.5 | 8 | NaT | 01/05/2021 |
6 | NaT | NaN | NaT | NaT | NaT |
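The same masking idea extends to rows that have a missing value in any column, by reducing the mask across each row with any(axis=1). The sketch below uses a hypothetical four-row dataframe to contrast the single-column filter with the any-column filter.

```python
import pandas as pd

# Hypothetical frame: row 1 lacks a name, row 2 lacks a quantity, row 3 lacks both.
frame = pd.DataFrame({
    "product_name": ["Keyboard", None, "Speakers", None],
    "Available_Quantity": [5, 6, None, None],
})

# Rows where one specific column is missing.
rows_missing_quantity = frame[frame["Available_Quantity"].isna()]

# Rows where at least one column is missing.
rows_missing_anywhere = frame[frame.isna().any(axis=1)]

print(rows_missing_quantity)
print(rows_missing_anywhere)
```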