Pandas dataframe allows you to store data in rows and column format.
You can sort pandas dataframe based on column value using the df.sort_values(by=[‘column value‘], inplace=True) statement.
In this tutorial, you’ll learn how to sort pandas dataframe based on column values in detail.
If You’re in Hurry…
df.sort_values(by=['Column Value'], inplace=True)
df
If You Want to Understand Details, Read on…
You can use the sort_values() method to sort the Pandas dataframe based on one column value or multiple columns. You can also sort it in ascending order or descending order.
Table of Contents
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
It contains,
- Rows with values for all columns
- Rows with Empty or Missing Data for each column
Code
import pandas as pd
import numpy as np
data = {'Product': ['CPU','Speaker','Keyboard','Mouse', 'Monitor'],
'Price': [22000,2500,3000,1500, np.NaN]
}
df = pd.DataFrame(data, columns=['Product','Price'])
print (df)
Dataframe will Look like
Product Price
0 CPU 22000.0
1 Speaker 2500.0
2 Keyboard 3000.0
3 Mouse 1500.0
4 Monitor NaN
Using the sort_values Method
sort_values()
method allows you to sort the dataframe based on the column passed in the parameter by
.
Parameters
by
– List of column names to sort byinplace
–True
denotes to sort in the same dataframe.False
denotes to sort and return a new dataframe.False
is the default value.
Code
The below code explains how to sort the dataframe based on the Product column and sort it in the source dataframe itself.
df.sort_values(by=['Product'], inplace=True)
df
You can see that the dataframe is sorted based on the Product name.
Dataframe Will Look Like
Product | Price | |
---|---|---|
0 | CPU | 22000.0 |
2 | Keyboard | 3000.0 |
4 | Monitor | NaN |
3 | Mouse | 1500.0 |
1 | Speaker | 2500.0 |
Sort Pandas Dataframe Based on Column Value in Ascending order
In this section, you’ll learn how to sort pandas dataframe based on column values in ascending order.
The sort_values()
method sorts in ascending order by default. Hence there is no additional parameter necessary.
Alternatively, you can also pass the ascending=True
parameter.
Code
The code below demonstrates how to sort the dataframe based on the Product column in ascending order.
df.sort_values(by=['Product'], inplace=True, ascending=True)
df
Dataframe Will Look Like
Product | Price | |
---|---|---|
0 | CPU | 22000.0 |
2 | Keyboard | 3000.0 |
4 | Monitor | NaN |
3 | Mouse | 1500.0 |
1 | Speaker | 2500.0 |
Sort Pandas Dataframe in Descending Order
In this section, you’ll learn how to sort pandas dataframe based on column values in descending order.
You can pass the ascending=False
parameter to sort pandas dataframe in descending order.
Code
The code below demonstrates how to sort the dataframe based on the Product column in descending order.
df.sort_values(by=['Product'], inplace=True, ascending=False)
df
Dataframe Will Look Like
Product | Price | |
---|---|---|
1 | Speaker | 2500.0 |
3 | Mouse | 1500.0 |
4 | Monitor | NaN |
2 | Keyboard | 3000.0 |
0 | CPU | 22000.0 |
Sort Pandas Dataframe Based on Multiple Columns
In this section, you’ll learn how to sort pandas dataframe based on multiple columns.
You can pass the multiple columns as a list to the by
parameter.
Code
The code below demonstrates how to sort the dataframe based on the Price, Product column in ascending order.
df.sort_values(by=['Price','Product'], inplace=True)
df
You can see that the dataframe is sorted based on the Price and Product columns in ascending order.
Dataframe Will Look Like
Product | Price | |
---|---|---|
3 | Mouse | 1500.0 |
1 | Speaker | 2500.0 |
2 | Keyboard | 3000.0 |
0 | CPU | 22000.0 |
4 | Monitor | NaN |
This is how you can sort pandas dataframe based on multiple columns.
Sort NaN Values First
In this section, you’ll learn how to handle the missing values while sorting the pandas dataframe.
You can either bring the NaN
values to the top of the dataframe while sorting or push them to the end.
- To check if the Pandas dataframe contains missing values, read How to check if any value is NaN in a Pandas DataFrame
- To count the number of missing values in the Pandas dataframe, read How To Count Nan Values In Pandas Dataframe
If the dataframe contains a missing value, you can handle missing values using the na_position
parameter.
It accepts two values.
first
– Brings Missing values to the toplast
– Pushes missing values to the end. This is the default.
Code
The code below demonstrates how to sort the dataframe and bring the NaN values to the top while sorting.
df.sort_values(by=['Price'], na_position='first', inplace=True)
df
The sample dataframe contains one NaN
value. This is brought to the top of the dataframe while sorting.
Dataframe Will Look Like
Product | Price | |
---|---|---|
4 | Monitor | NaN |
3 | Mouse | 1500.0 |
1 | Speaker | 2500.0 |
2 | Keyboard | 3000.0 |
0 | CPU | 22000.0 |
This is how you can handle the missing values while sorting the pandas dataframe.
Reset Index While Sorting
This section will teach you how to handle the row index while sorting the Pandas dataframe.
The row index is not changed when you sort the pandas dataframe using the sort_values()
method. The actual row index will be retained.
You can use the optional ignore_index
parameter to handle the indexes while sorting.
It accepts two values.
ignore_index = False
leaves the indexes unchanged. This is the default behaviour.ignore_index = True
updates the index of the rows while sorting
Code
The below code demonstrates how to leave the indexes of the rows unchanged using the ignore_index=False
parameter.
df2 = df.sort_values(by=['Price'], na_position='first', ignore_index=False)
df2
You can see that the row indexes remain unchanged.
Dataframe Will Look Like
Product | Price | |
---|---|---|
4 | Monitor | NaN |
3 | Mouse | 1500.0 |
1 | Speaker | 2500.0 |
2 | Keyboard | 3000.0 |
0 | CPU | 22000.0 |
Code
When you use the parameter ignore_index=True
, the row index will be reset.
df2 = df.sort_values(by=['Price'], na_position='first', ignore_index=True)
df2
The row indexes are updated while sorting.
Dataframe Will Look Like
Product | Price | |
---|---|---|
0 | Monitor | NaN |
1 | Mouse | 1500.0 |
2 | Speaker | 2500.0 |
3 | Keyboard | 3000.0 |
4 | CPU | 22000.0 |
Using reset_index Method
If you sort the dataframe and have the row indexes in an unordered way, you can reset the index using the reset_index() method.
The inplace=True
parameter updates the index in the same dataframe instead of creating a new dataframe.
Code
df2.reset_index(inplace=True)
df2
Dataframe Will Look Like
index | Product | Price | |
---|---|---|---|
0 | 4 | Monitor | NaN |
1 | 3 | Mouse | 1500.0 |
2 | 1 | Speaker | 2500.0 |
3 | 2 | Keyboard | 3000.0 |
4 | 0 | CPU | 22000.0 |
Conclusion
You’ve learned how to sort pandas dataframe based on column values, based on multiple column values. Also learnt how to sort in ascending order and descending order.
Additionally, you’ve learned how to handle missing values and the row index while sorting.