Pandas Dataframe stores data in a two-dimensional format. You need to select columns from Dataframe for various data analysis purposes. Selecting columns is also known as selecting a subset of columns from the dataframe.
You can select columns from Pandas Dataframe using the df.loc[:,’column_name’] statement.
If you’re in Hurry
You can use the below code snippet to select columns from the dataframe.
Snippet
df.loc[:,'Product_Name']
Where,
:
– denotes all the rows that must be selected'Product_Name'
– Denotes the name of the column that must be selected
Output
0 Keyboard
1 Mouse
2 Monitor
3 CPU
4 CPU
5 Speakers
6 Headset
Name: Product_Name, dtype: object
This is how you can select columns from the dataframe.
If You Want to Understand Details, Read on…
In this tutorial, you’ll learn the different methods available to select columns from the dataframe and how they can be used in different scenarios.
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
It contains,
- Rows with values for all columns
- Rows with Empty or Missing Data for each column
- One Duplicate row
- One column in the sample dataframe is of
float
type - One Empty column which will be used to select the empty columns
Snippet
import pandas as pd
data = {"Product_Name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers","Headset"],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
"No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],
"Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],
"Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT],
"Remarks":[pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT,pd.NaT]
}
df = pd.DataFrame(data)
df = df.astype({"Unit_Price": float})
df
The sample dataframe will look like follows.
DataFrame Will Look Like
Product_Name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | Buffer Column | |
---|---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 | NaT |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 | NaT |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 | NaT |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 | NaT |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 | NaT |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 | NaT |
6 | Headset | NaN | NaT | NaT | NaT | NaT |
Now, you’ll learn how to select columns from the dataframe using different methods.
Select Columns from Pandas Dataframe
You can select columns from the pandas dataframe using three different methods.
- Using
Loc
- Using
iLoc
- Using
df.columns
Using Loc pandas
You can select a column from the pandas dataframe using the loc property available in the dataframe.
It is used to locate the rows or columns from the dataframe based on the name passed. It is also called slicing the columns based on the column names.
It accepts row index and column names to be selected.
- First, select only columns, you can just use
:
in place of rows which will select all rows. - Second, you can pass the column names to be selected.
Note: Column names are case-sensitive. Hence, pass the column names as the same case in the dataframe. Otherwise, you’ll face the key error.
Use the below snippet to select the column from the dataframe using loc
.
Snippet
df.loc[:,'Product_Name']
The Product_Name column is selected from the dataframe and displayed as shown below. You can also assign it to a new dataframe if you want to store it for further activities.
Output
0 Keyboard
1 Mouse
2 Monitor
3 CPU
4 CPU
5 Speakers
6 Headset
Name: Product_Name, dtype: object
This is how you can use the loc
property to select columns.
Using iloc Pandas
You can select columns from the dataframe using iloc property available in the dataframe.
It is used to locate the rows or columns from the dataframe based on the index passed. It is also called slicing the columns based on the indexes.
It accepts row index and column index to be selected.
- First, select only columns, you can just use
:
in place of rows which will select all rows. - Second, you can pass the column indexes to be selected.
Use the below snippet to select the column from the dataframe using iloc
.
Snippet
df.iloc[:,0]
The first column at the index 0
– Product_Name is selected from the dataframe and displayed as shown below.
You can also assign it to a new dataframe if you want to store it for further activities.
Output
0 Keyboard
1 Mouse
2 Monitor
3 CPU
4 CPU
5 Speakers
6 Headset
Name: Product_Name, dtype: object
This is also known as indexing and selecting data from pandas dataframe.
Using DF.Columns
You can also select columns using the columns[] property. This method returns the list of columns for the indexes passed.
For example, if you pass, df.columns[0]
then it’ll return the first column.
Use the below snippet to select the columns from the dataframe using the df.columns
attribute.
Snippet
df[df.columns[0]]
The first column is Product_Name. Hence it’ll be selected and displayed.
Output
0 Keyboard
1 Mouse
2 Monitor
3 CPU
4 CPU
5 Speakers
6 Headset
Name: Product_Name, dtype: object
This is how you can select columns using the df.columns
attribute.
You’ve seen the methods available to select columns. Now, you’ll see how it can be used in the different use-cases.
Select Columns by Name
You can select columns by name from the dataframe by passing the column names as a list to the dataframe object.
Use the below snippet to select only two columns Product_Name and Unit_Price as a subset from the dataframe.
Snippet
df[['Product_Name', 'Unit_Price']]
When you execute the script, you’ll see the subset dataframe as follows.
DataFrame Will Look Like
Product_Name | Unit_Price | |
---|---|---|
0 | Keyboard | 500.000 |
1 | Mouse | 200.000 |
2 | Monitor | 5000.235 |
3 | CPU | 10000.550 |
4 | CPU | 10000.550 |
5 | Speakers | 250.500 |
6 | Headset | NaN |
This is how you can get columns by name from the pandas dataframe.
Select Columns by Index
You can select columns by index from the dataframe by using the iloc attribute of the dataframe.
Use the below snippet to select only the column at index position 0
.
Snippet
df.iloc[:,0]
You’ll see the column at the index position 0
selected and displayed as shown below.
Output
0 Keyboard
1 Mouse
2 Monitor
3 CPU
4 CPU
5 Speakers
6 Headset
Name: Product_Name, dtype: object
You can also get column by index using df.columns
by passing the index as shown below.
Snippet
df[df.columns[0]]
You’ll see the column at the index position 0
selected and displayed as shown below.
Output
0 Keyboard
1 Mouse
2 Monitor
3 CPU
4 CPU
5 Speakers
6 Headset
Name: Product_Name, dtype: object
This is how you can select columns by index from pandas dataframe.
Select Columns by Condition
You can select columns by condition by using the df.loc[]
attribute and specifying the condition for selecting the columns.
Use the below snippet to select columns that have a value 5
in any row.
(df == 5).any()
evaluates each cell and finds the columns which have a value 5
in any of the cells.
Snippet
df.loc[: , (df == 5).any()]
Since the two columns, No_Of_Units and Available_Quantity have value 5 in them, it is displayed as follows.
You can also assign it to a dataframe if you want this subset to be saved for later use.
DataFrame Will Look Like
No_Of_Units | Available_Quantity | |
---|---|---|
0 | 5 | 5 |
1 | 5 | 6 |
2 | 10 | 10 |
3 | 20 | Not Available |
4 | 20 | Not Available |
5 | 8 | NaT |
6 | NaT | NaT |
This is how you can select columns based on a condition.
Select Multiple Columns
You can select multiple columns by labels by passing the column labels as a list to the dataframe object.
Use the below snippet to select multiple columns from the dataframe.
Snippet
df[['No_Of_Units','Available_Quantity']]
You’ll see the columns No_Of_Units, Available_Quantity selected as subsets as shown below.
DataFrame Will Look Like
No_Of_Units | Available_Quantity | |
---|---|---|
0 | 5 | 5 |
1 | 5 | 6 |
2 | 10 | 10 |
3 | 20 | Not Available |
4 | 20 | Not Available |
5 | 8 | NaT |
6 | NaT | NaT |
This is how you can get multiple columns from the dataframe as a subset.
Select Range of Columns Using Name
You can select a range of columns using the name by passing the columns range separated by :
in the loc attribute.
Use the below snippet to select columns from No_Of_Units to Available_Since_Date.
Snippet
df.loc[: , 'No_Of_Units':'Available_Since_Date']
There are three columns in the specified range. Hence, those three columns will be selected as a subset from the dataframe.
DataFrame Will Look Like
No_Of_Units | Available_Quantity | Available_Since_Date | |
---|---|---|---|
0 | 5 | 5 | 11/5/2021 |
1 | 5 | 6 | 4/23/2021 |
2 | 10 | 10 | 08/21/2021 |
3 | 20 | Not Available | 09/18/2021 |
4 | 20 | Not Available | 09/18/2021 |
5 | 8 | NaT | 01/05/2021 |
6 | NaT | NaT | NaT |
This is how you can get a range of columns using names.
Select Range of Columns Using Index
You can select a range of columns using the index by passing the index range separated by :
in the iloc attribute.
Use the below snippet to select columns from 2
to 4
. The beginning index is inclusive and the end index is exclusive. Hence, you’ll see the columns at the index 2
and 3
.
Snippet
df.iloc[: , 2:4]
You’ll see the columns at the index position 2
and 3
as shown below.
DataFrame Will Look Like
No_Of_Units | Available_Quantity | |
---|---|---|
0 | 5 | 5 |
1 | 5 | 6 |
2 | 10 | 10 |
3 | 20 | Not Available |
4 | 20 | Not Available |
5 | 8 | NaT |
6 | NaT | NaT |
This is how you can get a range of columns using the index.
Select First Column
You can select the first column from the dataframe using df.iloc[:,0]
.
Use the below snippet to select the first column from the dataframe.
:
– Denotes all rows that must be selected0
– Denotes only the first column must be selected.
Snippet
df.iloc[:,0]
You’ll see the values of the first column printed as shown below.
Output
0 Keyboard
1 Mouse
2 Monitor
3 CPU
4 CPU
5 Speakers
6 Headset
Name: Product_Name, dtype: object
This is how you can select the first column from the pandas dataframe.
Select Last Column
You can select the last column from the dataframe using df.iloc[:,-1:]
.
Use the below snippet to select the first column from the dataframe.
:
– Denotes all rows that must be selected-1:
– Denotes only the last column must be selected.
Snippet
df.iloc[:,-1:]
You’ll see the last column displayed as a dataframe as shown below.
DataFrame Will Look Like
Remarks | |
---|---|
0 | NaT |
1 | NaT |
2 | NaT |
3 | NaT |
4 | NaT |
5 | NaT |
6 | NaT |
This is how you can get the last column of the dataframe.
Select Columns With Missing Values
You can select columns with missing values by using the df.isnull().any() in the df.columns[] attribute.
This will return the list of columns which has a missing value in it to df[]
. Then, it’ll return the columns with the missing value.
You can use it to print columns with missing values or impute values for the missing values.
Snippet
df[df.columns[df.isnull().any()]]
In the sample dataframe, all the columns except Product_Name is having missing values. Hence, all those columns will be printed as shown below.
DataFrame Will Look Like
Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date | Buffer Column | |
---|---|---|---|---|---|
0 | 500.000 | 5 | 5 | 11/5/2021 | NaT |
1 | 200.000 | 5 | 6 | 4/23/2021 | NaT |
2 | 5000.235 | 10 | 10 | 08/21/2021 | NaT |
3 | 10000.550 | 20 | Not Available | 09/18/2021 | NaT |
4 | 10000.550 | 20 | Not Available | 09/18/2021 | NaT |
5 | 250.500 | 8 | NaT | 01/05/2021 | NaT |
6 | NaN | NaT | NaT | NaT | NaT |
This is how you can get columns with missing values to impute new values to them.
Select Columns Without Missing Values
You can select columns without missing values using the ~df.isnull().any()
in the df.columns
attribute.
In this way, you can select all columns without NaN or missing values.
The tilde ~
symbol inverses the flag which is returned after checking if any of the values is null.
If any of the value is null
, df.isnull().any()
returns True
. The tilde symbol inverses this and makes it False
. Hence columns with missing values will not be added.
Snippet
df[df.columns[~df.isnull().any()]]
The Product_Name is the only column that doesn’t have any missing value of null value. Hence that column will be displayed as shown below.
DataFrame Will Look Like
Product_Name | |
---|---|
0 | Keyboard |
1 | Mouse |
2 | Monitor |
3 | CPU |
4 | CPU |
5 | Speakers |
6 | Headset |
This is how you can get columns without any missing values.
Select Empty Columns
You can select empty columns by using df.isnull().all()
in the df.columns
attribute.
isnull().all()
checks if all the cells of the column is null
. If yes, then it returns True
.
Use the below snippet to get the empty columns.
Snippet
df[df.columns[df.isnull().all()]]
Remarks is the only column without values for any of the cells in it. Hence, it’ll be printed as below.
DataFrame Will Look Like
Remarks | |
---|---|
0 | NaT |
1 | NaT |
2 | NaT |
3 | NaT |
4 | NaT |
5 | NaT |
6 | NaT |
This is how you can get the empty columns.
Conclusion
To summarize, you’ve learned how to select columns from pandas dataframe using loc
, iloc
or df.columns
.
You’ve learned how this can be used to select various subsets of columns from the dataframe such as selecting the first column, selecting last columns, selecting columns by name or index, and so on.
If you have any questions, comment below.