Pandas allow you to read excel files with a single line of code.
You can read an excel file in Pandas using the pd.read_excel() method.
Basic Example
- Use the
pd.read_excel()
method to read an excel file in Pandas - The first sheet in the excel file will be read if no sheet name is specified
import pandas as pd
df = pd.read_excel(
"testExcel.xlsx"
)
df
- The excel file is read, and a dataframe is created.
In this tutorial, you’ll learn how to read an excel file and handle header information in the excel file.
Table of Contents
Using read_excel()
The read_excel() method in Pandas reads an excel file.
- It accepts the file path as a parameter
- It reads the excel and creates a dataframe out of the data in excel
Code
The following code reads the excel file textExcel.xlsx and assigns it to the dataframe df
.
import pandas as pd
df = pd.read_excel(
"testExcel.xlsx"
)
df
Dataframe Will Look Like
Column 1 | Column 2 | |
---|---|---|
0 | C1 Row1 | C2 Row1 |
1 | C1 Row2 | C2 Row2 |
2 | C1 Row3 | C2 Row3 |
3 | C1 Row4 | C2 Row 4 |
4 | C1 Row5 | C2 Row 5 |
This is how the read_excel()
method is used to read the excel file using Pandas.
You can select rows from the dataframe to check its data.
Reading Specific Sheet in Excel Using Read_excel()
To read a specific sheet in an excel file
- Specify the sheet name using the
sheet_Name
parameter - Or pass the sheet index if your sheets are ordered, or pass the sheet name directly.
To read multiple sheets from excel in one line, read How To Read Excel With Multiple Sheets In Pandas?
Code
import pandas as pd
df = pd.read_excel(
"testExcel.xlsx", sheet_name = 'MY_Sheet_1'
)
df
Read Excel File With Header
The header parameter in the read_excel()
method allows you to read an excel file with a Header in Pandas.
- This is an optional parameter; if this parameter is not specified, it’ll default to
0
. - The first row in the excel file will be considered Headers.
Code
import pandas as pd
df = pd.read_excel(
"testExcel.xlsx", header =0)
df
The excel file will be read, and a dataframe will be created with the header information.
Dataframe Will Look like
Column 1 | Column 2 | |
---|---|---|
0 | C1 Row1 | C2 Row1 |
1 | C1 Row2 | C2 Row2 |
2 | C1 Row3 | C2 Row3 |
3 | C1 Row4 | C2 Row 4 |
4 | C1 Row5 | C2 Row 5 |
This is how you can handle the header information using the header
parameter in the read_excel()
method.
Once the files are read, you can Rename Column in pandas if you want to use custom column headers.
Read Excel File – Skip Rows
This section will teach you how to skip rows while reading an excel file using Pandas.
The skiprows
parameter skips the desired number of rows.
- This is an optional parameter, and it defaults to
0
when not specified. - No rows will be skipped when this parameter is not used.
Code
import pandas as pd
df = pd.read_excel(
"testExcel.xlsx", skiprows =2)
df
The below dataframe shows that the first two rows are skipped.
Dataframe Will Look Like
C1 Row2 | C2 Row2 | |
---|---|---|
0 | C1 Row3 | C2 Row3 |
1 | C1 Row4 | C2 Row 4 |
2 | C1 Row5 | C2 Row 5 |
This is how you can skip rows while reading excel files in Pandas.
Read Excel File with OpenPyxl
In this section, you’ll learn how to use the OpenPyxl engine to read the excel file in pandas.
- The Pandas version greater than
1.2.0
uses theopenpyxl
library by default. - When using the earlier version of Pandas (Less than 1.2.0), it uses the
xlrd
engine to read the excel files. Thisxlrd
engine supports only reading the.xls
files.
When you read a .xlsx
file using the read_excel()
method, it’ll throw xlrd.biffh.XLRDError: Excel xlsx file; not supported Error. In this case, you need to specify the engine openpyxl
to read the .xlsx
files.
Code
Install the library openpyxl
engine using the following statement.
%pip install openpyxl
Prefixing the %
in PIP lets you update the packages directly from Jupyter.
Output
Requirement already satisfied: openpyxl in /opt/anaconda3/lib/python3.9/site-packages (3.0.9)
Requirement already satisfied: et-xmlfile in /opt/anaconda3/lib/python3.9/site-packages (from openpyxl) (1.1.0)
Note: you may need to restart the kernel to use updated packages.
The package will be installed successfully.
You can use this to read an xlsx
file with an older pandas version.
The following code demonstrates how to use the openpyxl
engine to read the testExcel.xlsx files.
import pandas as pd
df = pd.read_excel(
"testExcel.xlsx",
engine='openpyxl')
The excel file will be read successfully, and the dataframe df will be created.