How to Read An Excel File in Pandas – With Examples

Pandas allow you to read excel files with a single line of code.

You can read an excel file in Pandas using the pd.read_excel() method.

In this tutorial, you’ll learn how to read an excel file and handle header information in the excel file.

If You’re in Hurry…

You can use the pd.read_excel() method to read an excel file in Pandas. The first sheet in the excel file will be read if no sheet name is specified.

code

import pandas as pd

df = pd.read_excel(
     "testExcel.xlsx"
)

df

The excel file is read, and a dataframe is created.

If You Want to Understand Details, Read on…

While reading an excel, you may need to read or ignore the headers. Also, you may need to skip a few rows that may be irrelevant to your application.

In this detailed tutorial, you’ll learn how to read an excel file along with headers and ignore rows while reading the excel file.

Using read_excel()

You can use the read_excel() method in Pandas to read an excel file.

It accepts the file path as a parameter.

When using this method, the excel file will be read, and a dataframe will be created.

The code below demonstrates how the read_excel() method reads the excel file textExcel.xlsx and assigns it to the dataframe df.

Code

import pandas as pd

df = pd.read_excel(
     "testExcel.xlsx"
)

df

Dataframe Will Look Like

Column 1Column 2
0C1 Row1C2 Row1
1C1 Row2C2 Row2
2C1 Row3C2 Row3
3C1 Row4C2 Row 4
4C1 Row5C2 Row 5

This is how the read_excel() method is used to read the excel file using Pandas.

You can select rows from the dataframe to check its data.

Reading Specific Sheet in Excel Using Read_excel()

You can read a specific sheet in excel file by specifying the sheet name in the sheet_Name parameter.

You can pass the sheet index if your sheets are ordered or you can pass the sheet name directly.

To read multiple sheets from excel in one line, read How To Read Excel With Multiple Sheets In Pandas?

The below code demonstrates how to read a specific sheet uisng the sheet_name parameter.

Code

import pandas as pd

df = pd.read_excel(
     "testExcel.xlsx", sheet_name = 'MY_Sheet_1'
)

df

Read Excel File With Header

This section will teach you how to read an excel file with a Header in Pandas.

You can read an Excel file with a header using the header parameter. This is an optional parameter, and if this parameter is not specified, it’ll default to 0. This means the first row in the excel file will be considered Headers.

The code below demonstrates how to use the read_excel() method to read an excel file with header information.

Code

import pandas as pd

df = pd.read_excel(
     "testExcel.xlsx", header =0)

df

The excel file will be read, and a dataframe will be created with the header information.

Dataframe Will Look like

Column 1Column 2
0C1 Row1C2 Row1
1C1 Row2C2 Row2
2C1 Row3C2 Row3
3C1 Row4C2 Row 4
4C1 Row5C2 Row 5

This is how you can handle the header information using the header parameter in the read_excel() method.

Once the files are read, you can Rename Column in pandas if you want to use custom column headers.

Read Excel File – Skip Rows

This section will teach you how to skip rows while reading an excel file using pandas.

You can use the skiprows parameter to skip the desired number of rows. This is an optional parameter, and it defaults to 0 when not specified. No rows will be skipped when this parameter is not used.

The below code demonstrates how to skip two rows while reading the excel file using Pandas.

Code

import pandas as pd

df = pd.read_excel(
     "testExcel.xlsx", skiprows =2)

df

The below dataframe shows that the first two rows are skipped.

Dataframe Will Look Like

C1 Row2C2 Row2
0C1 Row3C2 Row3
1C1 Row4C2 Row 4
2C1 Row5C2 Row 5

This is how you can skip rows while reading excel files in Pandas.

Read Excel File with OpenPyxl

In this section, you’ll learn how to use the OpenPyxl engine to read the excel file in pandas.

The Pandas version greater than 1.2.0 uses the openpyxl library by default. But when you’re using the earlier version of Pandas (Less than 1.2.0), it uses the xlrd engine to read the excel files. This xlrd engine supports only reading the .xls files..

When you try to read a .xlsx file using the read_excel() method, then it’ll throw xlrd.biffh.XLRDError: Excel xlsx file; not supported Error.

In this case, you need to specify the engine openpyxl to read the .xlsx files.

First, install the library openpyxl engine using the below snippet.

Code

%pip install openpyxl

Prefixing the % in PIP lets you update the packages directly from Jupyter.

Output

    Requirement already satisfied: openpyxl in /opt/anaconda3/lib/python3.9/site-packages (3.0.9)
    Requirement already satisfied: et-xmlfile in /opt/anaconda3/lib/python3.9/site-packages (from openpyxl) (1.1.0)
    Note: you may need to restart the kernel to use updated packages.

The package will be installed successfully. Now you can use this to read an xlsx file with an older pandas version.

The below code demonstrates how to use the openpyxl engine to read the testExcel.xlsx files.

import pandas as pd

df = pd.read_excel(
     "testExcel.xlsx",
     engine='openpyxl')

The excel file will be read successfully, and the dataframe df will be created.

Conclusion

You’ve learned how to read an excel file using the pandas read_excel() method.

Also learned how to handle headers information and skip the unnecessary rows while reading an excel file.

You May Also Like

Leave a Comment