A pandas dataframe is a two-dimensional data structure that stores data in rows and columns. Naming the columns makes the data easier to identify and access. Sometimes the column header information ends up in the first row of the dataframe instead of in the header.
You can replace the header with the first row of the dataframe by using df.columns = df.iloc[0].
If You’re in a Hurry
You can use the below code snippet to replace the header with the first row of the pandas dataframe.
Snippet
df.columns = df.iloc[0]
df = df[1:]
df.head()
While Reading Data From a CSV File
Snippet
import pandas as pd
df = pd.read_csv('iris.csv', header=[0])
df.head()
If You Want to Understand Details, Read on…
In this tutorial, you’ll learn the different methods available to replace the header with the first row and to set the first two rows as multiple headers in pandas.
If you want to add a new header that doesn’t exist in the dataframe, refer to How to Add Header To Pandas Dataframe.
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
You’ll first create a dataframe using the iris data. iris is a list of tuples. The first tuple holds the column names sepal_length, sepal_width, petal_length, petal_width and flower_type, the second tuple holds the same names with units, and each remaining tuple holds the measurements of one flower along with its flower_type, which denotes the category of the flower based on sepal and petal measurements.
Here, the column headers are part of the list itself, so pd.DataFrame() treats them as just another row and creates a dataframe with integer positions as column headers, as shown below.
Snippet
import pandas as pd
iris = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'flower_type'),
    ('spl_len(cm)', 'spl_wid(cm)', 'petal_len(cm)', 'petal_wid(cm)', 'flower_type'),
    (5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
    (4.9, 3, 1.4, 0.2, 'Iris-setosa'),
    (4.7, 3.2, 1.3, 0.2, 'Iris-setosa'),
    (4.6, 3.1, 1.5, 0.2, 'Iris-setosa'),
    (5, 3.6, 1.4, 0.2, 'Iris-setosa'),
]
# Create a DataFrame object from the iris data
df = pd.DataFrame(iris)
df.head(5)
When you print the dataframe, you can see that the numbers are available as column headers and the column names are available as rows separately.
Dataframe Looks Like
| | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | sepal_length | sepal_width | petal_length | petal_width | flower_type |
1 | spl_len(cm) | spl_wid(cm) | petal_len(cm) | petal_wid(cm) | flower_type |
2 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
3 | 4.9 | 3 | 1.4 | 0.2 | Iris-setosa |
4 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
Now, you’ll see how to replace the header of the pandas dataframe with the first row.
Pandas Replace Header With First Row
When the column headers are available in the first row of the dataframe, you can promote that row to the column header and remove it from the dataframe rows.
There are two methods available for it.
- Using the slicing operator
- Using iloc with reset_index()
Let’s see these methods in detail.
Using Slicing Operator to Replace Header With First Row
The slicing operator is used to slice the rows of a dataframe from a specific index.
For example, if you want to slice the rows beginning from the index 1, you can use the df[1:] statement.
where,
- 1 denotes the beginning index of the rows to be sliced
- : denotes the range. If you want to slice until a specific row, you can put that index after the : (the end position itself is excluded). Otherwise, you can use the : alone, which means all the rows until the end will be sliced.
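The slicing behaviour described above can be sketched on a small frame. The column names and values here are made up purely for illustration.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": ["w", "x", "y", "z"]})

from_second = df[1:]   # rows at positions 1, 2 and 3
middle = df[1:3]       # rows at positions 1 and 2 (end position excluded)

print(from_second.index.tolist())  # [1, 2, 3]
print(middle.index.tolist())       # [1, 2]
```

Note that the sliced rows keep their original index labels, which matters in the next step.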
In the below snippet, the following operations happen.
- First row of the dataframe is assigned to df.columns using the df.iloc[0] statement
- Next, the dataframe is sliced from the second row using its index 1 and assigned back to df. This removes the first row with index 0 from the dataframe
- With these steps, the header of the dataframe is replaced with the first row of the dataframe.
This method will not reset the index of the rows. The remaining rows keep their original labels, so the first data row has index 1, the second has index 2, and so on. The 0 shown at the left of the header is the label of the promoted row, which pandas keeps as the name of the columns axis.
Snippet
df.columns = df.iloc[0]
df = df[1:]
df.head()
When you print the dataframe, you’ll see that the first row of the dataframe has become the header of the pandas dataframe.
Dataframe Looks Like
0 | sepal_length | sepal_width | petal_length | petal_width | flower_type |
---|---|---|---|---|---|
1 | spl_len(cm) | spl_wid(cm) | petal_len(cm) | petal_wid(cm) | flower_type |
2 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
3 | 4.9 | 3 | 1.4 | 0.2 | Iris-setosa |
4 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
5 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
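As a self-contained check of the slicing method, the sketch below uses a shortened version of the sample data (fewer columns and rows, chosen only to keep the example small) and prints what happens to the header, the row index and the name of the columns axis.

```python
import pandas as pd

# Shortened sample data; the first tuple holds the header values.
rows = [
    ("sepal_length", "sepal_width", "flower_type"),
    (5.1, 3.5, "Iris-setosa"),
    (4.9, 3.0, "Iris-setosa"),
]
df = pd.DataFrame(rows)

df.columns = df.iloc[0]   # first row becomes the column labels
df = df[1:]               # drop that row from the data

print(df.columns.tolist())  # ['sepal_length', 'sepal_width', 'flower_type']
print(df.index.tolist())    # [1, 2]
print(df.columns.name)      # 0
```

The row index starts at 1 because the slice does not renumber the rows.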
Using df.reset_index() to Replace Header With First Row
In this section, you’ll learn how to replace the header with the first row of the dataframe and renumber the remaining rows from 0.
Similar to the previous section, first assign the first row to the dataframe columns using df.columns = df.iloc[0].
Next, slice the dataframe from the second row using iloc[1:] and reset its row index using the reset_index() method.
The argument drop=True discards the old row index instead of inserting it into the dataframe as a new column. The header row itself is removed by the iloc[1:] slice.
This method will reset the index of the rows. The first data row will have index 0, the second row will have index 1, and so on.
Snippet
df.columns = df.iloc[0]
df = df.iloc[1:].reset_index(drop=True)
df.head()
Dataframe Looks Like
| | sepal_length | sepal_width | petal_length | petal_width | flower_type |
---|---|---|---|---|---|
0 | spl_len(cm) | spl_wid(cm) | petal_len(cm) | petal_wid(cm) | flower_type |
1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
2 | 4.9 | 3 | 1.4 | 0.2 | Iris-setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
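The effect of drop=True can be seen by comparing it with a plain reset_index() call. The sketch below uses shortened sample data and also shows one way to clear the 0 that may still be displayed next to the header: it is the name of the columns axis, inherited from the promoted row, and rename_axis(None, axis=1) removes it.

```python
import pandas as pd

rows = [
    ("sepal_length", "flower_type"),
    (5.1, "Iris-setosa"),
    (4.9, "Iris-setosa"),
]
df = pd.DataFrame(rows)
df.columns = df.iloc[0]

kept = df.iloc[1:].reset_index()            # old labels kept in a new 'index' column
clean = df.iloc[1:].reset_index(drop=True)  # old labels discarded

print(kept.columns.tolist())   # ['index', 'sepal_length', 'flower_type']
print(clean.index.tolist())    # [0, 1]

# The columns axis still carries the name 0 from the promoted row; clear it.
clean = clean.rename_axis(None, axis=1)
print(clean.columns.name)      # None
```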
Next, you’ll learn how to set the first two rows as headers.
Pandas Set First Two Rows as Header
A pandas dataframe supports multiple header rows for its columns. In this section, you’ll learn how to set the first two rows as the header. When you use this method, the pandas dataframe will have multiple header rows.
Similar to setting the first row as header, you can set the first two rows as a header by assigning them to the df.columns attribute using the statement df.columns = [df.iloc[0], df.iloc[1]].
After that, you can remove the first two rows from the dataframe by slicing the dataframe from the third row using df[2:].
If you want to reset the index, you can chain the reset_index() method with drop=True onto the slice.
Use the below snippet to set the first two rows of the dataframe as header rows.
Snippet
df.columns = [df.iloc[0], df.iloc[1]]
df = df[2:]
df.head()
When you print the dataframe using the df.head() method, you can see that the pandas dataframe has two column headers for each column.
Dataframe Looks Like
| | sepal_length | sepal_width | petal_length | petal_width | flower_type |
---|---|---|---|---|---|
1 | spl_len(cm) | spl_wid(cm) | petal_len(cm) | petal_wid(cm) | flower_type |
2 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
3 | 4.9 | 3 | 1.4 | 0.2 | Iris-setosa |
4 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
5 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
6 | 5 | 3.6 | 1.4 | 0.2 | Iris-setosa |
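A two-row header is stored as a MultiIndex on the columns. The sketch below is a variant of the snippet above rather than the exact same statement: it builds the MultiIndex explicitly with pd.MultiIndex.from_arrays from plain lists, resets the row index, and then selects a column by its pair of header values. The data is a shortened, illustrative version of the sample.

```python
import pandas as pd

rows = [
    ("sepal_length", "flower_type"),
    ("spl_len(cm)", "flower"),
    (5.1, "Iris-setosa"),
    (4.9, "Iris-setosa"),
]
df = pd.DataFrame(rows)

# Build the two-level header explicitly from the first two rows.
df.columns = pd.MultiIndex.from_arrays([df.iloc[0].tolist(), df.iloc[1].tolist()])
df = df[2:].reset_index(drop=True)

print(df.columns.nlevels)   # 2
print(df.columns.tolist())  # [('sepal_length', 'spl_len(cm)'), ('flower_type', 'flower')]

# A column is addressed by a tuple holding one value per header level.
print(df[("sepal_length", "spl_len(cm)")].tolist())  # [5.1, 4.9]
```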
Pandas Replace Header With nth Row
If the header values are in some other row of the dataframe, you can replace the header with the nth row.
Just use the index of that specific row in place of i in the df.iloc[i] statement.
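A minimal sketch of this idea is shown below. The helper name promote_row_to_header is hypothetical, and the choice to discard every row above the header row is an assumption made for this example; keep those rows if your data needs them.

```python
import pandas as pd

def promote_row_to_header(df: pd.DataFrame, i: int) -> pd.DataFrame:
    """Hypothetical helper: use the row at position i as the header."""
    # Assumption: rows above position i are not data, so they are dropped too.
    out = df.iloc[i + 1:].reset_index(drop=True)
    # A plain list avoids carrying the row label over as the columns name.
    out.columns = df.iloc[i].tolist()
    return out

rows = [
    ("some note", None),               # made-up junk line above the real header
    ("sepal_length", "flower_type"),
    (5.1, "Iris-setosa"),
    (4.9, "Iris-setosa"),
]
df = promote_row_to_header(pd.DataFrame(rows), 1)

print(df.columns.tolist())  # ['sepal_length', 'flower_type']
print(len(df))              # 2
```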
Pandas Set First Row as Header While Reading CSV
In this section, you’ll learn how to set the first row as a header while reading the data from a CSV file using the read_csv() function.
read_csv() accepts the parameter header. You can pass header=[0] to make the first row of the CSV file the header of the dataframe.
Use the below snippet to set the first row as a header while reading the CSV file to create the dataframe.
Snippet
import pandas as pd
df = pd.read_csv('iris.csv', header=[0])
df.head()
When printing the dataframe, you can see that the first row from the CSV file is set as the header of the dataframe.
Dataframe Looks Like
| | no | sepal_length | sepal_width | petal_length | petal_width | flower_type |
---|---|---|---|---|---|---|
0 | no | spl_len(cm) | spl_wid(cm) | petal_len(cm) | peral_wid(cm) | flower |
1 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
2 | 2 | 4.9 | 3 | 1.4 | 0.2 | Iris-setosa |
3 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
4 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
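To try the header parameter without a file on disk, the sketch below reads made-up CSV text through io.StringIO. It passes the integer form header=0, which points at the same first row as the list form used above.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk.
csv_text = (
    "sepal_length,sepal_width,flower_type\n"
    "5.1,3.5,Iris-setosa\n"
    "4.9,3.0,Iris-setosa\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=0)

print(df.columns.tolist())  # ['sepal_length', 'sepal_width', 'flower_type']
print(df.shape)             # (2, 3)
```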
Pandas Set Two Rows as Header While Reading CSV
In this section, you’ll learn how to set two rows as a header while reading the data from a CSV file.
read_csv() accepts the parameter header. You can pass header=[0, 1] to make the first two rows of the CSV file the header of the dataframe. This way, you can create a dataframe with multiple header rows.
Use the below snippet to set the first two rows as a header while reading the CSV file to create the dataframe.
Snippet
import pandas as pd
df = pd.read_csv('iris.csv', header=[0, 1])
df.head()
When you print the dataframe, you can see that the first two rows of the CSV file are made as the header of the dataframe.
Dataframe Looks Like
| | no | sepal_length | sepal_width | petal_length | petal_width | flower_type |
---|---|---|---|---|---|---|
| | no | spl_len(cm) | spl_wid(cm) | petal_len(cm) | peral_wid(cm) | flower |
0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
This is how you can make the first two rows the header of the dataframe while reading data from the CSV file.
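The two-row case can be tried the same way. The sketch below reads made-up CSV text through io.StringIO with header=[0, 1] and then selects a column by its pair of header values.

```python
import io
import pandas as pd

# Made-up CSV text whose first two lines are both header rows.
csv_text = (
    "sepal_length,flower_type\n"
    "spl_len(cm),flower\n"
    "5.1,Iris-setosa\n"
    "4.9,Iris-setosa\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(df.columns.nlevels)   # 2
print(df.columns.tolist())  # [('sepal_length', 'spl_len(cm)'), ('flower_type', 'flower')]
print(df[("sepal_length", "spl_len(cm)")].tolist())  # [5.1, 4.9]
```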
Conclusion
To summarize, you’ve learned how to replace the header with the first row of the dataframe and how to set the first two rows as the header of the dataframe.
Additionally, you’ve learned how to set the first row or the first two rows as a header while reading data from the CSV file.
If you have any questions, comment below.
Even after I use
df.columns = df.iloc[0]
df = df.iloc[1:].reset_index(drop=True)
my header still have 0 in the first column. How to get rid of that 0?