NumPy arrays do not carry an index column or column headers. You can pass the index and columns parameters to the pd.DataFrame() constructor to create a pandas dataframe from a NumPy array and set the index column and column headers at the same time.
This tutorial teaches you how to set the index column and column headers while creating a pandas dataframe from a NumPy array.
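As a point of comparison, here is a minimal sketch of what pandas does when neither parameter is passed. The variable names array and df_default are illustrative only. Without index and columns, pandas falls back to integer labels starting at 0 for both rows and columns.

```python
import numpy as np
import pandas as pd

# A small 2x3 array with no labels of its own
array = np.arange(6).reshape(2, 3)

# No index or columns passed: pandas assigns default integer labels
df_default = pd.DataFrame(array)

print(list(df_default.index))    # [0, 1]
print(list(df_default.columns))  # [0, 1, 2]
```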
Using Index And Columns Parameters
The pd.DataFrame() constructor accepts several parameters that set dataframe attributes at creation time.
- Use the index parameter and pass a list of labels to set the index column.
- Use the columns parameter and pass a list of column names to set the column headers.
The length of the index list must equal the number of rows in the NumPy array, and the length of the columns list must equal the number of columns in the NumPy array. Otherwise, pandas raises a ValueError with the message Shape of passed values is (X, X), indices imply (Y, Z).
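The length rule can be checked with a short sketch. It assumes a 3x3 array and deliberately passes only two index labels; the variable error_message is illustrative and simply captures whatever text pandas reports.

```python
import numpy as np
import pandas as pd

array = np.random.rand(3, 3)

# Only two labels for three rows, so the lengths do not match
try:
    pd.DataFrame(array,
                 index=[0, 1],
                 columns=['Column 1', 'Column 2', 'Column 3'])
    error_message = None
except ValueError as err:
    error_message = str(err)

# Prints the shape-mismatch message raised by pandas
print(error_message)
```

Catching the error like this is only for demonstration. In practice, comparing len(indexes) with array.shape[0] and len(headers) with array.shape[1] before calling the constructor avoids the exception altogether.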
Code
The following code demonstrates how to use the index and columns parameters.
import numpy as np
import pandas as pd

# 3x3 array of random floats in [0, 1)
array = np.random.rand(3, 3)

# One label per row and one header per column
indexes = [0, 1, 2]
headers = ['Column 1', 'Column 2', 'Column 3']

df = pd.DataFrame(array,
                  index=indexes,
                  columns=headers)
df
DataFrame Will Look Like
The dataframe will have indexes and column headers as passed in the parameters.
| | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 0 | 0.214015 | 0.729726 | 0.677093 |
| 1 | 0.600606 | 0.057173 | 0.524019 |
| 2 | 0.329387 | 0.107479 | 0.411816 |
Using First Row From NumPy Array as Headers
In some cases, the NumPy array might have header information in the first row of a two-dimensional array.
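In this case, you can
- use columns=numpyArray[0, 1:] to use the first row as the header.
- use data = numpyArray[1:, 1:] to use the rows from the second row onward as the data for the dataframe.
Both slices start at column 1 because the first column of the example array holds row labels rather than data.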
Code
The following code demonstrates how to use the first row in the NumPy array as a header of the pandas dataframe.
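import numpy as np
import pandas as pd

# First row holds the headers; first column holds row labels.
# NumPy stores this mixed array as strings.
numpyArray = np.array([['', 'Column 1', 'Column 2', 'Column 3'],
                       ['0', 5, 10, 15],
                       ['1', 20, 25, 30],
                       ['2', 35, 40, 45]])
indexes = [0, 1, 2]

# Skip the header row and the label column when selecting the data
df = pd.DataFrame(data=numpyArray[1:, 1:],
                  index=indexes,
                  columns=numpyArray[0, 1:])
df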
DataFrame Will Look Like
The first row from the NumPy array is set as the Column Header.
| | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 0 | 5 | 10 | 15 |
| 1 | 20 | 25 | 30 |
| 2 | 35 | 40 | 45 |
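One detail worth knowing: a NumPy array holds a single dtype, so an array that mixes header strings with numbers is stored entirely as strings, and the resulting dataframe cells are strings too. The sketch below is one possible way to get numeric columns back, assuming every data cell is a whole number; the astype(int) step is an addition and not part of the example above.

```python
import numpy as np
import pandas as pd

numpyArray = np.array([['', 'Column 1', 'Column 2', 'Column 3'],
                       ['0', 5, 10, 15],
                       ['1', 20, 25, 30],
                       ['2', 35, 40, 45]])

# The mixed array is stored with a Unicode string dtype
print(numpyArray.dtype.kind)  # U

# Convert the data block to integers before building the dataframe
df = pd.DataFrame(data=numpyArray[1:, 1:].astype(int),
                  index=[0, 1, 2],
                  columns=numpyArray[0, 1:])

# The columns now hold integers instead of strings
print(df.dtypes)
```

If the data contains decimals, astype(float) would be the analogous choice.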
Using First Column From NumPy Array As Index Column
In some cases, the NumPy array might have index information in the first column of a two-dimensional array.
In this case, you can
- use index = numpyArray[0:, 0] to use the first column as the index column.
- use data = numpyArray[0:, 1:] to use the columns from the second column onward as the data for the dataframe.
Code
The following code demonstrates how to use the first column in the NumPy array as an index column of the pandas dataframe.
import numpy as np
import pandas as pd

# First column holds the row labels; the rest is data
numpyArray = np.array([['0', 5, 10, 15],
                       ['1', 20, 25, 30],
                       ['2', 35, 40, 45]])
headers = ['Column 1', 'Column 2', 'Column 3']

# Column 0 becomes the index; columns 1 onward become the data
df = pd.DataFrame(data=numpyArray[0:, 1:],
                  index=numpyArray[0:, 0],
                  columns=headers)
df
DataFrame Will Look Like
| | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 0 | 5 | 10 | 15 |
| 1 | 20 | 25 | 30 |
| 2 | 35 | 40 | 45 |
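The two slicing techniques can also be combined when one array carries both the header row and the label column. The following is a minimal sketch under that assumption; the array layout mirrors the earlier examples and the name labeled is illustrative.

```python
import numpy as np
import pandas as pd

# Header row on top, row labels in the first column
labeled = np.array([['', 'Column 1', 'Column 2', 'Column 3'],
                    ['0', 5, 10, 15],
                    ['1', 20, 25, 30],
                    ['2', 35, 40, 45]])

df = pd.DataFrame(data=labeled[1:, 1:],     # skip header row and label column
                  index=labeled[1:, 0],     # first column, minus the corner cell
                  columns=labeled[0, 1:])   # first row, minus the corner cell

print(df)
```

Note that the index slice starts at row 1 so the empty corner cell is not picked up as a row label.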