Create a Pandas DataFrame From a NumPy Array and Specify Index Column and Column Headers – With Examples

NumPy arrays do not carry an index column or column headers.

You can pass the index and columns parameters to the pd.DataFrame() constructor to create a pandas dataframe from a NumPy array and specify its index column and column headers.

This tutorial teaches you how to set the index column and column headers while creating a pandas dataframe from a NumPy array.

Using the Index and Columns Parameters

The pd.DataFrame() constructor accepts several parameters that set different dataframe attributes at creation time.

  • Use the index parameter and pass a list of labels to use as the index column.
  • Use the columns parameter and pass a list of names to use as the column headers.

The length of the index list must equal the number of rows in the NumPy array, and the length of the columns list must equal the number of columns. Otherwise, pandas raises ValueError: Shape of passed values is (X, X), indices imply (Y, Z).
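To make the length requirement concrete, here is a minimal sketch that deliberately passes too few column names and captures the resulting error. The variable name error_message is only for illustration, and the exact wording of the message may vary between pandas versions.

```python
import numpy as np
import pandas as pd

array = np.random.rand(3, 3)

# Deliberately pass only two column names for a three-column array.
try:
    pd.DataFrame(array,
                 index=[0, 1, 2],
                 columns=['Column 1', 'Column 2'])
    error_message = None
except ValueError as err:
    # pandas reports the mismatch between the data shape and the labels.
    error_message = str(err)

print(error_message)
```

Adding the missing third column name makes the same call succeed.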

Code

The following code demonstrates how to use the index and columns parameters.

import numpy as np

import pandas as pd

array = np.random.rand(3, 3)

indexes = [0,1,2]

headers = ['Column 1', 'Column 2', 'Column 3']

df = pd.DataFrame(array,
                  index = indexes, 
                  columns = headers)

df

DataFrame Will Look Like

The dataframe will have indexes and column headers as passed in the parameters.

   Column 1  Column 2  Column 3
0  0.214015  0.729726  0.677093
1  0.600606  0.057173  0.524019
2  0.329387  0.107479  0.411816

Using First Row From NumPy Array as Headers

In some cases, the first row of a two-dimensional NumPy array holds the header information.

In this case, you can:

  • Use columns = numpyArray[0, 1:] to take the headers from the first row, skipping its first cell, which belongs to the index column.
  • Use data = numpyArray[1:, 1:] to take the dataframe's data from the second row and second column onward.

Code

The following code demonstrates how to use the first row in the NumPy array as a header of the pandas dataframe.

import numpy as np

import pandas as pd

numpyArray = np.array([['', 'Column 1', 'Column 2', 'Column 3'],
                       ['0', 5, 10, 15],
                       ['1', 20, 25, 30],
                       ['2', 35, 40, 45]
                      ])

indexes = [0,1,2]

df = pd.DataFrame(data = numpyArray[1:, 1:],
                  index = indexes, 
                  columns = numpyArray[0, 1:])

df

DataFrame Will Look Like

The first row of the NumPy array is used as the column headers.

  Column 1 Column 2 Column 3
0        5       10       15
1       20       25       30
2       35       40       45
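One detail worth knowing about this example: because the array mixes strings and integers, NumPy stores every element as a string, so the dataframe cells hold '5', '10', and so on rather than numbers. The following is a minimal sketch of one way to get numeric values back, assuming every data cell holds a valid integer. The name df_numeric is only for illustration.

```python
import numpy as np
import pandas as pd

numpyArray = np.array([['', 'Column 1', 'Column 2', 'Column 3'],
                       ['0', 5, 10, 15],
                       ['1', 20, 25, 30],
                       ['2', 35, 40, 45]])

# The mixed input is coerced to a single Unicode string dtype.
print(numpyArray.dtype.kind)

df = pd.DataFrame(data=numpyArray[1:, 1:],
                  index=[0, 1, 2],
                  columns=numpyArray[0, 1:])

# Convert the string cells to integers (assumes all cells parse as int).
df_numeric = df.astype(int)

print(df_numeric.dtypes)
```

If some cells might not be valid numbers, pd.to_numeric() applied per column with errors='coerce' is a more forgiving alternative.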

Using First Column From NumPy Array as the Index Column

In some cases, the first column of a two-dimensional NumPy array holds the row index information.

In this case, you can:

  • Use index = numpyArray[0:, 0] to use the first column as the index column.
  • Use data = numpyArray[0:, 1:] to take the dataframe's data from the second column onward.

Code

The following code demonstrates how to use the first column in the NumPy array as an index column of the pandas dataframe.

import numpy as np

import pandas as pd

numpyArray = np.array([
                       ['0', 5, 10, 15],
                       ['1', 20, 25, 30],
                       ['2', 35, 40, 45]
                      ])

headers = ['Column 1', 'Column 2', 'Column 3']

df = pd.DataFrame(data = numpyArray[0:, 1:],
                  index = numpyArray[0:, 0], 
                  columns = headers)

df

DataFrame Will Look Like

  Column 1 Column 2 Column 3
0        5       10       15
1       20       25       30
2       35       40       45
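The two slicing techniques can also be combined when a single array carries both a header row and an index column. The following is a minimal sketch under that assumption; the top-left cell is treated as a placeholder and skipped by both slices.

```python
import numpy as np
import pandas as pd

numpyArray = np.array([['', 'Column 1', 'Column 2', 'Column 3'],
                       ['0', 5, 10, 15],
                       ['1', 20, 25, 30],
                       ['2', 35, 40, 45]])

df = pd.DataFrame(data=numpyArray[1:, 1:],     # skip header row and index column
                  index=numpyArray[1:, 0],     # first column, below the header row
                  columns=numpyArray[0, 1:])   # first row, right of the corner cell

print(df)
```

Note that the index labels and the cell values are strings here, since NumPy stores a mixed array of strings and numbers with a string dtype.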
