How to Convert Pandas Dataframe to Numpy Array – With Examples

NumPy arrays support fast, vectorized operations, which makes them a convenient format for cleaning, normalizing, and scaling data before training machine learning models.

You can convert a pandas dataframe to a NumPy array using the df.to_numpy() method.

In this tutorial, you’ll learn how to convert a pandas dataframe to a NumPy array, with examples covering different conditions.

If You’re in a Hurry

You can use the code snippet below to convert a pandas dataframe into a NumPy array.

numpy_array = df.to_numpy()

print(type(numpy_array))

Output

<class 'numpy.ndarray'>

If You Want to Understand Details, Read on…

Let’s create a sample dataframe and convert it into a NumPy array.

Sample Dataframe

Create a sample dataframe that you’ll convert to a NumPy array. It contains two columns and four rows. One cell contains NaN, which denotes a missing value.

Code

import pandas as pd
import numpy as np

# np.nan marks the missing birth year (the np.NaN alias was removed in NumPy 2.0)
data = {'Age': [15, 25, 35, 45],
        'Birth Year': [2006, 1996, 1986, np.nan]}

df = pd.DataFrame(data, columns=['Age', 'Birth Year'])

df

Dataframe Will Look Like

   Age  Birth Year
0   15      2006.0
1   25      1996.0
2   35      1986.0
3   45         NaN

Next, you’ll convert this dataframe into a NumPy array.

Using to_numpy()

You can convert a pandas dataframe to a NumPy array using the method to_numpy().

It accepts three optional parameters.

  • dtype – the data type of the values in the resulting array. If omitted, pandas picks a common type that can hold the values of every column.
  • copy – copy=True guarantees that the returned array is a new copy and not a view of the dataframe’s data. The default is False, which lets pandas return a view when one is available, although it does not guarantee that no copy is made.
  • na_value – the value to use for any missing value in the array. If omitted, the missing-value marker depends on the resulting dtype.

This is the method the pandas documentation recommends for converting a dataframe into a NumPy array.
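To make the dtype and copy parameters concrete, here is a minimal sketch. It rebuilds the tutorial’s sample dataframe so that it runs on its own; the variable names (default_array, float32_array, copied) are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in this tutorial.
df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Without dtype, pandas picks one common type for all columns
# (float64 here, because one column holds NaN).
default_array = df.to_numpy()
print(default_array.dtype)   # float64

# dtype requests a specific element type for the result.
float32_array = df.to_numpy(dtype="float32")
print(float32_array.dtype)   # float32

# copy=True guarantees the result does not share memory with the dataframe,
# so editing the array leaves the dataframe untouched.
copied = df.to_numpy(copy=True)
copied[0, 0] = -1
print(df.loc[0, "Age"])      # 15
```

With copy=False (the default) you should not rely on either behavior: the result may or may not share memory with the dataframe, so treat it as read-only unless you request a copy.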

Code

Running the snippet below converts the dataframe into a NumPy array.

  • The missing value is left as nan because no na_value is specified.

numpy_array = df.to_numpy()
print(numpy_array)
print(type(numpy_array))

Output

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45.   nan]]
<class 'numpy.ndarray'>

This is how you can convert a pandas dataframe into a numpy array.

Using dataframe.values

In this section, you’ll convert the dataframe into a NumPy array using df.values. values is an attribute, not a method, and it returns the NumPy array representation of the dataframe.

Only the cell values in the dataframe are returned as an array. The row and column axis labels are dropped.

Code

Use the following code to convert the dataframe into a NumPy array.

values_array = df.values
print(values_array)
print(type(values_array))

Output

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45.   nan]]
<class 'numpy.ndarray'>

This is how you can convert a dataframe into a NumPy array using the values attribute of the dataframe.
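As a quick check, the sketch below compares df.values with df.to_numpy() on the sample data and shows where the dropped labels can still be read. The names same, column_labels, and row_labels are illustrative, and the equal_nan argument assumes a reasonably recent NumPy release.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

values_array = df.values
numpy_array = df.to_numpy()

# equal_nan=True treats the NaN cells as equal to each other.
same = np.array_equal(values_array, numpy_array, equal_nan=True)
print(same)            # True

# The labels are not part of the array; read them from the dataframe.
column_labels = df.columns.to_list()
row_labels = df.index.to_list()
print(column_labels)   # ['Age', 'Birth Year']
print(row_labels)      # [0, 1, 2, 3]
```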

Convert Select Columns into Numpy Array

You can convert selected columns of a dataframe into a NumPy array by selecting the column subset and calling the to_numpy() method on it.

For example, df[['Age']] returns a dataframe containing just the Age column. When you invoke the to_numpy() method on the resulting dataframe, you’ll get a NumPy array of the Age column.

Snippet

age_array = df[['Age']].to_numpy()
print(age_array)

You’ll see the age column as a NumPy array.

Output

[[15]
 [25]
 [35]
 [45]]

This is how you can convert a select column of a pandas dataframe into a numpy array.
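The double brackets matter for the shape of the result. The sketch below, which uses illustrative variable names, contrasts selecting a one-column dataframe with selecting a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Double brackets select a dataframe, so the result is two-dimensional.
age_2d = df[["Age"]].to_numpy()
print(age_2d.shape)   # (4, 1)

# Single brackets select a Series, so the result is one-dimensional.
age_1d = df["Age"].to_numpy()
print(age_1d.shape)   # (4,)

# Several columns can be selected the same way.
both = df[["Age", "Birth Year"]].to_numpy()
print(both.shape)     # (4, 2)
```

Use the Series form when a flat vector is needed, and the dataframe form when downstream code expects a two-dimensional array with one column per feature.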

Handle Missing Values while converting Dataframe to Numpy Array

In this section, you’ll learn how to handle missing values while converting a pandas dataframe to a numpy array.

You can replace missing values by passing the replacement value to the na_value parameter.

For example, with na_value=0, the missing values are replaced with 0.

Snippet

The sample dataframe you created earlier has one missing value for the birth year. When you execute the snippet below on the sample dataframe, the missing year is replaced with 1950.

# Pass a number rather than the string '1950' so the fill value matches the numeric columns
array = df.to_numpy(na_value=1950)
print(array)

Output

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45. 1950.]]

This is how you can replace a missing value with a value while converting a dataframe into a numpy array.
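Here is a minimal sketch of the na_value=0 case mentioned earlier in this section, again on the sample data. The names zero_filled and missing_count are illustrative, and na_value on DataFrame.to_numpy() assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Replace the missing birth year with 0 in the resulting array.
zero_filled = df.to_numpy(na_value=0)
print(zero_filled[3, 1])   # 0.0

# The replacement applies to the array only; the dataframe built here
# still has its missing value.
missing_count = int(df["Birth Year"].isna().sum())
print(missing_count)       # 1
```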

Handling Index While Converting Pandas Dataframe to Numpy Array

You may need to include or exclude the index column of the dataframe while converting it into a NumPy array.

You can control this by using the method to_records().

to_records() converts the dataframe into a NumPy record array. It accepts three optional parameters.

  • index – Flag that controls whether the index is included in the resulting record array. The default is True, which includes the index as the first field.
  • column_dtypes – Data types to use for the columns in the resulting record array.
  • index_dtypes – Data type to use for the index columns, if they are included in the record array. This applies only if index=True.
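The following is a minimal sketch of how these parameters might be used on the sample dataframe. The variable names (records, no_index, typed) are illustrative, not part of the pandas API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Default: the index is included as the first field, named 'index'.
records = df.to_records()
print(records)
print(records.dtype.names)    # ('index', 'Age', 'Birth Year')

# index=False drops the index from the record array.
no_index = df.to_records(index=False)
print(no_index.dtype.names)   # ('Age', 'Birth Year')
print(no_index["Age"])        # [15 25 35 45]

# column_dtypes overrides the stored type of individual columns.
typed = df.to_records(index=False, column_dtypes={"Age": "int32"})
print(typed.dtype["Age"])     # int32
```

Unlike a plain array, a record array keeps the column names, so each field can be read by name, as with no_index["Age"].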

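Output

With the default index=True, printing df.to_records() for the sample dataframe shows the index as the first field of each record.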

[(0, 15, 2006.) (1, 25, 1996.) (2, 35, 1986.) (3, 45, nan)]
