How to Convert Pandas Dataframe to Numpy Array – With Examples

A pandas dataframe is a two-dimensional data structure that stores data in rows and columns.

NumPy arrays provide fast, vectorized operations that you can use to clean, normalize, and scale data before training machine learning models.

While the pandas dataframe holds the raw data, many numerical and machine learning routines expect a NumPy array, so you often need to convert the dataframe before training a model.

You can convert pandas dataframe to numpy array using the df.to_numpy() method available in the dataframe object.

In this tutorial, you’ll learn how to convert pandas dataframe to numpy array with examples and different conditions.

If You’re in a Hurry…

You can use the below code snippet to convert pandas dataframe into numpy array.

numpy_array = df.to_numpy()

print(type(numpy_array))

Output

<class 'numpy.ndarray'>

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn the different methods available to convert a pandas dataframe to a NumPy array and how to use them in various scenarios.

Sample Dataframe

Create a sample dataframe that you’ll use to convert to a NumPy array. It contains two columns and four rows. One cell contains NaN, which denotes a missing value.

Snippet

import pandas as pd
import numpy as np

# np.nan marks the missing birth year in the last row
data = {'Age': [15, 25, 35, 45],
        'Birth Year': [2006, 1996, 1986, np.nan]
        }

df = pd.DataFrame(data, columns=['Age', 'Birth Year'])

df

Dataframe Will Look Like

   Age  Birth Year
0   15      2006.0
1   25      1996.0
2   35      1986.0
3   45         NaN

Now, you’ll use this dataframe to convert it into a numpy array.

Using to_numpy()

You can convert a pandas dataframe to a NumPy array using the method to_numpy().

It accepts three optional parameters.

  • dtype – the data type of the values in the resulting array.
  • copy – copy=True guarantees that the returned array is a new copy. copy=False, the default, lets pandas return a view of the existing data when possible, although a copy may still be made.
  • na_value – the value to use for any missing values in the array. You can pass any value here.

Note: This is an officially recommended method to convert a pandas dataframe into a NumPy array.
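The three parameters listed above can be tried out on the sample data. The snippet below is a minimal sketch; the variable names (as_float32, independent, filled) are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in this tutorial.
df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# dtype: request a specific element type for the whole array.
as_float32 = df.to_numpy(dtype="float32")

# copy=True: the result does not share memory with the dataframe,
# so changing the array leaves the dataframe untouched.
independent = df.to_numpy(copy=True)
independent[0, 0] = -1

# na_value: replace missing entries during the conversion.
filled = df.to_numpy(na_value=0)

print(as_float32.dtype)
print(df.loc[0, "Age"])
print(filled[3, 1])
```

Here the dataframe still holds 15 in its first Age cell after the array is modified, and the missing birth year becomes 0 in the filled array.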

Snippet

When you execute the below snippet, the dataframe will be converted into a NumPy array. The missing value will not be replaced because you are not passing the na_value parameter.

Finally, when you print the type of the array using the type() function, you’ll see the output <class 'numpy.ndarray'>, which means the dataframe is successfully converted into a NumPy array.

numpy_array = df.to_numpy()

print(numpy_array)

print(type(numpy_array))

Output

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45.   nan]]

<class 'numpy.ndarray'>

This is how you can convert a pandas dataframe into a numpy array.
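Once the data is a NumPy array, the scaling mentioned at the start of this tutorial can be done with plain array arithmetic. The sketch below uses min-max scaling as one possible approach. The choice of np.nanmin and np.nanmax, which skip the missing value, is an assumption made for this example and not something the conversion itself requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})
arr = df.to_numpy()

# Column-wise minimum and maximum, ignoring NaN entries.
col_min = np.nanmin(arr, axis=0)
col_max = np.nanmax(arr, axis=0)

# Min-max scaling: each column is mapped onto the range [0, 1].
# The missing value stays NaN in the result.
scaled = (arr - col_min) / (col_max - col_min)

print(scaled)
```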

Using dataframe.values

In this section, you’ll convert the dataframe into a NumPy array using df.values. The values attribute returns the NumPy array representation of the dataframe.

Only the cell values in the dataframe will be returned as an array. Row and column axis labels will be removed.

Snippet

Use the below snippet to convert the dataframe into a NumPy array using the values attribute.

values_array = df.values

print(values_array)

print(type(values_array))

Output

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45.   nan]]

<class 'numpy.ndarray'>

This is how you can convert a dataframe into a NumPy array using the values attribute of the dataframe.
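For the sample dataframe, df.values and df.to_numpy() give the same result. The short check below is a sketch of how to confirm that; it relies on the equal_nan argument of np.array_equal (available in recent NumPy releases) so that the NaN cells are treated as equal.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

from_values = df.values        # attribute, no parentheses
from_method = df.to_numpy()    # method call

# Compare element by element, treating NaN == NaN as a match.
same = np.array_equal(from_values, from_method, equal_nan=True)
print(same)
```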

Convert Select Columns into Numpy Array

You can convert select columns of a dataframe into a NumPy array using the to_numpy() method by passing the column subset of the dataframe.

For example, df[['Age']] will return just the age column. When you invoke the to_numpy() method in the resultant dataframe, you’ll get the numpy array of the age column in the dataframe.

Snippet

age_array = df[['Age']].to_numpy()

print(age_array)

You’ll see the age column as a NumPy array.

Output

[[15]
 [25]
 [35]
 [45]]

This is how you can convert a select column of a pandas dataframe into a numpy array.
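The shape of the result depends on how the column is selected. The sketch below contrasts double brackets (a one-column dataframe) with single brackets (a series); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Double brackets select a dataframe, so the array is 2D: shape (4, 1).
age_2d = df[["Age"]].to_numpy()

# Single brackets select a series, so the array is 1D: shape (4,).
age_1d = df["Age"].to_numpy()

# Several columns can be selected the same way: shape (4, 2).
both = df[["Age", "Birth Year"]].to_numpy()

print(age_2d.shape, age_1d.shape, both.shape)
```

A 1D array is often what you want for a target column, while a 2D array keeps the rows-by-columns layout.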

Handle Missing Values While Converting Dataframe to Numpy Array

In this section, you’ll learn how to handle missing values while converting a pandas dataframe to a numpy array.

You can replace missing values by passing the value to be used in case of missing values using the na_value parameter.

If you use na_value = 0, the missing values will be replaced with 0.

In the sample dataframe you created earlier, there is one missing value for the birth year. When you execute the below snippet on the sample dataframe, the missing year will be replaced with 1950.

Snippet

array = df.to_numpy(na_value=1950)

print(array)

Output

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45. 1950.]]

This is how you can replace a missing value with a value while converting a dataframe into a numpy array.
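An alternative is to fill the missing values in the dataframe first and convert afterwards. The sketch below shows both routes side by side. Using fillna() followed by astype("int64") is one possible approach, assumed here for the case where you want an integer array rather than floats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Route 1: replace the missing value during conversion.
# The array stays float64 because the column held NaN.
direct = df.to_numpy(na_value=1950)

# Route 2: fill the column first, cast to integers, then convert.
filled_df = df.fillna({"Birth Year": 1950})
as_int = filled_df.astype("int64").to_numpy()

print(direct.dtype, as_int.dtype)
print(as_int)
```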

Handling Index While Converting Pandas Dataframe to Numpy Array

You may need to include or exclude the index column of the dataframe while converting it into an array.

You can control this by using the method to_records().

to_records() will convert the dataframe into a NumPy record array. It accepts three optional parameters.

  • index – Flag to denote whether the index column must be included in the resultant record array. By default, it’s True and the index column will be included in the resultant array.
  • column_dtypes – Data types of the columns in the resultant record array.
  • index_dtypes – Data type to be used for the index columns, if the index columns are included in the record array. This is applied only if index=True.
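The two dtype parameters above can be sketched as follows. The chosen types (int32 for the Age column and for the index) are arbitrary and only serve to show that the resulting fields change type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# column_dtypes maps column names to the dtype of the matching field;
# index_dtypes sets the dtype used for the index field.
rec = df.to_records(index=True,
                    column_dtypes={"Age": "int32"},
                    index_dtypes="int32")

print(rec.dtype.names)
print(rec["Age"].dtype, rec["index"].dtype, rec["Birth Year"].dtype)
```

Columns that are not listed in column_dtypes, such as Birth Year here, keep the dtype they had in the dataframe.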

Converting With Index

Use the below snippet to convert a pandas dataframe into a NumPy record array.

You’ll explicitly specify index=True to include the index column in the resultant record array. Since True is the default, as discussed above, the index column will be included even if you do not use this parameter at all.

Snippet

res = df.to_records(index=True)

print(res)

You can see the index values 0, 1, 2, 3 in each record.

Output

[(0, 15, 2006.) (1, 25, 1996.) (2, 35, 1986.) (3, 45, nan)]

Converting Without Index

In this section, you’ll convert a pandas dataframe into a numpy record array without the index columns.

You can convert without index using the parameter index=False.

Snippet

res = df.to_records(index=False)

print(res)

You can see that the row index 0, 1, 2, 3 is not included in the records.

Output

[(15, 2006.) (25, 1996.) (35, 1986.) (45, nan)]

Convert Pandas Dataframe to Numpy Array with Headers

In this section, you’ll learn how to convert a pandas dataframe to a NumPy array with the column headers.

Even if you don’t include the index column while converting the dataframe into a record array, the column names will still be stored as field names.

A record array is a subclass of ndarray that allows field access by key or by attribute, e.g. array['Age'] or array.Age. Attribute access works only when the field name is a valid Python identifier, so a name such as 'Birth Year' must be accessed by key.

Snippet

array = df.to_records(index=False)

print(array['Age'])

Output

[15 25 35 45]
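The two access styles described above can be compared in a short sketch. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})
rec = df.to_records(index=False)

# The column headers survive as field names of the record array.
print(rec.dtype.names)

ages_by_key = rec["Age"]      # key access works for every field
ages_by_attr = rec.Age        # attribute access works for identifier-like names
years = rec["Birth Year"]     # the space in the name rules out attribute access

print(ages_by_key, ages_by_attr, years)
```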

Also, when you convert the record array back into a dataframe, you’ll see the column names and the row index again, as shown below. The ravel() call flattens the record array, which is already one-dimensional here.

res_pd = pd.DataFrame(array.ravel())

print(res_pd)

Output

   Age  Birth Year
0   15      2006.0
1   25      1996.0
2   35      1986.0
3   45         NaN

This is how you can handle column names while converting a dataframe into a numpy record array.
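The round trip from dataframe to record array and back can be checked in a few lines. The sketch below passes the record array straight to pd.DataFrame, without ravel(), on the assumption that the record array is already one-dimensional.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Dataframe -> record array (headers kept as field names).
rec = df.to_records(index=False)

# Record array -> dataframe (field names become column headers again).
restored = pd.DataFrame(rec)

print(list(restored.columns))
print(restored)
```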

Convert Pandas Dataframe to 2D Numpy Array

In this section, you’ll learn how to convert a pandas dataframe to a 2D NumPy array. A dataframe is two-dimensional, so to_numpy() returns a 2D array of shape (rows, columns) regardless of how many columns the dataframe has.

For example, create a dataframe with two columns A and B and invoke the to_numpy() method.

Snippet

arr_2d = pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()

arr_2d

When you print the array, you can see the two-dimensional array. Depending on your platform and NumPy version, the dtype=int64 suffix may be omitted.

Output

array([[1, 3],
       [2, 4]], dtype=int64)

This is how you can convert a pandas dataframe into a 2D array.
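The dimensionality of the result can be inspected through the ndim and shape attributes. The sketch below uses two made-up dataframes, one with three columns and one with a single column, to show that the array is 2D in both cases.

```python
import pandas as pd

# Three columns, two rows.
wide = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
wide_arr = wide.to_numpy()

# A single column still gives a 2D array with one column.
single_arr = pd.DataFrame({"A": [1, 2]}).to_numpy()

print(wide_arr.ndim, wide_arr.shape)
print(single_arr.ndim, single_arr.shape)
```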

Convert Pandas Dataframe to Numpy Structured Array

A structured NumPy array is an array of structures (similar to a C struct). Regular NumPy arrays are homogeneous, which means they contain values of only one data type.

So when you want an array that holds different types, you can define a structure whose fields have different types and create a structured NumPy array from it.

The below snippet shows how a NumPy structured array is defined. Each record is a tuple, and the dtype argument lists the name and type of every field.

The record array returned by to_records() is a structured array with attribute access added, so a dataframe whose columns have different types can be converted into a structured array in the same way.

Snippet

x = np.array([('Sarvah', 3, 12.0), ('Vikram', 31, 58.0)],
             dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

x

Output

array([('Sarvah', 3, 12.), ('Vikram', 31, 58.)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])

When you print the array, you can see the dtype of each field.
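To get from a dataframe to such an array, one possible route is to_records() with column_dtypes, followed by np.asarray. The sketch below is a minimal example under stated assumptions: the people dataframe is made up to mirror the data above, and np.asarray is used on the assumption that a plain ndarray is wanted instead of the record array subclass.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with columns of three different types.
people = pd.DataFrame({"name": ["Sarvah", "Vikram"],
                       "age": [3, 31],
                       "weight": [12.0, 58.0]})

# Convert to a record array, choosing the dtype of each field.
rec = people.to_records(index=False,
                        column_dtypes={"name": "U10", "age": "i4", "weight": "f4"})

# View the data as a plain ndarray with a structured dtype.
structured = np.asarray(rec)

print(structured.dtype.names)
print(structured["name"], structured["age"], structured["weight"])
```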

Conclusion

To summarize, you’ve learned the different methods available to convert a pandas dataframe into a NumPy array.

You’ve also learned how to convert select columns into a NumPy array and how to handle indexes and column names while converting the dataframe into a NumPy array. Also, you’ve learned how to create a NumPy structured array from a pandas dataframe.

You can use these methods to convert the data into an array that you can normalize and scale as needed for your machine learning activities.

If you have any questions, comment below.
