NumPy arrays provide fast and versatile ways to normalize data, which helps you clean and scale data when training machine learning models.
You can convert a pandas dataframe to a NumPy array using the df.to_numpy() method.
In this tutorial, you’ll learn how to convert a pandas dataframe to a NumPy array, with examples covering different conditions.
If You’re in a Hurry
You can use the code snippet below to convert a pandas dataframe into a NumPy array.
numpy_array = df.to_numpy()
print(type(numpy_array))
Output
<class 'numpy.ndarray'>
If You Want to Understand Details, Read on…
Let’s create a sample dataframe and convert it into a NumPy array.
Sample Dataframe
Create a sample dataframe to convert into a NumPy array. It has two columns and four rows, and one cell contains NaN, which denotes a missing value.
Code
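import pandas as pd
import numpy as np
# np.nan marks the missing birth year (the np.NaN alias was removed in NumPy 2.0)
data = {'Age': [15, 25, 35, 45],
        'Birth Year': [2006, 1996, 1986, np.nan]
        }
df = pd.DataFrame(data, columns=['Age', 'Birth Year'])
df  # displays the dataframe in a notebook; use print(df) in a script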
Dataframe Will Look Like
| | Age | Birth Year |
|---|---|---|
| 0 | 15 | 2006.0 |
| 1 | 25 | 1996.0 |
| 2 | 35 | 1986.0 |
| 3 | 45 | NaN |
Now, you’ll convert this dataframe into a NumPy array.
Using to_numpy()
You can convert a pandas dataframe to a NumPy array using the to_numpy() method.
It accepts three optional parameters.
- dtype – the data type of the values in the resulting array.
- copy – copy=True guarantees that a new array is created. copy=False is the default; it avoids copying when possible and may return a view of the dataframe's data, but it does not guarantee a view.
- na_value – the value to use for any missing values in the array. You can pass any value here.
This is the method the pandas documentation recommends for converting a dataframe into a NumPy array.
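The snippet below is a minimal sketch of the dtype and copy parameters. It rebuilds the same sample dataframe so it runs on its own; the names float32_array and copied_array are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample dataframe as in this tutorial
df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# dtype: request a specific data type for the resulting array
float32_array = df.to_numpy(dtype="float32")
print(float32_array.dtype)  # float32

# copy=True: guarantees a new array, so editing it leaves df untouched
copied_array = df.to_numpy(copy=True)
copied_array[0, 0] = 99
print(df.loc[0, "Age"])  # 15
```

Because copy=True always produces an independent array, it is the safer choice when you plan to modify the result in place.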
Code
When you execute the snippet below, the dataframe is converted into a NumPy array.
- The missing value stays as nan because no na_value is specified.
numpy_array = df.to_numpy()
print(numpy_array)
print(type(numpy_array))
Output
[[ 15. 2006.]
[ 25. 1996.]
[ 35. 1986.]
[ 45. nan]]
<class 'numpy.ndarray'>
This is how you can convert a pandas dataframe into a NumPy array.
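In the output above, the Age values appear as 15., 25., and so on, because a NumPy array holds a single dtype. The sketch below, which rebuilds the sample dataframe and uses an illustrative extra Name column, shows how to_numpy() picks one common dtype for all columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Column dtypes differ: Age is int64, Birth Year is float64 (because of NaN)
print(df.dtypes)

# to_numpy() picks one common dtype that can hold every column
numpy_array = df.to_numpy()
print(numpy_array.dtype)  # float64

# Adding a text column (hypothetical "Name") forces the common dtype to object
mixed_array = df.assign(Name=["a", "b", "c", "d"]).to_numpy()
print(mixed_array.dtype)  # object
```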
Using dataframe.values
In this section, you’ll convert the dataframe into a NumPy array using df.values. The values attribute returns the NumPy array representation of the dataframe. Note that the pandas documentation recommends to_numpy() over values.
Only the cell values of the dataframe are returned as an array; the row and column axis labels are dropped.
Code
Use the following code to convert the dataframe into a NumPy array.
values_array = df.values
print(values_array)
print(type(values_array))
Output
[[ 15. 2006.]
[ 25. 1996.]
[ 35. 1986.]
[ 45. nan]]
<class 'numpy.ndarray'>
This is how you can convert a dataframe into a NumPy array using the values attribute of the dataframe.
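As a quick check, the sketch below (rebuilding the sample dataframe) confirms that values and to_numpy() return the same cells for this dataframe, and shows one way to keep the dropped labels alongside the array. The names column_labels and row_labels are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

values_array = df.values
numpy_array = df.to_numpy()

# Both hold the same cells; equal_nan=True treats the two NaNs as equal
print(np.array_equal(values_array, numpy_array, equal_nan=True))  # True

# The labels are not part of the array, so keep them separately if needed
column_labels = df.columns.to_list()
row_labels = df.index.to_list()
print(column_labels)  # ['Age', 'Birth Year']
print(row_labels)     # [0, 1, 2, 3]
```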
Convert Selected Columns into a NumPy Array
You can convert selected columns of a dataframe into a NumPy array by first selecting the subset of columns and then calling the to_numpy() method on it.
For example, df[['Age']] returns a dataframe containing just the Age column. When you invoke the to_numpy() method on this resulting dataframe, you get a NumPy array of the Age column.
Snippet
age_array = df[['Age']].to_numpy()
print(age_array)
You’ll see the Age column as a NumPy array.
Output
[[15]
[25]
[35]
[45]]
This is how you can convert a selected column of a pandas dataframe into a NumPy array.
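The output above is column-shaped because df[['Age']] is a one-column dataframe. The sketch below, which rebuilds the sample dataframe, contrasts it with single-bracket selection (a Series, giving a 1-D array) and with selecting several columns. The names age_2d, age_1d, and both are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Double brackets select a one-column DataFrame -> 2-D array, shape (4, 1)
age_2d = df[["Age"]].to_numpy()
print(age_2d.shape)  # (4, 1)

# Single brackets select a Series -> 1-D array, shape (4,)
age_1d = df["Age"].to_numpy()
print(age_1d)  # [15 25 35 45]

# A list of several column names works the same way, in the order given
both = df[["Birth Year", "Age"]].to_numpy()
print(both.shape)  # (4, 2)
```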
Handle Missing Values While Converting a Dataframe to a NumPy Array
In this section, you’ll learn how to handle missing values while converting a pandas dataframe to a NumPy array.
You can replace missing values by passing a replacement value to the na_value parameter.
If you use na_value=0, the missing values will be replaced with 0.
Snippet
The sample dataframe you created earlier has one missing value in the Birth Year column. When you execute the snippet below on it, the missing year is replaced with 1950.
array = df.to_numpy(na_value=1950)  # pass a number, not the string '1950'
print(array)
Output
[[ 15. 2006.]
[ 25. 1996.]
[ 35. 1986.]
[ 45. 1950.]]
This is how you can replace a missing value while converting a dataframe into a NumPy array.
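na_value only affects the returned array. The sketch below, rebuilding the sample dataframe, compares it with filling the dataframe first using fillna(), a different pandas method swapped in here for comparison. The names filled_array and via_fillna are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# na_value fills the gap only in the returned array
filled_array = df.to_numpy(na_value=1950)

# The same cells via fillna(), which returns a new, filled dataframe first
via_fillna = df.fillna(1950).to_numpy()

print(np.array_equal(filled_array, via_fillna))  # True
# The original dataframe still has its missing value
print(df["Birth Year"].isna().sum())  # 1
```

Use na_value when you only need a filled array, and fillna() when you also want to keep working with the filled dataframe.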
Handling the Index While Converting a Pandas Dataframe to a NumPy Array
You may need to include or exclude the index column of the dataframe while converting it into a NumPy array.
You can control this by using the method to_records().
to_records() converts the dataframe into a NumPy record array. It accepts three optional parameters.
- index – flag that indicates whether the index column is included in the resulting record array. It defaults to True, so the index column is included by default.
- column_dtypes – data types of the columns in the resulting record array.
- index_dtypes – data type to use for the index column(s), if the index is included in the record array. This applies only if index=True.
Output
[(0, 15, 2006.) (1, 25, 1996.) (2, 35, 1986.) (3, 45, nan)]
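The sketch below, rebuilding the sample dataframe, contrasts the default index=True with index=False and shows column_dtypes narrowing Age to int32. The names with_index and no_index, and the int32 choice, are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [15, 25, 35, 45],
                   "Birth Year": [2006, 1996, 1986, np.nan]})

# Default (index=True): the unnamed index becomes a field called "index"
with_index = df.to_records()
print(with_index.dtype.names)  # ('index', 'Age', 'Birth Year')

# index=False drops it; column_dtypes maps a column name to a dtype
no_index = df.to_records(index=False, column_dtypes={"Age": "int32"})
print(no_index.dtype.names)   # ('Age', 'Birth Year')
print(no_index["Age"].dtype)  # int32

# Fields of a record array are accessed by column name
print(no_index["Birth Year"])
```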