A pandas dataframe is a two-dimensional data structure that stores data in rows and columns.

**You can convert a pandas dataframe to a NumPy array using the `df.to_numpy()` method.**

NumPy arrays provide fast and versatile ways to normalize data, which helps you clean and scale data when training machine learning models.

In this tutorial, you’ll learn how to convert a pandas dataframe to a NumPy array, with examples covering different conditions.

**If You’re in a Hurry…**

You can use the code snippet below to convert a pandas dataframe into a NumPy array.

```
numpy_array = df.to_numpy()
print(type(numpy_array))
```

**Output**

`<class 'numpy.ndarray'>`

**If You Want to Understand Details, Read on…**

In this tutorial, you’ll learn the different methods available to convert a pandas dataframe to a NumPy array and how to apply them in various scenarios.

## Sample Dataframe

Create a sample dataframe that you’ll convert to a NumPy array. It contains two columns and four rows. One cell contains `NaN`, which denotes a missing value.

**Snippet**

```
import pandas as pd
import numpy as np

# 'Birth Year' has one missing value (np.nan), so pandas stores it as float64
data = {'Age': [15, 25, 35, 45],
        'Birth Year': [2006, 1996, 1986, np.nan]
        }
df = pd.DataFrame(data, columns=['Age', 'Birth Year'])
df
```

**Dataframe Will Look Like**

|   | Age | Birth Year |
|---|---|---|
| 0 | 15 | 2006.0 |
| 1 | 25 | 1996.0 |
| 2 | 35 | 1986.0 |
| 3 | 45 | NaN |

Now, you’ll convert this dataframe into a NumPy array.

## Using to_numpy()

You can convert a pandas dataframe to a NumPy array using the `to_numpy()` method.

It accepts three **optional** parameters.

- `dtype`: the data type of the values in the resulting array.
- `copy`: `copy=True` always makes a new copy of the data. `copy=False`, the default, lets pandas return a view of the existing data when that is possible, but it does not guarantee that no copy is made.
- `na_value`: the value to use for any missing value in the array. You can pass any value here.

**Note:** This is an officially recommended method to convert a pandas dataframe into a NumPy array.
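As a rough illustration of the `dtype` and `copy` parameters described above, here is a minimal sketch. It rebuilds the sample dataframe so it can run on its own; the variable names `as_float32` and `independent` are made up for this example.

```python
import numpy as np
import pandas as pd

# Same sample data as the tutorial's dataframe
df = pd.DataFrame({'Age': [15, 25, 35, 45],
                   'Birth Year': [2006, 1996, 1986, np.nan]})

# dtype: ask for a specific element type in the result
as_float32 = df.to_numpy(dtype='float32')
print(as_float32.dtype)

# copy=True: the result is guaranteed not to share memory with the dataframe,
# so changing the array leaves the dataframe untouched
independent = df.to_numpy(copy=True)
independent[0, 0] = -1
print(df.loc[0, 'Age'])
```

The `na_value` parameter is covered in its own section further down.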

**Snippet**

When you execute the snippet below, the dataframe is converted into a NumPy array. The missing value is left as `nan` because you are not passing `na_value`.

Finally, when you print the type of the array using the `type()` function, you’ll see `<class 'numpy.ndarray'>`, which means the dataframe was successfully converted into a NumPy array.

```
numpy_array = df.to_numpy()
print(numpy_array)
print(type(numpy_array))
```

**Output**

```
[[ 15. 2006.]
[ 25. 1996.]
[ 35. 1986.]
[ 45. nan]]
<class 'numpy.ndarray'>
```

This is how you can convert a pandas dataframe into a NumPy array.
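The output above shows `15.` rather than `15` because a NumPy array holds a single data type, so the integer `Age` column is upcast to match the float `Birth Year` column. The sketch below checks this on the same sample data; it is only an illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [15, 25, 35, 45],
                   'Birth Year': [2006, 1996, 1986, np.nan]})

print(df.dtypes)            # one dtype per column: integer and float

numpy_array = df.to_numpy()
print(numpy_array.dtype)    # a single dtype for the whole array
print(numpy_array.shape)    # (rows, columns)
```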

## Using dataframe.values

In this section, you’ll convert the dataframe into a NumPy array using `df.values`. The `values` attribute returns the NumPy array representation of the dataframe.

Only the cell values in the dataframe are returned as an array. The row and column axis labels are removed.

**Snippet**

Use the snippet below to convert the dataframe into a NumPy array using the `values` attribute.

```
values_array = df.values
print(values_array)
print(type(values_array))
```

**Output**

```
[[ 15. 2006.]
[ 25. 1996.]
[ 35. 1986.]
[ 45. nan]]
<class 'numpy.ndarray'>
```

This is how you can convert a dataframe into a NumPy array using the `values` attribute of the dataframe.
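To check that `values` and `to_numpy()` give the same result for the sample data, you could compare them as in this small sketch. It assumes NumPy 1.19 or later, because `equal_nan=True` is needed to treat the two `nan` cells as equal.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [15, 25, 35, 45],
                   'Birth Year': [2006, 1996, 1986, np.nan]})

# equal_nan=True makes nan compare equal to nan at the same position
same = np.array_equal(df.values, df.to_numpy(), equal_nan=True)
print(same)
```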

## Convert Select Columns into Numpy Array

You can convert selected columns of a dataframe into a NumPy array by calling `to_numpy()` on a column subset of the dataframe.

For example, `df[['Age']]` returns a dataframe containing *just* the `Age` column. When you invoke `to_numpy()` on the resulting dataframe, you’ll get a NumPy array of the `Age` column.

**Snippet**

```
age_array = df[['Age']].to_numpy()
print(age_array)
```

You’ll see the `Age` column as a NumPy array.

**Output**

```
[[15]
[25]
[35]
[45]]
```

This is how you can convert a selected column of a pandas dataframe into a NumPy array.
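The shape of the result depends on how you select the column. As a sketch on the same sample data: double brackets select a dataframe and give a 2D array, single brackets select a series and give a 1D array. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [15, 25, 35, 45],
                   'Birth Year': [2006, 1996, 1986, np.nan]})

age_2d = df[['Age']].to_numpy()               # dataframe in, 2D array out
age_1d = df['Age'].to_numpy()                 # series in, 1D array out
both = df[['Age', 'Birth Year']].to_numpy()   # several columns at once

print(age_2d.shape)
print(age_1d.shape)
print(both.shape)
```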

## Handle Missing Values while converting Dataframe to Numpy Array

In this section, you’ll learn how to handle missing values while converting a pandas dataframe to a NumPy array.

**You can replace missing values by passing the replacement value through the `na_value` parameter.**

If you use `na_value=0`, the missing values will be replaced with `0`.

The sample dataframe you created earlier has one missing value for the birth year. When you execute the snippet below on the sample dataframe, the missing year is replaced with 1950.

**Snippet**

```
# Pass the replacement as a number so it matches the float array
array = df.to_numpy(na_value=1950)
print(array)
```

**Output**

```
[[ 15. 2006.]
[ 25. 1996.]
[ 35. 1986.]
[ 45. 1950.]]
```

This is how you can replace a missing value with a value of your choice while converting a dataframe into a NumPy array.
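Beyond a single `na_value`, a couple of other ways to deal with missing values are sketched below: filling a specific column with `fillna()` before converting, or keeping `nan` and locating it afterwards with `np.isnan()`. This is one possible approach, not the only one, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [15, 25, 35, 45],
                   'Birth Year': [2006, 1996, 1986, np.nan]})

# Replace every missing value with 0 during conversion
zero_filled = df.to_numpy(na_value=0)

# Fill one column with its own replacement first, then convert
filled_first = df.fillna({'Birth Year': 1950}).to_numpy()

# Keep nan in the array and find where it is
raw = df.to_numpy()
missing_mask = np.isnan(raw)

print(zero_filled[3])
print(filled_first[3])
print(missing_mask[3])
```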

## Handling Index While Converting Pandas Dataframe to Numpy Array

You may need to include or exclude the index column of the dataframe while converting it into an array.

You can control this by using the `to_records()` method.

`to_records()` converts the dataframe into a NumPy record array. It accepts three optional parameters.

- `index`: flag that denotes whether the index column must be included in the resulting record array. By *default* it is `True`, so the index column is included.
- `column_dtypes`: data types of the columns in the resulting record array.
- `index_dtypes`: data type to use for the index columns, if they are included in the record array. This is applied only if `index=True`.
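As a sketch of how `column_dtypes` and `index_dtypes` can be used, the example below stores the `Age` column and the index as 32-bit integers. The choice of `int32` is only for illustration. An unnamed index gets the field name `index`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [15, 25, 35, 45],
                   'Birth Year': [2006, 1996, 1986, np.nan]})

rec = df.to_records(index=True,
                    column_dtypes={'Age': 'int32'},  # per-column mapping
                    index_dtypes='int32')            # applies to the index

# Each field of the record array reports its own dtype
print(rec.dtype)
```

Columns that are not listed in `column_dtypes`, such as `Birth Year` here, keep their original data type.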

### Converting With Index

Use the snippet below to convert a pandas dataframe into a NumPy record array.

You’ll explicitly specify `index=True` to include the index column in the resulting record array. Because `True` is the default, as discussed above, the index column is included even if you omit this parameter.

**Snippet**

```
res = df.to_records(index=True)
print(res)
```

You can see the index values 0, 1, 2, 3 in each record.

**Output**

`[(0, 15, 2006.) (1, 25, 1996.) (2, 35, 1986.) (3, 45, nan)]`

### Converting Without Index

In this section, you’ll convert a pandas dataframe into a NumPy record array without the index column.

You can convert without the index using the parameter `index=False`.

**Snippet**

```
res = df.to_records(index=False)
print(res)
```

You can see that the index values 0, 1, 2, 3 are not included in the records.

**Output**

`[(15, 2006.) (25, 1996.) (35, 1986.) (45, nan)]`

## Convert Pandas Dataframe to Numpy Array with Headers

In this section, you’ll learn how to convert a pandas dataframe to a NumPy array that keeps the column headers.

Even if you don’t include the index column while converting the dataframe into a record array, the column names are still stored.

A record array, which is a subclass of `ndarray`, allows field access by key or by attribute, e.g. `array['Age']` or `array.Age`.

**Snippet**

```
array = df.to_records(index=False)
print(array['Age'])
```

**Output**

`[15 25 35 45]`
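The field names can also be listed, and fields can be read as attributes. The sketch below shows both on the sample data. Note that attribute access only works for names that are valid Python identifiers, so `Birth Year` has to be read by key.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [15, 25, 35, 45],
                   'Birth Year': [2006, 1996, 1986, np.nan]})

rec = df.to_records(index=False)

print(rec.dtype.names)      # column headers kept as field names
print(rec.Age)              # attribute access
print(rec['Birth Year'])    # key access, needed because of the space
```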

Also, when you pass the record array back to `pd.DataFrame()`, you’ll see the column names and the row index as shown below. The `ravel()` call flattens the array, which leaves this one-dimensional record array unchanged.

```
# res is the record array created earlier with index=False
res_pd = pd.DataFrame(res.ravel())
print(res_pd)
```

**Output**

```
Age Birth Year
0 15 2006.0
1 25 1996.0
2 35 1986.0
3 45 NaN
```

This is how you can handle column names while converting a dataframe into a NumPy record array.

## Convert Pandas Dataframe to 2D Numpy Array

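In this section, you’ll learn how to convert a pandas dataframe to a 2D NumPy array. Calling `to_numpy()` on a dataframe always returns a two-dimensional array with one row per dataframe row and one column per dataframe column, so no specific number of columns is required.

As an example, create a dataframe with two columns `A` and `B` and invoke the `to_numpy()` method.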

**Snippet**

```
# Build a small dataframe and convert it to a NumPy array in one step
array_2d = pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array_2d
```

When you print the array, you can see the two-dimensional array.

**Output**

```
array([[1, 3],
[2, 4]], dtype=int64)
```

This is how you can convert a pandas dataframe into a 2D array.
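As a quick check of the dimensionality, the sketch below converts dataframes with one and three columns and prints `ndim` and `shape`, which are standard NumPy array attributes. The dataframes are made up for illustration.

```python
import pandas as pd

one_col = pd.DataFrame({'A': [1, 2]}).to_numpy()
three_col = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]}).to_numpy()

# ndim is the number of dimensions; shape is (rows, columns)
print(one_col.ndim, one_col.shape)
print(three_col.ndim, three_col.shape)
```

Both results are two-dimensional, whatever the number of columns.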

## Convert Pandas Dataframe to Numpy Structured Array

A **structured NumPy array** is an **array** of structures (similar to a C **struct**). Regular NumPy arrays are homogeneous, which means they contain values of only one data type.

So when you want an array that holds values of different types, you can define a structure whose fields have different types and create a structured NumPy array of those structures.

The snippet below shows what a NumPy structured array looks like.

It builds the array directly from a list of tuples, each holding values of different types, and a `dtype` that names each field and gives its type. A dataframe whose columns have different types maps naturally onto this layout, with one field per column.

**Snippet**

```
x = np.array([('Sarvah', 3, 12.0), ('Vikram', 31, 58.0)],
dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
x
```

**Output**

```
array([('Sarvah', 3, 12.), ('Vikram', 31, 58.)],
dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])
```

When you print the array, you can see the different `dtypes` of its fields.
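To get a structured array from the sample dataframe itself, one possible route is to go through `to_records()` and then drop the record-array subclass with `np.asarray()`. This is a sketch of one approach under that assumption, not the only way to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [15, 25, 35, 45],
                   'Birth Year': [2006, 1996, 1986, np.nan]})

# to_records() gives a record array with one field per column
rec = df.to_records(index=False)

# np.asarray() returns a plain ndarray that keeps the named fields
structured = np.asarray(rec)

print(type(structured))
print(structured.dtype.names)
print(structured['Age'])
```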

## Conclusion

To summarize, you’ve learned the different methods available to convert a pandas dataframe into a NumPy array.

You’ve also learned how to convert selected columns into a NumPy array and how to handle indexes and column names while converting the dataframe. Finally, you’ve learned how NumPy structured arrays store values of different types.

You can use these methods to convert the data into an array that you can normalize and scale as needed for machine learning tasks.

If you have any questions, comment below.
