How To Convert Select Columns in Pandas Dataframe to Numpy Array – Definitive Guide

A pandas dataframe stores data in a two-dimensional structure of labelled rows and columns.

You can convert selected columns of a pandas dataframe to a NumPy array using the statement arr = df[["RegNo", "Country_Code"]].to_numpy().

This tutorial teaches you the different methods to convert select columns in a pandas dataframe into a NumPy array.

If You’re in a Hurry

Use the following code to convert select columns into a NumPy array.

arr = df[["RegNo", "Country_Code"]].to_numpy()

print(arr)

If You Want to Understand Details, Read on…

Let us create a sample dataframe and convert a few of its columns to a NumPy array using different methods.

Creating Dataframe

Create a sample dataframe with five columns in it.

import pandas as pd

# List of tuples, one per user
users = [
    (101, 'Shivam', 'Pandey', 'India', 1),
    (102, 'Kumar', 'Ram', 'US', 2),
    (103, 'Felix', 'John', 'Germany', 3),
    (104, 'Michael', 'John', 'India', 1),
]

# Create a DataFrame object with named columns
df = pd.DataFrame(
    users,
    columns=['RegNo', 'First Name', 'Last Name', 'Country', 'Country_Code'],
)

df

DataFrame Will Look Like

       RegNo First Name Last Name  Country  Country_Code
    0    101     Shivam    Pandey    India             1
    1    102      Kumar       Ram       US             2
    2    103      Felix      John  Germany             3
    3    104    Michael      John    India             1

Using to_numpy()

The to_numpy() method converts a pandas dataframe into a NumPy array.

To convert select columns using to_numpy(),

  • Select the subset of dataframe columns by passing the list of columns
  • Invoke the to_numpy() method to convert those columns to NumPy array

Use this method when you know the names of the columns you want to convert into a NumPy array.

Code

The following code converts the two columns RegNo and Country_Code into a NumPy array.

arr = df[["RegNo", "Country_Code"]].to_numpy()

print(arr)

Output

    [[101   1]
     [102   2]
     [103   3]
     [104   1]]
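As a quick check on what to_numpy() returns, the following sketch rebuilds the sample dataframe and inspects the shape of the result. It also shows that selecting one column with single brackets gives a Series, which converts to a one-dimensional array. The names arr_2d and arr_1d are illustrative and not part of the original example.

```python
import pandas as pd

users = [
    (101, "Shivam", "Pandey", "India", 1),
    (102, "Kumar", "Ram", "US", 2),
    (103, "Felix", "John", "Germany", 3),
    (104, "Michael", "John", "India", 1),
]
df = pd.DataFrame(
    users,
    columns=["RegNo", "First Name", "Last Name", "Country", "Country_Code"],
)

# A list of column names selects a dataframe, so the result is 2-D
arr_2d = df[["RegNo", "Country_Code"]].to_numpy()

# A single column name selects a Series, so the result is 1-D
arr_1d = df["RegNo"].to_numpy()

print(arr_2d.shape)  # (4, 2)
print(arr_1d.shape)  # (4,)
```

If you need a two-dimensional array for a single column, pass a one-element list, as in df[["RegNo"]].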

Using iloc And to_numpy()

The iloc attribute of the dataframe allows you to select a subset of the dataframe by integer position.

To convert select columns into a NumPy array using iloc,

  • Select the subset of columns by their integer positions.
  • Invoke the to_numpy() method

Use this method when you want to select columns in a specific range of positions. You can also restrict the rows that are converted into the NumPy array. For example, you can convert only the first ten rows.

Code

The following code converts all rows of the columns from position 3 (the fourth column, Country) to the end.

arr=df.iloc[:,3:].to_numpy()

print(arr)

Output

    [['India' 1]
     ['US' 2]
     ['Germany' 3]
     ['India' 1]]
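The row filtering mentioned above can be sketched as follows. This example assumes the same sample dataframe and picks the first two rows together with the columns at positions 0 and 4. It also checks the dtype of the mixed string and integer selection from the previous example, which NumPy stores as generic Python objects. The name mixed is illustrative.

```python
import pandas as pd

users = [
    (101, "Shivam", "Pandey", "India", 1),
    (102, "Kumar", "Ram", "US", 2),
    (103, "Felix", "John", "Germany", 3),
    (104, "Michael", "John", "India", 1),
]
df = pd.DataFrame(
    users,
    columns=["RegNo", "First Name", "Last Name", "Country", "Country_Code"],
)

# First two rows, columns at positions 0 (RegNo) and 4 (Country_Code)
arr = df.iloc[:2, [0, 4]].to_numpy()
print(arr.tolist())  # [[101, 1], [102, 2]]

# Mixing a text column with a numeric column yields an object array
mixed = df.iloc[:, 3:].to_numpy()
print(mixed.dtype)  # object
```

Object arrays are slower for numeric work, so it is usually better to select only numeric columns when you plan to do arithmetic on the result.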

Using loc And to_numpy()

The loc attribute of the dataframe allows you to access specific rows and columns by their labels.

To convert select columns into a NumPy array using loc,

  • Select the subset of columns by their labels.
  • Invoke the to_numpy() method

Use this method when you know the column names and also want to filter the rows that are converted into a NumPy array.

Code

The following code converts the rows with index labels 0 to 2 and the columns RegNo and Country_Code of those rows into a NumPy array. Unlike iloc, a loc slice includes its end label, so three rows are returned.

arr=df.loc[0:2,["RegNo", "Country_Code"]].to_numpy()

print(arr)

Output

    [[101   1]
     [102   2]
     [103   3]]
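Besides label slices, loc also accepts a boolean mask for the rows. The sketch below assumes the same sample dataframe and keeps only the users whose Country is India. The name mask is illustrative. It also confirms that the label slice 0:2 returns three rows.

```python
import pandas as pd

users = [
    (101, "Shivam", "Pandey", "India", 1),
    (102, "Kumar", "Ram", "US", 2),
    (103, "Felix", "John", "Germany", 3),
    (104, "Michael", "John", "India", 1),
]
df = pd.DataFrame(
    users,
    columns=["RegNo", "First Name", "Last Name", "Country", "Country_Code"],
)

cols = ["RegNo", "Country_Code"]

# Label slice: both end points are included, so this selects 3 rows
sliced = df.loc[0:2, cols].to_numpy()
print(sliced.shape)  # (3, 2)

# Boolean mask: keep only the rows where Country equals "India"
mask = df["Country"] == "India"
filtered = df.loc[mask, cols].to_numpy()
print(filtered.tolist())  # [[101, 1], [104, 1]]
```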

Using Values Attribute

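The values attribute returns a NumPy representation of the pandas dataframe. The pandas documentation recommends to_numpy() over values for new code, but values still works and you will often see it in existing code.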

To convert specific columns of the dataframe,

  • Select the desired columns using the column names
  • Invoke the values attribute, and it’ll return the NumPy representation of the values

Code

arr = df[["RegNo", "Country_Code"]].values

print(arr)

Output

    [[101   1]
     [102   2]
     [103   3]
     [104   1]]
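To see how values relates to to_numpy(), the following sketch compares the two on the same selection. It also uses the dtype parameter of to_numpy(), which values does not offer, to request a float array. The names from_values, from_method, and as_float are illustrative.

```python
import numpy as np
import pandas as pd

users = [
    (101, "Shivam", "Pandey", "India", 1),
    (102, "Kumar", "Ram", "US", 2),
    (103, "Felix", "John", "Germany", 3),
    (104, "Michael", "John", "India", 1),
]
df = pd.DataFrame(
    users,
    columns=["RegNo", "First Name", "Last Name", "Country", "Country_Code"],
)

selected = df[["RegNo", "Country_Code"]]

from_values = selected.values
from_method = selected.to_numpy()

# Both approaches produce the same array contents
print(np.array_equal(from_values, from_method))  # True

# to_numpy() can also convert the element type during the conversion
as_float = selected.to_numpy(dtype="float64")
print(as_float.dtype)  # float64
```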
