How To Read a CSV File Into a Record Array in NumPy (Python) – Definitive Guide

NumPy arrays are widely used for numerical data manipulation in Python.

You can read a CSV file into a record array in NumPy using the np.genfromtxt() method.

This tutorial teaches you how to read a CSV file into a record array in NumPy.

If You’re in a Hurry

Use the following code to read a CSV file into a record array in NumPy.

import numpy as np

csv_arr = np.genfromtxt('User_details.csv', delimiter=",", dtype=None, encoding=None)

csv_arr

  • By default, the data is read as float. Pass dtype=None to have the type of each column inferred from its content.
  • Pass encoding=None to decode the file with the system’s default encoding. Otherwise, older NumPy releases emit a deprecation warning.

If You Want to Understand Details, Read on…

The sample CSV file contains only data and doesn’t have headers. Let us convert the CSV data into a NumPy array.

Using NumPy GenFromTxt

The genfromtxt() method loads the data from the CSV file.

It provides options to handle missing values, and you can also specify how to handle errors on invalid data.

  • By default, the data is read as float. Pass dtype=None to have the type of each column inferred from its content.
  • Pass encoding=None to decode the file with the system’s default encoding. If this optional parameter is omitted, older NumPy releases emit "VisibleDeprecationWarning: Reading Unicode strings without specifying the encoding argument is deprecated".

Use this method when you want to handle missing values in a special way.
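
As a rough illustration of the missing-value handling described above, the following sketch reads a small in-memory CSV in which one field is empty. The inline data and the fill value 0.0 are assumptions made for this example and are not part of User_details.csv.

```python
import io
import numpy as np

# Hypothetical data: the second row has an empty second field.
text = "1,10.5\n2,\n3,7.25"

# With the default float dtype, an empty field is read as nan.
default_arr = np.genfromtxt(io.StringIO(text), delimiter=",")

# filling_values replaces missing fields with a value of your choice.
filled_arr = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=0.0)

print(default_arr)
print(filled_arr)
```

The first array holds nan where the field was empty, while the second holds 0.0 in the same position.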

Code

import numpy as np

csv_arr = np.genfromtxt('User_details.csv', delimiter=",", dtype=None, encoding=None)

csv_arr

Output

    array([(1, 'Shivam', 'Pandey', 'India', 1),
           (2, 'Kumar', 'Ram', 'India', 1),
           (3, 'Felix', 'John', 'Germany', 2)],
          dtype=[('f0', '<i8'), ('f1', '<U6'), ('f2', '<U6'), ('f3', '<U7'), ('f4', '<i8')])
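
Strictly speaking, genfromtxt() returns a structured array whose fields get default names such as f0 and f1. The sketch below inlines the sample rows and uses made-up column names to show one way to name the fields and to view the result as an np.recarray for attribute-style access.

```python
import io
import numpy as np

# Same rows as the sample output, inlined so the sketch is self-contained.
text = "1,Shivam,Pandey,India,1\n2,Kumar,Ram,India,1\n3,Felix,John,Germany,2"

# The column names are illustrative; the sample file itself has no header.
csv_arr = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",
    dtype=None,
    encoding=None,
    names=["id", "first_name", "last_name", "country", "group"],
)

# Structured arrays are indexed by field name.
print(csv_arr["first_name"])

# Viewing the same data as np.recarray adds attribute-style access.
rec_arr = csv_arr.view(np.recarray)
print(rec_arr.country)
```

The view shares memory with csv_arr, so no data is copied when switching between the two styles of access.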

Using NumPy LoadTxt

The loadtxt() method reads data from the text file.

The loadtxt() method cannot handle missing values, so every row in the file must have the same number of columns.

  • By default, the data is read as float. Unlike genfromtxt(), loadtxt() does not infer data types from the file content when you pass dtype=None. To read non-numeric data, you need to specify an explicit dtype or appropriate converter functions.
  • Alternatively, you can read only a few columns that contain numerical data using the usecols parameter.

Use this method when your data doesn’t contain any missing values.

Code

The following code demonstrates how to read the first column of the CSV file into a NumPy array.

import numpy as np

csv_arr = np.loadtxt('User_details.csv', usecols=0, delimiter=",")

csv_arr

Output

    array([1., 2., 3.])
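
If you need the text columns as well, one option is to give loadtxt() an explicit structured dtype. This is only a sketch: the sample rows are inlined, and the field names and string widths are assumptions chosen to fit them.

```python
import io
import numpy as np

# Same rows as the sample file, inlined so the sketch is self-contained.
text = "1,Shivam,Pandey,India,1\n2,Kumar,Ram,India,1\n3,Felix,John,Germany,2"

# loadtxt does not infer types, so each field is declared explicitly.
# Field names and string widths (U10) are assumptions for this example.
user_dtype = np.dtype([
    ("id", "i8"),
    ("first_name", "U10"),
    ("last_name", "U10"),
    ("country", "U10"),
    ("group", "i8"),
])

csv_arr = np.loadtxt(io.StringIO(text), delimiter=",", dtype=user_dtype)

print(csv_arr["country"])
```

Strings longer than the declared width would be truncated, so pick widths that cover the longest value you expect.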

Using Pandas Read_CSV

The pandas read_csv() function is an alternative way to read the CSV file into a NumPy array.

It returns a DataFrame, which you can convert into a NumPy array with to_numpy() or the values attribute.

Advantages

  • read_csv() correctly handles commas inside quoted fields.
  • You don’t need to specify the data types. They are inferred automatically from the data.
  • It is generally fast, which helps when reading large files.

Code

The following code demonstrates how to read a CSV file as a dataframe and convert it into a NumPy array.

import pandas as pd

df = pd.read_csv('User_details.csv', header=None)

csv_arr = df.values

csv_arr

Output

    array([[1, 'Shivam', 'Pandey', 'India', 1],
           [2, 'Kumar', 'Ram', 'India', 1],
           [3, 'Felix', 'John', 'Germany', 2]], dtype=object)
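
The array above has dtype=object rather than named fields. If a record array is what you need, DataFrame.to_records() is one way to get it. The sketch below uses hypothetical rows with a comma inside a quoted field and illustrative column names.

```python
import io
import pandas as pd

# Hypothetical rows: the fourth field contains a comma inside quotes.
text = '1,Shivam,Pandey,"Delhi, India",1\n2,Felix,John,"Berlin, Germany",2'

df = pd.read_csv(
    io.StringIO(text),
    header=None,
    # Illustrative column names; the data itself has no header row.
    names=["id", "first_name", "last_name", "location", "group"],
)

# to_records() returns a NumPy record array; index=False omits the DataFrame index.
rec_arr = df.to_records(index=False)

print(rec_arr.location)
```

Each quoted value stays in a single field, and the columns can be reached by name on the resulting record array.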

Converting Specific Columns From CSV to NumPy Array

To convert specific columns from CSV into a NumPy array, you can pass the usecols parameter and specify a single column index or a sequence of column indexes.

Code

import numpy as np

csv_arr = np.genfromtxt('User_details.csv', delimiter=",", usecols=0, dtype=None, encoding=None)

csv_arr

Output

    array([1, 2, 3])
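
To read more than one column, usecols can also take a sequence of indexes. A minimal sketch, assuming the same sample rows inlined as a string and illustrative field names:

```python
import io
import numpy as np

# Same rows as the sample file, inlined so the sketch is self-contained.
text = "1,Shivam,Pandey,India,1\n2,Kumar,Ram,India,1\n3,Felix,John,Germany,2"

# Read only the first and fourth columns (indexes 0 and 3).
# The names "id" and "country" are assumptions for this example.
csv_arr = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",
    usecols=(0, 3),
    names=["id", "country"],
    dtype=None,
    encoding=None,
)

print(csv_arr["id"])
print(csv_arr["country"])
```

Because the two selected columns have different types, the result is a structured array with one field per column.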
