How To Read File Content From S3 Using Boto3? – Definitive Guide

Boto3 is a Python API to interact with AWS services like S3.

You can read file content from S3 using Boto3 with the s3.Object('bucket_name', 'filename.txt').get()['Body'].read().decode('utf-8') statement.

This tutorial teaches you how to read file content from S3 using the Boto3 resource or libraries like smart-open.

Using these libraries, you can read a file from S3 without downloading it to your system.

Using Boto3 Resource

This section uses the boto3 resource object to read the file from the S3 bucket.

To learn more about the boto3 resource, read the tutorial on the difference between Boto3 resources and clients.

Boto3 Installation

You can install boto3 using the following command. Prefix the % symbol to install directly from the Jupyter notebook.

%pip install boto3

Read File Content Using Boto3 Resource

To read the file from S3 using Boto3, create a session to your AWS account using the security credentials.

Follow the steps to read the content of the file using the Boto3 resource.

  • Create an S3 resource object using s3 = session.resource('s3')
  • Create an S3 object for the specific bucket and file name using s3.Object('bucket_name', 'filename.txt')
  • Read the object body using obj.get()['Body'].read().decode('utf-8'). The decode() method decodes the file content using the specified encoding; UTF-8 is the most commonly used encoding and supports most special characters.

Code

The following code demonstrates how to read file content from the S3 bucket using boto3.

import boto3

# Create a session using your AWS security credentials
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)

s3 = session.resource('s3')

obj = s3.Object('your_bucket_name', 'sample.txt')

# Reading the file as a string with UTF-8 encoding
file_content = obj.get()['Body'].read().decode('utf-8')

print(file_content)

Note that boto3 provides no convenient method to read the file line by line; the read() call above returns the entire content at once.
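
If you need to process the content line by line after reading it this way, you can split the decoded string yourself. A minimal sketch, reusing the file_content variable from the example above:

# Split the already-decoded string into individual lines
for line in file_content.splitlines():
    print(line)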

Output

You’ll see the following output text read from the sample.txt file.

    This is a test file. 
    This file is uploaded to the S3 bucket to demonstrate the Boto3 operations. 

This is how you can use boto3 directly to read file content from S3.

Using Smart-open Library

This section teaches you how to use the smart-open library to read file content from the S3 bucket.

The smart-open library is used to efficiently stream large files from and to cloud storage services such as AWS S3 and GCS.

Advantages of using smart-open over plain boto3:

  • You can read large files easily, because smart-open streams the content instead of loading it all into memory
  • You can read a file line by line instead of reading it all at once

Smart-open also uses the boto3 credentials to establish the connection to your AWS account.
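
By default, smart-open picks up the same credential chain that boto3 uses. If you want to pass credentials explicitly instead, recent versions of smart-open (5.x and later) accept a boto3 client through the transport_params argument. The snippet below is a minimal sketch of that approach, using the same placeholder credentials as earlier.

import boto3
from smart_open import open

# Build a boto3 session with explicit credentials
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)

# Hand an S3 client to smart-open via transport_params (smart-open >= 5.0)
with open('s3://your_bucket_name/sample.txt', 'r',
          transport_params={'client': session.client('s3')}) as f:
    print(f.read())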

Installing Smart-open

Use the following command to install the smart-open library. Prefix the % symbol to install directly from the Jupyter notebook.

The command installs the extra dependencies required for connecting to AWS S3.

%pip install smart_open[s3]

Reading the File

To read the file using smart_open, you need the S3 URI.

The S3 URI consists of the s3:// prefix followed by the bucket name and the object key.

  • Once you have the S3 URI, pass it to smart-open's open() function along with the read mode
  • 'r' specifies that the file is opened in read-only mode
  • open() returns a line iterator, so you can print each line during each iteration

Code

The following code demonstrates how to read file content from the S3 bucket line by line using the smart-open library.

from smart_open import open

# Stream lines from an S3 object one at a time
for line in open('s3://stackvidhya/sample.txt', 'r'):
    print(line, end='')  # each line already ends with a newline

Output

    This is a test file. 
    This file is uploaded to S3 bucket to demonstrate the Boto3 operations. 
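
Because smart-open streams the object instead of buffering it in memory, you can also read a very large file in fixed-size binary chunks. The sketch below assumes a hypothetical object named large_file.bin in your bucket.

from smart_open import open

# Read a large object in 1 MiB binary chunks instead of loading it whole
with open('s3://your_bucket_name/large_file.bin', 'rb') as f:
    while True:
        chunk = f.read(1024 * 1024)
        if not chunk:
            break
        print(len(chunk))  # replace with your own chunk processing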

Read All Files From S3 Bucket Using Boto3

This section teaches you how to read all files from the S3 bucket using Boto3.

You can use this method when you want to read the content of all files at once.

The content of every file will be printed regardless of its type. If a file is not a text file, its content will be printed as raw bytes.

To read all files from the S3 bucket in one shot:

  • Create a Boto3 session using your security credentials
  • Create a resource object for the S3 service
  • Create a Bucket object for your specific bucket
  • Iterate over all the objects in the bucket using bucket.objects.all()
  • During each iteration, print each file's content using obj.get()['Body'].read()

Code

The following code demonstrates how to read all files from the S3 bucket using boto3.

import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)

# Creating the S3 resource from the session
s3 = session.resource('s3')

bucket = s3.Bucket('stackvidhya')

# Iterate over every object in the bucket and print its raw content
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()

    print("File contents of : " + key)
    print(body)

Output

    File contents of : csv_files/IRIS.csv
    b'sepal_length,sepal_width,petal_length,petal_width,species\r\n5.1,3.5,1.4,0.2,Iris-setosa\r\n4.9,3,1.4,0.2,Iris-setosa\r\n4.7,3.2,1.3,0.2,Iris-setosa\r\n4.6,3.1,1.5,0.2,Iris-setosa\r\n5,3.6,1.4,0.2,Iris-setosa\r\n5.4,3.9,1.7,0.4,Iris-setosa\r\n4.6,3.4,1.4,0.3,Iris-

    File contents of : file2_uploaded_by_boto3.txt
    b'This is a test file to demonstrate the file upload functionality to aws S3 bucket. \n\n'

    File contents of : text_files/testfile.txt
    b'This is a test file to demonstrate the file upload functionality to aws S3 bucket. \n\n'
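
If you prefer readable output for text objects while still tolerating binary ones, a minimal variation of the loop above is to attempt UTF-8 decoding and fall back to the raw bytes when decoding fails:

for obj in bucket.objects.all():
    body = obj.get()['Body'].read()
    print("File contents of : " + obj.key)
    try:
        # Decode text objects for readable output
        print(body.decode('utf-8'))
    except UnicodeDecodeError:
        # Binary objects fall back to raw bytes
        print(body)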

Conclusion

You've learned how to read file content from an S3 bucket using the Boto3 resource and the smart-open library.

You also learned how to read files line by line and how to read the contents of all files in the specified bucket.
