How to Open an S3 Object as a String With Boto3 (With Encoding) in Python

Introduction

Boto3 is the AWS SDK for Python. It lets you create and manage AWS services such as EC2 and S3, and it provides both an object-oriented resource API and low-level client access to those services.

Amazon S3 (Simple Storage Service) is an object storage service: it stores your files as objects.

In this tutorial, you’ll learn how to open an S3 object as a string with Boto3 by using the proper file encoding.

File Encoding

An encoding maps each character of a character set to a number, which gives the character a binary representation that can be stored digitally.

UTF-8 is the most commonly used encoding for text files. It can represent the special characters of many languages, such as the German umlaut Ä. In UTF-8, these non-ASCII characters take more than one byte each, so they are called multibyte characters.

When a file is written with a specific encoding, you need to specify the same encoding when reading it, so that the contents are decoded correctly and the special characters appear without problems.
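
To make this concrete, here is a minimal sketch that uses only Python's built-in str and bytes types (no S3 involved). It shows that Ä is a single character but two bytes in UTF-8, and that decoding with a different encoding (Latin-1 is used here purely as an illustration) does not restore the original text.

```python
# A minimal sketch: how UTF-8 turns a character into bytes and back.
text = "Ä"  # one character (German umlaut)

# Encoding converts str -> bytes.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # one character, two bytes

# Decoding with the same encoding restores the original text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with a different encoding (Latin-1 here) does not raise an
# error, but it yields different, garbled characters.
garbled = utf8_bytes.decode("latin-1")

print(round_trip == text, garbled == text)
```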

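[Optional] When you store a file in S3, you can record its encoding using the file's Metadata option, which takes you to the file metadata screen.

  • The system-defined metadata is available by default, with the key content-type and the value text/plain.
  • Add the encoding by selecting the Add metadata option.
  • Select System Defined as the type, content-encoding as the key, and utf-8 as the value.

You’ve set the encoding for your file object in S3. This metadata only describes the object: Boto3 does not use it to decode the content for you, so you still decode the bytes yourself when reading. Also note that in HTTP, Content-Encoding usually names a compression scheme such as gzip, and the character set is conventionally declared as part of Content-Type, for example text/plain; charset=utf-8.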
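
If the character set is declared in the object's Content-Type (for example text/plain; charset=utf-8), you could read it back and use it when decoding. The sketch below is one possible way to do that, not a Boto3 feature. The helper names charset_from_content_type and read_text are hypothetical, and it assumes that obj is a Boto3 S3 Object whose get() response carries a ContentType entry.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper: it parses the header value with the standard
    library instead of splitting the string by hand.
    """
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset, or None
    # when the header has no charset parameter.
    return msg.get_content_charset() or default


def read_text(obj):
    """Read an S3 object and decode it with its declared charset.

    `obj` is assumed to behave like a Boto3 s3.Object: get() returns a
    dict with a 'Body' stream and, optionally, a 'ContentType' string.
    """
    response = obj.get()
    encoding = charset_from_content_type(response.get("ContentType"))
    return response["Body"].read().decode(encoding)
```

With this sketch, an object stored without a charset parameter falls back to UTF-8, which matches the rest of this tutorial.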

Next, you’ll see how to read files from S3.

Reading File as String From S3

In this section, you’ll read a file from S3 as a string, decoding it as UTF-8.

  • Create a session with Boto3 by using your AWS access key ID and secret access key.
  • Create an S3 resource with the Boto3 session.
  • Create an S3 object to represent the AWS S3 object by using your bucket name and object name (key).
  • Call this object’s get() action and read the ['Body'] key of the response. Its value is a streaming body that holds the object's raw bytes.

You can read the bytes from this streaming body using read() and decode them using the UTF-8 encoding, as shown below.

import boto3

# Create a session with Boto3.
# Replace the placeholders with your own credentials.
session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)

# Create an S3 resource from the session.
s3 = session.resource('s3')

# Create an object reference from the bucket name and the object key.
obj = s3.Object('Your_bucket_name', 'Your File Object Name/Key')

# Read the body as bytes and decode it into a string using UTF-8.
file_content = obj.get()['Body'].read().decode('utf-8')

# Print the file contents.
print(file_content)

When you execute the above script, you’ll see the contents of the file printed.

This is a test file to demonstrate file reading functionality from aws S3 bucket.

You’ve read the file as a string.
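
The same read can also be done with Boto3's low-level client instead of the resource API. The following is a minimal sketch under that assumption. The function name read_s3_text is hypothetical, and the client is passed in as a parameter so that you decide how it is created.

```python
def read_s3_text(s3_client, bucket, key, encoding="utf-8"):
    """Fetch one object with the low-level client and decode it to str.

    `s3_client` is assumed to be a Boto3 S3 client, i.e. an object whose
    get_object(Bucket=..., Key=...) returns a dict with a 'Body' stream.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # read() returns the raw bytes; decode() turns them into a str.
    return response["Body"].read().decode(encoding)


# Example call (needs boto3, AWS credentials, and an existing object):
#   import boto3
#   text = read_s3_text(boto3.client("s3"), "Your_bucket_name", "your/object/key")
```

Calling boto3.client("s3") without explicit keys makes Boto3 look up credentials from its default sources, such as environment variables or the shared credentials file, which avoids hardcoding keys in the script.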

Reading S3 File Line by Line

In this section, you’ll read a file from S3 line by line using the iter_lines() method.

  • Create the S3 object using the Boto3 session and resource.
  • Iterate over the object body using the iter_lines() method.
  • Decode each line using UTF-8 and print it.

import boto3

# Create a session with Boto3.
# Replace the placeholders with your own credentials.
session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)

# Create an S3 resource from the session.
s3 = session.resource('s3')

# Create an object reference from the bucket name and the object key.
obj = s3.Object('Your_bucket_name', 'Your File Object Name/Key')

# iter_lines() yields each line as bytes, so decode before printing.
for line in obj.get()['Body'].iter_lines():
    print(line.decode('utf-8'))

If you print a line without decoding it, Python shows it as a bytes literal, so every line you print is prefixed with the character ‘b’.

When you execute the above script, it’ll print the contents of the file line by line as shown below.

This is the first line of the file.
this is the second line of the file. 

You’ve read the file line by line with proper encoding and decoding.
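
If you need the decoded lines in more than one place, the loop can be wrapped in a small generator. This is only a sketch: the name iter_decoded_lines is hypothetical, and body is assumed to be the streaming body returned under the ['Body'] key, or any object that offers an iter_lines() method yielding bytes.

```python
def iter_decoded_lines(body, encoding="utf-8"):
    """Yield each line of `body` as a decoded str.

    `body` is assumed to provide iter_lines(), which yields one line of
    raw bytes at a time, as the S3 streaming body does.
    """
    for raw_line in body.iter_lines():
        yield raw_line.decode(encoding)


# Example use with the object created earlier (needs AWS access):
#   for line in iter_decoded_lines(obj.get()['Body']):
#       print(line)
```

Decoding line by line is safe for UTF-8 because the newline byte never occurs inside a multibyte character. That does not hold for every encoding (UTF-16 is one example), so for those it is safer to read and decode the whole body at once.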

Conclusion

You’ve learned how to open an S3 object as a string with Boto3 and how to read a file line by line using Boto3.

How do I get rid of the b-prefix in a string in Python?

The b-prefix means the value is a bytes object, not a string. Decode the line with the proper encoding before you print it. For example, print(line.decode('utf-8')) decodes the line using the UTF-8 encoding.
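
The difference is easy to see without S3. In this short sketch, the sample text is made up for illustration, and the comments show what each print statement displays.

```python
# A made-up sample line, encoded to bytes the way S3 would return it.
raw_line = "Grüße aus S3".encode("utf-8")

# Printing bytes shows the b-prefix and escaped multibyte characters.
print(raw_line)                  # b'Gr\xc3\xbc\xc3\x9fe aus S3'

# Decoding first gives a normal string without the prefix.
print(raw_line.decode("utf-8"))  # Grüße aus S3
```

Note that calling str() on the bytes object does not help, because it only converts the b'...' representation itself into a string. Use decode() instead.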
