Boto3 is the AWS SDK for Python. It lets you create and manage AWS services such as EC2 and S3. It provides both an object-oriented API (resources) and low-level access (clients) to AWS services.
Amazon S3 (Simple Storage Service) is an object-based storage service: it stores files as objects in buckets.
In this tutorial, you’ll learn how to read an S3 object as a string with Boto3, using the correct file encoding.
UTF-8 is the most commonly used encoding for text files. It can represent special characters from many languages, such as the German umlaut Ä. In UTF-8, such characters are stored as multibyte sequences.
When a file is written with a specific encoding, you need to specify that same encoding when you read it so its contents are decoded correctly. Only then will all the special characters be displayed without problems.
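To see why the matching encoding matters, here is a small plain-Python sketch (no AWS calls involved) that encodes the umlaut Ä as UTF-8 and then decodes the resulting bytes twice: once with UTF-8 and once with Latin-1, which is used here only as an example of a mismatched encoding.

```python
# The umlaut "Ä" (U+00C4) becomes two bytes when encoded as UTF-8.
text = "Ä"
data = text.encode("utf-8")
print(data)       # b'\xc3\x84'
print(len(data))  # 2

# Decoding with the same encoding restores the original character.
print(data.decode("utf-8"))  # Ä

# Decoding with a different encoding (Latin-1 here) does not fail, but it
# silently produces the wrong text: each byte becomes its own character.
print(repr(data.decode("latin-1")))  # 'Ã\x84'
```

The mismatched decode raises no error, which is why wrong encodings often go unnoticed until special characters show up garbled.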
[Optional] When you store a file in S3, you can record its encoding in the object’s metadata.
To do this, edit the object’s metadata in the S3 console: select the object and choose Actions > Edit metadata.
You’ll be taken to the file metadata screen.
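For a text file, the system-defined metadata already includes the key Content-Type with the value text/plain.
To add the encoding, choose Add metadata, select System defined as the type, set Key to Content-Encoding, and set Value to utf-8. Strictly speaking, the HTTP Content-Encoding header is meant for compression schemes such as gzip; the character set is more conventionally declared in Content-Type, for example text/plain; charset=utf-8.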
You’ve set the encoding for your file objects in S3.
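If you upload files with Boto3 instead of the console, you can set similar metadata at upload time. The sketch below is a minimal, hypothetical helper (`upload_text` is not part of Boto3) that encodes a string and passes it to the S3 client’s `put_object` call. It declares the character set in `ContentType` rather than in Content-Encoding, which is a substitution for the console steps above. The client is passed in as a parameter so the function can be tried without an AWS account.

```python
def upload_text(s3_client, bucket, key, text, encoding="utf-8"):
    """Encode `text` and upload it to S3, recording the charset in Content-Type.

    Hypothetical helper, not a Boto3 API. `s3_client` is assumed to be a
    Boto3 S3 client (for example, boto3.client("s3")) or any object with a
    compatible put_object method.
    """
    # S3 stores raw bytes, so encode the string explicitly before uploading.
    body = text.encode(encoding)
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        # Declare the charset so readers know how to decode the bytes.
        ContentType=f"text/plain; charset={encoding}",
    )
```

With a real client, a call might look like `upload_text(boto3.client("s3"), "Your_bucket_name", "Your File Object Name/Key", "Grüße")`, where the bucket and key are placeholders.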
Next, you’ll learn how to read files from S3.
Reading File as String From S3
In this section, you’ll read a file from S3 as a string, decoding it as UTF-8.
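First, create a session with Boto3 using your AWS access key ID and secret access key. If you omit these arguments, boto3.Session() falls back to the default credential chain (for example, environment variables or the ~/.aws/credentials file), which avoids hardcoding secrets in your script.
Then create an S3 resource from the session, and create an Object that represents your S3 object by using your bucket name and object key.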
Calling get() on this object retrieves it, and the object’s content is available under the response’s Body key. Body is a botocore StreamingBody, a file-like wrapper around the HTTP response stream. You can read its bytes with read() and decode them with the UTF-8 encoding, as shown below.
```python
import boto3

# Creating a session with Boto3.
session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)

# Creating an S3 resource from the session.
s3 = session.resource('s3')

# Creating an Object from the S3 resource.
obj = s3.Object('Your_bucket_name', 'Your File Object Name/Key')

# Reading the file as a string: read the raw bytes, then decode them as UTF-8.
file_content = obj.get()['Body'].read().decode('utf-8')

# Printing the file contents.
print(file_content)
```
When you execute the above script, you’ll see the file’s contents printed, for example:
This is a test file to demonstrate file reading functionality from aws S3 bucket.
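The resource-based code above always decodes with UTF-8. As a hedged variation, the sketch below uses the lower-level S3 client and picks the decoding from the object’s Content-Type header when a charset was recorded there (for example, text/plain; charset=utf-8), falling back to UTF-8 otherwise. `read_s3_text` and `charset_from_content_type` are hypothetical helpers, not Boto3 APIs; `get_object` and its `ContentType` and `Body` response fields belong to the Boto3 S3 client. An object tagged only with Content-Encoding: utf-8, as in the console steps, is not inspected by this sketch.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default`."""
    if not content_type:
        return default
    # The standard library's email parser handles parameters and quoting.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def read_s3_text(s3_client, bucket, key, default_encoding="utf-8"):
    """Read an S3 object and decode it with the charset from its Content-Type.

    Hypothetical helper; `s3_client` is assumed to be a Boto3 S3 client
    (for example, boto3.client("s3")).
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    encoding = charset_from_content_type(response.get("ContentType"), default_encoding)
    # Body is a stream of raw bytes; decode it into a str.
    return response["Body"].read().decode(encoding)
```

With a real client, you’d call it as `read_s3_text(boto3.client("s3"), "Your_bucket_name", "Your File Object Name/Key")`.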
You’ve read the file as a string. Next, you’ll read the file line by line.
Reading S3 File Line by Line
In this section, you’ll read a file from S3 line by line using the iter_lines() method.
You’ll first create the S3 object by using the Boto3 session and resource. Next, you’ll iterate over the object body using iter_lines() and decode each line as you print it.
```python
import boto3

# Creating a session with Boto3.
session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)
s3 = session.resource('s3')
obj = s3.Object('Your_bucket_name', 'Your File Object Name/Key')

# iter_lines() yields each line as bytes; decode it before printing.
for line in obj.get()['Body'].iter_lines():
    print(line.decode('utf-8'))
```
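In the print() call, each line is decoded using UTF-8. iter_lines() yields bytes objects rather than strings, so you need to decode each one with the encoding the file was written in, which is UTF-8 in this tutorial. Setting the encoding in the object’s metadata does not do this for you. If you don’t decode the lines, each printed line will show a b prefix, for example b'This is the first line of the file.'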
When you execute the above script, it’ll print the contents of the file line by line as shown below.
This is the first line of the file.
this is the second line of the file.
You’ve read the file line by line with proper encoding and decoding.
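The iter_lines() approach above decodes each line separately. As an alternative sketch, Python’s standard codecs module can wrap the body stream in an incremental decoder so that you iterate over str lines directly. `iter_text_lines` is a hypothetical helper, not a Boto3 API; it assumes the body object offers a read(size) method, which Boto3’s StreamingBody does.

```python
import codecs


def iter_text_lines(body, encoding="utf-8"):
    """Yield decoded lines, without their line endings, from a binary stream.

    Hypothetical helper. `body` can be the StreamingBody returned by
    obj.get()['Body'], or any object with a read(size) method.
    """
    # The StreamReader decodes incrementally, so a multibyte character that
    # is split across two read() calls is still decoded correctly.
    reader = codecs.getreader(encoding)(body)
    for line in reader:
        # Drop trailing line endings such as "\n" or "\r\n".
        yield line.rstrip("\r\n")
```

With the obj from the previous example, you’d loop over `iter_text_lines(obj.get()['Body'])` and print each line.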
You’ve learned how to read an S3 object as a string with Boto3 and how to read a file line by line using Boto3.
How do I get rid of the b prefix in a string in Python?
You need to decode the line with the proper encoding before you print it. For example, print(line.decode('utf-8')) decodes the line using UTF-8.
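A short illustration: the b prefix is not part of the data; it is how Python displays a bytes object. The line below is a hard-coded sample rather than a real S3 response.

```python
line = b"This is the first line of the file."

# Printing a bytes object shows its representation, including the b prefix.
print(line)                  # b'This is the first line of the file.'

# Decoding converts the bytes to a str, so the prefix disappears.
print(line.decode("utf-8"))  # This is the first line of the file.
```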