Boto3 is a Python API to interact with AWS services like S3.
You can read file content from S3 using Boto3 with the s3.Object('bucket_name', 'filename.txt').get()['Body'].read().decode('utf-8') statement.
This tutorial teaches you how to read file content from S3 using the Boto3 resource or libraries like smart-open.
Using these libraries, you can read a file from S3 without downloading it to your system.
Using Boto3 Resource
This section uses the boto3 resource object to read the file from the S3 bucket.
To learn about the boto3 resource, read the difference between the Boto3 resources and the client tutorial.
Boto3 Installation
You can install boto3 using the following command. Prefix the % symbol to install directly from a Jupyter notebook.
%pip install boto3
Read File Content Using Boto3 Resource
To read the file from S3 using Boto3, create a session to your AWS account using your security credentials.
Follow these steps to read the content of a file using the Boto3 resource.
- Create an S3 resource object using s3 = session.resource('s3')
- Create an S3 object for the specific bucket and file name using s3.Object('bucket_name', 'filename.txt')
- Read the object body using the statement obj.get()['Body'].read().decode('utf-8'). The decode() method decodes the file object using a specific encoding. UTF-8 is the most commonly used encoding and supports most special characters.
Code
The following code demonstrates how to read file content from the S3 bucket using boto3.
import boto3
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)
s3 = session.resource('s3')
obj = s3.Object('your_bucket_name', 'sample.txt')
# Read the file as a string with UTF-8 encoding
file_content = obj.get()['Body'].read().decode('utf-8')
print(file_content)
Note that obj.get()['Body'].read() loads the entire object body into memory at once. Recent botocore versions do expose a StreamingBody.iter_lines() method for streaming, but the smart-open library described below provides a more convenient line-by-line interface.
Output
You’ll see the following output text read from the sample.txt file.
This is a test file.
This file is uploaded to the S3 bucket to demonstrate the Boto3 operations.
This is how you can use boto3 directly to read file content from S3.
Using Smart-open Library
This section teaches you how to use the smart-open library to read file content from the S3 bucket.
The smart-open library efficiently streams large files from and to cloud storage services such as AWS S3 or Google Cloud Storage.
Advantages of using smart-open over plain boto3:
- You can read large files easily using the smart-open library
- You can read a file line by line instead of reading the whole file at once
Smart-open also uses the boto3 credentials to establish the connection to your AWS account.
Installing Smart-open
Use the following command to install the smart-open library. Prefix the % symbol to install directly from a Jupyter notebook.
The [s3] extra installs the dependencies needed for connecting to AWS S3.
%pip install smart_open[s3]
Reading the File
To read the file using smart_open, you need the S3 URI.
An S3 URI consists of the s3:// prefix followed by the bucket name and the object key.
- Once you have the S3 URI, pass it to the smart_open() constructor with the read mode. r specifies opening the file in read-only mode.
- It returns a line iterator, and you can print each line during each iteration.
Code
The following code demonstrates how to read file content from the S3 bucket line by line using smart_open.
from smart_open import smart_open
# stream lines from an S3 object
for line in smart_open('s3://stackvidhya/sample.txt', 'r'):
    print(line)
Output
This is a test file.
This file is uploaded to S3 bucket to demonstrate the Boto3 operations.
Read All Files From S3 Bucket Using Boto3
This section teaches you how to read all files from the S3 bucket using Boto3.
You can use this method when you want to read the content of all files at once.
Each file’s content will be printed regardless of its type. If a file is not a text file, its content will be printed as raw bytes.
To read all files from the S3 bucket in one shot:
- Create a Boto3 session using your security credentials
- Create a resource object for the S3 service
- Create an object for your specific bucket
- Iterate over all the file objects in the S3 bucket using bucket.objects.all()
- During each iteration, print each file’s content using obj.get()['Body'].read()
Code
The following code demonstrates how to read all files from the S3 bucket using boto3.
import boto3
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)
# Create an S3 resource from the session
s3 = session.resource('s3')
bucket = s3.Bucket('stackvidhya')
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()
    print("File contents of : " + key)
    print(body)
Output
File contents of : csv_files/IRIS.csv
b'sepal_length,sepal_width,petal_length,petal_width,species\r\n5.1,3.5,1.4,0.2,Iris-setosa\r\n4.9,3,1.4,0.2,Iris-setosa\r\n4.7,3.2,1.3,0.2,Iris-setosa\r\n4.6,3.1,1.5,0.2,Iris-setosa\r\n5,3.6,1.4,0.2,Iris-setosa\r\n5.4,3.9,1.7,0.4,Iris-setosa\r\n4.6,3.4,1.4,0.3,Iris-
File contents of : file2_uploaded_by_boto3.txt
b'This is a test file to demonstrate the file upload functionality to aws S3 bucket. \n\n'
File contents of : text_files/testfile.txt
b'This is a test file to demonstrate the file upload functionality to aws S3 bucket. \n\n'
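If you would rather see text objects decoded and only true binary objects left as raw bytes, a small helper can attempt UTF-8 decoding and fall back to the original bytes. This is a sketch, not part of the original example:

```python
def decode_if_text(body: bytes):
    """Return a str if the bytes decode as UTF-8, otherwise the raw bytes."""
    try:
        return body.decode('utf-8')
    except UnicodeDecodeError:
        return body

# Text content decodes cleanly; binary content stays as bytes.
print(decode_if_text(b'This is a test file.\n'))
print(decode_if_text(b'\xff\xd8\xff\xe0'))  # JPEG magic bytes are not valid UTF-8
```

In the loop above, you would call print(decode_if_text(body)) instead of print(body).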
Conclusion
You’ve learned how to read file content from the S3 bucket using the Boto3 library or the smart_open library.
You also learned how to read a file line by line and how to read the contents of all files in a specified bucket.