Amazon S3 is a storage service from AWS that can store any kind of file, such as JSON files or text files.
You can read a JSON file from S3 using boto3 by calling the get() method on the S3 object and then the read() method on the returned Body.
In this tutorial, you’ll learn how to read a JSON file from S3 using Boto3.
Prerequisites
- Boto3 – Additional package to be installed (explained below)
- JSON – Available by default
Installing Boto3
If you’ve not installed boto3 yet, you can install it by using the below snippet.
You can use the % symbol before pip to install packages directly from the Jupyter notebook instead of launching the Anaconda Prompt.
Snippet
%pip install boto3
Boto3 will be installed successfully.
Now, you can use it to access AWS resources.
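To confirm that the installation worked, a quick check like the one below can help. This is a minimal sketch: the helper name is_installed is made up for illustration, and it relies only on importlib from the standard library.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# json ships with Python, so this check should always succeed.
print("json available:", is_installed("json"))
# boto3 is only available once the pip install step has been run.
print("boto3 available:", is_installed("boto3"))
```

If the second line reports False, run the install snippet again in the same environment that your notebook uses.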
Reading JSON file from S3 Bucket
In this section, you’ll use the Boto3 resource to read a JSON file from an S3 bucket.
Boto3 resource is a high-level object-oriented API that represents the AWS services. Follow the below steps to read the JSON file from the S3 bucket using the Boto3 resource.
- Create a Boto3 session using the boto3.Session() method, passing the security credentials.
- Create the S3 resource using the session.resource('s3') snippet.
- Using the resource object, create a reference to your S3 object by using the bucket name and the file object name.
- Using the object, call the get() method to get the response. Use the ['Body'] key and the read() method to read the body of the response as bytes.
- Optionally, use the decode() method to decode the file content with a charset such as utf-8. This is necessary when your file contains special characters that belong to a specific charset, or when your file is explicitly encoded in one of the charsets. File encoding is explained in detail in the next section.
- Next, use the json.loads() method to parse the JSON content of your file and convert it into a Python dictionary. Now, you can iterate through the dictionary to access the items in the JSON text.
You can use the below code to read a json file from S3.
Code
import boto3
import json
# Creating Session With Boto3.
session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)
# Creating S3 Resource From the Session.
s3 = session.resource('s3')
# Creating Object From the S3 Resource.
obj = s3.Object('stackvidhya', 'sample_json.json')
# Reading the File as String With Encoding
file_content = obj.get()['Body'].read().decode('utf-8')
json_data = json.loads(file_content)
print(json_data)
Output
{'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12'}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}}
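Once json.loads() has returned a dictionary, you can walk through it like any other Python dictionary. The sketch below uses a trimmed copy of the output above as inline sample data so that it runs without S3 access. The helper name list_questions is hypothetical.

```python
import json

# A trimmed copy of the sample file's content, inlined so no S3 access is needed.
sample_text = """
{"quiz": {"maths": {"q1": {"question": "5 + 7 = ?",
                           "options": ["10", "11", "12", "13"],
                           "answer": "12"},
                    "q2": {"question": "12 - 8 = ?",
                           "options": ["1", "2", "3", "4"],
                           "answer": "4"}}}}
"""


def list_questions(json_data: dict) -> list:
    """Collect (category, question id, question, answer) tuples from the quiz."""
    rows = []
    for category, questions in json_data["quiz"].items():
        for question_id, details in questions.items():
            rows.append((category, question_id, details["question"], details["answer"]))
    return rows


json_data = json.loads(sample_text)
for category, question_id, question, answer in list_questions(json_data):
    print(f"{category}/{question_id}: {question} -> {answer}")
```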
This is how you can read JSON files from S3.
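If you read JSON files from several places in your code, the same steps can be wrapped in a small function. This is a sketch under a few assumptions: the function name read_json_from_s3 and its s3_resource parameter are hypothetical, and when no resource is passed it falls back to boto3.Session() without arguments, which lets boto3 look up credentials from its default sources (such as environment variables or the shared credentials file) instead of hardcoding keys.

```python
import json


def read_json_from_s3(bucket_name, key, s3_resource=None, encoding="utf-8"):
    """Read an S3 object and parse its content as JSON.

    s3_resource is an optional, already created Boto3 S3 resource.
    """
    if s3_resource is None:
        # Imported here so the function can be exercised with a stand-in
        # resource even where boto3 is not installed.
        import boto3

        # No explicit keys: boto3 resolves credentials from its default chain.
        s3_resource = boto3.Session().resource("s3")

    obj = s3_resource.Object(bucket_name, key)
    # get() returns a dict; its 'Body' entry is a stream of raw bytes.
    raw_bytes = obj.get()["Body"].read()
    return json.loads(raw_bytes.decode(encoding))


# Example call, using the bucket and file name from this tutorial:
# json_data = read_json_from_s3("stackvidhya", "sample_json.json")
```

Passing the resource in as a parameter also makes it possible to reuse one session across many reads.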
Next, you’ll learn about file encoding and how to set file encoding explicitly in AWS S3.
File Encoding
An encoding represents a set of characters by assigning a number to each character, so that text can be stored in digital/binary form.
UTF-8 is the most commonly used encoding for text files. It supports the special characters of various languages, such as the German umlaut Ä. In UTF-8, such special characters are stored as multibyte characters.
When a file is encoded using a specific encoding, you need to specify that same encoding while reading the file in order to decode its contents. Only then will you be able to see all the special characters without any problem.
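The effect of choosing the right or wrong charset can be seen without S3 at all. The following is a small illustration that uses only built-in Python features. The sample values are made up for the demonstration, and latin-1 is picked simply as an example of a mismatched charset.

```python
import json

text = "Ä"

# UTF-8 stores this umlaut in two bytes, which is why it is called multibyte.
encoded = text.encode("utf-8")
print(encoded, len(encoded))

# Decoding with the same charset restores the original character.
same_charset = encoded.decode("utf-8")

# Decoding with a different charset (latin-1 here) raises no error,
# but the result is no longer the original character.
wrong_charset = encoded.decode("latin-1")
print(same_charset == text, wrong_charset == text)

# The same applies to JSON content that arrives as bytes.
raw_bytes = json.dumps({"city": "Köln"}, ensure_ascii=False).encode("utf-8")
parsed = json.loads(raw_bytes.decode("utf-8"))
print(parsed["city"] == "Köln")
```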
When you store a file in S3, you can set the encoding using the file Metadata option.
Opening Metadata section of S3 object.


You’ll be taken to the file metadata screen.
Edit metadata of file using the steps shown below.


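The system-defined metadata is available by default, with the key content-type and the value text/plain.
You can add the encoding by selecting the Add metadata option. Select System defined as the type, content-encoding as the key, and utf-8 as the value. For a JSON file, you can also change the content-type value to application/json.
Note that in HTTP, Content-Encoding usually denotes compression such as gzip, and the charset is commonly declared as part of Content-Type, for example application/json; charset=utf-8.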
This is how you can set encoding for your file objects in S3.
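Metadata can also be set from code when the file is uploaded. The following is a minimal sketch and not the console procedure described above: the helper name upload_json is hypothetical, and it declares the charset through the ContentType parameter of the object's put() method, which is a swapped-in alternative to adding a content-encoding entry by hand.

```python
import json


def upload_json(s3_object, data, encoding="utf-8"):
    """Serialize data as JSON and upload it with a charset in its content type.

    s3_object is expected to be a Boto3 S3 Object resource,
    for example s3.Object('stackvidhya', 'sample_json.json').
    """
    body = json.dumps(data, ensure_ascii=False).encode(encoding)
    return s3_object.put(
        Body=body,
        # Stored as the object's content-type system metadata.
        ContentType=f"application/json; charset={encoding}",
    )


# Example call, assuming `s3` is a Boto3 S3 resource:
# upload_json(s3.Object("stackvidhya", "sample_json.json"), {"name": "Ä"})
```

The bytes are encoded with the same charset that is declared in the content type, so a reader who decodes with that charset gets the original text back.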
Conclusion
In this tutorial, you’ve learned how to read a JSON file object from S3 using the boto3 library in Python.