How to Read a JSON File from S3 Using Boto3 in Python – Detailed Guide

Amazon S3 is an object storage service from AWS that can store any type of file, such as JSON or text files.

You can read a JSON file from S3 with boto3 by calling get() on the S3 object, reading its Body stream with read(), and parsing the result with json.loads().

In this tutorial, you’ll learn how to read a JSON file from S3 using Boto3.

Prerequisites

  • Boto3 – Additional package that needs to be installed (explained below)
  • json – Part of the Python standard library, available by default

Installing Boto3

If you haven’t installed boto3 yet, you can install it using the snippet below.

In a Jupyter notebook, you can prefix pip with the % symbol (the %pip magic command) to install packages directly from the notebook instead of launching the Anaconda Prompt. In a regular terminal, run pip install boto3 without the %.

Snippet

%pip install boto3

Once the command completes, Boto3 is installed.

Now, you can use it to access AWS resources.
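If you want to confirm the installation from the same notebook, a small check such as the following can help. It only looks up whether the package can be imported in the current environment and does not contact AWS; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in this environment."""
    # find_spec returns None when no importable module with this name exists.
    return importlib.util.find_spec(package_name) is not None


print("boto3 installed:", is_installed("boto3"))
```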

Reading JSON file from S3 Bucket

In this section, you’ll use the Boto3 resource to read a JSON file from an S3 bucket.

The Boto3 resource is a high-level, object-oriented API that represents AWS services. Follow the steps below to read a JSON file from the S3 bucket using the Boto3 resource.

  1. Create a Boto3 session using the boto3.Session() class, passing your security credentials.
  2. Create the S3 resource from the session using session.resource('s3').
  3. Using the resource object, create a reference to your S3 object with the bucket name and the object key (the file name).
  4. Call the get() method on the object. It returns a dictionary describing the response; its ['Body'] entry is a stream, and calling read() on it returns the file content as bytes.
  5. Optionally, use the decode() method to convert the bytes into a string using a charset such as utf-8. Use the charset the file was actually encoded with, especially when the file contains special characters. File encoding is explained in detail in the next section.
  6. Next, use the json.loads() method to parse the JSON content and convert it into a Python dictionary (when the top-level JSON value is an object). You can then iterate through the dictionary to access the items in the JSON text.
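Steps 4 to 6 can be tried locally without AWS. In the sketch below, io.BytesIO is only a stand-in for the stream returned in obj.get()['Body'] (an assumption for illustration: both expose a read() method that returns bytes), and the JSON content is made up for the example.

```python
import io
import json

# Hypothetical stand-in for obj.get()["Body"]: read() returns raw bytes.
body = io.BytesIO('{"city": "München", "tags": ["a", "b"]}'.encode("utf-8"))

raw_bytes = body.read()            # step 4: the content as bytes
text = raw_bytes.decode("utf-8")   # step 5: bytes -> str
data = json.loads(text)            # step 6: str -> dict

print(type(raw_bytes).__name__, type(text).__name__, type(data).__name__)

# Iterate through the parsed dictionary to access the items.
for key, value in data.items():
    print(key, value)
```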

You can use the code below to read a JSON file from S3.

Code

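import boto3
import json


# Create a session with Boto3.
# Replace the placeholders with your own credentials, or call boto3.Session()
# without arguments to use the default credential chain.
session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)

# Create an S3 resource from the session.
s3 = session.resource('s3')

# Create a reference to the object using the bucket name and object key.
obj = s3.Object('stackvidhya', 'sample_json.json')

# Read the object body as bytes and decode it into a string.
file_content = obj.get()['Body'].read().decode('utf-8')

# Parse the JSON string into a Python dictionary.
json_data = json.loads(file_content)

print(json_data)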

Output

    {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12'}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}}

This is how you can read JSON files from S3.
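The same steps can be wrapped in a reusable function. The sketch below is illustrative rather than part of the original example: the function name read_json_from_s3 is hypothetical, and the S3 resource is passed in so that credentials can come from boto3’s default credential chain (for example environment variables or the ~/.aws/credentials file) instead of being hard-coded.

```python
import json


def read_json_from_s3(s3_resource, bucket, key, encoding="utf-8"):
    """Read an S3 object and parse its content as JSON.

    `s3_resource` is expected to be a boto3 S3 resource, for example
    boto3.Session().resource("s3").
    """
    obj = s3_resource.Object(bucket, key)          # reference to the object
    raw_bytes = obj.get()["Body"].read()           # file content as bytes
    return json.loads(raw_bytes.decode(encoding))  # bytes -> str -> dict


# Example usage (requires boto3 and AWS credentials):
# import boto3
# s3 = boto3.Session().resource("s3")  # uses the default credential chain
# json_data = read_json_from_s3(s3, "stackvidhya", "sample_json.json")
# print(json_data)
```

Passing the resource in, rather than creating it inside the function, also makes it easy to substitute a fake object when testing.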

Next, you’ll learn about file encoding and how to set file encoding explicitly in AWS S3.

File Encoding

A character encoding assigns a number to each character in a set of characters and defines how those numbers are stored as bytes for digital (binary) representation.

UTF-8 is the most commonly used encoding for text files. It can represent every Unicode character, including special characters from many languages such as the German umlaut Ä. In UTF-8, these special characters take more than one byte, so they are called multibyte characters.

When a file is written with a specific encoding, you need to specify the same encoding when reading it to decode its contents. Only then will all the special characters appear correctly.
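These ideas can be seen in plain Python, without S3: ord() shows the number assigned to a character, encode() shows its UTF-8 bytes, and decoding with a different charset produces garbled text. cp1252 is used here only as an example of a mismatched charset.

```python
text = "Ä"

# ord() gives the number (Unicode code point) assigned to the character.
code_point = ord(text)
print(code_point, hex(code_point))  # 196 0xc4

# Encoding this single character in UTF-8 produces two bytes.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes), utf8_bytes)  # 1 2 b'\xc3\x84'

# Decoding with the same encoding restores the character...
print(utf8_bytes.decode("utf-8"))  # Ä

# ...while decoding with a different one garbles it.
print(utf8_bytes.decode("cp1252"))  # Ã„
```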

When you store a file in S3, you can declare its encoding in the object’s metadata.

Open the Metadata section of the S3 object in the S3 console.

You’ll be taken to the file metadata screen.

Edit the metadata of the file as described below.

By default, the object has a system-defined metadata entry with the key Content-Type. In this example, its value is text/plain.

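To declare the character encoding, edit the Content-Type value to include a charset parameter that matches your file, for example application/json; charset=utf-8. The Add metadata option also lets you add other system-defined keys, but note that the Content-Encoding key describes how the content is compressed (for example gzip), not its character set, so values such as utf-8 or JSON do not belong there.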
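If you upload files with boto3 rather than through the console, the system-defined Content-Type metadata can be set with the ContentType parameter of the S3 client’s put_object() method, where the charset can be declared. A minimal sketch, assuming a boto3 S3 client is passed in; the function name upload_json and the example key are hypothetical.

```python
import json


def upload_json(s3_client, bucket, key, data, charset="utf-8"):
    """Serialize `data` to JSON and upload it with a charset in Content-Type.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # ensure_ascii=False keeps characters such as Ä as-is instead of \u escapes,
    # so the stored bytes really depend on the declared charset.
    body = json.dumps(data, ensure_ascii=False).encode(charset)
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType=f"application/json; charset={charset}",
    )


# Example usage (requires boto3 and AWS credentials):
# import boto3
# upload_json(boto3.client("s3"), "stackvidhya", "umlaut_sample.json", {"name": "Ä"})
```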

This is how you can set encoding for your file objects in S3.
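When you read the object back, boto3 returns the body as raw bytes and does not decode it for you. The response from obj.get() also includes the object’s Content-Type under the 'ContentType' key, so one option is to pick the decoding charset from it. The helper below is a hypothetical sketch that uses the standard library’s email.message.Message to parse the charset parameter and falls back to utf-8 when none is declared.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the lower-cased charset parameter of a Content-Type value.

    Falls back to `default` when the value declares no charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


print(charset_from_content_type("application/json; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/plain"))  # utf-8 (fallback)

# Hypothetical usage with the obj from the earlier example:
# response = obj.get()
# charset = charset_from_content_type(response["ContentType"])
# json_data = json.loads(response["Body"].read().decode(charset))
```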

Conclusion

In this tutorial, you’ve learned how to read a JSON file object from S3 using the boto3 library in Python.
