S3 is a storage service from AWS in which you can store any type of file, such as CSV or text files. Sometimes you need to retrieve the list of files in a bucket to perform file operations. In this tutorial, you'll learn how to list the contents of an S3 bucket.
You can list the contents of an S3 bucket by iterating over the collection returned by the my_bucket.objects.all() method.
If You're in a Hurry
You can use the code snippet below to list the contents of an S3 bucket using boto3.
Snippet
import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

# Then use the session to get the resource
s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
Output
csv_files/
csv_files/IRIS.csv
df.csv
dfdd.csv
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
filename_by_client_put_object.txt
text_files/
text_files/testfile.txt
If You Want to Understand Details, Read on…
In this tutorial, you’ll learn the different methods to list contents from an S3 bucket using boto3.
You'll use the boto3 resource and the boto3 client to list the contents, and you'll also use filtering to list specific file types and to list files from a specific directory of the S3 bucket.
Installing Boto3
If you've not installed boto3 yet, you can install it using the snippet below.
You can use the % symbol before pip to install packages directly from the Jupyter notebook instead of launching the Anaconda Prompt.
Code
%pip install boto3
Boto3 will be installed successfully.
Now, you can use it to access AWS resources.
Using Boto3 Resource
A Boto3 resource is a high-level, object-oriented API that represents AWS services. Follow the steps below to list the contents of an S3 bucket using the Boto3 resource.
- Create a Boto3 session using the boto3.Session() method, passing the security credentials.
- Create the S3 resource using session.resource('s3').
- Create a bucket object using the resource.Bucket(<bucket_name>) method.
- Invoke the objects.all() method on the bucket object and iterate over the returned collection to get each object's details, printing each object's name using the key attribute.
In addition to listing objects present in the Bucket, it’ll also list the sub-directories and the objects inside the sub-directories.
Code
Use the following code to list objects of an S3 bucket.
import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

# Then use the session to get the resource
s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
You'll see the list of objects present in the bucket, in alphabetical order, as shown below.
Output
csv_files/
csv_files/IRIS.csv
df.csv
dfdd.csv
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
filename_by_client_put_object.txt
text_files/
text_files/testfile.txt
This is how you can use the boto3 resource to List objects in S3 Bucket.
Using Boto3 Client
A Boto3 client is a low-level interface whose methods map closely to the underlying AWS service APIs. Follow the steps below to list the contents of the S3 bucket using the boto3 client.
- Create a Boto3 session using the boto3.Session() method.
- Create the boto3 S3 client using the session.client('s3') method.
- Invoke the list_objects_v2() method with the bucket name to list all the objects in the S3 bucket. It returns a dictionary containing the object details.
- Iterate over the returned dictionary and display the object names using obj['Key'].
Similar to the Boto3 resource methods, the Boto3 client also returns the objects in the sub-directories.
Code
Use the following code to list objects of an S3 bucket.
import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

# Create the client from the session
s3_client = session.client('s3')

objects = s3_client.list_objects_v2(Bucket='stackvidhya')

for obj in objects['Contents']:
    print(obj['Key'])
You’ll see the objects in the S3 Bucket listed below.
Output
csv_files/
csv_files/IRIS.csv
df.csv
dfdd.csv
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
filename_by_client_put_object.txt
text_files/
text_files/testfile.txt
This is how you can list keys in the S3 Bucket using the boto3 client.
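Note that a single list_objects_v2() call returns at most 1,000 keys. For larger buckets, boto3's paginators can walk every page for you. Below is a minimal sketch, assuming an s3_client created as in the snippet above; the helper name list_all_keys is hypothetical.

```python
def list_all_keys(s3_client, bucket_name):
    """Collect every key in the bucket, following pagination past 1,000 objects."""
    keys = []
    # get_paginator handles the ContinuationToken bookkeeping for you
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        # 'Contents' is absent on empty pages, so default to an empty list
        for obj in page.get('Contents', []):
            keys.append(obj['Key'])
    return keys
```

For example, list_all_keys(s3_client, 'stackvidhya') would return the same keys as the loop above, even if the bucket held more than 1,000 objects.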
List Contents of a Specific Directory
In this section, you’ll learn how to list a subdirectory’s contents that are available in an S3 bucket. This will be useful when there are multiple subdirectories available in your S3 Bucket, and you need to know the contents of a specific directory.
- Use the filter() method on the bucket's objects collection to filter content from the specific subdirectory.
- Use the Prefix parameter to denote the name of the subdirectory.
filter() and Prefix will also be helpful when you want to select only specific objects from the S3 bucket.
Snippet
Use the following code to select content from a specific directory called csv_files from the Bucket called stackvidhya.
import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

# Then use the session to get the resource
s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for objects in my_bucket.objects.filter(Prefix="csv_files/"):
    print(objects.key)
You’ll see the list of objects present in the sub-directory csv_files in alphabetical order.
Output
csv_files/
csv_files/IRIS.csv
This is how you can list files in the folder or select objects from a specific directory of an S3 bucket.
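The S3 API can also group keys server-side with a Delimiter parameter; for illustration, here is a client-side sketch that derives the top-level "directories" from a plain list of keys. The helper name top_level_prefixes is an assumption, and the sample keys mirror the output shown earlier.

```python
def top_level_prefixes(keys):
    """Return the distinct first-level 'folder' prefixes, like a Delimiter='/' listing."""
    prefixes = set()
    for key in keys:
        if '/' in key:
            # Keep everything up to and including the first slash
            prefixes.add(key.split('/', 1)[0] + '/')
    return sorted(prefixes)

keys = ['csv_files/', 'csv_files/IRIS.csv', 'df.csv', 'text_files/testfile.txt']
print(top_level_prefixes(keys))  # ['csv_files/', 'text_files/']
```

This mirrors what the filter() call does server-side, but on keys you have already fetched.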
List Specific File Types From a Bucket
In this section, you’ll learn how to list specific file types from an S3 bucket.
- First, select all objects from the bucket and check whether each object's name ends with the particular type.
- If it ends with your desired type, list the object.
It'll list files of that specific type from the bucket, including those in all subdirectories.
Code
Use the following code to list specific file types from an S3 bucket.
import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for obj in my_bucket.objects.all():
    if obj.key.endswith('txt'):
        print(obj.key)
You’ll see all the text files available in the S3 Bucket in alphabetical order.
Output
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
filename_by_client_put_object.txt
text_files/testfile.txt
This is how you can list files of a specific type from an S3 bucket.
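Since str.endswith() also accepts a tuple of suffixes, the same check can match several file types at once. A small sketch on a plain list of keys (the helper name keys_with_suffix is hypothetical, and the sample keys come from the output above):

```python
def keys_with_suffix(keys, suffixes):
    """Filter keys ending with any of the given suffixes; endswith accepts a tuple."""
    return [k for k in keys if k.endswith(suffixes)]

keys = ['csv_files/IRIS.csv', 'df.csv', 'file_uploaded_by_boto3.txt', 'text_files/']
print(keys_with_suffix(keys, ('.csv', '.txt')))
# ['csv_files/IRIS.csv', 'df.csv', 'file_uploaded_by_boto3.txt']
```

The same tuple can be passed to obj.key.endswith() inside the bucket loop to list, say, both CSV and text files in one pass.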
List Contents From a Directory Using a Regular Expression
Boto3 currently doesn’t support server-side filtering of the objects using regular expressions.
- Get all the files using the objects.all() method.
- Filter them using a regular expression in the if condition.
To do an advanced pattern matching search, you can refer to the regex cheat sheet.
Code
For example, if you want to list files containing a number in their name, you can use the following code.
import re
import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

# Use a raw string for the pattern so \d isn't treated as an escape sequence
substring = r"\d"

for obj in my_bucket.objects.all():
    if re.search(substring, obj.key):
        print(obj.key)
You’ll see the file names with numbers listed below.
Output
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
This is how you can list contents from a directory of an S3 bucket using a regular expression.
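When the same pattern is applied to many keys, you can compile it once with re.compile() and reuse the pattern object. A short sketch on a plain list of keys taken from the sample output:

```python
import re

# Compile once; reuse the pattern object for every key
pattern = re.compile(r'\d')

keys = ['file2_uploaded_by_boto3.txt', 'text_files/testfile.txt', 'df.csv']
numbered = [k for k in keys if pattern.search(k)]
print(numbered)  # ['file2_uploaded_by_boto3.txt']
```

Inside the bucket loop you would call pattern.search(obj.key) instead of re.search(substring, obj.key); the result is the same, with the pattern parsed only once.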