How to List the Contents of an S3 Bucket Using Boto3 in Python?

S3 is an object storage service from AWS. You can store any type of file in it, such as CSV or text files. When you need to perform operations on those files, you'll often have to retrieve the list of objects first. In this tutorial, you'll learn how to list the contents of an S3 bucket.

You can list the contents of an S3 bucket by iterating over the collection returned by the my_bucket.objects.all() method.

If You're in a Hurry

You can use the code snippet below to list the contents of an S3 bucket using boto3.

Snippet

import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

# Then use the session to get the resource
s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

Output

    csv_files/
    csv_files/IRIS.csv
    df.csv
    dfdd.csv
    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt
    filename_by_client_put_object.txt
    text_files/
    text_files/testfile.txt

If You Want to Understand the Details, Read On…

In this tutorial, you’ll learn the different methods to list contents from an S3 bucket using boto3.

You'll use the boto3 resource and the boto3 client to list the contents, and you'll also use filtering to list specific file types and to list files from a specific directory of the S3 bucket.

Installing Boto3

If you haven't installed boto3 yet, you can install it using the snippet below.

You can use the % symbol before pip to install packages directly from the Jupyter notebook instead of launching the Anaconda Prompt.

Code

%pip install boto3

Boto3 will be installed successfully.

Now, you can use it to access AWS resources.
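As a side note, you don't have to hardcode credentials in your scripts the way the snippets in this tutorial do. If you've configured the AWS CLI or the shared credentials file (~/.aws/credentials), boto3 will pick up credentials automatically. Below is a minimal sketch of that approach; it assumes a default profile is already configured on your machine.

import boto3

# With no explicit keys, boto3 falls back to the default credential chain:
# environment variables, ~/.aws/credentials, or an attached IAM role
s3 = boto3.resource('s3')

for obj in s3.Bucket('stackvidhya').objects.all():
    print(obj.key)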

Using Boto3 Resource

Boto3 resource is a high-level, object-oriented API that represents AWS services. Follow the steps below to list the contents of the S3 bucket using the Boto3 resource.

  1. Create a Boto3 session using the boto3.Session() method, passing your security credentials.
  2. Create the S3 resource using the session.resource('s3') snippet.
  3. Create a bucket object using the resource.Bucket(<bucket_name>) method.
  4. Invoke the objects.all() method on the bucket object, iterate over the returned collection to get each object's details, and print each object's name using the key attribute.

In addition to listing the objects present in the bucket, this will also list the sub-directories and the objects inside them.

Code

Use the following code to list objects of an S3 bucket.

import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

# Then use the session to get the resource
s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

You'll see the list of objects present in the bucket, as shown below, in alphabetical order.

Output

    csv_files/
    csv_files/IRIS.csv
    df.csv
    dfdd.csv
    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt
    filename_by_client_put_object.txt
    text_files/
    text_files/testfile.txt

This is how you can use the boto3 resource to list objects in an S3 bucket.
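Note that keys ending with / (such as csv_files/) are typically zero-byte placeholder objects created by the S3 console to represent folders. If you want only the actual files, you can skip them. A small sketch, reusing the my_bucket object from the snippet above:

# Skip the zero-byte folder placeholder keys that end with '/'
for my_bucket_object in my_bucket.objects.all():
    if not my_bucket_object.key.endswith('/'):
        print(my_bucket_object.key)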

Using Boto3 Client

Boto3 client is a low-level interface whose methods map closely to the underlying AWS service APIs. Follow the steps below to list the contents of the S3 bucket using the boto3 client.

  1. Create a Boto3 session using the boto3.Session() method, passing your security credentials.
  2. Create the S3 client using the session.client('s3') method.
  3. Invoke the list_objects_v2() method with the bucket name to list the objects in the bucket. It returns a dictionary containing the object details.
  4. Iterate over the 'Contents' list in the returned dictionary and display each object's name using obj['Key'].

Similar to the Boto3 resource methods, the Boto3 client also returns the objects in the sub-directories.

Code

Use the following code to list objects of an S3 bucket.

import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

# Then use the session to get the client
s3_client = session.client('s3')

objects = s3_client.list_objects_v2(Bucket='stackvidhya')

for obj in objects['Contents']:
    print(obj['Key'])

You’ll see the objects in the S3 Bucket listed below.

Output

    csv_files/
    csv_files/IRIS.csv
    df.csv
    dfdd.csv
    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt
    filename_by_client_put_object.txt
    text_files/
    text_files/testfile.txt

This is how you can list keys in the S3 Bucket using the boto3 client.
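One caveat: list_objects_v2() returns at most 1,000 keys per call. If your bucket holds more objects than that, use a paginator to walk through all the pages. A minimal sketch, reusing the s3_client from the snippet above:

# Paginate because list_objects_v2 returns at most 1,000 keys per call
paginator = s3_client.get_paginator('list_objects_v2')

for page in paginator.paginate(Bucket='stackvidhya'):
    for obj in page.get('Contents', []):
        print(obj['Key'])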

List Contents of a Specific Directory

In this section, you'll learn how to list the contents of a specific subdirectory in an S3 bucket. This is useful when your bucket contains multiple subdirectories and you need to know the contents of just one of them.

  • Use the filter() method on the bucket's objects collection to select content from a specific subdirectory.
  • Use the Prefix parameter to denote the name of the subdirectory.

filter() with Prefix is also helpful when you want to select only a specific object from the bucket.

Snippet

Use the following code to select content from a specific directory called csv_files from the Bucket called stackvidhya.

import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

# Then use the session to get the resource
s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for obj in my_bucket.objects.filter(Prefix="csv_files/"):
    print(obj.key)

You’ll see the list of objects present in the sub-directory csv_files in alphabetical order.

Output

    csv_files/
    csv_files/IRIS.csv

This is how you can list files in the folder or select objects from a specific directory of an S3 bucket.
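Note that filter() with a Prefix lists everything under the prefix, including nested sub-directories. If you want only the immediate contents of a directory, the boto3 client accepts a Delimiter parameter, and sub-directories then come back under CommonPrefixes. A small sketch, reusing the s3_client from the client example above:

# List only the immediate children of csv_files/:
# files come back under 'Contents', sub-directories under 'CommonPrefixes'
response = s3_client.list_objects_v2(
    Bucket='stackvidhya', Prefix='csv_files/', Delimiter='/')

for obj in response.get('Contents', []):
    print(obj['Key'])

for prefix in response.get('CommonPrefixes', []):
    print(prefix['Prefix'])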

List Specific File Types From a Bucket

In this section, you’ll learn how to list specific file types from an S3 bucket.

  • First, select all objects from the Bucket and check if the object name ends with the particular type.
  • If it ends with your desired type, list the object.

It'll list the files of that specific type from the bucket, including those in all subdirectories.

Code

Use the following code to list specific file types from an S3 bucket.

import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for obj in my_bucket.objects.all():
    # match only keys ending with the .txt extension
    if obj.key.endswith('.txt'):
        print(obj.key)

You’ll see all the text files available in the S3 Bucket in alphabetical order.

Output

    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt
    filename_by_client_put_object.txt
    text_files/testfile.txt

This is how you can list files of a specific type from an S3 bucket.
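If you need the matching keys as a list for further processing rather than printing them one by one, a list comprehension keeps this compact. A small sketch, reusing the my_bucket object from above:

# Collect all .txt keys into a list instead of printing them
txt_keys = [obj.key for obj in my_bucket.objects.all()
            if obj.key.endswith('.txt')]
print(txt_keys)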

List Contents From a Directory Using Regular Expressions

Boto3 currently doesn't support server-side filtering of objects using regular expressions, so the filtering has to happen on the client side:

  • Get all the files using the objects.all() method.
  • Filter them using a regular expression in the if condition.

To do an advanced pattern matching search, you can refer to the regex cheat sheet.

Code

For example, if you want to list files containing a number in their name, you can use the following code.

import re 
import boto3

session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>')

s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

# Match any key that contains at least one digit
pattern = r"\d"

for obj in my_bucket.objects.all():
    if re.search(pattern, obj.key):
        print(obj.key)

You’ll see the file names with numbers listed below.

Output

    file2_uploaded_by_boto3.txt
    file3_uploaded_by_boto3.txt
    file_uploaded_by_boto3.txt

This is how you can list contents from a directory of an S3 bucket using regular expressions.
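If you scan large buckets or reuse the same pattern in several places, compiling it once with re.compile() is the usual idiom, and the pattern can be as specific as you need. A small sketch with a hypothetical pattern that matches only .txt keys containing a digit, reusing my_bucket from above:

import re

# Compile once and reuse; matches .txt keys that contain a digit
pattern = re.compile(r"\d.*\.txt$")

for obj in my_bucket.objects.all():
    if pattern.search(obj.key):
        print(obj.key)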
