Python Read Binary File – Detailed Guide

Binary files are files whose contents are not plain text, for example an image file. Like text files, they are stored as a sequence of bytes on disk, but those bytes do not represent characters, so the file cannot be opened in the normal text mode and read as text.

You can read a binary file by opening it in binary mode with open('filename', 'rb').

When working on problems such as image classification in machine learning, you may need to read the raw bytes of a file to build a model. In that case, open the file in binary mode and read its contents as bytes. Python makes no attempt to decode them into characters. In contrast, when you open a file in the normal (text) read mode, the bytes are decoded into a string according to the file encoding.
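
The difference between the two modes can be seen in a small sketch. It does not use the image file from this tutorial: it assumes a throwaway file created with tempfile, and the file contents are made up for illustration.

```python
import os
import tempfile

# Create a small sample file (hypothetical content, for illustration only).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("h\u00e9llo".encode("utf-8"))  # "hello" with an accented e
    sample_path = tmp.name

# Binary mode: read() returns the raw bytes, no decoding is attempted.
with open(sample_path, "rb") as f:
    raw = f.read()

# Text mode: read() decodes the bytes into a str using the given encoding.
with open(sample_path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(sample_path)

print(type(raw).__name__, raw)   # bytes b'h\xc3\xa9llo'
print(len(raw), len(text))       # 6 5
```

The accented character takes two bytes in UTF-8, so the binary read returns six bytes while the text read returns five characters.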

If You’re in a Hurry…

You can open the file with the open() function, passing b in the mode string to open it in binary mode, and then read the file's bytes.

open('filename', "rb") opens the binary file in read mode.

r – Opens the file for reading.
b – Opens the file in binary mode. No attempt is made to decode the bytes into a string.

Example

The example below reads the file one byte at a time and prints each byte.

try:
    # Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        byte = f.read(1)
        while byte:
            # Do stuff with byte, then read the next one.
            print(byte)
            byte = f.read(1)
except IOError:
    print('Error While Opening the file!')

If You Want to Understand the Details, Read On…

In this tutorial, you’ll learn how to read binary files in different ways.

Read binary file byte by byte

In this section, you’ll learn how to read a binary file byte by byte and print it. This is the simplest way to inspect a file's contents, but it is slow for large files because every byte needs a separate read() call.

The file is opened with the open() function in “rb” mode: r opens it for reading and b marks it as a binary file. The bytes are not decoded into a string. They are returned as bytes objects.

The example below reads the file byte by byte using file.read(1).

The argument 1 tells read() to return at most one byte per call. At the end of the file, read() returns an empty bytes object (b''), which ends the loop.

Example

try:
    # Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        byte = f.read(1)
        while byte:
            # Do stuff with byte, then read the next one.
            print(byte)
            byte = f.read(1)
except IOError:
    print('Error While Opening the file!')

Output

    b'\xff'
    b'\xd8'
    b'\xff'
    b'\xe0'
    b'\x00'
    b'\x10'
    b'J'
    b'F'
    b'I'
    b'F'
    b'\x00'
    b'\x01'
    b'\x01'
    b'\x00'
    b'\x00'
    b'\x01'
    b'\x00'
    b'\x01'
    b'\x00'
    b'\x00'
    b'\xff'
    b'\xed'
    b'\x00'
    b'|'
    b'P'
    b'h'
    b'o'
    b't'
    b'o'
    b's'
    b'h'
    b'o'
    b'p'
    b' '
    b'3'
    b'.'
    b'0'
    ...
    b'\xc6'
    b'\xb3'
    b'\xff'
    b'\xd9'
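
Calling read(1) in a loop makes one call per byte. For larger files, a common alternative is to read fixed-size chunks instead. The sketch below is one way to do that, not the only one. The helper name read_in_chunks, the chunk size of 4096 bytes, and the throwaway demo file are all assumptions made for illustration.

```python
import os
import tempfile
from functools import partial

CHUNK_SIZE = 4096  # assumed chunk size; tune it for your workload


def read_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield successive chunks of bytes from the file at `path`."""
    with open(path, "rb") as f:
        # iter() keeps calling f.read(chunk_size) until it returns b"" (end of file).
        for chunk in iter(partial(f.read, chunk_size), b""):
            yield chunk


# Demo on a throwaway 10,000-byte file (made-up content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 39 + bytes(16))
    demo_path = tmp.name

chunk_sizes = [len(chunk) for chunk in read_in_chunks(demo_path)]
os.remove(demo_path)

print(chunk_sizes)  # [4096, 4096, 1808]
```

Each chunk is a bytes object, so you can still loop over its individual bytes if you need them.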

Python Read Binary File into Byte Array

In this section, you’ll learn how to read a binary file into a byte array.

First, the file is opened in rb mode.

A byte array called mybytearray is initialized using bytearray().

Then the file is read one byte at a time using f.read(1), and each byte is appended to the byte array using the += operator. The example reads only the first five bytes.

Finally, you can print the bytearray to display the bytes that were read.

Example

try:
    # Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:

        mybytearray = bytearray()

        # Read five bytes, one at a time, and append each to the bytearray.
        mybytearray+=f.read(1)
        mybytearray+=f.read(1)
        mybytearray+=f.read(1)
        mybytearray+=f.read(1)
        mybytearray+=f.read(1)

        print(mybytearray)

except IOError:
    print('Error While Opening the file!')

Output

    bytearray(b'\xff\xd8\xff\xe0\x00')
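
If you want more than a few bytes, appending them one at a time becomes tedious. The sketch below shows two shorter options: wrapping the result of a single read() call in a bytearray, and filling a pre-allocated bytearray with readinto(). It uses a throwaway file holding a few made-up header bytes rather than the image from this tutorial.

```python
import os
import tempfile

# Throwaway file with the first bytes of a typical JPEG header (illustrative).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff\xd8\xff\xe0\x00\x10JFIF")
    demo_path = tmp.name

# Option 1: read everything and wrap it in a mutable bytearray.
with open(demo_path, "rb") as f:
    whole_file = bytearray(f.read())

# Option 2: fill a pre-allocated buffer; readinto() returns the number of bytes read.
buffer = bytearray(5)
with open(demo_path, "rb") as f:
    bytes_read = f.readinto(buffer)

os.remove(demo_path)

print(whole_file)          # bytearray(b'\xff\xd8\xff\xe0\x00\x10JFIF')
print(bytes_read, buffer)  # 5 bytearray(b'\xff\xd8\xff\xe0\x00')
```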

Python read binary file into numpy array

In this section, you’ll learn how to read the binary file into a NumPy array.

First, import the NumPy library with import numpy as np.

Then define the data type as unsigned bytes using np.dtype('B').

Next, open the binary file in reading mode.

Now, create the NumPy array using np.fromfile().

Its parameters are the file object and the data type. The result is a NumPy array in which each element is one byte of the file, stored as an integer from 0 to 255.

numpy_data = np.fromfile(f,dtype)

Example

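import numpy as np

# 'B' is an unsigned byte, so each array element holds one byte of the file.
dtype = np.dtype('B')
try:
    # Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:
        numpy_data = np.fromfile(f, dtype)
    print(numpy_data)
except IOError:
    print('Error While Opening the file!')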

Output

[255 216 255 ... 179 255 217]


The bytes are read into the numpy array and the bytes are printed.

Read binary file Line by Line

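In this section, you’ll learn how to read a binary file line by line.

You can read the file line by line using the readlines() method of the file object. In binary mode, a "line" is simply a run of bytes that ends with a newline byte (b'\n'), so the split points have no special meaning in a file such as an image.

Each line is stored as an item in a list. You can iterate over this list to access each line of the file.

The rstrip() method removes trailing whitespace bytes, such as the newline, from each line before it is printed. It does not touch the beginning of the line. Keep in mind that in a binary file those trailing bytes may be real data rather than formatting.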

Example

# Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
with open(r"c:\temp\Binary_File.jpg", 'rb') as f:
    lines = f.readlines()

for line in lines:
    print(line.rstrip())

Output

    b'\x07\x07\x07\x07'
    b''
    b''
    b''
    b''
    b''
    b'\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06'
    b'\x06\x06'
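
To see how readlines() and rstrip() treat binary data, here is a small sketch that uses an in-memory io.BytesIO object as a stand-in for a file. The byte values are made up for illustration.

```python
import io

# In-memory stand-in for a binary file (made-up bytes).
data = b"\x01\x02\n\x03\x0c\n\x04 "
f = io.BytesIO(data)

# readlines() splits after every newline byte (0x0A).
lines = f.readlines()
print(lines)  # [b'\x01\x02\n', b'\x03\x0c\n', b'\x04 ']

# rstrip() with no argument strips every trailing ASCII whitespace byte,
# including the form feed (0x0C) and the space (0x20) that are data here.
stripped = [line.rstrip() for line in lines]
print(stripped)  # [b'\x01\x02', b'\x03', b'\x04']

# Removing only the trailing newline keeps the other bytes intact.
newline_only = [line[:-1] if line.endswith(b"\n") else line for line in lines]
print(newline_only)  # [b'\x01\x02', b'\x03\x0c', b'\x04 ']
```

If the trailing bytes matter to you, strip only the newline, or avoid line-based reading for binary data altogether.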

Read Binary File Fully in One Shot

In this section, you’ll learn how to read a binary file in one shot.

You can do this by passing -1 to file.read(), or by calling read() with no argument, which is equivalent. Either way, the whole file is read into memory in a single call, as shown below.

Example

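try:
    # Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
    with open(r"c:\temp\Binary_File.jpg", 'rb') as f:
        # read(-1) reads until the end of the file, so no loop is needed.
        binarycontent = f.read(-1)
    print(binarycontent)
except IOError:
    print('Error While Opening the file!')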

Output

 b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xed\x00|Photoshop 3.0\x008BIM\x04\x04\x00\x00\x00\x00\x00\x1c\x02(\x00ZFBMD2300096c010000fe0e000032160000051b00003d2b000055300000d6360000bb3c0000ce4100008b490000\x00\xff\xdb\x00C\x00\x03\x03\x03\x03\x03\x03\x05\x03\x03\x05\x07\x05\x05\x05\x07\n\x07\x07\x07\x07\n\x0c\n\n\n\n\n\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06\n\x06\x06\n\x18\x11\x0e\x11\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18\x18
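
As a possible shortcut, pathlib.Path.read_bytes() opens the file, reads all of it and closes it in one call. The sketch below assumes a throwaway file with made-up contents. The check for the JPEG start and end markers (b'\xff\xd8' and b'\xff\xd9') is only a rough illustration of what you can do with the bytes, not a full validation.

```python
import tempfile
from pathlib import Path

# Throwaway file that imitates the start and end markers of a JPEG (made-up body).
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
    tmp.write(b"\xff\xd8\xff\xe0" + bytes(100) + b"\xff\xd9")
    demo_path = Path(tmp.name)

# read_bytes() opens the file in binary mode, reads everything and closes it.
content = demo_path.read_bytes()
demo_path.unlink()

print(len(content))  # 106

# Rough check only: the data starts and ends with the JPEG markers.
looks_like_jpeg = content.startswith(b"\xff\xd8") and content.endswith(b"\xff\xd9")
print(looks_like_jpeg)  # True
```

Reading a file in one shot is convenient, but the whole file has to fit in memory.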

Python Read Binary File and Convert to Ascii

In this section, you’ll learn how to read a binary file and convert it to ASCII using the binascii library. The bytes are uuencoded, which represents them using only printable ASCII characters.

Read the file as binary as explained in the previous sections.

Next, call binascii.b2a_uu(bytes). It converts the bytes into a line of ASCII characters and returns it as a bytes object ending with a newline. The input can be at most 45 bytes long, which is why the example reads only 45 bytes.

Then you can print the result to check the ASCII characters.

Example

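import binascii

try:
    # Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
    with open(r"c:\temp\Binary_File.jpg", "rb") as f:

        # b2a_uu() accepts at most 45 bytes per call.
        mybytes = f.read(45)

        data_bytes2ascii = binascii.b2a_uu(mybytes)

        print("Binary String to Ascii")

        print(data_bytes2ascii)

except IOError:
    print("Error While opening the file!")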

Output

 Binary String to Ascii
 b'M_]C_X  02D9)1@ ! 0   0 !  #_[0!\\4&AO=&]S:&]P(#,N,  X0DE-! 0 \n'
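
The sketch below illustrates two points about b2a_uu(): the encoding can be reversed with binascii.a2b_uu(), and input longer than 45 bytes is rejected. It also shows binascii.hexlify() as one alternative without that limit. The payload bytes are made up and are not read from a real file.

```python
import binascii

# Illustrative bytes (the start of a typical JPEG header), not read from a real file.
payload = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# uuencode and decode again: the round trip returns the original bytes.
uu_line = binascii.b2a_uu(payload)
round_trip = binascii.a2b_uu(uu_line)
print(round_trip == payload)  # True

# b2a_uu() accepts at most 45 bytes per call; longer input raises binascii.Error.
try:
    binascii.b2a_uu(bytes(46))
    too_long_rejected = False
except binascii.Error:
    too_long_rejected = True
print(too_long_rejected)  # True

# hexlify() has no such limit and gives two hex digits per byte.
hex_text = binascii.hexlify(payload).decode("ascii")
print(hex_text)  # ffd8ffe000104a464946
```

To uuencode a whole file, you would call b2a_uu() once for every 45-byte chunk.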

Read binary file into dataframe

In this section, you’ll learn how to read a binary file into a pandas dataframe.

First, you need to read the binary file into a NumPy array, because pandas has no method that reads an arbitrary binary file into a dataframe directly.

Once you have the NumPy array, you can create a dataframe from it.

Pass the NumPy array to pd.DataFrame(). You’ll then have a dataframe with one row for each byte read from the binary file.

Example

import numpy as np
import pandas as pd

try:
    # 'B' is an unsigned byte, so each array element holds one byte of the file.
    dt = np.dtype('B')

    # Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
    data = np.fromfile(r"c:\temp\Binary_File.jpg", dtype=dt)

    df = pd.DataFrame(data)

    print(df)

except IOError:
    print("Error while opening the file!")

Output

             0
    0      255
    1      216
    2      255
    3      224
    4        0
    ...    ...
    18822    0
    18823  198
    18824  179
    18825  255
    18826  217

    [18827 rows x 1 columns]

This is how you can read a binary file using NumPy and use that NumPy array to create the pandas dataframe.

Once the bytes are in a NumPy array, you can also load them into other structures, such as a dictionary.

Read binary file skip header

In this section, you’ll learn how to read a binary file while skipping its header line. Some binary files begin with an ASCII header.

This approach is useful when reading such files.

You can call the readlines() method of the file object and slice the returned list with [1:]. The slice keeps every line from index 1 onwards.

The ASCII header line at index 0 is ignored.

Example

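# Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
with open(r"c:\temp\Binary_File.jpg", 'rb') as f:
    # [1:] drops the first line (the header).
    lines = f.readlines()[1:]

for line in lines:
    print(line.rstrip())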

Output

    b'\x07\x07\x07\x07'
    b''
    b''
    b''
    b''
    b''
    b'\x0c\x0f\x0c\x0c\x0c\x0c\x0c\x0c\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x12\x12\x12\x12\x12\x12\x15\x15\x15\x15\x15\x17\x17\x17\x17\x17\x17\x17\x17\x17\x17\xff\xdb\x00C\x01\x04\x04\x04\x06\x06\x06'
    b'\x06\x06'

    b"\x93\x80\x18\x98\xc9\xdc\x8bm\x90&'\xc5U\xb18\x81\xc7y\xf0\x80\x00\x14\x1c\xceQd\x83\x13\xa0\xbf-D9\xe0\xae;\x8f\\LK\xb8\xc3\x8ae\xd4\xd1C\x10\x7f\x02\x02\xa6\x822K&D\x9a\x04\xd4\xc8\xfbC\x87\xf2\x8d\xdcN\xdes)rq\xbbI\x92\xb6\xeeu8\x1d\xfdG\xabv\xe8q\xa5\xb6\xb56\xe0\xa1\x06\x84n#\xf0\x1c\x86\xb0\x83\xee\x99\xe7\xc6\xaaN\xafY\xdf\xd9\xcfe\xd5\x84"

    b'\xd9\x0b\xc2\x1b0\xa1Q\x17\x88\xb4et\x81u8\xed\xf5\xe8\xd9#c\t\xf9\xc0\xa7\x06\xa2/={\x87l\x01K\x870\xe3\xa1\x024\xdc^\x11\x96\x96\xba\[email protected]\x91A\xd6U\xea\xe1\xbb\xb733'
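
If you do not need the rest of the file split into lines, you can also read the header with readline() and the remaining bytes with a single read(). When the header has a known fixed size, seek() can jump past it. The sketch below uses an in-memory io.BytesIO object. The header text and its 10-byte size are hypothetical.

```python
import io

# Stand-in for a file with a one-line ASCII header followed by binary data.
# The layout is hypothetical.
f = io.BytesIO(b"HEADER v1\n\x00\x01\n\x02\xff")

# Variable-length header: readline() reads up to and including the first newline.
header = f.readline()
body = f.read()  # everything after the header, left untouched
print(header)  # b'HEADER v1\n'
print(body)    # b'\x00\x01\n\x02\xff'

# Fixed-size header: jump straight past it with seek().
HEADER_SIZE = 10  # assumed size of the header in bytes
f.seek(HEADER_SIZE)
body_again = f.read()
print(body_again == body)  # True
```

Unlike the readlines() approach, the body keeps any newline bytes that are part of the data.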

Reading Binary File Using Pickle

In this section, you’ll learn how to read binary files in Python using pickle.

This is tricky because pickle can only load files that were created by pickle itself, for example with pickle.dump(). Loading any other kind of binary file, such as an image, fails with an error like "invalid load key". In addition, unpickling data from an untrusted source can execute arbitrary code.

Hence this method is not recommended for reading arbitrary binary files.

Example

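import pickle

# Raw string (r"...") so that "\t" in the Windows path is not read as a tab.
with open(r"c:\temp\Binary_File.jpg", "rb") as file_to_read:
    # This fails for an image file: a JPEG is not a pickle stream.
    loaded_dictionary = pickle.load(file_to_read)

print(loaded_dictionary)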

Output

    ---------------------------------------------------------------------------

    UnpicklingError                           Traceback (most recent call last)

    <ipython-input-23-dea0d83e3f49> in <module>
          7 file_to_read = open("E:\Vikram_Blogging\Stack_Vidhya\Python_Notebooks\Read_Binary_File_Python\Binary_File.jpg", "rb")
          8 
    ----> 9 loaded_dictionary = pickle.load(file_to_read)
         10 
         11 print(loaded_dictionary)


    UnpicklingError: invalid load key, '\xff'.
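
For comparison, here is a sketch of the case pickle is meant for: loading data that was written with pickle.dump(). It uses an in-memory io.BytesIO object instead of a file on disk, and the dictionary contents are made up. The last part reproduces the error above with a few JPEG-like bytes.

```python
import io
import pickle

# Made-up data to store; any picklable Python object would do.
record = {"name": "Binary_File.jpg", "size": 18827}

# Write the object with pickle.dump(), then read it back with pickle.load().
buffer = io.BytesIO()
pickle.dump(record, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
print(restored == record)  # True

# Bytes that were not produced by pickle cannot be loaded.
try:
    pickle.loads(b"\xff\xd8\xff\xe0")
    error_message = ""
except pickle.UnpicklingError as exc:
    error_message = str(exc)
print(error_message)
```

Only unpickle data that you created yourself or that comes from a source you trust.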

Conclusion

Reading a binary file is a common task. For example, reading the bytes of an image file is useful when you are working on image classification problems. In this case, you can open the image file in binary mode and use its bytes to create the model.

In this tutorial, you’ve learned the different methods available to read binary files in Python and the libraries that can help with it.

If you have any questions, feel free to comment below.
