This tutorial covers common approaches for reading binary files in Python, with examples. Python reads binary files through the built-in open() function, usually combined with the ‘with’ statement as a context manager so that the file is closed automatically. In binary mode, Python returns the file's contents as raw bytes and does not decode or interpret them; parsing any headers or records is left to your code. The same approach works for any binary format, including .bin (binary) files.
Binary file formats are commonly used for data interchange between systems because they can store and transmit complex data structures compactly. In some scenarios, reading binary (.bin) files can also be faster than reading text (.txt) files, since no character decoding is needed. Below, we explore the steps and methods for reading binary files in Python.
Introduction To Binary Files
A binary file stores data as raw bytes rather than as human-readable text. The bytes may represent anything: an image, an audio clip, a font, a compiled program, or a custom record layout. Unlike text files, whose bytes are decoded into characters using an encoding such as UTF-8, binary files are read exactly as stored. Because binary formats are usually more compact than their text equivalents and need no decoding step, they are often faster to read, which makes them well suited to large datasets.
Opening The Binary File
To read binary (.bin) files in Python, use the built-in open() function, ideally with the ‘with’ statement as a context manager. Here are the steps to read data from binary files:
- Call open() with the path of the .bin file.
- Pass ‘rb’ as the mode, which stands for read binary.
- Wrap the call in a ‘with’ statement so the file is closed automatically when the block ends.
Let's see an example.
file = open("file_name.bin", "rb")
or
with open("file_name.bin", "rb") as file:
    # rest of your code
Here, open() is called with two arguments: the first is the name of the file you want to open, and the second is the mode in which to open it. “rb” means read binary mode. The “file” variable holds the file object used to access the binary file. With the first form you must call file.close() yourself; with the second form the file is closed automatically at the end of the ‘with’ block.
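The difference between the two forms can be sketched as follows. This is a minimal illustration that creates its own small sample file in a temporary directory; the file contents and the variable names first and rest are chosen for the example only.

```python
import os
import tempfile

# Hypothetical sample file, created here so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "file_name.bin")
with open(path, "wb") as f:
    f.write(b"\x00\x07abc")

# Form 1: explicit open/close. The file must be closed manually.
file = open(path, "rb")
try:
    first = file.read(2)
finally:
    file.close()

# Form 2: context manager. The file is closed when the block exits.
with open(path, "rb") as file:
    rest = file.read()

print(type(first), first)  # reading in 'rb' mode returns bytes objects
print(file.closed)         # True: the with block closed the file
```

In both cases the data comes back as bytes, not str, because the file was opened in binary mode.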
Getting Started With Reading binary files
Python gives us the flexibility to read various file formats, including binary files (.BIN), and reading a binary file is straightforward: append the ‘b’ character to the ‘r’ mode when opening the file, so that the mode becomes mode='rb'. The ‘r’ character opens the file for reading, and the ‘b’ character tells Python to treat the file as binary, so the data is returned as bytes without any decoding or newline translation.
Here are the various characters used to specify different file modes when working with files in Python:
- ‘r’: Used to read data from a file.
- ‘w’: Used to write data into a file. If the file exists, it will be truncated. If it doesn’t exist, a new file will be created.
- ‘x’: Used to create a new file. If the file already exists, an error will be raised.
- ‘a’: Used to append data to an existing file. If the file doesn’t exist, a new one will be created.
- ‘b’: Used to open a binary file, where data is read or written in binary format.
- ‘t’: This is the default mode when opening a file. It is used to open a text file and read or write text data.
- ‘+’: Used to open a disk file for both reading and writing (updating).
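As a rough illustration of how these characters combine with ‘b’, the sketch below creates, appends to, and updates a small binary file. The file name modes.bin and its contents are made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes.bin")

# 'x' + 'b': create a new binary file; fails if it already exists.
with open(path, "xb") as f:
    f.write(b"\x01\x02")

try:
    open(path, "xb")
    created_twice = True
except FileExistsError:
    created_twice = False

# 'a' + 'b': append bytes to the end of the existing file.
with open(path, "ab") as f:
    f.write(b"\x03")

# 'r' + '+' + 'b': read and write without truncating the file.
with open(path, "r+b") as f:
    f.write(b"\xff")   # overwrite the first byte in place
    f.seek(0)
    contents = f.read()

print(created_twice)   # False
print(contents)        # b'\xff\x02\x03'
```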
The syntax to read the binary file in Python
To read .bin (binary) files in Python, pass mode ‘rb’ to the built-in open() function, and use the ‘with’ statement as a context manager so the file is closed when the block ends.
with open('webfont.bin', 'rb') as binfile:
    data = binfile.readlines()
    print(data)
Where:
- ‘webfont.bin’ is the binary file that contains information about fonts.
- ‘rb’ opens the file for reading in binary mode.
- readlines() returns a list of bytes objects, one per line, where lines are split at the newline byte (\n). Its optional argument is a size hint in bytes, not a line count: once the total size of the lines read so far exceeds the hint, no further lines are read.
The following example reads only the first two bytes of the file:
with open('webfont.bin', 'rb') as binfile:
    data = binfile.read(2)
    print(data)
b'\x00\x07'
To write a binary file, use the ‘wb’ mode instead of the ‘w’ mode with the open() function.
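A minimal sketch of writing in ‘wb’ mode and reading the data back is shown below. The file name out.bin and the payload are assumptions made for the example; the point is that binary mode accepts only bytes-like objects.

```python
import os
import tempfile

# Hypothetical scratch file for the example.
path = os.path.join(tempfile.mkdtemp(), "out.bin")

# Binary mode only accepts bytes-like objects, so writing a str fails.
try:
    with open(path, "wb") as f:
        f.write("text")
    str_rejected = False
except TypeError:
    str_rejected = True

payload = bytes([0, 7, 255])
with open(path, "wb") as f:
    written = f.write(payload)  # write() returns the number of bytes written

with open(path, "rb") as f:
    round_trip = f.read()

print(str_rejected)         # True
print(written, round_trip)  # 3 b'\x00\x07\xff'
```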
Using The Read() Function To Read A Binary File In Python
The read() function reads a specified number of bytes from the binary file. If you don’t specify the number of bytes, it reads the entire file.
with open('file.bin', 'rb') as file:
    data = file.read()  # Read the entire file
You can also read only a given number of bytes. In the example below, the binary file is Python.bin, which contains the data ‘a b c’. The code reads two bytes from the file and stores them in the ‘data’ variable, so the output is a bytes literal holding the first character and the space that follows it.
with open('Python.bin', 'rb') as file:
    data = file.read(2)  # Read two bytes (16 bits) of data from the binary file
    print(data)
b'a '
Readline() Function To Read Binary Files In Python
The readline() method reads a single line from the file as a sequence of bytes, including the newline character (\n).
with open('file.bin', 'rb') as file:
    line = file.readline()  # Read a single line
The next time you read the file using the readline() function in the same code, it will start from the following line in your file. This means that you can read each line of your file iteratively until it reaches the end of the file.
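The repeated-readline pattern described above can be sketched like this. The sample file and its three lines are invented for the illustration; the loop relies on readline() returning an empty bytes object at the end of the file.

```python
import os
import tempfile

# Hypothetical sample file with three lines (the last has no newline).
path = os.path.join(tempfile.mkdtemp(), "lines.bin")
with open(path, "wb") as f:
    f.write(b"first\nsecond\nthird")

lines = []
with open(path, "rb") as f:
    while True:
        line = f.readline()  # each call continues where the last one stopped
        if not line:         # b'' signals the end of the file
            break
        lines.append(line)

print(lines)  # [b'first\n', b'second\n', b'third']
```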
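In the next example, Python.bin contains ‘a b c’ followed by a newline, six bytes in total. Note that the for loop iterates over the bytes object returned by a single readline() call, not over the lines of the file, and iterating over bytes yields integers. The output is therefore the ASCII code of each byte in the first line, followed by the decoded contents of the file:
with open('Python.bin', 'rb') as file:
    for byte in file.readline():
        print(byte)
# code to decode the file's data
with open('Python.bin', 'rb') as file:
    data = file.read()
    print('\n file data \n', data.decode())
97
32
98
32
99
10

 file data
 a b c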
Readlines() Function To Extract Data From Binary Files In Python
The readlines() method reads all lines from the file and returns them as a list of bytes objects, where each line is an element of the list.
with open('file.bin', 'rb') as file:
    lines = file.readlines()  # Read all lines
    print(lines[0])  # print the first line in the file
You can also pass a size hint, in bytes, to readlines(). Python stops reading further lines once the total size of the lines read so far exceeds the hint, so only complete lines are returned.
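The binary file in the example below is Python.bin, containing the data ‘a b c’ followed by a newline.
The argument in file.readlines(1) is a size hint in bytes, not a number of lines. Because the first line is already longer than one byte, reading stops after it, and the result is a list containing one element: the whole first line as a bytes literal.
with open('Python.bin', 'rb') as file:
    lines = file.readlines(1)  # Stop after the first line, which already exceeds the 1-byte hint
    print(lines)
[b'a b c\n']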
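To make the meaning of the readlines() argument concrete, here is a small sketch. The sample file with three 6-byte lines is an assumption for the example; the hint values 1 and 7 are chosen to show where reading stops.

```python
import os
import tempfile

# Hypothetical sample file: three lines of 6 bytes each.
path = os.path.join(tempfile.mkdtemp(), "hint.bin")
with open(path, "wb") as f:
    f.write(b"a b c\nd e f\ng h i\n")

with open(path, "rb") as f:
    hint_1 = f.readlines(1)  # the first line (6 bytes) already exceeds 1

with open(path, "rb") as f:
    hint_7 = f.readlines(7)  # 6 bytes is not enough, 12 bytes exceeds 7

with open(path, "rb") as f:
    no_hint = f.readlines()  # no hint: every line is returned

print(hint_1)        # [b'a b c\n']
print(hint_7)        # [b'a b c\n', b'd e f\n']
print(len(no_hint))  # 3
```

The hint never splits a line; it only decides after which complete line reading stops.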
Iterating Over The File Object to read .bin files in chunks
You can also iterate over the file object directly. Open the file in binary mode (‘rb’) and use a for loop; each iteration yields one line as a bytes object, split at the newline byte.
The binary file in the example below is Python.bin, containing the data ‘a b c’ followed by a newline. Since the file holds a single line, the for…in loop runs once and prints that line.
with open('Python.bin', 'rb') as file:
    for line in file:
        print(line)  # print each line
b'a b c\n'
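Iterating over the file object splits the data at newline bytes, which is often not meaningful for binary data. To iterate in fixed-size chunks instead, one common pattern is the two-argument form of iter() with a sentinel. The sketch below assumes a made-up 10-byte file and a chunk size of 4 bytes.

```python
import os
import tempfile
from functools import partial

# Hypothetical sample file holding the bytes 0..9.
path = os.path.join(tempfile.mkdtemp(), "chunks.bin")
with open(path, "wb") as f:
    f.write(bytes(range(10)))

chunks = []
with open(path, "rb") as f:
    # iter(callable, sentinel) calls f.read(4) until it returns b''.
    for chunk in iter(partial(f.read, 4), b""):
        chunks.append(chunk)

print([len(c) for c in chunks])  # [4, 4, 2]
```

The last chunk is shorter than the others whenever the file size is not a multiple of the chunk size.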
Seek() Function to parse binary Files in Python
The seek() function moves the file pointer to a particular position within the file. This is useful when you want to read or skip specific parts of a file.
The binary file in the example below is webfont.bin, containing the data 0 1 01 0011 followed by a newline.
file.seek(5) moves the file pointer to byte offset 5 (offsets start at 0), and file.read() then reads from that position to the end of the file. The data from offset 5 onward is ‘1 0011’ plus the trailing newline, so the output is b'1 0011\n'.
# the webfont.bin file contains the data 0 1 01 0011
with open('webfont.bin', 'rb') as file:
    file.seek(5)  # Move the file pointer to position 5
    data = file.read()  # Read from the current position
    print(data)
b'1 0011\n'
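seek() also accepts a second argument that selects the reference point, and tell() reports the current position. The sketch below recreates the 12-byte sample data in a temporary file (the file name is an assumption) and moves the pointer relative to the start, the current position, and the end.

```python
import os
import tempfile

# Recreate the sample data used above in a hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "seek.bin")
with open(path, "wb") as f:
    f.write(b"0 1 01 0011\n")

with open(path, "rb") as f:
    f.seek(5)                # absolute offset from the start of the file
    one = f.read(1)
    position = f.tell()      # reading one byte advanced the pointer to 6

    f.seek(2, os.SEEK_CUR)   # skip 2 bytes forward from the current position
    two = f.read(2)

    f.seek(-5, os.SEEK_END)  # 5 bytes before the end of the file
    tail = f.read()

print(one, position)  # b'1' 6
print(two)            # b'01'
print(tail)           # b'0011\n'
```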
Readinto() Function For Reading .bin files in Python
The readinto() method reads data from the file directly into a pre-allocated buffer, such as a bytearray or a ctypes buffer, and returns the number of bytes read. This method is useful when you want to avoid creating a new object for each read operation, which can benefit performance.
The binary file in the example below is webfont.bin, containing the data 0 1 01 0011.
The file contains 11 bytes of data (‘0’, ‘ ’, ‘1’, ‘ ’, ‘0’, ‘1’, ‘ ’, ‘0’, ‘0’, ‘1’, ‘1’) plus one trailing newline byte (‘\n’), 12 bytes in total. Files in Python have no null terminator; file.readinto(buffer) simply copies all 12 bytes into the start of the 1024-byte buffer and returns 12.
buffer = bytearray(1024)  # Pre-allocate a buffer
with open('webfont.bin', 'rb') as file:
    num_bytes_read = file.readinto(buffer)  # Read data into the buffer
    print(num_bytes_read)
12
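The benefit of readinto() shows when the same buffer is reused across many reads. The following sketch assumes a deliberately tiny 4-byte buffer and recreates the 12-byte sample data in a temporary file; only the first n bytes of the buffer are valid after each call.

```python
import os
import tempfile

# Recreate the 12-byte sample data in a hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "readinto.bin")
content = b"0 1 01 0011\n"
with open(path, "wb") as f:
    f.write(content)

buffer = bytearray(4)      # one small buffer, reused for every read
view = memoryview(buffer)  # lets us slice the buffer without copying
total = 0
pieces = []

with open(path, "rb") as f:
    # readinto() returns 0 at the end of the file, which ends the loop.
    while (n := f.readinto(buffer)) > 0:
        pieces.append(bytes(view[:n]))  # only the first n bytes are fresh data
        total += n

print(total)  # 12
```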
Struct Module to read binary files
The struct module provides functions for packing and unpacking binary data according to a specified format string. This is useful when the binary file has a known structure and you need to extract specific data types, such as integers, floats, or custom records.
Applying struct.unpack with the format ‘i’ to the first 4 bytes of a binary file interprets them as a signed 32-bit integer in the machine's native byte order.
Why do 4 bytes of data make up 32 bits?
A byte consists of 8 bits, so a 32-bit value occupies 4 bytes: 4 bytes x 8 bits per byte = 32 bits.
The webfont.bin file contains the data 0 1 01 0011. The ASCII code for ‘0’ is 48, ‘1’ is 49, and ‘ ’ (white space) is 32.
import struct
with open('webfont.bin', 'rb') as file:
    data = file.read(4)  # Read 4 bytes
    integer = struct.unpack('i', data)  # Unpack as a signed 32-bit integer
    print(integer)
(540090416,)
struct.unpack always returns a tuple, here with a single element. 540090416 is the value of the first four bytes (‘0’, ‘ ’, ‘1’, ‘ ’, that is 0x30 0x20 0x31 0x20) read as a signed integer on a little-endian machine: 0x20312030 = 540090416.
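Because the plain ‘i’ format depends on the machine's byte order, it can help to state the byte order explicitly with the ‘<’ (little-endian) or ‘>’ (big-endian) prefix. The sketch below uses the same four bytes as an in-memory bytes object rather than reading them from a file.

```python
import struct

data = b"0 1 "  # the four bytes 0x30 0x20 0x31 0x20

little = struct.unpack("<i", data)[0]  # least significant byte first
big = struct.unpack(">i", data)[0]     # most significant byte first

print(little)                 # 540090416, the value shown above
print(big)                    # a different number from the same bytes
print(struct.calcsize("<i"))  # 4: the number of bytes this format consumes
```

Most file formats fix their byte order in the specification, so an explicit prefix keeps the code portable across machines.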
The Numpy Module to read binary files
The numpy module provides the numpy.fromfile() function, which reads data from a binary file directly into a NumPy array. The following example reads webfont.bin with the data type set to an 8-bit signed integer (np.int8), so each byte of the file becomes one array element.
Because the file holds ASCII characters, the array elements are the ASCII codes of those characters.
The webfont.bin file contains binary data, 0 1 01 0011. The ASCII code for ‘0’ is 48, ‘1’ is 49, and ‘ ‘ (for white space) is 32.
import numpy as np
data = np.fromfile('webfont.bin', dtype=np.int8) # Read binary data into a numpy array
data
array([48, 32, 49, 32, 48, 49, 32, 48, 48, 49, 49], dtype=int8)
Reading a binary file to a byte in Python
The following tools help when reading a binary file into bytes and converting those bytes to numbers:
- pathlib – Path.read_bytes() reads the whole binary file and returns its contents as a bytes object.
- int.from_bytes() – converts a sequence of bytes into an integer, using the given byte order.
- struct – unpack() uses a format string to convert bytes into a tuple of values, and calcsize(format) returns the number of bytes that the format requires.
Here is an example of reading a binary file into bytes and converting them:
from pathlib import Path
import struct
# file name assumed; the output shown corresponds to a file that starts with b'\x00\x07FontAw'
data = Path('webfont.bin').read_bytes()
# build an int from the first three bytes of the data
i = int.from_bytes(data[:3], byteorder='little', signed=False)
print(i)
# unpack two ints from the first eight bytes of the data using struct
ints = struct.unpack('ii', data[:8])
print(ints)
4589312
(1866860288, 2000778350)
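The effect of the byteorder and signed arguments of int.from_bytes() can be seen in a short sketch. The three bytes used here are the ones that produce 4589312 in little-endian order; everything else is illustrative.

```python
data = bytes([0x00, 0x07, 0x46])

little = int.from_bytes(data, byteorder="little", signed=False)
big = int.from_bytes(data, byteorder="big", signed=False)

# The same single byte read as unsigned and as two's-complement signed.
unsigned = int.from_bytes(b"\xff", byteorder="little", signed=False)
signed = int.from_bytes(b"\xff", byteorder="little", signed=True)

# int.to_bytes() is the inverse operation.
restored = little.to_bytes(3, byteorder="little")

print(little, big)       # 4589312 1862
print(unsigned, signed)  # 255 -1
print(restored == data)  # True
```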
Reading a file in binary to an array
To write an array to a binary file, open the file in ‘wb’ mode instead of ‘w’: ‘w’ stands for write and ‘b’ for binary. The bytearray() function converts a list of integers in the range 0 to 255 into a mutable sequence of bytes that can be written to the file.
Don’t forget to close() the file when you are done.
# write an array to a binary file
data = open("array.bin", "wb")
x = [1, 2, 3, 4]
arr = bytearray(x)
data.write(arr)
data.close()
# read the binary file back
data = open("array.bin", "rb")
# read up to 3 bytes
arr = data.read(3)
print(arr)
data.close()
b'\x01\x02\x03'
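The same round trip can be written with ‘with’ blocks, which removes the need for explicit close() calls. The sketch below also shows two properties of bytearray worth knowing: it is mutable, and it rejects values outside 0 to 255. The file location is a temporary path assumed for the example.

```python
import os
import tempfile

# Hypothetical scratch file for the example.
path = os.path.join(tempfile.mkdtemp(), "array.bin")

values = [1, 2, 3, 4]
with open(path, "wb") as f:
    f.write(bytearray(values))

with open(path, "rb") as f:
    arr = bytearray(f.read())  # copy the bytes into a mutable bytearray

arr[0] = 9                     # unlike bytes, a bytearray can be modified

# Each element must fit in one byte (0 to 255).
try:
    bytearray([256])
    accepted_256 = True
except ValueError:
    accepted_256 = False

print(list(arr))     # [9, 2, 3, 4]
print(accepted_256)  # False
```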
Read a binary file into a numpy array
To read a binary file into a numpy array, use np.fromfile(), which constructs an array from the raw bytes of a file and is very efficient when the data type is known. Its counterpart, the array method tofile(), writes an array's raw bytes to a file. The file stores no type or shape information, so the dtype passed to fromfile() determines how the bytes are interpreted.
Here is an example that writes an array with tofile() and reads it back with fromfile():
import numpy as np
# tofile() returns None; dtype is set explicitly so each value takes 4 bytes regardless of the platform's default integer size
np.array([2, 8, 7], dtype=np.int32).tofile("arr.bin")
print(np.fromfile("arr.bin", dtype=np.int8))
[2 0 0 0 8 0 0 0 7 0 0 0]
data = open("arr.bin", "rb")
# read the file three bytes at a time until it is exhausted
byte = data.read(3)
while byte:
    print(byte)
    byte = data.read(3)  # read the next three bytes
data.close()
b'\x02\x00\x00'
b'\x00\x08\x00'
b'\x00\x00\x07'
b'\x00\x00\x00'
Reading binary files chunk-by-chunk
Here is how to read a binary file in chunks, using a checksum calculation as the example:
- Import the hashlib module, which provides hash functions such as SHA-256. It does not read the file itself; it consumes the bytes that we read.
- Define a function with the ‘def’ keyword that takes two arguments: the name of the binary file and the buffer size. Here we read in chunks of eight kilobytes.
- Inside the function, create a hash object with hashlib.sha256().
- Open the binary file in ‘rb’ mode using open() in a ‘with’ statement.
- Loop with while, reading one chunk at a time until read() returns an empty bytes object, and feed each chunk to the hash object with update().
- Lastly, call hexdigest() on the hash object to get the SHA-256 digest of the whole file as a string of hexadecimal characters.
# reading a binary file chunk by chunk
import hashlib
def chunks(filename, buffer_size=2**10*8):
    file_hash = hashlib.sha256()
    with open(filename, "rb") as binfile:
        while chunk := binfile.read(buffer_size):
            file_hash.update(chunk)
    return file_hash.hexdigest()
chunks("webfont.bin", buffer_size=2**10*8)
'94c1966357ca4651d3d5e3bec3b78b7053f96c209122fbb9bab327b6baa35730'
Defining a function to read a binary file at byte-level
To work at the byte level, define a function with the ‘def’ keyword, here named ‘low_bytes’. The function reads the entire content of the given binary (.BIN) file as bytes.
In this example, we again use the hashlib module, this time passing all the bytes of the file to sha256() in a single call. This is simple and works well for small files, but because read() loads the whole file into memory, it can cause problems with very large files; for those, prefer the chunk-by-chunk approach from the previous section. Calling hexdigest() on the hash object returns a string of hexadecimal characters representing the checksum of the binary data.
# define a function to read a binary file
import hashlib
def low_bytes(data):
    with open(data, mode="rb") as binfile:
        return hashlib.sha256(binfile.read()).hexdigest()
low_bytes("webfont.bin")
'94c1966357ca4651d3d5e3bec3b78b7053f96c209122fbb9bab327b6baa35730'
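Both functions return the same digest for webfont.bin, which is expected: feeding a hash object chunk by chunk is equivalent to hashing all the bytes at once. The sketch below checks this on a generated file. The function name sha256_in_chunks, the payload, and the file location are assumptions made for the illustration.

```python
import hashlib
import os
import tempfile


def sha256_in_chunks(filename, buffer_size=8 * 1024):
    """Hash a file while holding at most buffer_size bytes in memory."""
    file_hash = hashlib.sha256()
    with open(filename, "rb") as binfile:
        while chunk := binfile.read(buffer_size):
            file_hash.update(chunk)
    return file_hash.hexdigest()


# Hypothetical sample file of 25,600 bytes, larger than three 8 KB chunks.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
payload = bytes(range(256)) * 100
with open(path, "wb") as f:
    f.write(payload)

chunked = sha256_in_chunks(path)
whole = hashlib.sha256(payload).hexdigest()

print(chunked == whole)  # True
```

The chunked version is the safer default for files of unknown size, since its memory use does not grow with the file.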
Conclusion
In conclusion, reading binary files in Python is a useful skill for handling and processing raw binary data efficiently. Binary formats are widely used for data interchange between systems and applications because of their compact representation. Python provides several ways to open and read binary files: we explored read(), readline(), readlines(), iterating over the file object, seek(), readinto(), the struct module, and the numpy module. From reading individual bytes to processing a file chunk by chunk, each method suits specific use cases, so you can choose the one that matches the structure and size of your data.