How do I handle file encoding and binary data in Python?

· Category: Python Programming

Short answer

Python 3 clearly separates text (str) from binary data (bytes). Always specify an encoding when reading or writing text. Use bytes and bytearray for binary data, and encode/decode explicitly at boundaries.

Steps

  1. Open text files with an explicit encoding, usually utf-8.
  2. Use .encode() to convert str to bytes.
  3. Use .decode() to convert bytes to str.

# Text: str -> bytes -> str round trip
text = "Hello, €"
encoded = text.encode("utf-8")
print(encoded)                  # b'Hello, \xe2\x82\xac'
print(encoded.decode("utf-8"))  # Hello, €

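# Binary file: mode "rb" returns bytes and performs no decoding
with open("image.png", "rb") as f:
    header = f.read(8)  # the PNG signature is the first 8 bytes
    print(header)       # b'\x89PNG\r\n\x1a\n' for a valid PNG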
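
Step 1 (opening text files with an explicit encoding) can be sketched as follows. This is a minimal illustration, not the only way to do it; the helper name write_then_read and the file name example.txt are hypothetical, and a temporary directory is used so the example does not depend on any existing file.

```python
import tempfile
from pathlib import Path


def write_then_read(text, encoding="utf-8"):
    """Write text with an explicit encoding, then read it back
    both as text (str) and as raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.txt"  # hypothetical file name

        # Text mode: str is encoded to bytes on write.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)

        # Text mode: bytes are decoded back to str on read.
        with open(path, "r", encoding=encoding) as f:
            as_text = f.read()

        # Binary mode: the same file, with no decoding applied.
        with open(path, "rb") as f:
            as_bytes = f.read()

    return as_text, as_bytes


as_text, as_bytes = write_then_read("Hello, €")
print(as_text)                      # Hello, €
print(len(as_text), len(as_bytes))  # 8 10
```

The string has 8 characters but the file holds 10 bytes, because the euro sign takes three bytes in UTF-8. Character counts and byte counts only match for pure ASCII text.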

Tips

  • UTF-8 is the safest default for most modern applications.
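  • Use errors="replace" or errors="ignore" when reading files with unknown or mixed encodings and some data loss is acceptable: "replace" substitutes U+FFFD for undecodable bytes, while "ignore" silently drops them.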
  • bytearray is a mutable version of bytes useful for in-place modifications.
  • Use the third-party chardet or charset-normalizer libraries to guess file encodings when they are unknown. The result is a statistical guess, not a guarantee.
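
As a rough illustration of two of the tips above, the sketch below shows how the "replace" and "ignore" error handlers treat an invalid byte, and how bytearray allows in-place edits. The sample bytes are made up for the example: a Latin-1 "é" byte (0xE9) followed by a valid UTF-8 euro sign.

```python
# Mixed input: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
raw = b"caf\xe9 \xe2\x82\xac"

# "replace" keeps a visible marker (U+FFFD) where decoding failed.
replaced = raw.decode("utf-8", errors="replace")

# "ignore" drops the offending byte without any trace.
ignored = raw.decode("utf-8", errors="ignore")

print(repr(replaced))  # 'caf\ufffd €'
print(repr(ignored))   # 'caf €'

# bytearray is mutable, so it can be modified in place.
buf = bytearray(b"hello")
buf[0] = ord("H")       # assign a single byte by index
buf.extend(b", world")  # grow the buffer
frozen = bytes(buf)     # convert to immutable bytes when done

print(frozen)  # b'Hello, world'
```

Both handlers lose information, so they are better suited to logging or display than to data that will be written back to disk.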

Common issues

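  • Omitting the encoding parameter makes open() fall back to the locale's preferred encoding (see locale.getpreferredencoding()), which varies by OS and configuration: often UTF-8 on Linux and macOS, and a legacy code page such as cp1252 on many Windows systems.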
  • UnicodeDecodeError means the file is not valid in the specified encoding; try a different encoding or error handler.
  • Concatenating str and bytes directly raises a TypeError.
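
The last two issues can be handled along these lines. This is a sketch under stated assumptions: decode_with_fallback is a hypothetical helper, and the candidate list ("utf-8", then "cp1252") is only an example. Trying encodings in order is a heuristic, since a permissive single-byte encoding will accept almost any input and may produce wrong characters without raising an error.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order and return the first
    successful decode (hypothetical helper, heuristic only)."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")


from_latin = decode_with_fallback(b"caf\xe9")             # invalid UTF-8
from_utf8 = decode_with_fallback("café".encode("utf-8"))  # valid UTF-8
print(from_latin, from_utf8)  # café café

# Mixing str and bytes raises TypeError; decode (or encode) first.
label = "Size: "
payload = b"42"
try:
    label + payload
    mixed_ok = True
except TypeError:
    mixed_ok = False

combined = label + payload.decode("ascii")
print(mixed_ok, combined)  # False Size: 42
```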