How do I handle file encoding and binary data in Python?
Short answer
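Python 3 strictly separates text (str, a sequence of Unicode code points) from binary data (bytes, a sequence of byte values 0-255). Always specify an encoding when reading or writing text. Use bytes and bytearray for binary data, and encode or decode explicitly at the boundaries where data enters or leaves your program, such as files, sockets, and subprocess pipes.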
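To make the distinction concrete, here is a minimal sketch: the same one-character string is a single code point as str, but becomes three bytes once encoded as UTF-8.

```python
# str holds Unicode code points; bytes holds raw byte values.
text = "€"
data = text.encode("utf-8")

print(type(text).__name__, len(text))  # str 1
print(type(data).__name__, len(data))  # bytes 3
print(data)                            # b'\xe2\x82\xac'
print(data.decode("utf-8") == text)    # True
```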
Steps
- Open text files with an explicit encoding, usually utf-8.
- Use .encode() to convert str to bytes.
- Use .decode() to convert bytes to str.
text = "Hello, €"
encoded = text.encode("utf-8")
print(encoded) # b'Hello, \xe2\x82\xac' (the euro sign is 3 bytes in UTF-8)
print(encoded.decode("utf-8")) # Hello, €
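# Binary file: "rb" mode returns bytes and takes no encoding argument
with open("image.png", "rb") as f:
    header = f.read(8)  # a PNG file starts with b'\x89PNG\r\n\x1a\n'
print(header)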
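The snippet above assumes an image.png already exists. The following self-contained sketch covers the first step (explicit encoding on text files) alongside binary mode. It writes to a temporary directory, and the file names notes.txt and blob.bin are hypothetical. Reading the text file back in "rb" mode shows exactly which bytes the chosen encoding put on disk.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch never touches real files.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "notes.txt")  # hypothetical file name
    bin_path = os.path.join(tmp, "blob.bin")    # hypothetical file name

    # Text mode: pass encoding= so the result does not depend on the platform.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("Price: 5 €")
    with open(text_path, encoding="utf-8") as f:
        line = f.read()

    # Reading the same file in binary mode shows the UTF-8 bytes on disk.
    with open(text_path, "rb") as f:
        raw = f.read()

    # Binary mode ("wb"/"rb"): no encoding argument, only bytes in and out.
    png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature
    with open(bin_path, "wb") as f:
        f.write(png_signature + bytes(4))  # signature followed by 4 zero bytes
    with open(bin_path, "rb") as f:
        header = f.read(8)

print(line)                     # Price: 5 €
print(raw)                      # b'Price: 5 \xe2\x82\xac'
print(len(line), len(raw))      # 10 12
print(header == png_signature)  # True
```

The string has 10 characters but occupies 12 bytes on disk, because the euro sign needs 3 bytes in UTF-8.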
Tips
- UTF-8 is the safest default for most modern applications.
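- Use errors="replace" or errors="ignore" when reading files with unknown or mixed encodings. "replace" substitutes U+FFFD for undecodable bytes, while "ignore" drops them silently, so both lose the original bytes.
- bytearray is a mutable version of bytes, useful for in-place modifications.
- Use the chardet or charset-normalizer libraries to guess a file's encoding when it is unknown; the result is a heuristic guess, not a guarantee.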
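A short sketch of the error handlers and bytearray tips, using only built-in methods. The sample bytes are made up for illustration: 0xE9 is "é" in Latin-1 and cp1252, but on its own it is not valid UTF-8. ascii() is used for printing so the replacement character shows up as an escape.

```python
# 0xE9 is "é" in Latin-1/cp1252, but on its own it is not valid UTF-8.
raw = b"caf\xe9 ok"

replaced = raw.decode("utf-8", errors="replace")  # invalid byte -> U+FFFD
ignored = raw.decode("utf-8", errors="ignore")    # invalid byte silently dropped
print(ascii(replaced))  # 'caf\ufffd ok'
print(ignored)          # caf ok

# bytearray supports item assignment and in-place growth; bytes does not.
buf = bytearray(b"hello")
buf[0] = ord("H")
buf.extend(b" world")
print(buf)         # bytearray(b'Hello world')
print(bytes(buf))  # b'Hello world'
```

Note that "ignore" hides the problem entirely, whereas "replace" at least leaves a visible marker where data was lost.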
Common issues
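- Omitting the encoding parameter makes Python fall back to the locale encoding, which varies by OS (commonly UTF-8 on Linux and macOS, cp1252 on many Windows systems), so the same code can behave differently on different machines.
- UnicodeDecodeError means the data is not valid in the specified encoding; try a different encoding or an error handler.
- Concatenating str and bytes directly raises a TypeError; decode the bytes or encode the str first.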
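The issues above can be handled with a small fallback strategy. This is a sketch, not a standard recipe: decode_with_fallback is a hypothetical helper, and trying cp1252 second is an assumption that suits legacy Windows text. A successful decode does not prove the guess is right, since cp1252 accepts most byte sequences. locale.getpreferredencoding(False) is printed only as a rough indication of the platform default.

```python
import locale

# Roughly what open() falls back to when encoding= is omitted; varies by OS/locale.
print(locale.getpreferredencoding(False))


def decode_with_fallback(data: bytes, candidates=("utf-8", "cp1252")):
    """Hypothetical helper: return (text, encoding) for the first strict decode that works."""
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Nothing decoded cleanly: keep going, but mark bad bytes with U+FFFD.
    return data.decode(candidates[0], errors="replace"), None


print(decode_with_fallback("café".encode("utf-8")))  # ('café', 'utf-8')
print(decode_with_fallback(b"caf\xe9"))              # ('café', 'cp1252')
print(ascii(decode_with_fallback(b"\x81")))          # ('\ufffd', None)

# Mixing str and bytes raises TypeError; convert one side explicitly.
label, payload = "size: ", b"42"
try:
    label + payload
except TypeError:
    print("TypeError: cannot concatenate str and bytes")
print(label + payload.decode("ascii"))  # size: 42
```

The byte 0x81 is invalid both as a UTF-8 start byte and in Python's cp1252 codec, so it exercises the final errors="replace" branch.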