Monday, August 24, 2020

Python 3: solving 'utf-8' codec can't decode byte 0x92

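In Python 3, UTF-8 is the default encoding for source files. On Windows you may run into files saved in a legacy code page, where some characters fail to decode and Python reports an error such as: 'utf-8' codec can't decode byte 0x92. In Windows-1252, byte 0x92 is the curly apostrophe (’), which is not valid UTF-8 on its own. To fix this, add an encoding declaration at the top of the file naming either cp1252 or windows-1252 (both refer to the same codec):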
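
The same error can be reproduced without a source file at all, which helps confirm that the encoding is the culprit. The snippet below is a minimal sketch, not part of the original post: the byte string `raw` is a made-up sample standing in for text saved by a Windows editor.

```python
# Hypothetical sample: "She’s dead" as a Windows-1252 editor would save it.
# The curly apostrophe is stored as the single byte 0x92.
raw = b"She\x92s dead"

try:
    # 0x92 cannot start a UTF-8 sequence, so decoding as UTF-8 fails.
    raw.decode("utf-8")
    error_message = ""
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode("cp1252")

print(error_message)
print(decoded)
```

The codec name "cp1252" used here is the one the header declaration refers to, and it can also be passed as the `encoding` argument of `open()` when reading a data file rather than a script.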


# coding=CP1252
import re

text = """
    Alack the day. She’s dead, she’s dead, she’s dead!
    Ha! Let me see her. Out, alas! She’s cold.
    Her blood is settled, and her joints are stiff.
    Life and these lips have long been separated.
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
    O lamentable day!

