How To Enable UTF-8 In Python ? - Gankrin


In this post, we will see how to enable UTF-8 in Python.

  • In Python 3, UTF-8 is the default source encoding.
  • When the encoding is not set up correctly, Python commonly raises errors such as "UnicodeEncodeError: 'ascii' codec can't encode character ..." or "UnicodeDecodeError: 'ascii' codec can't decode byte ...".
  • When Python converts between text and bytes without an explicit encoding, it falls back to a default character encoding. For console and file I/O, that default usually comes from the system locale.
    • Check the value of sys.stdout.encoding. It is sometimes None, for example in Python 2 when the output is piped or redirected.
    • On Debian and Ubuntu systems, the default locale is configured in /etc/default/locale.
    • The default is defined by the variables LANG, LC_ALL and LC_CTYPE.
    • Check the values set for these variables.
      • For example, for a UTF-8 locale these could be LANG="en_US.UTF-8", LC_ALL="en_US.UTF-8", LC_CTYPE="en_US.UTF-8".
  • A standard option is to pass "utf-8" explicitly as the encoding, which works in most cases.
  • Verify that your text editor saves your code as UTF-8. Otherwise the file may contain bytes that are not valid UTF-8.
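The checks in the list above can also be done from Python itself. Here is a minimal sketch, assuming Python 3; the function name `report_encodings` is made up for illustration.

```python
import locale
import os
import sys


def report_encodings():
    """Collect the encoding-related settings that Python currently sees."""
    return {
        # Encoding used when printing; may be None if stdout was replaced
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale-derived encoding that open() falls back to when none is given
        "preferred": locale.getpreferredencoding(False),
        # Default for str.encode() / bytes.decode(); 'utf-8' in Python 3
        "default": sys.getdefaultencoding(),
        # Locale variables; None means the variable is not set
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```

If "stdout" or "preferred" reports something like ANSI_X3.4-1968 or ascii, the locale is probably not set to a UTF-8 one.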
Let's look at the options for setting the UTF-8 encoding. (If you are using Python 3, UTF-8 is already the default source encoding.)

  • Set the PYTHONIOENCODING environment variable to UTF-8. This controls the encoding of stdin, stdout and stderr, and applies to Python processes started from the current shell session.
$ export PYTHONIOENCODING=utf8
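To confirm that the variable is picked up, one option is to start a child interpreter with it set and ask the child what it uses for stdout. This is a rough sketch, assuming Python 3; `child_stdout_encoding` is a hypothetical helper, not part of any library.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and return
    the canonical name of the child's sys.stdout.encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    reported = result.stdout.decode("ascii").strip()
    # Normalize spellings such as 'utf8' and 'UTF-8' to one canonical name
    return codecs.lookup(reported).name


if __name__ == "__main__":
    print(child_stdout_encoding("utf8"))
```

The variable is read when the interpreter starts, which is why the sketch checks a child process rather than the current one.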

  • Set the environment variables in /etc/default/locale. This way the system's default locale encoding is set to UTF-8. Use a locale that is installed on your system, such as en_US.UTF-8.
LANG="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"

Or set them from the command line:

export LANG="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"

  • You can also set the encoding in the code.
a = "caf\u00e9"  # sample string with a non-ASCII character
b = a.encode('utf-8')  # text -> UTF-8 bytes
print(a.encode('utf-8'))
print(b)

# Encode, dropping anything that cannot be encoded, then decode back to text
b = a.encode('utf-8', 'ignore').decode('utf-8')
print(b)
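The second argument to encode() and decode() selects an error handler. The sketch below, assuming Python 3 and a sample string chosen only for illustration, compares the default 'strict' handler with 'replace' and 'ignore' when UTF-8 bytes are decoded with the wrong codec.

```python
text = "na\u00efve caf\u00e9"  # "naïve café"

# str -> bytes: each non-ASCII character becomes a multi-byte sequence
data = text.encode("utf-8")

# bytes -> str: decoding with the same codec restores the original text
restored = data.decode("utf-8")

# Decoding with the wrong codec fails under the default 'strict' handler
try:
    data.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for each undecodable byte; 'ignore' drops them
replaced = data.decode("ascii", errors="replace")
ignored = data.decode("ascii", errors="ignore")

print(restored, strict_failed, ignored)
```

Both 'replace' and 'ignore' lose information, so they are best kept for cases where dropping characters is acceptable.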

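  • Set the encoding at script level. This works in Python 2 only: the built-in reload() and sys.setdefaultencoding() are not available in Python 3.
# encoding=utf8
from __future__ import unicode_literals
import sys
reload(sys)
sys.setdefaultencoding('utf8')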
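Python 3 has no sys.setdefaultencoding(). A common alternative there is to name the encoding explicitly wherever text crosses an I/O boundary. This is offered as a sketch of that different technique, not as an equivalent of the snippet above, and `roundtrip_utf8` is a hypothetical helper name.

```python
import os
import tempfile


def roundtrip_utf8(text):
    """Write text to a temporary file as UTF-8 and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Passing encoding= makes the result independent of the system locale
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(roundtrip_utf8("caf\u00e9"))
```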

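  • Using locale. The named locale (en_GB.UTF-8 here) must be installed on the system, otherwise setlocale() raises locale.Error.
import os
import locale
# PYTHONIOENCODING is read at interpreter startup, so setting it here only affects child Python processes
os.environ["PYTHONIOENCODING"] = "utf-8"
thisLocale = locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")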
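Because a given locale name may not be installed everywhere, it can help to try a few candidates in order. The following is a minimal sketch; the function name and the candidate list are assumptions made for illustration.

```python
import locale


def set_first_available_locale(
    candidates=("en_GB.UTF-8", "en_US.UTF-8", "C.UTF-8", "C"),
):
    """Try each locale name in order and return the first one accepted."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
            return name
        except locale.Error:
            # This locale is not available on the system; try the next one
            continue
    return None


if __name__ == "__main__":
    print(set_first_available_locale())
```

The plain "C" locale is kept as a last resort since it is always available, although it is not a UTF-8 locale.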

  • When you use IDLE (Python 2) and the file contains non-ASCII characters, it prompts you to add an encoding declaration in the Emacs -*- style. The declaration must appear on the first or second line of the file and tells Python, and editors that recognize it, which codec to use. Either of the following forms works:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

#!/usr/bin/env python
# coding: utf8
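To see which encoding Python would infer from such a declaration, the standard library's tokenize.detect_encoding() can be used. This is a small sketch, assuming Python 3; `declared_encoding` is a hypothetical wrapper.

```python
import codecs
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the canonical codec name Python would use to read this source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    # Normalize spellings such as 'utf8' to the codec's canonical name
    return codecs.lookup(encoding).name


if __name__ == "__main__":
    sample = b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nprint('hi')\n"
    print(declared_encoding(sample))
```

When no declaration is present, detect_encoding() falls back to UTF-8, which matches the Python 3 default.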

  • If you encode with ASCII and decide to throw out the non-ASCII characters, use the option below. In this example, non-ASCII characters are dropped from varB.
str1 = "caf\u00e9"  # sample input with one non-ASCII character
varB = str1.encode('ascii', 'ignore').decode('ascii')
print(varB)
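Dropping characters outright turns "café" into "caf". If keeping the base letter is preferable, a variant is to decompose the text with Unicode normalization first. This is a different technique from the one above, sketched here under the assumption of Python 3, with `to_ascii` as a hypothetical helper name.

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters, then drop whatever is still non-ASCII."""
    # NFKD splits 'é' into 'e' plus a combining accent, which 'ignore' removes
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("caf\u00e9"))
```

Characters with no ASCII base letter, such as the euro sign, are still removed.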

Additional points:

  • UTF-8 properties:
    • Can handle any Unicode code point.
    • A string of ASCII text is also valid UTF-8 text.
    • UTF-8 is a byte oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes. This avoids the byte-ordering issues that can occur with integer and word oriented encodings, like UTF-16 and UTF-32, where the sequence of bytes varies depending on the hardware on which the string was encoded.
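These properties can be checked with a few lines of Python. The sample characters below are arbitrary choices for illustration.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀

# Number of UTF-8 bytes per character: 1 to 4, depending on the code point
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# ASCII text is byte-for-byte identical in UTF-8
ascii_same = "hello".encode("ascii") == "hello".encode("utf-8")

# UTF-16 has two byte orders; UTF-8 has one byte sequence per character
utf16_le = "A".encode("utf-16-le")
utf16_be = "A".encode("utf-16-be")

print(utf8_lengths, ascii_same, utf16_le, utf16_be)
```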
  Hope this helps.  


 
