C2 A0 -> NO-BREAK SPACE: A Special Space in UTF-8 Encoding
I recently ran into a data anomaly in a database field. In our business scenario this string field must not contain spaces, yet some records still had "spaces" in them. After repeated checks, I confirmed that the code I wrote does call trim to remove spaces, and repeated debugging showed nothing wrong with the code as written. So how did this data slip past the validation in the business code?
Time to solve the case. Could it be that the "spaces" I see with my naked eye are not the spaces we usually see or understand?
With this question in mind, I searched for related problems and found that indeed they are not: many people have run into the invisible character C2 A0. So what exactly is it?
Open the UTF-8 encoding table at https://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=dec and find the corresponding character.
First, let's confirm what the bytes C2 A0 represent. Converting the hexadecimal values to decimal gives C2 = 194 and A0 = 160, and looking up that byte pair in the table gives U+00A0, NO-BREAK SPACE.
An ordinary space, by contrast, is encoded as the single byte 32 (0x20).
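The table lookup can also be done by hand. The following is a minimal sketch, not part of the original source, of how a two-byte UTF-8 sequence of the form 110xxxxx 10xxxxxx maps to a code point. The class name Utf8TwoByteDecode and the method decodeTwoBytes are made up for this illustration, and the check does not reject overlong sequences the way a full decoder would.

```java
import java.nio.charset.StandardCharsets;

class Utf8TwoByteDecode {

    // Decodes a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into its code point.
    // Simplified: overlong lead bytes (0xC0, 0xC1) are not rejected here.
    static int decodeTwoBytes(int lead, int continuation) {
        if ((lead & 0xE0) != 0xC0 || (continuation & 0xC0) != 0x80) {
            throw new IllegalArgumentException("not a two-byte UTF-8 sequence");
        }
        // Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
        return ((lead & 0x1F) << 6) | (continuation & 0x3F);
    }

    public static void main(String[] args) {
        int codePoint = decodeTwoBytes(0xC2, 0xA0);
        System.out.printf("U+%04X (decimal %d)%n", codePoint, codePoint); // U+00A0 (decimal 160)

        // Cross-check against the JDK's own UTF-8 decoder.
        String decoded = new String(new byte[]{(byte) 0xC2, (byte) 0xA0}, StandardCharsets.UTF_8);
        System.out.println(decoded.codePointAt(0) == codePoint); // true
    }
}
```

The lead byte 0xC2 contributes the bits 00010 and the continuation byte 0xA0 contributes 100000, which together form 160, that is U+00A0.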
So let's simulate these two characters in code:
The ordinary space has Unicode code point U+0020, or 32 in decimal.
The C2 A0 space (NO-BREAK SPACE) has Unicode code point U+00A0, or 160 in decimal.
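To make the difference between these two code points visible, here is a small sketch, separate from the original source, that prints each character of a string as U+XXXX and shows how trim() treats the two spaces. The class name SpaceCodePoints and the helper describe are illustrative names only.

```java
class SpaceCodePoints {

    // Formats every char of the string as U+XXXX, so invisible characters become visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ordinary = "a ";      // ends with U+0020
        String noBreak = "a\u00A0";  // ends with U+00A0

        System.out.println(describe(ordinary)); // U+0061 U+0020
        System.out.println(describe(noBreak));  // U+0061 U+00A0

        // trim() strips only leading/trailing chars whose code is <= U+0020.
        System.out.println(ordinary.trim().length()); // 1
        System.out.println(noBreak.trim().length());  // 2

        // The JDK does not count NO-BREAK SPACE as whitespace, only as a space character.
        System.out.println(Character.isWhitespace('\u00A0')); // false
        System.out.println(Character.isSpaceChar('\u00A0'));  // true
    }
}
```

Because 160 is greater than 32, trim() leaves the NO-BREAK SPACE in place, which matches the behavior seen in the database field.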
Now that we know the cause, let's try to get rid of this C2 A0 space.
The full source code is below.
package com.lingyejun.dating.chap11;

import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecialSpace {

    public static void main(String[] args) {
        // "lingyejun" followed by an ordinary space (0x20).
        String str1 = "lingyejun ";
        // Encode explicitly as UTF-8 instead of relying on the platform default charset.
        byte[] str1Bytes = str1.getBytes(StandardCharsets.UTF_8);
        String space = new String(str1Bytes, StandardCharsets.UTF_8);
        System.out.println("String with an ordinary space (32): " + space);
        System.out.println("trim() removes the ordinary space: " + space.trim());

        // Copy the bytes and overwrite the trailing space with C2 A0 (NO-BREAK SPACE).
        byte[] str2Bytes = new byte[11];
        System.arraycopy(str1Bytes, 0, str2Bytes, 0, str1Bytes.length);
        str2Bytes[9] = (byte) 0xC2;
        str2Bytes[10] = (byte) 0xA0;
        String noBreakSpace = new String(str2Bytes, StandardCharsets.UTF_8);
        System.out.println("String with C2 A0 -> NO-BREAK SPACE: " + noBreakSpace);
        System.out.println("trim() cannot remove C2 A0 -> NO-BREAK SPACE: " + noBreakSpace.trim());

        // 32 (0x20) is the space we usually talk about.
        byte[] bytes1 = new byte[]{(byte) 0x20};
        String space1 = new String(bytes1, StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte 32 -> 0x20 output: " + space1);

        // 0xC2 = 194, 0xA0 = 160 -> NO-BREAK SPACE
        byte[] bytes2 = new byte[]{(byte) 0xC2, (byte) 0xA0};
        String space2 = new String(bytes2, StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes 194 -> 0xC2, 160 -> 0xA0 output: " + space2);

        // Build a regex from the NO-BREAK SPACE itself and remove every occurrence.
        byte[] bytes3 = new byte[]{(byte) 0xC2, (byte) 0xA0};
        String c2a0Space = new String(bytes3, StandardCharsets.UTF_8);
        Pattern p = Pattern.compile(c2a0Space);
        Matcher m = p.matcher(noBreakSpace);
        noBreakSpace = m.replaceAll("");
        System.out.println("Regex removes C2 A0 -> NO-BREAK SPACE: " + noBreakSpace);
    }
}
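The regex replacement in the source above targets only U+00A0. For a field that must not contain spaces at all, one possible alternative is a helper that treats every Unicode space character as a space. The sketch below is an assumption about how such a helper could look, not the author's code; SpaceCleaner, isSpaceLike, trimSpaceLike and containsSpaceLike are hypothetical names. Note that, unlike trim(), it does not strip control characters that the JDK does not classify as whitespace.

```java
class SpaceCleaner {

    // True for ordinary whitespace and for Unicode space separators such as U+00A0.
    static boolean isSpaceLike(int codePoint) {
        return Character.isWhitespace(codePoint) || Character.isSpaceChar(codePoint);
    }

    // Strips space-like characters, including NO-BREAK SPACE, from both ends.
    static String trimSpaceLike(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isSpaceLike(s.charAt(start))) {
            start++;
        }
        while (end > start && isSpaceLike(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    // For validation: reports whether any space-like character is present anywhere.
    static boolean containsSpaceLike(String s) {
        return s.codePoints().anyMatch(SpaceCleaner::isSpaceLike);
    }

    public static void main(String[] args) {
        String value = "\u00A0lingyejun\u00A0 ";
        System.out.println("[" + trimSpaceLike(value) + "]");     // [lingyejun]
        System.out.println(containsSpaceLike("ling\u00A0yejun")); // true
        System.out.println(containsSpaceLike("lingyejun"));       // false
    }
}
```

Running the validation check before saving would catch a NO-BREAK SPACE in the middle of a value as well, which neither trim() nor an end-only cleanup can do.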
If this helped you, please don't forget to give Ling Ye Jun a like.