UTF-8 String - Programming Questions - Arduino Forum
Hello, I am doing some tests with an Arduino + SIM800L (GSM modem).
Consider a table of Unicode code points (UCS-2 values, not yet UTF-8):
```cpp
uint16_t P16[] = {0x0672,0x0631,0x062F,0x0648,0x064A,0x0646,0x0648,0x03B2};
```
Is it possible to manipulate it as a String object, to benefit from methods like .indexOf() or .substring()?
Thank you,
UKHeliBob, January 8, 2021, 12:37pm (post 2):
What exactly are you trying to achieve?
aoumnad, January 8, 2021, 12:51pm (post 3):
I receive an SMS in UCS2 (hex) format (Arabic), something like "06720631062F0648064A06460648". I break it into groups of 4 hex digits, convert each group to a number with strtoul(), and convert that code point to UTF-8. That works: I can display the message in the correct language on the Serial Monitor. Now I would like to find specific substrings in the message to perform specific tasks.
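A minimal sketch of the splitting step in portable C++, assuming std::string in place of the Arduino String class so it can also be compiled on a PC. The function name ucs2HexToCodePoints is hypothetical, and the input is assumed to be a clean UCS-2 hex payload whose length is a multiple of 4 (no AT-command framing around it). Note that each 4-character group must be terminated with '\0' before strtoul() reads it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: split a UCS-2 hex payload (4 hex digits per character)
// into 16-bit Unicode code points.
std::vector<uint16_t> ucs2HexToCodePoints(const std::string& hex) {
  assert(hex.size() % 4 == 0);            // every UCS-2 character is exactly 4 hex digits
  std::vector<uint16_t> cps;
  for (std::size_t i = 0; i < hex.size(); i += 4) {
    char group[5] = {0};                  // 4 digits + terminating '\0'
    hex.copy(group, 4, i);                // copies exactly 4 chars; group[4] stays '\0'
    cps.push_back(static_cast<uint16_t>(std::strtoul(group, nullptr, 16)));
  }
  return cps;
}
```

For the message in this thread, the result is the 7 code points 0x0672, 0x0631, 0x062F, 0x0648, 0x064A, 0x0646, 0x0648.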
UKHeliBob, January 8, 2021, 1:05pm (post 4):
If you insist on using Strings, why not put what you received in a String and use that?
aoumnad, January 8, 2021, 1:20pm (post 5):
The received string "06720631062F0648064A06460648" is the hex representation of the 2-byte Unicode code points of the word ٲردوينو. To display it on the Serial Monitor, it has to be converted to UTF-8, and to do that I had to convert the string to numbers. But now I don't know how to convert it back to a String:
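```cpp
void unicode2utf8(uint16_t& U);  // prototype (the Arduino IDE usually generates it automatically)

void setup() {
  Serial.begin(9600);
  char UCS2[] = "06720631062F0648064A06460648";
  uint8_t n = strlen(UCS2);
  char S[5];                      // 4 digits + '\0'
  for (uint8_t i = 0; i < n; i += 4) {
    strncpy(S, &UCS2[i], 4);
    S[4] = '\0';                  // strncpy() does not terminate after copying exactly 4 chars
    uint16_t CP = strtoul(S, NULL, 16);
    unicode2utf8(CP);             // bytes reversed [L H]
    // little endian: the low byte (UTF-8 lead byte) is sent first; ASCII sends only 1 byte
    Serial.write((byte*)&CP, CP > 0xFF ? 2 : 1);
  }
}

void loop() { }

void unicode2utf8(uint16_t& U) {  // for code points 0 --> U+07FF only
  if (U > 127) {
    uint8_t UL = (U & 0x003F) | 0B10000000;  // continuation byte 10xxxxxx
    uint8_t UH = (U >> 6) | 0B11000000;      // lead byte 110xxxxx
    U = (UL << 8) | UH;                      // reversed, so the little-endian write sends UH first
  }
}
```
UKHeliBob, January 8, 2021, 1:28pm (post 6):
> But now, I don't know how to cast it back to String

Copy it before you change it; then you can use either format.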
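Keeping the original hex payload also allows searching it directly, without any UTF-8 conversion: write the keyword as UCS-2 hex too, and accept only matches that start on a 4-digit boundary (otherwise a match could straddle two characters). A sketch in portable C++; findUcs2Hex is a hypothetical helper, and it assumes the keyword uses the same letter case for hex digits as the modem's output (uppercase here).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: find `keyword` (UCS-2 hex, 4 digits per character) in
// `message` (UCS-2 hex). Returns the character index of the first aligned
// match, or -1 if there is none.
long findUcs2Hex(const std::string& message, const std::string& keyword) {
  std::size_t pos = message.find(keyword);
  while (pos != std::string::npos) {
    if (pos % 4 == 0) {                    // starts on a character boundary
      return static_cast<long>(pos / 4);
    }
    pos = message.find(keyword, pos + 1);  // misaligned: keep looking
  }
  return -1;
}
```

For example, searching "06720631062F0648064A06460648" for "0631062F" (ر followed by د) gives character index 1, while "7206" is rejected because it only matches across the boundary between the first two characters.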
aoumnad, January 8, 2021, 1:36pm (post 7):
Thank you, I am going to do some tests.
johnwasser, January 8, 2021, 2:48pm (post 8):
You could change unicode2utf8() to append the byte or two to a String passed as an argument.
```cpp
void unicode2utf8(String &result, uint16_t U) {  // for code points 0 --> U+07FF
  if (U > 127) {
    char UL = (U & 0x003F) | 0B10000000;  // continuation byte 10xxxxxx
    char UH = (U >> 6) | 0B11000000;      // lead byte 110xxxxx
    result += UH;                         // in a String, the lead byte must come first
    result += UL;
  } else
    result += (char) U;
}
```
aoumnad, January 8, 2021, 4:29pm (post 9):
That's it (result += (char) U;): we just have to do it byte by byte, since a char is one byte.
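johnwasser's append approach from post 8 can be sketched in portable C++ with std::string. Two assumptions go beyond the forum code: the lead byte is always appended first (the byte swap in post 5 exists only for the little-endian Serial.write() trick and must not be carried over when appending), and 3-byte sequences are added, because a UCS-2 SMS can carry any code point up to U+FFFF while the forum function stops at U+07FF. appendUtf8 is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one BMP code point
// (U+0000..U+FFFF, the range UCS-2 can express) to `result`, lead byte first.
// Lone surrogates (U+D800..U+DFFF) are not valid characters and are not handled here.
void appendUtf8(std::string& result, uint16_t cp) {
  if (cp < 0x80) {                                  // 1 byte:  0xxxxxxx
    result += static_cast<char>(cp);
  } else if (cp < 0x800) {                          // 2 bytes: 110xxxxx 10xxxxxx
    result += static_cast<char>(0xC0 | (cp >> 6));
    result += static_cast<char>(0x80 | (cp & 0x3F));
  } else {                                          // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    result += static_cast<char>(0xE0 | (cp >> 12));
    result += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    result += static_cast<char>(0x80 | (cp & 0x3F));
  }
}
```

For example, U+0672 (ٲ) becomes the two bytes D9 B2, and the euro sign U+20AC becomes the three bytes E2 82 AC.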
Here is a little code to show it:
```cpp
void setup() {
  Serial.begin(9600);

  // Extract the UTF-8 codes of the String's characters
  String STR1 = "ββδΨωββ";                  // utf-8(β) = CE B2
  // c_str() is simpler than .toCharArray(); reading 2 bytes at a time assumes every
  // character is exactly 2 UTF-8 bytes, and on a little-endian MCU each pair prints swapped
  uint16_t *P16 = (uint16_t*) STR1.c_str();
  for (int i = 0; i < 7; i++) {
    Serial.println(P16[i], HEX);             // --> B2CE, B2CE, B4CE, A8CE, 89CF, B2CE, B2CE
  }

  // Make a String from UTF-8 codes: high byte (lead byte) first
  uint16_t M16[] = {0xCEB2, 0xCEB2, 0xCEB4, 0xCEA8, 0xCF89, 0xCEB2, 0xCEB2};
  String STR2 = "";
  for (int i = 0; i < 7; i++) {
    STR2 += (char) highByte(M16[i]);
    STR2 += (char) lowByte(M16[i]);
  }
  Serial.println(STR1);                      // --> ββδΨωββ
  Serial.println(STR2);                      // --> ββδΨωββ
  Serial.println(STR2.indexOf("ω"));         // --> 8 (a byte offset: 2 bytes per character)
  String STR3 = STR2.substring(4, 10);       // bytes 4..9
  Serial.println(STR3);                      // --> δΨω
}

void loop() {
  // put your main code here, to run repeatedly:
}
```
Here is the output:
```
B2CE
B2CE
B4CE
A8CE
89CF
B2CE
B2CE
ββδΨωββ
ββδΨωββ
8
δΨω
```
Thank you very much
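In the demonstration in post 9, indexOf("ω") returns 8 although ω is the fifth character, and substring(4, 10) selects bytes 4..9: String offsets count bytes, and each Greek letter there takes 2 UTF-8 bytes. A sketch in portable C++ of converting such a byte offset into a character index by counting every byte that is not a UTF-8 continuation byte (10xxxxxx); utf8CharIndex is a hypothetical helper and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-8 characters that start before
// `byteOffset`, i.e. the character index of the byte at that offset.
std::size_t utf8CharIndex(const std::string& s, std::size_t byteOffset) {
  std::size_t chars = 0;
  for (std::size_t i = 0; i < byteOffset && i < s.size(); ++i) {
    // Continuation bytes look like 10xxxxxx; every other byte starts a character.
    if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
      ++chars;
    }
  }
  return chars;
}
```

For "ββδΨωββ", the byte offset 8 returned by the search corresponds to character index 4 (0-based). Likewise, any substring() cut has to land on a character boundary, or the result contains a broken multi-byte sequence.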
system, May 8, 2021, 4:30pm (post 10):
This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.
Related topics
| Topic | Category | Replies | Views | Activity |
|---|---|---|---|---|
| Processing UTF-8 code points in SafeString | Showcase | 0 | 33 | September 4, 2025 |
| conversion UTF-8 <-> GSM | Networking, Protocols, and Devices | 6 | 4948 | May 6, 2021 |
| receiving SMS in UTF | MKR GSM 1400 | 2 | 2330 | May 7, 2021 |
| Convert ASCII char array to uint8 --> Hex to uint | Deutsch | 25 | 165 | November 7, 2025 |
| Serial wont read UTF-8 | Programming | 10 | 340 | August 30, 2025 |
Tag » Arduino.write(byte(x 'utf-8'))
- int With UTF-8 Characters - Arduino Forum
- Serial.write() - Arduino Reference
- Sending A Value To Arduino UNO(pyserial), Then Transfer That To A ...
- Serial Communication Between Python And Arduino
- Is Arduino Not Receiving My Byte Sent - Programming Questions
- [Solved] What Is The Character Encoding Of Arduino's Serial Messages?
- Get Arduino To Accept Two Different Lengths (number Of Bytes)?
- Comparing UTF-8 Chars - Programming Questions - Arduino Forum
- Read/print Only One Byte From Utf8 Input? - Arduino Forum
- Arduino Uno - Pyserial Communication - Interfacing W
- bitWrite() - Arduino Reference
- Slow Serial Write In Python Compared To Arduino IDE Serial Monitor
- Arduino Behaves Strangely When Reading Utf8 Via Serial
- Arduino - Send Commands With Serial Communication With Python