UTF-8 String - Programming Questions - Arduino Forum

January 8, 2021, 12:34pm 1

Hello, I am doing some tests with an Arduino and a SIM800L (GSM modem).

Consider a table of 16-bit Unicode code points:

uint16_t P16[] = {0x0672,0x0631,0x062F,0x0648,0x064A,0x0646,0x0648,0x03B2};

Is it possible to manipulate it as a String object, to benefit from methods like .indexOf() or .substring()?

Thank you,

January 8, 2021, 12:37pm 2

What exactly are you trying to achieve?

January 8, 2021, 12:51pm 3

I receive an SMS in UCS-2 (hex) format (Arabic), something like "06720631062F0648064A06460648". I split it into groups of 4 hex digits, convert each group to a number with strtoul(), and then convert each code point to UTF-8. That works: I can display the correct message, in the correct language, on the Serial Monitor. Now I would like to find specific substrings in the message to perform specific tasks.
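The pipeline described above (split into groups of 4 hex digits, strtoul(), encode as UTF-8) can be sketched roughly as follows. This is only a sketch in portable C++: std::string stands in for the Arduino String class so it can be compiled off-board, the helper names appendUtf8 and ucs2HexToUtf8 are made up for illustration, and it assumes BMP-only text (no surrogate pairs) with a hex string whose length is a multiple of 4.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Append one UCS-2 code unit (U+0000..U+FFFF) to `out` as UTF-8.
// Assumption: surrogates are not handled, i.e. BMP-only text.
void appendUtf8(std::string& out, uint16_t cp) {
    if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode a UCS-2 hex string (4 hex digits per
// character) into a UTF-8 byte string. A trailing partial group is ignored.
std::string ucs2HexToUtf8(const std::string& hex) {
    std::string out;
    for (std::size_t i = 0; i + 4 <= hex.size(); i += 4) {
        const std::string group = hex.substr(i, 4);
        const uint16_t cp =
            static_cast<uint16_t>(std::strtoul(group.c_str(), nullptr, 16));
        appendUtf8(out, cp);
    }
    return out;
}
```

Once the whole message is one UTF-8 byte string, an ordinary substring search works on it, as long as the search pattern is UTF-8 too. The positions it returns are byte offsets, not character positions.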

January 8, 2021, 1:05pm 4

If you insist on using Strings, then why not put what you received in a String and use that?

January 8, 2021, 1:20pm 5

The received string "06720631062F0648064A06460648" is made of the Unicode code points (2 bytes each), in hex representation, of the word ٲردوينو. To display it on the Serial Monitor, it has to be converted to UTF-8. To do that, I had to convert the string to numbers. But now I don't know how to cast it back to a String.

void setup() {
    Serial.begin(9600);

    char UCS2[] = "06720631062F0648064A06460648";
    uint8_t n = strlen(UCS2);
    char S[5];     // 4 hex digits + '\0'
    S[4] = '\0';   // strncpy() below does not add the terminator

    for (uint8_t i = 0; i < n; i += 4) {
        strncpy(S, &UCS2[i], 4);
        uint16_t CP = strtoul(S, NULL, 16);
        unicode2utf8(CP);              // bytes swapped [L H]
        Serial.write((byte*)&CP, 2);   // little-endian: low byte is sent first
    }
}

void loop() {
}

void unicode2utf8(uint16_t& U) {
    // for code points 0 to U+07FF
    if (U > 127) {
        uint8_t UL = (U & 0x003F) | 0B10000000;
        uint8_t UH = (U >> 6)     | 0B11000000;
        U = (UL << 8) | UH;   // swapped
    }
}

January 8, 2021, 1:28pm 6

But now, I don't know how to cast it back to String

Copy it before you change it; then you can use either format.
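One way to read "use either format" is to keep the original hex string and search it directly, with the search pattern also written as UCS-2 hex. The catch is that a plain substring search can match across a character boundary, so a match should only count when it starts on a multiple of 4. A minimal sketch, again in portable C++ with std::string in place of String; findInUcs2Hex is a hypothetical name, and the sketch assumes both strings use the same letter case for hex digits:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: find `needleHex` inside `messageHex`, where both hold
// UCS-2 text as 4 hex digits per character. Returns the character index of
// the first aligned match, or -1 if there is none.
int findInUcs2Hex(const std::string& messageHex, const std::string& needleHex) {
    std::size_t pos = messageHex.find(needleHex);
    while (pos != std::string::npos) {
        if (pos % 4 == 0) {
            return static_cast<int>(pos / 4);   // match starts on a character boundary
        }
        pos = messageHex.find(needleHex, pos + 1);  // skip the misaligned match
    }
    return -1;
}
```

For the message in this thread, searching for "0648" (the letter و) would give character index 3, counting from 0.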

January 8, 2021, 1:36pm 7

Thank you, I am going to do some tests.

January 8, 2021, 2:48pm 8

You could change unicode2utf8() to append the byte or two to a String passed as an argument.

void unicode2utf8(String &result, uint16_t U) {
  // for code points 0 to U+07FF
  if (U > 127)
  {
    char UL = (U & 0x003F) | 0B10000000;  // continuation byte
    char UH = (U >> 6) | 0B11000000;      // lead byte
    result += UH;  // UTF-8 order: lead byte first
    result += UL;
  }
  else
    result += (char) U;
}

January 8, 2021, 4:29pm 9

That's it: with (result += (char) U;) we just have to do it byte by byte, probably because a (char) is one byte.

Here is a little code to show it:

void setup() {
    Serial.begin(9600);

    // Extract utf-8 codes of String characters
    String STR1 = "ββδΨωββ";  // utf-8(β) = CE B2
    uint16_t *P16 = (uint16_t*) STR1.c_str();  // better than .toCharArray()
    for (int i = 0; i < 7; i++) {
        Serial.println(P16[i], HEX);  //--> B2CE,B2CE,B4CE,A8CE,89CF,B2CE,B2CE (little-endian read)
    }

    // make a String from utf-8 codes
    uint16_t M16[] = {0xCEB2, 0xCEB2, 0xCEB4, 0xCEA8, 0xCF89, 0xCEB2, 0xCEB2};
    String STR2 = "";
    for (int i = 0; i < 7; i++) {
        STR2 += (char) highByte(M16[i]);
        STR2 += (char) lowByte(M16[i]);
    }
    Serial.println(STR1);  // --> ββδΨωββ
    Serial.println(STR2);  // --> ββδΨωββ
    Serial.println(STR2.indexOf("ω"));  // --> 8 (byte offset: 2 bytes per character)
    String STR3 = STR2.substring(4, 10);
    Serial.println(STR3);  // --> δΨω
}

void loop() {
    // put your main code here, to run repeatedly:
}

Here is the output:

B2CE
B2CE
B4CE
A8CE
89CF
B2CE
B2CE
ββδΨωββ
ββδΨωββ
8
δΨω
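The 8 in that output is a byte offset, not a character position: ω is the fifth character (index 4), but every character here takes 2 bytes. If a character index is needed, one option is to count the bytes that are not UTF-8 continuation bytes (10xxxxxx) before the offset. A small sketch in portable C++, with std::string in place of String and utf8CharIndex as a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Count the UTF-8 characters that start within the first `byteOffset`
// bytes of `s`. Continuation bytes (bit pattern 10xxxxxx) are skipped, so
// only the first byte of each character is counted.
std::size_t utf8CharIndex(const std::string& s, std::size_t byteOffset) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < byteOffset && i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

With the string from the test above, a byte offset of 8 maps to character index 4, and passing the full length gives the character count, 7.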

Thank you very much

May 8, 2021, 4:30pm 10

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.
