omnibrain 15 Posted November 28, 2022
Are there any (known) changes to the compiler regarding string handling from Delphi 10.2 (or 11.0) to 11.2? I don't have code, because I can't reproduce it yet; it's more of a feeling that something is wrong.
We do a lot of serial communication. Parts of the code are rather old, but have survived many Delphi version changes. Some of the code may be pre-Unicode but got modernized before my time and thus before source control, so I can't check the history. We read the ansichars that come in via the serial connection, put them into chars and then into strings, and work from there. I think that's not ideal because strings are Unicode (nowadays), but so far it has worked fine.
Recently we switched from Delphi 10.2 to 11.2 (with a short stint in 11.0, but I'm only 80% sure the error wasn't there). And now it only works like 99.9% of the time. (With our test systems everything works, but our customers have more "traffic".) The error is that we get symbols we can't explain in positions where they don't belong. It looks like chars get converted to other hex values.
At the moment I'm just poking around, because the error is rare enough and we don't have a trace yet. But perhaps someone knows of a possible change to string/char handling in the most recent compiler versions.
programmerdelphi2k 237 Posted November 28, 2022 (edited)
I don't know much about this "transliteration" occurring... but did you try using the "AnsiString" type instead of "String"? https://docwiki.embarcadero.com/Libraries/Alexandria/en/System.AnsiStrings
I can see, too, that you're almost a statistical timer... 🙂
Edited November 28, 2022 by programmerdelphi2k
omnibrain 15 Posted November 28, 2022 (edited)
18 minutes ago, programmerdelphi2k said: I dont know very well about this "transliteration" occurring,...
"Transliteration" is a good word. My gut feeling is that we receive some byte value that translates to a char, which gets "transliterated" to a Unicode glyph, and when we try to work with the byte value again we get the value of the Unicode glyph.
18 minutes ago, programmerdelphi2k said: but, do you tryed use "AnsiString" instead "String" type? https://docwiki.embarcadero.com/Libraries/Alexandria/en/System.AnsiStrings
Yes, I'm currently thinking about converting everything to ansichars and ansistrings, or even rawbytestrings, or possibly TBytes (though we depend heavily on pos() for protocol parsing). But it may very well be that I'm chasing ghosts. So if someone could chime in and say "yes, something really changed", that would give me confidence.
18 minutes ago, programmerdelphi2k said: I could see, too, that you're almost a statistical timer... 🙂
?
Edited November 28, 2022 by omnibrain
programmerdelphi2k 237 Posted November 28, 2022
"statistical timer"... your percentage values are well defined!!! :)))
programmerdelphi2k 237 Posted November 28, 2022 (edited)
procedure TForm1.Button1Click(Sender: TObject);
var
  MyAnsiString: AnsiString;
  MyText      : string;
begin
  MyAnsiString :=
    '123' +
    chr(10) { present, but not visible to the eye } +
    'hello' +
    chr(0) { ... everything from here on will be lost!!! } +
    'world' +
    chr(11200) +
    'hi';
  //
  MyText := '';
  //
  for var C in MyAnsiString do
    MyText := MyText + ',"' + C + ' - Code: ' + Ord(C).ToString + '"';
  //
  Memo1.Lines.DelimitedText := MyText.Remove(0, 1);
end;
Edited November 28, 2022 by programmerdelphi2k
David Heffernan 2345 Posted November 28, 2022
2 hours ago, programmerdelphi2k said: I dont know very well about this "transliteration" occurring,... but, do you tryed use "AnsiString" instead "String" type? https://docwiki.embarcadero.com/Libraries/Alexandria/en/System.AnsiStrings I could see, too, that you're almost a statistical timer... 🙂
Nobody has any idea what the actual problem is, but yeah, let's just randomly throw some AnsiStrings around. This approach to coding doesn't work. The monkeys still haven't typed Shakespeare yet.
omnibrain 15 Posted November 28, 2022
1 hour ago, David Heffernan said: Nobody has any idea what the actual problem is, but yeah, let's just randomly through some AnsiStrings around.
Yeah, me neither. That's why I'm asking in general terms. It's like someone asking whether anything with pointers and DLLs has changed in the latest release. Then we would answer "of course, ASLR and HE-ASLR are enabled by default. Look into the linker options." I'm hoping for something like that.
I suspect the code has been broken since the Unicode migration and now some (undefined?) behaviour in edge cases may have changed. I'm afraid I won't be able to avoid reworking it to use proper datatypes for the data received via the serial connection. (A proper mix between raw byte protocols, text protocols and protocols that mix both.)
The discussion in Best type for data buffer: TBytes, RawByteString, String, AnsiString, ... - Algorithms, Data Structures and Class Design - Delphi-PRAXiS [en] (delphipraxis.net) goes in a similar direction, though starting from another angle.
programmerdelphi2k 237 Posted November 28, 2022
Apparently, we have a critic on duty pointing the finger towards infinity? Does the answer lie in your galaxy, in a hazy light-years of poor teratian mortals? I'm getting out of here... help help... the aliens are coming
David Heffernan 2345 Posted November 29, 2022
13 hours ago, omnibrain said: Yeah, me neither. That's why I ask in general terms. Like if someone asked if anything with pointers and DLLs has changed in the latest release. Then we would answer "of course, ASLR and HE-ASLR is enabled by default. Look into the linker options." I hope for something like that.
I was pushing back against the quoted response, which was deeply unhelpful. As for what you need to do, I doubt the issue is with the update. I'd look to debug your code.
BerndS 0 Posted November 29, 2022
chr(11200)
What do you expect Delphi to make of chr(11200)? Everything not between #0 and #255 then becomes a "?".
PeterBelow 238 Posted November 29, 2022
18 hours ago, omnibrain said: We read the ansichars that come via the serial connection and put them into chars and them into strings and work from there. I think that's not ideal because strings are unicode (nowadays) but so far it worked fine.
That is the source of your problem, because converting an AnsiChar to UnicodeChar ( = Char) is not a simple byte copy: it will convert characters not in the 7-bit ASCII set to Unicode code points that may have different ordinal values. This depends on the active ANSI code page of the system as well.
The only sensible solution is to treat bytes as bytes and not as characters: use TBytes or other suitable containers to store sequences of bytes instead of strings (ANSI or Unicode). Writing a Pos equivalent for TBytes is easy and there are likely dozens of implementations around; you just have to find one.
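A Pos equivalent for TBytes, as suggested above, could look roughly like the following. This is only a minimal sketch: BytesPos is a hypothetical helper name (not an RTL function), it uses 0-based indices to match dynamic arrays rather than the 1-based indices of Pos, and the sample frame is made up for illustration.

```pascal
program BytesPosDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper, not part of the RTL: returns the 0-based index of the
// first occurrence of Pattern in Data at or after StartIndex, or -1 if the
// pattern is empty or not found.
function BytesPos(const Pattern, Data: TBytes; StartIndex: Integer = 0): Integer;
var
  I, J: Integer;
begin
  Result := -1;
  if (Length(Pattern) = 0) or (StartIndex < 0) then
    Exit;
  // Naive search: try every position where the pattern could still fit.
  for I := StartIndex to Length(Data) - Length(Pattern) do
  begin
    J := 0;
    while (J < Length(Pattern)) and (Data[I + J] = Pattern[J]) do
      Inc(J);
    if J = Length(Pattern) then
      Exit(I);
  end;
end;

var
  Frame: TBytes;
begin
  // Made-up frame: STX, 'A', a byte >= $80, EOT
  Frame := TBytes.Create($02, $41, $E4, $04);
  Writeln(BytesPos(TBytes.Create($04), Frame)); // prints 3
end.
```

Because the comparison is done on Byte values, nothing here depends on the active code page, which is the point of moving away from Char and string for raw protocol data.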
David Heffernan 2345 Posted November 29, 2022
28 minutes ago, BerndS said: What do you expect Delphi to make of chr(11200).
That's going to be a 16 bit Char with ordinal value 11200. It's KHOJKI LETTER A. See https://unicode-table.com/en/11200/
Nothing to see here
David Heffernan 2345 Posted November 29, 2022 (edited)
29 minutes ago, PeterBelow said: That is the source of your problem because converting an AnsiChar to UnicodeChar ( = Char) is not a simple byte copy, it will convert characters not in the 7 bit ASCII set to Unicode codepoints that may have different ordinal values. This depends on the active ANSI codepage of the system as well.
It depends on the encoding of the 8 bit data. For all we know, that data could be ASCII. The problem is that the question doesn't have any actionable information, and the asker is just hoping for some silver bullet. The asker needs to get some real information rather than hope that people here can guess what's up.
Edited November 29, 2022 by David Heffernan
Vandrovnik 212 Posted November 29, 2022
Just now, David Heffernan said: That's going to be a 16 bit Char with ordinal value 11200. It's KHOJKI LETTER A. See https://unicode-table.com/en/11200/ Nothing to see here
It is a black square: https://www.fileformat.info/info/unicode/char/2bc0/index.htm
11200 decimal, not 11200 hex.
David Heffernan 2345 Posted November 29, 2022
1 minute ago, Vandrovnik said: It is a black square: https://www.fileformat.info/info/unicode/char/2bc0/index.htm 11200 decimal, not 11200 hex.
Thanks for the correction. The main point stands, namely that Chr(11200) is perfectly valid.
Lajos Juhász 293 Posted November 29, 2022
8 minutes ago, David Heffernan said: Thanks for the correction. The main point stands, namely that Chr(11200) is perfectly valid.
This is true for strings. However, in the example it's used to assign the value to an ansistring:
var MyAnsiString: AnsiString;
AnsiString is not a Unicode string, thus there is no chr(11200); most probably the code page of the ansistring will have no conversion for that Unicode code point, so it will be converted to ?
omnibrain 15 Posted November 29, 2022
I tried to condense the code:
procedure tfr_com.dataavail(sender:TObject; Count:integer);
var
  i : word;
  c : char; // received serial character
  ac : ansichar;
  s : string;
begin
  for i:=1 to count do
    if com.ReadChar(ac) then
    begin
      c:=char(ac);
      {$R-}
      showinchar(c);
      if assigned(receivecharproc) then receivecharproc(c);
      {$R+}
    end;
end;

//receivecharproc is
procedure tfi_m.receivechar(ach:char);
begin
  //state machine that works with the chars to parse the various protocols and adds them to a string typed variable for further processing
end;
ReadChar provides us with ansichars, so I guess it's easiest just to stay with ansichars for further processing. But why did it work for 10 years and then suddenly stop? I can't rule out that we are seeing a new type of input we haven't seen before. But I still have no trace of what we actually receive...
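Since there is no trace yet of what actually arrives, one option is to log the raw bytes as hex before any Char or string conversion happens. The sketch below only shows the formatting part; BytesToHex is a hypothetical helper name, the sample bytes are made up, and how the received bytes get collected into a TBytes depends on the serial component, which is not shown here.

```pascal
program HexTraceDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: formats raw bytes as space-separated two-digit hex
// values, so a log shows exactly what came in, independent of any code page.
function BytesToHex(const Data: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Data) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Data[I], 2);
  end;
end;

begin
  // Made-up sample: STX, 'A', a byte >= $80, EOT
  Writeln(BytesToHex(TBytes.Create($02, $41, $E4, $04))); // prints 02 41 E4 04
end.
```

Comparing such a log against the values the state machine later sees would show whether a byte changed on the way from the AnsiChar to the Char.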
David Heffernan 2345 Posted November 29, 2022
10 minutes ago, omnibrain said: But why did it work for 10 years and suddenly stopped? I can't rule out, that we see a new type of input we haven't seen before. But I still have no trace of what we actually receive...
What is the encoding of the input? Can you guarantee that it is ASCII?
David Heffernan 2345 Posted November 29, 2022
1 hour ago, Lajos Juhász said: This is true for strings. However in the example it's used to assign the value to ansistring: var MyAnsiString: AnsiString; AnsiString is not a unicode string thus there is no chr(11200) most probably the code page of the ansistring will have no conversation for that unicode code point thus will be converted to ?
Fair. I was just looking at a statement about Chr(11200) in isolation. My bad.
Off-topic aside follows below:
Interestingly, if the process has UTF-8 as the active code page (https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page) then you can use AnsiString fine and be fully Unicode compliant. I discovered this by accident lately when my MATLAB mex file, which uses ANSI because MATLAB doesn't do UTF-16, unexpectedly started handling Unicode with a recent MATLAB update! The update set this code page in its manifest.
Lajos Juhász 293 Posted November 29, 2022
30 minutes ago, David Heffernan said: Interestingly, if the process has UTF-8 as the active code page (https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page) then you can use AnsiString fine and be fully Unicode compliant. I discovered this by accident lately when my MATLAB mex file, which uses ANSI because MATLAB doesn't do UTF16, unexpectedly started handling Unicode with a recent MATLAB update! The update set this code page in its manifest.
Yeah, and breaks FireDAC as it converts from UTF-16 using the language for non-unicode programs instead of using conversion from client locale to server locale.
omnibrain 15 Posted November 29, 2022
58 minutes ago, David Heffernan said: What is the encoding of the input. Can you guarantee that it is ascii?
The value can be basically anything from $01 to $ff. It's serial communication with various protocols. Some are text based, some byte based, and some a mix of both. Some of them are delimited by EOT ($04), for some we need to calculate the lengths, for some we need to calculate CRCs, etc. Not all in the same process, but the pattern is the same for all of them and tfr_com.dataavail is the same.
The serial communications component provides ansichars. And we don't expect to receive multi-byte characters via serial communications anyway. We communicate with old hardware, using old protocols. Most of the time there is no encoding specified, but for the text parts (if there are any) most of the time just plain ASCII characters are used.
David Heffernan 2345 Posted November 29, 2022
59 minutes ago, Lajos Juhász said: Yeah, and breaks FireDAC as it converts from UTF-16 using the language for non-unicode programs instead of using conversion from client locale to server locale.
That's on FireDAC I guess. AnsiString conversion works just fine in this scenario because it calls GetACP and uses the returned value (65001) for all conversions.
David Heffernan 2345 Posted November 29, 2022
24 minutes ago, omnibrain said: The value can be basically anything from $01 to $ff.
What do you expect and intend to happen, then, with values >= $80? I don't think anything has changed in recent Delphi releases, but your code may have been broken forever.
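One way to answer the question about values >= $80 empirically, rather than by guessing, is to run the same cast over all 256 byte values and list those whose ordinal value changes. The following is only a sketch: CastChangesOrdinal is a hypothetical helper name, and the result is deliberately not predicted here, since it may depend on the compiler version and the active ANSI code page. It would have to be built and run with both Delphi 10.2 and 11.2 on the affected systems to be meaningful.

```pascal
program CharCastCheck;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: True if hard-casting the given byte value from
// AnsiChar to Char (the same cast used in dataavail) changes its ordinal.
function CastChangesOrdinal(B: Byte): Boolean;
var
  AC: AnsiChar;
  C: Char;
begin
  AC := AnsiChar(B);
  C := Char(AC);
  Result := Ord(C) <> B;
end;

var
  I: Integer;
begin
  // List every byte value whose ordinal does not survive the cast.
  for I := $00 to $FF do
    if CastChangesOrdinal(Byte(I)) then
      Writeln(Format('$%.2x -> U+%.4x', [I, Ord(Char(AnsiChar(I)))]));
end.
```

If the ordinal value has to be preserved regardless of code page, casting through the ordinal instead, for example Char(Ord(ac)) or Char(Byte(ac)), avoids any character conversion, although keeping the data as bytes sidesteps the question entirely.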
programmerdelphi2k 237 Posted November 29, 2022 (edited)
The RAD Studio 11.2 help says, under System.AnsiString:
Quote
ANSISTRING: Represents a dynamically allocated string whose maximum length is limited only by available memory. An AnsiString variable is a structure containing string information. When the variable is empty (when it contains a zero-length string), the pointer is nil and the string uses no additional storage. When the variable is nonempty, it points to a dynamically allocated block of memory that contains the string value. This memory is allocated on the heap, but its management is entirely automatic and requires no user code. The AnsiString structure contains a 32-bit length indicator, a 32-bit reference count, a 16-bit data length indicating the number of bytes per character, and a 16-bit code page. This code page is set, by default, to the operating system's code page. It can be changed by calling SetMultiByteConversionCodePage. An AnsiString represents a single-byte string. With a single-byte character set (SBCS), each byte in a string represents one character. In a multibyte character set (MBCS), the elements are still single bytes, but some characters are represented by one byte and others by more than one byte. Multibyte character sets--especially double-byte character sets (DBCS)--are widely used for Asian languages. An AnsiString can contain MBCS characters.
etc.
So is that why Char(11200) = ? ... or does it exist in the default OS code page?
Edited November 29, 2022 by programmerdelphi2k
David Heffernan 2345 Posted November 29, 2022
2 hours ago, programmerdelphi2k said: for that Char(11200) = ? ... or exists in Default O.S. page?
Char(11200) is a perfectly valid Char, and represents a well defined UTF-16 element. The issue is when you convert to AnsiString.