baoquan.zuo, posted yesterday at 03:52 AM

Hi, I'd like to share a post. It addresses a byte-loss issue that came up in a discussion.

```pascal
// Compile with code page 936
program Problem;

{$APPTYPE CONSOLE}

const
  strPublicKey: RawByteString =
      #$30#$3C#$30#$0D#$06#$09#$2A#$86#$48#$86#$F7#$0D#$01#$01#$01#$05
    + #$00#$03#$2B#$00#$30#$28#$02#$21#$00#$A4#$65#$B8#$CD#$B4#$29#$A9
    + #$64#$1A#$C5#$80#$55#$22#$1B#$BB#$C5#$98#$36#$B9#$23#$0C#$CA#$D4
    + #$A8#$B8#$7C#$E6#$32#$E3#$89#$3D#$77#$02#$03#$01#$00#$01;

begin
  Writeln(Length(strPublicKey)); // expected 62, got 58 - why?
  Readln;
end.
```

https://devjetsoftware.com/delphi/byte-loss-in-string-literal-concatenation/

David Heffernan, posted yesterday at 06:51 AM

What is wrong with the world of Delphi programmers that in 2025 there are still people who can't understand the difference between text and bytes? The article you link to goes on and on about text, but your data is bytes. Why not just use the correct data type?

baoquan.zuo, posted yesterday at 07:08 AM

2 minutes ago, David Heffernan said:
> What is wrong with the world of Delphi programmers that in 2025 there are still people who can't understand the difference between text and bytes? The article you link to goes on and on about text, but your data is bytes. Why not just use the correct data type?

Yes, it’s clear we should use the proper data type to represent raw bytes. What the article (and my curiosity) really digs into is why the data loss occurs and how DCC handles string literals, since there is no formal Delphi language specification. That matters to me because I’m also writing Delphi compiler-frontend code for my products. In any case, these insights should help when migrating legacy ANSI-based Delphi projects.

David Heffernan, posted yesterday at 07:14 AM

5 minutes ago, baoquan.zuo said:
> In any case, these insights should help when migrating legacy ANSI-based Delphi projects.

I don't see this as helpful to anyone. Use bytes to represent bytes. Use strings to represent text. Don't use ANSI strings.

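A minimal sketch of "use bytes to represent bytes" applied to the key from the first post: the same 62 values declared as a typed constant array of Byte. The program name and the constant name `PublicKeyBytes` are illustrative, not from the thread. Because no string literal is involved, the compiler performs no code-page conversion, and the element count is fixed by the declared bounds (the compiler rejects the declaration if the number of values does not match).

```pascal
program Solution;

{$APPTYPE CONSOLE}

// Sketch: the same 62 bytes as in the original post, stored as bytes, not text.
// PublicKeyBytes is a hypothetical name chosen for this example.
const
  PublicKeyBytes: array[0..61] of Byte = (
    $30, $3C, $30, $0D, $06, $09, $2A, $86, $48, $86, $F7, $0D, $01, $01, $01, $05,
    $00, $03, $2B, $00, $30, $28, $02, $21, $00, $A4, $65, $B8, $CD, $B4, $29, $A9,
    $64, $1A, $C5, $80, $55, $22, $1B, $BB, $C5, $98, $36, $B9, $23, $0C, $CA, $D4,
    $A8, $B8, $7C, $E6, $32, $E3, $89, $3D, $77, $02, $03, $01, $00, $01);

begin
  // The length of a static array comes from its bounds, so this prints 62
  // regardless of the code page the source file is compiled with.
  Writeln(Length(PublicKeyBytes));
end.
```

If an API expects a dynamic `TBytes` instead of a static array, the values can be copied into one with `SetLength` followed by `Move`.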
baoquan.zuo, posted yesterday at 07:28 AM

13 minutes ago, David Heffernan said:
> I don't see this as helpful to anyone. Use bytes to represent bytes. Use strings to represent text. Don't use ANSI strings.

Thanks for sharing your view.

Roger Cigol, posted yesterday at 09:36 AM

Of course there ARE times when the use of ANSI strings makes sense. One example is when sending data to/from an external device down an RS232 port where the external device uses a protocol based on simple ANSI text. We have many real-world cases such as this (e.g. Eurotherm temperature controllers). The key point that @David Heffernan makes is that you should choose your types carefully to closely (or exactly!) reflect your needs. Time spent thinking carefully about your type selection will save you time in the long run.

David Heffernan, posted yesterday at 11:53 AM

2 hours ago, Roger Cigol said:
> Of course there ARE times when the use of ANSI strings makes sense. One example is when sending data to/from an external device down an RS232 port where the external device uses a protocol based on simple ANSI text. We have many real-world cases such as this (e.g. Eurotherm temperature controllers). The key point that @David Heffernan makes is that you should choose your types carefully to closely (or exactly!) reflect your needs. Time spent thinking carefully about your type selection will save you time in the long run.

I mean, you work with strings and call TEncoding.ASCII.GetBytes.

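A minimal sketch of that suggestion, assuming a simple ASCII command protocol: build the message as an ordinary `string` and convert it to bytes only at the I/O boundary with `TEncoding.ASCII.GetBytes`. The command text `PV?` followed by a carriage return is a hypothetical placeholder (not taken from any Eurotherm documentation), and the actual serial-port write is omitted because it depends on the component in use.

```pascal
program AsciiCommand;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Command: string;
  Payload: TBytes;
  B: Byte;
begin
  // Hypothetical command text; a real device protocol defines its own format.
  Command := 'PV?' + #13;

  // Keep working with text as string; convert to bytes only when sending.
  // TEncoding.ASCII is a shared RTL instance, so it must not be freed.
  Payload := TEncoding.ASCII.GetBytes(Command);

  Writeln(Length(Payload)); // 4 bytes: 'P', 'V', '?', CR
  for B in Payload do
    Write(IntToHex(B, 2), ' ');
  Writeln;                  // 50 56 3F 0D
end.
```

The byte buffer then goes to whatever write routine the serial component provides. Non-ASCII characters cannot be represented by this encoding, which is one more reason to keep binary payloads in `TBytes` rather than routing them through text.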
Roger Cigol, posted yesterday at 12:14 PM

20 minutes ago, David Heffernan said:
> I mean, you work with strings and call TEncoding.ASCII.GetBytes.

All good! There is more than one way to skin a cat.

Anders Melander, posted yesterday at 01:58 PM

6 hours ago, David Heffernan said:
> I don't see this as helpful to anyone.

That's a bit... unnuanced. I thought it was helpful. Even though the "problem" is pretty obscure, and I haven't encountered it myself, it's a data point that might come in handy some day.

David Heffernan, posted 11 hours ago

16 hours ago, Roger Cigol said:
> All good! There is more than one way to skin a cat.

This way is reliable and works.

David Heffernan, posted 10 hours ago

15 hours ago, Anders Melander said:
> That's a bit... unnuanced. I thought it was helpful. Even though the "problem" is pretty obscure, and I haven't encountered it myself, it's a data point that might come in handy some day.

My point is that it's behaviour that you don't ever need to know, because the correct way to handle byte data is as, well, bytes and not text. So for sure there's an algorithm, but it's not one that anyone actually needs to know.

baoquan.zuo, posted 9 hours ago (edited)

I forgot to mention that, in the original case, the proper solution is to use a byte array. I’d assumed this was common knowledge, but I should have spelled it out.

As I wrote at the beginning:
> I was curious about the data-loss issue, so I decided to investigate it.

I simply documented the journey, shared it, and hope it helps someone. At the very least, the exercise deepened my understanding of character encoding and how dcc handles string literals.

In the end, it’s just an article. If you skimmed it, read the conclusion, and found nothing useful, no worries, and thanks for taking a look.

Edited 8 hours ago by baoquan.zuo (proofread)

David Heffernan, posted 3 hours ago

5 hours ago, baoquan.zuo said:
> I’d assumed this was common knowledge, but I should have spelled it out.

That makes a lot more sense.

Assumed it was common knowledge? I'm not so sure. I think there's still a big underbelly of Delphi coders that don't get this.

baoquan.zuo, posted 16 minutes ago

Yes. I was a bit surprised that, when CnPack published the Chinese translation, advwang mentioned he had reported the issue (RSP-20624) back in 2018. The issue was closed as 'Work as Designed', with a suggestion to add a warning in cases of potential data loss. He also said EurekaLog 7.0 used this approach in its shellcode but later fixed it by switching to a byte array.

By the way, I added this paragraph to the introduction:
> Note: Generally, the correct approach is to use a byte array to represent binary data, since strings are intended for textual content. You may also skip the analysis and jump directly to the Conclusion section.

and improved the Conclusion section:
> Why does data loss occur? In short, it is caused by converting an invalid AnsiString to a UnicodeString. Invalid byte sequences are replaced with the ? character. But how does this happen exactly? What's the underlying reason?

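To observe the kind of conversion that conclusion describes, here is a small runtime sketch, assuming Delphi 2009 or later on Windows, where AnsiString/UnicodeString conversions go through the system code-page tables. It puts one valid GBK character ($B8 $CD) and one invalid pair ($B4 followed by $29, which is below the $40 lower bound of GBK trail bytes) into an `AnsiString(936)`, converts it to `UnicodeString` and back, and prints what comes out. It mirrors the mechanism at runtime, not DCC's own literal handling; the type name `GBKString` is hypothetical, and because the exact replacement output depends on the platform's conversion tables, no expected output is claimed.

```pascal
program AnsiToUnicodeLoss;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical alias: an AnsiString whose payload is tagged with code page 936 (GBK).
  GBKString = type AnsiString(936);

var
  Original, RoundTrip: GBKString;
  Text: string;
  C: Char;
  I: Integer;
begin
  // $B8 $CD is a valid GBK double-byte character.
  // $B4 $29 is not: $B4 is a lead byte, but GBK trail bytes start at $40.
  SetLength(Original, 4);
  Original[1] := AnsiChar($B8);
  Original[2] := AnsiChar($CD);
  Original[3] := AnsiChar($B4);
  Original[4] := AnsiChar($29);

  Text := string(Original);      // AnsiString(936) -> UnicodeString (code-page conversion)
  RoundTrip := GBKString(Text);  // UnicodeString -> AnsiString(936)

  Write('UTF-16 code units: ');
  for C in Text do
    Write(IntToHex(Ord(C), 4), ' ');
  Writeln;

  Write('Bytes after round trip: ');
  for I := 1 to Length(RoundTrip) do
    Write(IntToHex(Ord(RoundTrip[I]), 2), ' ');
  Writeln;

  Writeln('Lengths: ', Length(Original), ' -> ', Length(RoundTrip));
end.
```

Comparing the printed values with the four input bytes shows which ones survived; a 003F code unit in the UTF-16 output would be the ? replacement the conclusion mentions.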