Fons N 17 Posted April 10, 2022

Hi,

I am trying to import, via the clipboard, some text which has to be processed. I am not a professional coder, it's just a hobby, but I do use some of my applications at work (administration department).

Below is data from a web application.

In Notepad it looks like this.

In EditPad Pro it looks like this.

Using the hex view it looks like this.

The "white" space after the first Transaction seems to be 2 characters. The first is C2 and the space after it is A0. According to a web search, C2A0 is a non-breaking space. But you all probably know this.

I am trying to delete all spaces, including this "special" Unicode space. But I cannot get it to work.

    S := StringReplace(S, #$C2#$A0, '', [rfReplaceAll]);
    S := StringReplace(S, #$C2A0, '', [rfReplaceAll]);

Delphi does not complain about any syntax errors, but the result is that this non-breaking space is not deleted - as in replaced by an empty string.

I am at a loss. Delphi does include a function IsWhiteSpace that can detect this character, but I need something to delete it.

Thanks in advance.

Best regards,
Fons
Fons N 17 Posted April 10, 2022

I have figured it out... after trying and searching for about an hour before posting my question... I suddenly have the answer.

At first I tried the decimal value of C2A0, which is 49824. But that did not work either.

Then I found this:

When I use #160 it works.

Greetings,
Fons
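For readers following along, the fix described above can be sketched as a small console program. This is only an illustration under stated assumptions: the sample text and the helper name RemoveAllWhiteSpace are hypothetical and not taken from the original application. It assumes a Unicode-enabled Delphi where the Char helper from System.Character provides IsWhiteSpace.

```delphi
program StripSpaces;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Character;

// Hypothetical helper: returns S without any character that Delphi
// classifies as white space. Note that this also drops tabs and
// line breaks, not only #$0020 and #$00A0.
function RemoveAllWhiteSpace(const S: string): string;
var
  C: Char;
begin
  Result := '';
  for C in S do
    if not C.IsWhiteSpace then
      Result := Result + C;
end;

var
  S: string;
begin
  // Made-up sample: a non-breaking space followed by a normal space.
  S := 'Transaction' + #$00A0 + ' 123';

  // #160 and #$00A0 denote the same character, so either spelling works.
  // This removes only the non-breaking space.
  Writeln(StringReplace(S, #160, '', [rfReplaceAll]));

  // This removes every white-space character in one pass.
  Writeln(RemoveAllWhiteSpace(S));
end.
```

StringReplace is enough when only the non-breaking space has to go; the loop is the broader option when every kind of space should be removed from a single line of text.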
PeterBelow 238 Posted April 10, 2022 (edited)

56 minutes ago, Fons N said: "Hi, I am trying to import using the clipboard some text which has to be processed. I am not a professional coder, it's just a hobby, but I do use some of my application at work (administration department). Below is data from a web application. In notepad it looks like this."

You are approaching this from the wrong angle. The data you showed pasted into Notepad looks like a semicolon-separated CSV format. To dissect this you can use TStringList. Something like this:

    // Requires Vcl.Clipbrd (for Clipboard) and System.Classes in the uses clause.
    var
      LText, LLine: TStringList;
      i: Integer;
    begin
      LText := TStringList.Create;
      try
        // Assigning to Text splits the data into lines.
        LText.Text := Clipboard.AsText;
        LLine := TStringList.Create;
        try
          // Split only on the delimiter, not on spaces as well.
          LLine.StrictDelimiter := True;
          LLine.Delimiter := ';';
          for i := 0 to LText.Count - 1 do
          begin
            LLine.DelimitedText := LText[i];
            if i = 0 then
              ProcessHeaderLine(LLine)
            else
              ProcessDataLine(LLine);
          end;
        finally
          LLine.Free;
        end;
      finally
        LText.Free;
      end;
    end;

Untested, just typed into the post directly. The two Process routines are something you would write yourself. For each, the passed string list should hold 5 lines: the column captions for the header and the column values for the data.

If you really want to replace a non-breaking Unicode space, it is a single character with the code #$00A0, not a two-character string. Your hex viewer probably pastes the clipboard content as ANSI text, while Notepad pastes it as Unicode (UTF-16).

Edited April 10, 2022 by PeterBelow
Remy Lebeau 1397 Posted April 11, 2022

On 4/10/2022 at 9:07 AM, Fons N said: "When I use #160 it works"

That is because the original data is encoded in UTF-8, but once it is loaded into your string, it is no longer encoded in UTF-8; it is encoded in UTF-16 instead. $C2 $A0 are the UTF-8 bytes for the non-breaking space character, whereas $00A0 (decimal 160) is the UTF-16 value of that same character.
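The difference between the two encodings can be checked with a short console sketch. It is an illustration only: the helper BytesToHex is hypothetical, and the program assumes a Unicode-enabled Delphi where string is UTF-16 and TEncoding is available in System.SysUtils. It encodes the single character #$00A0 both ways and prints the resulting bytes in hex.

```delphi
program NbspBytes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: formats a byte array as space-separated hex.
function BytesToHex(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(B) do
    Result := Result + IntToHex(B[I], 2) + ' ';
  Result := Trim(Result);
end;

var
  S: string;
begin
  // One non-breaking space: a single UTF-16 code unit in a Delphi string.
  S := #$00A0;

  Writeln('Length(S):      ', Length(S));
  Writeln('UTF-8 bytes:    ', BytesToHex(TEncoding.UTF8.GetBytes(S)));
  // TEncoding.Unicode is UTF-16 little-endian.
  Writeln('UTF-16LE bytes: ', BytesToHex(TEncoding.Unicode.GetBytes(S)));
end.
```

The UTF-8 line should show C2 A0 and the UTF-16LE line A0 00. This is also why the two attempts in the question matched nothing: in a UTF-16 string, #$C2#$A0 is two characters (U+00C2 followed by U+00A0) and #$C2A0 is the unrelated single character U+C2A0, so neither equals a lone #$00A0.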
Fons N 17 Posted April 14, 2022

On 4/10/2022 at 6:44 PM, PeterBelow said: "If you really want to replace a non-breaking Unicode"

Peter,

Thanks for your help. Sorry for the elaborate introduction to my question. I wasn't sure of it all, thus the long introduction. And yes, the actual splitting is done similarly to your example.

Greetings,
Fons
Fons N 17 Posted April 14, 2022

On 4/11/2022 at 10:47 PM, Remy Lebeau said: "but once it is loaded into your string, it is no longer encoded in UTF-8, it is encoded in UTF-16 instead."

Remy,

Thanks. Reading your reply, yes, it does make sense to me. I know there is UTF-8, 16 and 32, but didn't realize the "code point" (not sure about the name) would be different, just that the storage would be. But that is of course not the case, quite logically I suppose; not having had to deal with that issue thus far, it just hadn't occurred to me yet.

Greetings,
Fons