Leaderboard

David Heffernan (Members): 4 points, content count 3710
Lars Fosdal (Administrators): 3 points, content count 3515
gidesa (Members): 2 points, content count 9
A.M. Hoornweg (Members): 1 point, content count 494

Popular Content

Showing content with the highest reputation on 04/08/23 in all areas

ANN: New Opencv v. 4.6 C++ API wrapper

gidesa replied to gidesa's topic in Delphi Third-Party

Hello to all, I have released a new version of the wrapper library, with many improvements, and instructions for building on Linux, too. https://github.com/gidesa/ocvWrapper46
- April 7, 2023
- 10 replies
Unicode weirdness

David Heffernan replied to David Schwartz's topic in VCL

This is fair. I looked at the first couple which are UTF-8, and then assumed they all were. But a couple of them aren't. Not implausible that the Delphi code in the OP is wrong though.
- April 8, 2023
- 28 replies
Unicode weirdness

David Heffernan replied to David Schwartz's topic in VCL

This entire thread blows my mind. The number of people who think it's normal to read UTF8 as though it were ANSI.
- April 8, 2023
- 28 replies
TO ChatGPT: In Delphi, is there any kind of an adapter or class that takes a TList<T> and makes it look like a TDataSet?

Anders Melander replied to David Schwartz's topic in Databases

Gold!
- April 7, 2023
- 79 replies
Unicode weirdness

David Heffernan replied to David Schwartz's topic in VCL

I'd just read them using the UTF8 encoding in the first place and so never ever see these characters. I'm sure you would too.
- April 7, 2023
- 28 replies
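As a rough sketch of the suggestion above (read the extracted files as UTF-8 from the start), the console program below loads a file through `TStrings.LoadFromFile` with an explicit `TEncoding.UTF8`. The file name, the sample bytes, and the `LoadUtf8` helper are assumptions made for illustration; they are not taken from the thread.

```pascal
program ReadAsUtf8;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes, System.IOUtils;

// Hypothetical helper: loads FileName into Lines, decoding the bytes as
// UTF-8 instead of the default ANSI code page.
procedure LoadUtf8(Lines: TStrings; const FileName: string);
begin
  Lines.LoadFromFile(FileName, TEncoding.UTF8);
end;

var
  FileName: string;
  Lines: TStringList;
begin
  // Illustrative sample file standing in for the text extractor's output.
  FileName := TPath.Combine(TPath.GetTempPath, 'utf8_sample.txt');
  // Raw UTF-8 bytes for "It", U+2019 (right single quote), "s".
  TFile.WriteAllBytes(FileName, TBytes.Create($49, $74, $E2, $80, $99, $73));

  Lines := TStringList.Create;
  try
    LoadUtf8(Lines, FileName);
    Writeln(Length(Lines[0]));               // 4 characters, not 6
    Writeln(IntToHex(Ord(Lines[0][3]), 4));  // 2019
  finally
    Lines.Free;
    TFile.Delete(FileName);
  end;
end.
```

The same `LoadFromFile(FileName, TEncoding.UTF8)` call should also work on a memo's `Lines` property, since that is a `TStrings` as well.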
Windows App Store icon sizes - unplated?

Cristian Peța replied to Incus J's topic in Delphi IDE and APIs

Strange... if I change the 44x44 logo to have more black, it will show transparent in the taskbar. I discovered this accidentally while trying to alter the image to be sure it's the right one. A little black will not suffice. It needs something more than 1/3.
- April 7, 2023
- 9 replies
- Tagged with: appx, windows store app, icon, plated, unplated
Indy TCP client

Remy Lebeau replied to @rturas's topic in Network, Cloud and Web

There is usually no need to ever use ReuseSocket on a client, unless you use its BoundIP and BoundPort properties to make it connect from a specific local IP/Port pair, AND you are connecting to the same remote IP/Port pair as a previous connection from the same local IP/Port. By default, a client socket connects from a random local port. ReuseSocket is more commonly used only on servers instead. When a server gets shut down and restarted quickly, ReuseSocket can allow it to listen on the same local IP/Port as the previous run, in case the OS hasn't released the local IP/Port yet.

You are using the InputBuffer the wrong way. After you Write() out your wBuffer, you are waiting in an endless loop until a reply arrives. Why are you using a loop at all? That is not necessary. Indy's reading behavior is designed to block the calling thread waiting for requested data to arrive. Most of the IOHandler's reading methods get their data from the InputBuffer only, not from the socket directly. CheckForDataOnSource() reads directly from the socket and saves whatever it receives into the InputBuffer. So, if you ask a reading method (i.e., in this case, ReadBytes()) to read something (i.e., in this case, 65 bytes), the method does not exit until all of the bytes for that something are available in the InputBuffer, or until the ReadTimeout elapses (which is infinite by default).

In your case, when you do eventually get a reply, you are reading it from the InputBuffer into your rBuffer, but then you are ignoring the rBuffer that you just read into and instead you are checking the InputBuffer directly to see if it has any buffered bytes that you have NOT read yet, and only if it DOES are you flagging yourself to make the next Write() call. But if the InputBuffer is EMPTY (because you have already read everything that was in it) then you are NEVER calling Write() again, and you end up stuck in your reading loop waiting for CheckForDataOnSource() to receive new bytes from the socket which you are NEVER requesting the server to send.

You have over-complicated your thread logic. You don't need all of that InputBuffer handling at all. Just call Write(), then ReadBytes() (letting it block), and repeat.

That being said, there are some other issues with your code, too. You are not calling Disconnect() after Connect() is successful. You are accessing the InputBuffer in the main thread while your worker thread may also be accessing it at the same time. And you are not capturing the Exception when calling TThread.Queue(), so the Exception will be destroyed long before the main thread has a chance to access its Message.

With all of that said, try this instead:

    inherited Create(True);
    TCPClient := TIdTCPClient.Create;
    TCPClient.Host := AHost;
    TCPClient.Port := APort;
    TCPClient.ConnectTimeout := 5000;
    TCPClient.ReadTimeout := ...; // infinite by default

    ...

    // a separate procedure is needed so that TThread.Queue() can
    // capture and extend the lifetime of the String. See the
    // documentation for more details:
    // https://docwiki.embarcadero.com/RADStudio/Sydney/en/Anonymous_Methods_in_Delphi
    //
    procedure DisplayMessageInUI(const AMsg: string);
    begin
      TThread.Queue(nil,
        procedure
        begin
          Form2.mmo1.Lines.Add(AMsg);
        end);
    end;

    ...

    SetLength(wBuffer, 6);
    //write some bytes into wBuffer
    SetLength(rBuffer, 65);
    while not Terminated do
    begin
      try
        TCPClient.Connect;
      except
        on E: Exception do
        begin
          DisplayMessageInUI('Exception: ' + e.Message);
          for i := 1 to 5 do
          begin
            if Terminated then Exit;
            Sleep(1000);
          end;
          Continue;
        end;
      end;
      try
        try
          i := 1;
          while not Terminated do
          begin
            TCPClient.IOHandler.Write(wBuffer);
            TCPClient.IOHandler.ReadBytes(rBuffer, 65, False); // waits for reply
            //
            { Alternatively, if you want to keep checking Terminated while waiting:
            while TCPClient.IOHandler.InputBufferIsEmpty do
            begin
              TCPClient.IOHandler.CheckForDataOnSource(100);
              TCPClient.IOHandler.CheckForDisconnect;
              if Terminated then Exit;
            end;
            TCPClient.IOHandler.ReadBytes(rBuffer, 65, False);
            }
            DisplayMessageInUI('Reply received');
            //do some stuff with rBuffer
            Inc(i);
            Sleep(1000);
          end;
        finally
          TCPClient.Disconnect;
        end;
      except
        on E: Exception do
        begin
          DisplayMessageInUI('Exception: ' + e.Message);
          if Terminated then Exit;
          Sleep(1000);
        end;
      end;
    end;
Unicode weirdness

Fr0sT.Brutal replied to David Schwartz's topic in VCL

So you had file interpreted as ANSI and converted into UTF16 with all the "weird" chars just widened ($AB => $00AB). And you had your UTF16-encoded literals defined in the same way because IDE thought the source file is in ANSI. Then, in new version, the option has changed to UTF8. And your literals which together form a valid UTF8 compound char turned to single UTF16 char which is not contained in source string. That's my version.
- April 6, 2023
- 28 replies
Unicode weirdness

Lars Fosdal replied to David Schwartz's topic in VCL

@David Schwartz - You wouldn't happen to have an original file uploaded as an attachment to a post here, so that we can try some conversions?
- April 6, 2023
- 28 replies
Unicode weirdness

Brian Evans replied to David Schwartz's topic in VCL

I see this most often when ASCII is automatically cleaned up typographically for printing by doing some conversions like dash to em dash. If this text is put back / interpreted as ASCII bytes, the various UTF8 encodings of the typographical replacements end up as multiple characters each, like a minus/dash that was converted to em dash then ending up as â€ ". You need to find where the problem is: the data in the PDF itself could already be corrupted this way, or it can happen at some other stage, including in the PDF -> Text step or in how you load the text. Often, even if you do interpret the encodings correctly so there is a — (em dash) instead of â€ ", the equivalent replacements might be worthwhile to convert the text back to plain ASCII.
- April 5, 2023
- 28 replies
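To make the mechanism described above concrete, here is a minimal sketch that encodes one typographic character as UTF-8 and then decodes those bytes as Windows-1252. The `MisreadUtf8AsAnsi` name is hypothetical, and code page 1252 is an assumption about which ANSI code page was involved; the thread does not state it.

```pascal
program MojibakeDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: decodes the UTF-8 bytes of S as Windows-1252,
// reproducing the misreading discussed in the thread.
function MisreadUtf8AsAnsi(const S: string): string;
var
  Ansi: TEncoding;
begin
  // Assumption: code page 1252. GetEncoding returns a new instance
  // that the caller has to free.
  Ansi := TEncoding.GetEncoding(1252);
  try
    Result := Ansi.GetString(TEncoding.UTF8.GetBytes(S));
  finally
    Ansi.Free;
  end;
end;

var
  Garbled: string;
  C: Char;
begin
  // U+2026 HORIZONTAL ELLIPSIS is E2 80 A6 in UTF-8.
  Garbled := MisreadUtf8AsAnsi(#$2026);
  Write(Length(Garbled), ' chars:');
  for C in Garbled do
    Write(' U+', IntToHex(Ord(C), 4));
  Writeln;
end.
```

On Windows the three bytes should come back as U+00E2, U+20AC and U+00A6, which is the three-character sequence that the original question replaces with '...'.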
Unicode weirdness

A.M. Hoornweg replied to David Schwartz's topic in VCL

Just open the extracted *.txt files in Notepad++ and try out the different encoding options that this program offers until the files display correctly. Then save them as "unicode with bom". TStringList.LoadFromFile will load the files correctly even if they contain foreign characters.
- April 5, 2023
- 28 replies
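The BOM-based loading mentioned above can be sketched as follows. Instead of Notepad++, the example writes the BOM from Delphi itself so that it stays self-contained; the file name and the `RoundTripsViaBom` helper are illustrative assumptions.

```pascal
program BomRoundTrip;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes, System.IOUtils;

// Hypothetical helper: saves Text as UTF-8 with a BOM, reloads it without
// naming an encoding, and reports whether the text survived unchanged.
function RoundTripsViaBom(const Text, FileName: string): Boolean;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add(Text);
    // WriteBOM defaults to True, so the UTF-8 preamble EF BB BF is written.
    Lines.SaveToFile(FileName, TEncoding.UTF8);
    Lines.Clear;
    // No encoding argument: the encoding is detected from the BOM.
    Lines.LoadFromFile(FileName);
    Result := (Lines.Count = 1) and (Lines[0] = Text);
  finally
    Lines.Free;
  end;
end;

var
  FileName: string;
begin
  FileName := TPath.Combine(TPath.GetTempPath, 'bom_sample.txt');
  try
    // "cafe" with e-acute, an em dash, and "naive" with i-diaeresis.
    Writeln(RoundTripsViaBom('caf'#$00E9' '#$2014' na'#$00EF've', FileName));
  finally
    TFile.Delete(FileName);
  end;
end.
```

Without a BOM, `LoadFromFile` falls back to a default encoding, which is where the misreading discussed in this thread can creep in.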
Unicode weirdness

Lars Fosdal replied to David Schwartz's topic in VCL

Wouldn't converting the chars to Unicode solve that problem? All strings in modern Delphi components are using Unicode. I don't understand why you don't want to handle the text as what it is. Once you have the text as Unicode, you also get all the nice TCharHelper functions to understand what kind of character you are looking at, in case you want to do more manipulations. A lot better and more robust than string replacements.
- April 4, 2023
- 28 replies
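Once the text is held as real Unicode, an ASCII approximation can work per character rather than per garbled sequence. The following is one possible sketch using `TCharHelper` from `System.Character`; the `ToAsciiApprox` name, the mapping table, and the choice to flag unmapped marks with '?' are assumptions, not something proposed in the thread.

```pascal
program AsciiApprox;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Character;

// Hypothetical helper: replaces common typographic characters with ASCII
// look-alikes. The mapping is illustrative and far from complete.
function ToAsciiApprox(const S: string): string;
var
  C: Char;
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create;
  try
    for C in S do
      case C of
        #$2018, #$2019: SB.Append('''');  // single quotes
        #$201C, #$201D: SB.Append('"');   // double quotes
        #$2013, #$2014: SB.Append('--');  // en dash, em dash
        #$2026:         SB.Append('...'); // ellipsis
        #$27A4:         SB.Append('>');   // arrowhead
      else
        // TCharHelper tells us what kind of character is left over.
        if (Ord(C) > 127) and (C.IsPunctuation or C.IsSymbol) then
          SB.Append('?')                  // unmapped mark: flag it
        else
          SB.Append(C);                   // ASCII and letters pass through
      end;
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

begin
  Writeln(ToAsciiApprox('It'#$2019's '#$2014' ok'#$2026)); // It's -- ok...
end.
```

Letters outside ASCII are deliberately left alone here; whether to transliterate them is a separate decision.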
Unicode weirdness

David Heffernan replied to David Schwartz's topic in VCL

Isn't the real problem that you have interpreted UTF-8 encoded data as though it were ANSI? I mean, it's clearly not ASCII because none of the characters in your code are in the ASCII set. You can actually delete all of these StringReplace calls by simply using the correct encoding for your extracted data.
- April 4, 2023
- 28 replies
Unicode weirdness

Stefan Glienke replied to David Schwartz's topic in VCL

That would be quite nonsense given that strs is TStrings as David wrote ("strs points to a memo.Lines property"). Then don't use a Memo and its Lines property I would say - they are Unicode.
- April 4, 2023
- 28 replies
Unicode weirdness

timfrost replied to David Schwartz's topic in VCL

Can you not find a better 'text extractor' which produces more useful output?
- April 4, 2023
- 28 replies
Unicode weirdness

Lars Fosdal replied to David Schwartz's topic in VCL

@David Schwartz - This looks like MBCS encoding - the old ANSI multibyte character set encoding scheme in Windows. The ANSI routines should be capable of converting the strings to Unicode, but they depend on knowing the appropriate code page. https://docwiki.embarcadero.com/RADStudio/Alexandria/en/Commonly_Used_Routines_for_AnsiStrings
- April 4, 2023
- 28 replies
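The dependence on the code page can be illustrated by undoing the damage: if a string was produced by decoding UTF-8 bytes with a known ANSI code page, encoding it back with that code page and decoding as UTF-8 may recover the original. This is only a sketch; `RepairMojibake` is a hypothetical name, 1252 is an assumed default, and the round trip can fail for bytes that the code page leaves undefined (0x9D in Windows-1252, for example).

```pascal
program RepairDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: re-encodes Garbled with the ANSI code page that was
// wrongly used for decoding, then decodes the resulting bytes as UTF-8.
function RepairMojibake(const Garbled: string; CodePage: Integer = 1252): string;
var
  Ansi: TEncoding;
begin
  Ansi := TEncoding.GetEncoding(CodePage); // caller owns this instance
  try
    Result := TEncoding.UTF8.GetString(Ansi.GetBytes(Garbled));
  finally
    Ansi.Free;
  end;
end;

var
  Fixed: string;
  C: Char;
begin
  // U+00E2 U+20AC U+2122 is what U+2019 (right single quote) looks like
  // after its UTF-8 bytes E2 80 99 were read as Windows-1252.
  Fixed := RepairMojibake(#$00E2#$20AC#$2122);
  Write(Length(Fixed), ' chars:');
  for C in Fixed do
    Write(' U+', IntToHex(Ord(C), 4));
  Writeln;
end.
```

Reading the file with the right encoding in the first place avoids the need for this kind of repair; the round trip is mainly useful when only the already garbled text is available.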
Unicode weirdness

David Schwartz posted a topic in VCL

I have some PDF files that I ran through a text extractor to get simple text files (.txt). I assumed they were ASCII text, but it appears not. The files have lots of things like â€™ and â€“ and â€¦ scattered throughout.

I found a table that shows what they're supposed to be and wrote this to convert them (strs points to a memo.Lines property):

    var ln := '';
    strs.BeginUpdate;
    for var n := 0 to strs.Count-1 do
    begin
      ln := StringReplace( strs[n], 'âž¤', '>', [rfReplaceAll] ); // 'âž¤'
      ln := StringReplace( ln, 'â€™', '''', [rfReplaceAll] ); // 'â€™'
      ln := StringReplace( ln, 'â€œ', '"', [rfReplaceAll] ); // 'â€œ'
      ln := StringReplace( ln, 'â€', '"', [rfReplaceAll] ); // 'â€'
      ln := StringReplace( ln, 'â€¦', '...', [rfReplaceAll] ); // 'â€¦'
      ln := StringReplace( ln, 'â€"', '--', [rfReplaceAll] ); // 'â€"'
      ln := StringReplace( ln, 'â€“', '--', [rfReplaceAll] ); // 'â€“'
      strs[n] := ln;
    end;
    strs.EndUpdate;

This worked for a little while, until the Delphi IDE (10.4.2) unexpectedly decided to convert all of the string literals into actual Unicode characters, and then it stopped working since StringReplace didn't find any of them in the text. Ugh. I corrected it here before pasting this code, and hopefully it won't get changed here as well.

For my purposes, these characters are irrelevant. I'm replacing them with ASCII characters so they make sense if you're reading the text. But whether they're ASCII or Unicode doesn't matter.

I found a table here: https://www.i18nqa.com/debug/utf8-debug.html and it says an apostrophe can be represented in several ways:

How can I replace a 2- or 3-char literal like â € ™ with one of these above codes so the compiler doesn't change them back to Unicode representations? Is there a simpler way to do this?

Depending on what I'm using to look at the text data files, they may appear as their "real" Unicode representation, or they may appear as 2- or 3-char gibberish. I just need ASCII text that comes close to what they represent.
- April 4, 2023
- 28 replies
Unicode weirdness

Javier Tarí replied to David Schwartz's topic in VCL

I would just substitute the original characters for their equivalent codes, in the #99 or #$ab format
- April 8, 2023
- 28 replies
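A sketch of the character-code approach suggested above: writing the garbled sequences as `#$xxxx` codes keeps the source file pure ASCII, so the IDE's source encoding no longer affects the literals. The `ReplaceGarbledPunctuation` name is hypothetical, and the code points assume the text was misread as Windows-1252.

```pascal
program CharCodeLiterals;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: replaces a few garbled sequences with ASCII text.
// Four-digit codes are used so each one is taken as a UTF-16 code unit.
function ReplaceGarbledPunctuation(const Line: string): string;
begin
  Result := Line;
  // E2 80 99 (U+2019) read as Windows-1252
  Result := StringReplace(Result, #$00E2#$20AC#$2122, '''', [rfReplaceAll]);
  // E2 80 9C (U+201C) read as Windows-1252
  Result := StringReplace(Result, #$00E2#$20AC#$0153, '"', [rfReplaceAll]);
  // E2 80 A6 (U+2026) read as Windows-1252
  Result := StringReplace(Result, #$00E2#$20AC#$00A6, '...', [rfReplaceAll]);
  // E2 80 93 (U+2013) read as Windows-1252
  Result := StringReplace(Result, #$00E2#$20AC#$201C, '--', [rfReplaceAll]);
  // E2 80 94 (U+2014) read as Windows-1252
  Result := StringReplace(Result, #$00E2#$20AC#$201D, '--', [rfReplaceAll]);
end;

begin
  Writeln(ReplaceGarbledPunctuation('It'#$00E2#$20AC#$2122's')); // It's
end.
```

This keeps the original replace-based design; it only changes how the search strings are spelled in the source.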
