Leaderboard


Popular Content

Showing content with the highest reputation on 04/08/23 in Posts

  1. gidesa

    ANN: New Opencv v. 4.6 C++ API wrapper

    Hello all, I have released a new version of the wrapper library, with many improvements and with instructions for building on Linux, too. https://github.com/gidesa/ocvWrapper46
  2. David Heffernan

    Unicode weirdness

    This is fair. I looked at the first couple which are UTF-8, and then assumed they all were. But a couple of them aren't. Not implausible that the Delphi code in the OP is wrong though.
  3. David Heffernan

    Unicode weirdness

    This entire thread blows my mind. The number of people who think it's normal to read UTF8 as though it were ANSI.
  4. David Heffernan

    Unicode weirdness

    I'd just read them using the UTF8 encoding in the first place and so never ever see these characters. I'm sure you would too.
  5. Cristian Peța

    Windows App Store icon sizes - unplated?

    Strange... if I change the 44x44 logo to have more black, it shows as transparent in the taskbar. I discovered this accidentally while trying to alter the image to be sure it's the right one. A little black will not suffice; it needs to be something more than 1/3.
  6. Remy Lebeau

    Indy TCP client

    There is usually no need to ever use ReuseSocket on a client, unless you use its BoundIP and BoundPort properties to make it connect from a specific local IP/Port pair, AND you are connecting to the same remote IP/Port pair as a previous connection from the same local IP/Port. By default, a client socket connects from a random local port. ReuseSocket is more commonly used on servers instead. When a server gets shut down and restarted quickly, ReuseSocket can allow it to listen on the same local IP/Port as the previous run, in case the OS hasn't released the local IP/Port yet.

    You are using the InputBuffer the wrong way. After you Write() out your wBuffer, you are waiting in an endless loop until a reply arrives. Why are you using a loop at all? That is not necessary. Indy's reading behavior is designed to block the calling thread waiting for requested data to arrive. Most of the IOHandler's reading methods get their data from the InputBuffer only, not from the socket directly. CheckForDataOnSource() reads directly from the socket and saves whatever it receives into the InputBuffer. So, if you ask a reading method (ie, in this case, ReadBytes()) to read something (ie, in this case, 65 bytes), the method does not exit until all of the bytes for that something are available in the InputBuffer, or until the ReadTimeout elapses (which is infinite by default).

    In your case, when you do eventually get a reply, you are reading it from the InputBuffer into your rBuffer, but then you are ignoring the rBuffer that you just read into and instead you are checking the InputBuffer directly to see if it has any buffered bytes that you have NOT read yet, and only if it DOES are you flagging yourself to make the next Write() call. But if the InputBuffer is EMPTY (because you have already read everything that was in it) then you are NEVER calling Write() again, and you end up stuck in your reading loop waiting for CheckForDataOnSource() to receive new bytes from the socket which you are NEVER requesting the server to send.

    You have over-complicated your thread logic. You don't need all of that InputBuffer handling at all. Just call Write(), then ReadBytes() (letting it block), and repeat.

    That being said, there are some other issues with your code, too. You are not calling Disconnect() after Connect() is successful. You are accessing the InputBuffer in the main thread while your worker thread may also be accessing it at the same time. And you are not capturing the Exception when calling TThread.Queue(), so the Exception will be destroyed long before the main thread has a chance to access its Message.

    With all of that said, try this instead:

        inherited Create(True);
        TCPClient := TIdTCPClient.Create;
        TCPClient.Host := AHost;
        TCPClient.Port := APort;
        TCPClient.ConnectTimeout := 5000;
        TCPClient.ReadTimeout := ...; // infinite by default

        ...

        // a separate procedure is needed so that TThread.Queue() can
        // capture and extend the lifetime of the String. See the
        // documentation for more details:
        // https://docwiki.embarcadero.com/RADStudio/Sydney/en/Anonymous_Methods_in_Delphi
        //
        procedure DisplayMessageInUI(const AMsg: string);
        begin
          TThread.Queue(nil,
            procedure
            begin
              Form2.mmo1.Lines.Add(AMsg);
            end);
        end;

        ...

        SetLength(wBuffer, 6);
        //write some bytes into wBuffer

        SetLength(rBuffer, 65);

        while not Terminated do
        begin
          try
            TCPClient.Connect;
          except
            on E: Exception do
            begin
              DisplayMessageInUI('Exception: ' + E.Message);
              for i := 1 to 5 do
              begin
                if Terminated then Exit;
                Sleep(1000);
              end;
              Continue;
            end;
          end;

          try
            try
              i := 1;
              while not Terminated do
              begin
                TCPClient.IOHandler.Write(wBuffer);
                TCPClient.IOHandler.ReadBytes(rBuffer, 65, False); // waits for reply

                //
                { Alternatively, if you want to keep checking Terminated while waiting:

                while TCPClient.IOHandler.InputBufferIsEmpty do
                begin
                  TCPClient.IOHandler.CheckForDataOnSource(100);
                  TCPClient.IOHandler.CheckForDisconnect;
                  if Terminated then Exit;
                end;

                TCPClient.IOHandler.ReadBytes(rBuffer, 65, False);
                }

                DisplayMessageInUI('Reply received');
                //do some stuff with rBuffer
                Inc(i);
                Sleep(1000);
              end;
            finally
              TCPClient.Disconnect;
            end;
          except
            on E: Exception do
            begin
              DisplayMessageInUI('Exception: ' + E.Message);
              if Terminated then Exit;
              Sleep(1000);
            end;
          end;
        end;
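
The "write, then let the read block until the full reply has arrived" pattern described above does not depend on Indy. As a rough illustration only, here is a minimal sketch in Python using the standard `socket` module rather than Indy or Delphi. The helper names `recv_exact` and `exchange` are hypothetical, and the 6-byte request and 65-byte reply sizes are borrowed from the post.

```python
import socket


def recv_exact(sock, count):
    """Block until exactly `count` bytes have been received from `sock`."""
    data = bytearray()
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            # An empty read means the peer closed the connection.
            raise ConnectionError("peer closed the connection")
        data.extend(chunk)
    return bytes(data)


def exchange(sock, request, reply_size):
    """Send one request, then block until the whole reply is available."""
    sock.sendall(request)
    return recv_exact(sock, reply_size)


# Small local demonstration: a connected socket pair stands in for
# the client and the server.
client, server = socket.socketpair()
try:
    server.sendall(b"\x01" * 65)            # the "server" reply, queued up front
    reply = exchange(client, b"\x00" * 6, 65)
    print(len(reply))
finally:
    client.close()
    server.close()
```

The caller never inspects an intermediate buffer; each request is followed by one blocking read of the expected size, and the loop simply repeats that pair of calls.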
  7. Fr0sT.Brutal

    Unicode weirdness

    So you had the file interpreted as ANSI and converted to UTF-16, with all the "weird" chars just widened ($AB => $00AB). And you had your UTF-16-encoded literals defined in the same way, because the IDE thought the source file was ANSI. Then, in the new version, the option changed to UTF-8, and your literals, which together form a valid UTF-8 multi-byte character, turned into a single UTF-16 char that is not contained in the source string. That's my version.
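
A minimal sketch of that theory, written in Python rather than Delphi so it can be run quickly. It assumes "ANSI" means Windows code page 1252, and the sample text is made up. The same three source bytes give a 3-character literal when the source file is read as ANSI, but a 1-character literal when it is read as UTF-8, and only the former occurs in text that was itself misread.

```python
# Bytes that a UTF-8 source file holds for a right single quote (U+2019).
source_bytes = "\u2019".encode("utf-8")

# The same bytes as the compiler would see them under each assumption.
literal_as_ansi = source_bytes.decode("cp1252")   # three separate characters
literal_as_utf8 = source_bytes.decode("utf-8")    # one character

# Data that was UTF-8 on disk but was loaded as ANSI (assumed sample text).
misread_text = "It\u2019s".encode("utf-8").decode("cp1252")

print(len(literal_as_ansi), len(literal_as_utf8))
print(literal_as_ansi in misread_text)   # search literal matches the data
print(literal_as_utf8 in misread_text)   # search literal no longer matches

# "Widening": a single byte $AB becomes the code point $00AB.
widened = bytes([0xAB]).decode("cp1252")
print(hex(ord(widened)))
```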
  8. Lars Fosdal

    Unicode weirdness

    @David Schwartz - You wouldn't happen to have an original file uploaded as an attachment to a post here, so that we can try some conversions?
  9. Brian Evans

    Unicode weirdness

    I see this most often when ASCII is automatically cleaned up typographically for printing by doing some conversions, like dash to em dash. If this text is put back / interpreted as ASCII bytes, the various UTF8 encodings of the typographical replacements end up as multiple characters each, like a minus/dash that was converted to an em dash then ending up as â€”. You need to find where the problem is: the data in the PDF itself could already be corrupted this way, or it can happen at some other stage, including in the PDF -> Text conversion or in how you load the text. Often, even if you do interpret the encodings correctly so there is a — (em dash) instead of â€”, the equivalent replacements might be worthwhile to convert the text back to plain ASCII.
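
As a hedged illustration of that last point, the sketch below (Python, not Delphi) maps typographic characters back to plain ASCII once the text has been decoded correctly. The mapping follows the replacements used in the original question; the name `to_ascii` and the sample sentence are assumptions made for this example.

```python
# Typographic characters and the ASCII stand-ins used in the original question.
ASCII_EQUIVALENTS = {
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "--",   # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u27a4": ">",    # arrowhead bullet
}

_TABLE = str.maketrans(ASCII_EQUIVALENTS)


def to_ascii(text):
    """Replace common typographic characters with ASCII approximations."""
    return text.translate(_TABLE)


print(to_ascii("\u201cWait\u201d\u2026 it\u2019s \u2013 fine"))
```

Because each key is a real Unicode character rather than a multi-character mojibake sequence, the table keeps working regardless of how an editor displays or re-encodes the source file.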
  10. A.M. Hoornweg

    Unicode weirdness

    Just open the extracted *.txt files in Notepad++ and try out the different encoding options that this program offers until the files display correctly. Then save them as "unicode with bom". TStringList.LoadFromFile will then load the files correctly even if they contain foreign characters.
  11. Lars Fosdal

    Unicode weirdness

    Wouldn't converting the chars to Unicode solve that problem? All strings in modern Delphi components are using Unicode. I don't understand why you don't want to handle the text as what it is. Once you have the text as Unicode, you also get all the nice TCharHelper functions to understand what kind of character you are looking at, in case you want to do more manipulations. A lot better and more robust than string replacements.
  12. David Heffernan

    Unicode weirdness

    Isn't the real problem that you have interpreted UTF-8 encoded data as though it were ANSI? I mean, it's clearly not ASCII because none of the characters in your code are in the ASCII set. You can actually delete all of these StringReplace calls by simply using the correct encoding for your extracted data.
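
To make the point concrete, here is a small Python sketch (not Delphi) of the same bytes decoded both ways. It assumes the extractor wrote UTF-8 and that "ANSI" means code page 1252; the sample sentence is invented.

```python
# What the extractor (presumably) wrote to disk: UTF-8 bytes.
raw = "It’s here – finally…".encode("utf-8")

misread = raw.decode("cp1252")   # UTF-8 bytes treated as ANSI: mojibake
correct = raw.decode("utf-8")    # decoded with the encoding actually used

print(misread)
print(correct)

# Text that was already misread can often be recovered by reversing the
# wrong step, provided every character survives the round trip.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == correct)
```

With the right encoding chosen at load time, the multi-character sequences never appear, so there is nothing left to replace.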
  13. Stefan Glienke

    Unicode weirdness

    That would be quite nonsense given that strs is TStrings as David wrote ("strs points to a memo.Lines property"). Then don't use a Memo and its Lines property I would say - they are Unicode.
  14. timfrost

    Unicode weirdness

    Can you not find a better 'text extractor' which produces more useful output?
  15. Lars Fosdal

    Unicode weirdness

    @David Schwartz - This looks like MBCS encoding - the old ANSI multibyte character set encoding scheme in Windows. The ANSI routines should be capable of converting the strings to Unicode, but they depend on knowing the appropriate code page. https://docwiki.embarcadero.com/RADStudio/Alexandria/en/Commonly_Used_Routines_for_AnsiStrings
  16. David Schwartz

    Unicode weirdness

    I have some PDF files that I ran through a text extractor to get simple text files (.txt). I assumed they were ASCII text, but it appears not. The files have lots of things like ’ and – and … scattered throughout. I found a table that shows what they're supposed to be and wrote this to convert them (strs points to a memo.Lines property):

        var ln := '';
        strs.BeginUpdate;
        for var n := 0 to strs.Count-1 do
        begin
          ln := StringReplace( strs[n], '➤', '>', [rfReplaceAll] ); // '➤'
          ln := StringReplace( ln, '’', '''', [rfReplaceAll] ); // '’'
          ln := StringReplace( ln, '“', '"', [rfReplaceAll] ); // '“'
          ln := StringReplace( ln, 'â€', '"', [rfReplaceAll] ); // 'â€'
          ln := StringReplace( ln, '…', '...', [rfReplaceAll] ); // '…'
          ln := StringReplace( ln, 'â€"', '--', [rfReplaceAll] ); // 'â€"'
          ln := StringReplace( ln, '–', '--', [rfReplaceAll] ); // '–'
          strs[n] := ln;
        end;
        strs.EndUpdate;

    This worked for a little while, until the Delphi IDE (10.4.2) unexpectedly decided to convert all of the string literals into actual Unicode characters, and then it stopped working since StringReplace didn't find any of them in the text. Ugh. I corrected it here before pasting this code, and hopefully it won't get changed here as well.

    For my purposes, these characters are irrelevant. I'm replacing them with ASCII characters so they make sense if you're reading the text. But whether they're ASCII or Unicode doesn't matter.

    I found a table here: https://www.i18nqa.com/debug/utf8-debug.html and it says an apostrophe can be represented in several ways:

    How can I replace a 2- or 3-char literal like â € ™ with one of these above codes so the compiler doesn't change them back to Unicode representations?

    Is there a simpler way to do this? Depending on what I'm using to look at the text data files, they may appear as their "real" Unicode representation, or they may appear as 2- or 3-char gibberish. I just need ASCII text that comes close to what they represent.
  17. Javier Tarí

    Unicode weirdness

    I would just substitute the original characters for their equivalent codes, in the #99 or #$ab format
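
A sketch of the same idea in Python (not Delphi): writing the search string as numeric character codes keeps the source file pure ASCII, so no editor or IDE can reinterpret the literal. The name `fix_apostrophes` is hypothetical, and the code points assume the text was misread as code page 1252. In Delphi the analogous literal would presumably be built from character codes such as `#$00E2#$20AC#$2122`.

```python
# The three characters a UTF-8 apostrophe (U+2019) turns into when its bytes
# are misread as cp1252, written purely as escape codes.
MOJIBAKE_APOSTROPHE = "\u00e2\u20ac\u2122"


def fix_apostrophes(line):
    """Replace the misread three-character sequence with an ASCII apostrophe."""
    return line.replace(MOJIBAKE_APOSTROPHE, "'")


print(fix_apostrophes("It\u00e2\u20ac\u2122s"))
```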