JeanCremers 1 Posted December 21, 2024 (edited)

I have this code, but it does not catch the filename 'Duo Canopée play Ständchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm'. Why not?

    #define DIACRITIC_COUNT 10

    static const WideChar diacritics[DIACRITIC_COUNT] = {
        L'á', L'à', L'â', L'ä', L'ã', L'å', L'é', L'è', L'ê', L'ë',
    };
    static const WideChar replacements[DIACRITIC_COUNT] = {
        L'a', L'a', L'a', L'a', L'a', L'a', L'e', L'e', L'e', L'e',
    };

    if (FindFirst(dir + L"\\*.*", faAnyFile, sr) == 0)
    {
        do
        {
            if (!(sr.Attr & faDirectory))
            {
                String newName = sr.Name;
                bool changed = false;
                // String indexes are 1-based
                for (int i = 1; i <= newName.Length(); i++)
                {
                    WideChar ch = newName[i];
                    for (int j = 0; j < DIACRITIC_COUNT; j++)
                    {
                        if (ch == diacritics[j])
                        {
                            newName[i] = replacements[j];
                            changed = true;
                        }
                    }
                }
                if (changed)
                {
                    TListItem* item = ListView1->Items->Add();
                    item->Caption = sr.Name;
                    item->SubItems->Add(newName);
                }
            }
        } while (FindNext(sr) == 0);
        FindClose(sr);
    }
JeanCremers 1 Posted December 21, 2024 (edited)

It should read newName[ i ] = replacements[j]; but the board does not let me change that.
JeanCremers 1 Posted December 21, 2024

Damn, it should be newName[ i ].
JeanCremers 1 Posted December 21, 2024 (edited)

The debugger displays the name like this:

    Debug Output: Duo Canope´e play Sta¨ndchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm

How do I catch these so I can rename them?

Edit: I had to use the Windows FindFirstFileW to get it right.

PS: I have used C++Builder 12 in the past; this one is real shitty. FindFirst/FindNext not working properly. Properties not surviving OnCreate(). I had a crash that trashed my source completely. I'm not going to port my main app to this version.
Remy Lebeau 1459 Posted December 21, 2024

2 hours ago, JeanCremers said: "Edit, had to use the windows FindFirstFileW to get it right."

The RTL's Find(First|Next)() functions use the Win32 Find(First|Next)FileW() APIs internally, and have done so since 2009.

2 hours ago, JeanCremers said: "Properties not surviving oncreate()"

That might be related to the removal of the Form's OldCreateOrder property in RAD Studio 11. But you should never have been using the OnCreate (and OnDestroy) events in C++ anyway, as that has always had the potential to introduce Undefined Behavior in user C++ code due to the different creation models of Delphi and C++. Use the Form's constructor (and destructor) instead; that is always safe. And streamed property values are available in the constructor.
JeanCremers 1 Posted December 21, 2024 (edited)

3 minutes ago, Remy Lebeau said: "The RTL's Find(First|Next)() functions use the Win32 Find(First|Next)FileW() APIs internally, and have done so since 2009."

Hi Remy, the files I mention really did not show up using the RTL ones. Of course I would prefer those. And using the debugger I got strange names like Canope´e play Sta¨ndchen.
Remy Lebeau 1459 Posted December 21, 2024 (edited)

Feel free to step into the RTL source code for yourself with the debugger (see the code below). On Windows, the RTL's Find(First|Next)() functions simply call the API Find(First|Next)FileW() functions and then copy the WIN32_FIND_DATA fields into the TSearchRec fields:

    function FindFirstFile; external kernelbase name 'FindFirstFileW';
    function FindNextFile; external kernelbase name 'FindNextFileW';
    ...
    function FindMatchingFile(var F: TSearchRec): Integer;
    ...
    begin
      while F.FindData.dwFileAttributes and F.ExcludeAttr <> 0 do
        if not FindNextFile(F.FindHandle, F.FindData) then
        begin
          Result := GetLastError;
          Exit;
        end;
      ...
      F.Name := F.FindData.cFileName; // <-- HERE
      Result := 0;
    end;

    function FindFirst(const Path: string; Attr: Integer; var F: TSearchRec): Integer;
    const
      faSpecial = faHidden or faSysFile or faDirectory;
    begin
      F.ExcludeAttr := not Attr and faSpecial;
      F.FindHandle := FindFirstFile(PChar(Path), F.FindData);
      if F.FindHandle <> INVALID_HANDLE_VALUE then
      begin
        Result := FindMatchingFile(F); // <-- HERE
        if Result <> 0 then FindClose(F);
      end
      else
        Result := GetLastError;
    end;

    function FindNext(var F: TSearchRec): Integer;
    begin
      if FindNextFile(F.FindHandle, F.FindData) then
        Result := FindMatchingFile(F) // <-- HERE
      else
        Result := GetLastError;
    end;

As you can see, the API's WIN32_FIND_DATA::cFileName is assigned as-is to the RTL's TSearchRec::Name field. Since both fields are based on WideChar (WIN32_FIND_DATA::cFileName is a WideChar[] array and TSearchRec::Name is a UnicodeString), there is no manipulation of the reported characters in any way; they are copied as-is. What you get back in your code SHOULD be exactly what Windows actually reported. The TSearchRec::FindData field is the raw WIN32_FIND_DATA data that Find(First|Next)File() actually reported.
JeanCremers 1 Posted December 21, 2024

You are right, it was my fault; I did not test it properly. The file is caught by the plain RTL functions. It is just that this file is never detected as having diacritics, so it never gets added to the listview.

    // Both tables below hold 28 entries
    const int DIACRITIC_COUNT = 28;

    static const WideChar diacritics[DIACRITIC_COUNT] = {
        L'á', L'à', L'â', L'ä', L'ã', L'å',
        L'é', L'è', L'ê', L'ë',
        L'í', L'ì', L'î', L'ï',
        L'ó', L'ò', L'ô', L'ö', L'õ',
        L'ú', L'ù', L'û', L'ü',
        L'ý', L'ÿ', L'ñ', L'ç', L'Ñ'
    };
    static const WideChar replacements[DIACRITIC_COUNT] = {
        L'a', L'a', L'a', L'a', L'a', L'a',
        L'e', L'e', L'e', L'e',
        L'i', L'i', L'i', L'i',
        L'o', L'o', L'o', L'o', L'o',
        L'u', L'u', L'u', L'u',
        L'y', L'y', L'n', L'c', L'N'
    };

    if (FindFirst(dir + "\\*.*", faAnyFile, R) == 0)
        do
            if (!(R.Attr & faDirectory))
            {
                String newName = R.Name;
                bool changed = false;
                // String indexes are 1-based
                for (int i = 1; i <= newName.Length(); i++)
                    for (int j = 0; j < DIACRITIC_COUNT; j++)
                    {
                        if (newName[i] == diacritics[j])
                        {
                            newName[i] = replacements[j];
                            changed = true;
                        }
                    }
                if (changed)
                {
                    TListItem* item = ListView1->Items->Add();
                    item->Caption = R.Name;
                    item->SubItems->Add(newName);
                }
            }
        while (FindNext(R) == 0);
    FindClose(R);
JeanCremers 1 Posted December 21, 2024 (edited)

The forum does not display the code correctly; I have to use spaces in the brackets, like newName[ i ].

Forgot to say: other files with diacritics are caught, just this particular one is not:

    Duo Canopée play Ständchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm

If I put this in the FindFirst/FindNext loop, R.Name is still the original but ch becomes a plain 'e':

    if (R.Name.Pos("Duo Canop"))
    {
        WideChar ch = R.Name[10];
    }
Remy Lebeau 1459 Posted December 21, 2024 (edited)

1 hour ago, JeanCremers said: "You are right, it was a my fault, i did not test it properly, the file is catched with plain RTL functions, only that file is never having detected to have diacritics, never gets added to the listview."

One issue I see is that you are not taking into account either UTF-16 surrogates or Unicode combining codepoints. Not all Unicode characters take up 1 WideChar; sometimes they require 2+ WideChars, especially if they are not in a normalized form. For example, the character 'á' may be 1 WideChar 0x00E1 (Latin Small Letter A with Acute), or it may be 2 WideChars 0x0061 (Latin Small Letter A) and 0x0301 (Combining Acute Accent) working together.

What are the actual numeric values of the WideChars that you are receiving for the filename you are having trouble with?

1 hour ago, JeanCremers said: "the forum does not display the code correctly, i have to use spaces in the bracket like newName[ i ]"

That is because you are posting the code as plain text. Put it inside of a code block instead (the '</>' button on the editor toolbar). For example:

    void sayHi()
    {
        cout << "This is in a code block!";
    }
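One way to answer the question about the actual numeric values is to print every UTF-16 code unit of the name rather than inspecting a single index. The following is a minimal, portable sketch under some assumptions: it uses `std::u16string` as a stand-in for the VCL `String`, and `dumpCodeUnits` is a hypothetical helper name, not an RTL function.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the UTF-16 code units of s as
// space-separated, 4-digit uppercase hex values.
std::string dumpCodeUnits(const std::u16string& s) {
    std::string out;
    char buf[8];
    for (char16_t cu : s) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(cu));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

With this helper, a decomposed 'é' (`u"e\u0301"`) yields `0065 0301`, while the precomposed form (`u"\u00E9"`) yields `00E9`. In C++Builder the same idea could presumably be applied by looping over `R.Name` with a 1-based index and sending each value to `OutputDebugStringW()`.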
JeanCremers 1 Posted December 21, 2024 (edited)

But how can R.Name be 'Duo Canopée play Ständchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm' and WideChar W = R.Name[10] be a plain e?

Yes, I was using [ blocks ], thanks.
Remy Lebeau 1459 Posted December 21, 2024 (edited)

34 minutes ago, JeanCremers said: "But how can R.Name be 'Duo Canopée play Ständchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm' and widechar W = R.Name[10] a plain e?"

Just like in the example I gave you in my last reply, in this case I'm guessing that R.Name[10] is 0x0065 (Latin Small Letter E) and R.Name[11] is 0x0301 (Combining Acute Accent), whereas you are expecting R.Name[10] to be 0x00E9 (Latin Small Letter E with Acute) instead.

To accomplish what you want, you should normalize the Unicode characters, probably to form NFC, before you process and replace them. Read the following for more details:

    Unicode Standard Annex #15: Unicode Normalization Forms
    Using Unicode Normalization to Represent Strings
    NormalizeString function

JeanCremers said: "Yes i was using [ blocks ], thanks."

I realize that. But that is reserved for markup language in plain text. You need to actually click on the '</>' button in the toolbar and put your code in the resulting popup dialog.
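The suggested normalization step could look roughly like the sketch below, which wraps the Win32 `NormalizeString()` function with `NormalizationC`. Several things here are assumptions rather than anything from the thread: `toNFC` is a hypothetical helper name, `std::wstring` stands in for the VCL `String`, the project is assumed to link against Normaliz.lib, and on non-Windows builds the sketch simply returns its input because it has no normalizer there.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>   // NormalizeString, NormalizationC (link with Normaliz.lib)
#endif

// Hypothetical helper: returns src composed to NFC on Windows.
// On failure, or on non-Windows builds (where this sketch has no
// normalizer), the input is returned unchanged.
std::wstring toNFC(const std::wstring& src) {
#ifdef _WIN32
    if (src.empty()) return src;
    const int srcLen = static_cast<int>(src.size());

    // A first call with a null buffer asks for an estimated output size.
    int estimate = NormalizeString(NormalizationC, src.c_str(), srcLen, nullptr, 0);
    if (estimate <= 0) return src;

    std::wstring dst(static_cast<std::wstring::size_type>(estimate), L'\0');
    int written = NormalizeString(NormalizationC, src.c_str(), srcLen, &dst[0], estimate);
    if (written <= 0) return src;

    // Trim to the number of code units actually written.
    dst.resize(static_cast<std::wstring::size_type>(written));
    return dst;
#else
    return src;
#endif
}
```

After composing to NFC, a decomposed "e" + U+0301 pair should become the single code unit U+00E9, so a single-WideChar lookup table like the one posted earlier would then have a chance to match it.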
David Heffernan 2357 Posted December 21, 2024

Recurring theme here is that you think that everything else is at fault when you can't achieve things that others can. Perhaps you need the curiosity to ask why this is.
Remy Lebeau 1459 Posted December 22, 2024

@David Heffernan who are you directing that to?
JeanCremers 1 Posted December 22, 2024

I didn't see your question about the numeric values. The é = 101 and the ä = 116. The corresponding chars in my diacritics table are 233 and 228 though. I tried changing them with:

    if (newName[i] == 101 || newName[i] == 116)
    {
        newName[i] = (newName[i] == 101 ? 'e' : 'a');
        OutputDebugStringW(newName.c_str());
        changed = true;
    }

But I get:

    Debug Output: Duo Canope´e play Sta¨ndchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm Process diacritics.exe (2792)
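A note on those values: 101 is the code for a plain 'e' and 116 is the code for 't' ('a' is 97), which fits the earlier guess that the name is stored in decomposed form, with each accent as a separate combining code unit that also shifts the following indexes by one. If that is the case, overwriting the base letter changes nothing visible, because the combining mark after it is still in the string. A minimal sketch of an alternative is shown below: it drops code units in the Combining Diacritical Marks block (U+0300 to U+036F) and maps a few precomposed letters to their base letter. The names `isCombiningMark`, `baseLetter` and `stripDiacritics` are hypothetical, `std::u16string` stands in for the VCL `String`, and the lookup covers only a handful of letters for illustration.

```cpp
#include <string>

// True for code units in the Combining Diacritical Marks block (U+0300..U+036F).
bool isCombiningMark(char16_t cu) {
    return cu >= 0x0300 && cu <= 0x036F;
}

// Hypothetical lookup: maps a few precomposed letters to their base letter.
// Anything not listed is returned unchanged.
char16_t baseLetter(char16_t cu) {
    switch (cu) {
        case 0x00E0: case 0x00E1: case 0x00E2:
        case 0x00E3: case 0x00E4: case 0x00E5:
            return u'a';   // à á â ã ä å
        case 0x00E8: case 0x00E9: case 0x00EA: case 0x00EB:
            return u'e';   // è é ê ë
        default:
            return cu;
    }
}

// Returns a copy of in with diacritics removed, for both precomposed
// and decomposed input.
std::u16string stripDiacritics(const std::u16string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (char16_t cu : in) {
        if (isCombiningMark(cu)) continue;  // decomposed form: drop the mark, the base letter is already kept
        out += baseLetter(cu);              // precomposed form: map to the base letter
    }
    return out;
}
```

A caller can detect whether a rename is needed by comparing the result with the original name, which replaces the `changed` flag. Because marks are skipped rather than overwritten, the result can be shorter than the input.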
David Heffernan 2357 Posted December 22, 2024

7 hours ago, Remy Lebeau said: "@David Heffernan who are you directing that to?"

The OP