JeanCremers

Filenames with Unicode chars


I have this code, but it does not catch the filename 'Duo Canopée play Ständchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm'. Why not?


#define DIACRITIC_COUNT 10
static const WideChar diacritics[DIACRITIC_COUNT] = {
    L'á', L'à', L'â', L'ä', L'ã', L'å',
    L'é', L'è', L'ê', L'ë',
};

static const WideChar replacements[DIACRITIC_COUNT] = {
    L'a', L'a', L'a', L'a', L'a', L'a',
    L'e', L'e', L'e', L'e',
};

TSearchRec sr;
if(FindFirst(dir + L"\\*.*", faAnyFile, sr) == 0)
{
    do {
        if(!(sr.Attr & faDirectory))
        {
            String newName = sr.Name;
            bool changed = false;
            // String (UnicodeString) is 1-based
            for(int i = 1; i <= newName.Length(); i++)
            {
                WideChar ch = newName[i];              // the i-th character, not the whole string
                for(int j = 0; j < DIACRITIC_COUNT; j++)
                {
                    if(ch == diacritics[j])
                    {
                        newName[i] = replacements[j];  // replace just that character
                        changed = true;
                        break;
                    }
                }
            }
            if(changed)
            {
                // show the original name and the proposed new name
                TListItem* item = ListView1->Items->Add();
                item->Caption = sr.Name;
                item->SubItems->Add(newName);
            }
        }
    } while(FindNext(sr) == 0);
    FindClose(sr);
}


Edited by JeanCremers
forgot [i]
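For reference, here is a minimal, portable sketch of the corrected inner loop, assuming the missing [i] is the only problem with it. It is written against std::u16string instead of the VCL String so that it compiles outside C++Builder; the names ReplaceDiacritics, kDiacritics and kReplacements are hypothetical. Note that std::u16string indexes from 0, whereas String indexes from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same tables as in the post, written with \u escapes so the source file
// encoding cannot change them. These are precomposed characters.
constexpr std::size_t kDiacriticCount = 10;
constexpr char16_t kDiacritics[kDiacriticCount] = {
    u'\u00E1', u'\u00E0', u'\u00E2', u'\u00E4', u'\u00E3', u'\u00E5', // á à â ä ã å
    u'\u00E9', u'\u00E8', u'\u00EA', u'\u00EB',                       // é è ê ë
};
constexpr char16_t kReplacements[kDiacriticCount] = {
    u'a', u'a', u'a', u'a', u'a', u'a',
    u'e', u'e', u'e', u'e',
};

// Replaces each precomposed diacritic in 'name' with its plain letter.
// Returns true if anything was replaced.
bool ReplaceDiacritics(std::u16string& name) {
    bool changed = false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char16_t ch = name[i];            // read the i-th code unit
        for (std::size_t j = 0; j < kDiacriticCount; ++j) {
            if (ch == kDiacritics[j]) {
                name[i] = kReplacements[j];     // write back at the same position
                changed = true;
                break;                          // at most one match per character
            }
        }
    }
    return changed;
}
```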


It should read newName[i] = replacements[j]; but the board does not let me change that.

Edited by JeanCremers


The debugger displays the name like this:

Debug Output: Duo Canope´e play Sta¨ndchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm

How do I catch these so I can rename them?

Edit: I had to use the Windows FindFirstFileW() to get it right.


PS: I have used C++Builder 12 in the past, and this one is really shitty. FindFirst/FindNext are not working properly, properties do not survive OnCreate(), and I had a crash that completely trashed my source, mwooooooh. I'm not going to port my main app to this version.

Edited by JeanCremers
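A likely explanation for the debugger output above (not confirmed in the thread): the ´ and ¨ appearing after the letter suggest the name is stored in decomposed form (Unicode NFD), i.e. 'e' followed by U+0301 COMBINING ACUTE ACCENT instead of the single precomposed 'é' (U+00E9). A table that only holds precomposed characters can never match such a name. Below is a minimal sketch of a check for this, assuming UTF-16 names and looking only at the Combining Diacritical Marks block U+0300..U+036F; HasCombiningMarks is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Returns true if 'name' contains a code unit from the Unicode
// "Combining Diacritical Marks" block (U+0300..U+036F). Such a mark is
// stored as a separate code unit after its base letter, e.g.
// "e" + U+0301 instead of the single precomposed U+00E9.
bool HasCombiningMarks(const std::u16string& name) {
    for (char16_t ch : name) {
        if (ch >= u'\u0300' && ch <= u'\u036F') {
            return true;
        }
    }
    return false;
}
```

The two forms look identical on screen but are different strings: "Canope\u0301e" is one code unit longer than "Canop\u00E9e" and does not compare equal to it.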

2 hours ago, JeanCremers said:

Edit, had to use the windows FindFirstFileW to get it right.

The RTL's Find(First|Next)() functions use the Win32 Find(First|Next)FileW() APIs internally, and have done so since 2009.

2 hours ago, JeanCremers said:

Properties not surviving oncreate()

That might be related to the removal of the Form's OldCreateOrder property in RAD Studio 11. But you should never have been using the OnCreate (and OnDestroy) events in C++ anyway, as that has always had the potential to introduce Undefined Behavior in user C++ code due to the different creation models of Delphi and C++. Use the Form's constructor (and destructor) instead; that is always safe. Streamed property values are available in the constructor.
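The ordering problem can be illustrated with a plain C++ analogy. This is not VCL code, and it only models the case where the event is fired from inside the base form's constructor: anything a base-class constructor triggers runs before the derived class's members are initialised and before the derived constructor body executes. All names below (FrameworkForm, MyForm, Tracked, g_log) are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which construction steps happen.
std::vector<std::string> g_log;

// A member type that logs when it is initialised.
struct Tracked {
    explicit Tracked(const char* what) { g_log.push_back(what); }
};

// Stands in for a framework base class that fires an "OnCreate"-style
// hook from inside its own constructor.
struct FrameworkForm {
    FrameworkForm() { g_log.push_back("base ctor: OnCreate-style hook fires"); }
    virtual ~FrameworkForm() = default;
};

// Stands in for the user's form class.
struct MyForm : FrameworkForm {
    Tracked member{"derived member initialised"};
    MyForm() { g_log.push_back("derived ctor body: safe to use members"); }
};
```

Constructing a MyForm logs the base step first, then the member, then the derived constructor body, which is why initialisation code placed in the derived constructor can rely on the members being ready.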

3 minutes ago, Remy Lebeau said:

The RTL's Find(First|Next)() functions use the Win32 Find(First|Next)FileW() APIs internally, and have done so since 2009.


Hi Remy, the files I mentioned really did not show up using the RTL functions. Of course I would prefer to use those. And in the debugger I got strange names like Canope´e play Sta¨ndchen.

Edited by JeanCremers


Feel free to step into the RTL source code yourself with the debugger (see the code below). On Windows, the RTL's Find(First|Next)() functions simply call the Win32 API Find(First|Next)FileW() functions and then copy the WIN32_FIND_DATA fields into the TSearchRec fields:

function FindFirstFile; external kernelbase name 'FindFirstFileW';
function FindNextFile; external kernelbase name 'FindNextFileW';

...

function FindMatchingFile(var F: TSearchRec): Integer;
var
  LocalFileTime: TFileTime;
begin
  while F.FindData.dwFileAttributes and F.ExcludeAttr <> 0 do
    if not FindNextFile(F.FindHandle, F.FindData) then
    begin
      Result := GetLastError;
      Exit;
    end;
  FileTimeToLocalFileTime(F.FindData.ftLastWriteTime, LocalFileTime);
  FileTimeToDosDateTime(LocalFileTime, LongRec(F.Time).Hi,
    LongRec(F.Time).Lo);
  F.Size := F.FindData.nFileSizeLow or Int64(F.FindData.nFileSizeHigh) shl 32;
  F.Attr := F.FindData.dwFileAttributes;
  F.Name := F.FindData.cFileName; // <-- HERE
  Result := 0;
end;

function FindFirst(const Path: string; Attr: Integer;
  var F: TSearchRec): Integer;
const
  faSpecial = faHidden or faSysFile or faDirectory;
begin
  F.ExcludeAttr := not Attr and faSpecial;
  F.FindHandle := FindFirstFile(PChar(Path), F.FindData);
  if F.FindHandle <> INVALID_HANDLE_VALUE then
  begin
    Result := FindMatchingFile(F); // <-- HERE
    if Result <> 0 then FindClose(F);
  end
  else
    Result := GetLastError;
end;

function FindNext(var F: TSearchRec): Integer;
begin
  if FindNextFile(F.FindHandle, F.FindData) then
    Result := FindMatchingFile(F) // <-- HERE
  else
    Result := GetLastError;
end;

As you can see, the API's WIN32_FIND_DATA::cFileName is assigned as-is to the RTL's TSearchRec::Name field. Since both fields are based on WideChar (WIN32_FIND_DATA::cFileName is a WideChar[] array and TSearchRec::Name is a UnicodeString), there is no manipulation of the reported characters in any way; they are copied as-is. What you get back in your code SHOULD be exactly what Windows actually reported.


The TSearchRec::FindData field is the raw WIN32_FIND_DATA data that Find(First|Next)File() actually reported.

Edited by Remy Lebeau
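To see exactly what Windows reported for a name (in TSearchRec::Name, or the raw FindData.cFileName), it can help to print each UTF-16 code unit as a hex value instead of relying on how the debugger renders the string. Below is a minimal portable sketch, assuming the name has already been copied into a std::u16string; DumpCodeUnits is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats every UTF-16 code unit of 'name' as "U+XXXX", separated by
// single spaces, so the exact characters of a file name become visible.
std::string DumpCodeUnits(const std::u16string& name) {
    std::string out;
    char buf[8];  // "U+XXXX" plus terminating NUL fits in 7 bytes
    for (char16_t ch : name) {
        std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(ch));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

A decomposed "é" shows up as two code units, U+0065 U+0301, where a precomposed one is a single U+00E9.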


You are right, it was my fault; I did not test it properly. The file is found with the plain RTL functions. It is just that this particular file is never detected as having diacritics, so it never gets added to the ListView.


    const int DIACRITIC_COUNT = 28;  // each table below holds 28 entries
    static const WideChar diacritics[DIACRITIC_COUNT] = {
        L'á', L'à', L'â', L'ä', L'ã', L'å',
        L'é', L'è', L'ê', L'ë',
        L'í', L'ì', L'î', L'ï',
        L'ó', L'ò', L'ô', L'ö', L'õ',
        L'ú', L'ù', L'û', L'ü',
        L'ý', L'ÿ',
        L'ñ',
        L'ç',
        L'Ñ'
    };
    static const WideChar replacements[DIACRITIC_COUNT] = {
        L'a', L'a', L'a', L'a', L'a', L'a',
        L'e', L'e', L'e', L'e',
        L'i', L'i', L'i', L'i',
        L'o', L'o', L'o', L'o', L'o',
        L'u', L'u', L'u', L'u',
        L'y', L'y',
        L'n',
        L'c',
        L'N'
    };

TSearchRec R;
if (FindFirst(dir + "\\*.*", faAnyFile, R) == 0)
  {
  do
    {
    if(!(R.Attr & faDirectory))
      {
      String newName = R.Name;
      bool changed = false;
      for(int i = 1; i <= newName.Length(); i++)    // String is 1-based
        for (int j = 0; j < DIACRITIC_COUNT; j++)
          {
          if(newName[i] == diacritics[j])
            {
            newName[i] = replacements[j];
            changed = true;
            break;
            }
          }
      if(changed)
        {
        TListItem* item = ListView1->Items->Add();
        item->Caption = R.Name;
        item->SubItems->Add(newName);
        }
      }
    } while (FindNext(R) == 0);
  FindClose(R);  // only close a search that was actually opened
  }


The forum does not display the code correctly; I have to put spaces inside the brackets, like newName[ i ].

Forgot to say: other files with diacritics are caught, just this particular one is not: Duo Canopée play Ständchen by Franz Schubert on a 1968 D. Friederich Guitar & Violoncello.webm

Edited by JeanCremers
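If this file's name is indeed in decomposed form (as the debugger output earlier in the thread suggests), one way to catch both forms is to keep the table lookup for precomposed characters and, in addition, drop any combining mark (U+0300..U+036F) so only the base letter remains. Below is a minimal sketch under those assumptions, using std::u16string and only part of the table from the post; FoldDiacritics, Fold and kFolds are hypothetical names.

```cpp
#include <cassert>
#include <string>

// One table entry: precomposed character -> plain replacement.
struct Fold { char16_t from; char16_t to; };

// A subset of the post's table, written with \u escapes.
constexpr Fold kFolds[] = {
    {u'\u00E1', u'a'}, {u'\u00E0', u'a'}, {u'\u00E2', u'a'}, {u'\u00E4', u'a'},
    {u'\u00E9', u'e'}, {u'\u00E8', u'e'}, {u'\u00EA', u'e'}, {u'\u00EB', u'e'},
    {u'\u00F1', u'n'}, {u'\u00E7', u'c'}, {u'\u00D1', u'N'},
};

// Handles both Unicode forms:
//  - precomposed (NFC): U+00E9 is looked up in the table and replaced;
//  - decomposed (NFD): "e" + U+0301 keeps the base letter and drops the mark.
// Sets 'changed' to true if anything was replaced or dropped.
std::u16string FoldDiacritics(const std::u16string& name, bool& changed) {
    std::u16string out;
    out.reserve(name.size());
    changed = false;
    for (char16_t ch : name) {
        if (ch >= u'\u0300' && ch <= u'\u036F') {  // combining mark: skip it
            changed = true;
            continue;
        }
        for (const Fold& f : kFolds) {
            if (ch == f.from) { ch = f.to; changed = true; break; }
        }
        out += ch;
    }
    return out;
}
```

Dropping every combining mark also strips accents that are not in the table, which may or may not be wanted. On Windows, an alternative would be to convert the name to the precomposed form (NFC) first with the Win32 NormalizeString() function and keep the original table unchanged; that route is not shown here.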

