KodeZwerg

fast file searching, what do you recommend, please?


Hi there,

I know a few ways to work with files but have never really compared them.

 

The classical way with FindFirst()/FindNext(): easy to handle, but slow when searching with a '*.*' mask 🙂

PIDLs, if you know how to deal with them: in my opinion very fast, but a bit harder to work with.

And IOUtils offers TPath, where I have less experience.

 

Which way should a modern application go if Windows is the target?

And if speed is a factor, what do you think might be fastest?

 

As prototype code, without any real implementation:

// Function that retrieves a list of file names matching "FileMask".
// CollectData is a placeholder for the actual search; the caller owns (and frees) the returned list.
function FindFiles( const StartPath: String = ''; const FileMask: String = '*.*'; const SubFolder: Boolean = False ): TStrings;
begin
  // Hand the list built by CollectData straight to the caller. Creating a
  // TStrings here and freeing it in a finally block would leak one list and
  // return a dangling reference to the other.
  Result := CollectData( StartPath, FileMask, SubFolder );
end;

 

Edited by KodeZwerg

35 minutes ago, KodeZwerg said:

And IOUtils offers TPath, where I have less experience.

I guess you are referring to TDirectory.GetFiles here, as TPath has no such functionality. Well, that is merely a wrapper for FindFirst/FindNext, so it is not actually a new approach: it buys convenience rather than speed, and as such is not worth investigating for performance.
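
For comparison, here is a minimal sketch of the plain FindFirst/FindNext loop that such a wrapper would sit on top of. CollectFiles is a hypothetical helper name, 'C:\Temp' is only a placeholder path, and MatchesMask from System.Masks is used for the name filter, so treat it as an illustration under those assumptions rather than the RTL's actual implementation.

```delphi
program FindFirstSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Masks;

// Hypothetical helper: appends every file below StartPath whose name matches
// FileMask to Files, descending into subfolders when Recursive is True.
procedure CollectFiles(const StartPath, FileMask: string; Recursive: Boolean;
  Files: TStrings);
var
  SR: TSearchRec;
  Dir: string;
begin
  Dir := IncludeTrailingPathDelimiter(StartPath);
  // Enumerate every entry once and filter by name ourselves, so that
  // subfolders are visited even when their names do not match FileMask.
  if FindFirst(Dir + '*', faAnyFile, SR) = 0 then
  try
    repeat
      if (SR.Attr and faDirectory) <> 0 then
      begin
        // Skip the "." and ".." pseudo entries to avoid endless recursion.
        if Recursive and (SR.Name <> '.') and (SR.Name <> '..') then
          CollectFiles(Dir + SR.Name, FileMask, True, Files);
      end
      else if MatchesMask(SR.Name, FileMask) then
        Files.Add(Dir + SR.Name);
    until FindNext(SR) <> 0;
  finally
    FindClose(SR); // always release the search handle
  end;
end;

var
  Found: TStringList;
begin
  Found := TStringList.Create;
  try
    CollectFiles('C:\Temp', '*.dll', True, Found);
    Writeln(Found.Count, ' file(s) found');
  finally
    Found.Free;
  end;
end.
```

Note that MatchesMask treats '*.*' literally (the name must contain a dot), so '*' is the safer "match everything" mask in this sketch.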


Sorry if I interpreted wrong.

// Requires System.Classes, System.Types and System.IOUtils in the uses clause.
function FindFiles( const initPath, FileMask: String; const DoRecursive, IncludeDirectories, IncludeFiles: Boolean ): TStringList;
var
  LList: TStringDynArray;
  I: Integer;
  LSearchOption: TSearchOption;
begin
  Result := TStringList.Create();

  { Select the search option }
  if ( DoRecursive = True ) then
    LSearchOption := TSearchOption.soAllDirectories
  else
    LSearchOption := TSearchOption.soTopDirectoryOnly;

  try
    { For all entries use GetFileSystemEntries method }
    if ( IncludeDirectories = True ) and ( IncludeFiles = True ) then
      LList := TDirectory.GetFileSystemEntries(initPath, LSearchOption, nil);

    { For directories use GetDirectories method }
    if ( IncludeDirectories = True ) and not ( IncludeFiles = True ) then
      LList := TDirectory.GetDirectories(initPath, FileMask, LSearchOption);

    { For files use GetFiles method }
    if not ( IncludeDirectories = True ) and ( IncludeFiles = True ) then
      LList := TDirectory.GetFiles(initPath, FileMask, LSearchOption);
  except
    { Catch the possible exceptions }
//    MessageDlg('Incorrect path or search mask', mtError, [mbOK], 0);
    Exit;
  end;

  if Length( LList ) > 0 then
    begin
      for I := 0 to Length(LList) - 1 do
        Result.Add(LList[I]);
    end;
end;

Thank you for the hint about the speed of those two (FindFirst() and IOUtils).

I will stick with that Embarcadero sample I found.

If someone knows a faster way to do what this function does, please share the knowledge 🙂

Guest

So, since Uwe made clear what I would have guessed (that IOUtils is mostly "sugar"), your next move is googling around the Windows API world.

I would guess that if you want to deliver faster file access than your competition you would need to write drivers, but I do not have the time to google, so this is only a guess.

Good luck!


To make a general observation first: There's a reason why tools like "Everything" exist, why TortoiseSVN (and TortoiseGit and TortoiseBazaar etc.) has a cache built in, and why Windows itself creates search indexes to speed up searching in Explorer. It all boils down to the same thing: searching the filesystem via the existing APIs isn't particularly fast.

 

The broad solution in all of the above cases is conceptually the same: Create an index or cache of the stuff you're trying to search.  And so the answer to your question is then IMHO essentially the same: You have to create some sort of cache to speed the type of search you're doing and think about how that cache is to be maintained.

 

In one of the solutions we've developed we basically wrote something basic but similar to "Everything" that hooks into the Windows filesystem change notifications to keep a cache updated, which is then used to find file locations vastly faster than trying to trundle over gigantic folder trees. Obviously this also means you get to own a new set of additional problems/responsibilities, e.g. dealing with cache integrity/consistency/reliability etc., but it's possible to deal with these.
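
To make the cache idea concrete, here is a rough sketch of an in-memory index that is built with one slow walk and then answers name lookups from memory. TFileIndex, its methods, and the 'C:\Temp' and 'example.dll' values are all hypothetical names chosen for illustration. Keeping the index current (the change-notification part described above) is deliberately left out, so this shows only the lookup side of such a design.

```delphi
program FileIndexSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils, System.Generics.Collections;

type
  // Hypothetical cache: lower-cased file name -> all full paths with that name.
  TFileIndex = class
  private
    FMap: TObjectDictionary<string, TList<string>>;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Build(const Root: string);
    function Find(const FileName: string): TArray<string>;
  end;

constructor TFileIndex.Create;
begin
  inherited Create;
  // The dictionary owns the path lists and frees them with the entries.
  FMap := TObjectDictionary<string, TList<string>>.Create([doOwnsValues]);
end;

destructor TFileIndex.Destroy;
begin
  FMap.Free;
  inherited;
end;

procedure TFileIndex.Build(const Root: string);
var
  FullPath, Key: string;
  Paths: TList<string>;
begin
  FMap.Clear;
  // One slow walk over the tree; later lookups are answered from memory.
  for FullPath in TDirectory.GetFiles(Root, '*', TSearchOption.soAllDirectories) do
  begin
    Key := AnsiLowerCase(TPath.GetFileName(FullPath));
    if not FMap.TryGetValue(Key, Paths) then
    begin
      Paths := TList<string>.Create;
      FMap.Add(Key, Paths);
    end;
    Paths.Add(FullPath);
  end;
end;

function TFileIndex.Find(const FileName: string): TArray<string>;
var
  Paths: TList<string>;
begin
  if FMap.TryGetValue(AnsiLowerCase(FileName), Paths) then
    Result := Paths.ToArray
  else
    Result := nil; // unknown name: empty array
end;

var
  Index: TFileIndex;
  Hit: string;
begin
  Index := TFileIndex.Create;
  try
    Index.Build('C:\Temp');
    for Hit in Index.Find('example.dll') do
      Writeln(Hit);
  finally
    Index.Free;
  end;
end.
```

The trade-off is the one the post names: lookups become dictionary hits, but the index is only as good as whatever keeps it in sync with the disk.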

Edited by ByteJuggler
4 hours ago, ByteJuggler said:

To make a general observation first: There's a reason why tools like "Everything" exist, ........

Everything is the best tool ever. I use it multiple times a day and I find every file I need. In one second.

Edited by bernau
10 minutes ago, Lars Fosdal said:

I used it for a while, but it caused my PC to die unexpectedly from time to time.

Give UltraSearch a try; I use it regularly and it is really good.


I mostly search in specific trees, with a specific file extension, and with a text or regular expression, using TextPad 8. Fast enough with SSDs.

Guest

Visual Code's Ctrl+F is very powerful and it visualizes the hits in a good way, especially if you mostly search from a "root" and "up".

I use it for everything that I dare not open with the IDE (like units containing the DevExpress skinning control...), sifting through dfms and such.

50 minutes ago, Lars Fosdal said:

I used it for a while, but it caused my PC to die unexpectedly from time to time.

Never happened to me. How do you know that "Everything" is the reason your PC died?


Well, I can't be 100% certain - but the problem began when I used it, and never happened again after I stopped using it - and I did that for two different periods, on two different computers.


Writing an indexer to speed up searching is the best idea, I second that for sure.

My thought was that, if I am not wrong, PIDLs are cached Windows stuff. So if you do not index but start a fresh search, I hoped to get faster results than with the IOUtils approach above.

I should look deeper into that virtual shell tree example from Borland. The PIDL usage isn't explained much there, but I hope I can get it working like the function above.

Indexing and monitoring every write access isn't my goal. For that purpose there are far better tools than anything I need in my application.

Since I do not use a "*.*" file mask, it is quite fast, or at least the fastest way for me at the moment.

Thank you for reading and for your opinions.

10 hours ago, Lars Fosdal said:

I used it for a while, but it caused my PC to die unexpectedly from time to time.

Determinism is overrated. 😉 


Is there a way to tweak the above example so that it accepts something like "*.dll;*.exe" as the file mask?

Or is running it twice the only option?

 

Thank you for reading.


TDirectory.GetFiles/GetDirectories/GetFileSystemEntries have overloads taking a TFilterPredicate. You can provide your own accept function with that. The following example lists all dll and exe files from the given folder in one go.

 

files := TDirectory.GetFiles('C:\Temp\',
    function(const Path: string; const SearchRec: TSearchRec): Boolean
    begin
      Result := TPath.MatchesPattern(SearchRec.Name, '*.exe') or
                TPath.MatchesPattern(SearchRec.Name, '*.dll');
    end);

 


From a logic point of view, to check whether I understood correctly:

use two kinds of lists internally, one holding only folder names and one holding only file names

initialize both lists with the current data on creation

loop over the folder list and add entries to the file list using the TFilterPredicate method you mentioned (thank you!)

when all is done, return the file list as the result

 


If you want only files use GetFiles. If you want only folders use GetDirectories. If you want both use GetFileSystemEntries. You don't need multiple calls - at least I can't imagine a use case for it at the moment. If you want all DLL and EXE files use the code I gave as an example.

 

The Embarcadero example above selects the correct method to call depending on the given parameters - admittedly in a somewhat weird way. It never calls more than one GetXXX method, because the three if-clauses are mutually exclusive.

 

BTW, the loop to add the LList to the TStringlist can be replaced by a simple AddStrings call in recent Delphi versions.

Also returning a TStringList instance as a function result is not a first class approach anyway. That's why all the IOUtils functions return string arrays.
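
As a small illustration of the AddStrings remark, the copy loop from the sample could shrink to a single call. This is a sketch that assumes a Delphi version in which TStringDynArray is an alias of TArray<string> and TStrings.AddStrings has an array overload; 'C:\Temp' is only a placeholder path.

```delphi
program AddStringsSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Types, System.IOUtils;

var
  Files: TStringDynArray;
  List: TStringList;
begin
  Files := TDirectory.GetFiles('C:\Temp', '*.dll', TSearchOption.soTopDirectoryOnly);
  List := TStringList.Create;
  try
    // Replaces the "for I := 0 to Length(LList) - 1 do Result.Add(LList[I])" loop.
    // Assumption: the array overload of AddStrings is available in this version.
    List.AddStrings(Files);
    Writeln(List.Count, ' file(s)');
  finally
    List.Free;
  end;
end.
```

If the caller only iterates over the result, working with the array in Files directly avoids the list (and the question of who frees it) altogether.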


I will try to be as specific as I can.

From the Embarcadero sample, all I need/call is

    { For files use GetFiles method }
    if not ( IncludeDirectories = True ) and ( IncludeFiles = True ) then
      LList := TDirectory.GetFiles(initPath, FileMask, LSearchOption);

So I have an initPath like "C:\", I have a file mask like "*.dll", and I have the option to include subfolders.

So far it works like it should.

Now I wanted to integrate a second mask. When I replace the call with your method, subfolders are no longer included; only the root folder is searched.

That is why I posted my logical construct.


Look at the different overloads of GetFiles and use the one matching your needs. If you need TSearchOption.soAllDirectories, pass it to the appropriate overload.

 

files := TDirectory.GetFiles('C:\Temp\', TSearchOption.soAllDirectories, 
    function(const Path: string; const SearchRec: TSearchRec): Boolean
    begin
      Result := TPath.MatchesPattern(SearchRec.Name, '*.exe') or
                TPath.MatchesPattern(SearchRec.Name, '*.dll');
    end);

 


Great, my own attempts were not that successful. With your help I was able to change the function so that it splits my file mask into several checks like you showed. My hero, thank you.


I also followed your advice regarding TStringList; the internal and external types are now TStringDynArray.

Still a little messy, but it works like I needed.
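
The final function was not posted, so the following is only a guess at its shape: a sketch that splits a mask such as "*.dll;*.exe" at the semicolons and checks each part inside one filter predicate, returning a TStringDynArray. The parameter names and the 'C:\Temp' path are assumptions, and MatchesMask from System.Masks is swapped in for the name comparison instead of TPath.MatchesPattern.

```delphi
program MultiMaskSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Types, System.IOUtils, System.Masks;

// Hypothetical variant of FindFiles: FileMasks holds one or more patterns
// separated by ';', for example '*.dll;*.exe'.
function FindFiles(const StartPath, FileMasks: string;
  SubFolders: Boolean): TStringDynArray;
var
  Masks: TArray<string>;
  Option: TSearchOption;
begin
  Masks := FileMasks.Split([';']);
  if SubFolders then
    Option := TSearchOption.soAllDirectories
  else
    Option := TSearchOption.soTopDirectoryOnly;

  // A single directory walk; the predicate decides per entry.
  Result := TDirectory.GetFiles(StartPath, Option,
    function(const Path: string; const SearchRec: TSearchRec): Boolean
    var
      Mask: string;
    begin
      // Accept the entry as soon as any one of the patterns matches.
      Result := False;
      for Mask in Masks do
        if MatchesMask(SearchRec.Name, Mask.Trim) then
          Exit(True);
    end);
end;

var
  FileName: string;
begin
  for FileName in FindFiles('C:\Temp', '*.dll;*.exe', True) do
    Writeln(FileName);
end.
```

Because all patterns are tested in one predicate, the tree is traversed once no matter how many masks are given, which matters more for speed than the matching itself.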
