KeithLatham

Opinions solicited: Parallel processing to delete 40+Gig file structure?


OK, my project involves copying a file structure containing music onto a USB memory stick. The structure is thousands of tracks, in hundreds of albums, by many artists, in a few categories. So the file structure is along the lines of I:\Music\<Category>\<Artist>\<Album>\<track>.mp3

 

Before I start writing the output, I check to see if the target drive already has a Music directory and offer to delete it if so. Usually the target is a USB memory stick and I can just format it. But that doesn't have to be so: it could be a hard drive, and I do not want to format that. In that case I (currently) just delete the target directory, which for the large structure I am talking about can take many minutes.

 

My question is, would it be (1) SAFE and (2) WORTHWHILE to do the delete in parallel?

 

Say one thread task for each category?

Or maybe one thread task for each 10 artists (I suppose an artist could be in more than one category so this wouldn't be very safe)?

 

 

20 minutes ago, KeithLatham said:

My question is, would it be (1) SAFE and (2) WORTHWHILE to do the delete in parallel?

Given that file access is almost sequential (remember how hard drives work), I assume that the OS will serialize those parallel tasks in the end anyway. But perhaps I am just missing something.


How do you delete the folder? By deleting files one by one and then removing the parent folder? In that case, you may want to execute cmd /c rmdir /q /s target_folder from your program. Should be much faster.
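For illustration, here is a minimal Python sketch of shelling out to that command. The function names `build_rmdir_command` and `delete_tree` are made up for this example, and the fallback to `shutil.rmtree` on non-Windows systems is my own assumption, not something from the thread. Whether it really is faster than your current approach is something you would have to measure.

```python
import os
import shutil
import subprocess
import sys


def build_rmdir_command(target_folder):
    """Return the cmd.exe invocation from the post as an argument list."""
    return ["cmd", "/c", "rmdir", "/q", "/s", target_folder]


def delete_tree(target_folder):
    """Delete target_folder and everything below it in a single call."""
    if sys.platform == "win32":
        # Windows: let rmdir walk and remove the whole tree itself.
        subprocess.run(build_rmdir_command(target_folder), check=True)
    else:
        # Assumption: anywhere else, fall back to the standard library.
        shutil.rmtree(target_folder)
    # Do not rely on the exit code alone; confirm the folder is really gone.
    if os.path.exists(target_folder):
        raise OSError(f"could not fully delete {target_folder}")
```

Usage would be something like `delete_tree(r"I:\Music")`. Passing the command as a list avoids building a shell string by hand, which matters if the path contains spaces.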

 


Generally multiple threads give performance benefits for cpu bound tasks. This one isn't cpu bound. So how do you imagine there to be any benefit? 

13 hours ago, Primož Gabrijelčič said:

How do you delete the folder? By deleting files one by one and then removing the parent folder? In that case, you may want to execute cmd /c rmdir /q /s target_folder from your program. Should be much faster.

 

Using IFileOperation 'FO_DELETE' against the target folder.

4 hours ago, David Heffernan said:

Generally multiple threads give performance benefits for cpu bound tasks. This one isn't cpu bound. So how do you imagine there to be any benefit? 

Good point. That takes care of that dumb idea. Thanks.

On 1/6/2019 at 12:28 AM, Uwe Raabe said:

Given that file access is almost sequential (remember how hard drives work), I assume that the OS will serialize those parallel tasks in the end anyway. But perhaps I am just missing something.

I'm not sure that I agree with this comment. Any memory-based storage (SSDs, memory sticks, etc.) can perform multiple operations at the same time. There is no head to move, and multiple parts can be written or changed at the same time.

So yes, using threads would be very helpful. You should create a queue (pool) that receives files to delete, and another process that scans directories. This way you won't need to wait until the directory scan is done before you start removing things. You may even want to consider the low-level features Everything by Voidtools uses to get the directory and file structures faster than FindFirst.
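A rough Python sketch of that queue idea, just to make the shape concrete. All names here (`delete_tree_queued`, `_scan`, `_delete_files`, `_DONE`) are invented for the example, the worker count of 4 is arbitrary, and it assumes a plain tree with no symlinks or read-only files. It uses an ordinary `os.walk` scan, not the low-level techniques mentioned above.

```python
import os
import queue
import threading

_DONE = object()  # sentinel that tells a worker to stop


def _scan(root, work):
    """Producer: walk the tree and queue each file path as soon as it is found."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            work.put(os.path.join(dirpath, name))


def _delete_files(work, errors):
    """Consumer: delete queued files until the sentinel arrives."""
    while True:
        path = work.get()
        if path is _DONE:
            break
        try:
            os.remove(path)
        except OSError as exc:
            errors.append((path, exc))


def delete_tree_queued(root, workers=4):
    """Delete root using one scanner and several deleting threads."""
    work = queue.Queue()
    errors = []
    threads = [
        threading.Thread(target=_delete_files, args=(work, errors))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    _scan(root, work)  # deletion already runs while the scan is in progress
    for _ in threads:
        work.put(_DONE)  # one sentinel per worker
    for t in threads:
        t.join()
    # All files are gone now; remove the empty directories bottom-up.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        try:
            os.rmdir(dirpath)
        except OSError as exc:
            errors.append((dirpath, exc))
    return errors  # an empty list means everything was removed
```

The workers only ever remove files, so the scanner never has a directory pulled out from under it; the directories themselves are removed in a single pass at the end.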


SSDs are usually connected via SATA (Serial ATA) or via PCIe using the NVMe protocol. The first does not do parallel operations, while the second does. However, the speed benefit of the latter shows up when writing large amounts of data in parallel to separate areas. When deleting files, the OS is rewriting small amounts of data in a shared area that needs integrity management (i.e. shared-access locking), so I suspect there is no gain from parallelizing the deletion of files.
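Since opinions in this thread differ, the simplest way to settle it is probably to time both approaches on the actual target drive. Below is a small Python sketch of such a comparison, using the "one task per category" split from the original question. Everything in it is hypothetical: the function names, the dummy tree sizes and the worker count. A tiny test tree in a temp folder mostly exercises the OS cache, so treat any numbers from it with caution and point it at a throwaway folder on the real device instead.

```python
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def make_tree(root, categories=3, artists=4, tracks=5):
    """Create a dummy tree: root/<category>/<artist>/album/<track>.mp3."""
    for c in range(categories):
        for a in range(artists):
            album = os.path.join(root, f"cat{c}", f"artist{a}", "album")
            os.makedirs(album)
            for t in range(tracks):
                with open(os.path.join(album, f"track{t}.mp3"), "wb") as f:
                    f.write(b"\0" * 1024)


def delete_sequential(root):
    """Delete the whole tree in one thread; return elapsed seconds."""
    start = time.perf_counter()
    shutil.rmtree(root)
    return time.perf_counter() - start


def delete_per_category(root, workers=4):
    """Delete each top-level category in its own task; return elapsed seconds."""
    start = time.perf_counter()
    categories = [entry.path for entry in os.scandir(root) if entry.is_dir()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any error from a worker.
        list(pool.map(shutil.rmtree, categories))
    shutil.rmtree(root)  # whatever is left: loose files and the root itself
    return time.perf_counter() - start


if __name__ == "__main__":
    for label, delete in (("sequential", delete_sequential),
                          ("per category", delete_per_category)):
        root = tempfile.mkdtemp()
        make_tree(root)
        print(f"{label}: {delete(root):.4f} s")
```

Because the split is by top-level category folder, no two tasks ever touch the same directory, which sidesteps the concern about an artist appearing in more than one category.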
