KeithLatham 5 Posted January 5, 2019 OK, my project involves copying a file structure containing music onto a USB memory stick. The structure is thousands of tracks, in hundreds of albums, by many artists, in a few categories, so the layout is along the lines of I:\Music\<Category>\<Artist>\<Album>\<track>.mp3. Before I start writing the output, I check whether the target drive already has a Music directory and offer to delete it if so. Usually the target is a USB memory stick and I can just format it. But that doesn't have to be the case: it could be a hard drive, which I do not want to format, so in that case I (currently) just delete the target directory, which for a structure this large can take many minutes. My question is, would it be (1) SAFE and (2) WORTHWHILE to do the delete in parallel? Say, one thread/task for each category? Or maybe one thread/task for each 10 artists (I suppose an artist could be in more than one category, so this wouldn't be very safe)?
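For reference, the per-category idea can be sketched in Python (chosen only for illustration; the thread does not show the original program's code). The function name `delete_music_tree` and its parameters are hypothetical, and `shutil.rmtree` stands in for whatever delete call the real program uses. Each worker thread gets a different top-level category folder, so no two threads ever touch the same path:

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def delete_music_tree(root, parallel=False, max_workers=4):
    """Remove root and everything under it (hypothetical helper).

    With parallel=True, each top-level category folder is removed by a pool
    thread, then whatever is left (loose files and root itself) is removed.
    """
    if not os.path.isdir(root):
        return  # nothing to delete
    if parallel:
        # Only real subdirectories; symlinked folders are left for rmtree below.
        categories = [e.path for e in os.scandir(root) if e.is_dir(follow_symlinks=False)]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # list() consumes the results, so an exception in any worker surfaces here
            list(pool.map(shutil.rmtree, categories))
    shutil.rmtree(root)
```

On the safety question: an artist filed under two categories lives in two different directories (for example Music\Rock\X and Music\Pop\X), so splitting the work by top-level category never hands the same subtree to two threads. Whether it is worthwhile is a separate question, which the replies debate.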
Uwe Raabe 2057 Posted January 5, 2019 20 minutes ago, KeithLatham said: "My question is, would it be (1) SAFE and (2) WORTHWHILE to do the delete in parallel?" Given that file access is almost sequential (remember how hard drives work), I assume that the OS will serialize those parallel tasks in the end anyway. But perhaps I am just missing something.
Primož Gabrijelčič 223 Posted January 5, 2019 How do you delete the folder? By deleting files one by one and then removing the parent folder? In that case, you may want to execute cmd /c rmdir /q /s target_folder from your program. It should be much faster.
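To try this suggestion from code, a minimal Python sketch follows. It assumes Windows (cmd.exe), and the helper names are hypothetical. Passing the arguments as a list lets `subprocess` take care of quoting a path that contains spaces:

```python
import os
import subprocess
import sys


def rmdir_command(folder):
    """Argument list for the suggested command: cmd /c rmdir /q /s <folder>."""
    # /s removes the whole tree including folder itself; /q suppresses the confirmation prompt
    return ["cmd", "/c", "rmdir", "/q", "/s", folder]


def delete_with_rmdir(folder):
    """Run rmdir through cmd.exe; return True if the folder is gone afterwards.

    Windows only. Checking the path afterwards is a direct confirmation that
    does not depend on how cmd.exe reports rmdir's exit status.
    """
    if sys.platform != "win32":
        raise OSError("cmd.exe rmdir is only available on Windows")
    subprocess.run(rmdir_command(folder), capture_output=True, check=False)
    return not os.path.exists(folder)
```

Usage would be something like `delete_with_rmdir(r"I:\Music")`.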
David Heffernan 2345 Posted January 6, 2019 Generally, multiple threads give performance benefits for CPU-bound tasks. This one isn't CPU-bound. So how do you imagine there would be any benefit?
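One rough way to check this on a given drive is to compare the wall-clock time of a delete with the CPU time the process actually used: if CPU time is only a small fraction of wall time, the thread spent most of the delete waiting rather than computing, and more threads would mostly add more waiters. A minimal Python sketch, with hypothetical helper names and tree sizes. `time.process_time` counts user plus system CPU time of the whole process (so kernel work done on its behalf is included), but not time spent blocked:

```python
import os
import shutil
import tempfile
import time


def make_tree(root, dirs=20, files_per_dir=50):
    """Create a throwaway test tree of empty files (hypothetical sizes)."""
    for i in range(dirs):
        d = os.path.join(root, f"dir{i:03}")
        os.makedirs(d)
        for j in range(files_per_dir):
            open(os.path.join(d, f"{j:03}.mp3"), "wb").close()


def profile_delete(root):
    """Delete root; return (wall-clock seconds, CPU seconds used by this process)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    shutil.rmtree(root)
    return time.perf_counter() - wall0, time.process_time() - cpu0


if __name__ == "__main__":
    # Pass dir=... (e.g. a folder on the target drive) to measure the device you care about.
    with tempfile.TemporaryDirectory() as base:
        root = os.path.join(base, "Music")
        make_tree(root)
        wall, cpu = profile_delete(root)
        share = cpu / wall if wall else 0.0
        print(f"wall {wall:.3f}s, cpu {cpu:.3f}s, cpu share {share:.0%}")
```

Results depend heavily on the device, the file system, and OS caching, so the numbers are only meaningful when measured on the actual target drive.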
KeithLatham 5 Posted January 6, 2019 13 hours ago, Primož Gabrijelčič said: "How do you delete the folder? By deleting files one by one and then removing the parent folder? In that case, you may want to execute cmd /c rmdir /q /s target_folder from your program. It should be much faster." I'm using IFILEOPERATION 'FO_DELETE' against the target folder. 4 hours ago, David Heffernan said: "Generally, multiple threads give performance benefits for CPU-bound tasks. This one isn't CPU-bound. So how do you imagine there would be any benefit?" Good point. That takes care of that dumb idea. Thanks.
hsvandrew 23 Posted January 21, 2019 On 1/6/2019 at 12:28 AM, Uwe Raabe said: "Given that file access is almost sequential (remember how hard drives work), I assume that the OS will serialize those parallel tasks in the end anyway. But perhaps I am just missing something." I'm not sure that I agree with this comment. Any memory-based storage, e.g. SSDs and memory sticks, can perform multiple operations at the same time. There is no head to move, and multiple parts can be written or changed at the same time. So yes, using threads would be very helpful. You should create a queue (pool) that receives files to delete, and another process that scans directories. This way you won't need to wait for the directory scan to finish before you start removing things. You may even want to consider the low-level techniques that Everything (by Voidtools) uses to read directory and file structures faster than FindFirst.
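A minimal sketch of the queue design described here, in Python: one scanner thread walks the tree and feeds file paths into a bounded queue while a small pool of worker threads deletes them; directories are removed at the end, deepest first. It uses `os.walk` rather than the low-level techniques Everything uses, the function name and worker count are hypothetical, and it assumes the tree contains no symbolic links. CPython releases the GIL during file system calls, so the workers' deletes can overlap; whether that beats a single-threaded delete still depends on the device.

```python
import os
import queue
import threading


def parallel_delete(root, workers=4):
    """Delete the tree at root with one scanner thread and several deleting workers.

    Files are removed as soon as the scanner finds them; directories are removed
    afterwards, deepest first. Assumes the tree contains no symbolic links.
    """
    paths = queue.Queue(maxsize=1000)  # bounded, so the scanner can't run far ahead
    dirs = []        # every directory seen, in top-down (parent-before-child) order
    errors = []      # OSErrors collected by workers
    stop = object()  # sentinel that tells a worker to exit

    def scanner():
        try:
            for dirpath, _subdirs, filenames in os.walk(root):
                dirs.append(dirpath)
                for name in filenames:
                    paths.put(os.path.join(dirpath, name))
        finally:
            for _ in range(workers):  # one sentinel per worker
                paths.put(stop)

    def worker():
        while True:
            path = paths.get()
            if path is stop:
                return
            try:
                os.remove(path)
            except OSError as exc:
                errors.append(exc)

    threads = [threading.Thread(target=scanner)]
    threads += [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    # Reversing the top-down order puts every child directory before its parent.
    for d in reversed(dirs):
        os.rmdir(d)
```

Usage would be something like `parallel_delete(r"I:\Music", workers=4)`. The bounded queue keeps memory use flat on very large trees, since the scanner blocks whenever the workers fall behind.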
Lars Fosdal 1791 Posted January 22, 2019 SSDs are usually connected via SATA (Serial ATA) or via PCIe using the NVMe protocol. The first does not do parallel operations, while the second does. However, the latter's speed benefit shows up when writing large amounts of data in parallel to separate areas. When deleting files, the OS rewrites small amounts of data in a shared area (file system metadata) that has to be kept consistent, i.e. protected by shared-access locking, so I suspect there is no gain from parallelizing file deletion.