Jud 1 Posted September 17, 2021 Well, I think I have figured it out. I think the memory manager is allocating memory for the dynamic arrays in 4KB chunks, no matter how much is needed. That is causing it to swap to the SATA SSD like crazy. I develop on a computer that has 32GB. I have it show me how much memory is available after I SetLength, and it says about 19GB left. I took the very same EXE and ran it on a computer with only 16GB. It should use up all of the memory, but it was showing about 9GB available. So it has to be swapping out. In the old days, I would have hard the hard drive accesses. In Task Manager, it does NOT show the drive activity, but the system must be doing it, I think (otherwise there is no way it would run at all on the 16GB machine). If it is allocating minimum chunks of 4KB, as I think, the total memory use is approaching 200GB. Which brings be back to the original Delphi philosophy of "you'll never need more than 2GB, even in 64-bit programs". But I've thought of some work-arounds. Primarily, reduce the number of buckets by a factor of, say, 256. Then anything that would have gone into buckets 1-256 go into the first bucket, 257-512 go into the second bucket, etc. That will make efficient use of the 4K blocks. Then a little processing on each of these bigger buckets will get it ready for the final processing stage. Share this post Link to post
Lars Fosdal 1792 Posted September 17, 2021 Without a proper code example with similar structures to what you use, it is not possible to understand the nature of your problem. Share this post Link to post
Guest Posted September 17, 2021 3 hours ago, Jud said: I think the memory manager is allocating memory for the dynamic arrays in 4KB chunks, no matter how much is needed. If we are talking about the stock MM which is FastMM, then that is not exact, FastMM doesn't care for the allocation types or usage, only sizes, if it is small then will provide it from pools, big pools allocated and managed by it, if the allocation is big then it will delegate this to the OS and use VIrtualAlloc. 3 hours ago, Jud said: I develop on a computer that has 32GB. I have it show me how much memory is available after I SetLength, and it says about 19GB left. I took the very same EXE and ran it on a computer with only 16GB. It should use up all of the memory, but it was showing about 9GB available. So it has to be swapping out. In the old days, I would have hard the hard drive accesses. In Task Manager, it does NOT show the drive activity, but the system must be doing it, I think (otherwise there is no way it would run at all on the 16GB machine). Lets make sure we understand how SetLength works, when you call SetLength with size X with D (lets assume it is an array) then MM will provide that allocation, but it first it will check if it is allocated if D is holding an allocation, 1) if not means D = nil then it will pick suitable allocation (enough to fit X) from wither its pool or ask the system to provide it and return the new D. 2) if D<>nil ,this means D was holding an allocation (size Y), then MM here will allocate a new allocation (enough to fit X), then and copy the old size (Y) to the new one then return the new D, only after than will release the old allocation (size Y), so if it is was form the system then it will return it, if it is small and coming form a pool then it will hold on it to be reused, (here to notice sometime the new allocation size X is very close to Y, MM might do nothing and return it without copy, but this is different story) Now as you can see at a moment in (2) MM was holding both allocations (sizes X and Y) and this will be problem if both are big, ex. X was 1GB and you need to add few thousands to it (Y), then MM will be hitting 2GB before releasing X, this might stress the OS and cause paging to disk, in sometimes like you example with an application using 16gb and you asked to be extended, then the MM will ask for another 16gb(+the thousands) and this will force the system to start paging many things including your application own memory so part of X and Y will be paged, here MM still holding both to perform the copying, will start trigger page faults in thousands, slowing very badly. Page faults is long story but they have high impact on the performance, specially the hard ones. 3 hours ago, Jud said: If it is allocating minimum chunks of 4KB, as I think, the total memory use is approaching 200GB. The minimum chunk to allocation on OS, (i mean virtual meemory) is 64kb. 3 hours ago, Jud said: Which brings be back to the original Delphi philosophy of "you'll never need more than 2GB, even in 64-bit programs". But I've thought of some work-arounds. Primarily, reduce the number of buckets by a factor of, say, 256. Then anything that would have gone into buckets 1-256 go into the first bucket, 257-512 go into the second bucket, etc. That will make efficient use of the 4K blocks. Then a little processing on each of these bigger buckets will get it ready for the final processing stage. There is always a better way, in general i mean, the important thing is to understand where and what is the problem, only after that you can solve it (or make it better) I am not sure i do understand your code and the problems is suffering, but sharing a small code to demonstrate will bring better understanding, i am sure many here can help and give insights. Share this post Link to post
David Heffernan 2345 Posted September 17, 2021 This entire thread is largely pointless because we don't have a clear idea what you are trying to do and what your code looks like. Share this post Link to post
Stefan Glienke 2006 Posted September 17, 2021 (edited) If your quicksort is taking 80% of the time then its implementation very likely is garbage. Worst case scenario for QuickSort is O(n²) which is why usually one should not use pure QuickSort but IntroSort which mitigates against the worst case. Also depending on the type of the elements you are are sorting the swapping can be ridiculously expensive and in case you are using the generic RTL implementation from TArray it does not do that very well. Edited September 17, 2021 by Stefan Glienke Share this post Link to post
Jud 1 Posted September 18, 2021 It is late and I'm about to be away for a couple of days. I haven't finished all of the changes I talked about (with fewer but bigger buckets) but it definitely seems that the problem is in swapping out the very large dynamic array to the drive millions and millions of times. Details by the middle of next week. Share this post Link to post
Jud 1 Posted September 18, 2021 10 hours ago, Stefan Glienke said: If your quicksort is taking 80% of the time then its implementation very likely is garbage. No, it depends on what the rest of the program is doing. An O(n log n) quicksort could take 99%+ of the time, depending. I've been using quicksort for over 40 years and I tested it on this data. Share this post Link to post
Fr0sT.Brutal 900 Posted September 20, 2021 Hereby I name this thread "Crystal ball challenge". No code shalt be shown as that makes etheric vision unclear 2 Share this post Link to post
Jud 1 Posted October 10, 2021 I don't know if anyone is still reading this, but I got the bucket sort method working. On small cases, it is up to 3 times faster than the quicksort method. But on data sets that are of interest, it is much slower. On a test case with real data, the quicksort method takes 61 minutes but the bucket sort method takes 27 hours. The procedure that does the work doesn't know how many buckets there will be or how large they will need to be until it gets the data. For that reason, I used a dynamic array of dynamic arrays, outside the procedure. I did some statistics on the test case (that has real data). The procedure is called about 143,000,000 times. There are up to about 26,000,000 buckets. However, there are at most about 5,400 items spread out in those buckets. So about 99.98% of the buckets are empty. That is the reason it is so much slower than quicksort - going through all of those empty buckets. The bucket sort was potentially a good idea, but it didn't turn out well. I wasted a lot of time on it, but I learned some things. Share this post Link to post
Jud 1 Posted October 10, 2021 On 9/17/2021 at 2:25 AM, Guest said: > The minimum chunk to allocation on OS, (i mean virtual meemory) is 64kb. I thought the minimum was 4KB. So if I have a dynamic array with the SetLength to 1000 bytes, it is actually going to allocate 64KB? I could rig up a small demo, with real data in a couple of hours, but I think that the problem is with bucket sort, it turns out that almost all of the buckets are empty (I couldn't tell this until I got it working and collected some statistics). A secondary cause could be that I have a static array of counters to tell how many items are in each bucket, and I use FillChar to zero out in each of the 143,000,000 calls to the procedure. That array has about 60,000,000 bytes, but I only zero out as much as I need to, which varies with each call to the subroutine. Quote I am not sure i do understand your code and the problems is suffering, but sharing a small code to demonstrate will bring better understanding, i am sure many here can help and give insights. Share this post Link to post
Lars Fosdal 1792 Posted October 11, 2021 7 hours ago, Jud said: if I have a dynamic array with the SetLength to 1000 bytes, it is actually going to allocate 64KB? No. FastMM has different heaps for small vs large allocations. FastMM can grab very large chunks of memory from Windows and suballocate according to what you need. https://docwiki.embarcadero.com/RADStudio/Sydney/en/Memory_Management#The_FastMM_Memory_Manager_.28Win32_and_Win64.29 The guesswork from the memory manager on how your allocations may be resized, will impact how much memory it allocates and can explain the "overuse" of memory. Also, how your data is represented decides how much data needs to be rearranged on sort. The larger the chunks are that need to be rearranged, the higher the time cost. To rephrase: Less moving of data = better performance. Again, without actual code, it is not possible to theorize on the bottlenecks of your bucket sort vs quick sort comparison. Share this post Link to post
David Heffernan 2345 Posted October 11, 2021 8 hours ago, Jud said: I don't know if anyone is still reading this It's become a somewhat abstract blog post. Because you won't offer details nobody can offer any practical input. Share this post Link to post
Guest Posted October 11, 2021 Delphi Praxis w/o Kas. Ob is useless. What happened @Kas Ob.? Share this post Link to post
Stefan Glienke 2006 Posted October 12, 2021 15 hours ago, Dany Marmur said: Delphi Praxis w/o Kas. Ob is useless. I am sure you did not mean that as it came across because to me it sounded like an insult to any member of Delphi Praxis. Share this post Link to post
Attila Kovacs 629 Posted October 12, 2021 Oh, is he gone again? 😕 s/useless/less Share this post Link to post
Guest Posted October 12, 2021 5 hours ago, Stefan Glienke said: I am sure you did not mean that as it came across because to me it sounded like an insult to any member of Delphi Praxis. Yes, that it did. I was trying to be nice to KasOb and being to fast i neglected context completely. No i did not mean it like that, thank you for pointing that out. Apologies to all, and thank you, @Stefan Glienke. Share this post Link to post
Jud 1 Posted October 15, 2021 On 10/11/2021 at 3:21 AM, Lars Fosdal said: Again, without actual code, it is not possible to theorize on the bottlenecks of your bucket sort vs quick sort comparison. I could extract the relevant code and give a real data set, but it would take some work and the data would be very large. But several days ago I abandoned the bucket sort method and now the quicksort method is working and producing results. I think the problem was because there are millions of buckets and almost all of them are empty, so there is a lot of work with memory. (Actually, on two small test cases, the bucket sort system was 30% and 3X faster than the quicksort version, but on big data sets, there are way too many buckets, almost all empty.) Also, I can multithread the version that uses quicksort. I wouldn't be able to do that effectively with the bucket sort system because of the large memory. (The workstations here have 128GB of RAM, and one instance was taking up nearly half of it.) --------------- A few years ago when I had a large dynamic array of short tLists, it seemed that it was reserving 4KB for each list, even if the list was much smaller. Is that how the memory manager works? Share this post Link to post
Fr0sT.Brutal 900 Posted October 15, 2021 7 hours ago, Jud said: A few years ago when I had a large dynamic array of short tLists, it seemed that it was reserving 4KB for each list, even if the list was much smaller. Is that how the memory manager works? Nope. And it never was, at least since D7 Share this post Link to post
David Heffernan 2345 Posted October 15, 2021 7 minutes ago, Fr0sT.Brutal said: And it never was, at least since D7 It never was ever. 8 hours ago, Jud said: A few years ago when I had a large dynamic array of short tLists, it seemed that it was reserving 4KB for each list, even if the list was much smaller. Is that how the memory manager works? You are probably misinterpreting memory stats from a task manager program. Share this post Link to post
Fr0sT.Brutal 900 Posted October 15, 2021 2 hours ago, David Heffernan said: It never was ever. Yup, I just added that remark to not assert things about ancient D versions that I wasn't using Share this post Link to post
Jud 1 Posted October 15, 2021 11 hours ago, David Heffernan said: . You are probably misinterpreting memory stats from a task manager program. Maybe. It was a few years ago. I was getting the memory available inside my program too, I think. But last night I had a good insight - rather than set up the 100,000,000+ potential buckets when only a few thousand of them will have something in them, do a pass through the data to see what buckets will have something in them and set up only that many buckets! I haven't had a chance to work on that yet, but it should be faster than quicksort and make the final processing step faster too. Share this post Link to post
David Heffernan 2345 Posted October 15, 2021 Have you considered blogging? 2 Share this post Link to post
Jud 1 Posted October 15, 2021 3 minutes ago, David Heffernan said: Have you considered blogging? I don't know how. Is there a blog for Delphi? Share this post Link to post
Fr0sT.Brutal 900 Posted October 18, 2021 On 10/15/2021 at 11:19 PM, David Heffernan said: Have you considered blogging? Twitter will fit better IMHO 1 Share this post Link to post