Posts posted by Jud


  1. On 9/17/2021 at 2:25 AM, Guest said:

    > The minimum chunk to allocation on OS, (i mean virtual meemory) is 64kb.

    I thought the minimum was 4KB.  So if I call SetLength on a dynamic array for 1000 bytes, is it actually going to allocate 64KB?

     

    > I am not sure i do understand your code and the problems is suffering, but sharing a small code to demonstrate will bring better understanding, i am sure many here can help and give insights.

    I could rig up a small demo with real data in a couple of hours, but I think the problem is with the bucket sort: it turns out that almost all of the buckets are empty (I couldn't tell this until I got it working and collected some statistics).  A secondary cause could be that I have a static array of counters telling how many items are in each bucket, and I zero part of it with FillChar in each of the 143,000,000 calls to the procedure.  That array is about 60,000,000 bytes, but I only zero out as much as I need to, which varies with each call.
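
    Roughly what I mean by zeroing only part of the counter array (a minimal sketch; the array size, element type and names are just for illustration):

    const
      MaxBuckets = 30000000;                       // illustrative; about 60 MB of Word counters
    var
      Counts: array[0..MaxBuckets - 1] of Word;    // global static counter array

    procedure ResetCounts(BucketsUsed: Integer);
    begin
      // Zero only the counters that will actually be used in this call,
      // instead of clearing the whole array every time.
      FillChar(Counts[0], BucketsUsed * SizeOf(Counts[0]), 0);
    end;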

     


  2. I don't know if anyone is still reading this, but I got the bucket sort method working.  On small cases, it is up to 3 times faster than the quicksort method.  But on data sets that are of interest, it is much slower.  On a test case with real data, the quicksort method takes 61 minutes but the bucket sort method takes 27 hours.

    The procedure that does the work doesn't know how many buckets there will be or how large they will need to be until it gets the data.  For that reason, I used a dynamic array of dynamic arrays, outside the procedure.  I did some statistics on the test case (that has real data).

    The procedure is called about 143,000,000 times.  There are up to about 26,000,000 buckets.  However, there are at most about 5,400 items spread out in those buckets.  So about 99.98% of the buckets are empty.  That is the reason it is so much slower than quicksort - going through all of those empty buckets.
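
    The loop shape is the problem - something like this (a sketch; TItem and ProcessItem stand in for the real record and the real work):

    type
      TItem = record Key, Data: Int64; end;       // placeholder for the real record
      TItemList = array of TItem;

    procedure ProcessItem(const Item: TItem);
    begin
      // stands in for the real per-item work
    end;

    procedure ScanBuckets(const Buckets: array of TItemList);
    var
      B, I: Integer;
    begin
      for B := 0 to High(Buckets) do              // touches all ~26,000,000 buckets
        if Length(Buckets[B]) > 0 then            // almost always False
          for I := 0 to High(Buckets[B]) do       // only ~5,400 items in total reach here
            ProcessItem(Buckets[B][I]);
    end;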

    The bucket sort was potentially a good idea, but it didn't turn out well.  I wasted a lot of time on it, but I learned some things.

     


  3. 10 hours ago, Stefan Glienke said:

    > If your quicksort is taking 80% of the time then its implementation very likely is garbage.

     

    No, that depends on what the rest of the program is doing.  Even a good O(n log n) quicksort could account for 99%+ of the run time.

     

    I've been using quicksort for over 40 years and I tested it on this data.


  4. It is late and I'm about to be away for a couple of days.  I haven't finished all of the changes I talked about (with fewer but bigger buckets) but it definitely seems that the problem is in swapping out the very large dynamic array to the drive millions and millions of times.  Details by the middle of next week.


  5. Well, I think I have figured it out.  I think the memory manager is allocating memory for the dynamic arrays in 4KB chunks, no matter how much is needed.  That is causing it to swap to the SATA SSD like crazy. 

     

    I develop on a computer that has 32GB.  I have the program show me how much memory is available after the SetLength calls, and it says about 19GB is left.  I took the very same EXE and ran it on a computer with only 16GB.  It should use up all of the memory, but it was showing about 9GB available.  So it has to be swapping out.

     

    In the old days, I would have heard the hard drive accesses.  Task Manager does NOT show the drive activity, but the system must be doing it, I think (otherwise there is no way it would run at all on the 16GB machine).

     

    If it is allocating minimum chunks of 4KB, as I think, the total memory use is approaching 200GB.

     

    Which brings me back to the original Delphi philosophy of "you'll never need more than 2GB, even in 64-bit programs".

     

    But I've thought of some work-arounds.  Primarily, reduce the number of buckets by a factor of, say, 256.  Then anything that would have gone into buckets 1-256 goes into the first bucket, anything from buckets 257-512 goes into the second bucket, and so on.  That will make efficient use of the 4KB blocks.  Then a little processing on each of these bigger buckets will get it ready for the final processing stage.
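
    The mapping itself is trivial - a sketch with made-up names, using 0-based bucket numbers:

    const
      GroupSize = 256;

    function CoarseBucketOf(FineIndex: Integer): Integer;
    begin
      // Fine buckets 0..255 map to coarse bucket 0, 256..511 to bucket 1, and so on,
      // so each coarse bucket fills its 4KB allocation much better.
      Result := FineIndex div GroupSize;     // same as FineIndex shr 8, since 256 = 2^8
    end;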

     

     


    I think the problem is probably that calling SetLength on a fairly large dynamic array millions of times inside a procedure is overwhelming the memory manager.  I need to keep a count showing how many times it has been called in the last second, and see if that starts to drop.

     

    Maybe I can get some sample code.  I'm trying different things.  Today I finished a run after making the 2-dimensional non-rectangular array global, so SetLength isn't called for it millions of times inside the procedure that prepares the data for the final phase.  This was faster, but the run took 39.6 hours, as opposed to about 2.2 hours for the quicksort approach.  But I have a dynamic array inside that procedure that tells me how many elements each row of the big array holds.

     

    I gathered some statistics during the run, and they show that this array could be less than 200 megabytes, so I'm going to try making it a global static array.  Inside the procedure that is called millions of times, I have to set part of the array to zero, but I know how much of it needs to be zeroed with FillChar.


    The final phase of the processing needs the items in buckets, so bucket sort seems logical.  It should beat quicksort.  I think the problem making it slow was that the procedure that prepares the data for final processing is called millions of times, and the large dynamic array of arrays was inside that procedure, calling SetLength each time.  I've just made the large dynamic array global, call SetLength once with reasonable estimates, and then increase it if necessary.  That is working, but I haven't compared it to quicksort yet.
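
    The enlarge-if-necessary part looks roughly like this (a sketch; TRec, Counts and the growth factor are just for illustration):

    type
      TRec = record Key, Data: Int64; end;     // stands in for the real 16-byte record
    var
      Buckets: array of array of TRec;         // global, SetLength'd once to reasonable estimates
      Counts: array of Integer;                // how many items are really in each bucket

    procedure AddToBucket(B: Integer; const R: TRec);
    begin
      if Counts[B] = Length(Buckets[B]) then
        SetLength(Buckets[B], (Length(Buckets[B]) + 4) * 2);   // enlarge only when a bucket is full
      Buckets[B, Counts[B]] := R;
      Inc(Counts[B]);
    end;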


    I'm not shuffling the records; I'm grouping them according to how they are to be processed.  The records are 16 bytes long (and there are a lot of them).

     

    Yesterday (after watching a Webinar and installing Delphi 11.0, yes it is available) I did more profiling.

     

    I said that the quicksort was taking 80%.  There is a SwapRecords procedure nested in the quicksort procedure that is taking an additional 7%.

     

    I went back to my earlier version (with compiler directives) - the one that uses a non-rectangular dynamic array of dynamic arrays.  The profile shows that SetLength is taking most of the time.  The profiler shows "system" using 77% of the time.  @FillChar is using the most, which must be zeroing out the dynamic array (which I don't need).  SysGetMem is second in the profiler.

     

    But yesterday I had the idea of using a large dynamic array of small static arrays.  I think that is going to work (in 64-bit mode).
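
    Roughly the structure I have in mind (a sketch; the record fields and the bucket capacity are placeholders):

    type
      TRec16 = packed record           // the 16-byte record (fields just for illustration)
        Key: Int64;
        Data: Int64;
      end;
      TSmallBucket = record
        Count: Integer;
        Items: array[0..30] of TRec16; // small static array, no per-bucket SetLength
      end;
    var
      Buckets: array of TSmallBucket;  // one dynamic array of fixed-size buckets

    procedure InitBuckets(BucketCount: Integer);
    begin
      SetLength(Buckets, BucketCount); // one SetLength call, done once the count is known
    end;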

     

    Also, what Administrators said above just made me think of another possible approach.  Keep the dynamic array of records and go through and pick out the ones with a certain key, and process them.  Repeat with the next key.  The problem with this is that there are so many keys (and each one will have to make a pass through the array) that I think it will be back to quadratic time.

     

    Probably tomorrow I will implement the dynamic array of small static array method.

     


  9. I've installed that.  It gives me the message "Neither TD32, MAP, nor JDB information were found...". EDIT: I found where to turn on the MAP file, and I think it is working.

     

    Results: it is spending 80% of its time in Quicksort.  So I need to go back to trying the bucket sort.


  10. 12 hours ago, Stefan Glienke said:

    > I am absolutely putting my bet on this. The second issue after that will most likely be a non-cache-friendly access pattern.
    >
    > But you don't have to guess - run the code under a profiler and it will tell you the slow parts.

     

    I haven't gotten back to that yet, but I'm going to do timings and see.


  11. On 9/2/2021 at 3:54 AM, Fr0sT.Brutal said:

    > So the issue is looping through data not allocation. At the 1st glance, if you could calculate the matching criteria beforehand, you could build a structure that ties criteria and item index so that at the next step you compare only flags/short hashes to receive indices.

    I think the problem may have to do with the data allocation, since bucket sort (the natural thing for this data) is so much slower than quadratic-time brute force.  But I need to get back to looking at it - I've taken a different approach for now, and I got the first working version going yesterday.

     

    The procedure that puts the data in buckets is called millions of times, with data arrays of varying sizes, but the bulk of them are under 100 elements.  I switched to doing a quicksort, and it took 6.8 hours on the test case, as opposed to 17.7 hours before.  Most of the time is no longer in this procedure, so I don't know how much faster a linear-time sort would be.

     

    Since so many calls to the procedure have a small number of records, I compared several sorting methods to quicksort on different sizes of data.  Quicksort beat them all until the size got down into the 20s - then selection sort started outperforming quicksort by a good margin.
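
    For illustration, the kind of hybrid this suggests - quicksort that hands small ranges to selection sort (a sketch, not the code I'm actually running; TRec, its Key field and the cutoff value are placeholders):

    type
      TRec = record
        Key: Int64;       // placeholder key field
        Data: Int64;
      end;

    const
      CutOff = 24;        // around where selection sort started winning for me

    procedure SelectionSort(var A: array of TRec; L, R: Integer);
    var
      I, J, M: Integer;
      T: TRec;
    begin
      for I := L to R - 1 do
      begin
        M := I;
        for J := I + 1 to R do
          if A[J].Key < A[M].Key then
            M := J;
        if M <> I then
        begin
          T := A[I]; A[I] := A[M]; A[M] := T;   // at most one record move per position
        end;
      end;
    end;

    procedure QuickSort(var A: array of TRec; L, R: Integer);
    var
      I, J: Integer;
      P: Int64;           // pivot key
      T: TRec;
    begin
      while L < R do
      begin
        if R - L < CutOff then
        begin
          SelectionSort(A, L, R);               // small ranges go to selection sort
          Exit;
        end;
        I := L; J := R;
        P := A[(L + R) div 2].Key;
        repeat
          while A[I].Key < P do Inc(I);
          while A[J].Key > P do Dec(J);
          if I <= J then
          begin
            T := A[I]; A[I] := A[J]; A[J] := T;
            Inc(I); Dec(J);
          end;
        until I > J;
        // Recurse on the smaller half, loop on the larger, to bound stack depth.
        if J - L < R - I then
        begin
          QuickSort(A, L, J);
          L := I;
        end
        else
        begin
          QuickSort(A, I, R);
          R := J;
        end;
      end;
    end;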

     

    An interesting thing - I was testing several sorts, and I've read many times that insertion sort is the fastest of the quadratic-time sorting algorithms.  But, on these data records, selection sort beat insertion sort significantly.

     

    I'm going to get back to looking at the original bucket method.


  12. I need to update (I've been away for a couple of days).

     

    Here is the situation.  A program does a lot of processing and calls a procedure to do the final work.  This procedure is given an array of records, which can number in the millions.  The procedure is called a large number of times. 

     

    The procedure does some work matching pairs of records that meet certain criteria.  At first, I just did a brute-force quadratic time pair of nested loops, to make sure it was working.  (Get it right before making it fast.)  The test run took 17.7 hours. 
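
    The brute-force version is just the obvious nested loops (a sketch; TRec, Matches and ProcessPair stand in for the real record, criteria and work):

    type
      TRec = record Key, Data: Int64; end;        // placeholder for the real record

    function Matches(const A, B: TRec): Boolean;
    begin
      Result := A.Key = B.Key;                    // placeholder criterion
    end;

    procedure ProcessPair(const A, B: TRec);
    begin
      // stands in for the real work on a matched pair
    end;

    procedure MatchAllPairs(const A: array of TRec);
    var
      I, J: Integer;
    begin
      // Brute force: every record against every later record - O(n^2) comparisons.
      for I := 0 to High(A) - 1 do
        for J := I + 1 to High(A) do
          if Matches(A[I], A[J]) then
            ProcessPair(A[I], A[J]);
    end;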

     

    I knew there were better ways to do it.  The obvious choice is to do a bucket sort on the data before processing it.  A static array is limited to something less than 2 billion bytes, as far as I know.  This isn't enough, so I tried a two-dimensional non-rectangular array for the bucket sort.  (The first dimension is potentially in the millions, or many millions, but the size of each bucket is small.)  I thought this would be much faster than the quadratic-time brute-force method, but it ran for 2 days on the test case and didn't get anywhere near finishing.  I think part of the problem is that I'm calling SetLength on the dynamic array of arrays a lot of times inside the procedure.  I might be able to avoid that, but I haven't tested it yet.

     

    Right now I'm working on a quicksort of the data, even though a bucket sort is theoretically faster.  I have it working on small cases, but later today I should have it working on the test case.

     

    So that is the background and where I am.


  13. 8 hours ago, Kas Ob. said:

    > Well, this is not short story as there is many things to distinguish between, so i will try to point few things and most likely will miss few things too, thankfully there is others here will help.
    >
    > But first lets make sure we are on the same page as your post combined many things.
    > What did you mean with static ?!!
    > This is unclear, did you mean static memory declared as global in the project ?

    By static, I mean an array that is not a dynamic array. 

     

    Briefly, what the program needs to do would naturally use a two-dimensional non-rectangular dynamic array.  The array would have millions of rows, but each row would be a short list.  I wrote it that way, but performance is terrible.  The sizes aren't known until the procedure is run.  I believe memory is allocated in 4KB blocks (and zeroed out), which is a lot more than each row needs.  I think this is the bottleneck.
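
    The shape of the structure, for clarity (a sketch; TRec, RowCount and LenOfRow are placeholders):

    type
      TRec = record Key, Data: Int64; end;   // placeholder for the real record
    var
      Rows: array of array of TRec;          // non-rectangular: each row has its own length

    procedure BuildRows(RowCount: Integer; const LenOfRow: array of Integer);
    var
      R: Integer;
    begin
      SetLength(Rows, RowCount);             // RowCount can be many millions
      for R := 0 to RowCount - 1 do
        SetLength(Rows[R], LenOfRow[R]);     // every short row gets its own heap allocation
    end;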

     

    added... I thought of something else to try - set the length of the dynamic array of arrays once, outside the procedure and enlarge as necessary.  That may work if it doesn't use up too much memory.


    What is the largest amount of static memory you can get in Windows?  I know about the 2GB limit on static structures (even on the 64-bit platform!).  The most static memory I can get is about 1,994,000,000 bytes - then I start getting access violations.  Is there a way to get larger static arrays, or is there a way to get several static arrays under 2GB each, with the total static memory over 2GB?


  15. 4 hours ago, BruceTTTT said:

    > My experience exactly as well.
    >
    > Can't figure out a pattern either, gets annoying at times. I seem to have the problem more with 10.4.2 than 10.4.1 (but 10.4.2 fixed a lot of other problems).

    Yes.  I don't know why or when it happens.  I don't think I ever had the problem with 10.3, which is why I still also have 10.3 on my desktop.  When the 10.4.x IDE isn't working, I have to go back to 10.3.  Of the 10.4 versions, if I recall correctly, 10.4.1 BEFORE the patches worked well.


  16. Follow-up.

     

    When I put a recursive Fibonacci function in the parallel.for and ran it on various inputs, it worked.  When I called a recursive procedure to do Depth First Search (DFS) on a graph, it would not work if it was doing more than one in parallel. 

     

    But then I made a small procedure to start the recursive calls to the DFS procedure, and put that small procedure inside the parallel.for instead of the actual call to the DFS procedure - and it works!
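
    Roughly the shape that works for me (a sketch assuming the RTL's TParallel from System.Threading; the DFS body and the graph details are omitted, and StartDFS is the little wrapper):

    uses
      System.Threading;

    procedure DFS(Node: Integer);           // the recursive depth-first search
    begin
      // visit Node, then recurse into its unvisited neighbours (graph details omitted)
    end;

    procedure StartDFS(Root: Integer);      // small wrapper that launches the recursion
    begin
      DFS(Root);
    end;

    procedure RunAll(RootCount: Integer);
    begin
      // Each iteration calls the wrapper, not the recursive procedure directly.
      TParallel.&For(0, RootCount - 1,
        procedure(I: Integer)
        begin
          StartDFS(I);                      // I is the start node here, just for illustration
        end);
    end;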

     

    Just in case someone asks about this.
