
Jud

Members
  • Content Count

    118
Posts posted by Jud


  1. 15 hours ago, David Heffernan said:

    Show complete but minimal code please

    Here is a short demo.  With Delphi 11.3 I get a compiler error on the line that uses BlockWrite as a function.  The documentation says that BlockWrite is a function that returns an integer.

     

    procedure ErrorDemo;
    var CountNewData, WriteResult : integer;
        OutFile : file;
        NewData : array of int64;
    begin
    AssignFile( OutFile, 'c:\NewData.data');
    rewrite( OutFile, 8);
    WriteResult := BlockWrite( OutFile, NewData[ 0], CountNewData, WriteResult);
    { compiler error - E2010 incompatible types - integer and procedure }
    BlockWrite( OutFile, NewData[ 0], CountNewData, WriteResult); // no error
    CloseFile( OutFile);
    end;

     


  2. This is the first time I'm using BlockRead and BlockWrite in Delphi 11.3.  I've used them in previous versions.  I thought they were functions returning an integer.  The help file says so and the source code says so.  But I'm getting an E2010 error when trying to assign the result to an integer, e.g.

     

    RecordsWritten := BlockWrite( OutFile, ...

     

    Just calling it like a procedure works.  I'm attaching a screenshot of the error message.  Is this a bug or am I wrong?

     

    BlockWrite E2010.jpg


  3. 3 hours ago, Uwe Raabe said:

    Wow! If my math is correct that needs more than 32GB of memory.

     

    I have no idea what the purpose is, but perhaps there are other approaches to achieve the same.

    1. 32GB is nothing today.  Every computer here that it might be running on has at least 128GB of RAM.

     

    2. No, this is the best approach.


  4. 32 minutes ago, David Heffernan said:

    How many such instances of this type do you need in memory at any time? And what's the expected number of bits that are set at any time? Do you really need to store all bits, both 0 and 1? Can't you just store the 1s and infer the 0s from the fact that they aren't stored as 1s?

     

    I estimate that about 5% of the bits will be 1s.  They will be read-only in the actual run, so the data will be shared among 20 threads.  I don't understand how storing just the 1s would work.  That would result in a series of about 15,000,000 1s, but that would be useless.
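    For reference, "storing just the 1s" usually means keeping a sorted list of the positions of the set bits rather than a run of 1s; a bit is taken to be 0 when its position is not in the list. Below is a minimal sketch of that idea. TSparseBits and IsSet are hypothetical names, not an existing library type, and the list is assumed to be built once and kept in ascending order. Note that at 64 bits per stored position this only beats a plain bitmap when fewer than about 1 bit in 64 is set, so at 5% density it would need a more compact encoding of the positions to pay off.

```pascal
type
  // Hypothetical helper: holds only the positions of the 1 bits,
  // sorted in ascending order. Any position not listed is a 0 bit.
  TSparseBits = record
    Positions: TArray<Int64>;
    function IsSet(BitIndex: Int64): Boolean;
  end;

function TSparseBits.IsSet(BitIndex: Int64): Boolean;
var
  Lo, Hi, Mid: NativeInt;
begin
  // Plain binary search over the sorted positions.
  Lo := 0;
  Hi := Length(Positions) - 1;
  while Lo <= Hi do
  begin
    Mid := Lo + (Hi - Lo) div 2;
    if Positions[Mid] = BitIndex then
      Exit(True);
    if Positions[Mid] < BitIndex then
      Lo := Mid + 1
    else
      Hi := Mid - 1;
  end;
  Result := False; // not stored, so the bit is inferred to be 0
end;
```

    Since lookups only read the array, a read-only instance like this could be shared among threads without locking.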

     

     


  5. 10 hours ago, Der schöne Günther said:

    The maximum size TBits can be before bugging out is Integer.MaxValue - 31. That's roughly 256 Megabytes of consecutive boolean storage space. You really need that much?

    Yes, I need more.  I currently need about 300 billion bits. I can write it myself, but I thought there might be one already available.
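    A hand-written version can be quite small. The following is only a sketch, assuming a 64-bit target (so a dynamic array can hold more than 2^31 elements' worth of data) and enough RAM; 300 billion bits is about 37.5 GB. TBigBits and its method names are made up for illustration.

```pascal
type
  // Hypothetical bit array indexed by Int64, packed 64 bits per UInt64.
  TBigBits = record
  private
    FWords: TArray<UInt64>;
    FSize: Int64;
  public
    procedure Init(ABitCount: Int64);
    function GetBit(Index: Int64): Boolean;
    procedure SetBit(Index: Int64; Value: Boolean);
    property Size: Int64 read FSize;
  end;

procedure TBigBits.Init(ABitCount: Int64);
begin
  FSize := ABitCount;
  FWords := nil;
  // New dynamic array elements are zero-filled, so all bits start at 0.
  SetLength(FWords, (ABitCount + 63) shr 6);
end;

function TBigBits.GetBit(Index: Int64): Boolean;
begin
  Assert((Index >= 0) and (Index < FSize), 'bit index out of range');
  Result := (FWords[Index shr 6] and (UInt64(1) shl Integer(Index and 63))) <> 0;
end;

procedure TBigBits.SetBit(Index: Int64; Value: Boolean);
var
  Mask: UInt64;
begin
  Assert((Index >= 0) and (Index < FSize), 'bit index out of range');
  Mask := UInt64(1) shl Integer(Index and 63);
  if Value then
    FWords[Index shr 6] := FWords[Index shr 6] or Mask
  else
    FWords[Index shr 6] := FWords[Index shr 6] and not Mask;
end;
```

    Usage would look like B.Init(300000000000); B.SetBit(123456789012, True); on a machine with enough memory.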


  6. On 6/24/2023 at 4:01 AM, dummzeuch said:

    There are usually some tasks that do not need to run on the performance cores, so setting the affinity mask for the whole program may not be the best strategy, even though it's the easiest way. But I'm sure that sooner or later Windows will start ignoring those masks because everybody sets them.

     

    Of course this is currently the only way to do that for threads generated using parallel for.

     

    Well, I need all of the power I have available.

     

    BUT - I realized a fallacy in my thinking and analysis.  I assumed that if I ran 16 threads with the parallel for loop, they would all be on performance cores, and that if I ran 20 threads, the extra 4 would go to the efficient cores.  But after more experimentation, parallel for seems to put the threads on any core.  I thought of timing how long each thread took to run, and with, say, 20 threads, there wasn't that much difference.  And running 20 threads with parallel for (with 8 P-cores and 4 E-cores), the performance was 9-10% better than running 16 threads.  So it is using the E-cores too.  And running 100 or so threads, it pretty much evens out the difference between the E and P cores (because the ones on the P-cores finish sooner and get reassigned another thread).


  7. 5 hours ago, DelphiUdIT said:

    You can try to call this:

    
    function uCoreId: uint64; register;
    asm
      //from 10 gen it reads the IA32_TSC_AUX MSR which should theoretically
      //indicate the CORE (THREAD in the case of HyperThread processors) in which rdpid runs
      rdpid RAX;
    end;

    This function returns the ID of the core (meaning the core thread) on which rdpid runs. It works from Intel 10th-generation CPUs onward.

     

    The first Core Thread is numbered 0 (zero).

     

    You will see that the efficient cores will also sometimes be used.

    This is because the ThreadDirector allocates threads (meaning processes) based on various factors. The distribution is not predictable.

    If you want to avoid using the efficient cores you have to use the affinity mask (for the whole program) and select only the performance cores.

     

    P.S.: this is for WIN64 programs.

     

    That is what I'm using.  I've done some testing with the parallel for, for different numbers of threads, say 1..16, 1..20, 1..100, 1..500, etc.  All I measured was the time to complete the run, with each task being the same size.  If I'm running 20 threads on a CPU with 8 performance cores and 4 efficient cores, it seems to assign the tasks across all cores.  When the number of tasks gets into the hundreds, it still assigns tasks to the efficient cores, but the performance cores are reassigned new tasks as they finish while the efficient cores are still running.  So when the number of tasks is in the hundreds (or more) it seems to naturally balance the load among the cores.

     


  8. Thanks. After I had posted the message and two updates, I searched for the words in the error message and found that the problem had been answered:

     

    "That is known (reported issue). There is some problem with migrating or applying Welcome screen layout after migration.

     When you launch IDE click Edit Layout on Welcome Screen. Reset Welcome Screen to default layout and then adjust it again the way you like it.

    Next time you start IDE it should run normally."

     

     


  9. I moved from an old computer to a new one.  I get the following message about not being able to add columns or rows when I try to bring up a VCL project that worked on the old computer.  I used the migration tool to copy my settings to the new computer.

     

    How can this be fixed?

     

    Also, I get a bunch of access errors after this.

     

    PS - also, this seems to happen with every VCL project  but not with console apps.

     

     

    Delphi Screenshot 2023-05-22 233309.jpg


  10. On some programs I get strange stack overflow messages.  For instance, in a program that I'm working on in Delphi 11 now.  In the IDE, I run it, enter some stuff into VCL controls, and click a button to run a procedure.  It all works.  As long as I have it running, I can run the procedure again, with or without new parameters. 

     

    If I exit from the IDE, restart the IDE, DON'T make any changes to the source code, and try to run it, it gives a stack overflow message at the first line of the procedure when I click the button to run it.

     

    If I make ANY change to the source code, even adding a space and leaving it in or removing it, then I don't get the stack overflow message.

     

    Also, after I get the stack overflow message in the IDE, if I then exit the IDE and run the EXE, I get the stack overflow message.

     

    Does anyone know what is happening and how to fix it (other than making a change to the source code between each run)?

     

    Additional information: it happens in 64-bit mode but not 32-bit mode.

     


  11. Thanks, that is what I was wondering.  I know how to wait for ONE task to finish and to wait for ALL tasks to finish.  So for 9 tasks with 8 logical CPUs, I can wait for one to finish before starting the 9th one.  But what if there are, say, 20 tasks, and I don't want to run them all in parallel?  Is there an easy way to keep only 8 running in parallel at a time?
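    One possible way to cap the number running at once, sketched here under assumptions rather than offered as the recommended answer: start only 8 worker tasks and have each worker pull the next job number from a shared counter until the jobs run out. RunLimited and its parameters are hypothetical names; TTask.Run, TTask.WaitForAll and TInterlocked.Increment are the standard System.Threading and System.SyncObjs calls.

```pascal
uses
  System.SysUtils, System.Threading, System.SyncObjs;

// Hypothetical helper: runs Job(0) .. Job(JobCount - 1) with at most
// MaxParallel of them executing at the same time.
procedure RunLimited(JobCount, MaxParallel: Integer; Job: TProc<Integer>);
var
  Workers: TArray<ITask>;
  NextJob: Integer;
  W: Integer;
begin
  NextJob := -1; // shared counter; each Increment hands out the next job number
  SetLength(Workers, MaxParallel);
  for W := 0 to MaxParallel - 1 do
    Workers[W] := TTask.Run(
      procedure
      var
        J: Integer;
      begin
        J := TInterlocked.Increment(NextJob);
        while J < JobCount do
        begin
          Job(J);
          J := TInterlocked.Increment(NextJob);
        end;
      end);
  TTask.WaitForAll(Workers); // returns once every job has been processed
end;
```

    A call such as RunLimited(20, 8, MyJobProc) would then work through 20 jobs with no more than 8 in flight. Because a free worker immediately takes the next job, faster cores end up doing more of the jobs.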


  12. If you have started more TTasks than you have logical CPUs, does it do them in parallel or does it finish one before starting the next one in the queue?  Example, you have a CPU with 8 logical cores and start 9 TTasks.  Does it finish one of the first 8 before starting #9?


  13. 11 hours ago, David Heffernan said:

     

    You are probably misinterpreting memory stats from a task manager program. 

    Maybe.  It was a few years ago.  I was getting the memory available inside my program too, I think.

     

    But last night I had a good insight - rather than set up the 100,000,000+ potential buckets when only a few thousand of them  will have something in them, do a pass through the data to see what buckets will have something in them and set up only that many buckets!

     

    I haven't had a chance to work on that yet, but it should be faster than quicksort and make the final processing step faster too.
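    A rough sketch of that two-pass idea, with the details assumed: the bucket key here is simply the value mod the number of potential buckets (a placeholder for whatever the real key function is, and it assumes non-negative values), and BuildBuckets, TBuckets and Slots are made-up names. The first pass finds which buckets are actually used and how big each one is; the second pass allocates exactly those buckets and fills them.

```pascal
uses
  System.SysUtils, System.Generics.Collections;

type
  TBuckets = TArray<TArray<Int64>>;

// Slots is owned by the caller and receives the mapping
// bucket key -> index into the returned array.
function BuildBuckets(const Data: TArray<Int64>; PotentialBuckets: Int64;
  Slots: TDictionary<Int64, Integer>): TBuckets;
var
  Counts, Fill: TArray<Integer>;
  V, Key: Int64;
  Slot, I: Integer;
begin
  // Pass 1: discover the used buckets and count their items.
  for V in Data do
  begin
    Key := V mod PotentialBuckets; // placeholder key function (assumption)
    if not Slots.TryGetValue(Key, Slot) then
    begin
      Slot := Slots.Count;
      Slots.Add(Key, Slot);
      SetLength(Counts, Slot + 1); // the new counter starts at 0
    end;
    Inc(Counts[Slot]);
  end;

  // Pass 2: allocate only the used buckets, at their exact size, and fill them.
  SetLength(Result, Slots.Count);
  for I := 0 to High(Result) do
    SetLength(Result[I], Counts[I]);
  SetLength(Fill, Slots.Count);
  for V in Data do
  begin
    Slot := Slots[V mod PotentialBuckets];
    Result[Slot][Fill[Slot]] := V;
    Inc(Fill[Slot]);
  end;
end;
```

    The buckets come out in first-seen order, not key order, so if the order matters the few thousand used keys would still need sorting afterwards.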


  14. On 10/11/2021 at 3:21 AM, Lars Fosdal said:

     

    Again, without actual code, it is not possible to theorize on the bottlenecks of your bucket sort vs quick sort comparison.

    I could extract the relevant code and give a real data set, but it would take some work and the data would be very large.  But several days ago I abandoned the bucket sort method and now the quicksort method is working and producing results. 

     

    I think the problem was because there are millions of buckets and almost all of them are empty, so there is a lot of work with memory.  (Actually, on two  small test cases, the bucket sort system was 30% and 3X faster than the quicksort version, but on big data sets, there are way too many buckets, almost all empty.)

     

    Also, I can multithread the version that uses quicksort.  I wouldn't be able to do that effectively with the bucket sort system because of the large memory.  (The workstations here have 128GB of RAM, and one instance was taking up nearly half of it.)

     

    ---------------

    A few years ago when I had a large dynamic array of short tLists, it seemed that it was reserving 4KB for each list, even if the list was much smaller.  Is that how the memory manager works?
