Jump to content

Jud

Members
  • Content Count

    153
  • Joined

  • Last visited

Posts posted by Jud


  1. 1 hour ago, David Heffernan said:

    Depending on what you do when you query these bits you may find that the memory is the bottleneck and threading won't help. Did you do any benchmarking yet? 

     

    Also if performance matters then delphi is invariably not the answer. 

     

    Actually I did benchmarking.  The first version of this program read a 30GB boolean array from a drive and used that.  Naturally I did a single-thread before multithreading.  I forgot the ratio, but 20 threads on a 12th generation i7 was something like 8-9x faster than the single-thread version.  Now I'm working on replacing the 30GB boolean array with a much larger bit vector.

     


  2. 49 minutes ago, David Heffernan said:

    Depending on what you do when you query these bits you may find that the memory is the bottleneck and threading won't help. Did you do any benchmarking yet? 

     

    Also if performance matters then delphi is invariably not the answer. 

    No benchmarking with the bit vector yet, but I've definitely encountered the memory bottleneck.  In old tests with 8 threads (hyperthreaded 4-core i7), I would get about 5.5x over a single thread on CPU-intensive tasks.  But on tests that were memory-intensive, I would get around 3x on 16 threads.


  3. 9 hours ago, David Heffernan said:
    
    unit Bitset;
    
    interface
    
    uses
      SysUtils, Math;
    
    type
      TBitSet = record
      private
        FBitCount: Int64;
        FSets: array of set of 0..255;
        class function SetCount(BitCount: Int64): Int64; static;
        procedure MakeUnique;
        procedure GetSetIndexAndBitIndex(Bit: Int64; out SetIndex: Int64; out BitIndex: Integer);
        function GetIsEmpty: Boolean;
        procedure SetBitCount(Value: Int64);
        function GetSize: Int64;
      public
        class operator In(const Bit: Int64; const BitSet: TBitSet): Boolean;
        class operator Equal(const bs1, bs2: TBitSet): Boolean;
        class operator NotEqual(const bs1, bs2: TBitSet): Boolean;
        property BitCount: Int64 read FBitCount write SetBitCount;
        property Size: Int64 read GetSize;
        property IsEmpty: Boolean read GetIsEmpty;
        procedure Clear;
        procedure IncludeAll;
        procedure Include(const Bit: Int64);
        procedure Exclude(const Bit: Int64);
      end;
    
    
    implementation
    
    { TBitSet }
    
    procedure TBitSet.MakeUnique;
    begin
      // this is used to implement copy-on-write so that the type behaves like a value
      SetLength(FSets, Length(FSets));
    end;
    
    procedure TBitSet.GetSetIndexAndBitIndex(Bit: Int64; out SetIndex: Int64; out BitIndex: Integer);
    begin
      Assert(InRange(Bit, 0, FBitCount-1));
      SetIndex := Bit shr 8;   // shr 8   = div 256
      BitIndex := Bit and 255; // and 255 = mod 256
    end;
    
    function TBitSet.GetIsEmpty: Boolean;
    var
      i: Int64;
    begin
      for i := 0 to High(FSets) do begin
        if FSets[i]<>[] then begin
          Result := False;
          Exit;
        end;
      end;
    
      Result := True;
    end;
    
    procedure TBitSet.SetBitCount(Value: Int64);
    var
      Bit, BitIndex: Integer;
      SetIndex: Int64;
    begin
      if (Value<>FBitCount) or not Assigned(FSets) then begin
        Assert(Value>=0);
        FBitCount := Value;
        SetLength(FSets, SetCount(Value));
        if Value>0 then begin
          (* Ensure that unused bits are cleared, necessary give the CompareMem call in Equal. This also
             means that state does not persist when we decrease and then increase BitCount. For instance,
             consider this code:
               var
                 bs: TBitSet;
               ...
               bs.BitCount := 2;
               bs.Include(1);
               bs.BitCount := 1;
               bs.BitCount := 2;
               Assert(not (1 in bs)); *)
          GetSetIndexAndBitIndex(Value - 1, SetIndex, BitIndex);
          for Bit := BitIndex + 1 to 255 do begin
            System.Exclude(FSets[SetIndex], Bit);
          end;
        end;
      end;
    end;
    
    function TBitSet.GetSize: Int64;
    begin
      Result := Length(FSets)*SizeOf(FSets[0]);
    end;
    
    class function TBitSet.SetCount(BitCount: Int64): Int64;
    begin
      Result := (BitCount + 255) shr 8; // shr 8 = div 256
    end;
    
    class operator TBitSet.In(const Bit: Int64; const BitSet: TBitSet): Boolean;
    var
      SetIndex: Int64;
      BitIndex: Integer;
    begin
      BitSet.GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
      Result := BitIndex in BitSet.FSets[SetIndex];
    end;
    
    class operator TBitSet.Equal(const bs1, bs2: TBitSet): Boolean;
    begin
      Result := (bs1.FBitCount=bs2.FBitCount)
        and CompareMem(Pointer(bs1.FSets), Pointer(bs2.FSets), bs1.Size);
    end;
    
    class operator TBitSet.NotEqual(const bs1, bs2: TBitSet): Boolean;
    begin
      Result := not (bs1=bs2);
    end;
    
    procedure TBitSet.Clear;
    var
      i: Int64;
    begin
      MakeUnique;
      for i := 0 to High(FSets) do begin
        FSets[i] := [];
      end;
    end;
    
    procedure TBitSet.IncludeAll;
    var
      i: Int64;
    begin
      for i := 0 to BitCount-1 do begin
        Include(i);
      end;
    end;
    
    procedure TBitSet.Include(const Bit: Int64);
    var
      SetIndex: Int64;
      BitIndex: Integer;
    begin
      MakeUnique;
      GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
      System.Include(FSets[SetIndex], BitIndex);
    end;
    
    procedure TBitSet.Exclude(const Bit: Int64);
    var
      SetIndex: Int64;
      BitIndex: Integer;
    begin
      MakeUnique;
      GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
      System.Exclude(FSets[SetIndex], BitIndex);
    end;
    
    end.

    This is based on code of mine that has is limited to integer bit count. I've not tested it extended to Int64, but I'm sure anyone that wanted to use random code like this would test.

    That looks like what I'm looking for!!! I'm going to try it shortly.  Thanks!


  4. 9 hours ago, David Heffernan said:
    
    unit Bitset;
    
    interface
    
    uses
      SysUtils, Math;
    
    type
      TBitSet = record
      private
        FBitCount: Int64;
        FSets: array of set of 0..255;
        class function SetCount(BitCount: Int64): Int64; static;
        procedure MakeUnique;
        procedure GetSetIndexAndBitIndex(Bit: Int64; out SetIndex: Int64; out BitIndex: Integer);
        function GetIsEmpty: Boolean;
        procedure SetBitCount(Value: Int64);
        function GetSize: Int64;
      public
        class operator In(const Bit: Int64; const BitSet: TBitSet): Boolean;
        class operator Equal(const bs1, bs2: TBitSet): Boolean;
        class operator NotEqual(const bs1, bs2: TBitSet): Boolean;
        property BitCount: Int64 read FBitCount write SetBitCount;
        property Size: Int64 read GetSize;
        property IsEmpty: Boolean read GetIsEmpty;
        procedure Clear;
        procedure IncludeAll;
        procedure Include(const Bit: Int64);
        procedure Exclude(const Bit: Int64);
      end;
    
    
    implementation
    
    { TBitSet }
    
    procedure TBitSet.MakeUnique;
    begin
      // this is used to implement copy-on-write so that the type behaves like a value
      SetLength(FSets, Length(FSets));
    end;
    
    procedure TBitSet.GetSetIndexAndBitIndex(Bit: Int64; out SetIndex: Int64; out BitIndex: Integer);
    begin
      Assert(InRange(Bit, 0, FBitCount-1));
      SetIndex := Bit shr 8;   // shr 8   = div 256
      BitIndex := Bit and 255; // and 255 = mod 256
    end;
    
    function TBitSet.GetIsEmpty: Boolean;
    var
      i: Int64;
    begin
      for i := 0 to High(FSets) do begin
        if FSets[i]<>[] then begin
          Result := False;
          Exit;
        end;
      end;
    
      Result := True;
    end;
    
    procedure TBitSet.SetBitCount(Value: Int64);
    var
      Bit, BitIndex: Integer;
      SetIndex: Int64;
    begin
      if (Value<>FBitCount) or not Assigned(FSets) then begin
        Assert(Value>=0);
        FBitCount := Value;
        SetLength(FSets, SetCount(Value));
        if Value>0 then begin
          (* Ensure that unused bits are cleared, necessary give the CompareMem call in Equal. This also
             means that state does not persist when we decrease and then increase BitCount. For instance,
             consider this code:
               var
                 bs: TBitSet;
               ...
               bs.BitCount := 2;
               bs.Include(1);
               bs.BitCount := 1;
               bs.BitCount := 2;
               Assert(not (1 in bs)); *)
          GetSetIndexAndBitIndex(Value - 1, SetIndex, BitIndex);
          for Bit := BitIndex + 1 to 255 do begin
            System.Exclude(FSets[SetIndex], Bit);
          end;
        end;
      end;
    end;
    
    function TBitSet.GetSize: Int64;
    begin
      Result := Length(FSets)*SizeOf(FSets[0]);
    end;
    
    class function TBitSet.SetCount(BitCount: Int64): Int64;
    begin
      Result := (BitCount + 255) shr 8; // shr 8 = div 256
    end;
    
    class operator TBitSet.In(const Bit: Int64; const BitSet: TBitSet): Boolean;
    var
      SetIndex: Int64;
      BitIndex: Integer;
    begin
      BitSet.GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
      Result := BitIndex in BitSet.FSets[SetIndex];
    end;
    
    class operator TBitSet.Equal(const bs1, bs2: TBitSet): Boolean;
    begin
      Result := (bs1.FBitCount=bs2.FBitCount)
        and CompareMem(Pointer(bs1.FSets), Pointer(bs2.FSets), bs1.Size);
    end;
    
    class operator TBitSet.NotEqual(const bs1, bs2: TBitSet): Boolean;
    begin
      Result := not (bs1=bs2);
    end;
    
    procedure TBitSet.Clear;
    var
      i: Int64;
    begin
      MakeUnique;
      for i := 0 to High(FSets) do begin
        FSets[i] := [];
      end;
    end;
    
    procedure TBitSet.IncludeAll;
    var
      i: Int64;
    begin
      for i := 0 to BitCount-1 do begin
        Include(i);
      end;
    end;
    
    procedure TBitSet.Include(const Bit: Int64);
    var
      SetIndex: Int64;
      BitIndex: Integer;
    begin
      MakeUnique;
      GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
      System.Include(FSets[SetIndex], BitIndex);
    end;
    
    procedure TBitSet.Exclude(const Bit: Int64);
    var
      SetIndex: Int64;
      BitIndex: Integer;
    begin
      MakeUnique;
      GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
      System.Exclude(FSets[SetIndex], BitIndex);
    end;
    
    end.

    This is based on code of mine that has is limited to integer bit count. I've not tested it extended to Int64, but I'm sure anyone that wanted to use random code like this would test.

     


  5. 11 hours ago, Rollo62 said:

    Like said, just an alternative idea, with pros and cons.

    Only @Jud may know if this is workable for him somehow, or not.

     

     

    I doubt that it would be workable, since they are natually one dimentional, and the program just needs to see if a bit is 1 or 0, and it needs to do a large number of them.


  6. 12 hours ago, David Heffernan said:

    What about the cost of moving data between gpu and main memory? Benefits of gpu are in highly parallel computation. Where is that in this scenario. Just saying GPU doesn't make something fast or efficient. 

    Interesting idea, but I don't know how to deal with the GPU.  This will be running 20 instances in parallel, but the accesses will be pretty much random.  The target calls for doing at least 10 trillion accesses.


  7. 17 hours ago, David Heffernan said:

    No. You use a hashed collection. Like a dictionary, but one without a value, which is known as a set. 

     

    But at 5% full then I don't think it will get you any benefit, because of the overhead of the hashed collection. 

     

    Performance is an issue.  The plan is to do at least 10 trillion accesses to the bit vector - more if I can manage it.

     


  8. 15 hours ago, David Heffernan said:

    Show complete but minimal code please

    Here is a short demo.  With Delphi 11.3 I get a compiler error on the line with using BlockWrite as a function.  The documentation says that BlockWrite is a function that returns an integer.

     

    procedure ErrorDemo;
    var CountNewData, WriteResult : integer;
        OutFile : file;
        NewData : array of int64;
    begin
    AssignFile( OutFile, 'c:\NewData.data');
    rewrite( OutFile, 8);
    WriteResult := BlockWrite( OutFile, NewData[ 0], CountNewData, WriteResult);
    { compiler error - E2010 incompatible types - integer and procedure }
    BlockWrite( OutFile, NewData[ 0], CountNewData, WriteResult); // no error
    CloseFile( OutFile);
    end;

     


  9. This is the first time I'm using BlockRead and BlockWrite in Delphi 11,3.  I've used them in previous versions.  I thought they were functions, returning an integer.  The help file says so and the source code says so.  But I'm getting an E2010 error when trying to assign the result to an integer, e.g.

     

    RecordsWritten := BlockWrite( OutFile, ...

     

    Just calling it like a procedure works.  I'm attaching a screenshot of the error message.  Is this a bug or am I wrong?

     

    BlockWrite E2010.jpg


  10. 3 hours ago, Uwe Raabe said:

    Wow! If my math is correct that needs more than 32GB of memory.

     

    I have no idea what the purpose is, but perhaps there are other approaches to achieve the same.

    1. 32GB is nothing today.  Every computer here that it might be running on has at last 128GB of RAM.

     

    2. No, this is the best approach.


  11. 32 minutes ago, David Heffernan said:

    How many such instances of this type do you need in memory at any time? And what's the expected number of bits that are set at any time? Do you really need to store all bits, both 0 and 1. Can't you just stor the 1s and infer the 0s from the fact that they aren't stored as 1s?

     

    I estimate that about 5% of the bits will be 1s.  They will be read-only in the actual run, so the data will be shared among 20 threads.  I don't understand how storing just the 1s would work.  That would result in a series of about 15,000,000 1s, but that would be useless.

     

     


  12. 10 hours ago, Der schöne Günther said:

    The maximum size TBits can be before bugging out is Integer.MaxValue - 31. That's roughly 256 Megabytes of consecutive boolean storage spage. You really need that much?

    Yes, I need more.  I current;y need about 300 billion bits. I can write it myself, but I thought there might be one already available.


  13. On 6/24/2023 at 4:01 AM, dummzeuch said:

    There are usually some tasks that do not need to run on the performance cores, so setting the affinity mask for the whole program may not be the best strategy, even though it's the easiest way. But I'm sure that sooner or later Windows will start ignoring those masks because everybody sets them.

     

    Of course this is currently the only way to do that for threads generated using parallel for.

     

    Well, I need all of the power I have available.

     

    BUT - I realized a fallacy in my thinking and analysis.  I assumed that if I ran 16 threads with the parallel for loop, they would all be on performance cores, and that if I ran 20 threads, the extra 4 would go to the efficient cores.  But after more experimentation, parallel for seems to put the threads on any core.  I thought of timing how long each thread took to run, and with, say 20 threads, there wasn't that much difference.  And running 20 threads with parallel for (with 8 P=cores and 4 E-cores), the performance was 9-10% better than running 16 threads.  So it is using the E-cores too.  And running 100 or so threads, it pretty much evens out the difference between the E and P cores (because the ones on the P-cores finish sooner and get reassigned another thread.)


  14. 5 hours ago, DelphiUdIT said:

    You can try to call this:

    
    function uCoreId: uint64; register;
    asm
      //from 10 gen it reads the IA32_TSC_AUX MSR which should theoretically
      //indicate the CORE (THREAD in the case of HyperThread processors) in which rdpid runs
      rdpid RAX;
    end;

    This function returns the ID of the core (means CORE THREAD) in wich rdpid runs. It works form Intel 10th generation CPU.

     

    The first Core Thread is numbered 0 (zero).

     

    You will see that also the efficients core will be sometimes used.

    This is because the ThreadDirector allocates threads (meaning processes) based on various factors. The distribution is not predictable.

    If you want to avoid using the efficient cores you have to use the affinity mask (for the whole program) and select only the performance cores.

     

    P.S.: this is for WIN64 program.

     

    That is what I'm using.  I've done some testing with the parallel for, for different numbers of threads, say 1..16, 1..20, 1..100, 1..500,  etc.  All I used was the time to complete the run, with each task being the same size.  It seems that if I'm running 20 threads on a CPU with 8 performance cores and 4 efficient cores, it seems to be assigning the tasks across all CPUs.  But if the number of tasks gets into the hundreds, it is assigning tasks to the efficient cores, but with a few hundred tasks, as the performance cores are resassigned when they finish while the efficient cores are still running.  So when the number of tasks is in the hundreds (or more) it seems to naturally balance the load among the cores.

     


  15. Thanks. After I had posted the message and two updates, I searched for the words in the error message and found that the problem had been answered:

     

    "That is known (reported issue). There is some problem with migrating or applying Welcome screen layout after migration.

     When you launch IDE click Edit Layout on Welcome Screen. Reset Welcome Screen to default layout and then adjust it again the way you like it.

    Next time you start IDE it should run normally."

     

     


  16. I moved from an old computer to a new one,  I get the following message about not being able to add columns or rows when I try to bring up a VCL project that worked on the old computer.  I used the migration tool to copy my settings to the new computer.

     

    How can this be fixed?

     

    Also, I get a bunch of access errors after this.

     

    PS - also, this seems to happen with every VCL project  but not with console apps.

     

     

    Delphi Screenshot 2023-05-22 233309.jpg

×