Jud

Replacement for TBits?


TBits is still limited to 2^31 bits, even on the 64-bit platform.  Is there a replacement for it that will allow more than 2^31 bits (i.e. an Int64 bit count)?

 

(I could write one, but a lot of people are more efficient at this kind of thing than I am.)


The maximum size TBits can be before bugging out is Integer.MaxValue - 31. That's roughly 256 megabytes of consecutive boolean storage space. You really need that much?

10 hours ago, Der schöne Günther said:

The maximum size TBits can be before bugging out is Integer.MaxValue - 31. That's roughly 256 megabytes of consecutive boolean storage space. You really need that much?

Yes, I need more.  I currently need about 300 billion bits. I can write it myself, but I thought there might be one already available.

3 hours ago, Jud said:

I currently need about 300 billion bits.

Wow! If my math is correct that needs more than 32GB of memory.

 

I have no idea what the purpose is, but perhaps there are other approaches to achieve the same.
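The arithmetic behind that estimate is easy to check; a quick sketch (the 300 billion figure is from the post above):

```python
# Back-of-envelope check of the memory a flat bit vector would need.
bits = 300_000_000_000          # ~300 billion bits, as stated above
bytes_needed = bits // 8        # 8 bits per byte
gib = bytes_needed / 2**30      # convert to GiB
print(round(gib, 1))            # roughly 35 GiB, i.e. "more than 32GB"
```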

5 hours ago, Jud said:

Yes, I need more.  I currently need about 300 billion bits. I can write it myself, but I thought there might be one already available.

How many such instances of this type do you need in memory at any time? And what's the expected number of bits that are set at any time? Do you really need to store all bits, both 0s and 1s? Can't you just store the 1s and infer the 0s from the fact that they aren't stored as 1s?

32 minutes ago, David Heffernan said:

How many such instances of this type do you need in memory at any time? And what's the expected number of bits that are set at any time? Do you really need to store all bits, both 0s and 1s? Can't you just store the 1s and infer the 0s from the fact that they aren't stored as 1s?

 

I estimate that about 5% of the bits will be 1s.  They will be read-only in the actual run, so the data will be shared among 20 threads.  I don't understand how storing just the 1s would work.  That would result in a series of about 15,000,000,000 1s, but that would be useless.

 

 

3 hours ago, Uwe Raabe said:

Wow! If my math is correct that needs more than 32GB of memory.

 

I have no idea what the purpose is, but perhaps there are other approaches to achieve the same.

1. 32GB is nothing today.  Every computer here that it might be running on has at least 128GB of RAM.

 

2. No, this is the best approach.

5 hours ago, Jud said:

That would result in a series of about 15,000,000,000 1s, but that would be useless.

No. You use a hashed collection. Like a dictionary, but one without a value, which is known as a set. 

 

But at 5% full I don't think it will get you any benefit, because of the overhead of the hashed collection.
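A rough comparison backs that up. The sketch below compares a flat bitmap against a hash set that stores only the set bits, at 5% density; the per-entry overhead figure is an assumption (roughly 16 bytes for an 8-byte key plus table slack), and real figures vary by implementation:

```python
# Flat bitmap vs hash set of set-bit indices, at 5% density.
n_bits = 300_000_000_000
bitmap_bytes = n_bits // 8            # 1 bit per position
set_entries = n_bits * 5 // 100       # 5% of bits are 1
bytes_per_entry = 16                  # assumed hash-set overhead per entry
hashset_bytes = set_entries * bytes_per_entry
print(hashset_bytes > bitmap_bytes)   # True: the set costs far more here
```

At this density the hash set needs several times the memory of the bitmap, and each probe is slower than a direct bit test, so the sparse representation only pays off at much lower fill ratios.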


How about storing this in a 2D image as a bitfield, of maybe 17320x17320 pixels?

Of course storage might be more efficient, but access would be slower.

 

Thinking further: using such a bitfield with involvement of the GPU...

2 hours ago, Rollo62 said:

How about storing this in a 2D image as a bitfield, of maybe 17320x17320 pixels?

How could this be more efficient than a contiguous array? 

2 hours ago, Rollo62 said:

Thinking further: using such a bitfield with involvement of the GPU...

What about the cost of moving data between GPU and main memory? The benefits of a GPU are in highly parallel computation. Where is that in this scenario? Just saying GPU doesn't make something fast or efficient.


Like I said, just an alternative idea, with pros and cons.

Only @Jud may know if this is workable for him somehow, or not.

 

 


EDIT:

 

Bad brain day. Of course there is no problem getting whatever chunk of memory you want out of the available address space. 

Edited by Brandon Staggs

18 minutes ago, Brandon Staggs said:

I would not want to count on always being able to allocate a 32 GB block, even if I could guarantee all my hardware had 128 GB of RAM. If you don't want to use a hash bucket for your 5% of items that are true, because of performance, you are going to have to write your own class for this. Personally I would stripe it into 1GB chunks or something. I have written my own bits class because I wanted fast boolean operations on all of the bits with other instances. It's not a hard thing to design, and striping your values across multiple allocated blocks would not add much overhead.

Why would you not expect to get a 32GB block when the available address space is 128TB?

unit Bitset;

interface

uses
  SysUtils, Math;

type
  TBitSet = record
  private
    FBitCount: Int64;
    FSets: array of set of 0..255;
    class function SetCount(BitCount: Int64): Int64; static;
    procedure MakeUnique;
    procedure GetSetIndexAndBitIndex(Bit: Int64; out SetIndex: Int64; out BitIndex: Integer);
    function GetIsEmpty: Boolean;
    procedure SetBitCount(Value: Int64);
    function GetSize: Int64;
  public
    class operator In(const Bit: Int64; const BitSet: TBitSet): Boolean;
    class operator Equal(const bs1, bs2: TBitSet): Boolean;
    class operator NotEqual(const bs1, bs2: TBitSet): Boolean;
    property BitCount: Int64 read FBitCount write SetBitCount;
    property Size: Int64 read GetSize;
    property IsEmpty: Boolean read GetIsEmpty;
    procedure Clear;
    procedure IncludeAll;
    procedure Include(const Bit: Int64);
    procedure Exclude(const Bit: Int64);
  end;


implementation

{ TBitSet }

procedure TBitSet.MakeUnique;
begin
  // this is used to implement copy-on-write so that the type behaves like a value
  SetLength(FSets, Length(FSets));
end;

procedure TBitSet.GetSetIndexAndBitIndex(Bit: Int64; out SetIndex: Int64; out BitIndex: Integer);
begin
  Assert(InRange(Bit, 0, FBitCount-1));
  SetIndex := Bit shr 8;   // shr 8   = div 256
  BitIndex := Bit and 255; // and 255 = mod 256
end;

function TBitSet.GetIsEmpty: Boolean;
var
  i: Int64;
begin
  for i := 0 to High(FSets) do begin
    if FSets[i]<>[] then begin
      Result := False;
      Exit;
    end;
  end;

  Result := True;
end;

procedure TBitSet.SetBitCount(Value: Int64);
var
  Bit, BitIndex: Integer;
  SetIndex: Int64;
begin
  if (Value<>FBitCount) or not Assigned(FSets) then begin
    Assert(Value>=0);
    FBitCount := Value;
    SetLength(FSets, SetCount(Value));
    if Value>0 then begin
    (* Ensure that unused bits are cleared, necessary given the CompareMem call in Equal. This also
         means that state does not persist when we decrease and then increase BitCount. For instance,
         consider this code:
           var
             bs: TBitSet;
           ...
           bs.BitCount := 2;
           bs.Include(1);
           bs.BitCount := 1;
           bs.BitCount := 2;
           Assert(not (1 in bs)); *)
      GetSetIndexAndBitIndex(Value - 1, SetIndex, BitIndex);
      for Bit := BitIndex + 1 to 255 do begin
        System.Exclude(FSets[SetIndex], Bit);
      end;
    end;
  end;
end;

function TBitSet.GetSize: Int64;
begin
  // Cast before multiplying: Length returns Integer, so the product could
  // overflow 32 bits for very large arrays without the Int64 cast.
  Result := Int64(Length(FSets))*SizeOf(FSets[0]);
end;

class function TBitSet.SetCount(BitCount: Int64): Int64;
begin
  Result := (BitCount + 255) shr 8; // shr 8 = div 256
end;

class operator TBitSet.In(const Bit: Int64; const BitSet: TBitSet): Boolean;
var
  SetIndex: Int64;
  BitIndex: Integer;
begin
  BitSet.GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
  Result := BitIndex in BitSet.FSets[SetIndex];
end;

class operator TBitSet.Equal(const bs1, bs2: TBitSet): Boolean;
begin
  Result := (bs1.FBitCount=bs2.FBitCount)
    and CompareMem(Pointer(bs1.FSets), Pointer(bs2.FSets), bs1.Size);
end;

class operator TBitSet.NotEqual(const bs1, bs2: TBitSet): Boolean;
begin
  Result := not (bs1=bs2);
end;

procedure TBitSet.Clear;
var
  i: Int64;
begin
  MakeUnique;
  for i := 0 to High(FSets) do begin
    FSets[i] := [];
  end;
end;

procedure TBitSet.IncludeAll;
var
  i: Int64;
begin
  for i := 0 to BitCount-1 do begin
    Include(i);
  end;
end;

procedure TBitSet.Include(const Bit: Int64);
var
  SetIndex: Int64;
  BitIndex: Integer;
begin
  MakeUnique;
  GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
  System.Include(FSets[SetIndex], BitIndex);
end;

procedure TBitSet.Exclude(const Bit: Int64);
var
  SetIndex: Int64;
  BitIndex: Integer;
begin
  MakeUnique;
  GetSetIndexAndBitIndex(Bit, SetIndex, BitIndex);
  System.Exclude(FSets[SetIndex], BitIndex);
end;

end.

This is based on code of mine that is limited to an integer bit count. I've not tested it extended to Int64, but I'm sure anyone that wanted to use random code like this would test.
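The index math in GetSetIndexAndBitIndex above (split a global bit index into a 256-bit block index and an offset within the block) is easy to sanity-check outside Delphi. A minimal mirror of that decomposition:

```python
def split_index(bit: int) -> tuple[int, int]:
    """Mirror of GetSetIndexAndBitIndex: 256-bit blocks."""
    return bit >> 8, bit & 255   # shr 8 = div 256, and 255 = mod 256

# Recombining block and offset must give back the original index,
# even well past the 2^31 limit that TBits imposes.
for bit in (0, 255, 256, 300_000_000_000 - 1):
    block, offset = split_index(bit)
    assert block * 256 + offset == bit
print(split_index(1000))  # (3, 232)
```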

33 minutes ago, David Heffernan said:

Why would you not expect to get a 32GB block when the available address space is 128TB?

LOL, I don't know what I was thinking. That was obviously wrong. I'm deleting the post since it's pointless noise now.

17 hours ago, David Heffernan said:

No. You use a hashed collection. Like a dictionary, but one without a value, which is known as a set. 

 

But at 5% full I don't think it will get you any benefit, because of the overhead of the hashed collection.

 

Performance is an issue.  The plan is to do at least 10 trillion accesses to the bit vector - more if I can manage it.

 

12 hours ago, David Heffernan said:

What about the cost of moving data between GPU and main memory? The benefits of a GPU are in highly parallel computation. Where is that in this scenario? Just saying GPU doesn't make something fast or efficient.

Interesting idea, but I don't know how to deal with the GPU.  This will be running 20 instances in parallel, but the accesses will be pretty much random.  The target calls for doing at least 10 trillion accesses.

11 hours ago, Rollo62 said:

Like I said, just an alternative idea, with pros and cons.

Only @Jud may know if this is workable for him somehow, or not.

 

 

I doubt that it would be workable, since the data is naturally one-dimensional, and the program just needs to see whether a bit is 1 or 0, and it needs to do that a large number of times.

11 minutes ago, Jud said:

This will be running 20 instances in parallel, but the accesses will be pretty much random.  The target calls for doing at least 10 trillion accesses.

Depending on what you do when you query these bits you may find that the memory is the bottleneck and threading won't help. Did you do any benchmarking yet? 

 

Also, if performance matters then Delphi is invariably not the answer.

Edited by David Heffernan

9 hours ago, David Heffernan said:

unit Bitset;

[full TBitSet listing snipped; identical to the code posted above]

This is based on code of mine that is limited to an integer bit count. I've not tested it extended to Int64, but I'm sure anyone that wanted to use random code like this would test.

That looks like what I'm looking for!!! I'm going to try it shortly.  Thanks!

49 minutes ago, David Heffernan said:

Depending on what you do when you query these bits you may find that the memory is the bottleneck and threading won't help. Did you do any benchmarking yet? 

 

Also, if performance matters then Delphi is invariably not the answer.

No benchmarking with the bit vector yet, but I've definitely encountered the memory bottleneck.  In old tests with 8 threads (hyperthreaded 4-core i7), I would get about a 5.5x speedup over a single thread on CPU-intensive tasks.  But on memory-intensive tests, I would get around 3x on 16 threads.

1 hour ago, David Heffernan said:

Depending on what you do when you query these bits you may find that the memory is the bottleneck and threading won't help. Did you do any benchmarking yet? 

 

Also, if performance matters then Delphi is invariably not the answer.

 

Actually, I did benchmark.  The first version of this program read a 30GB boolean array from a drive and used that.  Naturally I did a single-threaded version before multithreading.  I forget the ratio, but 20 threads on a 12th-generation i7 was something like 8-9x faster than the single-threaded version.  Now I'm working on replacing the 30GB boolean array with a much larger bit vector.

 


Hyperthreading is probably useless. I've never found a task which benefits from it. I guess they must exist though, or is it just pure marketing?

 

Modern AMD chipsets seem to have remarkable scaling though.

