Jud

Replacement for TBits?


Intel's blurb: "Under the right circumstances, the technology lets CPU cores effectively do two things at once." - but it doesn't really outline the circumstances.

https://en.wikipedia.org/wiki/Hyper-threading has some interesting observations, making me wonder if I actually would be better off by disabling Hyper-Threading...


This goes back a few years, but when I got my four-core hyperthreaded i7, hyperthreading could get about 1.5-1.6 times the performance of a single core in my tests on CPU-intensive tasks, and less than that on memory-intensive tasks.

On 6/29/2023 at 12:53 AM, Jud said:

 

Performance is an issue.  The plan is to do at least 10 trillion accesses to the bit vector - more if I can manage it.

 

That's too bad - otherwise a memory mapped file might do the job.

7 hours ago, A.M. Hoornweg said:

otherwise a memory mapped file might do the job

Why would a memory mapped file be remotely helpful here?

16 hours ago, David Heffernan said:

Why would a memory mapped file be remotely helpful here?

For size reasons. Memory mapped files let you

- use contiguous arrays bigger than available RAM, mapping file-backed data into the 64-bit virtual address space

- use simple pointer arithmetic to access individual bytes; it is dead easy to implement SetBit(), GetBit() etc. (see the sketch below)

- let the operating system's cache algorithm handle the intricacies of swapping pages in and out (LRU/MRU etc.)

- benefit from the speed and low latency of modern SSDs

- have the data on disk, ready to be re-used

Speed-wise this is only an option if the operating system can minimize swapping, so access to the elements shouldn't be totally random. If the probability of accessing an element follows some kind of bell curve, then it might just work.
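For illustration, a rough sketch (mine, not tested against the OP's workload) of what such SetBit()/GetBit() routines could look like over a raw byte buffer, assuming Win64 and that Base points at an already-mapped view:

// Bit access over a raw byte buffer such as a view returned by MapViewOfFile().
procedure SetBit(Base: PByte; Index: UInt64; Value: Boolean);
var
  P: PByte;
  Mask: Byte;
begin
  P := PByte(NativeUInt(Base) + NativeUInt(Index shr 3)); // byte holding the bit
  Mask := 1 shl Integer(Index and 7);                     // bit position inside that byte
  if Value then
    P^ := P^ or Mask
  else
    P^ := Byte(P^ and not Mask);
end;

function GetBit(Base: PByte; Index: UInt64): Boolean;
begin
  Result := (PByte(NativeUInt(Base) + NativeUInt(Index shr 3))^
    and (1 shl Integer(Index and 7))) <> 0;
end;

The same two routines work unchanged whether Base comes from MapViewOfFile(), VirtualAlloc() or GetMem().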

 


6 hours ago, A.M. Hoornweg said:

For size reasons. Memory mapped files let you [...]

Doesn't memory already handle all of this? I mean, the paging system does all of this already surely? 

14 minutes ago, David Heffernan said:

Doesn't memory already handle all of this? I mean, the paging system does all of this already surely? 

I have never tried to allocate such enormous amounts of memory using Delphi's heap manager (FastMM4), so I really don't know how it behaves if you try to allocate one huge chunk that is bigger than what the machine has physically. The documentation says: "For Win32 and Win64, the default FastMM Memory Manager is optimized for applications that allocate large numbers of small- to medium-sized blocks, as is typical for object-oriented applications and applications that process string data."

 

MapViewOfFile() bypasses the Delphi heap completely and leaves it up to Windows to map a contiguous block of virtual memory. 
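Roughly like this (a bare-bones sketch; error handling is minimal and the file name and size constants are just placeholders):

uses
  Winapi.Windows, System.SysUtils;

// Create a ~37.5 GB file-backed mapping and map all of it as one
// contiguous block of virtual memory.
procedure MapBigBitFile;
const
  BITS_FILE = 'd:\bigbits.dat';
  TOTAL_BYTES: UInt64 = 300000000000 div 8; // 300 billion bits = 37.5 GB
var
  hFile, hMapping: THandle;
  View: PByte;
begin
  hFile := CreateFile(PChar(BITS_FILE), GENERIC_READ or GENERIC_WRITE, 0, nil,
    OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;

  // CreateFileMapping grows the file to the requested size.
  hMapping := CreateFileMapping(hFile, nil, PAGE_READWRITE,
    DWORD(TOTAL_BYTES shr 32), DWORD(TOTAL_BYTES and $FFFFFFFF), nil);
  if hMapping = 0 then
    RaiseLastOSError;

  // Length 0 = map the whole file; Windows pages it in and out as needed.
  View := MapViewOfFile(hMapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
  if View = nil then
    RaiseLastOSError;

  // ... SetBit/GetBit on View via pointer arithmetic ...

  UnmapViewOfFile(View);
  CloseHandle(hMapping);
  CloseHandle(hFile);
end;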

 

On 6/29/2023 at 9:11 AM, David Heffernan said:

Hyperthreading is probably useless. I've never found a task which benefits from it. I guess they must exist though, or is it just pure marketing?

Marketing.

I don't think they originally intended it as such, but since it at best doesn't hurt performance, that's what it became.

1 hour ago, David Heffernan said:

Doesn't memory already handle all of this? I mean, the paging system does all of this already surely? 

Yes, it does.

47 minutes ago, A.M. Hoornweg said:

I really don't know how it behaves if you try to allocate one huge chunk that is bigger than what the machine has physically.

Maybe now would be a good time to read up on what virtual memory is.

You don't have to allocate beyond physical memory before virtual memory comes into play. You just have to allocate beyond the process' working set. The working set is the part of your process' virtual memory that is backed by physical memory. The working set can grow and shrink depending on memory access and global resource pressure but there's always an upper and lower limit. Of course, it's a bit more complicated than that (there are more layers than what I have described) but that's the basic overview.
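If you want to watch this happen, a quick sketch (assuming Delphi's Winapi.PsAPI wrapper around GetProcessMemoryInfo; the function name is mine):

uses
  Winapi.Windows, Winapi.PsAPI;

// Report the current working set of this process in megabytes.
function CurrentWorkingSetMB: UInt64;
var
  Counters: TProcessMemoryCounters;
begin
  FillChar(Counters, SizeOf(Counters), 0);
  Counters.cb := SizeOf(Counters);
  if GetProcessMemoryInfo(GetCurrentProcess, @Counters, SizeOf(Counters)) then
    Result := Counters.WorkingSetSize div (1024 * 1024)
  else
    Result := 0;
end;

Allocate a big block, print CurrentWorkingSetMB before and after touching the pages, and you'll see the working set grow only as pages are actually accessed, and shrink again under memory pressure.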

In general, I would recommend that one doesn't try to outsmart things one doesn't fully understand. Be it threading, memory management, or women 🙂


8 hours ago, A.M. Hoornweg said:

 

I have never tried to allocate such enormous amounts of memory using Delphi's heap manager (FastMM4), so I really don't know how it behaves if you try to allocate one huge chunk that is bigger than what the machine has physically.

These huge blocks won't be allocated by FastMM. They will be passed through to VirtualAlloc. But we aren't talking about a chunk of memory greater than what the machine has physically anyway.

 

I'm rather sceptical that MMF would ever be appropriate here. 

On 7/4/2023 at 8:42 AM, A.M. Hoornweg said:

I have never tried to allocate such enormous amounts of memory using Delphi's heap manager (FastMM4), so I really don't know how it behaves if you try to allocate one huge chunk that is bigger than what the machine has physically.

I had the same initial reservations about it, so I tested it. It works fine.
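A test for this doesn't need to be more than something like this sketch (Win64; as noted above, FastMM passes a block this size straight through to VirtualAlloc):

procedure TestHugeAllocation;
const
  TOTAL_BYTES: UInt64 = 300000000000 div 8; // 37.5 GB
var
  P: PByte;
begin
  GetMem(P, TOTAL_BYTES);         // succeeds as long as RAM + page file can back it
  try
    FillChar(P^, TOTAL_BYTES, 0); // touching the pages brings them into the working set
    // ... exercise SetBit/GetBit over P here ...
  finally
    FreeMem(P);
  end;
end;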

14 hours ago, Brandon Staggs said:

I had the same initial reservations about it, so I tested it. It works fine.

Good to know!

 

On 7/4/2023 at 11:50 PM, David Heffernan said:

These huge blocks won't be allocated by FastMM. They will be passed through to VirtualAlloc. But we aren't talking about a chunk of memory greater than what the machine has physically anyway.

 

I'm rather sceptical that MMF would ever be appropriate here. 

In the simplest case an MMF is just a block of bytes; why would that be inappropriate? Just because it's allocated by a different API?

 

We happen to use them extensively for data acquisition; they have that nice little feature that allows us to share the same buffer between processes: one process collecting data, another evaluating it. There are tons of use cases for that.

1 hour ago, A.M. Hoornweg said:

In the simplest case an MMF is just a block of bytes; why would that be inappropriate? Just because it's allocated by a different API?

If you simply want a block of bytes, just use memory. Why use an MMF?

 

1 hour ago, A.M. Hoornweg said:

We happen to use them extensively for data acquisition; they have that nice little feature that allows us to share the same buffer between processes: one process collecting data, another evaluating it. There are tons of use cases for that.

Here’s an example of why you might use an MMF: for cross-process sharing.

 

But that isn't what this topic is about. I can't see anything in this topic that suggests that an MMF would add anything over plain memory.

 

Or have I missed something? 


MMFs are good for "random R/W access" - but not so great for sequential R/Ws of huge files.

Have a look at all the SO posts about systems crawling to a halt when using MMFs.

55 minutes ago, David Heffernan said:

But that isn't what this topic is about. I can't see anything in this topic that suggests that an MMF would add anything over plain memory.

 

Or have I missed something? 

David, the OP can just use whatever allocation method pleases him. The end result is a pointer to a memory block, whichever method he uses. It doesn't make one version worse than the other.

2 minutes ago, A.M. Hoornweg said:

David, the OP can just use whatever allocation method pleases him. The end result is a pointer to a memory block, whichever method he uses. It doesn't make one version worse than the other.

It's obviously more complicated for the programmer to use an MMF than plain memory. Unless there is a benefit to using an MMF, there's no point in taking on that complexity.

23 minutes ago, David Heffernan said:

It's obviously more complicated for the programmer to use an MMF than plain memory. Unless there is a benefit to using an MMF, there's no point in taking on that complexity.

Sure. But the OP has to rewrite TBits anyway because of its size limitations. I'd advise him to make the allocation/deallocation methods virtual to keep all options open.
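Something like this skeleton (names are illustrative; the bit-access methods would be the pointer-arithmetic routines sketched earlier):

type
  TBigBits = class
  private
    FBuffer: PByte;
    FBitCount: UInt64;
  protected
    // Override these two to switch between GetMem, VirtualAlloc or a
    // memory-mapped file without touching the bit logic.
    function AllocateBuffer(SizeInBytes: UInt64): PByte; virtual;
    procedure ReleaseBuffer(Buffer: PByte); virtual;
  public
    constructor Create(ABitCount: UInt64);
    destructor Destroy; override;
    // SetBit/GetBit as in the earlier sketch
  end;

constructor TBigBits.Create(ABitCount: UInt64);
begin
  inherited Create;
  FBitCount := ABitCount;
  FBuffer := AllocateBuffer((ABitCount + 7) div 8);
end;

destructor TBigBits.Destroy;
begin
  ReleaseBuffer(FBuffer);
  inherited;
end;

function TBigBits.AllocateBuffer(SizeInBytes: UInt64): PByte;
begin
  GetMem(Result, SizeInBytes);       // default: plain heap allocation
  FillChar(Result^, SizeInBytes, 0);
end;

procedure TBigBits.ReleaseBuffer(Buffer: PByte);
begin
  FreeMem(Buffer);
end;

A memory-mapped-file descendant only has to override AllocateBuffer/ReleaseBuffer with the CreateFileMapping/MapViewOfFile calls shown earlier.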


I still have no clue why the heck Jud needs 300 billion bits in memory in the first place.

Are there really no possible optimizations in the algorithm or the processing task?


@Rollo62 Consider the amount of data processed at CERN: according to https://home.web.cern.ch/science/computing they process more than 30 PB a year. Another example may come from astrophysics: the EHT, with enormous amounts of data (https://eventhorizontelescope.org/technology) gathered simultaneously all over the world and then correlated in Bonn and at MIT. Off the top of my head, just two examples with really high-volume data. I learned not to question the why... the who might be interesting, though, if the OP cares to reveal it.

In the end, 300 billion bits are a mere 37.5 gigabytes, BTW.


Yes, I think there are many applications: physics, math, measurements, even finance...

But I think Delphi is not the first choice for those applications.

Anyhow, it would be great to get a little more info about what the initial problem to be solved is, since it may have many completely different solution paths too.

Just having a huge chunk of bits is one; maybe some kind of bit compression could be another, especially if only 5% of the bits really need to be "1".

How about finding useful bit combinations and encoding larger chunks into something like an enhanced hex code or the like, to save a lot of space and fit into less memory?

 

Without a little more information, it's like poking in the fog.

 

 

 


10 minutes ago, Rollo62 said:

Without a little more information, it's like poking in the fog.

In the end, if he knows he has the RAM, there is no reason for him to bother with all of that. Writing a new TBits doesn't take long. If he needs to store the true/false condition of 300 billion items at once for fast analysis, it sounds to me like he's already made the right choice.

 

I remember an assembly coder once telling me he could not conceive of any program really needing four gigabytes of RAM. I thought that was silly because, at the time, people were bumping up against the limits of a 4 GB disc platter, which is one layer of a DVD, and that's compressed output. Surely people editing video would benefit from at least that much memory to play with. But he had it in his head that it was just excessive to access that much memory in one process, and I wasn't going to bother changing his mind.

4 minutes ago, Brandon Staggs said:

Writing a new TBits doesn't take long.

There's literally one in this thread. 


The OP needs 300 billion bits, or so he said.

 

I assume that collecting that amount of data takes a long time and that the data will be evaluated to obtain some sort of useful result afterwards. So the program must run for a long time and may not be terminated prematurely, or else the contents of the TBits are lost and everything has to start over again.

 

That sounds like a very time-consuming development/debugging cycle that can be avoided by splitting the program up into an acquisition process and an evaluation process that share a common block of virtual memory.

 

The acquisition part can run "forever" in the background; it can be straightforward because it need not evaluate the results. It just supplies live data (bits) in memory that become more complete over time. The evaluation part can be a work in progress that can be comfortably debugged, refined and recompiled, observing "live" data that becomes more complete over time.
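A bare-bones sketch of that split using a named file mapping (names are illustrative; error handling and synchronisation between the two processes are omitted):

uses
  Winapi.Windows, System.SysUtils;

const
  MAPPING_NAME = 'Local\BigBitsBuffer';
  MAPPING_BYTES: UInt64 = 300000000000 div 8;

// Acquisition process: create the shared block and start filling in bits.
function CreateSharedBits: PByte;
var
  hMapping: THandle;
begin
  hMapping := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE,
    DWORD(MAPPING_BYTES shr 32), DWORD(MAPPING_BYTES and $FFFFFFFF),
    PChar(MAPPING_NAME));
  if hMapping = 0 then
    RaiseLastOSError;
  Result := MapViewOfFile(hMapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
end;

// Evaluation process: attach to the block the acquisition process created
// and observe the same bytes, live.
function OpenSharedBits: PByte;
var
  hMapping: THandle;
begin
  hMapping := OpenFileMapping(FILE_MAP_ALL_ACCESS, False, PChar(MAPPING_NAME));
  if hMapping = 0 then
    RaiseLastOSError;
  Result := MapViewOfFile(hMapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
end;

With INVALID_HANDLE_VALUE the block is backed by the page file, so the system's commit limit has to cover it; backing it with a real file instead (as in the earlier sketch) avoids that and keeps the bits on disk between runs.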

 

 

 

 
