Is there a buffered memory stream implementation available?

Tommi Prami · August 30, 2022

The RTL has a buffered file stream now, but as far as I know there is no buffered memory stream.

Is there a fast implementation available? I wrote one a long time ago, but it was not a very good implementation and I don't have the code anymore.

At least back then it made some code way faster. StringBuilder, I believe, is not super fast and can't handle binary data 🙂

-Tee-

shineworld · August 30, 2022

TMemoryStream

https://docwiki.embarcadero.com/Libraries/Sydney/en/System.Classes.TMemoryStream

Edited August 30, 2022 by shineworld

David Heffernan · August 30, 2022

Rather than buffering in this case I have a memory stream that doesn't use contiguous memory. Instead of a single memory block, it manages a list of equally sized blocks. This avoids any performance issues with repeated calls to ReallocMem. Is that the performance block you want to work around?
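
A minimal sketch of the block-list idea described above, not the actual class from this post. It is append-only and does not descend from TStream; `TChunkedBuffer`, `Append`, `SaveToStream` and the 64 KB `BlockSize` are all assumptions made for illustration.

```pascal
program ChunkedBufferDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes, System.Generics.Collections;

const
  BlockSize = 64 * 1024; // assumed block size; tune it for the workload

type
  // Hypothetical class: an append-only sketch, not the stream from the post.
  TChunkedBuffer = class
  private
    FBlocks: TList<TBytes>; // equally sized blocks, allocated on demand
    FSize: Int64;           // total number of bytes appended
  public
    constructor Create;
    destructor Destroy; override;
    procedure Append(const Buffer; Count: Integer);
    procedure SaveToStream(Dest: TStream);
    property Size: Int64 read FSize;
  end;

constructor TChunkedBuffer.Create;
begin
  inherited Create;
  FBlocks := TList<TBytes>.Create;
end;

destructor TChunkedBuffer.Destroy;
begin
  FBlocks.Free; // the blocks are dynamic arrays and are released with the list
  inherited;
end;

procedure TChunkedBuffer.Append(const Buffer; Count: Integer);
var
  Src: PByte;
  Block: TBytes;
  Ofs, N: Integer;
begin
  Src := @Buffer;
  while Count > 0 do
  begin
    Ofs := Integer(FSize mod BlockSize);
    if Ofs = 0 then
    begin
      // Grow by adding a block; existing blocks are never moved or copied.
      Block := nil;
      SetLength(Block, BlockSize);
      FBlocks.Add(Block);
    end
    else
      Block := FBlocks.Last;
    N := BlockSize - Ofs;
    if N > Count then
      N := Count;
    Move(Src^, Block[Ofs], N);
    Inc(Src, N);
    Inc(FSize, N);
    Dec(Count, N);
  end;
end;

procedure TChunkedBuffer.SaveToStream(Dest: TStream);
var
  Block: TBytes;
  Remaining: Int64;
begin
  Remaining := FSize;
  for Block in FBlocks do
  begin
    // Every block is full except possibly the last one.
    if Remaining >= BlockSize then
      Dest.WriteBuffer(Block[0], BlockSize)
    else
      Dest.WriteBuffer(Block[0], Integer(Remaining));
    Dec(Remaining, BlockSize);
  end;
end;

var
  Buf: TChunkedBuffer;
  Target: TMemoryStream;
  I, Value: Integer;
begin
  Buf := TChunkedBuffer.Create;
  Target := TMemoryStream.Create;
  try
    for I := 1 to 100000 do // many small writes spanning several blocks
    begin
      Value := I;
      Buf.Append(Value, SizeOf(Value));
    end;
    Buf.SaveToStream(Target); // one sequential copy at the end
    Target.Position := 49999 * SizeOf(Value); // offset of the 50000th integer
    Target.ReadBuffer(Value, SizeOf(Value));
    Writeln('Bytes: ', Target.Size, ', 50000th value: ', Value);
  finally
    Target.Free;
    Buf.Free;
  end;
end.
```

The trade-off in this sketch is that the data is no longer addressable as one contiguous block, so anything that needs a single pointer has to copy it out first.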

dummzeuch · August 30, 2022

2 hours ago, Tommi Prami said:

The RTL has a buffered file stream now, but as far as I know there is no buffered memory stream.

Is there a fast implementation available? I wrote one a long time ago, but it was not a very good implementation and I don't have the code anymore.

At least back then it made some code way faster. StringBuilder, I believe, is not super fast and can't handle binary data 🙂

What exactly is the problem you want to solve? Reallocating memory when writing to TMemoryStream?

Tommi Prami · August 31, 2022

16 hours ago, dummzeuch said:

What exactly is the problem you want to solve? Reallocating memory when writing to TMemoryStream?

The current code writes to the memory stream in small chunks, so a buffered memory stream would be an _easy_ optimization.

I used to have a very crude memory stream that allocated memory with some mechanism I can't remember. It was mainly used to write XML data back then, and it made some code 20x faster (if I recall correctly); some isolated cases were way faster than reallocating all the time.

One thing I remember is that the implementation always reallocated when the buffer got full, so it was not super smart, just a simple growth scheme.

Sure, I could rewrite the code in the first place, but sometimes it is legacy code in some third-party library, so rewriting might be too big a task. It would be better to use a maintained library instead, if one is available. But it is always a case-by-case call which way to go.

-Tee-

Tommi Prami · August 31, 2022

18 hours ago, shineworld said:

TMemoryStream

https://docwiki.embarcadero.com/Libraries/Sydney/en/System.Classes.TMemoryStream

As far as I know TMemoryStream does not buffer, at least not in the way I want to use it.

-Tee-

Tommi Prami · August 31, 2022

18 hours ago, David Heffernan said:

Rather than buffering in this case I have a memory stream that doesn't use contiguous memory. Instead of a single memory block, it manages a list of equally sized blocks. This avoids any performance issues with repeated calls to ReallocMem. Is that the performance block you want to work around?

That sounds like a smart implementation. If the blocks are big enough, SaveToFile etc. (combining the data into one block, or reading it out elsewhere) should be pretty fast too.

David Heffernan · August 31, 2022

53 minutes ago, Tommi Prami said:

The current code writes to the memory stream in small chunks, so a buffered memory stream would be an _easy_ optimization.

I used to have a very crude memory stream that allocated memory with some mechanism I can't remember. It was mainly used to write XML data back then, and it made some code 20x faster (if I recall correctly); some isolated cases were way faster than reallocating all the time.

One thing I remember is that the implementation always reallocated when the buffer got full, so it was not super smart, just a simple growth scheme.

Sure, I could rewrite the code in the first place, but sometimes it is legacy code in some third-party library, so rewriting might be too big a task. It would be better to use a maintained library instead, if one is available. But it is always a case-by-case call which way to go.

-Tee-

I'm not sure what you are saying here. Writing to memory is just writing to memory, be it some intermediate buffer or the memory block backing the stream. Can you explain what performance block you are trying to overcome? Reallocation is the only block I can see.

Fr0sT.Brutal · August 31, 2022

FastMM already reserves memory when reallocating. I once ran an elementary benchmark of appending chars to a string, the straightforward way and with a string builder. The results were almost the same...

Anyway don't forget the rule: "Profile first, then optimize"

dummzeuch · August 31, 2022

I've got a generic TdzStreamCache implementation in my dzlib, which adds caching to any type of stream. Not sure whether it is any improvement to TMemoryStream though. I only used TMemoryStream in the unit tests to ensure that no data gets lost, I never timed it against it.

Tommi Prami · September 1, 2022

On 8/31/2022 at 9:52 AM, David Heffernan said:

I'm not sure what you are saying here. Writing to memory is just writing to memory, be it some intermediate buffer or the memory block backing the stream. Can you explain what performance block you are trying to overcome? Reallocation is the only block I can see.

Reallocation for sure.

Tommi Prami · September 1, 2022

On 8/31/2022 at 9:54 AM, Fr0sT.Brutal said:

FastMM already reserves memory when reallocating. I once ran an elementary benchmark of appending chars to a string, the straightforward way and with a string builder. The results were almost the same...

Anyway don't forget the rule: "Profile first, then optimize"

For sure.

My experience is that if you write a lot of data to a stream and there is a lot of reallocation, it will be quite slow.

I need to measure this for sure before committing to anything. All I know is that I have used a buffered stream to speed up a situation very close to the current one. I just remember that some parts of the code got way faster, and overall I saw a significant speedup.

I think this comes down to how much data is written, in how small pieces, and so on; in other words, how many reallocations will happen in the real world. I think I logged the final stream sizes for a day (it was a server that produced XML files), used the Stetson-Harris method to pick a nice initial allocation size, and adjusted the growth strategy to strike some kind of balance and avoid allocating buffers that were far too large.

Concatenating strings with FastMM is fast, but I still managed to get a nice speedup back then. It is quite easy to write a buffered stream, but testing it and handling all the corner cases you miss at first... That is why I am asking whether there is a good implementation I could test, to see whether it makes sense or not (in this case) 🙂

-Tee-

David Heffernan · September 1, 2022

18 minutes ago, Tommi Prami said:

For sure.

My experience is that if you write a lot of data to a stream and there is a lot of reallocation, it will be quite slow.

I need to measure this for sure before committing to anything. All I know is that I have used a buffered stream to speed up a situation very close to the current one. I just remember that some parts of the code got way faster, and overall I saw a significant speedup.

I think this comes down to how much data is written, in how small pieces, and so on; in other words, how many reallocations will happen in the real world. I think I logged the final stream sizes for a day (it was a server that produced XML files), used the Stetson-Harris method to pick a nice initial allocation size, and adjusted the growth strategy to strike some kind of balance and avoid allocating buffers that were far too large.

Concatenating strings with FastMM is fast, but I still managed to get a nice speedup back then. It is quite easy to write a buffered stream, but testing it and handling all the corner cases you miss at first... That is why I am asking whether there is a good implementation I could test, to see whether it makes sense or not (in this case) 🙂

-Tee-

Buffering isn't really going to help here, because the MM already does that in effect.

Fr0sT.Brutal · September 1, 2022

41 minutes ago, David Heffernan said:

Buffering isn't really going to help here, because the MM already does that in effect.

In this case I think this could be optimized. Depending on overall sizes, of course. Reallocating 100 MB chunks slows down the process anyway, even if FastMM reserves some space. In this application (lots of writes, large total size) the stream you mentioned (storing its contents in separate small chunks) will beat any contiguous, periodically growing buffer.

Another option, of course, is to actually stream the data rather than buffer it.

Edited September 1, 2022 by Fr0sT.Brutal

David Heffernan · September 1, 2022

9 minutes ago, Fr0sT.Brutal said:

In this case I think this could be optimized. Depending on overall sizes, of course. Reallocating 100 MB chunks slows down the process anyway, even if FastMM reserves some space. In this application (lots of writes, large total size) the stream you mentioned (storing its contents in separate small chunks) will beat any contiguous, periodically growing buffer.

Another option, of course, is to actually stream the data rather than buffer it.

I agree. My point is that OP keeps asking for buffering but that won't really help.

Tommi Prami · September 2, 2022

20 hours ago, David Heffernan said:

Buffering isn't really going to help here, because the MM already does that in effect.

I did a quick and crude test. Good that I did: the speed gain is nice percentage-wise, but it is much faster than I thought:

// Requires System.Classes (TMemoryStream) and System.Diagnostics (TStopwatch) in the uses clause.
procedure TForm3.Button1Click(Sender: TObject);
const
  LOOP_MAX: Integer = 90000000;
  DATA: RawByteString = '0123456789';
var
  LStopWatch: TStopwatch;
  I: Integer;
  LStream: TMemoryStream;
  LDataLength: Integer;
begin
  LDataLength := Length(DATA);

  // Case 1: let the stream grow on demand while writing.
  LStream := TMemoryStream.Create;
  try
    LStopWatch := TStopwatch.StartNew;
    for I := 1 to LOOP_MAX do
      LStream.Write(DATA[1], LDataLength);
    LStopWatch.Stop;
  finally
    LStream.Free;
  end;
  Memo1.Lines.Add('No preallocation: ' + FormatFloat('0.000s', LStopWatch.Elapsed.TotalSeconds));

  // Case 2: allocate the whole buffer up front. The allocation itself is not timed.
  LStream := TMemoryStream.Create;
  try
    // Int64 cast guards against Integer overflow if the constants are raised.
    LStream.Size := (Int64(LOOP_MAX) * LDataLength) + 1000;
    LStopWatch := TStopwatch.StartNew;
    for I := 1 to LOOP_MAX do
      LStream.Write(DATA[1], LDataLength);
    LStopWatch.Stop;
  finally
    LStream.Free;
  end;
  Memo1.Lines.Add('Preallocation: ' + FormatFloat('0.000s', LStopWatch.Elapsed.TotalSeconds));
end;

No preallocation: 1,150s
Preallocation: 0,647s

Nice speedup percentage-wise, but the time saved in a real-world situation is very small.

So there is no point in optimizing this in my current case, at least not this way. Maybe my memory failed me and my original optimizations were from pre-FastMM times, or I just looked at how many percent faster it was 😉 Or both....
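
One detail of the test above: setting Size changes the logical length of the stream, so the preallocated stream ends up with the preallocated size rather than the number of bytes written. A minimal sketch of reserving memory without touching Size, assuming a small descendant (the hypothetical `TReservableMemoryStream`) that makes the protected Capacity property of TMemoryStream public:

```pascal
program ReserveCapacityDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

type
  // Hypothetical helper class: re-declares the protected Capacity property as public.
  TReservableMemoryStream = class(TMemoryStream)
  public
    property Capacity;
  end;

const
  Data: RawByteString = '0123456789';
var
  Stream: TReservableMemoryStream;
  I: Integer;
begin
  Stream := TReservableMemoryStream.Create;
  try
    // Reserve room for 1000 writes up front; Size stays 0.
    Stream.Capacity := 1000 * Length(Data);
    for I := 1 to 1000 do
      Stream.WriteBuffer(Data[1], Length(Data));
    // Size reflects only what was written; Capacity may be rounded up by the RTL.
    Writeln('Size: ', Stream.Size, ', Capacity: ', Stream.Capacity);
  finally
    Stream.Free;
  end;
end.
```

This only helps when a reasonable upper bound for the final size is known in advance.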

-Tee-

Fr0sT.Brutal · September 2, 2022

48 minutes ago, Tommi Prami said:

I did a quick and crude test

You can also enable debug DCUs, set a non-breaking breakpoint on the stream's Realloc method, and see how many times it gets called.
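
The same question can also be answered in code, without the debugger. A rough sketch, assuming a hypothetical `TCapacityProbeStream` descendant that exposes the protected Capacity property and counting how often the capacity changes during a run of small writes:

```pascal
program CountCapacityChanges;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

type
  // Hypothetical helper class: makes the protected Capacity property readable.
  TCapacityProbeStream = class(TMemoryStream)
  public
    property Capacity;
  end;

const
  Data: RawByteString = '0123456789';
  WriteCount = 1000000;
var
  Stream: TCapacityProbeStream;
  I, Changes: Integer;
  LastCapacity: Int64;
begin
  Stream := TCapacityProbeStream.Create;
  try
    Changes := 0;
    LastCapacity := Stream.Capacity;
    for I := 1 to WriteCount do
    begin
      Stream.WriteBuffer(Data[1], Length(Data));
      // A changed capacity means the stream asked the memory manager for a new size.
      if Stream.Capacity <> LastCapacity then
      begin
        Inc(Changes);
        LastCapacity := Stream.Capacity;
      end;
    end;
    Writeln('Writes: ', WriteCount, ', capacity changes: ', Changes);
  finally
    Stream.Free;
  end;
end.
```

The count shows how often the stream resized its buffer; whether the memory manager had to move the block each time is a separate question that this sketch does not answer.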

Tommi Prami · September 5, 2022

On 9/2/2022 at 12:04 PM, Fr0sT.Brutal said:

You can also enable debug DCUs, set a non-breaking breakpoint on the stream's Realloc method, and see how many times it gets called.

That would be good to investigate. How can I see the count like that? I never knew you could set up a breakpoint like that, so how do I do it? A non-breaking breakpoint, yes, but how do I see how many times it is hit...

I should also run this as a release build with optimizations on, to see how it behaves then. (It was pretty much the same, a little bit slower, but OK.)

Edited September 5, 2022 by Tommi Prami

Tommi Prami · September 5, 2022

Now that I have gone through the TMemoryStream code: it is buffered...

I must have confused it with some other stream class, sorry about that.

But I'm still really interested in how to get the call count from a non-breaking breakpoint, though...

Uwe Raabe · September 5, 2022

13 minutes ago, Tommi Prami said:

But I'm still really interested in how to get the call count from a non-breaking breakpoint, though...

Add a breakpoint at the proper source line and open the context menu on that breakpoint. Set the Pass count to some high value.

For more info see the docs: https://docwiki.embarcadero.com/RADStudio/Alexandria/en/Add_Source_Breakpoint

Quote

Because the debugger increments the count with each pass, you can use them to determine which iteration of a loop fails. Set the pass count to the maximum loop count and run your program. When the program fails, you can calculate the number of loop iterations by examining the number of passes that occurred.

Fr0sT.Brutal · September 19, 2022

On 9/5/2022 at 6:32 AM, Tommi Prami said:

How can I see the count like that? I never knew you could set up a breakpoint like that, so how do I do it?

For a one-time check like yours, the simplest option is to log the breakpoint passes (evaluate some expression and log the result), then copy the log to a text editor and count the lines.
