Tommi Prami

Is there a buffered memory stream implementation available?


The RTL has a buffered file stream now, but I think there is no buffered memory stream.

Is there a fast implementation available? I made one a long, long time ago, but it was not a very good implementation and I don't have the code anymore.

At least back then it made some code way faster. StringBuilder, I believe, is not super fast, and it can't handle binary data 🙂

 

-Tee-


Rather than buffering in this case I have a memory stream that doesn't use contiguous memory. Instead of a single memory block, it manages a list of equally sized blocks. This avoids any performance issues with repeated calls to ReallocMem. Is that the performance block you want to work around?
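The block-list idea described in this post could look roughly like the following. This is only a minimal sketch, not the poster's actual class: `TChunkedBuffer`, its method names and the 64 KB block size are all made up for illustration, and it is a plain append-only class rather than a full `TStream` descendant (a real implementation would also need `Read` and `Seek`). It assumes a Delphi version with namespaced units (XE2 or later).

```pascal
program ChunkedBufferSketch;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

type
  // Hypothetical append-only buffer made of equally sized blocks.
  TChunkedBuffer = class
  private
    FBlocks: TList<TBytes>; // each entry is one block of FBlockSize bytes
    FBlockSize: Integer;
    FSize: Int64;           // total number of bytes written so far
  public
    constructor Create(ABlockSize: Integer);
    destructor Destroy; override;
    procedure Write(const Buffer; Count: Integer);
    procedure SaveToStream(Dest: TStream);
    property Size: Int64 read FSize;
  end;

constructor TChunkedBuffer.Create(ABlockSize: Integer);
begin
  inherited Create;
  Assert(ABlockSize > 0);
  FBlockSize := ABlockSize;
  FBlocks := TList<TBytes>.Create;
end;

destructor TChunkedBuffer.Destroy;
begin
  FBlocks.Free;
  inherited;
end;

procedure TChunkedBuffer.Write(const Buffer; Count: Integer);
var
  Src: PByte;
  Offset, Chunk: Integer;
  Block: TBytes;
begin
  Src := @Buffer;
  while Count > 0 do
  begin
    Offset := Integer(FSize mod FBlockSize); // position inside the last block
    if Offset = 0 then
    begin
      // No block yet, or the last one is full: add a fresh block.
      // Blocks that already hold data are never resized or moved.
      Block := nil;
      SetLength(Block, FBlockSize);
      FBlocks.Add(Block);
    end
    else
      Block := FBlocks[FBlocks.Count - 1]; // reference to the same array
    Chunk := FBlockSize - Offset;
    if Chunk > Count then
      Chunk := Count;
    Move(Src^, Block[Offset], Chunk);
    Inc(Src, Chunk);
    Inc(FSize, Chunk);
    Dec(Count, Chunk);
  end;
end;

procedure TChunkedBuffer.SaveToStream(Dest: TStream);
var
  I, Chunk: Integer;
  Remaining: Int64;
  Block: TBytes;
begin
  Remaining := FSize;
  for I := 0 to FBlocks.Count - 1 do
  begin
    Chunk := FBlockSize;
    if Chunk > Remaining then
      Chunk := Integer(Remaining); // the last block may be partly filled
    Block := FBlocks[I];
    Dest.WriteBuffer(Block[0], Chunk);
    Dec(Remaining, Chunk);
  end;
end;

var
  Buf: TChunkedBuffer;
  Target: TMemoryStream;
  Piece: RawByteString;
  I: Integer;
begin
  Piece := '0123456789';
  Buf := TChunkedBuffer.Create(64 * 1024);
  Target := TMemoryStream.Create;
  try
    for I := 1 to 100000 do
      Buf.Write(Piece[1], Length(Piece));
    Buf.SaveToStream(Target);
    Assert(Target.Size = Buf.Size);
    Writeln('Bytes written: ', Buf.Size);
  finally
    Target.Free;
    Buf.Free;
  end;
end.
```

The only thing that still grows by reallocation here is the list of block references, which holds one pointer per block, so the data itself is never copied while writing.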

2 hours ago, Tommi Prami said:

The RTL has a buffered file stream now, but I think there is no buffered memory stream.

Is there a fast implementation available? I made one a long, long time ago, but it was not a very good implementation and I don't have the code anymore.

At least back then it made some code way faster. StringBuilder, I believe, is not super fast, and it can't handle binary data 🙂

What exactly is the problem you want to solve? Reallocating memory when writing to TMemoryStream?

16 hours ago, dummzeuch said:

What exactly is the problem you want to solve? Reallocating memory when writing to TMemoryStream?

The current code writes data into the memory stream in small chunks, so a buffered memory stream would be an _easy_ optimization.

I used to have a very crude memory stream that allocated memory with some mechanism I can't remember. It was mainly used to write XML data back then, and it made some code 20x faster (if I recall correctly); some isolated cases were way faster than reallocating all the time.

One thing I remember is that the implementation always reallocated when the buffer got full, so it was not super smart, just a simple growth scheme.

Sure, I could rewrite the code in the first place, but sometimes it is legacy code of some third-party library, so rewriting might be too big a task. It would be better to use a maintained library instead, if one is available. But which way to go is always a case-by-case call.

 

-Tee-
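For reference, the kind of write buffering asked about here can be sketched as a small wrapper that collects small writes and forwards them to the underlying stream in larger pieces. This is a rough sketch only, not an existing library class: `TWriteBuffer` and the 64 KB buffer size are invented for illustration, and it assumes Delphi XE2 or later unit names. Whether it gains anything over writing to `TMemoryStream` directly is exactly what the rest of the thread questions, so it would need to be measured.

```pascal
program WriteBufferSketch;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical helper: collects small writes and forwards them to the
  // target stream in larger pieces. It does not own the target stream.
  TWriteBuffer = class
  private
    FTarget: TStream;
    FBuffer: TBytes;
    FUsed: Integer; // number of valid bytes currently held in FBuffer
  public
    constructor Create(ATarget: TStream; ABufferSize: Integer);
    destructor Destroy; override;
    procedure Write(const Data; Count: Integer);
    procedure Flush;
  end;

constructor TWriteBuffer.Create(ATarget: TStream; ABufferSize: Integer);
begin
  inherited Create;
  Assert(ABufferSize > 0);
  FTarget := ATarget;
  SetLength(FBuffer, ABufferSize);
end;

destructor TWriteBuffer.Destroy;
begin
  Flush; // make sure pending bytes reach the target
  inherited;
end;

procedure TWriteBuffer.Flush;
begin
  if FUsed > 0 then
  begin
    FTarget.WriteBuffer(FBuffer[0], FUsed);
    FUsed := 0;
  end;
end;

procedure TWriteBuffer.Write(const Data; Count: Integer);
begin
  if Count <= 0 then
    Exit;
  // Not enough room left: push out what has been collected so far.
  if Count > Length(FBuffer) - FUsed then
    Flush;
  if Count >= Length(FBuffer) then
    // Larger than the whole buffer: write straight to the target.
    FTarget.WriteBuffer(Data, Count)
  else
  begin
    Move(Data, FBuffer[FUsed], Count);
    Inc(FUsed, Count);
  end;
end;

var
  Target: TMemoryStream;
  Writer: TWriteBuffer;
  Piece: RawByteString;
  I: Integer;
begin
  Piece := '0123456789';
  Target := TMemoryStream.Create;
  try
    Writer := TWriteBuffer.Create(Target, 64 * 1024);
    try
      for I := 1 to 100000 do
        Writer.Write(Piece[1], Length(Piece));
    finally
      Writer.Free; // flushes the remaining bytes
    end;
    Assert(Target.Size = 100000 * Length(Piece));
    Writeln('Target size: ', Target.Size);
  finally
    Target.Free;
  end;
end.
```

A wrapper like this mainly pays off when each call to the target's `Write` is expensive, for example a file or socket stream.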

18 hours ago, David Heffernan said:

Rather than buffering in this case I have a memory stream that doesn't use contiguous memory. Instead of a single memory block, it manages a list of equally sized blocks. This avoids any performance issues with repeated calls to ReallocMem. Is that the performance block you want to work around?

That sounds like a smart implementation. If the blocks are big enough, SaveToFile etc. (combining the data into one block, or reading it out elsewhere) should be pretty fast as well.

 

53 minutes ago, Tommi Prami said:

The current code writes data into the memory stream in small chunks, so a buffered memory stream would be an _easy_ optimization.

I used to have a very crude memory stream that allocated memory with some mechanism I can't remember. It was mainly used to write XML data back then, and it made some code 20x faster (if I recall correctly); some isolated cases were way faster than reallocating all the time.

One thing I remember is that the implementation always reallocated when the buffer got full, so it was not super smart, just a simple growth scheme.

Sure, I could rewrite the code in the first place, but sometimes it is legacy code of some third-party library, so rewriting might be too big a task. It would be better to use a maintained library instead, if one is available. But which way to go is always a case-by-case call.

 

-Tee-

I'm not sure what you are saying here. Writing to memory is just writing to memory, be it some intermediate buffer or the memory block backing the stream. Can you explain what performance block you are trying to overcome? Reallocation is the only block I can see.


FastMM already reserves extra memory when reallocating. I once made an elementary benchmark of appending chars to a string, both the straightforward way and with a string builder. The results were almost the same...

Anyway, don't forget the rule: "Profile first, then optimize"


I've got a generic TdzStreamCache implementation in my dzlib, which adds caching to any type of stream. I'm not sure whether it is any improvement over TMemoryStream, though. I only used TMemoryStream in the unit tests to ensure that no data gets lost; I never timed the cache against it.

On 8/31/2022 at 9:52 AM, David Heffernan said:

I'm not sure what you are saying here. Writing to memory is just writing to memory, be it some intermediate buffer or the memory block backing the stream. Can you explain what performance block you are trying to overcome? Reallocation is the only block I can see.

Reallocation for sure.

On 8/31/2022 at 9:54 AM, Fr0sT.Brutal said:

FastMM already reserves extra memory when reallocating. I once made an elementary benchmark of appending chars to a string, both the straightforward way and with a string builder. The results were almost the same...

Anyway, don't forget the rule: "Profile first, then optimize"

 

For sure.

 

My experience is that if you write a lot of data to a stream and there is a lot of reallocation, it will be quite slow.

I need to measure this for sure before committing to anything. All I know is that I have used a buffered stream to speed up a situation very close to the current one. I just remember that some parts of the code got way faster, and overall I saw a significant speedup.

I think this comes down to how much data is written, in how small pieces, and so on; in other words, how many reallocations will happen in the real world. I think I logged the final stream sizes for a day (it was a server that generated XML files), used the Stetson-Harris method to pick a nice initial allocation size, and adjusted the growth strategy to strike some kind of balance and avoid allocating buffers that were way too large.

Concatenating strings with FastMM is fast, but I still managed to get a nice speedup back then. It is quite easy to write a buffered stream, but testing it and handling all the corner cases you miss at first is another matter... That is why I am asking whether there is a good implementation I could test, to see whether it makes sense or not (in this case) 🙂

 

-Tee-

 

Buffering isn't really going to help here, because the MM already does that in effect.

41 minutes ago, David Heffernan said:

Buffering isn't really going to help here, because the MM already does that in effect.

In this case I think this could be optimized, depending on the overall sizes, of course. Reallocating 100 MB chunks slows down the process anyway, even if FastMM reserves some space. In this application (lots of writes, large total size), the stream you mentioned (storing its contents in separate small chunks) will beat any contiguous, periodically growing buffer.

Another option, of course, is to actually stream the data instead of buffering it.

Edited by Fr0sT.Brutal

I agree. My point is that OP keeps asking for buffering but that won't really help. 

20 hours ago, David Heffernan said:

Buffering isn't really going to help here, because the MM already does that in effect.

I did a quick and crude test. Good that I did: the speed gain is nice percentage-wise, but the plain TMemoryStream is much faster than I thought:
 

procedure TForm3.Button1Click(Sender: TObject);
// Requires System.Classes, System.SysUtils and System.Diagnostics in the uses clause.
const
  LOOP_MAX: Integer = 90000000;
  DATA: RawByteString = '0123456789';
var
  LStopWatch: TStopWatch;
  I: Integer;
  LStream: TMemoryStream;
  LDataLength: Integer;
begin
  LDataLength := Length(DATA);

  // Run 1: let the stream grow on demand while writing.
  LStream := TMemoryStream.Create;
  try
    LStopWatch := TStopWatch.StartNew;
    for I := 1 to LOOP_MAX do
      LStream.Write(DATA[1], LDataLength);
    LStopWatch.Stop;
  finally
    LStream.Free;
  end;
  Memo1.Lines.Add('No preallocation: ' + FormatFloat('0.000s', LStopWatch.Elapsed.TotalSeconds));

  // Run 2: reserve the whole block up front (not included in the timing).
  // Setting Size leaves Position at 0, so the writes fill the reserved block.
  LStream := TMemoryStream.Create;
  try
    LStream.Size := (LOOP_MAX * LDataLength) + 1000;
    LStopWatch := TStopWatch.StartNew;
    for I := 1 to LOOP_MAX do
      LStream.Write(DATA[1], LDataLength);
    LStopWatch.Stop;
  finally
    LStream.Free;
  end;
  Memo1.Lines.Add('Preallocation: ' + FormatFloat('0.000s', LStopWatch.Elapsed.TotalSeconds));
end;

No preallocation: 1,150s
Preallocation: 0,647s

A nice speedup percentage-wise, but the time saved in a real-world situation is very small.

So there is no point in optimizing this in my current case, at least not this way. Maybe my memory failed me and my original optimizations were from pre-FastMM times, or I just looked at how many percent faster it was 😉 Or both...

 

-Tee-

48 minutes ago, Tommi Prami said:

I did a quick and crude test

You can also enable debug DCUs, set a non-breaking breakpoint on the stream's Realloc method, and see how many times it is called.

On 9/2/2022 at 12:04 PM, Fr0sT.Brutal said:

You can also enable debug DCUs, set a non-breaking breakpoint on the stream's Realloc method, and see how many times it is called.

That would be good to investigate. How can I see the count like that? I never knew you could set up a breakpoint like that, so how do you do it? A non-breaking breakpoint, yes, but how do you see how many times it is called...

I should also run it as a release build with optimizations on to see how it performs then. (That was pretty much the same, a little bit slower, but OK.)

Edited by Tommi Prami


Now that I have gone through the TMemoryStream code, I see it is buffered...

I must have confused it with some other stream class, sorry about that.

But I am really interested in how to get the call count of a non-breaking breakpoint, though...

13 minutes ago, Tommi Prami said:

But I am really interested in how to get the call count of a non-breaking breakpoint, though...

Add a breakpoint at the proper source line and open the context menu on that breakpoint. Set the Pass count to some high value.

 

For more info see the docs: https://docwiki.embarcadero.com/RADStudio/Alexandria/en/Add_Source_Breakpoint

Quote

Because the debugger increments the count with each pass, you can use them to determine which iteration of a loop fails. Set the pass count to the maximum loop count and run your program. When the program fails, you can calculate the number of loop iterations by examining the number of passes that occurred. 

 

On 9/5/2022 at 6:32 AM, Tommi Prami said:

How can I see the count like that? I never knew you could set up a breakpoint like that, so how do you do it?

For a one-time check like yours, the simplest option is to log the breakpoint passes (evaluate some expression and log the result), then copy the log into a text editor and count the lines.
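If the debugger route is inconvenient, a rough count can also be obtained in code. The sketch below is an alternative to the breakpoint approach, not something proposed in the thread: it watches the public `Memory` property of `TMemoryStream` and counts how often the address of the backing block changes. That only catches reallocations that move the block (growth in place goes unnoticed), so the number is a lower bound. The program name, the loop count and the variable names are arbitrary, and Delphi XE2 or later unit names are assumed.

```pascal
program CountBlockMoves;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  LOOP_MAX = 1000000; // arbitrary number of small writes

var
  Stream: TMemoryStream;
  Piece: RawByteString;
  LastMemory: Pointer;
  Moves, I: Integer;
begin
  Piece := '0123456789';
  Moves := 0;
  Stream := TMemoryStream.Create;
  try
    LastMemory := Stream.Memory; // nil while nothing is allocated
    for I := 1 to LOOP_MAX do
    begin
      Stream.Write(Piece[1], Length(Piece));
      // A changed address means the block was allocated or moved.
      if Stream.Memory <> LastMemory then
      begin
        Inc(Moves);
        LastMemory := Stream.Memory;
      end;
    end;
    Writeln('Writes: ', LOOP_MAX, ', block address changes: ', Moves);
  finally
    Stream.Free;
  end;
end.
```

Comparing the reported count with the number of writes gives a feel for how rarely the stream actually has to move its data.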
