Tommi Prami 130 Posted August 30, 2022
The RTL has a buffered file stream now, but I think there is no buffered memory stream. Is there a fast implementation available? I made one a long time ago, but it was not a very good implementation and I don't have the code anymore. At least back then it made some code much faster. I believe StringBuilder is not especially fast, and it can't handle binary data 🙂
-Tee-
shineworld 73 Posted August 30, 2022 (edited)
TMemoryStream https://docwiki.embarcadero.com/Libraries/Sydney/en/System.Classes.TMemoryStream
Edited August 30, 2022 by shineworld
David Heffernan 2345 Posted August 30, 2022
Rather than buffering, in this case I have a memory stream that doesn't use contiguous memory. Instead of a single memory block, it manages a list of equally sized blocks. This avoids any performance issues with repeated calls to ReallocMem. Is that the performance block you want to work around?
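The block-list idea described above could look roughly like the following. This is only a minimal sketch under stated assumptions, not David Heffernan's actual code: the class name TChunkedWriter, its methods, the append-only design, and the 64 KB block size are all made up for illustration.

```pascal
program ChunkedWriterSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  {$IFDEF FPC}SysUtils, Classes{$ELSE}System.SysUtils, System.Classes{$ENDIF};

type
  // Hypothetical append-only writer (not an RTL class). Data lives in a list
  // of equally sized blocks, so growing never moves bytes already written.
  TChunkedWriter = class
  private
    FBlocks: TList;        // each item points to a block of FBlockSize bytes
    FBlockSize: Integer;
    FUsedInLast: Integer;  // bytes used in the most recent block
    FSize: Int64;
  public
    constructor Create(ABlockSize: Integer);
    destructor Destroy; override;
    procedure Write(const Buffer; Count: Integer);
    procedure SaveToStream(Dest: TStream);
    property Size: Int64 read FSize;
  end;

constructor TChunkedWriter.Create(ABlockSize: Integer);
begin
  inherited Create;
  FBlocks := TList.Create;
  if ABlockSize <= 0 then
    raise Exception.Create('Block size must be positive');
  FBlockSize := ABlockSize;
  FUsedInLast := ABlockSize; // "last block full": the first Write allocates
end;

destructor TChunkedWriter.Destroy;
var
  I: Integer;
  P: Pointer;
begin
  if FBlocks <> nil then
    for I := 0 to FBlocks.Count - 1 do
    begin
      P := FBlocks[I];
      FreeMem(P);
    end;
  FBlocks.Free;
  inherited;
end;

procedure TChunkedWriter.Write(const Buffer; Count: Integer);
var
  Src: PByte;
  Block: Pointer;
  N: Integer;
begin
  Src := @Buffer;
  while Count > 0 do
  begin
    if FUsedInLast = FBlockSize then
    begin
      GetMem(Block, FBlockSize); // new block; existing blocks stay in place
      FBlocks.Add(Block);
      FUsedInLast := 0;
    end;
    N := FBlockSize - FUsedInLast; // free space left in the last block
    if N > Count then
      N := Count;
    Move(Src^, (PByte(FBlocks[FBlocks.Count - 1]) + FUsedInLast)^, N);
    Inc(Src, N);
    Inc(FUsedInLast, N);
    Inc(FSize, N);
    Dec(Count, N);
  end;
end;

procedure TChunkedWriter.SaveToStream(Dest: TStream);
var
  I: Integer;
begin
  // Every block except the last one is completely full.
  for I := 0 to FBlocks.Count - 2 do
    Dest.WriteBuffer(PByte(FBlocks[I])^, FBlockSize);
  if FBlocks.Count > 0 then
    Dest.WriteBuffer(PByte(FBlocks[FBlocks.Count - 1])^, FUsedInLast);
end;

var
  W: TChunkedWriter;
  M: TMemoryStream;
  I: Integer;
  Data: RawByteString;
begin
  Data := '0123456789';
  W := TChunkedWriter.Create(64 * 1024); // block size is an arbitrary choice
  M := TMemoryStream.Create;
  try
    for I := 1 to 100000 do
      W.Write(Data[1], Length(Data));
    W.SaveToStream(M); // one sequential copy at the end
    WriteLn('Written: ', W.Size, ' bytes; copied out: ', M.Size, ' bytes');
  finally
    M.Free;
    W.Free;
  end;
end.
```

The trade-off in this sketch is that the data is never available as one contiguous buffer, so anything that needs a single pointer to all the bytes has to copy them out first, as SaveToStream does here.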
dummzeuch 1505 Posted August 30, 2022
2 hours ago, Tommi Prami said: The RTL has a buffered file stream now, but I think there is no buffered memory stream. Is there a fast implementation available? I made one a long time ago, but it was not a very good implementation and I don't have the code anymore. At least back then it made some code much faster. I believe StringBuilder is not especially fast, and it can't handle binary data 🙂
What exactly is the problem you want to solve? Reallocating memory when writing to TMemoryStream?
Tommi Prami 130 Posted August 31, 2022
16 hours ago, dummzeuch said: What exactly is the problem you want to solve? Reallocating memory when writing to TMemoryStream?
The current code writes data into the memory stream in small chunks, so a buffered memory stream would be an _easy_ optimization. I used to have a very crude memory stream that allocated memory with some mechanism I can't remember. It was mainly used to write XML data back then, and it made some code 20x faster (if I recall correctly); some isolated cases were far faster than reallocating all the time. One thing I remember is that the implementation always reallocated when the buffer got full, so it was not super smart, just a simple growth scheme. Sure, I could rewrite the code in the first place, but sometimes it is legacy code from a third-party library, so rewriting might be too big a task. It would be better to use a maintained library instead, if one is available. But it is always a case-by-case call which way to go.
-Tee-
Tommi Prami 130 Posted August 31, 2022
18 hours ago, shineworld said: TMemoryStream https://docwiki.embarcadero.com/Libraries/Sydney/en/System.Classes.TMemoryStream
As far as I know, TMemoryStream does not buffer, at least not in the way I want to use it.
-Tee-
Tommi Prami 130 Posted August 31, 2022
18 hours ago, David Heffernan said: Rather than buffering, in this case I have a memory stream that doesn't use contiguous memory. Instead of a single memory block, it manages a list of equally sized blocks. This avoids any performance issues with repeated calls to ReallocMem. Is that the performance block you want to work around?
That sounds like a smart implementation. If the blocks are big enough, SaveToFile and similar operations (combining the data into one piece, or reading it out elsewhere) should be pretty fast too.
David Heffernan 2345 Posted August 31, 2022
53 minutes ago, Tommi Prami said: The current code writes data into the memory stream in small chunks, so a buffered memory stream would be an _easy_ optimization. I used to have a very crude memory stream that allocated memory with some mechanism I can't remember. It was mainly used to write XML data back then, and it made some code 20x faster (if I recall correctly); some isolated cases were far faster than reallocating all the time. One thing I remember is that the implementation always reallocated when the buffer got full, so it was not super smart, just a simple growth scheme. Sure, I could rewrite the code in the first place, but sometimes it is legacy code from a third-party library, so rewriting might be too big a task. It would be better to use a maintained library instead, if one is available. But it is always a case-by-case call which way to go. -Tee-
I'm not sure what you are saying here. Writing to memory is just writing to memory, be it some intermediate buffer or the memory block backing the stream. Can you explain what performance block you are trying to overcome? Reallocation is the only block I can see.
Fr0sT.Brutal 900 Posted August 31, 2022
FastMM already reserves memory when reallocating. I once made an elementary benchmark of adding chars to a string, the straightforward way and with a string builder. The results were almost the same... Anyway, don't forget the rule: "Profile first, then optimize".
dummzeuch 1505 Posted August 31, 2022
I've got a generic TdzStreamCache implementation in my dzlib, which adds caching to any type of stream. I'm not sure whether it is any improvement over TMemoryStream, though. I only used TMemoryStream in the unit tests to ensure that no data gets lost; I never timed the cache against it.
Tommi Prami 130 Posted September 1, 2022
On 8/31/2022 at 9:52 AM, David Heffernan said: I'm not sure what you are saying here. Writing to memory is just writing to memory, be it some intermediate buffer or the memory block backing the stream. Can you explain what performance block you are trying to overcome? Reallocation is the only block I can see.
Reallocation for sure.
Tommi Prami 130 Posted September 1, 2022
On 8/31/2022 at 9:54 AM, Fr0sT.Brutal said: FastMM already reserves memory when reallocating. I once made an elementary benchmark of adding chars to a string, the straightforward way and with a string builder. The results were almost the same... Anyway, don't forget the rule: "Profile first, then optimize".
For sure. My experience is that if you write a lot of data to a stream and there is a lot of reallocation, it will be quite slow. I need to measure this before committing to anything. All I know is that I have used a buffered stream to speed up a situation very close to this one. I just remember that some parts of the code got much faster, and overall I saw a significant speedup. I think this comes down to how much data is written, in how small pieces, and so on; in other words, how many reallocations happen in the real world. I think I logged the final stream sizes for a day (it was a server that produced XML files), used the Stetson-Harris method (an educated guess) to pick a nice initial allocation size, and adjusted the growth strategy to strike a balance and avoid allocating buffers that were far too large. Concatenating strings on FastMM is fast, but I still managed to get a nice speedup back then. It is quite easy to make a buffered stream, but testing and handling all the corner cases you miss at first is the hard part... That is why I am asking whether there is a good implementation that I could test, to see whether it makes sense or not (in this case) 🙂
-Tee-
David Heffernan 2345 Posted September 1, 2022
18 minutes ago, Tommi Prami said: For sure. My experience is that if you write a lot of data to a stream and there is a lot of reallocation, it will be quite slow. I need to measure this before committing to anything. All I know is that I have used a buffered stream to speed up a situation very close to this one. I just remember that some parts of the code got much faster, and overall I saw a significant speedup. I think this comes down to how much data is written, in how small pieces, and so on; in other words, how many reallocations happen in the real world. I think I logged the final stream sizes for a day (it was a server that produced XML files), used the Stetson-Harris method (an educated guess) to pick a nice initial allocation size, and adjusted the growth strategy to strike a balance and avoid allocating buffers that were far too large. Concatenating strings on FastMM is fast, but I still managed to get a nice speedup back then. It is quite easy to make a buffered stream, but testing and handling all the corner cases you miss at first is the hard part... That is why I am asking whether there is a good implementation that I could test, to see whether it makes sense or not (in this case) 🙂 -Tee-
Buffering isn't really going to help here, because the MM already does that in effect.
Fr0sT.Brutal 900 Posted September 1, 2022 (edited)
41 minutes ago, David Heffernan said: Buffering isn't really going to help here, because the MM already does that in effect.
In this case I think this could be optimized, depending on overall sizes, of course. Reallocating 100 MB chunks slows down the process anyway, even if FastMM reserves some space. In this application (lots of writing, large total size), the stream you mentioned (storing its contents in separate small chunks) will beat any contiguous, periodically growing buffer. Another option, of course, is to actually stream the data rather than buffer it.
Edited September 1, 2022 by Fr0sT.Brutal
David Heffernan 2345 Posted September 1, 2022
9 minutes ago, Fr0sT.Brutal said: In this case I think this could be optimized, depending on overall sizes, of course. Reallocating 100 MB chunks slows down the process anyway, even if FastMM reserves some space. In this application (lots of writing, large total size), the stream you mentioned (storing its contents in separate small chunks) will beat any contiguous, periodically growing buffer. Another option, of course, is to actually stream the data rather than buffer it.
I agree. My point is that OP keeps asking for buffering, but that won't really help.
Tommi Prami 130 Posted September 2, 2022
20 hours ago, David Heffernan said: Buffering isn't really going to help here, because the MM already does that in effect.
I did a quick and crude test. Good that I did: the speed gain is nice percentage-wise, but it is much faster than I thought:

// Requires System.Diagnostics (TStopwatch) in the uses clause.
procedure TForm3.Button1Click(Sender: TObject);
const
  LOOP_MAX: Integer = 90000000;
  DATA: RawByteString = '0123456789';
var
  LStopWatch: TStopwatch;
  I: Integer;
  LStream: TMemoryStream;
  LDataLength: Integer;
begin
  LDataLength := Length(DATA);

  // Run 1: let the stream grow on demand.
  LStream := TMemoryStream.Create;
  try
    LStopWatch := TStopwatch.StartNew;
    for I := 1 to LOOP_MAX do
      LStream.Write(DATA[1], LDataLength);
    LStopWatch.Stop;
  finally
    LStream.Free;
  end;
  Memo1.Lines.Add('No preallocation: ' + FormatFloat('0.000s', LStopWatch.Elapsed.TotalSeconds));

  // Run 2: reserve the whole buffer before the timed loop starts.
  LStream := TMemoryStream.Create;
  try
    LStream.Size := (LOOP_MAX * LDataLength) + 1000;
    LStopWatch := TStopwatch.StartNew;
    for I := 1 to LOOP_MAX do
      LStream.Write(DATA[1], LDataLength);
    LStopWatch.Stop;
  finally
    LStream.Free;
  end;
  Memo1.Lines.Add('Preallocation: ' + FormatFloat('0.000s', LStopWatch.Elapsed.TotalSeconds));
end;

No preallocation: 1,150s
Preallocation: 0,647s

Nice speedup percentage-wise, but the time saved in a real-world situation is very small. So there is no point in optimizing this in my current case, at least not this way. Maybe my memory failed me and my original optimizations were from pre-FastMM times, or I just looked at how many percent faster it was 😉 Or both...
-Tee-
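One detail the preallocated run above does not need to care about, because the stream is freed right after timing: setting Size up front leaves Size larger than the number of bytes actually written. Code that goes on to save or read the contents would normally cut the stream back first. A minimal sketch of that pattern follows; the helper names Preallocate and TrimToPosition and the 1 MB estimate are assumptions for illustration, not RTL routines.

```pascal
program PreallocateAndTrim;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  {$IFDEF FPC}SysUtils, Classes{$ELSE}System.SysUtils, System.Classes{$ENDIF};

// Hypothetical helper: reserve AEstimate bytes in one step and start
// writing from the beginning of the stream.
procedure Preallocate(AStream: TMemoryStream; AEstimate: Int64);
begin
  AStream.Size := AEstimate;
  AStream.Position := 0;
end;

// Hypothetical helper: drop the unused tail so that Size matches what was
// actually written. Does nothing useful if writes went past the estimate.
procedure TrimToPosition(AStream: TMemoryStream);
begin
  AStream.Size := AStream.Position;
end;

var
  S: TMemoryStream;
  I: Integer;
  Data: RawByteString;
begin
  Data := '0123456789';
  S := TMemoryStream.Create;
  try
    Preallocate(S, 1024 * 1024); // deliberately generous estimate
    for I := 1 to 1000 do
      S.WriteBuffer(Data[1], Length(Data));
    WriteLn('Size before trim: ', S.Size);
    TrimToPosition(S);
    WriteLn('Size after trim:  ', S.Size);
  finally
    S.Free;
  end;
end.
```

If the estimate turns out too small, the stream simply grows on demand from that point on, so a wrong guess costs some speed but not correctness.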
Fr0sT.Brutal 900 Posted September 2, 2022
48 minutes ago, Tommi Prami said: I do fast and crude test
You can also include the debug DCUs, set a non-breaking breakpoint on the stream's Realloc method, and see how many times it gets called.
Tommi Prami 130 Posted September 5, 2022 (edited)
On 9/2/2022 at 12:04 PM, Fr0sT.Brutal said: You can also include the debug DCUs, set a non-breaking breakpoint on the stream's Realloc method, and see how many times it gets called.
That would be good to investigate. How can I see the count like that? I never knew you could set up a breakpoint like that, so how do I do it? A non-breaking breakpoint, yes, but how do I see how many times it is called... I should also run a release build with optimizations on, to see how it runs then. (This was pretty much similar, a little bit slower, but OK.)
Edited September 5, 2022 by Tommi Prami
Tommi Prami 130 Posted September 5, 2022
Now that I have gone through the TMemoryStream code, I see it is buffered... I must have confused it with some other stream class, sorry about that. But I am really interested in how to get the call count of a non-breaking breakpoint, though...
Uwe Raabe 2057 Posted September 5, 2022
13 minutes ago, Tommi Prami said: But really interested how to get call count of the non breaking breakpoint tough...
Add a breakpoint at the proper source line and open the context menu on that breakpoint. Set the Pass count to some high value. For more info see the docs: https://docwiki.embarcadero.com/RADStudio/Alexandria/en/Add_Source_Breakpoint
Quote: Because the debugger increments the count with each pass, you can use them to determine which iteration of a loop fails. Set the pass count to the maximum loop count and run your program. When the program fails, you can calculate the number of loop iterations by examining the number of passes that occurred.
Fr0sT.Brutal 900 Posted September 19, 2022
On 9/5/2022 at 6:32 AM, Tommi Prami said: how I can see the count like that, never knew you could setup a breakpoint like that, so how to do that?
In your case of a one-time check, the simplest option is to log the breakpoint passes (evaluate some expression and log the result), then copy the log to a text editor and count the lines.
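As an alternative to the debugger techniques above, the growth steps can also be counted in code. The sketch below is one possible approach, not something posted in the thread: it reaches the protected Capacity property of TMemoryStream through a same-module descendant (the usual "access class" trick) and counts how often the capacity changes while writing. The names TMemoryStreamAccess and CountCapacityChanges are hypothetical, and the per-write check adds overhead, so this is meant for counting only, not for timing.

```pascal
program CountCapacityChangesSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  {$IFDEF FPC}SysUtils, Classes{$ELSE}System.SysUtils, System.Classes{$ENDIF};

type
  // Descendant declared here only to make the protected Capacity property
  // visible in this module (hypothetical helper type).
  TMemoryStreamAccess = class(TMemoryStream);

// Writes ACount copies of AData (which must be non-empty) to AStream and
// returns how many times the stream's capacity changed along the way.
function CountCapacityChanges(AStream: TMemoryStream;
  const AData: RawByteString; ACount: Integer): Integer;
var
  I: Integer;
  LastCapacity, NewCapacity: Int64;
begin
  Result := 0;
  LastCapacity := TMemoryStreamAccess(AStream).Capacity;
  for I := 1 to ACount do
  begin
    AStream.WriteBuffer(AData[1], Length(AData));
    NewCapacity := TMemoryStreamAccess(AStream).Capacity;
    if NewCapacity <> LastCapacity then
    begin
      Inc(Result); // the backing buffer was resized during this write
      LastCapacity := NewCapacity;
    end;
  end;
end;

var
  Plain, Prealloc: TMemoryStream;
  Data: RawByteString;
begin
  Data := '0123456789';
  Plain := TMemoryStream.Create;
  Prealloc := TMemoryStream.Create;
  try
    // Reserve room for everything the second run will write.
    Prealloc.Size := 1000000 * Length(Data);
    Prealloc.Position := 0;
    WriteLn('Capacity changes, no preallocation: ',
      CountCapacityChanges(Plain, Data, 1000000));
    WriteLn('Capacity changes, preallocation:    ',
      CountCapacityChanges(Prealloc, Data, 1000000));
  finally
    Prealloc.Free;
    Plain.Free;
  end;
end.
```

A capacity change does not say whether the memory manager moved the block or extended it in place, so the count is an upper bound on the number of actual copies.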