Best type for data buffer: TBytes, RawByteString, String, AnsiString, ...

Rollo62 · October 2, 2020

Just now, FPiette said:

Maybe you don't know, but I'm the author of ICS (Internet Component Suite) and what you describe is exactly what happens with a TCP stream which is what a TCP socket gives. So I have large experience in that kind of data receiving handling.

Of course I know, thanks for that nice piece of software :classic_cheerleader:

Quote

There are 3 ways to delimit data:

1) There are delimiters such STX/END around a data block or a CRLF (or any other end-of-block marker) at the end of data block.

2) Each block begins with a data length.

3) Communication is closed or broken

Yes, TCP/UDP has very similar issues.
Additionally in my configuration I cannot rely on delimiters, as I said, some use SOT/EOT delimiters, others use fixeld-length method.
So if I only need to follow one protocol, that would be easy, but I need to follow several, different protocols.
Yes, I could use a special handler for each protocol near the source, thats true.

I already considered before, beside simple data types, to use a "class wrapper" für such tasks.
Originally I thought about a wrapper for the simple types in the first place,
but finally there could also linked lists, fifo or other classes be used.

I have a working solution based on TBytes right now, and I look for fast/easy optimizations, so maybe I need to move through
more than one step:

1. RawByteString
2. Linked list

Since the 2. is a bigger efford to change at the moment, I would need to postpone that to a future change.
But thanks for the suggestions anyway.

I know that this is of course a very powerful method, transferring referenced types, which reduces memory copying a lot.

FPiette · October 2, 2020

4 minutes ago, Rollo62 said:

Yes, TCP/UDP has very similar issues.

No, only TCP. UDP is a datagram oriented protocol.

Quote

Additionally in my configuration I cannot rely on delimiters, as I said, some use SOT/EOT delimiters, others use fixeld-length method.

But you are able to discover which protocol is used in your processor stuff. What I say is that you have to move the part of the code out of the processor code and move it to an intermediate layer between the low level receiver and high level the processor. Layers is the key success in protocol handling.

Quote

1. RawByteString
2. Linked list

RawByteString is low performance. It is all about memory alloc/free, data copy and memory fragmentation. Low performance but easier to design and develop.

Linked list (Or queue if you want to encapsulate the linked list in another name) is the way to go for performance as I explained. And yes, I agree that this is more complex to design and develop but this is the price to pay for performance.

Quote

Since the 2. is a bigger efford to change at the moment, I would need to postpone that to a future change.

I bet you will never do that change. You'll buy a higher performance computer.

Rollo62 · October 2, 2020

26 minutes ago, FPiette said:

I bet you will never do that change. You'll buy a higher performance computer.

Then you don't know me well, I make permanent, changes to all my core code libraries on a regular base

Only I have too little sleep sometimes

Even this change wouldn't be necessary, as it works right now and only degrades in performance under some few, rare conditions.

But I'm willing to prepare for future devices, and to increase performance for my customers.

Edited October 2, 2020 by Rollo62

FPiette · October 2, 2020

11 minutes ago, Rollo62 said:

Then you don't know me well

Yes, I have no idea who you are!

A.M. Hoornweg · October 2, 2020

I must really warn for RawByteString.

The documentation says that you should not instantiate variables of type RawByteString, it is meant to be used as a parameter or result type of a function.

I have tried, and if you declare a variable of type Rawbytestring and put data into it, it will behave just like Ansistring and has the same code page issues:

procedure TForm2.Button2Click(Sender: TObject);
var raw:RawByteString;
begin
     setlength(raw,100);
     Showmessage(format('The code page of this rawbytestring is %d',[stringcodepage(raw)]));
end;

On my system this rawbytestring has code page 1252!

Rollo62 · October 2, 2020

5 minutes ago, A.M. Hoornweg said:

I have tried, and if you declare a variable of type Rawbytestring and put data into it, it will behave just like Ansistring and has the same code page issues:

I will check, it also said that it should have no codepage, only when you explicitly set one.

Fr0sT.Brutal · October 2, 2020

10 minutes ago, A.M. Hoornweg said:

I have tried, and if you declare a variable of type Rawbytestring and put data into it, it will behave just like Ansistring and has the same code page issues:

Yes, faced the same issue. Despite its name, this type is kinda tricky. I had to use SetCodePage(S, $FFFF, False); in my function where I create variabe of this type.

A.M. Hoornweg · October 2, 2020

6 minutes ago, Fr0sT.Brutal said:

Yes, faced the same issue. Despite its name, this type is kinda tricky. I had to use SetCodePage(S, $FFFF, False); in my function where I create variabe of this type.

You can do that on any Ansistring. RawByteString does not add any value here.

Also, it is a can of worms. If you do something like c:=a+b, what's the code page of the result?

Edited October 2, 2020 by A.M. Hoornweg

Fr0sT.Brutal · October 2, 2020

11 minutes ago, A.M. Hoornweg said:

You can do that on any Ansistring. RawByteString does not add any value here.

Also, it is a can of worms. If you do something like c:=a+b, what's the code page of the result?

Sure it could be done with any AnsiString. But "RawByteString = type AnsiString($ffff);" so I just manually set the right value.

I suppose no conversion is done for this codepage so regardless of the value compiler will set to result, it still will be raw bytes.

I really can't understand why wouldn't they name the "any codepage string" type like "CodepagedString" and make RawByteString behave just like its name supposes

PS Post #555! ))

Edited October 2, 2020 by Fr0sT.Brutal

Darian Miller · October 2, 2020

10 hours ago, Rollo62 said:

But from my gut feeling I would say that TBytes is probably not the most efficient data type for handling data, since its dynamic handling is not supported very well from the compiler.

You seem highly concerned about performance implications - so measure the performance. Simulate a million (or whatever constitutes a significant amount) receipts of randomized sample data and measure your few possible implementations. If you find very little difference, then pick the one that is easiest for your team to extend and maintain.

If your original source is TBytes, and perhaps your output source is TBytes, then introducing any conversions seems wasteful, but gut feelings should only act as a general guide. You either measure or you keep guessing. If you keep guessing, you may accidently determine if it's (currently) providing acceptable performance.

Rollo62 · October 2, 2020

Yes, it is TBytes, but not well optimized.
Thats why I look for more suitable options.
I just replaces TBytes by a custom type based on RawByteString, and all unit tests were running well so far.

Maybe tomorrow I can get my first impression (and measurements) on performance.

jbg · October 2, 2020

TBytes suffers from being a dynamic array. If you set its length it will initialize the allocated memory with zeros. So if you allocate a TBytes array and then write data into it (from a file, other memory, socket) you don't need the zero-initialization because you overwrite them anyway. Unfortunately there is no "SetLengthUninitialized" function for TBytes that would get rid of the unnecessary ZeroMemory/FillChar call. I wrote my own function and patched all the TEncoding methods and use it in places were performance matter.

I just saw that that was already mentioned in this thread.

Edited October 2, 2020 by jbg

Clément · October 2, 2020

TBytes and pointers! Either love or hate them!

Can you avoid allocation if you work with one large chuck of data (eg: 1 or 2MBytes) ?

Fr0sT.Brutal · October 5, 2020

On 10/2/2020 at 8:15 PM, jbg said:

If you set its length it will initialize the allocated memory with zeros. So if you allocate a TBytes array and then write data into it (from a file, other memory, socket) you don't need the zero-initialization because you overwrite them anyway

What perf gain it really brings? My feelings say that allocation cost is much greater than fill cost. Especially because the latter is very well optimized with asm routines using block CPU commands

Edited October 5, 2020 by Fr0sT.Brutal

A.M. Hoornweg · October 5, 2020

What throughput are we talking about here anyway, in kilobytes per second?

I think good design is much more important than premature optimizations.

Rollo62 · October 5, 2020

1 hour ago, A.M. Hoornweg said:

What throughput are we talking about here anyway, in kilobytes per second?

I think good design is much more important than premature optimizations.

Its not very critical, llike maximum maybe 1 Kbps,

but the single transmissions maybe be processed in several different positions, llike DB-store, listview, Chart, ...

My tests so far didn't show much different performance between TBytes and RawByteString,

but thats hard to test, as I've tried to check the whole signal chain.

Yes, a different design, like the linked list proposal from @FPiette, comes closer to my decision.
Since that means a bigger rewrite than just a buffer change, I will have to do that later.

So far its working, and I can stay with TBytes, which also make sense with the binary data.

Maybe from a philosophical standpoint I could think the following:

Does the data contains only pure ASCII --> Then RawByteString could make sense
May data contain Binary, beside ASCII --> Then maybe the whole data SHOULD BE considered as binary, and TBytes is the right choice.

Edited October 5, 2020 by Rollo62

Stefan Glienke · October 6, 2020

On 10/2/2020 at 2:33 PM, FPiette said:

btw: Linked list are frequently used to implement queues (FIFO, LIFO). I never seriously looked at how Delphi RTL implement TQueue<T> but I think it is an array which is less efficient.

Especially for queues I don't see a clear advantage of a linked list even less so in favor of using an array. Even more for a circular buffer. I agree mostly with this answer.

Also as for linked lists vs arrays - question knowledge that was aquired decades ago and refresh it considering modern hardware. Things like prefetchers make sequential data access order of magnitudes faster than random memory access resulting from non optimized linked list implementations.

FPiette · October 6, 2020

Quote

Especially for queues I don't see a clear advantage of a linked list even less so in favor of using an array.

Linked lists compared to arrays allow to insert/delete/move items without moving the whole array. When doing a FIFO, it is very important to be able to quickly insert at one end.

With linked list it is easy to implement multiple queues, for example a free list and a busy list without moving data. Obviously, if you have a small numbers of items, it makes no differences. For a really small number of items, a linked list may even be slower that an array. As everywhere, use case are different and there is no best solution for everything.

Quote

from non optimized linked list implementations.

That's true for anything! Non optimized is... non optimized 🙂

Lars Fosdal · October 6, 2020

I'd have a TBytes buffer and a list/array on the side where the parser breaks down the offset, "type" and method for extracting the buffer content in the appropriate form.

If you use a circular buffer - it is slightly more complex - but not exceedingly so. The list/array on the side could also be circular.

Rollo62 · October 7, 2020

16 hours ago, FPiette said:

Linked lists compared to arrays allow to insert/delete/move items without moving the whole array. When doing a FIFO, it is very important to be able to quickly insert at one end.

I used the circular buffer (of bytes) in the first place, because of that decoupling of write/read operations by a fixed time.
The transmissions could be fragmented, means 1,2 or 3 transmissions needed to complete a frame.

But in the begining the frames came veeeeeery rarely only on manual demand, from the first devices we supported.

Then later I needed to add new devices, which send data more frequently, but still moderate, thats why I consider possible optimizations now.

Maybe the next devices will have even higher load, so this might get critical.

The circular buffer I use has a fixed memory pre-allocated, so there should be no overhead with grow or shrink,
that was the main advantage of circular buffer, why I've made that decision at that time.

It still works fine and decouples read / write operations well, just buffering the slight processing deviations from other tasks,
to keep the average processing non-blocking, thats was my goal.

I never saw a buffer overrun, although this would be allowed too, then simply grow once and stay at double space.

I could think of an optimized list, to buffer the single fragments, but that would need to create / destroy memory each time.
In the circular buffer I just need to move the pointers, to the new memory position, which is very effective.

The analysis of the data is done after the circular buffer (for decoupling).

When reading from the circular buffer, then the data fragments could be torn, like 1, 2-and-half ==> not fully received until 3 yet.
So I have to wait for more data and retry.

For this analysis I use an external TBytes buffer, which could be replaced by linked list, etc.
From deeper consideration, maybe the analysis could be taken "inside" the circular buffer itself, saving the cost of copying memory.

The problem is probably the "circular" design, which doesn't alllow effective, linear memory searches.

It could be reasonable to replace the circular buffer by a two-section linear buffer of TBytes, like a "framebuffer"

where one section currently reads until ready, then switches to processing, while the second then reads until the next frame finished:

First, partly received ( full frame consists of 1,2,3 )

(1)==> "1,2" --> analyse incomplete

(2)==> ""

where e.g. section one rund full until 1,2,3 ==> where always TBytes can be fastly analysed from 0 ... last, as one memory piece.

(1)==> "1,2,3" --> analyse complete, switch to processing the buffer

(2)==> ""

or

When found 1,2,3 and maybe already the next fragment 4 is in the 1. section,

(1)==> "1,2,3,4" --> analyse complete, copy the overhead to the 2nd buffer, switch to processing the buffer

(2)==> ""

it could be copied to the next section, to have section (1) for further processing, while (2) is now the new write buffer.

While new fragments "5" may arrive

(1)==> "1,2,3" --> processing

(2)==> "4,5" --> buffering

The frames may consist of various data representations, from fixed length, over SOT/EOT signatures, to simply timeout based,
that is the problem.
There is no clear protocol to rely on, so I have to do specific analysis all the time when data arrives (also extendable in the future).

Anyway the real problem seems to me the analysing, not the storage of the fragments itself.

The analysing might fail (not fully received), and needed to wait for more data, so an analysing process needs to be repeated from time to time.

Thanks for your input to get some diffferent views and thoughts, I think still this is not the final solution yet, but maybe closer.

Edited October 7, 2020 by Rollo62

Stefan Glienke · October 7, 2020

19 hours ago, FPiette said:

That's true for anything! Non optimized is... non optimized 🙂

There is a difference between using a data structure like an array which is optimized by default because the hardware really really likes it and one where additional work has to be done.

Well the only thing to optimize on an array is to make the type being stored as compact as possible and to ensure it uses as few cache lines as possible.

When implementing a linked list naively you end up with heap memory all over the place - so make it as cache friendly as possible to have to go the extra mile - that's what I meant.

But as you said it depends but for adding/removing at both ends a circular buffer array with "wrap around" logic to avoid moving when operating at the head will win I am pretty sure.

Additional read on the subject: https://dzone.com/articles/performance-of-array-vs-linked-list-on-modern-comp

Edited October 7, 2020 by Stefan Glienke

TurboMagic · October 7, 2020

On 10/2/2020 at 9:01 AM, Rollo62 said:

Yes I have that, but the strings are so much more elegant.
I would whish to have better TBytes and dyn. array support like that in Delphi.

You know that arrays can be concatenated with + since XE7? And delete etc. is available for them as well!

Rollo62 · October 8, 2020

@TurboMagic

Thanks, I was not aware of that feature.

Found this here from Marco.

But I'm working with TBytes as parameter, not sure if that should also work.

So far I'm using manual pointer operations.

Have to check how that is done and look into its performance, maybe only next week.

Stefan Glienke · October 8, 2020

Concatenation of dynamic arrays is just for convenience - it certainly does not help to improve performance if you are concerned about heap allocations.

October 8, 2020

On 10/7/2020 at 11:20 AM, Stefan Glienke said:

Additional read on the subject: https://dzone.com/articles/performance-of-array-vs-linked-list-on-modern-comp

Brilliant! I hade coding linked lists 🙂

But seriously - when deploying server code "in tha cloud" mostly you are deprived of L1..3 cache.

This is not my corner of the shop so i would like to humbly ask; do you "folks" consider metal/virtual when making decisions concerning O(n)-stuff?

Sign In

Best type for data buffer: TBytes, RawByteString, String, AnsiString, ...

Recommended Posts

Rollo62 589

Share this post

Link to post

FPiette 393

Share this post

Link to post

Rollo62 589

Share this post

Link to post

FPiette 393

Share this post

Link to post

A.M. Hoornweg 153

Share this post

Link to post

Rollo62 589

Share this post

Link to post

Fr0sT.Brutal 903

Share this post

Link to post

A.M. Hoornweg 153

Share this post

Link to post

Fr0sT.Brutal 903

Share this post

Link to post

Darian Miller 387

Share this post

Link to post

Rollo62 589

Share this post

Link to post

jbg 243

Share this post

Link to post

Clément 153

Share this post

Link to post

Fr0sT.Brutal 903

Share this post

Link to post

A.M. Hoornweg 153

Share this post

Link to post

Rollo62 589

Share this post

Link to post

Stefan Glienke 2143

Share this post

Link to post

FPiette 393

Share this post

Link to post

Lars Fosdal 1866

Share this post

Link to post

Rollo62 589

Share this post

Link to post

Stefan Glienke 2143

Share this post

Link to post

TurboMagic 102

Share this post

Link to post

Rollo62 589

Share this post

Link to post

Stefan Glienke 2143

Share this post

Link to post

Guest

Share this post

Link to post

Create an account or sign in to comment

Create an account