The Case of Delphi Const String Parameters

balabuev · January 10, 2021

16 minutes ago, Dalija Prasnikar said:

Definition of value and reference type does not change with language.

You are wrong here. There EXISTS a more general, or more conceptual if you like, notion of value types and reference types.

Just as a good example, please look here: https://developer.apple.com/swift/blog/?id=10

Quote

In Swift, Array, String, and Dictionary are all value types.

Period.

52 minutes ago, Dalija Prasnikar said:

You are confusing value of the string with value type.

No, let's quote again two things:

- First, is a qualifying condition of the value type from Wikipedia: [a type is a value type, if: ] A value of value type is the actual value.

- Second, your own words: Value of the string is sequence of characters.

Don't you see the full match?

January 10, 2021

Another thing I forgot to mention: when I said that no compiler does that, I meant that no compiler should break the logical consistency of the language and its RTL; they should all be in harmony. I wasn't talking about reference counting in Java or any other language.

The idea is simple and clear: if A passes a parameter to B and it is declared as const, then no matter what happens that parameter should not change. I know this will not apply to classes (objects, etc.), but it definitely should apply to the simple types defined by the language: no leaks, and no loss or change of content.

Dalija Prasnikar · January 10, 2021

1 minute ago, balabuev said:

In Swift, Array, String, and Dictionary are all value types.

Period.

I know. And in Delphi all those types are reference types. In Java all those types are also reference types.

Again, the definition of value type and reference type does not say which types in a particular language belong to which category.

1 minute ago, balabuev said:

Next:

No, let's quote again two things:

- First, is a qualifying condition of the value type from Wikipedia: [a type is a value type, if: ] A value of value type is the actual value.

- Second, your own words: Value of the string is sequence of characters.

That is the value of the string content, not of the string variable. By your definition all types would be value types.

I also quoted the Delphi documentation, which clearly states that the value of a string variable itself is a pointer to the actual string data. QED.
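A minimal sketch of that last point, assuming a Unicode Delphi console application (the program and variable names are illustrative and not from the thread). It checks that a string variable is pointer-sized and that assignment copies the pointer and not the characters:

```pascal
program StringIsPointer;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  A, B: string;
begin
  // Build the string at run time so it lives on the heap (not a literal).
  A := IntToStr(12345);
  B := A; // copies the pointer and bumps the reference count

  Writeln(SizeOf(A) = SizeOf(Pointer)); // TRUE: the variable is pointer-sized
  Writeln(Pointer(A) = Pointer(B));     // TRUE: both variables share one buffer
  Writeln(StringRefCount(A));           // 2

  B[1] := 'X'; // copy-on-write: B gets its own buffer before the change
  Writeln(Pointer(A) = Pointer(B));     // FALSE
  Writeln(A, ' ', B);                   // 12345 X2345
end.
```

StringRefCount is assumed to be available from the System unit, as it is in recent Delphi versions.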

balabuev · January 10, 2021

26 minutes ago, Dalija Prasnikar said:

I know. And in Delphi all those types are reference types. In Java all those types are also reference types.

You did not get my point. I guess we both understand that in Swift, as in most other languages, strings are implemented as a pointer to heap-allocated memory. So are dictionaries. Nevertheless, they call them value types.

Because the value type notion is more general than you think.

Edited January 10, 2021 by balabuev

Dalija Prasnikar · January 10, 2021

12 minutes ago, Kas Ob. said:

Lets establish few things and facts first

The only fact there is is that reference counting does not work like that. And it could not work with any bit flipping. Bit flipping does not solve anything.

20 minutes ago, Kas Ob. said:

What is perplexing me is your insistence that this can't be fixed or even merely the idea it can be fix is a blasphemy, why ?! technology and science didn't evolve by refusing new ideas crazy or not.

It is not a blasphemy, it is simply not a working solution. Again, I am not against discussing potential solutions to anything, nor how something can be improved, but when your initial understanding is flawed, there are just too many things that I would have to convince you about.

Kas Ob. · January 10, 2021

8 minutes ago, Dalija Prasnikar said:

but when your initial understanding is flawed,

Constant strings have had a reference count of -1 for ages, generated by the compiler by default, hence the current RTL you are talking about is built on top of that.

balabuev · January 10, 2021

I want to kindly remind everyone that the topic is not exclusively about strings...

procedure TForm11.Button1Click(Sender: TObject);
type
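  TFoo = record
    X:       Integer;
    BigData: array[0..15] of Integer;
  end;

var
  s1: TFoo;

  procedure Test(const Value: TFoo);
  begin
    s1.X := 9; // changes s1 while Value is still in use
    ShowMessage(Value.X.ToString); // a large const record is passed by reference, so this shows 9
  end;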

begin
  s1.X := 7;
  Test(s1);
end;

I know, I know, this works as expected. Because:

1) Delphi is Delphi!

2) If you are thinking differently, then:

2.1) You are thinking about anything else, but not about Delphi.

2.2) Shut up and goto (1)

Edited January 10, 2021 by balabuev

Dalija Prasnikar · January 10, 2021

8 minutes ago, balabuev said:

You did not understand my point. I guess we both understand that in Swift, as in most other languages, strings are implemented as a pointer to heap-allocated memory. So are dictionaries. Nevertheless, they call them value types.

Because the value type notion is more general than you think.

I answered before thinking... so you got me

But the real answer is more complicated. Technically, memory representation in Swift allows both variants depending on the size of the data (content). So until you discuss actual code, there is no way of telling how the inner representation will work. It may be a value and it may be a reference.

In terms of the Swift documentation, a value type also implies copy-on-assignment semantics, so for Swift developers that classification carries more weight than the actual underlying representation, which again depends on the actual content.

Also, the Swift compiler has (or had) bugs around handling "value types" that are not really value types, but rather reference types in terms of the Wikipedia definition, which would cause memory leaks under certain conditions (and other issues). I am saying bugs because some of those I know about were actual bugs; I am not in a position to say whether some of them are just "as designed" behavior similar to the Delphi const string parameter behavior. I am not that deeply involved with Swift and I am not familiar with all its internals, which also change every five minutes as Swift evolves.

Just like Delphi strings, Swift "fake" value types can suffer from some problems caused by the fact that they don't occupy a single location in memory.

Dalija Prasnikar · January 10, 2021

48 minutes ago, Kas Ob. said:

Constant strings have had a reference count of -1 for ages, generated by the compiler by default, hence the current RTL you are talking about is built on top of that.

That is because string literals don't require memory management, as they are part of the executable and are stored in the data segment.

Their reference count of -1 is just a flag used as an optimization to skip reference counting for them; it makes no sense as part of the general reference counting mechanism.
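The difference between a literal and a heap-allocated string can be observed directly. The following is a small sketch, assuming a Unicode Delphi console application and the StringRefCount function from the System unit; the variable names are illustrative:

```pascal
program LiteralRefCount;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Lit, Dyn: string;
begin
  Lit := 'abc';         // points at constant data inside the executable
  Dyn := IntToStr(123); // allocated on the heap at run time

  Writeln(StringRefCount(Lit)); // -1: flagged as a literal, never counted or freed
  Writeln(StringRefCount(Dyn)); // 1: ordinary reference counted string
end.
```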

Dalija Prasnikar · January 10, 2021

25 minutes ago, balabuev said:

I want to kindly remind everyone that the topic is not exclusively about strings...


procedure TForm11.Button1Click(Sender: TObject);
type
 TFoo = record
   X:       Integer;
   BigData: array[0..15] of Integer;
 end;

var
  s1: TFoo;

  procedure Test(const Value: TFoo);
  begin
    s1.X := 9;
    ShowMessage(Value.X.ToString);
  end;

begin
  s1.X := 7;
  Test(s1);
end;

It works as expected because TFoo is a record and records are value types. That means two things. First, there is no additional (heap) memory management involved: s1 holds the complete content of the record on the stack, without indirection. Second, because the record does not fit into a register and Value is passed as const, a compiler optimization kicks in and passes a reference to s1 and not a copy. In other words, the generated code is the same as it would be if you used a var parameter.

If you change the declaration of TFoo and make it smaller, you would get a different result, because Value would now contain an independent copy of the data.

  TFoo = record
    X: Byte;
    Y: Byte;
  end;

If you change the declaration of Test to

procedure Test(const [ref] Value: TFoo);

Then, regardless of the size of TFoo, it would always be passed as a reference (pointer) to the original data, and you would not have two copies.
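A sketch comparing the two declarations side by side, assuming a Delphi version that supports the [ref] attribute. TSmall, ByConst and ByConstRef are hypothetical names used only for this illustration, and how a small const record is passed depends on the platform's calling convention:

```pascal
program ConstRefDemo;

{$APPTYPE CONSOLE}

type
  TSmall = record
    X: Byte;
    Y: Byte;
  end;

var
  G: TSmall;

procedure ByConst(const Value: TSmall);
begin
  G.X := 9;
  // A record this small is typically passed by value, so Value keeps
  // the old content (7 on the common Windows targets).
  Writeln(Value.X);
end;

procedure ByConstRef(const [ref] Value: TSmall);
begin
  G.X := 9;
  // Value is a reference to G, so the change is visible: prints 9.
  Writeln(Value.X);
end;

begin
  G.X := 7;
  ByConst(G);

  G.X := 7;
  ByConstRef(G);
end.
```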

Kas Ob. · January 10, 2021

5 minutes ago, Dalija Prasnikar said:

Their reference count of -1 is just flag used for optimization and omitting reference counting that makes no sense for general reference counting mechanism.

The RTL is built on preventing the transition from -1 to 0 in the case of strings, so why not exploit this to protect runtime variables? Also, why stop at -1? Is there a law preventing such usage? Is it Wikipedia or the unspoken rules of Pascal?

The RTL is already built to handle the RefCount = -1 case. The same RTL functions do this, are expected to produce normal behaviour, and in fact do well in that respect. So, literal or not, what would we violate by extending that behaviour? Or is the result better logic and more resilient code?

Your argument started, and still goes, like this: we shouldn't fix it because it can't be fixed, and we shouldn't even think about it because any such thinking means we don't understand reference counting. Am I getting this right?

Dear Dalija, I think you are missing the big picture here by not looking from a different angle. The reference count is not the target. Reference counting is used to manage one thing only: to know when something is no longer used (and will not be used) and needs to be freed. It is that simple. The underlying scheme, whether counting the usage or building a huge list, is irrelevant; it only has to tell when a value needs a special copy/duplicate or removal. And please don't correct me about the usage of a value (or content); we are beyond this, as it is irrelevant. I think you do understand what I mean: an action needs to be triggered when the count reaches 0, and we need to prevent it when going from -1 to 0. In theory I see it as doable and viable.

balabuev · January 10, 2021

I would argue that the strictly technical definition of value type vs reference type is also confusing:

type
  TFoo = record
    S: string;
  end;

var
  S: string; // string is a reference type (in the memory layout sense).
  F: TFoo;   // TFoo is a value type.

Yet the memory layouts are identical in both cases.

Also, @Dalija Prasnikar, just for fun, how do you define the Pointer type for yourself? Is it a value type or a reference type?

procedure P;
var
  x: Integer;
  p: Pointer;
begin
  p := @x;
end;

Edited January 10, 2021 by balabuev

Dalija Prasnikar · January 10, 2021

5 minutes ago, balabuev said:

Also, @Dalija Prasnikar, just for fun, how do you define the Pointer type for yourself? Is it a value type or a reference type?

Pointer is a reference type.

But if you are bisecting reference types, then the reference part (the immediate value stored in the variable) has value type semantics. In other words, when you assign one pointer to another you are creating a copy of the value stored in that variable alone (the reference part), just like when you assign one integer to another.

If that pointer is not nil, then after assigning one pointer to another both will still point to the same data location, while the pointer itself will exist as two distinct copies at that time.
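The two levels described here (the copied address and the shared target) can be shown in a few lines. This is an illustrative sketch with hypothetical names, assuming a Delphi console application:

```pascal
program PointerCopy;

{$APPTYPE CONSOLE}

var
  X: Integer;
  P1, P2: PInteger;
begin
  X := 1;
  P1 := @X;
  P2 := P1;  // copies the address only; there are now two pointer values

  P2^ := 2;  // writes through to X
  Writeln(P1^); // 2: both pointers refer to the same Integer

  P2 := nil; // changing P2 itself does not affect P1
  Writeln(P1 <> nil); // TRUE
end.
```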

That also answers the first part of your question.

balabuev · January 10, 2021

1 hour ago, Dalija Prasnikar said:

Pointer is a reference type.

I knew that you would say this.

With the Pointer type I just wanted to highlight the fact that its values have no associated second part (like the heap char data in strings). In the definition from Wikipedia this missing second part is called the "actual value".

Yes, pointers usually point to some data, but this data cannot be considered a part of the pointer value: pointer values point to (or hold the location of) some external data. I mean that the pointer value itself (the address) is quite disconnected from the data (if any) to which it points.

And so, the "actual value" of a Pointer is the stored address itself.

2 hours ago, Dalija Prasnikar said:

That also answers the first part of your question.

So, if I tell you just the fact that some variable is represented by a pointer to the actual data on the heap, you will not be able to guess whether I'm speaking about a variable of a value type (like, for example, TFoo) or a variable of a reference type (like, for example, a dynamic array). It's impossible to distinguish between these two things based on the information provided.

Dalija Prasnikar · January 10, 2021

6 minutes ago, balabuev said:

I knew that you would say this.

With the Pointer type I just wanted to highlight the fact that its values have no associated second part (like the heap char data in strings). In the definition from Wikipedia this missing second part is called the "actual value".

Only a nil pointer does not have any associated data. When you put some address in a pointer, you have associated it with some data that is not directly stored in the variable itself.

The main purpose of pointer types is storing the address of something, so if you treat that address as the actual value, then you can say that raw pointers are value types. There is some ambiguity here, sure.

In the case of other reference types, you are definitely not interested in the address as the value, but in the content (value) in the associated second part, and that is what makes strings, interfaces, objects, and dynamic arrays reference types.

Dalija Prasnikar · January 10, 2021

6 hours ago, Kas Ob. said:

The RTL is built on preventing the transition from -1 to 0 in the case of strings, so why not exploit this to protect runtime variables? Also, why stop at -1? Is there a law preventing such usage? Is it Wikipedia or the unspoken rules of Pascal?

Again, -1 is used JUST for string literals. You know, when you write s := '123', then '123' is a string literal; it is not dynamically allocated and its memory does not have to be managed. That is why there is an optimization in the RTL JUST for string literals that skips reference counting for them. -1 means you are dealing with a string literal. You cannot use that value for any dynamically allocated data (strings).

balabuev · January 10, 2021

3 minutes ago, Dalija Prasnikar said:

then you can say that raw pointers are value types.

Thank God!

5 minutes ago, Dalija Prasnikar said:

Only nil pointer does not have any associated data.

I don't agree:

procedure P(const S: string);
var
  p, eof: PChar;
begin
  p   := Pointer(S);
  eof := p + Length(S);

  while p <> eof do
  begin
    // Do something.
    Inc(p);
  end;
end;

Here, for example, the eof variable is a pointer which holds a non-nil address while pointing to no meaningful data. And unlike object references and other more obvious reference types, this case is not too exotic for pointers.

Kas Ob. · January 11, 2021

This took too much discussion and I am giving up. To summarize what we have as facts:

1) A literal string could be initialized to +5 (or any positive value) by the compiler and this would not break anything; the RTL would not need the part that checks for -1. We can agree on this, right? If it is 1 before any code executes, then it is impossible to reach 0 and trigger an exception by returning the literal to the MM. And considering that the compiler and the RTL execute the code that accesses the RefCount whether the string is constant or not, this means inefficiency in both the RTL and the compiler, so at least this can be changed for performance.

2) If we introduced another field for (let's say) strings, something like LockCount in the string header, we could reach the ultimate protection for strings, and here we have this:

a) We would not lose many bytes, as all MM allocations in Delphi are a multiple of 8 at least, and the current header is 12 bytes.

b) By making the header 16 bytes we would align the char sequence in the string, and so gain a good optimization with the possibility of aligned SIMD out of the box, assuming the MM is 16-byte aligned.

c) If that is not bad design (because it is not), and it would also guarantee the consistency of the compiler and the language, then are we allowed to think of optimizing this, losing the new field and the alignment, and achieving better performance by replacing it with a single bit in an already existing field?

3) Introducing or just discussing any better code is wrong and off the table, because the compiler and the RTL are not doing it. A great idea and principle for evolving and enhancing. Here I would suggest a 30-minute video on how these lost and helpless people calling themselves developers at Facebook play around with strings, introducing stuff and changing stuff. Shame on them, they don't have a lick of understanding of how strings should work.

https://www.youtube.com/watch?v=kPR8h4-qZdk

So in short, it is not a bug and will not be fixed. Fine, and good luck all.

PS: I wished someone had just thought this through and asked how my suggestion is flawed in the case of chained functions. The answer is very simple and short (also easy to find); I skipped explaining it to see if someone would ask! No one cares! As a hint to wondering minds as I am leaving this: at least 10% of the code in a Delphi application is handling and shuffling values on the stack 😉

Dalija Prasnikar · January 11, 2021

49 minutes ago, Kas Ob. said:

This took too much discussion and I am giving up

With this I agree...

There is already a solution for this bug: triggering the reference counting mechanism for const references. No other gimmick is necessary. I also said why this solution is not viable: reference counting induces a performance penalty. And this is exactly the reason why there is no reference counting trigger for const parameters: so that developers have more control over reference counting and can speed up code execution, since in most code passing a reference counted instance (regardless of the type) does not require reference counting.

No reference counting is speed optimization.
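To make the trade-off concrete, here is a sketch of the pattern under discussion, assuming a Delphi console application. Risky and Safe are hypothetical names; Risky is declared only to show the hazard and is deliberately never called, because its behavior is undefined when it receives the global it reassigns:

```pascal
program ConstStringHazard;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  G: string;

// Risky pattern (not called here): S borrows G's buffer without taking a
// reference, so reassigning G can free the memory S still points to.
procedure Risky(const S: string);
begin
  G := IntToStr(2);
  Writeln(S); // undefined behavior when the caller passed G
end;

// Without const the parameter holds its own reference for the whole call,
// at the cost of one reference count increment and decrement.
procedure Safe(S: string);
begin
  G := IntToStr(2);
  Writeln(S); // 1
end;

begin
  G := IntToStr(1);
  Safe(G);
  Writeln(G); // 2
end.
```

Dropping const (or copying the argument into a local string before touching the global) is the usual way to opt back into reference counting where aliasing is possible.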

49 minutes ago, Kas Ob. said:

1) A literal string could be initialized to +5 (or any positive value) by the compiler and this would not break anything; the RTL would not need the part that checks for -1. We can agree on this, right? If it is 1 before any code executes, then it is impossible to reach 0 and trigger an exception by returning the literal to the MM. And considering that the compiler and the RTL execute the code that accesses the RefCount whether the string is constant or not, this means inefficiency in both the RTL and the compiler, so at least this can be changed for performance.

Yes, a literal string could be initialized to some higher value. But then every use of such a string would trigger reference counting, just like for other strings. That would introduce a performance penalty for string literals, which do not require memory management or reference counting. Comparing some integer value with -1 is inherently faster than increasing/decreasing a reference count.

Your proposal would again defeat the purpose of having -1 as speed optimization.

49 minutes ago, Kas Ob. said:

2) If we introduced another field for (let's say) strings, something like LockCount in the string header, we could reach the ultimate protection for strings, and here we have this

We already have that. It is called the ReferenceCount field in the string header. We don't need another number, negative, positive, or whatever.

http://docwiki.embarcadero.com/RADStudio/Sydney/en/Internal_Data_Formats_(Delphi)#Long_String_Types

So your idea with a negative flag would not work at all, but even if it did, it would also require a locking operation on the reference count field, with the same performance penalty. It would be simpler to just trigger reference counting instead.

Your idea with an additional field (if I understood it correctly) would mean we would have to do locking on two numbers instead of one. Again, an even bigger performance penalty.

If we want to pay the price in performance, then we could just have the compiler omit all speed optimizations and be done with it. And again, I think I explained well enough why this will not happen. Not everyone is willing to sacrifice speed because once in a blue moon some developers might shoot themselves in the foot.

Kas Ob. · January 11, 2021

@Dalija Prasnikar You still didn't get it, and your argument is full of fallacies, based on misunderstanding the idea to begin with.

41 minutes ago, Dalija Prasnikar said:

No reference counting is speed optimization.

I am not introducing any new reference counting in new places (where there currently is none); my suggestion is to give up a fraction of that gain to secure the functionality and correctness of the code.

42 minutes ago, Dalija Prasnikar said:

Your proposal would again defeat the purpose of having -1 as speed optimization.

No, it would not; it would perfect it and just complete the design.

43 minutes ago, Dalija Prasnikar said:

Comparing some integer value with -1 is inherently faster than increasing/decreasing reference count.

You can NOT be more wrong about this assumption. The problem is not the compare itself but what comes after it, the branching. Your assumption would hold only for CPUs last made in 1997.

50 minutes ago, Dalija Prasnikar said:

Not everyone is willing to sacrifice speed because once in a blue moon some developers might shoot themselves in the foot.

Sure, but that is for something we write code for, like objects, where we must check Assigned() first and then write the code. We can't and shouldn't be managing simple types; that is solely the compiler's responsibility, and this should be the de facto rule for any language. Checks that protect against broken code should also be provided by the compiler as a built-in feature.

I broke my word not to comment anymore, but the above is personal advice for you: read what I wrote, and if the idea is not clear then we can discuss the mechanism and clear things up, not simply throw an idea away because discussing it is taboo, based on our biased and wrong knowledge and, of course, our refusal to agree that it can be better.

Really sorry for wasting your time.

balabuev · January 11, 2021

1 hour ago, Kas Ob. said:

You can NOT be more wrong about this assumption, the problem is not the compare itself but what comes after, the branching

You missed the fact that increments and decrements are interlocked. So, they are slow operations.

Kas Ob. · January 11, 2021

12 minutes ago, balabuev said:

You missed the fact that increments and decrements are interlocked. So, they are slow operations.

I did not!

Why are they interlocked to begin with? Are strings thread safe, to waste 22-25 cycles on one instruction (18 on modern processors)?

No, they are not thread safe, so what is the point of having an interlocked operation other than wasting time?

Even so, with an interlocked instruction we are still adding a value, either -1 or +1, after checking that we are not crossing from -1 to 0. I have explained that many times already.

The point is that I am not introducing a huge change to the functionality of the RTL, only small changes: instead of always adding 1, it would keep a flag bit as protection. The code that needs to be added would be minimal and emitted by the compiler only when a value declared as var is passed as a const parameter, nothing else. This would ensure protection of that associated data or value or whatever you want to call it. Also, the added code is merely a few instructions, not everywhere as Dalija wrongly assumes.

balabuev · January 11, 2021

29 minutes ago, Kas Ob. said:

Why they are interlocked to begin with

Because otherwise you would not be able to use the same global variable of type string in different threads:

1) Even for read-only use cases, because, as we already described, the ref-count value changes even when the string char data is not modified.

2) Also for read-write use cases, even if access to the shared global string variable is protected with a critical section in the user's code.

And this leads to the conclusion that any potential additional field, like LockCount, would also need locking. Moreover, locking two integer fields simultaneously is a much bigger problem (imho).

Edited January 11, 2021 by balabuev

David Heffernan · January 11, 2021

21 minutes ago, balabuev said:

Also for read-write use cases, even if the access to shared global string variable is protected with critical section in user's code.

In this case, you don't need interlocked operations on the ref count.

balabuev · January 11, 2021

16 minutes ago, David Heffernan said:

In this case, you don't need interlocked operations on the ref count.

I don't agree. Imagine several identical tasks running in different threads:

var
  G: string;
  CriticalSection: ...;

procedure ThreadTask;
var
  s: string;
begin
  CriticalSection.Lock;   // Protected access to shared global G variable.
  s := G;                 //
  CriticalSection.Unlock; //

  DoSomethingWithS(s);
end;

Logical viewpoint: The code above reads the string from the shared global variable G into the thread-local variable s. Then, since s is local, we can do anything with it without further protection.

Physical viewpoint: Even after the value of G is assigned to s, they still point to the same char memory, so more than one thread can access it. This also applies to the reference count field, which is a part of the mentioned shared memory.

Edited January 11, 2021 by balabuev
