Rapid generics


47 minutes ago, edwinyzh said:

Does anybody know if Rapid.Generics suffers from the same 'exe bloat' issue caused by System.Generics.Collections in the RTL?

Rapid.Generics also uses the compiler's generics, so my guess would be: yes.


System.Generics.Collections has not caused that much code bloat since the refactorings in XE7 - it still causes more than it should, but that is a limitation of the compiler.

I did some tests with Rapid.Generics, and while it is optimized for some scenarios, it was not a stellar improvement over System.Generics.Collections in 10.3.

 

And while I was benchmarking those and the Spring4D collections, I saw that isolated benchmarks are often very much affected by certain CPU specifics - results differ across CPUs depending on the (undocumented) behavior of their branch predictors - and of course in a microbenchmark chances are high that all code fits into at least L2 cache.

On 2/25/2019 at 9:11 AM, edwinyzh said:

Does anybody know if Rapid.Generics suffers from the same 'exe bloat' issue caused by System.Generics.Collections in the RTL?

 

refs:

https://stackoverflow.com/questions/31684300/do-generic-instantiations-in-multiple-units-bloat-the-executable

https://delphisorcery.blogspot.com/2014/03/why-delphi-generics-are-annoying.html

In the latest versions, the bloat for the collections in System.Generics.Collections has been battled successfully (using the newly introduced intrinsics, which greatly reduce the amount of generated code), but at the cost of a lot of source code duplication (note that most of that is eliminated during code generation), which is exactly what generics were supposed to overcome. The code was never slow because of the bloat; the generated code in total was just larger than necessary.
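A minimal sketch of how those intrinsics cut down the generated code (the class and method names here are hypothetical; GetTypeKind and IsManagedType are the actual compiler intrinsics introduced with XE7). Because the intrinsic is a compile-time constant for each instantiation, the case statement is resolved during compilation and only the matching branch is emitted:

```pascal
program IntrinsicsSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TProbe = class
    // GetTypeKind(T) is a compile-time constant per instantiation, so the
    // compiler keeps only the matching branch and discards the rest - no
    // bloat from branches that cannot apply to this particular T.
    class function Describe<T>: string; static;
  end;

class function TProbe.Describe<T>: string;
begin
  case GetTypeKind(T) of
    tkUString: Result := 'string';
    tkInteger: Result := 'ordinal';
    tkClass:   Result := 'class reference';
  else
    Result := 'something else';
  end;
  // IsManagedType is resolved at compile time as well.
  if IsManagedType(T) then
    Result := Result + ' (managed)';
end;

begin
  Writeln(TProbe.Describe<string>);   // string (managed)
  Writeln(TProbe.Describe<Integer>);  // ordinal
end.
```

This is the technique System.Generics.Collections uses internally: the source contains per-type-kind branches (hence the duplication), but each instantiation only carries one of them.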

 

So no, there is not much bloat anymore (no more than if you had written the code for each type manually, re-using where possible), but there is a lot of copy-paste generics again.

On 2/25/2019 at 10:23 AM, Stefan Glienke said:

chances are high that all code fits into at least L2 cache.

That was the case even with the old-style, non-enhanced generics. The locality of the critical code was the same; it was just repeated too often in different parts of the program.

 

I don't expect a big speed difference due to that (but there may have been speed differences due to better optimization in the compilers and especially in the runtime).

Edited by Rudy Velthuis

13 hours ago, Rudy Velthuis said:

So no, there is not much bloat anymore

It's more complex than that. Maybe for users of System.Generics.Collections. But what about those of us who write our own generic types?

13 hours ago, Rudy Velthuis said:

 

I don't expect a big speed difference due to that (but there may have been speed differences due to better optimization in the compilers and especially in the runtime).

Runtime optimisation only helps if your code relies heavily on the runtime functions that have been improved. And where are these improvements in the code emitted by the compilers? I've not seen anything. What has changed? 

2 hours ago, David Heffernan said:

It's more complex than that. Maybe for users of System.Generics.Collections. But what about those of us who write our own generic types?

For those who write their own generics, there are two ways to do this: the naive but generic way, which can still result in code bloat, or the System.Generics.Collections way, which goes against almost every principle of generics, i.e. that you don't have to repeat yourself ad infinitum. I wrote about that already: The current state of generics

 

What they did with the new intrinsics solves part of the problem for their own classes, but it certainly doesn't solve it for those of us who would like to write generics without having to worry about code bloat and without resorting to a lot of "copy-and-paste generics". If the compiler can select different pieces of code depending on these new intrinsics, it could just as well generate such code without us having to worry about it. That is more work, but that is how it should be.

 

In the meantime, they should also finally fix Error Insight (not only for the new inlined vars) and make the new themed IDE a lot more responsive.

Edited by Rudy Velthuis

If you don't mind me extending this topic a little bit, I think the IDE support for generic collections really needs to be enhanced - when you Ctrl + Click on a generic collection class or its member, the code editor won't take you to the code definition like it does for a non-generic class.

 

I just tested Delphi 10.2 and it has the same flaw, not sure about 10.3.

2 minutes ago, edwinyzh said:

If you don't mind me extending this topic a little bit, I think the IDE support for generic collections really needs to be enhanced - when you Ctrl + Click on a generic collection class or its member, the code editor won't take you to the code definition like it does for a non-generic class.

 

I just tested Delphi 10.2 and it has the same flaw, not sure about 10.3.

That is something that needs to be addressed too, certainly, but it would not be on the top of my list.

2 hours ago, David Heffernan said:

And where are these improvements in the code emitted by the compilers? I've not seen anything. What has changed? 

I don't remember where, but I have seen some improvements in code generation, especially in the 64 bit compiler, when I was searching for some bugs in the code generator during the FT. I did not write them down. I guess I should have.

 

FWIW, the really optimizing compilers are the Clang C++ compilers. In debug mode, their code is terribly clumsy, ugly and slow. In release mode, it is blazingly fast. It is just very hard to set a breakpoint in release-mode code and debug the executable. The few times I managed (by chance), I saw highly optimized code.


Google for "C++ template code bloat" and you will see that C++ suffers from the very same problem, including ridiculously long compile times and high memory consumption. The suggested approach there is very similar to what has been done in System.Generics.Collections.

Delphi, however, adds some more problems to the mix, like having RTTI turned on by default, which is extremely problematic if you have extensive generic class hierarchies. For generic fluent APIs that return generic types with many different type parameters (because of the way the API is used), this can turn a unit of a few hundred lines into a multi-megabyte dcu monster (the size itself is not the problem, but the compiler churns on them for a long time). Multiply that with other factors, and compiling a 330K LOC application takes a minute or more, while consuming close to 2 GB of RAM and producing 250 MB of dcus and a 70 MB exe. These are real numbers from our code, and an ongoing refactoring of both sides - the library code that contains the generics (Spring4D) and the calling side - reduces this significantly.
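A sketch of one mitigation for the RTTI part, assuming nothing in the unit needs extended RTTI (the unit and type names here are hypothetical; the directives are the real ones):

```pascal
unit HeavyGenerics; // hypothetical unit full of generic types

interface

// Emit no extended RTTI for the types declared below; only the minimal
// RTTI the RTL itself needs is kept. Must appear before the declarations.
{$RTTI EXPLICIT METHODS([]) PROPERTIES([]) FIELDS([])}
// Allow the linker to drop RTTI that nothing references.
{$WEAKLINKRTTI ON}

type
  TBigHierarchy<T> = class
    // ... extensive generic class hierarchy ...
  end;

implementation

end.
```

Each generic instantiation otherwise carries its own copy of the extended RTTI, which is where the multi-megabyte dcus for generic-heavy hierarchies come from.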

Edited by Stefan Glienke

1 hour ago, Rudy Velthuis said:

I don't remember where, but I have seen some improvements in code generation, especially in the 64 bit compiler

I see pretty much the same code in 10.3 as produced by XE7 in my real world app, using the Windows x64 Delphi compiler.  Performance is identical.  Still significantly worse than could be achieved by the mainstream C++ compilers. Probably worse than what could be achieved in C#!

8 hours ago, David Heffernan said:

I see pretty much the same code in 10.3 as produced by XE7 in my real world app, using the Windows x64 Delphi compiler.  Performance is identical.  Still significantly worse than could be achieved by the mainstream C++ compilers. Probably worse than what could be achieved in C#!

That may well be. I doubt that optimizations often make a difference in the runtime of a program, unless you are really running very processor-intensive code. But even then, the algorithms used are far more important than an optimized runtime.

 

But sometimes you really wish you could have better optimized code. I also wish we had access to the overflow, zero, negative and carry flags - something no other programming language (except assembler) offers. It could speed up a lot of (my PUREPASCAL) code.

 

And if we could have the new default constructors/automatic destructors/assignment operators/copy constructors, etc. for records, much of the code that currently requires type info to initialize, finalize and copy a record could be improved manually and that would really be a Good Thing. Much of the code that I write, involving BigIntegers and BigDecimals, but also other code, could be optimized a lot.
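For context on what such record operators could look like: the feature described here later shipped in Delphi 10.4 as "custom managed records", where the compiler calls user-defined operators instead of generic RTTI-driven initialize/finalize/copy helpers. A sketch, with a hypothetical record type:

```pascal
type
  // Requires Delphi 10.4 or later (custom managed records).
  TBuffer = record
  private
    FData: Pointer;
    FSize: NativeInt;
  public
    class operator Initialize(out Dest: TBuffer);  // runs at declaration
    class operator Finalize(var Dest: TBuffer);    // runs at scope exit
    class operator Assign(var Dest: TBuffer; const [ref] Src: TBuffer);
  end;

class operator TBuffer.Initialize(out Dest: TBuffer);
begin
  Dest.FData := nil;
  Dest.FSize := 0;
end;

class operator TBuffer.Finalize(var Dest: TBuffer);
begin
  if Dest.FData <> nil then
    FreeMem(Dest.FData);
end;

class operator TBuffer.Assign(var Dest: TBuffer; const [ref] Src: TBuffer);
begin
  // Deep copy instead of the default field-by-field copy.
  ReallocMem(Dest.FData, Src.FSize);
  Dest.FSize := Src.FSize;
  if Src.FSize > 0 then
    Move(Src.FData^, Dest.FData^, Src.FSize);
end;
```

At the time of this thread the feature did not exist yet; records with managed fields relied entirely on compiler-generated, type-info-driven code.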

 

FWIW, it is possible that the optimizations I saw were rolled back too, together with the code for the above constructors etc. There were also some braindead constructs in there (and some idiotic code duplication, unnecessary nilling of local variables, etc.), and the code was far from ready to be released. A delayed release would not have solved that either. ISTM the problems were much larger than originally estimated.

Edited by Rudy Velthuis

51 minutes ago, Rudy Velthuis said:

I doubt that optimizations often make a difference in the runtime of a program, unless you are really running very processor-intensive code. But even then, the algorithms used are far more important than an optimized runtime.

I'm not talking about the runtime. I'm talking about the code emitted by the compiler when it compiles my code. 

 

Performance is critical for my program. Some of the critical sections of code I translated to C in order to reap significant performance benefits of having a good optimising compiler. 

 

So yes, this is a real issue. 

58 minutes ago, David Heffernan said:

I'm not talking about the runtime. I'm talking about the code emitted by the compiler when it compiles my code. 

 

Performance is critical for my program. Some of the critical sections of code I translated to C in order to reap significant performance benefits of having a good optimising compiler. 

 

So yes, this is a real issue. 

OK, so you are one of the few exceptions for whom performance is *always* critical.

 

Most of the code I see is mainly non-critical, but may have some critical parts.

 

I generally resort to assembler (if possible, i.e. on Windows, 32 bit or 64 bit target), although I always try to have a PUREPASCAL backup and I optimize that as much as I can.

 

And, as I said, it may well be that some of these optimizations were rolled back. I know I have seen some and I was quite surprised to see them.

Edited by Rudy Velthuis

36 minutes ago, Rudy Velthuis said:

 

I generally resort to assembler (if possible, i.e. on Windows, 32 bit or 64 bit target), although I always try to have a PUREPASCAL backup and I optimize that as much as I can.

Practical for small amounts of code that is seldom modified. Of course good compilers typically produce better code than humans can manage. 

21 hours ago, Rudy Velthuis said:

That is something that needs to be addressed too, certainly, but it would not be on the top of my list.

I literally do that failing Generics Ctrl-Click several times a day.  I guess I am a slow learner 😛

I also wish that the Insight mechanism would deduce the most likely class types in scope and let me select one as the jump target, instead of ALWAYS sending me to the virtual/abstract declarations of the base class.

9 hours ago, David Heffernan said:

Of course good compilers typically produce better code than humans can manage. 

That's a commonplace, but I am not convinced. A good assembler programmer can produce better code than any optimizing compiler.

3 hours ago, Rudy Velthuis said:

That's a commonplace, but I am not convinced. A good assembler programmer can produce better code than any optimizing compiler.

Very hard to find them though, and so easy to find good compilers. 


I have tested my generic collection component with Tokyo 10.2.3 and Rio 10.3.1, with the same results:

Adding 100,000 items / reading in order from 0 to 100,000 / 100,000 random reads / free

 

Tokyo: 1417ms / 59ms / 67ms / 234ms

Tokyo with Rapid-Generics: 705ms / 49ms / 69ms / 26ms

 

Rio: 1415ms / 59ms / 65ms / 230ms

Rio with Rapid-Generics: 671ms / 51ms / 69ms / 26ms

 

So the results between Tokyo and Rio are the same! But Rapid.Generics is faster with adding and with freeing (.Clear).


Show the benchmark code please - I have seen so many flawed benchmarks (including some of my own) in the past months that I don't believe posted results anymore.

1 minute ago, Stefan Glienke said:

Show the benchmark code please - I have seen so many flawed benchmarks (including some of my own) in the past months that I don't believe posted results anymore.

Hi Stefan, 

here's the code: 

procedure TForm1.Button1Click(Sender: TObject);
var
  vStart: TDateTime;
  vTest, i: LongInt;
  vtc: c__TriggerConstants;
begin
  vtc := c__tp(); // create my generics collection component
  try
    Randomize;
    vStart := Now;
    for i := 0 to 1000000 do
      vtc.Add('my' + IntToStr(i), Random(10000000), '');
    Memo1.Lines.Add('> fill ' + FormatDateTime('hh:nn:ss:zzz', Now - vStart));
    vStart := Now;

    for i := 0 to 100000 do
      vTest := vtc.GetValue('my' + IntToStr(i), -1);
    Memo1.Lines.Add('> get sort ' + FormatDateTime('hh:nn:ss:zzz', Now - vStart));
    vStart := Now;

    for i := 0 to 100000 do
      vTest := vtc.GetValue('my' + IntToStr(Random(100000)), -1);
    Memo1.Lines.Add('> get random ' + FormatDateTime('hh:nn:ss:zzz', Now - vStart));
    vStart := Now;
  finally
    FreeAndNil(vtc);
    Memo1.Lines.Add('> free ' + FormatDateTime('hh:nn:ss:zzz', Now - vStart));
  end;
end;

And yes, I create 1,000,000 elements and later read only 100,000 .. 😉

 

The code was compiled twice, switching between the two units with this directive:

{$ifNdef mxHtScripter_RapidGenerics} System.Generics.Collections, {$else} Rapid.Generics, {$endif}
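One caveat about the timing itself: Now/TDateTime has roughly 16 ms resolution on Windows, which is coarse for the sub-100 ms phases above. A sketch of the same kind of measurement with TStopwatch from System.Diagnostics, which wraps the high-resolution performance counter (the loop body is a stand-in for the real workload):

```pascal
uses
  System.SysUtils, System.Diagnostics;

procedure MeasureFill;
var
  SW: TStopwatch;
  Keys: TArray<string>;
  i: Integer;
begin
  SW := TStopwatch.StartNew;          // high-resolution timer
  SetLength(Keys, 1000000);
  for i := 0 to High(Keys) do
    Keys[i] := 'my' + IntToStr(i);    // stand-in for vtc.Add(...)
  SW.Stop;
  Writeln(Format('> fill %d ms', [SW.ElapsedMilliseconds]));
end;
```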

 

