Rapid generics

Well, this still does not tell us anything about how the list is being used, what its element type is, or whether it has a change notification attached (because all of that influences performance).

I am asking because I have spent quite some time looking into the Rapid.Generics code and the 10.3 System.Generics.Collections improvements, and made some improvements of my own for Spring4D.

 

And what makes testing with isolated (micro)benchmarks kind of difficult is the fact that often enough hardware effects kick in and show you big differences that only exist in the benchmark but are completely irrelevant, or even contradictory, in real code.

  • Like 1

Share this post


Link to post
1 minute ago, Stefan Glienke said:

Well, this still does not tell us anything about how the list is being used, what its element type is, or whether it has a change notification attached (because all of that influences performance).

I am asking because I have spent quite some time looking into the Rapid.Generics code and the 10.3 System.Generics.Collections improvements, and made some improvements of my own for Spring4D.

 

And what makes testing with isolated (micro)benchmarks kind of difficult is the fact that often enough hardware effects kick in and show you big differences that only exist in the benchmark but are completely irrelevant, or even contradictory, in real code.

I'm using System.Generics.Collections so far and was only interested in comparing my component with Rapid.Generics. My "small" benchmark does not say anything in detail, but it shows that Rapid.Generics does create and free faster.

Before this can be used, the source should be checked for thread safety etc. ...

 

Our component stores a lot of information in a "TDictionary<string, __Record>" where __Record is defined as

    __Record = record
    public
        isObject, doFreeObject: Boolean;
        vname: string;
        vname_original: string;
        vname_2, vname_3, vname_4: string;
        vwert: Variant;
        vdescription: string;
        vcomponent: TObject;
        vdtstamp: TDateTime;
    end;

 

What was important to see was only the speed difference on my machine, and it is a noticeable difference in favour of Rapid.Generics, so its author is doing something right in my opinion.
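
A minimal sketch of how such a create/fill/free timing could look (assuming Rapid.Generics offers a drop-in TDictionary with the same API as System.Generics.Collections; the iteration count and field values here are arbitrary placeholders):

    uses
        System.SysUtils, System.Diagnostics, System.Generics.Collections;
        // swap the collections unit for Rapid.Generics to compare both implementations

    procedure BenchmarkCreateFillFree;
    var
        dict: TDictionary<string, __Record>;
        rec: __Record;
        sw: TStopwatch;
        i: Integer;
    begin
        sw := TStopwatch.StartNew;
        dict := TDictionary<string, __Record>.Create;
        try
            for i := 1 to 100000 do
            begin
                rec := Default(__Record);
                rec.vname := 'name' + IntToStr(i);
                rec.vdtstamp := Now;
                dict.Add(rec.vname, rec);
            end;
        finally
            dict.Free;
        end;
        Writeln('create/fill/free: ', sw.ElapsedMilliseconds, ' ms');
    end;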

  • Like 1

Share this post


Link to post

Storing records in collections is a delicate topic, as the code is usually a bit more complex than when just storing integers or pointers. Even more so if the record has managed types like string.

In this case, the simple fact of having a local variable of type T inside your generic code and doing one assignment more than necessary (for example, to pass some oldItem to a notification) might cause a severe slowdown.

If you put such code into a single method, it does the stack cleanup and finalization for that variable in all cases, even if there is no notification to be called.
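
Roughly what I mean, as a sketch (the list type, its fields and the notification event here are made up for illustration, not taken from any of the libraries mentioned):

    type
        TCollectionNotification = (cnAdded, cnRemoved, cnExtracted); // stand-in for the RTL enum
        TMyNotifyEvent<T> = procedure(Sender: TObject; const Item: T;
            Action: TCollectionNotification) of object;

        TMyList<T> = class
        private
            FItems: array of T;
            FCount: Integer;
            FOnNotify: TMyNotifyEvent<T>;
        public
            procedure DeleteLast;
        end;

    procedure TMyList<T>.DeleteLast;
    var
        OldItem: T; // local of a possibly managed type: the compiler adds
                    // initialization/finalization code for it to this method
    begin
        OldItem := FItems[FCount - 1]; // one extra copy, only needed for the event
        FItems[FCount - 1] := Default(T);
        Dec(FCount);
        if Assigned(FOnNotify) then
            FOnNotify(Self, OldItem, cnRemoved);
        // OldItem is finalized when the method returns - on every call,
        // even when FOnNotify is not assigned
    end;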

 

The code in Rapid.Generics uses some shortcuts and is even more convoluted than the code System.Generics.Collections has had since its refactoring in XE7.

It does, for example, not use TArray<T> as the backing storage for its list but pure pointer math. It also does not zero the memory for this array, which buys some speed by avoiding the round trip through all the code in System.DynArraySetLength - especially for managed types. That buys a bit of performance when adding items - especially if you don't set the capacity beforehand.
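
Roughly the difference, as a sketch (illustrative only, this is not the actual Rapid.Generics code):

    type
        TIntArray = array of Integer;

    procedure GrowDynArray(var Items: TIntArray; NewCapacity: Integer);
    begin
        // goes through System.DynArraySetLength: allocation plus zero-filling of the
        // new slots and, for managed element types, typeinfo-driven initialization
        SetLength(Items, NewCapacity);
    end;

    procedure GrowRawBuffer(var Items: PInteger; var Capacity: Integer;
        NewCapacity: Integer);
    begin
        // raw reallocation: no zero-filling, no typeinfo round trip; the caller must
        // never read slots it has not written, and managed element types would need
        // explicit initialization/finalization code on top of this
        ReallocMem(Items, NewCapacity * SizeOf(Integer));
        Capacity := NewCapacity;
    end;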

 

I know that the RTL collections had a ton of bugs caused by that refactoring because certain type kinds suddenly were not handled properly - I don't see any unit tests for Rapid.Generics though, so I would not say that its collections work for all kinds of types you might store in those lists.

 

As for the specific case of Clear taking longer in the RTL collections: that is another optimization done in Rapid.Generics, which simply cleans up the memory and is done, whereas the RTL runs through some extra code that is not necessary if there is no OnChange attached to the list.

 

 

Edit: I looked into the Rapid.Generics code for records and it maintains its own mechanism to clean up any managed fields, with a small performance improvement if the field is actually empty. This causes a major speedup if you are doing benchmarks with empty records, but I guess with real data this won't give much. I tested with a small record with 2 string fields and an integer: when they were empty, the Clear call was very fast compared to the RTL list, but not so much anymore when the fields contained some strings that it had to clean up.
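
A minimal sketch of that test (the record matches the description above; the list size and string contents are arbitrary, and the collections unit can be swapped for Rapid.Generics for the comparison):

    uses
        System.SysUtils, System.Diagnostics, System.Generics.Collections;

    type
        TTestRec = record
            s1, s2: string;  // managed fields that Clear has to finalize
            i: Integer;
        end;

    procedure TimeClear(FillStrings: Boolean);
    var
        list: TList<TTestRec>;
        rec: TTestRec;
        sw: TStopwatch;
        n: Integer;
    begin
        list := TList<TTestRec>.Create;
        try
            for n := 1 to 1000000 do
            begin
                rec := Default(TTestRec);
                if FillStrings then
                begin
                    rec.s1 := IntToStr(n);      // forces real string cleanup on Clear
                    rec.s2 := rec.s1 + rec.s1;
                end;
                rec.i := n;
                list.Add(rec);
            end;
            sw := TStopwatch.StartNew;
            list.Clear;
            Writeln('Clear (filled=', FillStrings, '): ', sw.ElapsedMilliseconds, ' ms');
        finally
            list.Free;
        end;
    end;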

Edited by Stefan Glienke
  • Like 2

Share this post


Link to post
7 hours ago, David Heffernan said:

Very hard to find them though, and so easy to find good compilers. 

It is very hard to find good compilers I like. I can get along with C and C++, but I don't like them.

I love Delphi, but it certainly needs better optimization (and fewer bugs).

I like assembler too, because it allows me to do almost everything I want. If I want optimization, it is my responsibility. If there are bugs, they are mine (well, or in the libraries I use).

Share this post


Link to post

Then please write a better _FinalizeArray routine for tkRecord, because the current implementation is pretty terrible: it keeps calling _FinalizeRecord in a loop, which in turn calls _FinalizeArray with an ElemCount of 1.

That contributes to the slowness of Generics.Collections if you have records in a list.

 

I did some patching today and added an ElemCount parameter to _FinalizeRecord so that it is only called once per _FinalizeArray - but I did that in pure Pascal.
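
At the language level the same idea is visible with the two-argument Finalize intrinsic - a sketch with an invented example record type:

    type
        TManagedRec = record
            name: string;  // managed field, so finalization is required
            id: Integer;
        end;

    procedure FinalizeAll(var Items: array of TManagedRec);
    begin
        // Finalize(Items[x]) in a loop would go through the per-record finalization
        // once per element, each time walking the typeinfo again; the block form
        // below hands the element count over in a single call instead
        if Length(Items) > 0 then
            Finalize(Items[Low(Items)], Length(Items));
    end;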

Share this post


Link to post
18 minutes ago, Rudy Velthuis said:

It is very hard to find good compilers I like.

You mean language rather than compiler. 

Share this post


Link to post
1 minute ago, Stefan Glienke said:

Then please write a better _FinalizeArray routine for tkRecord, because the current implementation is pretty terrible: it keeps calling _FinalizeRecord in a loop, which in turn calls _FinalizeArray with an ElemCount of 1.

That contributes to the slowness of Generics.Collections if you have records in a list.

 

I did some patching today and added an ElemCount parameter to _FinalizeRecord so that it is only called once per _FinalizeArray - but I did that in pure Pascal.

Yes, we could perhaps write a better _InitializeArray, _FinalizeArray and _CopyRecord and patch these, but that is not good enough.

 

We don't want these to be called at all, no matter how good they are, because they still need to loop through a lot of type info to find out how each part of a record must be treated. That is why the new constructors/destructors etc. would be so cool: you can do your own initialization, finalization and copying manually, i.e. your code knows what to do with which field and doesn't have to tediously loop through type info.

 

That should make records a lot faster (if done right). You could even do your own reference-counted types and your own dynarrays and whatnot. You could have a dynarray for integers only, complete with refcounts, but it would not have to initialize much, nor would it have to finalize the integers at all, etc. It would not depend on type info. And it would still be like the dynarrays in Delphi, just faster and with methods.
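
As a sketch of what I mean, roughly in the shape the feature later took as custom managed records (the type and its fields are invented here, and how exactly such hooks interact with the typeinfo-driven code is discussed further down):

    type
        TRawBuffer = record
        private
            FData: Pointer;
            FSize: Integer;
        public
            class operator Initialize(out Dest: TRawBuffer);
            class operator Finalize(var Dest: TRawBuffer);
            class operator Assign(var Dest: TRawBuffer; const [ref] Src: TRawBuffer);
        end;

    class operator TRawBuffer.Initialize(out Dest: TRawBuffer);
    begin
        // the code knows its own layout - no typeinfo lookup required
        Dest.FData := nil;
        Dest.FSize := 0;
    end;

    class operator TRawBuffer.Finalize(var Dest: TRawBuffer);
    begin
        if Dest.FData <> nil then
            FreeMem(Dest.FData);
    end;

    class operator TRawBuffer.Assign(var Dest: TRawBuffer; const [ref] Src: TRawBuffer);
    begin
        // explicit copy semantics; a refcounted type would bump a counter here instead
        if @Dest = @Src then
            Exit;
        if Dest.FData <> nil then
            FreeMem(Dest.FData);
        Dest.FSize := Src.FSize;
        if Dest.FSize > 0 then
        begin
            GetMem(Dest.FData, Dest.FSize);
            Move(Src.FData^, Dest.FData^, Dest.FSize);
        end
        else
            Dest.FData := nil;
    end;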

Share this post


Link to post
1 minute ago, David Heffernan said:

You mean language rather than compiler. 

Well, no compiler without a language. If I don't like the language, it is very unlikely I will gladly use a compiler for it.

Share this post


Link to post
32 minutes ago, Rudy Velthuis said:

Well, no compiler without a language. If I don't like the language, it is very unlikely I will gladly use a compiler for it.

I don't much care what you like, or don't like.

 

My point was that there exist plenty of compilers that can emit optimised code that is exceedingly efficient, and extremely hard to beat by humans writing code themselves. 

  • Like 1

Share this post


Link to post
1 hour ago, Rudy Velthuis said:

That is why the new constructors/destructors etc. would be so cool: you can do your own initialization, finalization and copying manually, i.e. your code knows what to do with which field and doesn't have to tediously loop through type info.

Why would we want to finalize records manually? What a terrible retrograde step. 

Share this post


Link to post
1 hour ago, Rudy Velthuis said:

We don't want these to be called at all, no matter how good they are, because they still need to loop through a lot of type info to find out how each part of a record must be treated. That is why the new constructors/destructors etc. would be so cool: you can do your own initialization, finalization and copying manually, i.e. your code knows what to do with which field and doesn't have to tediously loop through type info.

But we do have the slow RTL routines that can easily be improved - we don't have the new ctors and dtors, and actually I would not hold my breath for them to be implemented as you imagine them to be - they will be driven by additional typeinfo (look into the 10.3 System.pas, where you can find all the relevant code for that feature, because they just disabled it inside the compiler but did not revert the necessary RTL code).

And still, if you have an array of those records, it would have to loop through the array and call the dtor for every single item regardless. An optimized version of FinalizeArray/Record can just shift pointers over the array and do the cleanup - even when using the record's managed field table - that is just a simple record structure, nothing fancy. Putting everything into nested loops and calls, regardless of whether the fields are even filled with something, is what makes the current version slow - that is, as I mentioned before, what makes Rapid.Generics faster on its Clear.

  • Like 1

Share this post


Link to post
15 hours ago, Rudy Velthuis said:

It is very hard to find good compilers I like. I can get along with C and C++, but I don't like them.

I love Delphi, but it certainly needs better optimization (and fewer bugs).

I like assembler too, because it allows me to do almost everything I want. If I want optimization, it is my responsibility. If there are bugs, they are mine (well, or in the libraries I use).

Have you looked into the Free Pascal Compiler? It is open source, so you can tweak the output assembler as you like, but at the same time it uses the Delphi language, which we all love.

Share this post


Link to post
37 minutes ago, Микола Петрівський said:

Have you looked into the Free Pascal Compiler? It is open source, so you can tweak the output assembler as you like, but at the same time it uses the Delphi language, which we all love.

Of course I have. But first of all, I don't like the old-style IDE (Lazarus), and FPC is not nearly on the same level as Delphi, despite Delphi's bugs and other shortcomings.

 

And yes, it is open source, but I am not inclined to get acquainted with the complete compiler/linker/code generator source code just so that I might be able to tweak the output. The argument "it is open source so you can change it if you like" only works for rather trivial projects, IMO. I use a lot of open source, but I don't want to tweak any of it by browsing through the usually vast number of hard-to-read source files. And my level of expertise is not good enough to be of any help either.

Edited by Rudy Velthuis
  • Like 1

Share this post


Link to post
14 hours ago, Stefan Glienke said:

But we do have the slow RTL routines that can easily be improved - we don't have the new ctors and dtors, and actually I would not hold my breath for them to be implemented as you imagine them to be - they will be driven by additional typeinfo (look into the 10.3 System.pas, where you can find all the relevant code for that feature, because they just disabled it inside the compiler but did not revert the necessary RTL code).

And still, if you have an array of those records, it would have to loop through the array and call the dtor for every single item regardless. An optimized version of FinalizeArray/Record can just shift pointers over the array and do the cleanup - even when using the record's managed field table - that is just a simple record structure, nothing fancy. Putting everything into nested loops and calls, regardless of whether the fields are even filled with something, is what makes the current version slow - that is, as I mentioned before, what makes Rapid.Generics faster on its Clear.

We have slow RTL routines that are mainly slow because they have to read all that type info before they can do what they are supposed to do. Sure, some of it can be improved, but that is not of much help.

 

The new ctors etc. can work as I said, and I have seen them work that way (no need for type info whatsoever) before they were removed again. Again, if done right, they will not have to rely on type info anymore and then the door is open for many improvements that do not need patching the runtime or the compiler.

 

I am personally against patching anything, as there often is no need. But I am all for better ways to handle records and arrays without type info.

 

And yes, the constructor of each record in a large array must be called, if there is such a constructor. That is the same as in C++, and there it works remarkably well, especially if the call can be inlined or eliminated entirely when it is empty. And indeed, an optimized version of InitializeArray etc. could simply do the entire array at once (e.g. nil it out). That should be enhanceable too, sure. But that is InitializeArray, not InitializeRecord.
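
A minimal sketch of the "nil it out" idea (only valid for freshly allocated memory, where nothing needs finalizing first; the routine name is made up):

    procedure InitRecordBlock(P: Pointer; TotalSize: NativeInt);
    begin
        // strings, interfaces and dynarrays become nil, numbers become 0 -
        // one linear pass over the memory, no typeinfo involved
        FillChar(P^, TotalSize, 0);
    end;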

 

But I don't quite understand: "regardless of whether the fields are even filled with something". Do you mean "evenly", i.e. a record that has, for instance, exactly two or more string fields and can therefore be cleared much more efficiently than by the current way of looping through the entire type info for each field?

Share this post


Link to post
15 hours ago, David Heffernan said:

Why would we want to finalize records manually? What a terrible retrograde step. 

I would want the ability to do things manually (by giving the record a destructor - if I don't write a destructor, then the runtime will do its usual thing) and forego the rather inefficient type info queries for every field. I know which fields need finalization and what type they are, and can therefore do it much more efficiently than something like FinalizeRecord, which has to find these things out at runtime.

 

The ideal situation would be, of course, if the compiler automatically added (wrote) such a destructor for us, using the knowledge it has about the types in the record at compile time, instead of simply compiling in a function like FinalizeRecord and type info. But what I have seen (and which was removed again) already showed a lot of promise.

Edited by Rudy Velthuis
  • Like 1

Share this post


Link to post
15 hours ago, David Heffernan said:

I don't much care what you like, or don't like.

 

My point was that there exist plenty of compilers that can emit optimised code that is exceedingly efficient, and extremely hard to beat by humans writing code themselves. 

Sure. And yet I think that good, manually written assembler beats every compiler, despite the oft-quoted "a good compiler ... etc.".

 

There are indeed a number of good, optimizing and popular compilers. I just don't think there are plenty of them.

Share this post


Link to post
4 minutes ago, Rudy Velthuis said:

And yet I think that good, manually written assembler beats every compiler, despite the oft-quoted "a good compiler ... etc.".

True years ago, but these days not so. Just put some code through godbolt and marvel at the code that it generates. You don't have to get that complicated before you see the compiler spotting optimisations that are very far from obvious. Optimisers now have deep knowledge of hardware architecture and can use that to make choices that are beyond almost all human asm programmers.

  • Like 1

Share this post


Link to post
44 minutes ago, Rudy Velthuis said:

Of course I have. But first of all, I don't like the old-style IDE (Lazarus), and FPC is not nearly on the same level as Delphi, despite Delphi's bugs and other shortcomings.

 

And yes, it is open source, but I am not inclined to get acquainted with the complete compiler/linker/code generator source code just so that I might be able to tweak the output. The argument "it is open source so you can change it if you like" only works for rather trivial projects, IMO. I use a lot of open source, but I don't want to tweak any of it by browsing through the usually vast number of hard-to-read source files. And my level of expertise is not good enough to be of any help either.

I use Inkscape, LibreOffice, GIMP, Blender, Audacity, Atom, etc.

 

I know how to use these programs, but don't have the expertise to improve them.

 

And the sources for these things are vast. It would take days to acquire some familiarity with them. Then I would have to find out, if I even could, how to improve things. A very unlikely scenario.

  • Like 1

Share this post


Link to post
37 minutes ago, Rudy Velthuis said:

The ideal situation would be, of course, if the compiler automatically added (wrote) such a destructor for us, using the knowledge it has about the types in the record at compile time, instead of simply compiling in a function like FinalizeRecord and type info. But what I have seen (and which was removed again) already showed a lot of promise.

Well, that's exactly what I have been arguing for. It seems utterly insane to me that this task is handled at runtime when it can be handled at compile time.

 

Anyway, as I understand it the record dtor would run in addition to the RTTI based finalization code. So adding a dtor could only ever make code slower. 

Edited by David Heffernan
  • Like 2

Share this post


Link to post
1 minute ago, David Heffernan said:

True years ago, but these days not so. Just put some code through godbolt and marvel at the code that it generates. You don't have to get that complicated before you see the compiler spotting optimisations that are very far from obvious. Optimisers now have deep knowledge of hardware architecture and can use that to make choices that are beyond almost all human asm programmers.

I have put code through godbolt quite a few times already. Yes, the generated code is not bad, but it can still be improved manually.

Share this post


Link to post
1 minute ago, David Heffernan said:

Well, that's exactly what I have been arguing for. It seems utterly insane to me that this task is handled at runtime when it can be handled at compile time.

Sure, but in the meantime, as long as the compiler doesn't do it, I would be extremely happy about the ability to take this into my own hands with default ctors/dtors/etc. But right now you can't eliminate the type-info-based FinalizeRecord at all.

 

These ctors etc. would, as I wrote, also make things possible that are currently not possible, or can only be done rather awkwardly.

Share this post


Link to post
2 hours ago, Rudy Velthuis said:

I have put code through godbolt quite a few times already. Yes, the generated code is not bad, but it can still be improved manually.

That's not the same as starting from scratch.

 

Also, didn't you have trouble with bugs in your asm code in your bigint library?

Edited by David Heffernan

Share this post


Link to post
6 hours ago, Rudy Velthuis said:

Of course I have. But first of all, I don't like the old-style IDE (Lazarus), and FPC is not nearly on the same level as Delphi, despite Delphi's bugs and other shortcomings.

The IDE can be set up in the new docked style, much like Delphi. Just a matter of installing the packages for it.

Share this post


Link to post
10 hours ago, Bill Meyer said:

The IDE can be set up in the new docked style, much like Delphi. Just a matter of installing the packages for it.

Please don't hijack the conversation off-topic.

  • Thanks 1

Share this post


Link to post
1 hour ago, Tommi Prami said:

Please don't hijack the conversation off-topic

I second that.

Share this post


Link to post
