Guest Posted February 24, 2019
I know that generics were enhanced in Rio 10.3.1. Does anybody know whether Dmitry's generics are still faster? https://github.com/d-mozulyov/Rapid.Generics

Lars Fosdal 1791 Posted February 25, 2019
Interesting. Does Rapid function properly with RTTI as well?

Edwin Yip 154 Posted February 25, 2019
Does anybody know if Rapid.Generics suffers from the same 'exe bloatness' issue caused by System.Generics.Collections from the RTL?
refs:
https://stackoverflow.com/questions/31684300/do-generic-instantiations-in-multiple-units-bloat-the-executable
https://delphisorcery.blogspot.com/2014/03/why-delphi-generics-are-annoying.html

dummzeuch 1505 Posted February 25, 2019
47 minutes ago, edwinyzh said:
> Does anybody know if Rapid.Generics suffers from the same 'exe bloatness' issue caused by System.Generics.Collections from the RTL?
It also uses the compiler's generics, so my guess would be: yes.

Stefan Glienke 2002 Posted February 25, 2019
System.Generics.Collections has not caused that much code bloat since the refactorings in XE7. It still causes more than it should, but that is a limitation of the compiler.
I did some tests with Rapid.Generics, and while they are optimized for some scenarios, they were not a stellar improvement over System.Generics.Collections in 10.3.
While benchmarking those and the Spring4D collections, I saw that isolated benchmarks are often heavily affected by CPU specifics: results differ between CPUs depending on the (undocumented) behavior of their branch predictors, and of course in a microbenchmark chances are high that all code fits into at least L2 cache.

Rudy Velthuis 91 Posted March 10, 2019
On 2/25/2019 at 9:11 AM, edwinyzh said:
> Does anybody know if Rapid.Generics suffers from the same 'exe bloatness' issue caused by System.Generics.Collections from the RTL?
> refs:
> https://stackoverflow.com/questions/31684300/do-generic-instantiations-in-multiple-units-bloat-the-executable
> https://delphisorcery.blogspot.com/2014/03/why-delphi-generics-are-annoying.html
In the latest versions, the bloat for the collections in System.Generics.Collections has been battled successfully (using the newly introduced intrinsics, which greatly reduce the amount of generated code), but at the cost of a lot of source code duplication (note that most of that is eliminated during code generation), which is actually what generics are supposed to overcome.
Code was never slow because of the bloat; the generated code in total was just larger than necessary.
So no, there is not much bloat anymore (not more than if you had written the code for each type manually, re-using where possible), but there is a lot of copy-paste generics again.

Rudy Velthuis 91 Posted March 10, 2019
On 2/25/2019 at 10:23 AM, Stefan Glienke said:
> chances are high that all code fits into at least L2 cache.
That was even the case in the old style non-enhanced generics. The locality of critical code was the same, it was just repeated too often in different parts of the program. I don't expect a big speed difference due to that (but there may have been speed differences due to better optimization in the compilers and especially in the runtime).
Edited March 10, 2019 by Rudy Velthuis

David Heffernan 2345 Posted March 11, 2019
13 hours ago, Rudy Velthuis said:
> So no, there is not much bloat anymore
It's more complex than that. Maybe for users of System.Generics.Collections. But what about those of us that write our own generic types?

David Heffernan 2345 Posted March 11, 2019
13 hours ago, Rudy Velthuis said:
> I don't expect a big speed difference due to that (but there may have been speed differences due to better optimization in the compilers and especially in the runtime).
Runtime optimisation only helps if your code relies heavily on the runtime functions that have been improved. And where are these improvements in the code emitted by the compilers? I've not seen anything. What has changed?

Rudy Velthuis 91 Posted March 11, 2019
2 hours ago, David Heffernan said:
> It's more complex than that. Maybe for users of System.Generics.Collections. But what about those of us that write our own generic types?
Those who write their own generics have two ways to do it: the naive but generic way, which can still result in code bloat, and the System.Generics.Collections way, which goes against almost every principle of generics, i.e. that you don't have to repeat yourself ad infinitum. I wrote about that already: The current state of generics
What they did with the new intrinsics solves part of the problem for their own classes, but it certainly doesn't solve the problem for those of us who would like to write generics without having to worry about code bloat and without having to do a lot of "copy-and-paste generics". If they can make the compiler select different pieces of code depending on these new intrinsics, they can just as well make the compiler generate such code without us having to worry about it. That is more work, but that is how it should be.
In the meantime, they should also finally fix Error Insight (not only for the new inlined vars) and make the new themed IDE a lot more responsive.
Edited March 11, 2019 by Rudy Velthuis

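To make the "select different pieces of code depending on these new intrinsics" idea concrete, here is a minimal sketch, assuming a compiler that supports the XE7-era intrinsics `IsManagedType` and compile-time `SizeOf(T)`. The class `TStorageInfo` and its method `PathFor` are hypothetical names invented for illustration; they are not part of the RTL, and this is not how any particular collection is implemented. It only shows the branching pattern a hand-written generic would have to repeat in every method.

```delphi
program IntrinsicDispatchSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical helper (not part of the RTL): reports which shared,
  // non-generic code path a type parameter would be routed to.
  TStorageInfo = class
  public
    class function PathFor<T>: string; static;
  end;

class function TStorageInfo.PathFor<T>: string;
begin
  // Both IsManagedType(T) and SizeOf(T) are known to the compiler
  // for each concrete instantiation of T.
  if IsManagedType(T) then
    Result := 'managed (needs reference counting or finalization)'
  else
    case SizeOf(T) of
      1: Result := 'raw 1-byte';
      2: Result := 'raw 2-byte';
      4: Result := 'raw 4-byte';
      8: Result := 'raw 8-byte';
    else
      Result := 'raw N-byte';
    end;
end;

begin
  Writeln('string:  ', TStorageInfo.PathFor<string>());
  Writeln('Integer: ', TStorageInfo.PathFor<Integer>());
  Writeln('Double:  ', TStorageInfo.PathFor<Double>());
end.
```

In a real container each branch would forward to a shared routine for that storage class instead of returning a string, which is where the copy-and-paste burden described above comes from.
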
Edwin Yip 154 Posted March 11, 2019
If you don't mind me extending this topic a little bit, I think the IDE support for generic collections really needs to be enhanced - when you Ctrl + Click on a generic collection class or its member, the code editor won't take you to the code definition like it does for a non-generic class. I just tested Delphi 10.2 and it has the same flaw, not sure about 10.3.

Rudy Velthuis 91 Posted March 11, 2019
2 minutes ago, edwinyzh said:
> If you don't mind me extending this topic a little bit, I think the IDE support for generic collections really needs to be enhanced - when you Ctrl + Click on a generic collection class or its member, the code editor won't take you to the code definition like it does for a non-generic class. I just tested Delphi 10.2 and it has the same flaw, not sure about 10.3.
That is something that needs to be addressed too, certainly, but it would not be on the top of my list.

Rudy Velthuis 91 Posted March 11, 2019
2 hours ago, David Heffernan said:
> And where are these improvements in the code emitted by the compilers? I've not seen anything. What has changed?
I don't remember where, but I have seen some improvements in code generation, especially in the 64 bit compiler, when I was searching for some bugs in the code generator during the FT. I did not write them down. I guess I should have.
FWIW, the really optimized compilers are the Clang C++ compilers. In debug mode, their code is terribly clumsy and ugly and slow. In release mode, it is blazingly fast. It is just very hard to put a breakpoint in the release mode code and debug the executable. The few times I managed (by chance), I saw highly optimized code.

Stefan Glienke 2002 Posted March 11, 2019
Google for C++ template code bloat and you will see that C++ suffers from the very same problem, including ridiculously long compile times and memory consumption. The suggested approach is very similar to what has been done in System.Generics.Collections.
Delphi, however, adds some more problems to the mix, like having RTTI turned on by default, which is extremely problematic if you have extensive generic class hierarchies. For generic fluent APIs that return generic types with many different type parameters because of the way the API is used, this can turn a unit of a few hundred lines into a dcu monster of several megabytes (the size itself is not the problem, but the compiler churns on them for a long time). If you multiply that by other factors, compiling a 330K LOC application takes a minute or more while consuming close to 2 GB RAM and producing 250MB of dcus and a 70MB exe.
These are real numbers from our code. An ongoing refactoring of both sides, the library code that contains generics (Spring4D) and the calling side, reduces this significantly.
Edited March 11, 2019 by Stefan Glienke

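One commonly used way to limit the RTTI emitted for a unit is the `{$RTTI}` directive, shown in the minimal sketch below. This is an assumption about a possible mitigation, not something the post above says was used for Spring4D, and how much it helps compile time or dcu size depends on the code base. `TBox<T>` is a hypothetical type invented for the example.

```delphi
program RttiOffSketch;

{$APPTYPE CONSOLE}

// Applies to types declared after this point in the current unit only.
// Requests no extended RTTI for methods, properties and fields.
{$RTTI EXPLICIT METHODS([]) PROPERTIES([]) FIELDS([])}
// Lets the linker drop RTTI that is not referenced elsewhere.
{$WEAKLINKRTTI ON}

type
  // Hypothetical generic class: every instantiation declared in this
  // unit is compiled under the RTTI settings above.
  TBox<T> = class
  private
    FValue: T;
  public
    constructor Create(const AValue: T);
    property Value: T read FValue;
  end;

constructor TBox<T>.Create(const AValue: T);
begin
  inherited Create;
  FValue := AValue;
end;

procedure Demo;
var
  IntBox: TBox<Integer>;
  StrBox: TBox<string>;
begin
  IntBox := TBox<Integer>.Create(42);
  StrBox := TBox<string>.Create('generic');
  try
    Writeln(IntBox.Value);
    Writeln(StrBox.Value);
  finally
    StrBox.Free;
    IntBox.Free;
  end;
end;

begin
  Demo;
end.
```

Because the directive is scoped per unit, it has to be repeated in each unit that declares generic types.
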
David Heffernan 2345 Posted March 11, 2019
1 hour ago, Rudy Velthuis said:
> I don't remember where, but I have seen some improvements in code generation, especially in the 64 bit compiler
I see pretty much the same code in 10.3 as produced by XE7 in my real world app, using the Windows x64 Delphi compiler. Performance is identical. Still significantly worse than could be achieved by the mainstream C++ compilers. Probably worse than what could be achieved in C#!

Rudy Velthuis 91 Posted March 11, 2019
8 hours ago, David Heffernan said:
> I see pretty much the same code in 10.3 as produced by XE7 in my real world app, using the Windows x64 Delphi compiler. Performance is identical. Still significantly worse than could be achieved by the mainstream C++ compilers. Probably worse than what could be achieved in C#!
That may well be. I doubt that optimizations often make a difference in the runtime of a program, unless you are really running very processor-intensive code. But even then, the algorithms used are far more important than an optimized runtime.
But sometimes, you really wish you could have better optimized code. I also wish we could have, unlike in any other programming language (except assembler), access to the overflow, zero, negative or carry flags. It could speed up a lot of (my PUREPASCAL) code. And if we could have the new default constructors/automatic destructors/assignment operators/copy constructors, etc. for records, much of the code that currently requires type info to initialize, finalize and copy a record could be improved manually, and that would really be a Good Thing. Much of the code that I write, involving BigIntegers and BigDecimals, but also other code, could be optimized a lot.
FWIW, it is possible that the optimizations I saw were rolled back too, together with the code for the above constructors etc. FWIW, there were some braindead constructs there too (and some idiotic code duplication or unnecessary nilling of local variables, etc., etc.) and the code was far from ready to be released. A delayed release would not have solved that either. ISTM the problems were much larger than originally estimated.
Edited March 11, 2019 by Rudy Velthuis

David Heffernan 2345 Posted March 11, 2019
51 minutes ago, Rudy Velthuis said:
> I doubt that optimizations often make a difference in the runtime of a program, unless you are really running very processor-intensive code. But even then, the algorithms used are far more important than an optimized runtime.
I'm not talking about the runtime. I'm talking about the code emitted by the compiler when it compiles my code. Performance is critical for my program. Some of the critical sections of code I translated to C in order to reap significant performance benefits of having a good optimising compiler. So yes, this is a real issue.

Rudy Velthuis 91 Posted March 11, 2019
58 minutes ago, David Heffernan said:
> I'm not talking about the runtime. I'm talking about the code emitted by the compiler when it compiles my code. Performance is critical for my program. Some of the critical sections of code I translated to C in order to reap significant performance benefits of having a good optimising compiler. So yes, this is a real issue.
OK, so you are one of the few exceptions for whom performance is *always* critical. Most of the code I see is mainly non-critical, but may have some critical parts. I generally resort to assembler (if possible, i.e. on Windows, 32 bit or 64 bit target), although I always try to have a PUREPASCAL backup and I optimize that as much as I can.
And, as I said, it may well be that some of these optimizations were rolled back. I know I have seen some and I was quite surprised to see them.
Edited March 11, 2019 by Rudy Velthuis

David Heffernan 2345 Posted March 11, 2019
36 minutes ago, Rudy Velthuis said:
> I generally resort to assembler (if possible, i.e. on Windows, 32 bit or 64 bit target), although I always try to have a PUREPASCAL backup and I optimize that as much as I can.
Practical for small amounts of code that is seldom modified. Of course good compilers typically produce better code than humans can manage.

Lars Fosdal 1791 Posted March 12, 2019
21 hours ago, Rudy Velthuis said:
> That is something that needs to be addressed too, certainly, but it would not be on the top of my list.
I literally do that failing Generics Ctrl-Click several times a day. I guess I am a slow learner 😛
I also wish that the Insight mechanism would deduce the most likely class types in scope and let me select one as jump target, and not ALWAYS send me to the virtual/abstract declarations of the base class.

Rudy Velthuis 91 Posted March 12, 2019
9 hours ago, David Heffernan said:
> Of course good compilers typically produce better code than humans can manage.
That's a commonplace, but I am not convinced. A good assembler programmer can produce better code than any optimizing compiler.

David Heffernan 2345 Posted March 12, 2019
3 hours ago, Rudy Velthuis said:
> That's a commonplace, but I am not convinced. A good assembler programmer can produce better code than any optimizing compiler.
Very hard to find them though, and so easy to find good compilers.

microtronx 38 Posted March 12, 2019
I have tested my Generics-Collection component with Tokyo 10.2.3 and Rio 10.3.1, with the same results:
Adding 100.000 data / read in order 0 to 100.000 / read 100.000 times random / free
Tokyo: 1417ms / 59ms / 67ms / 234ms
Tokyo with Rapid-Generics: 705ms / 49ms / 69ms / 26ms
Rio: 1415ms / 59ms / 65ms / 230ms
Rio with Rapid-Generics: 671ms / 51ms / 69ms / 26ms
So the results for Tokyo and Rio are the same! But Rapid Generics is faster at adding (increment) and freeing (.clear).

Stefan Glienke 2002 Posted March 12, 2019
Show the benchmark code please - I have seen so many flawed benchmarks (including some of my own) in the past months that I don't believe posted results anymore.

microtronx 38 Posted March 12, 2019
1 minute ago, Stefan Glienke said:
> Show the benchmark code please - I have seen so many flawed benchmarks (including some of my own) in the past months that I don't believe posted results anymore.
Hi Stefan, here's the code:

    procedure TForm1.Button1Click(Sender: TObject);
    var
      vstart: tdatetime;
      vtest, i: longint;
      vtc: c__TriggerConstants;
    begin
      vtc := c__tp(); // create my Generics-Collection-Component
      try
        Randomize;

        // fill
        vstart := now;
        for i := 0 to 1000000 do
          vtc.add('my' + inttostr(i), Random(10000000), '');
        memo1.Lines.add('> fill ' + formatdatetime('hh:nn:ss:zzz', now - vstart));

        // read keys in order
        vstart := now;
        for i := 0 to 100000 do
          vtest := vtc.getvalue('my' + inttostr(i), -1);
        memo1.Lines.add('> get sort ' + formatdatetime('hh:nn:ss:zzz', now - vstart));

        // read keys at random
        vstart := now;
        for i := 0 to 100000 do
          vtest := vtc.getvalue('my' + inttostr(random(100000)), -1);
        memo1.Lines.add('> get random ' + formatdatetime('hh:nn:ss:zzz', now - vstart));

        // the timer for "free" starts here and is read in the finally block
        vstart := now;
      finally
        freeandnil(vtc);
        memo1.Lines.add('> free ' + formatdatetime('hh:nn:ss:zzz', now - vstart));
      end;
    end;

And yes, I create 1.000.000 elements and later read only 100.000 .. 😉
The code was compiled twice, switching directives:

    {$ifNdef mxHtScripter_RapidGenerics}
    System.Generics.Collections,
    {$else}
    Rapid.Generics,
    {$endif}

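For comparison, here is a minimal sketch of the same fill and in-order read phases timed with `TStopwatch` from `System.Diagnostics` rather than `Now`, with the key strings built before the timer starts so that `IntToStr` and string concatenation are not part of the measurement. It assumes a plain `TDictionary<string, Integer>` as a stand-in for the poster's component, which is not shown in the thread; `ItemCount` and `RunBench` are hypothetical names. It is one possible refinement, not a claim about what the numbers above would have been.

```delphi
program StopwatchBenchSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Diagnostics,
  System.Generics.Collections;

const
  ItemCount = 100000; // hypothetical size, same for fill and read

procedure RunBench;
var
  Keys: TArray<string>;
  Dict: TDictionary<string, Integer>;
  SW: TStopwatch;
  i, Value, Sum: Integer;
begin
  // Build the keys up front so key construction is not timed.
  SetLength(Keys, ItemCount);
  for i := 0 to ItemCount - 1 do
    Keys[i] := 'my' + IntToStr(i);

  Dict := TDictionary<string, Integer>.Create;
  try
    SW := TStopwatch.StartNew;
    for i := 0 to ItemCount - 1 do
      Dict.Add(Keys[i], i);
    SW.Stop;
    Writeln('fill: ', SW.ElapsedMilliseconds, ' ms');

    // Accumulate into Sum so the lookups have an observable result.
    Sum := 0;
    SW := TStopwatch.StartNew;
    for i := 0 to ItemCount - 1 do
      if Dict.TryGetValue(Keys[i], Value) then
        Inc(Sum, Value and 1);
    SW.Stop;
    Writeln('get in order: ', SW.ElapsedMilliseconds, ' ms (checksum ', Sum, ')');
  finally
    Dict.Free;
  end;
end;

begin
  RunBench;
end.
```

Swapping `System.Generics.Collections` for `Rapid.Generics` in the uses clause, as in the conditional directives above, would be the way to compare the two under identical timing code, assuming Rapid.Generics exposes a compatible `TDictionary`.
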