
Eric Grange

Members
  • Content Count

    30
  • Days Won

    1

Eric Grange last won the day on September 8 2024

Eric Grange had the most liked content!

Community Reputation

11 Good


  1. Eric Grange

    TParallelArray Sort Performance...

    > If I had to bet I would say you still have range checking enabled (RTL has that disabled, so does your code)

    Ah, no, it was on. I had spotted some "{$IFDEF RANGECHECKS_ON}" in the code and assumed Spring would control range checking itself, disabling it in sections where it's safe (because asserted, explicit loops on count/length, unit tested, etc., such as for a sort) 😞

    However, looking more closely at the Spring source, that doesn't seem to be the case (f.i. Vector<T> GetItem would lose range checking globally as well), so timings with range checking off would be a bit of an oddball scenario for Spring.

    > Looks like I am getting twice the speed at worst and 50 times at best.

    That sounds impressive!
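For illustration, here is a minimal sketch of what "disabling range checking only where it's safe" could look like. It uses the standard {$IFOPT R+} test to remember the project setting and restore it afterwards. The SumValues helper and the RANGECHECKS_WAS_ON symbol are hypothetical names made up for this example; this is not how Spring4D or the RTL do it.

```pascal
program RangeCheckScope;
{$APPTYPE CONSOLE}

// Remember whether range checking was on, then switch it off for this routine only
{$IFOPT R+}{$DEFINE RANGECHECKS_WAS_ON}{$R-}{$ENDIF}

// Hypothetical helper: the loop bounds come from High(), so indexing is safe
function SumValues(const values : array of Double) : Double;
var
   i : Integer;
begin
   Result := 0;
   for i := 0 to High(values) do
      Result := Result + values[i];
end;

// Restore the project-level setting for everything that follows
{$IFDEF RANGECHECKS_WAS_ON}{$R+}{$UNDEF RANGECHECKS_WAS_ON}{$ENDIF}

begin
   Writeln(SumValues([1.5, 2.5, 3.0]):0:1);
end.
```

With this pattern the rest of the unit keeps whatever range checking setting the project uses, so benchmark timings stay representative of real builds.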
  2. Eric Grange

    TParallelArray Sort Performance...

    Hmm, definitely something odd. I used the version from https://bitbucket.org/sglienke/spring4d/src/master/ maybe it's off in some way?

    Here on an i7-1165G7, Win64, optimization on, stack frames off, Spring is only marginally ahead of the RTL. I get these timings:

        Fill array (double), elements count: 500000
        Start sorting ...
        RTL TArray.Sort (ms.): 63
        Spring TArray.Sort (ms.): 56

        Fill array (double), elements count: 5000000
        Start sorting ...
        RTL TArray.Sort (ms.): 596
        Spring TArray.Sort (ms.): 574

        Fill array (double), elements count: 100000000
        Start sorting ...
        RTL TArray.Sort (ms.): 14311
        Spring TArray.Sort (ms.): 13708

    Quicksort non-generic is about 30% faster than Spring in all 3 tests.
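For anyone wanting to reproduce this kind of timing, a bare-bones harness could look like the sketch below. It is not the benchmark code from the thread; MakeRandomArray and IsSortedAscending are hypothetical helpers, and only the RTL sort is timed. Other sorts would be plugged in the same way.

```pascal
program SortBench;
{$APPTYPE CONSOLE}
uses
   System.SysUtils, System.Diagnostics, System.Generics.Collections;

// Hypothetical helper: fills an array with pseudo-random doubles in [0, 1)
function MakeRandomArray(count : Integer) : TArray<Double>;
var
   i : Integer;
begin
   SetLength(Result, count);
   for i := 0 to count - 1 do
      Result[i] := Random;
end;

// Hypothetical helper: sanity check so a broken sort cannot "win" the benchmark
function IsSortedAscending(const values : TArray<Double>) : Boolean;
var
   i : Integer;
begin
   for i := 1 to High(values) do
      if values[i-1] > values[i] then Exit(False);
   Result := True;
end;

var
   data : TArray<Double>;
   sw : TStopwatch;
begin
   RandSeed := 42;   // fixed seed so every run sorts the same data
   data := MakeRandomArray(500000);
   Writeln('Fill array (double), elements count: ', Length(data));
   sw := TStopwatch.StartNew;
   TArray.Sort<Double>(data);
   sw.Stop;
   Writeln('RTL TArray.Sort (ms.): ', sw.ElapsedMilliseconds);
   Writeln('Sorted: ', IsSortedAscending(data));
end.
```

Compiler settings (range checking, optimization, stack frames) should be the same for every sort being compared, since they can shift the results noticeably.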
  3. Eric Grange

    TParallelArray Sort Performance...

    @Stefan Glienke could you try reproducing by invoking your sort calls directly in the minimalistic benchmark code posted at the beginning of the thread?
  4. Eric Grange

    TParallelArray Sort Performance...

    Looks like the largest single-value array the parallel sort can handle without crashing is actually somewhere between 10000 and 15000 elements.

    @Lajos Juhász Stefan actually went easy on it with a Random(16) 😉
  5. Eric Grange

    TParallelArray Sort Performance...

    I actually used an Introsort implementation you had posted some time ago, but with the RTL comparer 🙂

    However I can't really reproduce your benchmark advantage with the latest Spring4D from your repository: in the same conditions as above, Spring4D's sort clocks at 609 ms for 5 million values, just 7% faster than RTL. The code really goes into the IntroSort_Double methods and the Compare_Double comparer (I checked with the debugger). Is there some option or setting required?

    When the array is pre-sorted instead of random, the timings are:
    - RTL TArray.Sort : 265 ms
    - Spring4D IntroSort : 238 ms
    - non-generic QuickSort : 78 ms
    - TParallelArray.Sort: 100 ms

    So Spring4D gets somewhat faster compared to RTL, but not as massively as the non-generic Quicksort.

    The other edge case is an array of 5 million times the same value:
    - RTL TArray.Sort : 248 ms
    - Spring4D IntroSort : 229 ms
    - non-generic QuickSort : 102 ms
    - TParallelArray.Sort: CRASH with Stack Overflow !!!

    Ok, this one looks like a death blow to the new parallel sort 😕

    For reference, this is the non-generic QuickSort copy-pasta:

        {$R-} {$O+} {$Q-}
        type
           TDoubleArray = array [0..MaxInt div 8 - 1] of Double;
           PDoubleArray = ^TDoubleArray;

        procedure QuickSort(pArray : PDoubleArray; minIndex, maxIndex : NativeInt);
        var
           i, j, p : NativeInt;
        begin
           repeat
              i := minIndex;
              j := maxIndex;
              p := (i+j) shr 1;   // pivot index: middle of the current range
              repeat
                 var pv := pArray[p];
                 // as posted, these comparisons leave larger values on the left (descending order)
                 while pArray[i] > pv do Inc(i);
                 while pArray[j] < pv do Dec(j);
                 if i <= j then begin
                    var buf := pArray[i];
                    pArray[i] := pArray[j];
                    pArray[j] := buf;
                    // keep p pointing at the pivot value if the swap just moved it
                    if p = i then
                       p := j
                    else if p = j then
                       p := i;
                    Inc(i);
                    Dec(j);
                 end;
              until i > j;
              // recurse on the left part, loop on the right part
              if minIndex < j then
                 QuickSort(pArray, minIndex, j);
              minIndex := i;
           until i >= maxIndex;
        end;
  6. Eric Grange

    TParallelArray Sort Performance...

    @Stefan Glienke I've been testing Introsort and other variants as well. They're more resilient to edge cases, but in the typical case I've found them quite underwhelming in Delphi, tbh. In C++ that's different: templates vs generics + loop unrolling give a definite edge.

    On a slower processor (i7-1165G7), for 5 million values:
    - TArray.Sort: 641 ms
    - TArray.IntroSort: 673 ms
    - Quicksort (non generic, single-thread): 450 ms
    - TParallelArray.Sort: 176 ms

    So a 3.6x speedup over TArray.Sort for a quad-core, which is fair. Overhead of generics is about 40% (641 ms vs 450 ms for the non-generic Quicksort).

    On larger array sizes (like 1T), timing ratios are similar, but the whole machine became unusably slow during the TParallelArray.Sort run, so that's something to keep in mind (don't use it to sort large arrays in a GUI app where you need to keep things snappy).
  7. Encapsulation can be frustrating, but exposing everything means you're making promises. People's code will gain dependencies on everything that can be accessed or overridden, which means you won't be able to change (or fix) much without breaking user code. Opening too much means the code will sediment and become untouchable... In the case of the DX11 driver, it's obvious it was locked in the implementation section because whoever was working on it wasn't satisfied with it, likely because he/she did not have time to tidy it up. It's essentially DX9 code with a light rewrite to DX11. I was able to hack it, but it was brittle. I'm now starting down the path of reimplementing it, which in the long term will open more possibilities (and hoping Delphi 12.2 doesn't wreak havoc on TContext3D, haha)
  8. @Kas Ob. yes, but the code has not changed in a long while AFAICT. One practical approach might be for EMBT to "officially" allow FMX.Context.XXX open-source forks. This would allow reimplementation projects to be kickstarted, and after a few iterations, there would probably be little left from the original EMBT source code anyway (at least from what I can see in the TDX11Context). The OpenGL context seems less "private", but it still has a lot of private vars in key areas. One of the first things forks would do would probably be to turn those private vars into fields, and support multiple contexts (and eventually multiple threads)
  9. @Uwe Raabe this doesn't seem to work, the compiler is creating different variables, class private vars don't appear to be relative to the TClass? @Lajos Juhász this would be a long term ticket, and given the amount of private stuff one needs to crack (not just in TDX11Context, but also in TContext3D), a major undertaking for EMBT to refactor everything
  10. Hi, this is definitely on the hacky side. Is there a way to gain access to the private class vars of a class defined in an implementation section? More precisely, I'm trying to gain access to the TDX11Context private class vars (that class is defined in the implementation section of FMX.Context.DX11), and the vertex & pixel shaders more specifically. My best attempt so far is to obtain the address of TDX11Context.DoSetShaderVariable (easy, it's virtual and just protected), and there the first line is just "if (CurrentVertexShader <> nil) and (Length(FVSBuf) > 0) then begin", which is simple enough to "disassemble" and get the FVSBuf address, from which the other vars can be inferred. It's all quite fragile though 🙂
  11. AFAICT there is no ICODE in the map file for Delphi 64 builds. I solved the issue by not having an exception raised... And found the unit that triggered the exception through "Halt" and bisection from the unit list. There were 3000+ units in the whole project, but since it was during the initialization, bisecting didn't take long (add Halt to a unit's initialization, run and see if Halt is reached or not, bisect and repeat)
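To make the bisection step concrete, a probe could be as small as the sketch below. SuspectUnit is a hypothetical unit name and the exit code 42 is arbitrary. The probe goes into whichever unit sits at the midpoint of the remaining candidates.

```pascal
unit SuspectUnit;   // hypothetical: the unit at the midpoint of the candidate list

interface

implementation

initialization
   // Temporary bisection probe. If the process exits with code 42, every unit
   // initialized before this one ran without raising, so the culprit comes later
   // in the initialization order. If the exception shows up instead, it comes earlier.
   Halt(42);
end.
```

With 3000+ units, halving the candidate range each run means about a dozen runs are enough to isolate the unit.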
  12. Because the map file provides that information already. For instance, for Win64 binaries you will see at the top of the map file lines like

      Detailed map of segments

       0001:00000000 0001C36C C=CODE S=.text G=(none) M=System ALIGN=4
       0001:0001C36C 00001944 C=CODE S=.text G=(none) M=SysInit ALIGN=4
       0001:0001DCB0 00003980 C=CODE S=.text G=(none) M=System.Types ALIGN=4
       0001:00021630 00000D40 C=CODE S=.text G=(none) M=System.UITypes ALIGN=4
       0001:00022370 000068BC C=CODE S=.text G=(none) M=Winapi.Windows ALIGN=4
       0001:00028C2C 000006BC C=CODE S=.text G=(none) M=FastMM4LockFreeStack ALIGN=4
       0001:000292E8 00000028 C=CODE S=.text G=(none) M=FastMM4Messages ALIGN=4
       0001:00029310 00002570 C=CODE S=.text G=(none) M=FastMM4 ALIGN=4
       0001:0002B880 00000200 C=CODE S=.text G=(none) M=Winapi.WinSvc ALIGN=4
       0001:0002BA80 000007E0 C=CODE S=.text G=(none) M=System.SysConst ALIGN=4
       0001:0002C260 00000040 C=CODE S=.text G=(none) M=Winapi.ImageHlp ALIGN=4
       0001:0002C2A0 00000020 C=CODE S=.text G=(none) M=Winapi.SHFolder ALIGN=4
       0001:0002C2C0 00000760 C=CODE S=.text G=(none) M=Winapi.PsAPI ALIGN=4

      And the unit order is the initialization order as well as the order in the compiled binary.

      For listing at runtime my code would not really work, because it required not only manually obtaining the address of the internal Context variable, but also resolving against the map file. However it's possible to bundle the MAP file in the executable and retrieve that information; the JCL's JDBG format allows that (in addition to being useful for debug stacks) https://stackoverflow.com/questions/6019698/access-jcl-debug-information-contained-in-executable

      Thanks, nice to know!
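As a rough sketch of how that list could be pulled out of a map file automatically: the program below keeps the "M=" name of every "C=CODE" line, in file order, skipping repeats. ExtractCodeUnits is a hypothetical helper written for this example, and it assumes the segment lines look like the excerpt above. Other Delphi versions or targets may format them differently.

```pascal
program MapUnitOrder;
{$APPTYPE CONSOLE}
uses
   System.SysUtils, System.Classes;

// Hypothetical helper: returns the unit names found on "C=CODE ... M=UnitName"
// lines, in the order they appear in the map file. Caller frees the result.
function ExtractCodeUnits(mapLines : TStrings) : TStringList;
var
   line, name : String;
   p : Integer;
begin
   Result := TStringList.Create;
   for line in mapLines do begin
      if Pos(' C=CODE ', line) = 0 then Continue;   // only code segments
      p := Pos(' M=', line);
      if p = 0 then Continue;
      name := Copy(line, p + 3, MaxInt);            // text after "M="
      p := Pos(' ', name);
      if p > 0 then SetLength(name, p - 1);         // cut at " ALIGN=..."
      if Result.IndexOf(name) < 0 then              // keep first occurrence only
         Result.Add(name);
   end;
end;

var
   map, units : TStringList;
   i : Integer;
begin
   if ParamCount < 1 then begin
      Writeln('usage: MapUnitOrder <file.map>');
      Exit;
   end;
   map := TStringList.Create;
   try
      map.LoadFromFile(ParamStr(1));
      units := ExtractCodeUnits(map);
      try
         for i := 0 to units.Count - 1 do
            Writeln(i + 1, ': ', units[i]);
      finally
         units.Free;
      end;
   finally
      map.Free;
   end;
end.
```

The numbered output can then be used directly as the candidate list for bisecting an initialization-time failure.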
  13. No, I do not have it. Do you know if it handles exceptions in the initialization section?

      You're right, it does!

      Haha, or in the case of the 64bit map, it's just the CODE sections. Ah well... 🙂

      FWIW the issue was related to the TNotificationCenter, which a unit was initializing ahead of time to avoid the incorrect app name reporting (https://en.delphipraxis.net/topic/4102-embarcadero-toaster-notification-window-caption-in-win10/), and this would sometimes fail with a Delphi 10.3 exe (the issue is not present in Delphi 11.2 afaict). The issue had been in the code for about 6 months before encountering a situation where it would be problematic with a particular combo of user rights.
  14. Ok, in case anyone encounters a similar issue, here is the ghetto method I used to obtain the units initialization order. First, after the "begin" of the main program, call a procedure like [ complicated approach deleted ] Just use the map file, as Uwe Raabe pointed out below, to get the order of initialization: the detailed segments section lists the units in order of initialization.
  15. The project has about 4000 units... this is why I would rather have a way to get a list of those units in the order InitUnits calls them, so I can bisect. (Also, the issue is infrequent; I have been unable to trigger it when debugging.) If worst comes to worst, I will probably hack to get the raw InitUnits call addresses, and then resolve them to unit names with the detailed map file.