Everything posted by Anders Melander
-
I couldn't find a function for disabling the Efficiency-cores in your public source... so I wrote one (yes, I'm procrastinating again):

    // Set process affinity to exclude efficiency cores
    function SetPerformanceAffinityMask(Force: boolean = False): boolean;
    procedure RestoreAffinityMask;

https://github.com/graphics32/graphics32/blob/3c239b58b063892b20063e8735de5360ef9fb5be/Source/GR32_System.pas#L102

Now I just need a CPU that can actually utilize it 😕

By the way, your previous post led me to this: https://www.uops.info/table.html
Much easier to use than Agner Fog's tables and also appears to be more up to date.

Now I'm thinking about how to get that info integrated into the Delphi debugger... and maybe throw in the data from Félix Cloutier's x86 reference. I guess that is also where godbolt gets its reference info from.

Oh wait; There I go again. Better get back to work now.
-
Unfortunately it isn't up to date. For example, your processor architecture (Raptor Lake/Raptor Cove) isn't in there. And, unless you're Peter Cordes and have all this info in your head, it's often too time consuming to compare the timings of each instruction for each of the relevant architectures. And then there's execution units, pipelines, fusing and stuff I don't even understand to consider. Somebody train an AI to figure this sh*t out for me.

I seem to remember that VTune had a static code analyzer with all this information built in, many, many versions ago, but I think that's gone now.

Random returns a Double so there is a conversion from that to Single but that is the same for all the functions. There's no implicit conversion beyond that; If I'm passing a Single to a function that takes a Single argument then that value stays a Single. Passed on the stack for x86 and in XMM0 for x64.

I have {$CODEALIGN 16} in an include file as I need it elsewhere for SIMD aligned loads.

Yes; Your x64 results are pretty wonky. ROUNDSS+CVTSS2SI should be faster than CVTSS2SD+CVTTSD2SI.

Actually, ROUNDSS+CVTSS2SI has a slightly higher latency (8+6) than CVTSS2SD+CVTTSD2SI (5+6).
-
By the way, the reason why the RTL Trunc is slower is probably because it's only been implemented for Double; There is no overload for Single so it always incurs the overhead of Single->Double conversion. The x64 version is implemented with a single CVTTSD2SI instruction while the x86 version uses x87.

Also, since the RTL Trunc is implemented as assembler it cannot be inlined and on x86 Delphi always passes Single params on the stack even though they would fit in a general register. This levels the playing field and makes a faster alternative worthwhile.

It's beyond me why they haven't implemented basic numerical functions such as Trunc, Round, Abs, etc. as compiler intrinsics so we at least can get them inlined.
-
Yes it is but for some reason CVTTSS2SI is not always faster than CVTSS2SI. I'm not sure that I can trust the benchmarks though. The results do seem to fluctuate a bit.

Here are the different versions (TFloat = Single):

    function Trunc_Pas(Value: TFloat): Integer;
    begin
      Result := Trunc(Value);
    end;

    function FastTrunc_SSE2(Value: TFloat): Integer;
    asm
    {$if defined(CPUX86)}
      MOVSS XMM0, Value
    {$ifend}
      CVTTSS2SI EAX, XMM0
    end;

    function SlowTrunc_SSE2(Value: TFloat): Integer;
    var
      SaveMXCSR: Cardinal;
      NewMXCSR: Cardinal;
    const
      // SSE MXCSR rounding modes
      MXCSR_ROUND_MASK    = $FFFF9FFF;
      MXCSR_ROUND_NEAREST = $00000000;
      MXCSR_ROUND_DOWN    = $00002000;
      MXCSR_ROUND_UP      = $00004000;
      MXCSR_ROUND_TRUNC   = $00006000;
    asm
      XOR ECX, ECX

      // Save current rounding mode
      STMXCSR SaveMXCSR
      // Load rounding mode
      MOV EAX, SaveMXCSR
      // Do we need to change anything?
      MOV ECX, EAX
      NOT ECX
      AND ECX, MXCSR_ROUND_TRUNC
      JZ @SkipSetMXCSR // Skip expensive LDMXCSR
    @SetMXCSR:
      // Save current rounding mode in ECX and flag that we need to restore it
      MOV ECX, EAX
      // Set rounding mode to truncation
      AND EAX, MXCSR_ROUND_MASK
      OR EAX, MXCSR_ROUND_TRUNC
      // Set new rounding mode
      MOV NewMXCSR, EAX
      LDMXCSR NewMXCSR
    @SkipSetMXCSR:
    {$if defined(CPUX86)}
      MOVSS XMM0, Value
    {$ifend}
      // Round/Trunc
      CVTSS2SI EAX, XMM0

      // Restore rounding mode
      // Did we modify it?
      TEST ECX, ECX
      JZ @SkipRestoreMXCSR // Skip expensive LDMXCSR
      // Restore old rounding mode
      LDMXCSR SaveMXCSR
    @SkipRestoreMXCSR:
    end;

    function FastTrunc_SSE41(Value: TFloat): Integer;
    const
      ROUND_MODE = $08 + $03; // $00=Round, $01=Floor, $02=Ceil, $03=Trunc
    asm
    {$if defined(CPUX86)}
      MOVSS xmm0, Value
    {$ifend}
      ROUNDSS xmm0, xmm0, ROUND_MODE
      CVTSS2SI eax, xmm0
    end;

And here are the benchmark results from my 10 year old Core i5-2500K @3.3 GHz desktop system.

x86 results
x64 results

Meh... but at least they are all consistently faster than Trunc - Unless I test on my laptop with a Core i7-8750H CPU @2.2 GHz

x86 results on battery
x86 results on mains

Yes, I know it's the result of my power saving profile throttling the CPU but it's interesting that it makes the x87 math so much faster than the SIMD math.

Here's the benchmark code for completeness:

    procedure BM_FastTrunc(const state: TState);
    begin
      var FastTruncProc: TFastRoundProc := TFastRoundProc(state[0]);

      for var _ in state do
      begin
        RandSeed := 0;

        for var i := 1 to 1000*1000*1000 do
        begin
          FastTruncProc(Random(i) / i);
        end;
      end;
    end;

    const
      FastTruncs: array[0..3] of record
        Name: string;
        Proc: TFastRoundProc;
      end = (
        (Name: 'Trunc'; Proc: Trunc_Pas),
        (Name: 'FastTrunc_SSE2'; Proc: FastTrunc_SSE2),
        (Name: 'FastTrunc_SSE41'; Proc: FastTrunc_SSE41),
        (Name: 'SlowTrunc_SSE2'; Proc: SlowTrunc_SSE2)
      );

    begin
      for var i := 0 to High(FastTruncs) do
        Spring.Benchmark.Benchmark(BM_FastTrunc, 'FastTrunc').Arg(Int64(@FastTruncs[i].Proc)).ArgName(FastTruncs[i].Name).TimeUnit(kMillisecond);

      Spring.Benchmark.Benchmark_Main;
    end.
-
This one maybe: https://blog.synopse.info/?post/2011/04/14/Enhanced-logging-in-SynCommons via: https://stackoverflow.com/a/7216537/2249664
-
The accepted answer to that question doesn't involve exceptions; It redirects the assertion handler to another function which then has access to the unit name and line number.
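For reference, a minimal sketch of that technique, assuming the standard System.AssertErrorProc hook. The handler name LogAssert and the output format are made up for illustration; as far as I can tell from the RTL, execution simply continues if the handler returns without raising.

```pascal
program AssertHookDemo;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  System.SysUtils;

// Hypothetical replacement handler. The RTL hands it the source file name
// and line number of the failing Assert call, so no exception is involved.
procedure LogAssert(const Message, Filename: string; LineNumber: Integer;
  ErrorAddr: Pointer);
begin
  Writeln(Format('%s (%s, line %d)',
    [Message, ExtractFileName(Filename), LineNumber]));
end;

begin
  // Redirect the assertion handler (SysUtils installs its own by default)
  AssertErrorProc := LogAssert;

  // Used as a "log here" call: the condition is always False
  Assert(False, 'Reached checkpoint');

  Writeln('Still running');
end.
```

Note that this only works while assertions are compiled in ({$ASSERTIONS ON} / {$C+}).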
-
No, it's just begun. Okay, I'll stop now. I think I got my point across 🙂
-
Yeah but have you looked into install.bat (warning: have the suicide prevention hotline on speed dial if you do):

    :: compile installer
    echo.
    echo ===================================================================
    echo Compiling JediInstaller...
    build\dcc32ex.exe %INSTALL_VERBOSE% --runtime-package-rtl --runtime-package-vcl -q -dJCLINSTALL -E..\bin -I..\source\include -U..\source\common;..\source\windows JediInstaller.dpr
    if ERRORLEVEL 1 goto FailedCompile
    :: New Delphi versions output "This product doesn't support command line compiling" and then exit with ERRORLEVEL 0
    if not exist ..\bin\JediInstaller.exe goto FailedCompile

    echo.
    echo ===================================================================
    echo Launching JCL installer...

    ::start ..\bin\JediInstaller.exe %*
    if not exist ..\bin\JCLCmdStarter.exe goto FailStart
    ..\bin\JCLCmdStarter.exe ..\bin\JediInstaller.exe %*
    if ERRORLEVEL 1 goto FailStart
    goto FINI
-
I'm not sure but I think it needs to be installed using some sort of installer which then generates the include file based on something, something, whatever, at this point I gave up and deleted everything.
-
That's funny. Almost all my problems with JCL and JVCL were caused by the fact that I installed them in the first place. Easily solvable though 🙂
-
Interesting take on Binary search (FYI)
Anders Melander replied to Tommi Prami's topic in Algorithms, Data Structures and Class Design
Wow. That's a nice resource! -
Hmm. It seems to be doing odd/even rounding:

    FastTrunc(0.5) = 0
    FastTrunc(1.5) = 2
    FastTrunc(2.5) = 2
    FastTrunc(3.5) = 4

Ah, it's the fluff. I got the logic mixed up:

      // Do we need to change anything?
      TEST EAX, MXCSR_ROUND_DOWN
      JNZ @SetMXCSR
      TEST EAX, MXCSR_ROUND_UP
      JZ @SkipSetMXCSR // Skip expensive LDMXCSR
    @SetMXCSR:
      [...]

Yet again, the duck provides the answer.
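To spell out the mix-up as I read it: the snippet above leaves MXCSR alone when both rounding-control bits are clear (round-to-nearest), while truncation needs it left alone only when both bits are already set. A rough Pascal sketch of the two conditions; the function names are hypothetical and only model the branch logic, they don't touch MXCSR:

```pascal
program MxcsrCheckDemo;

{$APPTYPE CONSOLE}

const
  // MXCSR rounding-control bits (bits 13 and 14)
  MXCSR_ROUND_DOWN  = $00002000;
  MXCSR_ROUND_UP    = $00004000;
  MXCSR_ROUND_TRUNC = $00006000; // both bits set

// Models the TEST/JNZ/TEST/JZ sequence quoted above:
// LDMXCSR is skipped only when both rounding bits are clear.
function SkipsSet_AsQuoted(MXCSR: Cardinal): Boolean;
begin
  Result := ((MXCSR and MXCSR_ROUND_DOWN) = 0) and
            ((MXCSR and MXCSR_ROUND_UP) = 0);
end;

// What truncation needs: skip LDMXCSR only when both bits are already set.
function SkipsSet_Trunc(MXCSR: Cardinal): Boolean;
begin
  Result := (MXCSR and MXCSR_ROUND_TRUNC) = MXCSR_ROUND_TRUNC;
end;

begin
  // $1F80 is the power-on default MXCSR value (round-to-nearest)
  Writeln('As quoted, default mode skipped: ', SkipsSet_AsQuoted($1F80));
  Writeln('Trunc check, default mode skipped: ', SkipsSet_Trunc($1F80));
end.
```

With the default round-to-nearest mode the quoted check skips the mode change, so CVTSS2SI rounds half to even, which matches the 0, 2, 2, 4 results.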
-
What new features would you like to see in Delphi 13?
Anders Melander replied to PeterPanettone's topic in Delphi IDE and APIs
Nonsense. Windows developers have been able to create professional-looking applications for decades without the aid of layout controls. The main reason for amateurish-looking applications is amateurish developers.

The DevExpress layout control is tightly coupled to the rest of their library but even if it had been possible to separate it from the rest then it would be a terrible idea. Embarcadero does not have the resources or expertise to maintain and evolve something as complex as TdxLayoutControl. Just look at the state of the 3rd party libraries they already have incorporated into Delphi.

I wouldn't mind a rudimentary layout control as a part of the VCL but if they can't even get something as simple as TGridPanel to work properly then I think it's better they not even try.
-
Probably not but I have never tested with a map file produced by C++ Builder and it looks like the format differs slightly from that of Delphi.

The segment/module list of a Delphi map file looks like this:

    Detailed map of segments

     0001:00000000 0000FED4 C=CODE S=.text G=(none) M=System ACBP=A9
     0001:0000FED4 00000C9C C=CODE S=.text G=(none) M=SysInit ACBP=A9
     0001:00010B70 0000373C C=CODE S=.text G=(none) M=System.Types ACBP=A9
     0001:000142AC 000007E8 C=CODE S=.text G=(none) M=System.UITypes ACBP=A9
     0001:00014A94 00001E04 C=CODE S=.text G=(none) M=Winapi.Windows ACBP=A9
     0001:00016898 000003A8 C=CODE S=.text G=(none) M=System.SysConst ACBP=A9
     [...]

As you can see there's no path in the module names.

If you create a bug report at the map2pdb issue tracker and attach the map file (zipped) I will take a look at it.
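To illustrate why a path in the module name can trip things up, here is a naive sketch that pulls the M= field out of a segment line. It assumes the Delphi line format shown above; ExtractModuleName is a hypothetical helper, not how map2pdb actually parses the file.

```pascal
program MapSegmentDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: returns the value of the "M=" field of a segment
// line, or an empty string if the line has no such field.
function ExtractModuleName(const Line: string): string;
var
  Fields: TArray<string>;
  Field: string;
begin
  Result := '';
  // Fields are separated by one or more spaces
  Fields := Line.Split([' '], TStringSplitOptions.ExcludeEmpty);
  for Field in Fields do
    if Field.StartsWith('M=') then
      Exit(Field.Substring(2));
end;

begin
  // prints "System.Types"
  Writeln(ExtractModuleName(
    ' 0001:00010B70 0000373C C=CODE     S=.text    G=(none)   M=System.Types ACBP=A9'));
end.
```

Splitting on spaces like this breaks as soon as the module field contains a path with spaces in it, so a map file that stores paths there needs more careful handling.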
-
What new features would you like to see in Delphi 13?
Anders Melander replied to PeterPanettone's topic in Delphi IDE and APIs
I think you'll have to wait a bit; It's only just entered puberty. 13 years old.... -
MAP2PDB - Profiling with VTune
Anders Melander replied to Anders Melander's topic in Delphi Third-Party
There used to be a link to download previous versions (which is how I managed to use it with Windows 7 at that time), but apparently they've removed that ability: https://community.intel.com/t5/Analyzers/where-can-I-download-an-older-version-vtune/m-p/1561574#M24281 😞 -
MAP2PDB - Profiling with VTune
Anders Melander replied to Anders Melander's topic in Delphi Third-Party
VTune only supports Intel hardware as it relies on certain CPU features that are only available on Intel CPUs. At least that's what they claim: https://www.intel.com/content/www/us/en/developer/articles/system-requirements/vtune-profiler-system-requirements.html

Maybe you can get an older version of VTune to work. For example the current version of VTune doesn't support hardware assisted profiling on my (admittedly pretty old) processor.
-
https://stackexchange.com/sites
-
Why are you asking this question in a Delphi programming forum?
-
Better Translation Manager(BTM) and hint texts
Anders Melander replied to cstarling1989's topic in VCL
Yes, I guess that could be done; Hook into the application OnShowHint handler and suppress empty hints. -
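Something along these lines, as a rough sketch only. THintFilter is a hypothetical helper class (OnShowHint needs a method, not a plain procedure), and treating whitespace-only hints as empty is my assumption:

```pascal
unit HintFilter;

interface

uses
  System.SysUtils, Vcl.Controls, Vcl.Forms;

type
  // Hypothetical helper; its ShowHint method matches Vcl.Forms.TShowHintEvent
  THintFilter = class
  public
    procedure ShowHint(var HintStr: string; var CanShow: Boolean;
      var HintInfo: THintInfo);
  end;

implementation

procedure THintFilter.ShowHint(var HintStr: string; var CanShow: Boolean;
  var HintInfo: THintInfo);
begin
  // Suppress hints that are empty or whitespace-only, e.g. after translation
  if HintStr.Trim = '' then
    CanShow := False;
end;

end.
```

Hook it up once at startup with Application.OnShowHint := Filter.ShowHint, where Filter is a THintFilter instance that lives as long as the application.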
Seeking Collaboration: Creating a Delphi Component for STM32 Boards
Anders Melander replied to techdesk's topic in General Help
Welcome to my /ignore list. Bye -
Delphi 12.0 TParallel.For performance. Threading.pas issues
Anders Melander replied to mitch.terpak's topic in General Help
Whoops. I just remembered who wrote that code 🙂 -
Delphi 12.0 TParallel.For performance. Threading.pas issues
Anders Melander replied to mitch.terpak's topic in General Help
I've heard this too, but it was many, many years ago. Well that explains the complete lack of comments. All their comments are like: // TODO : Document this, FFS! and // TODO : WFT is this shit? -
Better Translation Manager(BTM) and hint texts
Anders Melander replied to cstarling1989's topic in VCL
I'm not sure I understand you. Where would your users specify the hint texts? In BTM?

If so you can configure BTM to synthesize properties in case the property "default" mechanism caused them not to be stored. This is already done for the TField.DisplayLabel property which isn't stored in the DFM if its value equals the FieldName property:

So you need to specify:
1. The control type as a regular expression.
2. The name of the property to synthesize. Hint in your case.
3. The synthesized value of the property. Probably just an empty string in your case.

The problem here is that you probably want this done for lots of different controls so I'm not sure if this is feasible for you.

It would of course be best if you could just set the default hint texts in your base language so there's something to translate. Surely, if a control needs a hint text in one language, it needs it in all languages.
-
Seeking Collaboration: Creating a Delphi Component for STM32 Boards
Anders Melander replied to techdesk's topic in General Help
Maybe start by revealing who you are, "techdesk"...