Anders Melander

Members
  • Content Count

    2998
  • Days Won

    167

Everything posted by Anders Melander

  1. Anders Melander

    12.1 requiring me to uninstall 12.0?

    Just answer No to the question "Do you want to remove all RAD Studio 12 entries from your registry?" and you should be fine (depending on your definition of "fine"). The Delphi installer has always been a case study in poor usability. This time it took me 4 tries before I got everything I wanted installed. First time I forgot to change the install location. I do this pretty much every other time. Second time I installed Android platform support but didn't select to also install the (apparently required) JDK. Third time I discovered that since I didn't explicitly select to install Windows platform support, I only got the 32-bit compiler.
  2. Anders Melander

    NEW INDY REPO (CLONE) FOR ATHENS 12.1 UPDATES

    That's not how you do it. I doubt Remy will have time or motivation to look at whatever it is you have done.

      1. Fork the original repository.
      2. Make a branch.
      3. Apply your changes.
      4. Create a pull request to have your changes merged into the original repository.

    Steps 2-4 should be done once per separate issue. A single issue that makes 20 unrelated changes probably has zero chance of being accepted.
  3. Anders Melander

    Delphi 12.1 is available

    That would have had zero impact on their self-hosted Jira since its support expired 7 years ago anyway. I'm not saying I don't understand why they would want to migrate to something else, but the reason you state ain't it.
  4. Anders Melander

    Delphi 12.1 is available

    The new Atlassian slogan
  5. Anders Melander

    Delphi 12.1 is available

    Yes, I fear it will. While I haven't used it yet and there's presently nothing there, I'm really afraid this just wiped out yet another small community. Tick tock, tick tock.
  6. Anders Melander

    Delphi 12.1 is available

    aaaaaaaand it sucks. But at least, from reading the announcement, it appears they know it sucks.
  7. Anders Melander

    Delphi 12.1 is available

    You have been able to do that in Delphi for 10-15 years - Just not with both editors docked.
  8. Anders Melander

    Do you need an ARM64 compiler for Windows?

    With in-process COM you'd have all the same problems without any of the benefits of the BPL/DLL packages. With out-of-process COM you'd have the benefit of process separation but you would have to surface the whole TControl and design-time API bidirectionally. It would be a nightmare. Also, each process would contain its own linked-in copy of the RTL/VCL. And forget about shoehorning this into the existing IDE; it would have to be rewritten from scratch. No thanks.
  9. Anders Melander

    Do you need an ARM64 compiler for Windows?

    Yes, that much was clear. So how do you accomplish that?
  10. You can use OpenCV on Android and iOS. Haven't you researched this at all?
  11. Anders Melander

    Do you need an ARM64 compiler for Windows?

    No more than any other major upgrade of Delphi. How would you have designed the system then?
  12. Anders Melander

    globalSize()

    I think you can safely assume that:

      • GlobalSize returns the correct size.
      • Any additional memory it might return (which I don't believe it will) will be zeroed.

    I've been working with drag/drop and the clipboard through COM for 25 years and none of my tools has any handling of extra data and they assume that GlobalSize returns the requested size (which it has so far).
  13. Anders Melander

    globalSize()

    GlobalSize isn't useless just because it doesn't behave the way you want it to or expect it to. From what I can tell you can only rely on the returned size to be >= requested size. Regardless of whatever it might return right now on your system. That said, I can't remember having GlobalSize return a value I didn't expect - but I might also just have forgotten about it. Anyway, if you are reading data from the clipboard then you can try to request the TYMED_ISTREAM medium instead of TYMED_HGLOBAL. If you are lucky the returned IStream will report the correct size. I doubt it though; I think the IStream is just a wrapper around a HGLOBAL.
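
    A minimal sketch of the ">= requested size" point, assuming a Windows console build with a recent Delphi. The program name and the 10-byte request are made up for illustration. It only prints what GlobalSize reports for a small allocation; whether the reported size is larger than the request depends on the system, so no particular output is claimed.

    ```delphi
    program GlobalSizeDemo; // hypothetical name, for illustration only

    {$APPTYPE CONSOLE}

    uses
      Winapi.Windows;

    const
      Requested = 10; // arbitrary small size chosen for the example

    var
      Handle: HGLOBAL;
      Actual: NativeUInt;
    begin
      // Allocate a movable, zero-initialized block like a clipboard producer would
      Handle := GlobalAlloc(GMEM_MOVEABLE or GMEM_ZEROINIT, Requested);
      if Handle = 0 then
        Halt(1);
      try
        // Documented contract: the result may be larger than the requested size
        Actual := GlobalSize(Handle);
        Writeln('Requested: ', Requested, ', GlobalSize: ', Actual);
      finally
        GlobalFree(Handle);
      end;
    end.
    ```

    If the exact payload length matters, a common workaround is to store the length inside the data itself rather than rely on the allocation size.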
  14. Anders Melander

    globalSize()

    What mistake is that? It's by design. Did you not read the articles you linked to?
  15. Anders Melander

    Regression - Delphi 12 - IsZero()

    Opinions are easy. Code... a bit harder.
  16. Anders Melander

    Bug in TButton with Multi-Line Caption?

    One did
  17. Anders Melander

    x87 vs SSE single truncation

    So I have the following function which is supposed to truncate a Single using the SSE CVTTSS2SI instruction. Pretty simple except for all the MXCSR fluff. Yes, I know I could just use the SSE4.1 ROUNDSS instruction, which does all of the below in a single instruction, but that's not relevant to this.

    Anyway, the problem is that my function doesn't always agree with System.Trunc (which is implemented with the x87 instruction FISTP). I guess that is to be expected in some cases due to the difference in precision (80 vs 32 bits) but as far as I can tell that is not the problem I'm encountering here - and I would also only expect it to manifest as a problem in rounding and not truncation.

    Specifically I have the value -2343.5

        System.Trunc(-2343.5) = -2343
        FastTrunc(-2343.5) = -2344

    Given that truncation is supposed to round towards zero, I believe that System.Trunc is correct. But then why is CVTTSS2SI not doing that?

        function FastTrunc_SSE2(Value: Single): Integer;
        var
          SaveMXCSR: Cardinal;
          NewMXCSR: Cardinal;
        const
          // SSE MXCSR rounding modes
          MXCSR_ROUND_MASK    = $FFFF9FFF;
          MXCSR_ROUND_NEAREST = $00000000;
          MXCSR_ROUND_DOWN    = $00002000;
          MXCSR_ROUND_UP      = $00004000;
          MXCSR_ROUND_TRUNC   = $00006000;
        asm
          XOR     ECX, ECX

          // Save current rounding mode
          STMXCSR SaveMXCSR
          // Load rounding mode
          MOV     EAX, SaveMXCSR
          // Do we need to change anything?
          TEST    EAX, MXCSR_ROUND_DOWN
          JNZ     @SetMXCSR
          TEST    EAX, MXCSR_ROUND_UP
          JZ      @SkipSetMXCSR // Skip expensive LDMXCSR
        @SetMXCSR:
          // Save current rounding mode in ECX and flag that we need to restore it
          MOV     ECX, EAX
          // Set rounding mode to truncation
          AND     EAX, MXCSR_ROUND_MASK
          OR      EAX, MXCSR_ROUND_TRUNC
          // Set new rounding mode
          MOV     NewMXCSR, EAX
          LDMXCSR NewMXCSR
        @SkipSetMXCSR:

        {$if defined(TARGET_x86)}
          MOVSS   XMM0, Value
        {$ifend}
          // Round/Trunc
          CVTSS2SI EAX, XMM0

          // Restore rounding mode
          // Did we modify it?
          TEST    ECX, ECX
          JZ      @SkipRestoreMXCSR // Skip expensive LDMXCSR
          // Restore old rounding mode
          LDMXCSR SaveMXCSR
        @SkipRestoreMXCSR:
        end;
  18. Anders Melander

    A gem from the past (Goto)

    True, but don't say that out loud. sEE, i'M uSiNg GoToS; iM A hiGhLy SkIlLeD pRoGraMmEr!
  19. Anders Melander

    x87 vs SSE single truncation

    Sounds like premature optimization 🙂 I'm doing graphics so memory bandwidth is always going to be a bottleneck. The first goal then is to use the correct algorithms and update as little as possible (thus minimizing the impact of that bottleneck) and then do everything else as fast as possible. Round and Trunc are used a lot for some operations and while replacing them with something faster might not yield much in most situations they are significant components in some performance scenarios. Also, my goal wasn't really to create a killer Round/Trunc function. I just wound up there because I needed to isolate the functionality when it didn't behave as I expected.
  20. Anders Melander

    x87 vs SSE single truncation

    No. I'm working in Single precision so there's no type conversion going on. That said, I have implemented overloads for both Single and Double and the single and double instructions perform exactly the same.
  21. Anders Melander

    x87 vs SSE single truncation

    Ah, our old enemy: Copy/paste
  22. Anders Melander

    x87 vs SSE single truncation

    I just looked at my unit test of FastTrunc and I wondered why I was running the tests with different values of MXCSR set - and then I remembered why I chose to use ROUNDSS instead of CVTTSS2SI... The Intel documentation on CVTTSD2SI states: So I assumed that CVTTSS2SI behaved the same way and opted against having to fiddle with MXCSR in order to guarantee truncation. Well, it turns out that it does behave the same way; the documentation is wrong. How about that.
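
    To make the one-letter difference between the two conversion instructions concrete, here is a small sketch. TruncSSE and RoundSSE are hypothetical names, not the Graphics32 or RTL functions, and the CPUX86 conditional assumes the standard Delphi compiler defines. CVTTSS2SI always truncates toward zero, while CVTSS2SI rounds according to the MXCSR rounding mode, so neither version touches MXCSR.

    ```delphi
    program TruncDemo; // hypothetical demo, names are illustrative only

    {$APPTYPE CONSOLE}

    // CVTTSS2SI: convert with truncation (round toward zero), independent of MXCSR
    function TruncSSE(Value: Single): Integer;
    asm
    {$IFDEF CPUX86}
      MOVSS     XMM0, Value // x86: Single parameters arrive on the stack
    {$ENDIF}
      CVTTSS2SI EAX, XMM0   // x64: Value is already in XMM0
    end;

    // CVTSS2SI: one T fewer, rounds using the current MXCSR rounding mode
    function RoundSSE(Value: Single): Integer;
    asm
    {$IFDEF CPUX86}
      MOVSS    XMM0, Value
    {$ENDIF}
      CVTSS2SI EAX, XMM0
    end;

    begin
      Writeln(Trunc(-2343.5));    // -2343
      Writeln(TruncSSE(-2343.5)); // -2343
      Writeln(RoundSSE(-2343.5)); // -2344 with the default round-to-nearest mode
    end.
    ```

    The last line reproduces the -2344 reported at the start of the thread, assuming the default round-to-nearest-even mode is in effect.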
  23. Anders Melander

    x87 vs SSE single truncation

    I couldn't find a function for disabling the Efficiency-cores in your public source... so I wrote one (yes, I'm procrastinating again):

        // Set process affinity to exclude efficiency cores
        function SetPerformanceAffinityMask(Force: boolean = False): boolean;
        procedure RestoreAffinityMask;

    https://github.com/graphics32/graphics32/blob/3c239b58b063892b20063e8735de5360ef9fb5be/Source/GR32_System.pas#L102

    Now I just need a CPU that can actually utilize it 😕

    By the way, your previous post led me to this: https://www.uops.info/table.html
    Much easier to use than Agner Fog's tables and also appears to be more up to date. Now I'm thinking about how to get that info integrated into the Delphi debugger... and maybe throw in the data from Félix Cloutier's x86 reference. I guess that is also where godbolt gets its reference info from.

    Oh wait; there I go again. Better get back to work now.
  24. Anders Melander

    x87 vs SSE single truncation

    Unfortunately it isn't up to date. For example, your processor architecture (Raptor Lake/Raptor Cove) isn't in there. And, unless you're Peter Cordes and have all this info in your head, it's often too time consuming to compare the timings of each instruction for each of the relevant architectures. And then there's execution units, pipelines, fusing and stuff I don't even understand to consider. Somebody train an AI to figure this sh*t out for me. I seem to remember that VTune had a static code analyzer with all this information built in, many, many versions ago, but I think that's gone now.

    Random returns a Double so there is a conversion from that to Single, but that is the same for all the functions. There's no implicit conversion beyond that; if I'm passing a Single to a function that takes a Single argument then that value stays a Single. Passed on the stack for x86 and in XMM0 for x64.

    I have {$CODEALIGN 16} in an include file as I need it elsewhere for SIMD aligned loads.

    Yes; your x64 results are pretty wonky. ROUNDSS+CVTSS2SI should be faster than CVTSS2SD+CVTTSD2SI. Actually, ROUNDSS+CVTSS2SI has a slightly higher latency (8+6) than CVTSS2SD+CVTTSD2SI (5+6).
  25. Anders Melander

    x87 vs SSE single truncation

    By the way, the reason why the RTL Trunc is slower is probably because it's only been implemented for Double; there is no overload for Single so it always incurs the overhead of Single->Double conversion. The x64 version is implemented with a single CVTTSD2SI instruction while the x86 version uses x87. Also, since the RTL Trunc is implemented as assembler it cannot be inlined, and on x86 Delphi always passes Single params on the stack even though they would fit in a general register. This levels the playing field and makes a faster alternative worthwhile. It's beyond me why they haven't implemented basic numerical functions such as Trunc, Round, Abs, etc. as compiler intrinsics so we at least can get them inlined.