
Anders Melander

Members
  • Content Count: 2561
  • Days Won: 133

Everything posted by Anders Melander

  1. Anders Melander

    TParallelArray Sort Performance...

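    Wouldn't it make sense to do a CLFLUSH before the sort, so it doesn't benefit from all the data already being in the cache?

        procedure FlushCache(Data: Pointer; Size: Integer);
        // 32-bit only: with the register calling convention Data is in EAX and Size in EDX
        const
          CACHE_LINE_SIZE = 64;
        asm
                TEST    Size, Size
                JLE     @Done
                DEC     Size                   // start at the last byte of the buffer, not one past it
        @NextBlock:
                CLFLUSH [Data + Size]
                SUB     Size, CACHE_LINE_SIZE
                JGE     @NextBlock
                CLFLUSH [Data]                 // also flush the first line if Data isn't 64-byte aligned
                MFENCE                         // wait until the flushes have completed
        @Done:
        end;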
  2. Anders Melander

    Delphi takes 9 seconds to start/shutdown an empty application

    So get a new motherboard that supports the CPU you'd like. In my current system I have upgraded the motherboard 3 times, the CPU 6 times, the GPU 2 times and the PSU 2 times, always with newer and faster models. The only thing I haven't replaced is the 20-year-old case (Lian Li PC-2100B tower), but that too will go next time. I don't really need 12 internal and 6 external storage bays anymore 🙂 and, being all aluminum, it's quite noisy with all the fans.

    Probably Windows Update. That's a repeat offender on my system.
  3. Anders Melander

    TParallelArray Sort Performance...

    But it crashed faster and fast is better, right? Right? If only there was some easy way of getting stuff like this tested before release... I mean, come on, we all make bugs but this is simply not acceptable.
  4. Anders Melander

    TParallelArray Sort Performance...

    How do you get the benchmark results ordered by parameters rather than by method? With the way I use Spring.Benchmark, I get the results ordered by method and then by parameters, which makes it hard to compare the different methods:

        procedure Benchmark(BenchmarkFunc: TFunction; const Name: string);
        begin
          Spring.Benchmark.Benchmark(BenchmarkFunc, Name)
            .RangeMultiplier(4)
            .Ranges([Range(1024+1, 8192+13), Range(128, 5120)])
            .TimeUnit(kMillisecond);
        end;

        //------------------------------------------------------------------------------

        begin
          Benchmark(BenchmarkNoTranspose32, 'MemCopy (no transpose)');
          Benchmark(BenchmarkReferenceTranspose32, 'ReferenceTranspose32');
          Benchmark(BenchmarkCacheObliviousTranspose32, 'CacheObliviousTranspose32');
          Benchmark(BenchmarkCacheObliviousTransposeEx32, 'CacheObliviousTransposeEx32');
          Benchmark(BenchmarkSuperDuperTranspose32, 'SuperDuperTranspose32');
          Spring.Benchmark.Benchmark_Main;
        end.

        ------------------------------------------------------------------------------------------------
        Benchmark                                 Time       CPU  Iterations UserCounters...
        ------------------------------------------------------------------------------------------------
        MemCopy (no transpose)/1025/128       0,028 ms  0,025 ms       26353 Rate=5.26859G/s
        MemCopy (no transpose)/4096/128       0,153 ms  0,143 ms        4480 Rate=3.66644G/s
        MemCopy (no transpose)/8205/128       0,616 ms  0,578 ms        1000 Rate=1.81663G/s
        MemCopy (no transpose)/1025/256       0,064 ms  0,049 ms       11200 Rate=5.37395G/s
        MemCopy (no transpose)/4096/256       0,596 ms  0,502 ms        1120 Rate=2.08783G/s
        MemCopy (no transpose)/8205/256        1,30 ms   1,03 ms         560 Rate=2.03463G/s
        MemCopy (no transpose)/1025/1024      0,590 ms  0,558 ms        1120 Rate=1.88088G/s
        MemCopy (no transpose)/4096/1024       2,56 ms   2,25 ms         299 Rate=1.86656G/s
        MemCopy (no transpose)/8205/1024       5,48 ms   4,26 ms         154 Rate=1.97165G/s
        MemCopy (no transpose)/1025/4096       2,63 ms   2,08 ms         345 Rate=2.01523G/s
        MemCopy (no transpose)/4096/4096       10,9 ms   10,0 ms          64 Rate=1.67608G/s
        MemCopy (no transpose)/8205/4096       23,6 ms   22,6 ms          29 Rate=1.48514G/s
        MemCopy (no transpose)/1025/5120       3,36 ms   2,85 ms         236 Rate=1.84339G/s
        MemCopy (no transpose)/4096/5120       14,3 ms   12,3 ms          56 Rate=1.70823G/s
        MemCopy (no transpose)/8205/5120       29,3 ms   23,8 ms          21 Rate=1.7644G/s
        ReferenceTranspose32/1025/128         0,345 ms  0,322 ms        2036 Rate=407.045M/s
        ReferenceTranspose32/4096/128          1,49 ms   1,35 ms         498 Rate=388.607M/s
        ReferenceTranspose32/8205/128          3,84 ms   3,07 ms         224 Rate=342.187M/s
        ReferenceTranspose32/1025/256         0,806 ms  0,628 ms        1120 Rate=417.974M/s
        ReferenceTranspose32/4096/256          3,91 ms   3,23 ms         213 Rate=324.868M/s
        ReferenceTranspose32/8205/256          28,4 ms   25,7 ms          28 Rate=81.8274M/s
        ReferenceTranspose32/1025/1024         7,00 ms   6,77 ms          90 Rate=155.018M/s
        ReferenceTranspose32/4096/1024         97,6 ms   93,8 ms           7 Rate=44.7392M/s
        ReferenceTranspose32/8205/1024          184 ms    168 ms           4 Rate=50.0207M/s
        ReferenceTranspose32/1025/4096         33,4 ms   32,8 ms          20 Rate=127.951M/s
        ReferenceTranspose32/4096/4096          367 ms    336 ms           2 Rate=49.9415M/s
        ReferenceTranspose32/8205/4096          720 ms    656 ms           1 Rate=51.2117M/s
        ReferenceTranspose32/1025/5120         44,2 ms   41,4 ms          17 Rate=126.885M/s
        ReferenceTranspose32/4096/5120          459 ms    391 ms           2 Rate=53.6871M/s
        ReferenceTranspose32/8205/5120          969 ms    781 ms           1 Rate=53.7723M/s
        CacheObliviousTranspose32/1025/128    0,326 ms  0,285 ms        2800 Rate=461.001M/s
        CacheObliviousTranspose32/4096/128     1,37 ms   1,34 ms         560 Rate=391.468M/s
        [...]
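    Short of a built-in ordering option, one workaround is to post-process the output. A minimal sketch in Python (not Delphi), assuming the result lines have been captured as text in the `<method>/<arg1>/<arg2> time cpu iterations counters` layout shown above; it regroups them so every method for a given parameter pair is listed together, keeping the registration order of the methods. `order_by_parameters` is a hypothetical helper, and the sample rows are copied from the posted results.

```python
import re

# A few result lines copied from the console output above.
RESULTS = """\
MemCopy (no transpose)/1025/128 0,028 ms 0,025 ms 26353 Rate=5.26859G/s
MemCopy (no transpose)/4096/128 0,153 ms 0,143 ms 4480 Rate=3.66644G/s
ReferenceTranspose32/1025/128 0,345 ms 0,322 ms 2036 Rate=407.045M/s
ReferenceTranspose32/4096/128 1,49 ms 1,35 ms 498 Rate=388.607M/s
CacheObliviousTranspose32/1025/128 0,326 ms 0,285 ms 2800 Rate=461.001M/s
CacheObliviousTranspose32/4096/128 1,37 ms 1,34 ms 560 Rate=391.468M/s
"""

# Benchmark names look like "<method>/<arg1>/<arg2>" followed by whitespace.
ROW = re.compile(r"^(?P<method>.+?)/(?P<a>\d+)/(?P<b>\d+)\s")


def order_by_parameters(text: str) -> list[str]:
    """Return the result lines grouped by parameter pair instead of by method."""
    rows = []
    method_order = {}  # first appearance = registration order
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers, separators and anything unparsable
        method = m["method"]
        method_order.setdefault(method, len(method_order))
        rows.append((int(m["b"]), int(m["a"]), method_order[method], line))
    # Sort by the second argument, then the first, then registration order,
    # so all methods for one parameter combination end up next to each other.
    rows.sort(key=lambda r: r[:3])
    return [r[3] for r in rows]


for line in order_by_parameters(RESULTS):
    print(line)
```

    Sorting on `(arg2, arg1)` keeps the same parameter sequence as the original output (first argument varying fastest); swap the two if the other grouping reads better.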
  5. Anders Melander

    Delphi takes 9 seconds to start/shutdown an empty application

    The CPU alone would be enough to explain the difference. I'm guessing your old CPU was an AMD Ryzen 7 8700G, which is 5-6 times faster than your new CPU. https://www.cpubenchmark.net/compare/2962vs5836/Intel-i5-7440HQ-vs-AMD-Ryzen-7-8700G

    In addition, the support circuits (all the stuff that's not the CPU), being laptop components, are most likely optimized for energy efficiency rather than performance. My own laptop, a Lenovo X1 Extreme, was, at the time I bought it, the fastest (and most expensive 😞) laptop Lenovo produced, but the performance is still... meh... not impressive.

    The RAM will also make a difference. Windows 10 will be able to handle 16 GB better than Windows 7 did, but it's still on the low side. I think the best thing you could do with this hardware is to add more memory. Depending on the current memory configuration, you should be able to upgrade to 32 GB.

    If I were you, I would spend the effort on fixing your old system and replacing the parts that are dead. If you limit yourself to parts (CPU/MB/RAM/GPU) that are a few years old, you can build a really good system on the cheap.

    I reluctantly upgraded my main system from Windows 7 to 10 earlier this year. I had feared that it would kill the performance of my 10+ year old system (Intel i5-2500K @ 3.30 GHz, 16 GB), but it actually performs pretty well, in many cases better than it did under Windows 7.
  6. Anders Melander

    TParallelArray Sort Performance...

    Yes, but that's the same for all three tests. The smaller array should still benefit the most from the cache. To me, the posted results indicate that there's a lot of per-sort overhead somewhere.
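    The per-sort overhead argument can be made concrete with a simple cost model, assuming each sort costs a fixed overhead plus a per-element cost. This is a simplification (a comparison sort is really O(n log n)), and the constants below are made up for illustration, not measured. With a large fixed overhead, throughput rises with array size, which is the opposite of what cache effects alone would predict for the smallest array.

```python
# Hypothetical cost model: fixed per-sort overhead + linear per-element cost.
# The constants are illustrative only, NOT measurements of TParallelArray.
OVERHEAD_S = 0.005   # 5 ms fixed cost per sort (e.g. task setup, thread wake-up)
PER_ITEM_S = 20e-9   # 20 ns per element


def throughput(n: int, overhead: float = OVERHEAD_S, per_item: float = PER_ITEM_S) -> float:
    """Elements sorted per second for an array of n elements under the model."""
    return n / (overhead + n * per_item)


for n in (500_000, 5_000_000, 1_000_000_000):
    print(f"{n:>13,} elements: {throughput(n) / 1e6:6.1f} M elements/s")
```

    With `overhead=0.0` every size gets the same throughput, so any size-dependent gap in this model comes entirely from the fixed cost.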
  7. Anders Melander

    TParallelArray Sort Performance...

    You didn't answer any of my questions... Regardless, would you care to publish your test code so we can see how the sausages got made?
  8. Anders Melander

    TParallelArray Sort Performance...

    There's something fishy with those numbers. I would have expected the 500K test to have better throughput than the 5M and 1B tests, since the 500K array can fit in the 24 MB cache while the 5M and 1B arrays cannot. Did you do a warm-up run before the benchmark to get the thread pool spun up? How many iterations did you do on each test?
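    A minimal harness sketch for the warm-up and iteration questions, in Python rather than Delphi: run the workload a few times untimed so that a thread pool or other lazily initialized state is already up, then time several iterations and report the minimum and median instead of a single sample. `benchmark`, `warmup` and `iterations` are hypothetical names, and Python's `sorted` only stands in for the parallel sort under test.

```python
import random
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, iterations: int = 10) -> dict:
    """Time fn() after a number of untimed warm-up calls.

    The warm-up calls give lazily created resources (thread pools, caches,
    first-touch page faults) a chance to settle before anything is measured.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"min": min(samples), "median": statistics.median(samples), "samples": samples}


if __name__ == "__main__":
    data = [random.random() for _ in range(100_000)]
    stats = benchmark(lambda: sorted(data), warmup=2, iterations=10)
    print(f"min {stats['min'] * 1e3:.2f} ms, median {stats['median'] * 1e3:.2f} ms")
```

    Reporting the median next to the minimum makes one-off costs, such as a thread pool starting on the first call, easier to spot.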
  9. You mean without using {$WARN}? Have you looked in the project settings to see if there's an option there to disable it?
  10. Anders Melander

    MSQuic for Delphi ?

    Let me Google that for you...
      • https://en.wikipedia.org/wiki/MsQuic
      • https://github.com/microsoft/msquic/blob/main/docs/FAQ.md
      • Enabling HTTP/3 support on Windows Server 2022
      • Troubleshooting HTTP/3 in http.sys
  11. Anders Melander

    Minimum Viable Product (MVP)

    *facepalm*
  12. That's a strange statement. Why are threads evil? Eric is trying to do something with a class that just wasn't designed for what he needs. That in itself doesn't make the design wrong or bad. The counter-argument is that no one knows how the functionality of a class's public API will be implemented in the future. Class members are private so that the public API isn't locked to a specific implementation bound to those private members; they shield the API from implementation details. You are basically arguing against the use of encapsulation.
  13. Anders Melander

    How to resolve error in Database Desktop execution in Delphi 7

    About three decades of progress. Database Desktop was, as far as I remember, a heavily trimmed-down version of the Paradox front end. Borland sold Paradox to Corel in the 90s and apparently Corel, or whatever they're called these days, is still stuck with it. So if you want a query tool that supports Paradox and runs on modern Windows, your best bet is probably WordPerfect Office (shudder).
  14. Anders Melander

    Delphi Documentation website issues

    I guess so. If only there was some kind of advanced technology one could use to help with that. I mean it's not like this is the first time.
  15. Anders Melander

    Delphi Documentation website issues

    🤦‍♂️ If only there was some way to communicate stuff like that to their users...
  16. Anders Melander

    Minimum Viable Product (MVP)

    I haven't worked on POS systems for almost a year, so things might have changed since then, but I seem to remember that one of the components that were kinda important to the customers was the actual Point Of Sale part... Maybe ask your AI why it left that little detail out...
  17. Anders Melander

    Minimum Viable Product (MVP)

    Yes, I know what MVP means, but you, your customers, or your sales department still need to define the criteria for what a "viable" product is. You can't ask us for that because we have no stake in it and we don't know your target market. If you know the needs and demands of your market, there are simple methods to define an MVP, but it's not a process that involves developers.
  18. Anders Melander

    Minimum Viable Product (MVP)

    That depends on how you define "viable", doesn't it? Nobody is going to be able to answer your question without additional requirements.
  19. As far as I can tell, it uses GetStringType(CT_CTYPE3) and skips code points with the C3_NONSPACING flag or without the C3_ALPHA flag.
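    A rough Python model of the CharPrev behavior described above. It assumes the Unicode general category "Mn" (nonspacing mark) as a stand-in for the Win32 C3_NONSPACING flag; the C3_ALPHA condition is not modeled, and surrogate pairs are ignored because Python strings are indexed by code point rather than UTF-16 code unit. `char_prev` is a hypothetical name, not a Win32 or RTL function.

```python
import unicodedata


def char_prev(text: str, pos: int) -> int:
    """Return the index of the character before position `pos`.

    Steps back one code point, then keeps stepping back while it sits on a
    nonspacing mark (category "Mn"), so the result lands on the base character.
    Assumption: "Mn" approximates C3_NONSPACING; C3_ALPHA is not modeled.
    """
    if pos <= 0:
        return 0  # like CharPrev: at the start, stay at the start
    pos -= 1
    while pos > 0 and unicodedata.category(text[pos]) == "Mn":
        pos -= 1
    return pos


s = "a\u030a"                 # "a" followed by COMBINING RING ABOVE
print(char_prev(s, len(s)))   # 0: lands on the base "a", not on the mark
```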
  20. ...also known as the single character U+00E5 (Latin Small Letter A with Ring Above), of which U+0061 U+030A is the decomposition. But even if the input were guaranteed to be composed Unicode, it would not be safe to replace CharPrev with a "-1" without knowing the exact algorithm CharPrev uses internally. The "looks like" part is nonsense, since the glyphs produced by that sequence depend on the font being used to render it, but it seems like CharPrev just skips all Combining Diacritical Marks.
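    A small Python illustration of the composition point, using only the standard `unicodedata` module. Python strings are indexed by code point whereas CharPrev works on UTF-16 code units, so this only shows the normalization side: NFD turns U+00E5 into U+0061 U+030A, NFC turns it back, and the last code point of the decomposed form (what a naive "-1" would land on) is the combining mark, not the base letter.

```python
import unicodedata

composed = "\u00e5"                                   # å as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # base letter + combining mark

print([f"U+{ord(c):04X}" for c in decomposed])        # ['U+0061', 'U+030A']
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.name(decomposed[-1]))               # COMBINING RING ABOVE
```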
  21. Anders Melander

    MAP2PDB - Profiling with VTune

    It took me a bit longer than expected to get here, but I believe I've finally reached the goal. The following shows VTune profiling a Delphi application, with symbol, line number and source code resolution:

    Download

    Get the source here: https://bitbucket.org/anders_melander/map2pdb/
    And a precompiled exe here: https://bitbucket.org/anders_melander/map2pdb/downloads/

    The source has only been tested with Delphi 10.3. It uses inline variables, so it will not compile with older versions.

    Usage

        map2pdb - Copyright (c) 2021 Anders Melander
        Version 2.0

        Parses the map file produced by Delphi and writes a PDB file.

        Usage: map2pdb [options] <map-filename>

        Options:
          -v                          Verbose output
          -pdb[:<output-filename>]    Writes a PDB (default)
          -yaml[:<output-filename>]   Writes an YAML file that can be used with llvm-pdbutil
          -bind[:<exe-filename>]      Patches a Delphi compiled exe file to include a reference to the pdb file
          -test                       Works on test data. Ignores the input file

    Example:
      1. Configure your project linker options to output a Detailed map file.
      2. Compile the project.
      3. Execute map2pdb <map-filename> -bind
      4. Profile the application with VTune (or whatever).

    Known issues
      • The -bind switch must occur after the filename, contrary to the usage instructions.
      • PDB files larger than 16 MB are not valid. This is currently by design.
      • 64-bit PE files are not yet supported by the -bind option.

    As should be evident, I decided not to go the DWARF route after all. After spending a few days reading the DWARF specification and examining the FPC source, I decided that it would be easier to leverage the PDB knowledge I had already acquired.

    Not that this has been easy. I've been able to use the LLVM PDB implementation and Microsoft's PDB source as references, but LLVM's implementation is incomplete and buggy, and the LLVM source is "modern C++", which means that it's close to unreadable in places. Microsoft's source, while written in clean C and guaranteed to be correct, doesn't compile and is poorly commented. Luckily it was nothing a few all-nighters with a disassembler and a hex editor couldn't solve.

    Enjoy!
  22. Anders Melander

    MAP2PDB - Profiling with VTune

    Not that I know of. The file format is completely undocumented and there's no known API to extract info from it. A bit of Googling found these:
      • https://www.delphipraxis.net/48587-dcp-format.html
      • http://hmelnov.icc.ru/DCU/
      • https://gitlab.com/dcu32int/DCU32INT/-/blob/master/DCP.pas?ref_type=heads
    The DCU32INT project looks like it could be a stepping stone.
  23. Anders Melander

    Watch me coding in Delphi on YouTube

    Next up: How to use "with" to make your code more readable.
  24. Anders Melander

    Loading and Saving PNG into TBitmap changes the image

    The Win32 AlphaBlend API also requires alpha-premultiplied RGB. But just because the display API requires premultiplied pixels doesn't mean that TBitmap should premultiply and then discard the source pixels. What it should have done is to create an internal premultiplied copy of the source pixel data for display purposes, if and when it was needed.

    The GR32PNG implementation can probably be adapted for FMX. It's the default PNG format handler in Graphics32:
      • https://github.com/graphics32/graphics32/blob/master/Source/GR32_PortableNetworkGraphic.pas
      • https://github.com/graphics32/graphics32/blob/master/Source/GR32_Png.pas
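    To make the loss concrete, a minimal Python sketch with 8-bit channels and straight (unassociated) alpha as input; the rounding shown is one common choice, and this is not the actual TBitmap, FMX or Graphics32 code. Premultiplying and then converting back loses color precision at low alpha, and loses the color entirely at alpha 0. The hypothetical `StraightAlphaImage` class shows the alternative described above: keep the source pixels and build a premultiplied display copy only when it is first needed.

```python
def premultiply(r: int, g: int, b: int, a: int) -> tuple[int, int, int, int]:
    """Straight alpha -> premultiplied alpha, 8 bits per channel, rounded."""
    return (r * a + 127) // 255, (g * a + 127) // 255, (b * a + 127) // 255, a


def unpremultiply(r: int, g: int, b: int, a: int) -> tuple[int, int, int, int]:
    """Premultiplied -> straight alpha. The color is unrecoverable when a == 0."""
    if a == 0:
        return 0, 0, 0, 0
    return (min(255, (r * 255 + a // 2) // a),
            min(255, (g * 255 + a // 2) // a),
            min(255, (b * 255 + a // 2) // a),
            a)


class StraightAlphaImage:
    """Keeps the source pixels untouched; premultiplies lazily for display."""

    def __init__(self, pixels):
        self._pixels = list(pixels)  # original straight-alpha data, never modified
        self._display = None         # cached premultiplied copy, built on demand

    @property
    def pixels(self):
        return self._pixels

    def display_pixels(self):
        if self._display is None:
            self._display = [premultiply(*p) for p in self._pixels]
        return self._display


pixel = (200, 100, 50, 3)
print(unpremultiply(*premultiply(*pixel)))  # (170, 85, 85, 3), not the original color
```

    Saving from `pixels` rather than from `display_pixels()` is what keeps a load/save round trip lossless in this design.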