Jump to content

Stefan Glienke

Members
  • Content Count

    1497
  • Joined

  • Last visited

  • Days Won

    152

Posts posted by Stefan Glienke


  1. 36 minutes ago, Rollo62 said:

    Thanks for these enlightments.
    If Spring4D already optimizes to 14ms, how did you get the only 2nd best results at 22ms?

    Is there any configuration setting needed, to switch on or off the Hash-Algorithm, how to use Spring4D in the right way with Integers?

    The numbers before the last one are all based on the quoted code using RTL Dictionary with the tweaks I explained.

     

    There is no setting needed. When you are using collections in spring4d without an explicit comparer, there are several places where they use an optimized path without a comparer at all - for example, when using an Integer as a key in a dictionary or when sorting a list of integers.

     

    33 minutes ago, Lars Fosdal said:

    Did you guys read the article on the undergraduate that suggested new improvements to hash tables?
    Article: https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/
    "Tiny Pointers" Paper: https://arxiv.org/abs/2111.12800

    Did not read the paper, but I watched his presentation. I have to admit I only understand half of all that theory stuff - I am still waiting to see this in an actual implementation because theory-wise things are all fine, but when it comes to the actual code reality kicks in. For example, hardware wants to have a word on these different arrays, which ideally would be laid out in a contiguous block, and then you need some indexing algorithm for the subparts and so on.

    • Like 2

  2. Pscht, don't spoil the secret soup that Primoz and I have been cooking :classic_cool:

     

    As for the simd qsort - I'll let the numbers speak for themselves - that is on Win32 sorting Int32 - pdqsort is the one you can find in spring4d development branch:

     

    ----------------------------------------------------------------------------------------------
    Benchmark                                                    Time             CPU   Iterations
    ----------------------------------------------------------------------------------------------
    random/QuickSort_RTL/count:100                            1937 ns         1888 ns          100
    random/QuickSort_Simd/count:100                            945 ns          931 ns          100
    random/PDQSort/count:100                                   592 ns          577 ns          100
    random/QuickSort_RTL/count:1000                          39206 ns        39184 ns          100
    random/QuickSort_Simd/count:1000                         10290 ns        10272 ns          100
    random/PDQSort/count:1000                                 5355 ns         5342 ns          100
    random/QuickSort_RTL/count:10000                        652150 ns       651096 ns          100
    random/QuickSort_Simd/count:10000                       205577 ns       205363 ns          100
    random/PDQSort/count:10000                              200458 ns       199829 ns          100
    random/QuickSort_RTL/count:100000                      7759142 ns      7750341 ns          100
    random/QuickSort_Simd/count:100000                     2565052 ns      2560527 ns          100
    random/PDQSort/count:100000                            2416112 ns      2412229 ns          100
    random/QuickSort_RTL/count:1000000                    91422324 ns     91249522 ns          100
    random/QuickSort_Simd/count:1000000                   27870985 ns     27814505 ns          100
    random/PDQSort/count:1000000                          27731286 ns     27692879 ns          100
    few_unique/QuickSort_RTL/count:100                        1871 ns         1855 ns          100
    few_unique/QuickSort_Simd/count:100                        950 ns          930 ns          100
    few_unique/PDQSort/count:100                               501 ns          489 ns          100
    few_unique/QuickSort_RTL/count:1000                      28397 ns        28377 ns          100
    few_unique/QuickSort_Simd/count:1000                     11415 ns        11401 ns          100
    few_unique/PDQSort/count:1000                             2635 ns         2628 ns          100
    few_unique/QuickSort_RTL/count:10000                    405657 ns       405324 ns          100
    few_unique/QuickSort_Simd/count:10000                   156791 ns       156707 ns          100
    few_unique/PDQSort/count:10000                           39877 ns        39837 ns          100
    few_unique/QuickSort_RTL/count:100000                  4705581 ns      4698238 ns          100
    few_unique/QuickSort_Simd/count:100000                 2006202 ns      2002594 ns          100
    few_unique/PDQSort/count:100000                         446669 ns       446170 ns          100
    few_unique/QuickSort_RTL/count:1000000                52019783 ns     51936730 ns          100
    few_unique/QuickSort_Simd/count:1000000               25403129 ns     25348705 ns          100
    few_unique/PDQSort/count:1000000                       5053615 ns      5046242 ns          100
    two_values/QuickSort_RTL/count:100                        1626 ns         1602 ns          100
    two_values/QuickSort_Simd/count:100                        992 ns          978 ns          100
    two_values/PDQSort/count:100                               371 ns          349 ns          100
    two_values/QuickSort_RTL/count:1000                      20607 ns        20587 ns          100
    two_values/QuickSort_Simd/count:1000                     12320 ns        12314 ns          100
    two_values/PDQSort/count:1000                             1256 ns         1246 ns          100
    two_values/QuickSort_RTL/count:10000                    293260 ns       293008 ns          100
    two_values/QuickSort_Simd/count:10000                   170420 ns       170229 ns          100
    two_values/PDQSort/count:10000                           10936 ns        10887 ns          100
    two_values/QuickSort_RTL/count:100000                  3664647 ns      3659945 ns          100
    two_values/QuickSort_Simd/count:100000                 2131772 ns      2128914 ns          100
    two_values/PDQSort/count:100000                         123928 ns       123845 ns          100
    two_values/QuickSort_RTL/count:1000000                41535128 ns     41447869 ns          100
    two_values/QuickSort_Simd/count:1000000               27185208 ns     27141671 ns          100
    two_values/PDQSort/count:1000000                       1279685 ns      1276747 ns          100
    all_equal/QuickSort_RTL/count:100                         2366 ns         2342 ns          100
    all_equal/QuickSort_Simd/count:100                        1457 ns         1439 ns          100
    all_equal/PDQSort/count:100                                527 ns          477 ns          100
    all_equal/QuickSort_RTL/count:1000                       18446 ns        18411 ns          100
    all_equal/QuickSort_Simd/count:1000                      13511 ns        13463 ns          100
    all_equal/PDQSort/count:1000                              1377 ns         1368 ns          100
    all_equal/QuickSort_RTL/count:10000                     237075 ns       236655 ns          100
    all_equal/QuickSort_Simd/count:10000                    172246 ns       171992 ns          100
    all_equal/PDQSort/count:10000                             5947 ns         5847 ns          100
    all_equal/QuickSort_RTL/count:100000                   3014635 ns      3006448 ns          100
    all_equal/QuickSort_Simd/count:100000                  2235909 ns      2233777 ns          100
    all_equal/PDQSort/count:100000                           42432 ns        42382 ns          100
    all_equal/QuickSort_RTL/count:1000000                 34511832 ns     34427709 ns          100
    all_equal/QuickSort_Simd/count:1000000                28091462 ns     27988156 ns          100
    all_equal/PDQSort/count:1000000                         413582 ns       412323 ns          100
    ascending/QuickSort_RTL/count:100                         1711 ns         1651 ns          100
    ascending/QuickSort_Simd/count:100                        1300 ns         1262 ns          100
    ascending/PDQSort/count:100                                909 ns          886 ns          100
    ascending/QuickSort_RTL/count:1000                       14402 ns        14355 ns          100
    ascending/QuickSort_Simd/count:1000                      12382 ns        12293 ns          100
    ascending/PDQSort/count:1000                              1283 ns         1259 ns          100
    ascending/QuickSort_RTL/count:10000                     169937 ns       169542 ns          100
    ascending/QuickSort_Simd/count:10000                    164197 ns       163830 ns          100
    ascending/PDQSort/count:10000                             5191 ns         5126 ns          100
    ascending/QuickSort_RTL/count:100000                   2073379 ns      2070924 ns          100
    ascending/QuickSort_Simd/count:100000                  2021744 ns      2018514 ns          100
    ascending/PDQSort/count:100000                           47881 ns        47829 ns          100
    ascending/QuickSort_RTL/count:1000000                 24703110 ns     24661199 ns          100
    ascending/QuickSort_Simd/count:1000000                22872824 ns     22813704 ns          100
    ascending/PDQSort/count:1000000                         528476 ns       527071 ns          100
    slightly_scrambled/QuickSort_RTL/count:100                1192 ns         1148 ns          100
    slightly_scrambled/QuickSort_Simd/count:100                967 ns          950 ns          100
    slightly_scrambled/PDQSort/count:100                       523 ns          498 ns          100
    slightly_scrambled/QuickSort_RTL/count:1000              16250 ns        16220 ns          100
    slightly_scrambled/QuickSort_Simd/count:1000             11613 ns        11424 ns          100
    slightly_scrambled/PDQSort/count:1000                     4808 ns         4757 ns          100
    slightly_scrambled/QuickSort_RTL/count:10000            230412 ns       230013 ns          100
    slightly_scrambled/QuickSort_Simd/count:10000           180107 ns       179869 ns          100
    slightly_scrambled/PDQSort/count:10000                   93826 ns        93569 ns          100
    slightly_scrambled/QuickSort_RTL/count:100000          2705736 ns      2702848 ns          100
    slightly_scrambled/QuickSort_Simd/count:100000         2204179 ns      2201412 ns          100
    slightly_scrambled/PDQSort/count:100000                1355543 ns      1346419 ns          100
    slightly_scrambled/QuickSort_RTL/count:1000000        32222921 ns     32171861 ns          100
    slightly_scrambled/QuickSort_Simd/count:1000000       24320016 ns     24261911 ns          100
    slightly_scrambled/PDQSort/count:1000000              15813751 ns     15769810 ns          100
    sorted_blocks/QuickSort_RTL/count:100                     2051 ns         2030 ns          100
    sorted_blocks/QuickSort_Simd/count:100                    1747 ns         1650 ns          100
    sorted_blocks/PDQSort/count:100                            978 ns          956 ns          100
    sorted_blocks/QuickSort_RTL/count:1000                   22202 ns        22159 ns          100
    sorted_blocks/QuickSort_Simd/count:1000                  13048 ns        13006 ns          100
    sorted_blocks/PDQSort/count:1000                          6170 ns         6131 ns          100
    sorted_blocks/QuickSort_RTL/count:10000                 259193 ns       257887 ns          100
    sorted_blocks/QuickSort_Simd/count:10000                194062 ns       193862 ns          100
    sorted_blocks/PDQSort/count:10000                       170191 ns       169930 ns          100
    sorted_blocks/QuickSort_RTL/count:100000               4040755 ns      4035505 ns          100
    sorted_blocks/QuickSort_Simd/count:100000              2505645 ns      2502093 ns          100
    sorted_blocks/PDQSort/count:100000                     2312275 ns      2310099 ns          100
    sorted_blocks/QuickSort_RTL/count:1000000             46132898 ns     46058821 ns          100
    sorted_blocks/QuickSort_Simd/count:1000000            27532882 ns     27458613 ns          100
    sorted_blocks/PDQSort/count:1000000                   27203482 ns     27114899 ns          100
    single_big_swap/QuickSort_RTL/count:100                   1769 ns         1748 ns          100
    single_big_swap/QuickSort_Simd/count:100                  1235 ns         1199 ns          100
    single_big_swap/PDQSort/count:100                         1076 ns         1060 ns          100
    single_big_swap/QuickSort_RTL/count:1000                 15277 ns        15188 ns          100
    single_big_swap/QuickSort_Simd/count:1000                12614 ns        12571 ns          100
    single_big_swap/PDQSort/count:1000                        2836 ns         2787 ns          100
    single_big_swap/QuickSort_RTL/count:10000               174662 ns       174264 ns          100
    single_big_swap/QuickSort_Simd/count:10000              190061 ns       187438 ns          100
    single_big_swap/PDQSort/count:10000                      29452 ns        29394 ns          100
    single_big_swap/QuickSort_RTL/count:100000             2094548 ns      2090125 ns          100
    single_big_swap/QuickSort_Simd/count:100000            2252041 ns      2245660 ns          100
    single_big_swap/PDQSort/count:100000                    280203 ns       279238 ns          100
    single_big_swap/QuickSort_RTL/count:1000000           25817701 ns     25698143 ns          100
    single_big_swap/QuickSort_Simd/count:1000000          24978407 ns     24883864 ns          100
    single_big_swap/PDQSort/count:1000000                  2840811 ns      2822251 ns          100
    descending/QuickSort_RTL/count:100                        1364 ns         1326 ns          100
    descending/QuickSort_Simd/count:100                       1402 ns         1337 ns          100
    descending/PDQSort/count:100                               617 ns          599 ns          100
    descending/QuickSort_RTL/count:1000                      13961 ns        13914 ns          100
    descending/QuickSort_Simd/count:1000                     10914 ns        10875 ns          100
    descending/PDQSort/count:1000                              842 ns          810 ns          100
    descending/QuickSort_RTL/count:10000                    168231 ns       167973 ns          100
    descending/QuickSort_Simd/count:10000                   158758 ns       158528 ns          100
    descending/PDQSort/count:10000                            4137 ns         4091 ns          100
    descending/QuickSort_RTL/count:100000                  2044335 ns      2039881 ns          100
    descending/QuickSort_Simd/count:100000                 1990965 ns      1984400 ns          100
    descending/PDQSort/count:100000                          33285 ns        32847 ns          100
    descending/QuickSort_RTL/count:1000000                24925931 ns     24823017 ns          100
    descending/QuickSort_Simd/count:1000000               22903629 ns     22831022 ns          100
    descending/PDQSort/count:1000000                        480036 ns       477583 ns          100
    pipe_organ/QuickSort_RTL/count:100                        1351 ns         1326 ns          100
    pipe_organ/QuickSort_Simd/count:100                        946 ns          931 ns          100
    pipe_organ/PDQSort/count:100                               530 ns          503 ns          100
    pipe_organ/QuickSort_RTL/count:1000                      16315 ns        16203 ns          100
    pipe_organ/QuickSort_Simd/count:1000                     11241 ns        11244 ns          100
    pipe_organ/PDQSort/count:1000                             1301 ns         1271 ns          100
    pipe_organ/QuickSort_RTL/count:10000                    203222 ns       202833 ns          100
    pipe_organ/QuickSort_Simd/count:10000                   158800 ns       158708 ns          100
    pipe_organ/PDQSort/count:10000                            4128 ns         4063 ns          100
    pipe_organ/QuickSort_RTL/count:100000                  2543356 ns      2535591 ns          100
    pipe_organ/QuickSort_Simd/count:100000                 2086136 ns      2081574 ns          100
    pipe_organ/PDQSort/count:100000                          33407 ns        33379 ns          100
    pipe_organ/QuickSort_RTL/count:1000000                29884109 ns     29808067 ns          100
    pipe_organ/QuickSort_Simd/count:1000000               25110723 ns     25014817 ns          100
    pipe_organ/PDQSort/count:1000000                        411686 ns       409344 ns          100
    push_front/QuickSort_RTL/count:100                        1717 ns         1672 ns          100
    push_front/QuickSort_Simd/count:100                       1576 ns         1510 ns          100
    push_front/PDQSort/count:100                              1159 ns         1109 ns          100
    push_front/QuickSort_RTL/count:1000                      16895 ns        16866 ns          100
    push_front/QuickSort_Simd/count:1000                     10169 ns        10159 ns          100
    push_front/PDQSort/count:1000                              834 ns          820 ns          100
    push_front/QuickSort_RTL/count:10000                    207717 ns       206724 ns          100
    push_front/QuickSort_Simd/count:10000                   160138 ns       159126 ns          100
    push_front/PDQSort/count:10000                            6265 ns         6215 ns          100
    push_front/QuickSort_RTL/count:100000                  2508110 ns      2491772 ns          100
    push_front/QuickSort_Simd/count:100000                 2049725 ns      2044132 ns          100
    push_front/PDQSort/count:100000                          57586 ns        57545 ns          100
    push_front/QuickSort_RTL/count:1000000                28090747 ns     27987564 ns          100
    push_front/QuickSort_Simd/count:1000000               22919136 ns     22856894 ns          100
    push_front/PDQSort/count:1000000                        586990 ns       585977 ns          100
    push_middle/QuickSort_RTL/count:100                       1859 ns         1831 ns          100
    push_middle/QuickSort_Simd/count:100                      1557 ns         1524 ns          100
    push_middle/PDQSort/count:100                              365 ns          347 ns          100
    push_middle/QuickSort_RTL/count:1000                     15468 ns        15391 ns          100
    push_middle/QuickSort_Simd/count:1000                    10691 ns        10661 ns          100
    push_middle/PDQSort/count:1000                             852 ns          818 ns          100
    push_middle/QuickSort_RTL/count:10000                   186437 ns       185943 ns          100
    push_middle/QuickSort_Simd/count:10000                  159570 ns       159464 ns          100
    push_middle/PDQSort/count:10000                           4892 ns         4843 ns          100
    push_middle/QuickSort_RTL/count:100000                 2251077 ns      2244548 ns          100
    push_middle/QuickSort_Simd/count:100000                2096000 ns      2088067 ns          100
    push_middle/PDQSort/count:100000                         61318 ns        61166 ns          100
    push_middle/QuickSort_RTL/count:1000000               26583284 ns     26476059 ns          100
    push_middle/QuickSort_Simd/count:1000000              23082618 ns     22978049 ns          100
    push_middle/PDQSort/count:1000000                       457556 ns       455193 ns          100

    And this is sorting Double on Win64:

     

    ------------------------------------------------------------------------------------------
    Benchmark                                                Time             CPU   Iterations
    ------------------------------------------------------------------------------------------
    random/QuickSort_RTL/count:100                        2565 ns         2547 ns          100
    random/QuickSort_Simd/count:100                       3349 ns         3301 ns          100
    random/PDQSort/count:100                              1168 ns         1183 ns          100
    random/QuickSort_RTL/count:1000                      47373 ns        47332 ns          100
    random/QuickSort_Simd/count:1000                     32689 ns        32635 ns          100
    random/PDQSort/count:1000                             6910 ns         6875 ns          100
    random/QuickSort_RTL/count:10000                    776390 ns       775674 ns          100
    random/QuickSort_Simd/count:10000                   503738 ns       478031 ns          100
    random/PDQSort/count:10000                          167340 ns       167280 ns          100
    random/QuickSort_RTL/count:100000                  9351122 ns      9337285 ns          100
    random/QuickSort_Simd/count:100000                 5608265 ns      5599393 ns          100
    random/PDQSort/count:100000                        2102133 ns      2097889 ns          100
    random/QuickSort_RTL/count:1000000               111924175 ns    111232604 ns          100
    random/QuickSort_Simd/count:1000000               64520049 ns     63951990 ns          100
    random/PDQSort/count:1000000                      24124765 ns     23968662 ns          100
    few_unique/QuickSort_RTL/count:100                    3079 ns         3016 ns          100
    few_unique/QuickSort_Simd/count:100                   2430 ns         2415 ns          100
    few_unique/PDQSort/count:100                          1100 ns         1054 ns          100
    few_unique/QuickSort_RTL/count:1000                  34210 ns        34068 ns          100
    few_unique/QuickSort_Simd/count:1000                 30551 ns        30475 ns          100
    few_unique/PDQSort/count:1000                         3884 ns         3821 ns          100
    few_unique/QuickSort_RTL/count:10000                478228 ns       475630 ns          100
    few_unique/QuickSort_Simd/count:10000               388707 ns       387296 ns          100
    few_unique/PDQSort/count:10000                       34574 ns        34507 ns          100
    few_unique/QuickSort_RTL/count:100000              5266166 ns      5242134 ns          100
    few_unique/QuickSort_Simd/count:100000             4469897 ns      4460899 ns          100
    few_unique/PDQSort/count:100000                     385048 ns       382507 ns          100
    few_unique/QuickSort_RTL/count:1000000            57633919 ns     57367821 ns          100
    few_unique/QuickSort_Simd/count:1000000           51159478 ns     50926898 ns          100
    few_unique/PDQSort/count:1000000                   4738638 ns      4716733 ns          100
    two_values/QuickSort_RTL/count:100                    2416 ns         2404 ns          100
    two_values/QuickSort_Simd/count:100                   2596 ns         2573 ns          100
    two_values/PDQSort/count:100                           642 ns          630 ns          100
    two_values/QuickSort_RTL/count:1000                  21831 ns        21786 ns          100
    two_values/QuickSort_Simd/count:1000                 26755 ns        26698 ns          100
    two_values/PDQSort/count:1000                         2256 ns         2209 ns          100
    two_values/QuickSort_RTL/count:10000                307295 ns       306226 ns          100
    two_values/QuickSort_Simd/count:10000               385912 ns       384743 ns          100
    two_values/PDQSort/count:10000                       26331 ns        26284 ns          100
    two_values/QuickSort_RTL/count:100000              3864743 ns      3848329 ns          100
    two_values/QuickSort_Simd/count:100000             4514908 ns      4499180 ns          100
    two_values/PDQSort/count:100000                     111245 ns       110447 ns          100
    two_values/QuickSort_RTL/count:1000000            44169248 ns     43899232 ns          100
    two_values/QuickSort_Simd/count:1000000           52890868 ns     52295299 ns          100
    two_values/PDQSort/count:1000000                   1227367 ns      1222160 ns          100
    all_equal/QuickSort_RTL/count:100                     2280 ns         2264 ns          100
    all_equal/QuickSort_Simd/count:100                    2217 ns         2174 ns          100
    all_equal/PDQSort/count:100                            582 ns          567 ns          100
    all_equal/QuickSort_RTL/count:1000                   18267 ns        18192 ns          100
    all_equal/QuickSort_Simd/count:1000                  26927 ns        26839 ns          100
    all_equal/PDQSort/count:1000                          1483 ns         1474 ns          100
    all_equal/QuickSort_RTL/count:10000                 231677 ns       230263 ns          100
    all_equal/QuickSort_Simd/count:10000                388326 ns       385394 ns          100
    all_equal/PDQSort/count:10000                         8330 ns         8267 ns          100
    all_equal/QuickSort_RTL/count:100000               3072258 ns      3055451 ns          100
    all_equal/QuickSort_Simd/count:100000              4560752 ns      4521917 ns          100
    all_equal/PDQSort/count:100000                       69035 ns        67927 ns          100
    all_equal/QuickSort_RTL/count:1000000             36140895 ns     35829412 ns          100
    all_equal/QuickSort_Simd/count:1000000            52332564 ns     52036527 ns          100
    all_equal/PDQSort/count:1000000                     660978 ns       656433 ns          100
    ascending/QuickSort_RTL/count:100                     1870 ns         1877 ns          100
    ascending/QuickSort_Simd/count:100                    3392 ns         3352 ns          100
    ascending/PDQSort/count:100                           1172 ns         1160 ns          100
    ascending/QuickSort_RTL/count:1000                   16620 ns        16586 ns          100
    ascending/QuickSort_Simd/count:1000                  31390 ns        31279 ns          100
    ascending/PDQSort/count:1000                          1498 ns         1433 ns          100
    ascending/QuickSort_RTL/count:10000                 205486 ns       204808 ns          100
    ascending/QuickSort_Simd/count:10000                397228 ns       395426 ns          100
    ascending/PDQSort/count:10000                         6873 ns         6760 ns          100
    ascending/QuickSort_RTL/count:100000               2481823 ns      2478599 ns          100
    ascending/QuickSort_Simd/count:100000              4677020 ns      4651949 ns          100
    ascending/PDQSort/count:100000                       53174 ns        52941 ns          100
    ascending/QuickSort_RTL/count:1000000             30165890 ns     30016057 ns          100
    ascending/QuickSort_Simd/count:1000000            52585840 ns     52358100 ns          100
    ascending/PDQSort/count:1000000                     546292 ns       543164 ns          100
    slightly_scrambled/QuickSort_RTL/count:100            2265 ns         2179 ns          100
    slightly_scrambled/QuickSort_Simd/count:100           2984 ns         2951 ns          100
    slightly_scrambled/PDQSort/count:100                  1312 ns         1223 ns          100
    slightly_scrambled/QuickSort_RTL/count:1000          18443 ns        18415 ns          100
    slightly_scrambled/QuickSort_Simd/count:1000         32498 ns        32437 ns          100
    slightly_scrambled/PDQSort/count:1000                 4900 ns         4885 ns          100
    slightly_scrambled/QuickSort_RTL/count:10000        264849 ns       264564 ns          100
    slightly_scrambled/QuickSort_Simd/count:10000       425552 ns       424796 ns          100
    slightly_scrambled/PDQSort/count:10000              102652 ns       101993 ns          100
    slightly_scrambled/QuickSort_RTL/count:100000      3134562 ns      3119682 ns          100
    slightly_scrambled/QuickSort_Simd/count:100000     4998249 ns      4978579 ns          100
    slightly_scrambled/PDQSort/count:100000            1472946 ns      1466023 ns          100
    slightly_scrambled/QuickSort_RTL/count:1000000    37367087 ns     37182092 ns          100
    slightly_scrambled/QuickSort_Simd/count:1000000   55740685 ns     55477997 ns          100
    slightly_scrambled/PDQSort/count:1000000          17186242 ns     17082706 ns          100
    sorted_blocks/QuickSort_RTL/count:100                 2478 ns         2433 ns          100
    sorted_blocks/QuickSort_Simd/count:100                3095 ns         2992 ns          100
    sorted_blocks/PDQSort/count:100                       1090 ns         1067 ns          100
    sorted_blocks/QuickSort_RTL/count:1000               27677 ns        27633 ns          100
    sorted_blocks/QuickSort_Simd/count:1000              34255 ns        34185 ns          100
    sorted_blocks/PDQSort/count:1000                      7345 ns         7287 ns          100
    sorted_blocks/QuickSort_RTL/count:10000             309712 ns       307961 ns          100
    sorted_blocks/QuickSort_Simd/count:10000            472346 ns       467668 ns          100
    sorted_blocks/PDQSort/count:10000                   156489 ns       156120 ns          100
    sorted_blocks/QuickSort_RTL/count:100000           4751737 ns      4736686 ns          100
    sorted_blocks/QuickSort_Simd/count:100000          5649021 ns      5626166 ns          100
    sorted_blocks/PDQSort/count:100000                 2090767 ns      2085443 ns          100
    sorted_blocks/QuickSort_RTL/count:1000000         55395331 ns     55148208 ns          100
    sorted_blocks/QuickSort_Simd/count:1000000        64626584 ns     64356116 ns          100
    sorted_blocks/PDQSort/count:1000000               24145373 ns     24023248 ns          100
    single_big_swap/QuickSort_RTL/count:100               2154 ns         2105 ns          100
    single_big_swap/QuickSort_Simd/count:100              3405 ns         3371 ns          100
    single_big_swap/PDQSort/count:100                     1271 ns         1280 ns          100
    single_big_swap/QuickSort_RTL/count:1000             18941 ns        18884 ns          100
    single_big_swap/QuickSort_Simd/count:1000            35627 ns        35369 ns          100
    single_big_swap/PDQSort/count:1000                    2914 ns         2881 ns          100
    single_big_swap/QuickSort_RTL/count:10000           230744 ns       226649 ns          100
    single_big_swap/QuickSort_Simd/count:10000          439977 ns       439106 ns          100
    single_big_swap/PDQSort/count:10000                  24462 ns        24441 ns          100
    single_big_swap/QuickSort_RTL/count:100000         2642839 ns      2638752 ns          100
    single_big_swap/QuickSort_Simd/count:100000        5123966 ns      5110310 ns          100
    single_big_swap/PDQSort/count:100000                212054 ns       211579 ns          100
    single_big_swap/QuickSort_RTL/count:1000000       31998557 ns     31831534 ns          100
    single_big_swap/QuickSort_Simd/count:1000000      56693968 ns     56385127 ns          100
    single_big_swap/PDQSort/count:1000000              2287462 ns      2274715 ns          100
    descending/QuickSort_RTL/count:100                    1953 ns         1907 ns          100
    descending/QuickSort_Simd/count:100                   3266 ns         3233 ns          100
    descending/PDQSort/count:100                          1243 ns         1190 ns          100
    descending/QuickSort_RTL/count:1000                  16769 ns        16669 ns          100
    descending/QuickSort_Simd/count:1000                 31242 ns        31187 ns          100
    descending/PDQSort/count:1000                         1416 ns         1383 ns          100
    descending/QuickSort_RTL/count:10000                205746 ns       205463 ns          100
    descending/QuickSort_Simd/count:10000               397112 ns       396469 ns          100
    descending/PDQSort/count:10000                        6385 ns         6357 ns          100
    descending/QuickSort_RTL/count:100000              2509532 ns      2479948 ns          100
    descending/QuickSort_Simd/count:100000             4654628 ns      4626693 ns          100
    descending/PDQSort/count:100000                      53043 ns        52875 ns          100
    descending/QuickSort_RTL/count:1000000            29567957 ns     29440319 ns          100
    descending/QuickSort_Simd/count:1000000           52694899 ns     52342249 ns          100
    descending/PDQSort/count:1000000                    546650 ns       544430 ns          100
    pipe_organ/QuickSort_RTL/count:100                    2116 ns         2097 ns          100
    pipe_organ/QuickSort_Simd/count:100                   1963 ns         1946 ns          100
    pipe_organ/PDQSort/count:100                           628 ns          611 ns          100
    pipe_organ/QuickSort_RTL/count:1000                  18278 ns        18198 ns          100
    pipe_organ/QuickSort_Simd/count:1000                 28385 ns        27872 ns          100
    pipe_organ/PDQSort/count:1000                         1451 ns         1439 ns          100
    pipe_organ/QuickSort_RTL/count:10000                214821 ns       214320 ns          100
    pipe_organ/QuickSort_Simd/count:10000               368611 ns       367921 ns          100
    pipe_organ/PDQSort/count:10000                        7572 ns         6703 ns          100
    pipe_organ/QuickSort_RTL/count:100000              2732768 ns      2722013 ns          100
    pipe_organ/QuickSort_Simd/count:100000             4673867 ns      4651534 ns          100
    pipe_organ/PDQSort/count:100000                      54171 ns        53981 ns          100
    pipe_organ/QuickSort_RTL/count:1000000            32340442 ns     32204978 ns          100
    pipe_organ/QuickSort_Simd/count:1000000           52601173 ns     52242565 ns          100
    pipe_organ/PDQSort/count:1000000                   1191710 ns      1179642 ns          100
    push_front/QuickSort_RTL/count:100                    2201 ns         2179 ns          100
    push_front/QuickSort_Simd/count:100                   3505 ns         3463 ns          100
    push_front/PDQSort/count:100                          1466 ns         1460 ns          100
    push_front/QuickSort_RTL/count:1000                  20234 ns        20070 ns          100
    push_front/QuickSort_Simd/count:1000                 32713 ns        32609 ns          100
    push_front/PDQSort/count:1000                         2279 ns         1973 ns          100
    push_front/QuickSort_RTL/count:10000                259051 ns       249922 ns          100
    push_front/QuickSort_Simd/count:10000               408989 ns       407301 ns          100
    push_front/PDQSort/count:10000                        9560 ns         9541 ns          100
    push_front/QuickSort_RTL/count:100000              2963996 ns      2943123 ns          100
    push_front/QuickSort_Simd/count:100000             4759393 ns      4743849 ns          100
    push_front/PDQSort/count:100000                      92058 ns        91788 ns          100
    push_front/QuickSort_RTL/count:1000000            32817948 ns     32651555 ns          100
    push_front/QuickSort_Simd/count:1000000           52370293 ns     52119300 ns          100
    push_front/PDQSort/count:1000000                    885186 ns       880839 ns          100
    push_middle/QuickSort_RTL/count:100                   2182 ns         2131 ns          100
    push_middle/QuickSort_Simd/count:100                  3385 ns         3379 ns          100
    push_middle/PDQSort/count:100                          643 ns          630 ns          100
    push_middle/QuickSort_RTL/count:1000                 18552 ns        18454 ns          100
    push_middle/QuickSort_Simd/count:1000                30820 ns        30591 ns          100
    push_middle/PDQSort/count:1000                        1553 ns         1518 ns          100
    push_middle/QuickSort_RTL/count:10000               234531 ns       233782 ns          100
    push_middle/QuickSort_Simd/count:10000              392271 ns       389699 ns          100
    push_middle/PDQSort/count:10000                       8547 ns         8493 ns          100
    push_middle/QuickSort_RTL/count:100000             2610121 ns      2600388 ns          100
    push_middle/QuickSort_Simd/count:100000            4630818 ns      4609689 ns          100
    push_middle/PDQSort/count:100000                     71370 ns        69366 ns          100
    push_middle/QuickSort_RTL/count:1000000           31046096 ns     30924256 ns          100
    push_middle/QuickSort_Simd/count:1000000          52541755 ns     52326830 ns          100
    push_middle/PDQSort/count:1000000                   711900 ns       708811 ns          100

     

    • Like 2

  3. On 2/16/2020 at 3:41 PM, vfbb said:

    It is not an issue, it is the cost benefit of higher performance x less memory. But the optimization of creating it with the capacity is valid considering that the performance increases is on average 30%, both for small collections and for large ones.

    
    uses
      System.SysUtils, System.Generics.Collections, System.Diagnostics, FMX.Dialogs;
    
    procedure TestWithInitialCapacity;
    var
      I, J, LNewId: Integer;
      LDictionary: TDictionary<Integer, Boolean>;
      LStopwatch: TStopwatch;
    begin
      LStopwatch := TStopwatch.StartNew;
      for I := 0 to 10000 do begin
        LDictionary := TDictionary<Integer, Boolean>.Create(200);
        for J := 0 to 100 do begin
          repeat
            LNewId := Random(High(Integer));
          until not LDictionary.ContainsKey(LNewId);
          LDictionary.Add(LNewId, True);
        end;
        FreeAndNil(LDictionary);
      end;
      showmessage(Format('Test with initial capacity: %g', [LStopwatch.Elapsed.TotalMilliseconds]));
    end;
    
    procedure TestWithoutInitialCapacity;
    var
      I, J, LNewId: Integer;
      LDictionary: TDictionary<Integer, Boolean>;
      LStopwatch: TStopwatch;
    begin
      LStopwatch := TStopwatch.StartNew;
      for I := 0 to 10000 do begin
        LDictionary := TDictionary<Integer, Boolean>.Create;
        for J := 0 to 100 do begin
          repeat
            LNewId := Random(High(Integer));
          until not LDictionary.ContainsKey(LNewId);
          LDictionary.Add(LNewId, True);
        end;
        FreeAndNil(LDictionary);
      end;
      showmessage(Format('Test without initial capacity: %g', [LStopwatch.Elapsed.TotalMilliseconds]));
    end;

     

    I just came across this thread while searching for something else, and while reading this code, I want to make a few points:

     

    I agree on the essential point: when you know how many items you will add to a collection, it is always beneficial to set the capacity upfront.

     

    Now, even though this is some naive benchmark code, it shows a few things:

    - Calling ContainsKey before the attempt to call Add means performing the lookup twice, which has not been necessary at all since Delphi 10.3 because of the new method TryAdd. On my machine, the tests go down from 51ms and 31ms to 44ms and 26ms.

    - The hash function is one of the key parts that makes or breaks a dictionary performance-wise. The numbers I posted are from 12.3, which already uses a better hash algo (FNV1a) than it did years ago (BobJenkins) but is still far away from ideal.

    Especially for Integer keys, if the hashtable is designed in a robust way of dealing with hash collisions and clustering, you can get away with not hashing anything but using the value itself. Unfortunately, the RTL one is not built in that way, but for this benchmark, it is fine to replace the eq compare it uses with the one from Spring4d, which does what I mentioned before. This brings down the numbers to 36ms and 22ms. That is another 15% improvement.

     

    Now Spring4d would not be Spring4d if it did not have another trick up its sleeve: when it uses the default eq compare, the hashtable does not need to call GetHashCode and Equals for some intrinsic types such as Integer because the value itself is used as hashcode and for checking Integer equality it does not need to call some method. That leads to the above test with initial capacity taking 14ms.

    • Like 5
    • Thanks 1

  4. To avoid repeating the unit throughout the source code, you can declare an alias at the top and only explicitly put the unit name there. That, however, only works for non-generic types and consts. I wish it would also work for generics and routines.

    • Like 3

  5. The claim that it compiles faster is bogus - prove me wrong. Most compile time from spring4d comes from generics, which I reported years ago.

    Also, my suggestion for third-party libraries is to pre-compile them, which removes any dependency on the project options in your project.

     

    Currently, Spring4d supports down to XE, and as long as that is the case, I am not putting even more conditionals into the code than there already are.

    • Like 5

  6. Just a few new numbers of a not yet released parallel pdq sort - using the benchmark code from this comment earlier in this thread

    Fill array (double), elements count: 500000
    Start sorting ...
    RTL TArray.Sort (ms.): 47
    RTL TParallelArray.Sort (ms.): 35
    Spring TArray.Sort (ms.): 12
    Spring TArray.Sort_Parallel (ms.): 3
    
    Fill array (double), elements count: 5000000
    Start sorting ...
    RTL TArray.Sort (ms.): 551
    RTL TParallelArray.Sort (ms.): 128
    Spring TArray.Sort (ms.): 136
    Spring TArray.Sort_Parallel (ms.): 64
    
    Fill array (double), elements count: 100000000
    Start sorting ...
    RTL TArray.Sort (ms.): 12724
    RTL TParallelArray.Sort (ms.): 1884
    Spring TArray.Sort (ms.): 3035
    Spring TArray.Sort_Parallel (ms.): 675

    Again - these numbers are fluctuating a bit because the benchmark is a "run once" benchmark and it depends on the current CPU state etc - also I did not tweak the threshold and CPU count yet - simply calling TTask.Run from System.Threading to fork some slices into parallel execution.

     

    But overall it does not look too bad, doesn't it? :classic_cool:

    • Like 6

  7. 2 hours ago, Anders Melander said:

    If you use something, with a specific purpose, how is that CCP?

    Because as you can read in this thread people advocate for using it everywhere instead of at those rare places it's designed for: making sure the reference is set to nil before the instance is destroyed because any code being called during destruction might reach back to this very reference.

    (which is the reason its name is not even correct - it should have been named NilAndFree)

    • Sad 1

  8. Reducing instead of eliminating the probability of AVs (or other errors caused by races) in async code is one of the most annoying things you can do - anyone who ever was on the hunt for that "one in a billion times" bug knows that. I cannot believe that people who should know better give such advice.


  9. 8 hours ago, pyscripter said:

    Spring4D in particular uses camelCasing for parameters and local variables, small f for fields and uses Pascal casing for function/procedure names and properties.

     

    @Stefan Glienke Is this just a matter of taste?

    It always is just a matter of taste - I don't particularly like the parameter and local variable prefixing so I mostly settled on the C# naming conventions for those - I did not want to go that far to name fields with a leading underscore like them though.

    Though this is not without disadvantages - I had cases where I had a count parameter and wanted to access the Count field but without qualifying with Self I accessed the parameter instead. :classic_blush: (those are the days where I wished for case sensitivity)

    54 minutes ago, dummzeuch said:

    If only there was a code formatter written in Delphi that is open source!

    Because being open source means that everyone can go ahead and modify it to their liking, right? :classic_dry:


  10. I am all for better optimization performed by the compiler but this particular case is not among them - if you write empty methods then use some static code analysis to let them be detected and remove them.

    Optimizing out empty methods (and here we are not talking about some regular methods but ctor and dtor!) is something that most compilers don't do for various reasons.

     

    Only when methods get inlined (which is not possible for virtual methods unless devirtualization is possible but that's an entirely different story) the compiler can detect that there actually is nothing to (and calling inherited is not nothing) and in fact the Delphi compiler does that - but ctor and dtor can not be inlined.

     

    The case you show here gets more interesting though if you remove all the empty ctors and dtors because then I still see a slight difference of 10-15%.

    The reason for that is TObject.InitInstance

     

    P.S. Just FYI if I run the benchmark without the empty ctor and dtor in Delphi 12 release config I get these numbers:

    BaseClass: 106,01 ms
    unneeded inheritance: 119,40 ms

    The same code compiled in Delphi 10.1 gives these numbers - so much about "there is no progress on optimization":

    BaseClass: 156,23 ms
    unneeded inheritance: 175,53 ms

    Although these particular differences are most likely because of the following two improvements I provided:

    • Like 2
×