Tommi Prami Posted November 2, 2023

This is quite interesting. I have not looked into how it is implemented, but the speedup claims are pretty substantial: https://github.com/intel/x86-simd-sort

Sorting is rarely the real bottleneck in any code, but faster is still faster 🙂

-Tee-
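For anyone curious, a rough sketch of how the header-only C++ interface is typically called. The header name and the avx512_qsort<T> template are taken from memory of the repository README, so treat them as assumptions that may differ between releases:

```cpp
// Hypothetical usage sketch of Intel's x86-simd-sort (header-only C++).
// The header name and the avx512_qsort<T> template are assumptions based on
// the repository; check the current README for the exact spelling.
// Build needs an AVX-512 capable target, e.g. -march=skylake-avx512.
#include <random>
#include <vector>
#include "avx512-64bit-qsort.hpp"  // assumed header from the x86-simd-sort repo

int main() {
    std::vector<double> data(1'000'000);
    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (auto& v : data) v = dist(rng);

    // Sorts the buffer in place using the AVX-512 vectorized quicksort.
    avx512_qsort<double>(data.data(), data.size());
    return 0;
}
```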
DelphiUdIT Posted November 2, 2023

Quite interesting, but I think it's not complete: support for AVX2 should be fully available too. Intel's new consumer processors (12th, 13th and 14th gen.) don't support AVX-512 (and it seems they won't in the future, either).

In Intel's professional line (such as some of the Xeon family), only a subset of the AVX-512 instructions is supported.

The AVX2 instructions are more widely supported.

Bye
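To make that concrete, choosing between the AVX-512 and AVX2 code paths is normally done with a runtime CPU check rather than at compile time. A minimal sketch, assuming GCC or Clang (which provide __builtin_cpu_supports); the three "kernels" here are just std::sort placeholders, not the library's actual entry points:

```cpp
// Sketch of runtime dispatch between AVX-512, AVX2 and scalar code paths.
// The kernels are placeholders standing in for real vectorized sorts.
#include <algorithm>
#include <cstddef>
#include <vector>

static void sort_avx512(double* data, std::size_t n) { std::sort(data, data + n); }  // placeholder
static void sort_avx2(double* data, std::size_t n)   { std::sort(data, data + n); }  // placeholder
static void sort_scalar(double* data, std::size_t n) { std::sort(data, data + n); }  // portable fallback

void sort_dispatch(double* data, std::size_t n) {
    if (__builtin_cpu_supports("avx512f"))    // AVX-512 Foundation available?
        sort_avx512(data, n);
    else if (__builtin_cpu_supports("avx2"))  // otherwise fall back to AVX2
        sort_avx2(data, n);
    else
        sort_scalar(data, n);
}

int main() {
    std::vector<double> v{3.0, 1.0, 2.0};
    sort_dispatch(v.data(), v.size());
}
```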
Stefan Glienke Posted November 2, 2023

A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?
DelphiUdIT Posted November 2, 2023

7 minutes ago, Stefan Glienke said:
A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?

I'm not an expert on that, but from what I have seen of development "out there" I don't think there is a great need. Perhaps for certain specific purposes or for some algorithms it may be necessary; the development, even if it is a niche, is still going well.

What I was focusing on is that it is of little use to provide only partial "optimization" parts, because they are too dependent on the hardware ... unless such optimizations are to be used only and exclusively on specific hardware (if I mainly had to support Intel I certainly wouldn't use AVX-512, but if I only used AMD, maybe yes).
MichaelT Posted November 2, 2023

5 hours ago, Stefan Glienke said:
A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?

Benchmarks are benchmarks; they usually compare the capability of a certain piece of hardware, or of a system as a whole, to solve a task. That's what they do, and speedup is the point, because slower is easy to achieve in general.

The granularity of data is growing, and time series have become more important than ever before, since hardware became fast enough.

I also find it hard to think of an application that just sorts key figures/values in an array as an independent dimension, not connected to descriptive values. Maybe my brain is too strongly tied to non-mathematical database thinking, and I'm no technician or mathematician; those are the people who make use of such data structures.

Sorting values should give faster lookup times, acting as an index and allowing ranges to be extracted from series of values, for example. No idea why someone would need that, but as mentioned just before ...

In general we are used to (me at least) knowing what we are looking for. If values are just retrieved, searching and creating ranges can help to find out what should be looked up. Integer (just normal) as well as floating-point values can act as describing dimensions too; in technical applications, or in the analysis of production-line data, planning worked that way. In a business application almost everything that can be related to a surrogate id is more or less discrete, and searching starts with specifying describing values retrieved from describing dimensions assumed to have an impact. To find the impacting/influencing dimensions, the measured values have to be analyzed first, values grouped, and so on.

One of my former business partners simplified optimization by first removing criteria that had no impact on the optimization of the product mix in production, instead of considering a fixed set.

Richard Werner (German economist) used pretty much the same trick to remove the impact of the interest rate on the demand for credit in business, for example, unless the plug is pulled by the central bank. It's no surprise, because a 'line'/company providing goods in the classical mid-European industry model used the interest rate only after roughly 35(40) to 50 years, to decide whether the line should keep producing and just repair the broken machines, or build a new plant (reproduction of the line). Since the mid-1980s, even in Austria, everything has been pushed in the direction of the 'Anglo-Saxon' industrial/business model; those guys simply rebuilt the plant after 45 years and never really bothered with this situation. The myth of the impact of the interest rate, and the belief concerning investment decisions, is still alive and kept alive.

The same myths exist in many production-planning systems that consider a wide variety of influential factors/criteria retrieved from the past, and therefore from so-called experience. That's why so many optimization models rely on, let's say, about 40 describing dimensions/criteria/influential factors taken from a pool of maybe 2000 potential influential factors, while in current production (the liquid phase of a steel plant) only 3 or 4 really drive the optimization software's decision and the result of the planning and optimization run.

This requires analyzing the values/key figures and the describing technical dimensions first, and that analysis should finish quickly, because today a production-planning run happens in de facto real time (about 0.5 s or less), and finding the influential factors should also not take long. Twenty years ago such a run was allowed to take 20 minutes.
Dave Novo Posted November 3, 2023

15 hours ago, Stefan Glienke said:
A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?

We write software that does scientific data analysis. We regularly have to sort arrays with millions of elements of data. But I recognize we are in the minority of developers.
Fr0sT.Brutal Posted November 3, 2023

16 hours ago, Stefan Glienke said:
Do people in reality really have these kinds of data that need to be sorted?

In fact, most people really do have this kind of data sorted, though indirectly 🙂 DB keys.
Stefan Glienke Posted November 3, 2023

1 hour ago, Fr0sT.Brutal said:
In fact, most people really do have this kind of data sorted, though indirectly 🙂 DB keys.

Indirectly does not matter for this algorithm, because it directly sorts a vector of integers or floats - and DB keys are organized in a different layout.

2 hours ago, Dave Novo said:
We write software that does scientific data analysis. We regularly have to sort arrays with millions of elements of data. But I recognize we are in the minority of developers.

Again, the question was not about sorting millions of elements of data but strictly about sorting arrays of integers or floats.
DJSox Posted November 3, 2023

In statistics and/or data analysis, to calculate the median you have to sort the data - at least I always have. When plotting data, say in an xy plot, sometimes it's better to sort the data by x first. So in technical fields it's not that uncommon.
David Heffernan Posted November 3, 2023

36 minutes ago, DJSox said:
In statistics and/or data analysis, to calculate the median you have to sort the data - at least I always have.

This sounds like a really interesting numerical problem! I suspect that in practical terms, sorting is going to be the winner though.
Dave Novo Posted November 3, 2023

Correct, we sort arrays of integers and floats all the time, both double and single, and we have to process these millions of values as well. By "process" I mean perform mathematical operations on them. For us the Dew Research library mtxVec has proven invaluable.
Tommi Prami Posted November 6, 2023

On 11/2/2023 at 4:36 PM, Stefan Glienke said:
A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?

Good point. In the real-life apps I've been working on, sorting has almost always been of strings; a fairly small portion has been lists of objects or records.

-Tee-
mikerabat Posted November 24, 2023

I'm not sure about this exact library, but some predecessors were indeed able to sort pointer lists where the "key" is a 32/64-bit float/int. The key can be specified as an "offset" from the starting address -> this allows sorting pointers to records/classes.... at least that is the way they do it in "Fast quicksort Implementation using AVX Instructions".

The definitions in the library for AVX512 actually use arrays of templates, so I guess that means you can sort records, not only Double/float... ?

On 11/3/2023 at 1:23 PM, DJSox said:
In statistics and/or data analysis, to calculate the median you have to sort the data - at least I always have. When plotting data, say in an xy plot, sometimes it's better to sort the data by x first. So in technical fields it's not that uncommon.

This problem is solved by an algorithm that extracts the kth largest element (not only the median) -> basically it does the partitioning of quicksort but does not sort the "unnecessary" half. That in fact allows picking the median in O(n) (roughly) time....

And if you want to go fancy you can do a rolling median over an array of doubles. The algorithm can be found here: https://github.com/mikerabat/mrmath/blob/master/RollingMedMean.pas
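To illustrate that selection idea, here is a minimal C++ sketch using std::nth_element, the standard-library routine that does exactly this quicksort-style partitioning without sorting the unneeded half. This is just the textbook equivalent, not the mrmath implementation linked above:

```cpp
// Median of an array in O(n) expected time via selection (std::nth_element),
// i.e. quicksort-style partitioning that never sorts the "unnecessary" half.
#include <algorithm>
#include <iostream>
#include <vector>

double median(std::vector<double> v) {   // taken by value: nth_element reorders its input
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    double m = v[mid];
    if (v.size() % 2 == 0) {
        // For an even count, the lower middle value is the maximum of the left partition.
        const double lower = *std::max_element(v.begin(), v.begin() + mid);
        m = (m + lower) / 2.0;
    }
    return m;
}

int main() {
    std::cout << median({5.0, 1.0, 4.0, 2.0, 3.0}) << "\n";  // prints 3
    std::cout << median({4.0, 1.0, 3.0, 2.0}) << "\n";       // prints 2.5
}
```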