Tommi Prami Posted November 2, 2023

This is quite interesting. I have not looked into how it is implemented, but the speedup claims are pretty substantial: https://github.com/intel/x86-simd-sort

Sorting is rarely the real bottleneck in any code, but faster is still faster 🙂

-Tee-
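For anyone curious, a rough sketch of how the header-only C++ interface is typically called. The header name and the avx512_qsort<T> template are taken from memory of the repository README, so treat them as assumptions that may differ between releases:

```cpp
// Hypothetical usage sketch of Intel's x86-simd-sort (header-only C++).
// The header name and the avx512_qsort<T> template are assumptions based on
// the repository; check the current README for the exact spelling.
// Build needs an AVX-512 capable target, e.g. -march=skylake-avx512.
#include <random>
#include <vector>
#include "avx512-64bit-qsort.hpp"  // assumed header from the x86-simd-sort repo

int main() {
    std::vector<double> data(1'000'000);
    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (auto& v : data) v = dist(rng);

    // Sorts the buffer in place using the AVX-512 vectorized quicksort.
    avx512_qsort<double>(data.data(), data.size());
    return 0;
}
```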
DelphiUdIT Posted November 2, 2023

Quite interesting, but I think it's not complete: support for AVX2 should be fully available too. Intel's new consumer processors (12th, 13th and 14th gen.) don't support AVX-512 (and it seems they won't in the future, either).

In Intel's professional line (such as some of the Xeon family), only a subset of the AVX-512 instructions is supported.

The AVX2 instructions are more widely supported.

Bye
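To make that concrete, choosing between the AVX-512 and AVX2 code paths is normally done with a runtime CPU check rather than at compile time. A minimal sketch, assuming GCC or Clang (which provide __builtin_cpu_supports); the three "kernels" here are just std::sort placeholders, not the library's actual entry points:

```cpp
// Sketch of runtime dispatch between AVX-512, AVX2 and scalar code paths.
// The kernels are placeholders standing in for real vectorized sorts.
#include <algorithm>
#include <cstddef>
#include <vector>

static void sort_avx512(double* data, std::size_t n) { std::sort(data, data + n); }  // placeholder
static void sort_avx2(double* data, std::size_t n)   { std::sort(data, data + n); }  // placeholder
static void sort_scalar(double* data, std::size_t n) { std::sort(data, data + n); }  // portable fallback

void sort_dispatch(double* data, std::size_t n) {
    if (__builtin_cpu_supports("avx512f"))    // AVX-512 Foundation available?
        sort_avx512(data, n);
    else if (__builtin_cpu_supports("avx2"))  // otherwise fall back to AVX2
        sort_avx2(data, n);
    else
        sort_scalar(data, n);
}

int main() {
    std::vector<double> v{3.0, 1.0, 2.0};
    sort_dispatch(v.data(), v.size());
}
```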
Stefan Glienke Posted November 2, 2023

A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?
DelphiUdIT Posted November 2, 2023

7 minutes ago, Stefan Glienke said:
A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?

I'm not an expert on that, but from what I have seen of development "out there" I don't think there is a great need. Perhaps for certain specific purposes or for some algorithms it may be necessary; the development, even if it is a niche, is still going well.

What I was focusing on is that it is of little use to provide only partial "optimization" parts, because they are too dependent on the hardware ... unless such optimizations are to be used only and exclusively on specific hardware (if I mainly had to support Intel I certainly wouldn't use AVX-512, but if I only used AMD, maybe yes).
MichaelT Posted November 2, 2023

5 hours ago, Stefan Glienke said:
A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?

Benchmarks are benchmarks; they usually compare the capability of a certain piece of hardware, or of a system as a whole, to solve a task. That's what they do, and speedup is the point, because slower is easy to achieve in general.

The granularity of data is growing, and time series have become more important than ever before, since hardware became fast enough.

I also find it hard to think of an application that just sorts key figures/values in an array as an independent dimension, not connected to descriptive values. Maybe my brain is too strongly tied to non-mathematical database thinking, and I'm no technician or mathematician; those are the people who make use of such data structures.

Sorting values should give faster lookup times, acting as an index and allowing ranges to be extracted from series of values, for example. No idea why someone would need that, but as mentioned just before ...

In general we are used to (me at least) knowing what we are looking for. If values are just retrieved, searching and creating ranges can help to find out what should be looked up. Integer (just normal) as well as floating-point values can act as describing dimensions too; in technical applications, or in the analysis of production-line data, planning worked that way. In a business application almost everything that can be related to a surrogate id is more or less discrete, and searching starts with specifying describing values retrieved from describing dimensions assumed to have an impact. To find the impacting/influencing dimensions, the measured values have to be analyzed first, values grouped, and so on.

One of my former business partners simplified optimization by first removing criteria that had no impact on the optimization of the product mix in production, instead of considering a fixed set.

Richard Werner (German economist) used pretty much the same trick to remove the impact of the interest rate on the demand for credit in business, for example, unless the plug is pulled by the central bank. It's no surprise, because a 'line'/company providing goods in the classical mid-European industry model used the interest rate only after roughly 35(40) to 50 years, to decide whether the line should keep producing and just repair the broken machines, or build a new plant (reproduction of the line). Since the mid-1980s, even in Austria, everything has been pushed in the direction of the 'Anglo-Saxon' industrial/business model; those guys simply rebuilt the plant after 45 years and never really bothered with this situation. The myth of the impact of the interest rate, and the belief concerning investment decisions, is still alive and kept alive.

The same myths exist in many production-planning systems that consider a wide variety of influential factors/criteria retrieved from the past, and therefore from so-called experience. That's why so many optimization models rely on, let's say, about 40 describing dimensions/criteria/influential factors taken from a pool of maybe 2000 potential influential factors, while in current production (the liquid phase of a steel plant) only 3 or 4 really drive the optimization software's decision and the result of the planning and optimization run.

This requires analyzing the values/key figures and the describing technical dimensions first, and that analysis should finish quickly, because today a production-planning run happens in de facto real time (about 0.5 s or less), and finding the influential factors should also not take long. Twenty years ago such a run was allowed to take 20 minutes.
Dave Novo Posted November 3, 2023

15 hours ago, Stefan Glienke said:
A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?

We write software that does scientific data analysis. We regularly have to sort arrays with millions of elements of data. But I recognize we are in the minority of developers.
Fr0sT.Brutal Posted November 3, 2023

16 hours ago, Stefan Glienke said:
Do people in reality really have these kinds of data that need to be sorted?

In fact, most people really do have this kind of data sorted, though indirectly 🙂 DB keys.
Stefan Glienke Posted November 3, 2023

1 hour ago, Fr0sT.Brutal said:
In fact, most people really do have this kind of data sorted, though indirectly 🙂 DB keys.

Indirectly does not matter for this algorithm, because it directly sorts a vector of integers or floats - and DB keys are organized in a different layout.

2 hours ago, Dave Novo said:
We write software that does scientific data analysis. We regularly have to sort arrays with millions of elements of data. But I recognize we are in the minority of developers.

Again, the question was not about sorting millions of elements of data but strictly about sorting arrays of integers or floats.
DJSox Posted November 3, 2023

In statistics and/or data analysis, to calculate the median you have to sort the data - at least I always have. When plotting data, say in an xy plot, sometimes it's better to sort the data by x first. So in technical fields it's not that uncommon.
David Heffernan Posted November 3, 2023

36 minutes ago, DJSox said:
In statistics and/or data analysis, to calculate the median you have to sort the data - at least I always have.

This sounds like a really interesting numerical problem! I suspect that in practical terms, sorting is going to be the winner though.
Dave Novo Posted November 3, 2023

Correct, we sort arrays of integers and floats all the time, both double and single, and we have to process these millions of values as well. By "process" I mean perform mathematical operations on them. For us the Dew Research library mtxVec has proven invaluable.
Tommi Prami Posted November 6, 2023

On 11/2/2023 at 4:36 PM, Stefan Glienke said:
A serious question because many sort benchmarks are so obsessed with sorting large arrays/vectors of just integers and floats: Do people in reality really have these kinds of data that need to be sorted?

Good point. In the real-life apps I've been working on, sorting has almost always been of strings; a fairly small portion has been lists of objects or records.

-Tee-
mikerabat Posted November 24, 2023

I'm not sure about this exact library, but some predecessors were indeed able to sort pointer lists where the "key" is a 32/64-bit float/int. The key can be specified as an "offset" from the starting address -> this allows sorting pointers to records/classes.... at least that is the way they do it in "Fast quicksort Implementation using AVX Instructions".

The definitions in the library for AVX512 actually use arrays of templates, so I guess that means you can sort records, not only Double/float... ?

On 11/3/2023 at 1:23 PM, DJSox said:
In statistics and/or data analysis, to calculate the median you have to sort the data - at least I always have. When plotting data, say in an xy plot, sometimes it's better to sort the data by x first. So in technical fields it's not that uncommon.

This problem is solved by an algorithm that extracts the kth largest element (not only the median) -> basically it does the partitioning of quicksort but does not sort the "unnecessary" half. That in fact allows picking the median in O(n) (roughly) time....

And if you want to go fancy you can do a rolling median over an array of doubles. The algorithm can be found here: https://github.com/mikerabat/mrmath/blob/master/RollingMedMean.pas
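To illustrate that selection idea, here is a minimal C++ sketch using std::nth_element, the standard-library routine that does exactly this quicksort-style partitioning without sorting the unneeded half. This is just the textbook equivalent, not the mrmath implementation linked above:

```cpp
// Median of an array in O(n) expected time via selection (std::nth_element),
// i.e. quicksort-style partitioning that never sorts the "unnecessary" half.
#include <algorithm>
#include <iostream>
#include <vector>

double median(std::vector<double> v) {   // taken by value: nth_element reorders its input
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    double m = v[mid];
    if (v.size() % 2 == 0) {
        // For an even count, the lower middle value is the maximum of the left partition.
        const double lower = *std::max_element(v.begin(), v.begin() + mid);
        m = (m + lower) / 2.0;
    }
    return m;
}

int main() {
    std::cout << median({5.0, 1.0, 4.0, 2.0, 3.0}) << "\n";  // prints 3
    std::cout << median({4.0, 1.0, 3.0, 2.0}) << "\n";       // prints 2.5
}
```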