dummzeuch 1505 Posted December 14, 2018 (edited) I have a list of several thousand points with given X and Y coordinates (cartesian corrdinates in metres, stored as double) and of course additional properties. The list originally is sorted by one of the additional properties. I need to go through this list and for each point find others that are near this point, e.g. have a distance of less than 1 metre. This sounds like it should be a common problem and there should be an existing solution, but my Google skills have failed me. My approach would be: 1. Create two sorted lists, ListX sorted by X, ListY sorted by Y. 2. For each point find the points in ListX and ListY that are near the point's coordinates (example: Less than 1 metre in that direction.) 3. Take those points that are in both results and check their actual distance to my given point. Edited December 14, 2018 by dummzeuch Share this post Link to post
Alexander Elagin 143 Posted December 14, 2018 This is obviously a range searching task (https://en.wikipedia.org/wiki/Range_searching) . I'd probably start with something like range tree (https://stackoverflow.com/questions/31121628/finding-all-points-in-certain-radius-of-another-point). Probably a google search like this https://www.google.com/search?q=find+points+in+radius&ie=utf-8&oe=utf-8 will give more hints. 1 Share this post Link to post
Fritzew 51 Posted December 14, 2018 If it is the distance you are looking for, I don't think presorting are really faster as calculate the distance direct. from my experience presorting is in the most cases slower. But it depends. If the Data is more or less static then maybe presorting will win. in other cases Use Hypot to calculate the distance. for point in Pointlist do begin dist := Hypot(checkx-point.x, checky-point.y); if dist < checkdist then ..... end; Hypot is very fast You can also use a parallel Loop in this case 2 1 Share this post Link to post
Alexander Elagin 143 Posted December 14, 2018 1 minute ago, Fritzew said: If it is the distance you are looking for, I don't think presorting are really faster as calculate the distance direct. Agreed. A simple loop might be faster than building complex index structures. Share this post Link to post
David Heffernan 2345 Posted December 14, 2018 Very bad idea to use the distance since that involves sqrt. Do the searching with sqr distance and only call sqrt when you have the closest point. 3 Share this post Link to post
Kryvich 165 Posted December 14, 2018 You're right. We can compare the squares of distances: checkDistance2 := Sqr(checkDistance); ... dist2 := Sqr(X1-X2) + Sqr(Y1-Y2); if dist2 < checkDistance2 then ..... Share this post Link to post
dummzeuch 1505 Posted December 14, 2018 I did a test with about 2000 points. The result is nearly instant just using a linear search approach (and even using an existing DistanceTo function that includes a sqrt call). Of course that will change once I get into the 100s of thousands (It's O(n^2 after all) but for now I'll stick to the naive approach. Share this post Link to post
Steve Maughan 26 Posted December 15, 2018 I do this all the time in AlignMix, our mapping package. Here's our super-fast algorithm: 1. Have a sorted list along one dimension (I normally do longitude) 2. Do a binary search to find the point with the closest longitude (let's call this P:0 ) and record the distance between this point and the search point 3. Start a loop: look at the next sequential point from P:0 in the sorted list; record the distance and update if smaller than previous; exit the loop if the difference in the longitude is now greater than the shortest distance found, otherwise move on to next point 4. Start a loop: look at the previous sequential point from P:0 in the sorted list: record the distance and update if smaller than previous; exit the loop if the difference in the longitude is now greater than the shortest distance found, otherwise move on to next point In practice I combine steps 3 & 4 so it alternates searching for the next above and the next below P:0. Hope that helps. In my tests this is extremely fast and scales the same as a binary search (log(n)) Steve 1 2 Share this post Link to post
Steve Maughan 26 Posted December 15, 2018 On 12/14/2018 at 7:24 AM, Fritzew said: If it is the distance you are looking for, I don't think presorting are really faster as calculate the distance direct. from my experience presorting is in the most cases slower. But it depends. If the Data is more or less static then maybe presorting will win. in other cases Use Hypot to calculate the distance. for point in Pointlist do begin dist := Hypot(checkx-point.x, checky-point.y); if dist < checkdist then ..... end; Hypot is very fast You can also use a parallel Loop in this case I didn't know about "Hypot": thanks for sharing!! Steve Share this post Link to post
Guest Posted December 16, 2018 Should be faster with procedure FindPoints(const APoint: TPoint; ARadius: Double; const ACollection, AOutput: TList<TPoint>); var xMin, xMax, yMin, yMax: Double; I: Integer; begin if ARadius < 0 then raise EArgumentOutOfRangeException.Create('ARadius'); if not Assigned(ACollection) then raise EArgumentNilException.Create('ACollection'); if not Assigned(AOutput) then raise EArgumentNilException.Create('AOutput'); xMin := APoint.X - ARadius; xMax := APoint.X + ARadius; yMin := APoint.Y - ARadius; yMax := APoint.Y + ARadius; for I := 0 to ACollection.Count - 1 do begin if not((ACollection[I].X < xMin) or (ACollection[I].X > xMax) or (ACollection[I].Y < yMin) or (ACollection[I].Y > yMax)) and (Hypot(ACollection[I].X - APoint.X, ACollection[I].Y - APoint.Y) <= ARadius) then AOutput.Add(ACollection[I]); end; end; Share this post Link to post
Arnaud Bouchez 407 Posted December 17, 2018 On 12/14/2018 at 3:11 PM, Kryvich said: You're right. We can compare the squares of distances: checkDistance2 := Sqr(checkDistance); ... dist2 := Sqr(X1-X2) + Sqr(Y1-Y2); if dist2 < checkDistance2 then ..... Square of distances is indeed the way I would have done it: faster (no branch, no call), and no complex math. Share this post Link to post
Boris Novgorodov 10 Posted December 17, 2018 (edited) Quick demonstration how space partition methods could help to get significant gain. Code uses simple 64x64 grid with cells 32x32. Each cell contains indexes of points belonging to this cell. At first brute-force method counts (doubled) number of point pairs with distance<= 4. The second code piece checks only cells neighbouring to the cell containing examined point. For 64K points gain is about 200 times. Complexity is O(N^2) vs O(N^3/2) (for proper cell size and uniform point distribution). More advanced structures like kd-tree might provide better performance for complex cases too (worse distribution etc). var Wdt, Hgt, NPts: Integer; i, j, k, r, m: Integer; t: DWord; sqd, maxsqd, cnt, gx, gy: Integer; P: TArray<TPoint>; Grid: array[0..63, 0..63] of TArray<Integer>; begin Wdt := 2048; Hgt := 2048; NPts := 64 * 1024; RandSeed := 42; SetLength(P, NPts); for i := 0 to NPts - 1 do begin P[i] := Point(Random(Wdt), Random(Hgt)); Grid[P[i].Y shr 5, P[i].X shr 5] := Grid[P[i].Y shr 5, P[i].X shr 5] + [i]; end; maxsqd := 16; cnt := 0; t := GetTickCount; for j := 0 to NPts - 1 do for i := 0 to NPts - 1 do begin sqd := Sqr(P[i].X - P[j].X) + Sqr(P[i].Y - P[j].Y); if sqd <= maxsqd then Inc(cnt); end; Memo1.Lines.Add(Format('%d %d', [cnt, GetTickCount - t])); cnt := 0; t := GetTickCount; for i := 0 to NPts - 1 do begin gx := P[i].X shr 5; gy := P[i].Y shr 5; for j := Max(0, gy - 1) to Min(gy + 1, 63) do for k := Max(0, gx - 1) to Min(gx + 1, 63) do for m := 0 to High(Grid[j, k]) do begin r := Grid[j, k, m]; sqd := Sqr(P[i].X - P[r].X) + Sqr(P[i].Y - P[r].Y); if sqd <= maxsqd then Inc(cnt); end; end; Memo1.Lines.Add(Format('%d %d', [cnt, GetTickCount - t])); count time (milliseconds) 115406 11466 115406 62 Edited December 17, 2018 by Boris Novgorodov 4 Share this post Link to post
Rudy Velthuis 91 Posted February 14, 2019 (edited) On 12/14/2018 at 1:24 PM, Fritzew said: If it is the distance you are looking for, I don't think presorting are really faster as calculate the distance direct. from my experience presorting is in the most cases slower. But it depends. If the Data is more or less static then maybe presorting will win. in other cases Use Hypot to calculate the distance. for point in Pointlist do begin dist := Hypot(checkx-point.x, checky-point.y); if dist < checkdist then ..... end; Hypot is very fast You can also use a parallel Loop in this case Hypot takes the square root of sum of the squares, I guess (the length of the hypothenuse). But that is not necessary. It is enough to use Sqr(checkx - point.x) + Sqr(checky - point.y); Note that I use Sqr(), not Sqrt(). That will compare the squares of the distances, but that's fine too, as long as you know the square of the maximum distance. It is quite likely a lot faster, especially since you now only deal with integers again and don't need to take the square root. Edited February 14, 2019 by Rudy Velthuis Added some Share this post Link to post
David Schwartz 426 Posted February 14, 2019 (edited) There used to be a library named Hipparchus that used Voronoi maps to accomplish this sort of thing. Microsoft bought Hipparchus, converted it to C# (from Fortran and C) and embedded it into SQL Server as it's GIS engine. These guys published an article in Dr. Dobb's back in 1992 or so that explained it very well. They also published c code and some tools in conjunction with that article. But they also sold the Hipparchus library commercially. It was a lot more involved than what the Dr. Dobb's article did. Since MS bought them out, they started an open-source effort to do similar stuff, but I haven't seen any updates in quite a while. I'd love to get hold of a copy of their library with the source code, if anybody might know of a copy sitting on a shelf somewhere. Unfortunately, once they sold to MS, the company had to pull their stuff off the market. I learned about the sale about a month too late. 😞 Essentially, there's one app that preprocesses a sampling of the data and creates a voronoi map. Then all of the points are translated from lat/lon to vectors based on the centroid of a voronoi cell where each point is a tuple <cos(p), offset>. As it happens, all of the math can then be done with simple 16-bit signed integer calculations, which makes looking for common types of things, like nearest-neighbor clusters in a set of points, extremely quick. But, if you have access to MS SQL Server since 2010 or so, I believe this logic is included in their GIS package. Edited February 15, 2019 by David Schwartz Share this post Link to post