Fast way to find points near a given point?

dummzeuch · December 14, 2018

I have a list of several thousand points with given X and Y coordinates (cartesian corrdinates in metres, stored as double) and of course additional properties.

The list originally is sorted by one of the additional properties.

I need to go through this list and for each point find others that are near this point, e.g. have a distance of less than 1 metre.

This sounds like it should be a common problem and there should be an existing solution, but my Google skills have failed me.

My approach would be:

1. Create two sorted lists, ListX sorted by X, ListY sorted by Y.

2. For each point find the points in ListX and ListY that are near the point's coordinates (example: Less than 1 metre in that direction.)

3. Take those points that are in both results and check their actual distance to my given point.

Edited December 14, 2018 by dummzeuch

Alexander Elagin · December 14, 2018

This is obviously a range searching task (https://en.wikipedia.org/wiki/Range_searching) . I'd probably start with something like range tree (https://stackoverflow.com/questions/31121628/finding-all-points-in-certain-radius-of-another-point). Probably a google search like this https://www.google.com/search?q=find+points+in+radius&ie=utf-8&oe=utf-8 will give more hints.

Fritzew · December 14, 2018

If it is the distance you are looking for, I don't think presorting are really faster as calculate the
distance direct.
from my experience presorting is in the most cases slower.
But it depends. If the Data is more or less static then maybe presorting will win.

in other cases

Use Hypot to calculate the distance.

for point in Pointlist do
  begin
    dist := Hypot(checkx-point.x, checky-point.y);
    if dist <  checkdist then .....
  end;

Hypot is very fast
You can also use a parallel Loop in this case

Alexander Elagin · December 14, 2018

1 minute ago, Fritzew said:

If it is the distance you are looking for, I don't think presorting are really faster as calculate the distance direct.

Agreed. A simple loop might be faster than building complex index structures.

David Heffernan · December 14, 2018

Very bad idea to use the distance since that involves sqrt. Do the searching with sqr distance and only call sqrt when you have the closest point.

Kryvich · December 14, 2018

You're right. We can compare the squares of distances:

  checkDistance2 := Sqr(checkDistance);
  ...
  dist2 := Sqr(X1-X2) + Sqr(Y1-Y2);
  if dist2 < checkDistance2 then .....

dummzeuch · December 14, 2018

I did a test with about 2000 points. The result is nearly instant just using a linear search approach (and even using an existing DistanceTo function that includes a sqrt call).

Of course that will change once I get into the 100s of thousands (It's O(n^2 after all) but for now I'll stick to the naive approach.

Steve Maughan · December 15, 2018

I do this all the time in AlignMix, our mapping package. Here's our super-fast algorithm:

1. Have a sorted list along one dimension (I normally do longitude)

2. Do a binary search to find the point with the closest longitude (let's call this P:0 ) and record the distance between this point and the search point

3. Start a loop: look at the next sequential point from P:0 in the sorted list; record the distance and update if smaller than previous; exit the loop if the difference in the longitude is now greater than the shortest distance found, otherwise move on to next point

4. Start a loop: look at the previous sequential point from P:0 in the sorted list: record the distance and update if smaller than previous; exit the loop if the difference in the longitude is now greater than the shortest distance found, otherwise move on to next point

In practice I combine steps 3 & 4 so it alternates searching for the next above and the next below P:0.

Hope that helps. In my tests this is extremely fast and scales the same as a binary search (log(n))

Steve

Steve Maughan · December 15, 2018

On 12/14/2018 at 7:24 AM, Fritzew said:
If it is the distance you are looking for, I don't think presorting are really faster as calculate the
distance direct.
from my experience presorting is in the most cases slower.
But it depends. If the Data is more or less static then maybe presorting will win.

in other cases

Use Hypot to calculate the distance.
for point in Pointlist do
  begin
    dist := Hypot(checkx-point.x, checky-point.y);
    if dist <  checkdist then .....
  end;
Hypot is very fast
You can also use a parallel Loop in this case

I didn't know about "Hypot": thanks for sharing!!

Steve

December 16, 2018

Should be faster with

procedure FindPoints(const APoint: TPoint; ARadius: Double; const ACollection, AOutput: TList<TPoint>);
var
  xMin, xMax, yMin, yMax: Double;
  I: Integer;
begin
  if ARadius < 0 then
    raise EArgumentOutOfRangeException.Create('ARadius');

  if not Assigned(ACollection) then
    raise EArgumentNilException.Create('ACollection');

  if not Assigned(AOutput) then
    raise EArgumentNilException.Create('AOutput');

  xMin := APoint.X - ARadius;
  xMax := APoint.X + ARadius;
  yMin := APoint.Y - ARadius;
  yMax := APoint.Y + ARadius;

  for I := 0 to ACollection.Count - 1 do
  begin
    if not((ACollection[I].X < xMin) or (ACollection[I].X > xMax) or (ACollection[I].Y < yMin) or (ACollection[I].Y > yMax)) and
      (Hypot(ACollection[I].X - APoint.X, ACollection[I].Y - APoint.Y) <= ARadius) then
      AOutput.Add(ACollection[I]);
  end;
end;

Arnaud Bouchez · December 17, 2018

On 12/14/2018 at 3:11 PM, Kryvich said:
You're right. We can compare the squares of distances:
  checkDistance2 := Sqr(checkDistance);
  ...
  dist2 := Sqr(X1-X2) + Sqr(Y1-Y2);
  if dist2 < checkDistance2 then .....

Square of distances is indeed the way I would have done it: faster (no branch, no call), and no complex math.

Boris Novgorodov · December 17, 2018

Quick demonstration how space partition methods could help to get significant gain. Code uses simple 64x64 grid with cells 32x32. Each cell contains indexes of points belonging to this cell. At first brute-force method counts (doubled) number of point pairs with distance<= 4. The second code piece checks only cells neighbouring to the cell containing examined point. For 64K points gain is about 200 times. Complexity is O(N^2) vs O(N^3/2) (for proper cell size and uniform point distribution). More advanced structures like kd-tree might provide better performance for complex cases too (worse distribution etc).

var
  Wdt, Hgt, NPts: Integer;
  i, j, k, r, m: Integer;
  t: DWord;
  sqd, maxsqd, cnt, gx, gy: Integer;
  P: TArray<TPoint>;
  Grid: array[0..63, 0..63] of TArray<Integer>;
begin
  Wdt := 2048;
  Hgt := 2048;
  NPts := 64 * 1024;
  RandSeed := 42;

  SetLength(P, NPts);
  for i := 0 to NPts - 1 do begin
    P[i] := Point(Random(Wdt), Random(Hgt));
    Grid[P[i].Y shr 5, P[i].X shr 5] := Grid[P[i].Y shr 5, P[i].X shr 5] + [i];
  end;

  maxsqd := 16;
  cnt := 0;
  t := GetTickCount;
  for j := 0 to NPts - 1 do
     for i := 0 to NPts - 1 do begin
       sqd := Sqr(P[i].X - P[j].X) + Sqr(P[i].Y - P[j].Y);
       if sqd <= maxsqd then
         Inc(cnt);
     end;
  Memo1.Lines.Add(Format('%d %d', [cnt, GetTickCount - t]));

  cnt := 0;
  t := GetTickCount;
  for i := 0 to NPts - 1 do begin
    gx := P[i].X shr 5;
    gy := P[i].Y shr 5;
    for j := Max(0, gy - 1) to Min(gy + 1, 63) do
       for k := Max(0, gx - 1) to Min(gx + 1, 63) do
          for m := 0 to High(Grid[j, k]) do begin
             r := Grid[j, k, m];
             sqd := Sqr(P[i].X - P[r].X) + Sqr(P[i].Y - P[r].Y);
             if sqd <= maxsqd then
                Inc(cnt);
          end;
  end;
  Memo1.Lines.Add(Format('%d %d', [cnt, GetTickCount - t]));

count time (milliseconds)

115406 11466
115406 62

Edited December 17, 2018 by Boris Novgorodov

Rudy Velthuis · February 14, 2019

On 12/14/2018 at 1:24 PM, Fritzew said:
If it is the distance you are looking for, I don't think presorting are really faster as calculate the
distance direct.
from my experience presorting is in the most cases slower.
But it depends. If the Data is more or less static then maybe presorting will win.

in other cases

Use Hypot to calculate the distance.
for point in Pointlist do
  begin
    dist := Hypot(checkx-point.x, checky-point.y);
    if dist <  checkdist then .....
  end;
Hypot is very fast
You can also use a parallel Loop in this case

Hypot takes the square root of sum of the squares, I guess (the length of the hypothenuse). But that is not necessary. It is enough to use Sqr(checkx - point.x) + Sqr(checky - point.y); Note that I use Sqr(), not Sqrt().

That will compare the squares of the distances, but that's fine too, as long as you know the square of the maximum distance. It is quite likely a lot faster, especially since you now only deal with integers again and don't need to take the square root.

Edited February 14, 2019 by Rudy Velthuis
Added some

David Schwartz · February 14, 2019

There used to be a library named Hipparchus that used Voronoi maps to accomplish this sort of thing. Microsoft bought Hipparchus, converted it to C# (from Fortran and C) and embedded it into SQL Server as it's GIS engine.

These guys published an article in Dr. Dobb's back in 1992 or so that explained it very well. They also published c code and some tools in conjunction with that article. But they also sold the Hipparchus library commercially. It was a lot more involved than what the Dr. Dobb's article did.

Since MS bought them out, they started an open-source effort to do similar stuff, but I haven't seen any updates in quite a while.

I'd love to get hold of a copy of their library with the source code, if anybody might know of a copy sitting on a shelf somewhere. Unfortunately, once they sold to MS, the company had to pull their stuff off the market. I learned about the sale about a month too late. 😞

Essentially, there's one app that preprocesses a sampling of the data and creates a voronoi map. Then all of the points are translated from lat/lon to vectors based on the centroid of a voronoi cell where each point is a tuple <cos(p), offset>.

As it happens, all of the math can then be done with simple 16-bit signed integer calculations, which makes looking for common types of things, like nearest-neighbor clusters in a set of points, extremely quick.

But, if you have access to MS SQL Server since 2010 or so, I believe this logic is included in their GIS package.

Edited February 15, 2019 by David Schwartz

Sign In

Fast way to find points near a given point?

Recommended Posts

dummzeuch 1505

Share this post

Link to post

Alexander Elagin 143

Share this post

Link to post

Fritzew 51

Share this post

Link to post

Alexander Elagin 143

Share this post

Link to post

David Heffernan 2345

Share this post

Link to post

Kryvich 165

Share this post

Link to post

dummzeuch 1505

Share this post

Link to post

Steve Maughan 26

Share this post

Link to post

Steve Maughan 26

Share this post

Link to post

Guest

Share this post

Link to post

Arnaud Bouchez 407

Share this post

Link to post

Boris Novgorodov 10

Share this post

Link to post

Rudy Velthuis 91

Share this post

Link to post

David Schwartz 426

Share this post

Link to post

Create an account or sign in to comment

Create an account

Sign in

Browse

Activity