Jump to content
dummzeuch

Fast way to find points near a given point?

Recommended Posts

I have a list of several thousand points with given X and Y coordinates (cartesian corrdinates in metres, stored as double) and of course additional properties.

The list originally is sorted by one of the additional properties.

I need to go through this list and for each point find others that are near this point, e.g. have a distance of less than 1 metre.

This sounds like it should be a common problem and there should be an existing solution, but my Google skills have failed me.

My approach would be:

1. Create two sorted lists, ListX sorted by X, ListY sorted by Y.

2. For each point find the points in ListX and ListY that are near the point's coordinates (example: Less than 1 metre in that direction.)

3. Take those points that are in both results and check their actual distance to my given point.
 

Edited by dummzeuch

Share this post


Link to post

This is obviously a range searching task (https://en.wikipedia.org/wiki/Range_searching) . I'd probably start with something like range tree (https://stackoverflow.com/questions/31121628/finding-all-points-in-certain-radius-of-another-point). Probably a google search like this https://www.google.com/search?q=find+points+in+radius&ie=utf-8&oe=utf-8 will give more hints.

  • Thanks 1

Share this post


Link to post

If it is the distance you are looking for, I don't think presorting are really faster as calculate the
distance direct.
from my experience presorting is in the most cases slower.
But it depends. If the Data is more or less static then maybe presorting will win.


in other cases

Use Hypot to calculate the distance.

for point in Pointlist do
  begin
    dist := Hypot(checkx-point.x, checky-point.y);
    if dist <  checkdist then .....
  end;

Hypot is very fast 
You can also use a parallel Loop in this case


 

  • Like 2
  • Thanks 1

Share this post


Link to post
1 minute ago, Fritzew said:

If it is the distance you are looking for, I don't think presorting are really faster as calculate the distance direct.

Agreed. A simple loop might be faster than building complex index structures.

Share this post


Link to post

You're right. We can compare the squares of distances:

  checkDistance2 := Sqr(checkDistance);
  ...
  dist2 := Sqr(X1-X2) + Sqr(Y1-Y2);
  if dist2 < checkDistance2 then .....

 

Share this post


Link to post

I did a test with about 2000 points. The result is nearly instant just using a linear search approach (and even using an existing DistanceTo function that includes a sqrt call).

 

Of course that will change once I get into the 100s of thousands (It's O(n^2 after all) but for now I'll stick to the naive approach.

Share this post


Link to post

I do this all the time in AlignMix, our mapping package. Here's our super-fast algorithm:

 

1. Have a sorted list along one dimension (I normally do longitude)

2. Do a binary search to find the point with the closest longitude (let's call this P:0 ) and record the distance between this point and the search point

3. Start a loop: look at the next sequential point from P:0 in the sorted list; record the distance and update if smaller than previous; exit the loop if the difference in the longitude is now greater than the shortest distance found, otherwise move on to next point

4. Start a loop: look at the previous sequential point from P:0 in the sorted list: record the distance and update if smaller than previous; exit the loop if the difference in the longitude is now greater than the shortest distance found, otherwise move on to next point

 

In practice I combine steps 3 & 4 so it alternates searching for the next above and the next below P:0.

 

Hope that helps. In my tests this is extremely fast and scales the same as a binary search (log(n))

 

Steve

  • Like 1
  • Thanks 2

Share this post


Link to post
On 12/14/2018 at 7:24 AM, Fritzew said:

If it is the distance you are looking for, I don't think presorting are really faster as calculate the
distance direct.
from my experience presorting is in the most cases slower.
But it depends. If the Data is more or less static then maybe presorting will win.


in other cases

Use Hypot to calculate the distance.


for point in Pointlist do
  begin
    dist := Hypot(checkx-point.x, checky-point.y);
    if dist <  checkdist then .....
  end;

Hypot is very fast 
You can also use a parallel Loop in this case


 

I didn't know about "Hypot": thanks for sharing!!

 

Steve

Share this post


Link to post

Should be faster with

procedure FindPoints(const APoint: TPoint; ARadius: Double; const ACollection, AOutput: TList<TPoint>);
var
  xMin, xMax, yMin, yMax: Double;
  I: Integer;
begin
  if ARadius < 0 then
    raise EArgumentOutOfRangeException.Create('ARadius');

  if not Assigned(ACollection) then
    raise EArgumentNilException.Create('ACollection');

  if not Assigned(AOutput) then
    raise EArgumentNilException.Create('AOutput');

  xMin := APoint.X - ARadius;
  xMax := APoint.X + ARadius;
  yMin := APoint.Y - ARadius;
  yMax := APoint.Y + ARadius;

  for I := 0 to ACollection.Count - 1 do
  begin
    if not((ACollection[I].X < xMin) or (ACollection[I].X > xMax) or (ACollection[I].Y < yMin) or (ACollection[I].Y > yMax)) and
      (Hypot(ACollection[I].X - APoint.X, ACollection[I].Y - APoint.Y) <= ARadius) then
      AOutput.Add(ACollection[I]);
  end;
end;

 

Share this post


Link to post
On 12/14/2018 at 3:11 PM, Kryvich said:

You're right. We can compare the squares of distances:


  checkDistance2 := Sqr(checkDistance);
  ...
  dist2 := Sqr(X1-X2) + Sqr(Y1-Y2);
  if dist2 < checkDistance2 then .....

 

Square of distances is indeed the way I would have done it: faster (no branch, no call), and no complex math.

Share this post


Link to post

Quick demonstration how space partition methods could help to get significant gain. Code uses simple 64x64 grid with cells 32x32. Each cell contains indexes of points belonging to this cell. At first  brute-force method counts (doubled) number of point pairs with distance<= 4. The second code piece checks only cells neighbouring to the cell containing examined point. For 64K points gain is about 200  times. Complexity is O(N^2) vs  O(N^3/2) (for proper cell size and uniform point distribution).  More advanced structures like kd-tree might provide better performance for complex cases too (worse distribution etc).

 

var
  Wdt, Hgt, NPts: Integer;
  i, j, k, r, m: Integer;
  t: DWord;
  sqd, maxsqd, cnt, gx, gy: Integer;
  P: TArray<TPoint>;
  Grid: array[0..63, 0..63] of TArray<Integer>;
begin
  Wdt := 2048;
  Hgt := 2048;
  NPts := 64 * 1024;
  RandSeed := 42;

  SetLength(P, NPts);
  for i := 0 to NPts - 1 do begin
    P[i] := Point(Random(Wdt), Random(Hgt));
    Grid[P[i].Y shr 5, P[i].X shr 5] := Grid[P[i].Y shr 5, P[i].X shr 5] + [i];
  end;

  maxsqd := 16;
  cnt := 0;
  t := GetTickCount;
  for j := 0 to NPts - 1 do
     for i := 0 to NPts - 1 do begin
       sqd := Sqr(P[i].X - P[j].X) + Sqr(P[i].Y - P[j].Y);
       if sqd <= maxsqd then
         Inc(cnt);
     end;
  Memo1.Lines.Add(Format('%d %d', [cnt, GetTickCount - t]));

  cnt := 0;
  t := GetTickCount;
  for i := 0 to NPts - 1 do begin
    gx := P[i].X shr 5;
    gy := P[i].Y shr 5;
    for j := Max(0, gy - 1) to Min(gy + 1, 63) do
       for k := Max(0, gx - 1) to Min(gx + 1, 63) do
          for m := 0 to High(Grid[j, k]) do begin
             r := Grid[j, k, m];
             sqd := Sqr(P[i].X - P[r].X) + Sqr(P[i].Y - P[r].Y);
             if sqd <= maxsqd then
                Inc(cnt);
          end;
  end;
  Memo1.Lines.Add(Format('%d %d', [cnt, GetTickCount - t]));

 

count      time (milliseconds)

115406   11466
115406    62

Edited by Boris Novgorodov
  • Thanks 5

Share this post


Link to post
On 12/14/2018 at 1:24 PM, Fritzew said:

If it is the distance you are looking for, I don't think presorting are really faster as calculate the
distance direct.
from my experience presorting is in the most cases slower.
But it depends. If the Data is more or less static then maybe presorting will win.


in other cases

Use Hypot to calculate the distance.


for point in Pointlist do
  begin
    dist := Hypot(checkx-point.x, checky-point.y);
    if dist <  checkdist then .....
  end;

Hypot is very fast 
You can also use a parallel Loop in this case


 

Hypot takes the square root of sum of the squares, I guess (the length of the hypothenuse). But that is not necessary. It is enough to use Sqr(checkx - point.x) + Sqr(checky - point.y); Note that I use Sqr(), not Sqrt().

 

That will compare the squares of the distances, but that's fine too, as long as you know the square of the maximum distance. It is quite likely a lot faster, especially since you now only deal with integers again and don't need to take the square root.

Edited by Rudy Velthuis
Added some

Share this post


Link to post

There used to be a library named Hipparchus that used Voronoi maps to accomplish this sort of thing. Microsoft bought Hipparchus, converted it to C# (from Fortran and C) and embedded it into SQL Server as it's GIS engine.

 

These guys published an article in Dr. Dobb's back in 1992 or so that explained it very well. They also published c code and some tools in conjunction with that article.  But they also sold the Hipparchus library commercially. It was a lot more involved than what the Dr. Dobb's article did. 

 

Since MS bought them out, they started an open-source effort to do similar stuff, but I haven't seen any updates in quite a while.

 

I'd love to get hold of a copy of their library with the source code, if anybody might know of a copy sitting on a shelf somewhere. Unfortunately, once they sold to MS, the company had to pull their stuff off the market. I learned about the sale about a month too late. 😞

 

Essentially, there's one app that preprocesses a sampling of the data and creates a voronoi map. Then all of the points are translated from lat/lon to vectors based on the centroid of a voronoi cell where each point is a tuple <cos(p), offset>. 

 

As it happens, all of the math can then be done with simple 16-bit signed integer calculations, which makes looking for common types of things, like nearest-neighbor clusters in a set of points, extremely quick.

 

But, if you have access to MS SQL Server since 2010 or so, I believe this logic is included in their GIS package.

Edited by David Schwartz

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×