aehimself

WinSock (Indy) select() doesn't return on network


Hello,

 

We have an application built on Delphi 10.4.1 / 10.4.2 which communicates with a server using the Indy TIdHTTP component. It works perfectly, but there is one particular call where the result can take 1 to 1.5 hours to arrive... and this is where things get strange. The request is sent with idHttp.Post (using a stream as both the outgoing and the incoming data buffer), and if the reply arrives within 30 minutes, all is fine. Somewhere between 30 minutes and one hour, the underlying WinSock select() call never returns. The request is sent out, received by the server, processed, and the reply is sent back, but it is never received by the client. Wireshark shows that the moment the server sends the reply, the client issues a TCP retransmission... maybe it thinks that the data was lost but was unable to use the channel while waiting for data? Then the reply arrives, the channel becomes free, and it sends the retransmission but discards the data received? These are just guesses; I'm not very familiar with this low-level functionality of WinSock.

Oh, one more thing... this issue is NOT present if the server and the client are on the same machine; connecting to localhost makes a difference.

 

Stack trace where the application stops is as follows:

:772729dc ntdll.ZwWaitForSingleObject + 0xc
:74417555 ; C:\WINDOWS\SysWOW64\mswsock.dll
:751c5f1e WS2_32.select + 0xce
IdStackWindows.TIdSocketListWindows.FDSelect(???,???,nil,???)
IdStackWindows.TIdSocketListWindows.SelectRead(-2)
IdSocketHandle.TIdSocketHandle.Select(???)
IdSocketHandle.CheckIsReadable(???)
IdSocketHandle.TIdSocketHandle.Readable(-2)
IdIOHandlerStack.TIdIOHandlerStack.Readable(???)
IdIOHandler.TIdIOHandler.ReadFromSource(True,-2,False)
IdIOHandler.TIdIOHandler.ReadLn(#$A,-1,16384,TIdASCIIEncoding($1A138AD4) as IIdTextEncoding)
IdIOHandler.TIdIOHandler.ReadLn(nil)
IdHTTP.TIdCustomHTTP.InternalReadLn
IdHTTP.TIdCustomHTTP.DoRequest(???,'http://10.0.2.53:12345/BIN',$2E6D75A0,$2E6D73A0,(...))
IdHTTP.TIdCustomHTTP.Post('http://10.0.2.53:12345/BIN',$14E0F58,$2E6D73A0)

Before you say anything, I know this is a bad design. We shouldn't wait on long-lasting operations over HTTP but poll for the result instead. What I'd like to know is what happens and why, so I can judge whether patching the mess is worth it or whether I should jump straight to refactoring.

 

Cheers!
 


My guesses are:

  1.  The TCP connection went into TIME_WAIT on the client.  Netstat is the only tool I am aware of that will show you this, so you need to wrap it in a loop and capture the output.
  2. A router killed it to conserve memory.

The registry reference you are looking for is here (older version, but you can adjust):  https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc757512(v=ws.10)

 

It is a mix of TIME_WAIT and keep-alive settings that you try to tune.  I have only had mixed success trying to keep long db connections open; the keepalives are often killed at the firewall / router level.  A lot of modern network equipment tracks those connections and kills them to save router memory.

 

I hope it helps you to get some insight.


Checking the TCP state with NetStat is a great idea, I don't know why I didn't think about this! Will do the check, thanks!


There are two connections between the client and the server at this state, both show up as ESTABLISHED.

Damn, I really wanted to pass this on to the NW guys 🙂


The client only knows about / reacts to what it sends and receives, so the connection could stay ESTABLISHED. Sending keep-alives is still a possible strategy.

A timer on a router can still expire without the router sending anything to the client(s). At that point, it should be a re-transmit.

 

It is still worth asking your network guys to understand if there is anything in the middle. Especially a load balancer in front of that HTTP server.

Edited by SwiftExpat

7 hours ago, SwiftExpat said:

My guesses are:

  1.  The TCP connection went into TIME_WAIT on the client.  Netstat is the only tool I am aware of that will show you this, so you need to wrap it in a loop and capture the output.
  2. A router killed it to conserve memory.

That would be my guess.  Most likely a network router thinks the HTTP connection has been idle for too long and is killing the connection.  This is a common problem in FTP, for instance, where a long file transfer on a data connection leaves the command connection idle.  The TIdFTP component has a NATKeepAlive property to address that issue.  You will likely have to enable the same TCP-level keepalive on your HTTP connection, too (ie, by calling the TIdHTTP.IOHandler.Binding.SetKeepAliveValues() method).

 

Otherwise, try switching to a different protocol, like WebSocket, or a custom protocol, that will let you send pings/pongs periodically to keep the connection alive over long periods of idleness.

Edited by Remy Lebeau

I can confirm that wrapping the long lasting call in a Try...Finally block and enabling / disabling the KeepAlive function via SetKeepAliveValues solves the problem.

 

Thank you, Remy!


@Remy Lebeau Just yesterday a new issue was reported which is in direct connection with the solution in this thread.

 

The code is fairly simple:

procedure TCustomActionCallerThread.Execute;
begin
  V_CONNECTION.IndyHttpClient.Socket.Binding.SetKeepAliveValues(True, FTCPKeepAlive, FTCPKeepAlive);
  Try
    try
      V_CONNECTION.DoRequest(FActionName, FRequestXml, FResult);
    except
      on E: Exception do
        FResultException := Exception(AcquireExceptionObject);
    end;
  Finally
    V_CONNECTION.IndyHTTPClient.Socket.Binding.SetKeepAliveValues(False, 0, 0);
  End;
end;

The DoRequest call (where the HTTP communication actually takes place) is quick (less than a second), but the code in the finally block (to disable the keepalive) throws an exception: Socket Error # 10038  Socket operation on non-socket.

To make things more interesting, if the application is built with Delphi 10.4.1, it completely freezes. 10.4.2 only throws the exception but the operation finishes successfully.

 

Do you have an idea why this error might appear? We use HTTP keepalive, so the socket should still exist after the DoRequest call. Is there a check which I can use as a condition to prevent this from happening?

Also, do you happen to know of any difference between the Indy versions in 10.4.1 and 10.4.2? I cannot really find a logical explanation for the difference in behavior.

 

Thank you!

2 hours ago, aehimself said:

Do you have an idea why this error might appear?

HTTP is stateless, so it is possible that the TCP connection is closed at the end of the HTTP response, in which case TIdHTTP would close the socket before DoRequest() exits.  Check if the Binding.HandleAllocated property is true before calling SetKeepAliveValues():

procedure TCustomActionCallerThread.Execute;
begin
  V_CONNECTION.IndyHttpClient.Socket.Binding.SetKeepAliveValues(True, FTCPKeepAlive, FTCPKeepAlive);
  Try
    ...
  Finally
    if V_CONNECTION.IndyHTTPClient.Socket.Binding.HandleAllocated then
      V_CONNECTION.IndyHTTPClient.Socket.Binding.SetKeepAliveValues(False, 0, 0);
  End;
end;

In fact, because HTTP is stateless, it is possible that there is no TCP connection yet when DoRequest() is called, in which case TIdHTTP would have to call Connect() internally.  So, you will have to account for that, as well:

procedure TCustomActionCallerThread.Execute;
begin
  if V_CONNECTION.IndyHTTPClient.Socket.Binding.HandleAllocated then
    V_CONNECTION.IndyHttpClient.Socket.Binding.SetKeepAliveValues(True, FTCPKeepAlive, FTCPKeepAlive);
  Try
    ...
  Finally
    if V_CONNECTION.IndyHTTPClient.Socket.Binding.HandleAllocated then
      V_CONNECTION.IndyHTTPClient.Socket.Binding.SetKeepAliveValues(False, 0, 0);
  End;
end;

In case there is no TCP connection yet before DoRequest() is called, you would have to use TIdHTTP's OnSocketAllocated, OnAfterBind, OnConnected, or OnStatus event to know when the Binding has a new socket assigned before you can then call SetKeepAliveValues() on it (there are no events to indicate when each HTTP request is started/finished).

2 hours ago, aehimself said:

We use HTTP keepalive, so the socket should still exist after the DoRequest call.

That is not a guarantee.  An HTTP keep-alive is a request from the client to the server, but the server is not obligated to honor that request.  It is the server's decision whether the TCP connection stays open or not after the response is sent.  Look at the TIdHTTP.Response.KeepAlive property after DoRequest() exits, if it is false then TIdHTTP would have closed the socket.
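
A minimal sketch of that check, assuming AHttp is the TIdHTTP instance that just ran the request. The unit and procedure names are made up for illustration, and the extra Assigned() guards are a precaution rather than something Indy is known to require:

```delphi
unit KeepAliveGuard; // hypothetical unit name, not part of Indy

interface

uses
  IdHTTP, IdIOHandlerSocket, IdSocketHandle;

// Turns TCP keepalive off again, but only if the socket survived the request.
procedure DisableTcpKeepAliveIfOpen(AHttp: TIdHTTP);

implementation

procedure DisableTcpKeepAliveIfOpen(AHttp: TIdHTTP);
begin
  // Response.KeepAlive = False means the server chose to close the
  // connection, so TIdHTTP will already have closed the socket.
  if not AHttp.Response.KeepAlive then
    Exit;
  // Even when KeepAlive is True, confirm that a handle really exists.
  if Assigned(AHttp.Socket) and Assigned(AHttp.Socket.Binding) and
     AHttp.Socket.Binding.HandleAllocated then
    AHttp.Socket.Binding.SetKeepAliveValues(False, 0, 0);
end;

end.
```

The HandleAllocated test alone is enough to avoid error 10038; the Response.KeepAlive test just makes the reason for skipping the call explicit.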

2 hours ago, aehimself said:

Also, do you happen to know of any difference between the Indy versions in 10.4.1 and 10.4.2?

In recent years, I don't know which Indy revision has gone into each IDE release.  When Indy was using SVN, I would tag each revision that Embarcadero released.  But after Indy switched to GitHub (consequently losing its build numbers), I haven't been tagging the releases anymore.  I do know that there were like a dozen checkins made to Indy between Delphi 10.4.0, 10.4.1, and 10.4.2.  But offhand, I don't see anything in the change history that should affect handling of the underlying socket as you describe.  However, Indy's source code is included with each IDE release, so you should be able to do a local diff between the versions.


I decided to implement both. Upon enabling the TCP keepalive, SetKeepAliveValues is called if a handle is already allocated; there is also now an OnSocketAllocated handler which checks whether the keepalive was enabled and, if so, calls SetKeepAliveValues.

This way the exception disappeared; your guess was right, there was no handle allocated after the DoRequest call.
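
For anyone landing here later, the combined approach can be sketched roughly as follows. This is not the production code: TTcpKeepAliveHelper and its members are hypothetical names, and it assumes OnSocketAllocated is assignable on TIdHTTP in your Indy revision, as suggested earlier in the thread.

```delphi
unit KeepAliveHelper; // hypothetical unit, not part of Indy

interface

uses
  IdHTTP, IdIOHandlerSocket, IdSocketHandle;

type
  TTcpKeepAliveHelper = class
  private
    FHttp: TIdHTTP;
    FEnabled: Boolean;
    FIntervalMS: Integer;
    procedure ApplyIfAllocated;
    procedure HandleSocketAllocated(Sender: TObject);
  public
    constructor Create(AHttp: TIdHTTP; AIntervalMS: Integer);
    procedure Enable;
    procedure Disable;
  end;

implementation

constructor TTcpKeepAliveHelper.Create(AHttp: TIdHTTP; AIntervalMS: Integer);
begin
  inherited Create;
  FHttp := AHttp;
  FIntervalMS := AIntervalMS;
  // Covers the case where TIdHTTP connects internally during the request.
  FHttp.OnSocketAllocated := HandleSocketAllocated;
end;

procedure TTcpKeepAliveHelper.ApplyIfAllocated;
begin
  // Skip silently when there is no socket, which avoids error 10038.
  if Assigned(FHttp.Socket) and Assigned(FHttp.Socket.Binding) and
     FHttp.Socket.Binding.HandleAllocated then
  begin
    if FEnabled then
      FHttp.Socket.Binding.SetKeepAliveValues(True, FIntervalMS, FIntervalMS)
    else
      FHttp.Socket.Binding.SetKeepAliveValues(False, 0, 0);
  end;
end;

procedure TTcpKeepAliveHelper.HandleSocketAllocated(Sender: TObject);
begin
  // A new socket was just created; apply keepalive only if it was requested.
  if FEnabled then
    ApplyIfAllocated;
end;

procedure TTcpKeepAliveHelper.Enable;
begin
  FEnabled := True;
  ApplyIfAllocated;
end;

procedure TTcpKeepAliveHelper.Disable;
begin
  FEnabled := False;
  ApplyIfAllocated;
end;

end.
```

The long-running call is then wrapped as Enable, try, the request, finally, Disable, the same shape as the Execute method above.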

 

As I could not reproduce the freezing of the application I can not confirm whether that disappears or not... guess time will tell sooner or later.

 

Thank you!

8 hours ago, aehimself said:

I decided to implement both. Upon enabling the TCP keepalive, SetKeepAliveValues is called if a handle is already allocated; there is also now an OnSocketAllocated handler which checks whether the keepalive was enabled and, if so, calls SetKeepAliveValues.

Makes me wonder now if I should add a NATKeepAlive property to TIdHTTP, similar to what TIdFTP has. I've opened a ticket for that: https://github.com/IndySockets/Indy/issues/413

Edited by Remy Lebeau

15 hours ago, Remy Lebeau said:

Makes me wonder now if I should add a NATKeepAlive property to TIdHTTP, similar to what TIdFTP has. I've opened a ticket for that: https://github.com/IndySockets/Indy/issues/413

Honestly, I don't know if it would make any sense. It's your decision in the end. It would have made my implementation easier (and error-free on the first try 🙂), but our implementation is clearly wrong here. In normal operation the HTTP protocol should work in "bursts": request something, get an answer, repeat until all done.

 

If someone else needs a TCP-level keep-alive in HTTP instead of fixing the real issue (like in my case), these 3 lines of extra code seem very well deserved. Plus, the solution is now publicly accessible in this thread.

7 hours ago, aehimself said:

In normal operation the HTTP protocol should work in "bursts": request something, get an answer, repeat until all done.

Under most conditions, yes.  I was thinking more along the lines of when server-side pushes are used and there are delays between events, or when a client asks to use HTTP keep-alives and there are delays between requests, etc.  Things where the connection is left open but sitting idle for periods of time.

4 hours ago, Remy Lebeau said:

when server-side pushes

Aren't these supposed to be services (like Apple's or Google's notification service) or websockets...?

1 hour ago, aehimself said:

Aren't these supposed to be services (like Apple's or Google's notification service) or websockets...?

Those are certainly ways to handle server pushes, but there are also HTTP-based push models as well:

  • multipart/x-mixed-replace (the original server push, dating all the way back to Netscape, and still supported by most browsers today).
  • text/event-stream (built-in to HTML5).
  • HTTP/2 (which Indy doesn't support at this time) has server pushing built-in at the protocol layer.
  • etc...

8 hours ago, Fr0sT.Brutal said:

Chunked encoding also

That encoding is really meant as just a means of transporting streaming data for a single resource, rather than pushing individual pieces of data. The response would have to use a media type where the receiver knows that each chunk carries a new piece of data that replaces old data.  I'm not aware of any standard media types that use chunked encoding in that manner.  But custom media types certainly could, I suppose, as long as you are in control of both sender and receiver.  Also, chunked encoding is hop-by-hop, not end-to-end, so it is subject to interference by proxies.
