aehimself

WinSock (Indy) select() doesn't return on network


Hello,

 

We have an application built on Delphi 10.4.1 / 10.4.2 that communicates with a server using the Indy TIdHTTP component. It works perfectly, but there is one particular call where the result can take 1 to 1.5 hours to arrive, and this is where things get strange. The request is sent with idHttp.Post (using streams as the outgoing and incoming data buffers), and if the reply arrives within 30 minutes, all is fine. If it takes longer, somewhere between 30 minutes and one hour, the underlying WinSock select() never returns. The request is sent out, received and processed by the server, and the reply is sent back, but it is never received by the client. Wireshark shows that the moment the server sends the reply, the client issues a TCP retransmission. Maybe it thinks the data was lost but was unable to use the channel while waiting for data? Then the reply arrives, the channel becomes free, and it sends the retransmission but discards the data received? These are just guesses; I'm not very familiar with this low-level functionality of WinSock.

Oh, one more thing: this issue is NOT present if the server and the client are on the same machine; connecting to localhost makes a difference.

 

Stack trace where the application stops is as follows:

:772729dc ntdll.ZwWaitForSingleObject + 0xc
:74417555 ; C:\WINDOWS\SysWOW64\mswsock.dll
:751c5f1e WS2_32.select + 0xce
IdStackWindows.TIdSocketListWindows.FDSelect(???,???,nil,???)
IdStackWindows.TIdSocketListWindows.SelectRead(-2)
IdSocketHandle.TIdSocketHandle.Select(???)
IdSocketHandle.CheckIsReadable(???)
IdSocketHandle.TIdSocketHandle.Readable(-2)
IdIOHandlerStack.TIdIOHandlerStack.Readable(???)
IdIOHandler.TIdIOHandler.ReadFromSource(True,-2,False)
IdIOHandler.TIdIOHandler.ReadLn(#$A,-1,16384,TIdASCIIEncoding($1A138AD4) as IIdTextEncoding)
IdIOHandler.TIdIOHandler.ReadLn(nil)
IdHTTP.TIdCustomHTTP.InternalReadLn
IdHTTP.TIdCustomHTTP.DoRequest(???,'http://10.0.2.53:12345/BIN',$2E6D75A0,$2E6D73A0,(...))
IdHTTP.TIdCustomHTTP.Post('http://10.0.2.53:12345/BIN',$14E0F58,$2E6D73A0)

Before you say anything, I know this is a bad design. We shouldn't block on long-running operations but poll for the result over HTTP instead. What I'd like to know is what happens and why, so I can judge whether patching the mess is worth it or whether we should jump straight to refactoring.
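
For reference, the polling design mentioned above can be sketched roughly as follows. This is a Python illustration, not code from the application: `poll_until` and its parameters are hypothetical names, and `check` stands in for a short HTTP status request that returns `None` until the server-side job has finished.

```python
import time


def poll_until(check, interval_s, timeout_s):
    """Call check() repeatedly until it returns a non-None result.

    check is a hypothetical callable standing in for a short
    "is the job done yet?" HTTP request.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        # Each poll is its own short request, so no connection has to
        # sit idle for the whole duration of the job.
        time.sleep(interval_s)
```

Because every poll is a separate short exchange, no single connection has to survive a long idle period.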

 

Cheers!
 


My guesses are:

  1.  The TCP connection went into TIME_WAIT on the client.  Netstat is the only tool I am aware of that will show you this, so you need to run it in a loop and capture the output.
  2. A router killed it to conserve memory.
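
The netstat loop from point 1 can be scripted. Below is a minimal sketch in Python, assuming the Windows `netstat -an` column layout (Proto, Local Address, Foreign Address, State). The server address `10.0.2.53:12345` is taken from the stack trace above; the function names are made up for illustration.

```python
import subprocess
import time
from datetime import datetime


def parse_netstat(output, remote):
    """Return (local, foreign, state) for TCP rows whose foreign address matches."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Windows TCP rows look like: TCP <local> <foreign> <state>
        if len(parts) >= 4 and parts[0].upper() == "TCP" and parts[2] == remote:
            rows.append((parts[1], parts[2], parts[3]))
    return rows


def watch(remote, interval_s=30):
    """Print a timestamped snapshot of matching connections until interrupted."""
    while True:
        out = subprocess.run(
            ["netstat", "-an"], capture_output=True, text=True
        ).stdout
        for local, foreign, state in parse_netstat(out, remote):
            stamp = datetime.now().isoformat(timespec="seconds")
            print(stamp, local, foreign, state)
        time.sleep(interval_s)

# Example call (runs until Ctrl+C): watch("10.0.2.53:12345")
```

Redirecting the output to a file gives a timeline that can be compared against the Wireshark capture.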

The registry reference you are looking for is here (older version, but you can adjust):  https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc757512(v=ws.10)

 

It is a mix of the timed-wait and keepalive settings that you try to tune.  I have had only mixed success trying to keep long-lived DB connections open; the keepalives are often killed at the firewall / router level.  A lot of modern network equipment tracks those connections and kills them to save router memory.

 

I hope it helps you get some insight.


Checking the TCP state with NetStat is a great idea, I don't know why I didn't think about this! Will do the check, thanks!


There are two connections between the client and the server at this state, both show up as ESTABLISHED.

Damn, I really wanted to pass this on to the NW guys 🙂


The client only knows about and reacts to what it sends and receives, so the connection could stay ESTABLISHED. Sending keepalives is still a possible strategy.

A timer on a router can still expire without the router sending anything to the client(s). At that point it should be a retransmit.

 

It is still worth asking your network guys whether there is anything in the middle, especially a load balancer in front of that HTTP server.

Edited by SwiftExpat

7 hours ago, SwiftExpat said:

My guesses are:

  1.  The TCP connection went into TIME_WAIT on the client.  Netstat is the only tool I am aware of that will show you this, so you need to run it in a loop and capture the output.
  2. A router killed it to conserve memory.

That would be my guess.  Most likely a network router thinks the HTTP connection has been idle for too long and is killing the connection.  This is a common problem in FTP, for instance, where a long file transfer on a data connection leaves the command connection idle.  The TIdFTP component has a NATKeepAlive property to address that issue.  You will likely have to enable the same TCP-level keepalive on your HTTP connection, too (i.e., by calling the TIdHTTP.IOHandler.Binding.SetKeepAliveValues() method).
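
To illustrate what a TCP-level keepalive amounts to at the socket layer, here is a minimal sketch in Python. It is not Indy code: `enable_keepalive` and its default values are assumptions chosen for illustration, and the idle time must be shorter than whatever idle timeout the router or firewall applies.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10):
    """Turn on TCP keepalive probes for a socket (hypothetical helper).

    idle_s: seconds of silence before the first probe is sent.
    interval_s: seconds between further probes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms)
        sock.ioctl(
            socket.SIO_KEEPALIVE_VALS,
            (1, idle_s * 1000, interval_s * 1000),
        )
    else:
        # Linux and similar: per-socket options in seconds, where defined
        if hasattr(socket, "TCP_KEEPIDLE"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        if hasattr(socket, "TCP_KEEPINTVL"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
```

With probes flowing every few seconds or minutes, intermediate devices see regular traffic on the connection and have no reason to treat it as idle.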

 

Otherwise, try switching to a different protocol, like WebSocket, or a custom protocol, that will let you send pings/pongs periodically to keep the connection alive over long periods of idleness.
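
As a rough illustration of the ping/pong idea, the following Python sketch uses a made-up line-based protocol (`PING`, `PONG`, `RESULT ...`); it is not WebSocket and not part of Indy. The server side pings while the job runs, the client answers each ping, and the result follows at the end. All names and message formats are assumptions.

```python
import socket
import threading
import time


def serve_job(conn, work_s, ping_every_s):
    """Hypothetical server side: ping while the job runs, then send the result."""
    deadline = time.monotonic() + work_s
    with conn, conn.makefile("rb") as rfile:
        while time.monotonic() < deadline:
            time.sleep(ping_every_s)  # stands in for a slice of real work
            conn.sendall(b"PING\n")
            if rfile.readline() != b"PONG\n":
                break  # peer vanished; give up on this job
        else:
            conn.sendall(b"RESULT done\n")


def wait_for_result(conn):
    """Client side: answer pings until the result line arrives."""
    pings = 0
    with conn.makefile("rb") as rfile:
        for line in rfile:
            if line == b"PING\n":
                pings += 1
                conn.sendall(b"PONG\n")
            elif line.startswith(b"RESULT "):
                return line[len(b"RESULT "):].decode().strip(), pings
    raise ConnectionError("connection closed before a result arrived")


def demo():
    """Run both sides over a local socket pair."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_job, args=(server, 0.2, 0.05))
    worker.start()
    try:
        return wait_for_result(client)
    finally:
        worker.join()
        client.close()
```

Unlike TCP keepalive, this traffic is ordinary application data, so each side also learns promptly when the other has gone away.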

Edited by Remy Lebeau

I can confirm that wrapping the long-running call in a try...finally block and enabling / disabling the keepalive via SetKeepAliveValues solves the problem.

 

Thank you, Remy!

