Lars Fosdal

TIdTCPClient fails to discover a lost connection


We are experiencing a really weird problem.

We have a TIdTCPClient socket that connects to a server of ours - which uses the TIdTCPServer component.

The client-side and server-side code have been unchanged since before 2017, and had been running well until we rolled out our first version built with Delphi 10.4.1.

 

We started getting complaints that the clients didn't update when expected.  

The server connection list no longer shows the clients as connected - so that makes sense in a way.


What doesn't make sense is that the client does not discover the disconnect on IdTCPClient.Connected nor on IdTCPClient1.Socket.InputBufferIsEmpty.

What is even crazier is that IdTCPClient.Socket.Write(OutBuf); also executes happily, with no concern for there being nothing on the other end.

Edit: Actually, that Write did cause a Debug Output: TCP Client error: Encapsulation event: Unrecoverable throw caught. Trying to rectify by recreating TCP-Client..... @ TPSD.MyHandlerForReceptionOfTCPClientTlgs (logid: 82911125). It just had to time out, and suddenly we are reconnected - but for how long?


We are not able to willfully reproduce the disconnect by taking down the server or by breaking the VPN connection for a freshly started client.
If we do that, the client reconnects, so it seems to be related to the connection having been idle for some period of time.
 

It works in 10.3.3 - are there changes in Indy for 10.4.1 that could be breaking something?

  

2 hours ago, Lars Fosdal said:

What doesn't make sense is that the client does not discover the disconnect on IdTCPClient.Connected nor on IdTCPClient1.Socket.InputBufferIsEmpty.

The only way to detect a closed connection is to perform I/O on the connection and see if it fails.  The Connected() method performs a read operation internally.  That will detect a *graceful* disconnect fairly quickly.  But if the connection is *lost abnormally* (network issues, etc), then by default it may take the OS a long while (minutes, hours) to invalidate the connection, during which time I/O operations will not fail (reads will simply report no data is available to read, and sends will be buffered until the buffer fills up).  So, the best way to handle this is to employ timeouts/keepalives in your own protocol code and just close the connection yourself if you detect it is not being responsive enough for your needs.  Or, at the very least, enable keepalives at the TCP layer (see the TIdTCPClient.Socket.Binding.SetKeepAliveValues() method) so the OS can invalidate a lost connection sooner rather than later.
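
For example, on the client end the TCP-level keep-alive could be enabled right after connecting. A minimal sketch (the procedure name and the 30s/5s values are just examples, not from the thread):

uses
  IdTCPClient;

procedure ConnectWithKeepAlive(AClient: TIdTCPClient);
begin
  AClient.Connect;
  // Ask the OS to probe the connection when it goes quiet: first probe after
  // 30 seconds of silence, then every 5 seconds (example values - tune as needed).
  AClient.Socket.Binding.SetKeepAliveValues(True, 30000, 5000);
end;

On Windows this maps to the SIO_KEEPALIVE_VALS ioctl, so the first probe comes after 30 seconds of silence instead of the Windows default of roughly two hours.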

2 hours ago, Lars Fosdal said:

What is even crazier is that IdTCPClient.Socket.Write(OutBuf); also executes happily, with no concern for there being nothing on the other end.

See above.  In the case of an abnormal connection loss, the OS will happily carry on with I/O operations until the connection has been invalidated.  In the meantime, outbound data will simply be queued in the socket's internal buffer, until the buffer eventually fills up, blocking subsequent writes.

2 hours ago, Lars Fosdal said:

Edit: Actually, that Write did cause a Debug Output: TCP Client error: Encapsulation event: Unrecoverable throw caught. Trying to rectify by recreating TCP-Client..... @ TPSD.MyHandlerForReceptionOfTCPClientTlgs (logid: 82911125). It just had to time out, and suddenly we are reconnected - but for how long?

The default timeout is up to the OS to decide, which is why you should rely on your own timeouts/keepalives instead.

2 hours ago, Lars Fosdal said:

We are not able to willfully reproduce the disconnect by taking down the server or by breaking the VPN connection for a freshly started client.

That is because modern networking software is usually smart enough to invalidate existing connections fairly quickly when users do things like that manually.  That wasn't always the case in the past.

2 hours ago, Lars Fosdal said:

If we do that, the client reconnects, so it seems to be related to the connection having been idle for some period of time.

Neither Indy, nor the OS, care about connections being idle, as long as they are truly alive.  But, your network might care about idleness, depending on its setup.

2 hours ago, Lars Fosdal said:

It works in 10.3.3 - are there changes in Indy for 10.4.1 that could be breaking something?

No.



Thank you, @Remy Lebeau! I also asked our networking team about any recent changes to the network firewalls and routers. 

 

 

Which is preferable - the client or the server initiating the keep-alive?

 

We did actually start a server-side test today, using a mechanism that was designed in from the start but has been configured off for years: if the connection is idle, the server sends a "Hello" packet every 5 seconds.

Guest
16 minutes ago, Lars Fosdal said:

5 seconds. 

That is way too frequent for a ping. I use either 45 seconds or 3 minutes, measured from the last received packet. Most of the time I build it so that each and every packet carries a timestamp, which is a tick count acquired from the server after the handshake; that way the server and all the clients share the same time base, and you can correct the time whenever you see fit.

If you don't have a timestamp per packet, you can drive it with a timer as you wish. One thing I do, though, is a ping-pong-pong exchange: the client ends up measuring the round-trip time and the delay of its outgoing traffic to the server, and the server gets the result as well because the client sends its timing back. This gives you a great view of all the connections; you only have to track the last packet time and the delay (lag) per connection.

 

19 minutes ago, Lars Fosdal said:

Which is preferable - the client or the server initiating the keep-alive?

The client, to take the stress of tracking these details with timers off the server. Only one background thread on the server needs to run a check every t/3 for the time of the last received packet and drop stale connections. That means if the client is supposed to send a ping every 45 seconds, then in the worst case the server will detect a dead connection about 60 seconds after the last packet. One minute is also a good interval, roughly what a router needs to reboot and reconnect.
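
As a rough illustration of that server-side check (not from the thread - TMyContext, LastPacketTicks and TStaleConnectionChecker are hypothetical names), one watchdog thread scanning the server's context list every t/3 could look like this:

uses
  System.Classes, IdGlobal, IdContext, IdCustomTCPServer, IdTCPServer;

type
  // Per-connection context; the server's OnExecute handler would refresh
  // LastPacketTicks (from Ticks64) every time it reads a packet from the client.
  TMyContext = class(TIdServerContext)
  public
    LastPacketTicks: TIdTicks;
  end;

  // Single background watchdog for the whole server.
  TStaleConnectionChecker = class(TThread)
  private
    FServer: TIdTCPServer;
    FTimeoutMS: UInt32;
  protected
    procedure Execute; override;
  public
    constructor Create(AServer: TIdTCPServer; ATimeoutMS: UInt32);
  end;

constructor TStaleConnectionChecker.Create(AServer: TIdTCPServer; ATimeoutMS: UInt32);
begin
  inherited Create(False);
  FServer := AServer;
  FTimeoutMS := ATimeoutMS;
end;

procedure TStaleConnectionChecker.Execute;
var
  List: TIdContextList;
  I: Integer;
  Ctx: TMyContext;
begin
  while not Terminated do
  begin
    TThread.Sleep(FTimeoutMS div 3);                 // check every t/3, as suggested
    List := FServer.Contexts.LockList;
    try
      for I := 0 to List.Count - 1 do
      begin
        Ctx := TMyContext(List[I]);
        if GetTickDiff64(Ctx.LastPacketTicks, Ticks64) > FTimeoutMS then
          Ctx.Connection.Disconnect;                 // silent for too long - drop it
      end;
    finally
      FServer.Contexts.UnlockList;
    end;
  end;
end;

For this to work, IdTCPServer1.ContextClass := TMyContext; has to be assigned before the server is activated.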

4 hours ago, Lars Fosdal said:

Which is preferable - the client or the server initiating the keep-alive?

That depends on which keep-alive you are referring to.  If you mean the TCP level keep-alive, then it is a setting of the local TCP stack, so I would suggest enabling it on both ends, for good measure.  But, if you are referring to a keep-alive in your protocol level communications, then who initiates the keep-alive depends on the particular design of that protocol.
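
On the server end the same call can be made per accepted client, for example from an OnConnect handler (the form and handler names are illustrative, values are examples):

uses
  IdContext;

procedure TMainForm.IdTCPServer1Connect(AContext: TIdContext);
begin
  // Probe a quiet client after 30 seconds, then every 5 seconds (example values).
  AContext.Binding.SetKeepAliveValues(True, 30000, 5000);
end;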

4 hours ago, Lars Fosdal said:

We did actually start a server-side test today, using a mechanism that was designed in from the start but has been configured off for years: if the connection is idle, the server sends a "Hello" packet every 5 seconds.

That is a protocol-level keep-alive.  If your server is already doing that, then you don't need to enable a TCP level keep-alive on the server end.  If it sends a "Hello" and does not get a response back within a few seconds, close the connection.  On the client side, if it knows it is idle and should be expecting "Hello" packets, then you don't really need to enable a TCP keep-alive on the client end, either.  Just start a timer for 5-ish seconds and if it elapses before a "Hello" packet is received then close the connection, repeating for each expected "Hello".  Does the client ever send its own "Hello" packets to the server, if it doesn't see any server-sent "Hello"s for awhile?
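
A minimal sketch of that client-side check (names like TClientReaderThread and ProcessTelegram are hypothetical, a waiting thread is used instead of a timer, and a line-based exchange is assumed purely for the example), using the IOHandler's read timeout so that prolonged silence is treated as a dead link:

uses
  System.Classes, IdTCPClient;

type
  TClientReaderThread = class(TThread)
  private
    FClient: TIdTCPClient;   // already-connected client, owned elsewhere
    procedure ProcessTelegram(const ALine: string);
  protected
    procedure Execute; override;
  end;

procedure TClientReaderThread.ProcessTelegram(const ALine: string);
begin
  // hand the telegram over to the application (placeholder)
end;

procedure TClientReaderThread.Execute;
var
  Line: string;
begin
  // Expect a "Hello" (or real data) at least every 5 seconds; allow some slack.
  FClient.IOHandler.ReadTimeout := 7000;
  while not Terminated do
  begin
    Line := FClient.IOHandler.ReadLn;        // returns '' and sets ReadLnTimedout on timeout
    if FClient.IOHandler.ReadLnTimedout then
    begin
      FClient.Disconnect;                    // nothing heard in time - assume the link is dead
      Break;                                 // let the owner reconnect and restart the reader
    end;
    if Line <> 'Hello' then
      ProcessTelegram(Line);
  end;
end;

A real implementation would also wrap the read in a try/except so that EIdConnClosedGracefully and socket errors are treated as a disconnect.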


7 hours ago, Remy Lebeau said:

Does the client ever send its own "Hello" packets to the server, if it doesn't see any server-sent "Hello"s for awhile?

 
 
 
 
 

No, not currently. There has never been a need.

 

The test last night showed that all the test clients remained connected after enabling the 5-second Hello on the server side.

That means I can go to the networking guys to ask them to dig a little deeper, unless... there was something in a Windows patch.

 

We don't use timers - but waiting threads.

I'll put the TCP level keep-alive in the back of my head for future exploration.

 

@Kas Ob. The apps and servers run in a closed network with a limited number of clients and telegrams. A Hello every 5 seconds doesn't add much load, and at this time I think I will keep it server-side, as it works well enough. That said - the timeout period is a parameter, so we can increase it if we want to.


@Remy Lebeau - My network guy came back and revealed that they had changed firewalls from one brand to another, and that the new firewalls have a much shorter keep-alive default - so that explains why this problem suddenly revealed itself.
Thank you for your kind help!

