Hello,
Most probably this is NOT an issue with ICS, as I experienced the very same symptom with TServerSocket before I made the switch. I'm mainly looking for tips on where I can start debugging the issue, as for the time being I'm completely out of ideas.
I have an application which connects to a server on a single TCP socket. On average, 80 bytes (binary) are sent from each client to the server every minute, in one packet. The TCP channel is unidirectional: messages only go from the client to the server. Everything works perfectly until a seemingly random time, when the data from a seemingly random client is no longer received. The TCP connection is still established, the client is still sending the packet, and WireShark confirms that it arrives at the server machine. It seems that the socket's receive event stops firing. What is even more interesting is that it affects random clients (with different OSes, sometimes Windows 2000, sometimes 2012 R2, sometimes 2019) and only causes one client to get stuck at a time, but multiple clients can get stuck over time. The application can remain in this state for days without any memory increase (so I'm not inflating the local buffer endlessly without triggering the data processing) and without memory or handle leaks. If I restart the client or the server, forcing the client to reconnect, everything goes back to normal.
As for a little background, the very same logic was working perfectly when the binary data was converted to, and sent as, text. By switching to binary, the size of the sent data was reduced from 200-500 bytes to 60-100 bytes. I don't know why, but I suspect this change triggered the error I'm seeing now, maybe because of the smaller data size.
https://docs.microsoft.com/en-us/troubleshoot/windows/win32/data-segment-tcp-winsock mentions that TCP is not really efficient with unidirectional, small data packets, but that should only result in delivery delays. To me it seems to be irrelevant.
The sending code looks something like this (TBufferLength = Word):
Function TCommunicationEngine.Send(Const inText: String): Boolean;
Var
  buf, len: TBytes;
  sent: Integer;
Begin
  Result := False;
  Try
    // Step 1 - String to TBytes
    buf := TEncoding.UTF8.GetBytes(inText);
    // Step 2 - Encryption of "buf"
    // ...
    // If the buffer exceeds the maximum length allowed, raise an error as it can not be sent!
    If Length(buf) > TBufferLength.MaxValue Then Raise ETCPPortError.Create('Buffer overflow, cannot send ' + Length(buf).ToString + ' bytes!');
    // Step 3 - Append the length of the buffer to the beginning of the buffer
    SetLength(len, SizeOf(TBufferLength));
    PBufferLength(@len[0])^ := Length(buf);
    SetLength(buf, Length(buf) + Length(len));
    Move(buf[0], buf[Length(len)], Length(buf) - Length(len));
    Move(len[0], buf[0], Length(len));
    // Step 4 - Send the completed buffer
    sent := _tcpport.Send(@buf[0], Length(buf));
    // Step 5 - Post-sending verifications
    If sent < 1 Then Raise ETCPPortError.Create('No data was sent!');
    Log(LOG_TCP, '> ' + BytesToString(buf) + ' (' + sent.ToString + ' bytes)');
    Result := True;
  Except
    On E:Exception Do HandleException(E, 'while sending data');
  End;
End;
The receiving block looks like this (TClientConnection is a descendant of TWSocketClient, _data is a strict private TBytes, _pos is a strict private Integer):
Procedure TClientConnection.ConnectionDataAvailable(inSender: TObject; inError: Word);
Var
  buf: TBytes;
  need, len, read: Integer;
  debuglog: String;
Begin
  // Note that due to how TCP works, if packets are arriving at high speed they might be appended to one single ReceiveText event.
  If BanList.IsBanned(Self.PeerAddr) Then Self.Close // If the IP where the data is coming from is banned, disconnect
  Else Begin
    len := Self.RcvdCount;
    If len = 0 Then Exit;
    Repeat
      debuglog := Self.PeerAddr + ' > Read cycle starts. received data size: ' + len.ToString + ', socket data size: ' + Length(_data).ToString + ', position: ' + _pos.ToString + '. ';
      If _pos = 0 Then Begin
        // Position is 0 = there is no fragment. Read the data size first
        If len < SizeOf(Word) Then Begin
          BanList.Failed(Self.PeerAddr, 'Packet size is incorrect');
          Self.Close; // Packet is corrupted, reset the connection
          Exit;
        End;
        SetLength(buf, SizeOf(TBufferLength));
        Self.Receive(@buf[0], Length(buf));
        // buf now contains the data size. Resize socket's data length
        SetLength(_data, PBufferLength(@buf[0])^);
        // As the data size is read out, reduce the received length
        len := len - Length(buf);
        debuglog := debuglog + 'Prepared a ' + Length(_data).ToString + ' byte buffer. ';
      End;
      need := Length(_data) - _pos;
      If need < 0 Then Begin
        // this should never happen. I'll just keep it here for debugging purposes...
        Log(LOG_STD, 'Possible memory corruption happened. Data size of ' + Self.PeerAddr + ' is ' + Length(_data).ToString + ', position is ' + _pos.ToString);
        Self.Close;
        Exit;
      End
      Else
      If need > 0 Then Begin
        If len < need Then SetLength(buf, len) // If we received fewer bytes than needed to fill the buffer, read everything
        Else SetLength(buf, need); // If we received at least as many bytes as needed to fill the buffer, only read what is needed
        debuglog := debuglog + 'Reading out ' + Length(buf).ToString + ' bytes. ';
        read := Self.Receive(@buf[0], Length(buf));
        If read > 0 Then Begin
          debuglog := debuglog + read.ToString + ' bytes read. ';
          // Something was read from the buffer. Append it to the socket's data
          Move(buf[0], _data[_pos], read);
          // Increase data position
          Inc(_pos, read);
          // Reduce received length
          len := len - read;
        End
        Else debuglog := debuglog + 'Nothing was read. ';
      End;
      If _pos = Length(_data) Then Begin
        Log(LOG_TCP, debuglog.TrimRight);
        Log(LOG_TCP, Self.PeerAddr + ' > ' + BytesToString(_data) + ' (' + _pos.ToString + ' bytes)');
        // Buffer is full. Process the data.
        // Decrypt the buffer...
        // ...
        Try
          ProcessLine(Self.PeerAddr, timestamp, TEncoding.UTF8.GetString(_data));
        Except
          On E:Exception Do Begin
            Log(LOG_STD, TranslateException(E, 'processing client data'));
            BanList.Failed(Self.PeerAddr, 'data processing error: ' + E.Message);
            Self.Close;
          End;
        End;
        _pos := 0;
        SetLength(_data, 0);
      End
      Else
      If Not debuglog.IsEmpty Then Log(LOG_TCP, debuglog.TrimRight);
    Until len = 0;
    If (Length(_data) > 0) Or (_pos > 0) Then Log(LOG_TCP, 'Storing a fragment for ' + Self.PeerAddr + ': Data size: ' + Length(_data).ToString + ', position: ' + _pos.ToString);
  End;
End;
I know that there are a couple of premature exits before the actual data processing, but even when I added temporary logging before these, none of them was reached.
I'll look into how to set TCP_NODELAY and SO_SNDBUF and will try them, but I doubt that they will make any difference. Until then, I'm really interested in hearing about any aspects that I have not even thought of so far.
I'm using ICS v8.64. The application is compiled with Delphi 10.4.1 as a 32-bit executable and runs as a Windows service on a Server 2012 R2 machine.
Any help is greatly appreciated 🙂