Eric Bonilha 3 Posted February 25

Hello!

I'm trying to diagnose an issue with our application and would like to pick the minds of the experts here.

The application has a TCP server running on the main thread, so all connections are processed by the main thread. The issue is that if the main thread is somewhat busy (not even very busy), some connections are refused outright by the OS when clients try to access the application.

We have this application running on thousands and thousands of machines, and this is a VERY RARE occurrence. Even if the main thread is busy processing something else when an inbound connection is pending, the OS should keep that connection in a backlog. The client might wait some time for a reply, but the connection is established as soon as the thread accepts it. I believe this is the standard way the OS works; please correct me if I'm wrong. I don't know why, in some very rare cases like this one, many connections are refused outright when the thread is only slightly busy.

Do you have any idea why? Could this be some setting in the OS? This is Windows Server (I have to remotely access the customer's machine to get the specific version).

I tried increasing the ListenBacklog value to high values like 200 (instead of the default 15), but the problem persists. In the latest tests I did, I could see about 14 or 15 connections being accepted and processed, and then all other connections (we were opening about 80 connections simultaneously) were immediately refused.

Any ideas or suggestions are appreciated!

Thanks
Eric
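The test described above (a burst of simultaneous connections against a listener that is too busy to accept) can be reproduced in isolation. The following is a minimal sketch in Python, not the application's Delphi/ICS code; the function name burst_probe and its parameters are made up for illustration. It opens a listening socket that never calls accept, fires a burst of clients at it, and counts how each attempt ended. Results depend on the OS: a full backlog may show up as refusals or as timeouts, so the sketch only reports what it observes.

```python
import socket
import threading
from collections import Counter


def burst_probe(backlog, clients, timeout=0.5):
    """Open a listener that never accepts, then fire `clients` connects at once.

    Returns a Counter of outcomes: connected / refused / timeout / other_error.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(backlog)             # the backlog value under test
    host, port = listener.getsockname()

    results = Counter()
    client_socks = []
    lock = threading.Lock()

    def attempt():
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            outcome = "connected"
        except ConnectionRefusedError:
            outcome = "refused"
        except socket.timeout:
            outcome = "timeout"
        except OSError:
            outcome = "other_error"
        with lock:
            results[outcome] += 1
            client_socks.append(sock)    # keep sockets open until the burst ends

    threads = [threading.Thread(target=attempt) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    for sock in client_socks:
        sock.close()
    listener.close()
    return results


if __name__ == "__main__":
    # Compare the default-sized backlog with the raised one from the post.
    for backlog in (15, 200):
        print(backlog, dict(burst_probe(backlog, clients=80, timeout=1.0)))
```

Running it with both backlog values on the affected machine would show whether the larger backlog actually changes how many of the 80 attempts get through.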
Remy Lebeau 1512 Posted February 26 (edited)

20 hours ago, Eric Bonilha said: "In the latest tests I did, I could see about 14 or 15 connections being accepted and processed, then all other connections (We were opening about 80 connections simultaneously) are immediately refused."

Is there a firewall, load balancer, or other similar system running in front of the server?

Typically, a "connection refused" error has one of these causes:
- the connection is trying to access a port that is not open.
- the connection reaches the port but the backlog is full.
- the connection is rejected by a WSAAccept() callback.
- a firewall or other system in front of the server is blocking the connection.

Edited Wednesday at 05:59 PM by Remy Lebeau
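To tell these cases apart from the client side, it helps to record exactly how each connection attempt fails. Below is a minimal Python sketch (the helper name classify_connect is hypothetical) that separates an active refusal from a silent timeout. The hints are rules of thumb, not guarantees: a refusal cannot by itself distinguish a closed port from a full backlog on Windows, and a timeout often, but not always, points at something dropping packets silently.

```python
import socket


def classify_connect(host, port, timeout=1.0):
    """Try one TCP connection and return a (label, hint) pair."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected", "handshake completed"
    except ConnectionRefusedError:
        # The peer (or something in front of it) answered with a reset.
        return "refused", "closed port, full backlog, or active rejection"
    except socket.timeout:
        # Nothing answered within the timeout.
        return "timeout", "no answer; packets may be dropped silently"
    except OSError as exc:
        return "error", str(exc)
    finally:
        sock.close()


if __name__ == "__main__":
    # Port 9 is only an example of a port that is usually closed.
    print(classify_connect("127.0.0.1", 9))
```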
Kas Ob. 128 Posted February 26

8 hours ago, Eric Bonilha said: "Any ideas or suggestions are appreciated!"

Remy listed a few things, and I will add some more thoughts to follow up on.

Windows has its own built-in DDoS protection. It is almost useless, or rather very naïve, because anything more advanced would cause a wide range of problems and require more advanced knowledge to tweak, so Microsoft kept it as simple as possible with very few settings to adjust. Anyway:

1) Start with this link, and check your dynamic port range with "show dynamicport" (for example, "netsh int ipv4 show dynamicport tcp") before changing or adjusting anything: https://learn.microsoft.com/en-us/troubleshoot/windows-client/networking/tcp-ip-port-exhaustion-troubleshooting

2) It is disgusting how Microsoft manages to lose valuable information and documentation to 404 pages through stupid site mismanagement. I have very little time to write and search, and searching almost always lands me on a 404! I did find this, though: https://serverfault.com/questions/43252/how-can-i-harden-the-tcp-ip-stack-in-windows-server-2008

ECN can play a role, and there were some more registry settings; I will look for them later if you can't find the problem. In general, though, more information is needed, such as how many new connections are established per second and how long a connection stays connected on average ...
Angus Robertson 612 Posted February 26

Is the server dead once the problem arises, or does it start accepting connections again at some point?

The figure of 15 suggests the default backlog is not being changed, but the backlog is set immediately before Listen, so it cannot be skipped.

There is a fix in V9.4 relating to the wrong connection state when connections open very quickly, usually on localhost, which could stall WSocket. I'm not sure whether it applies to your situation.

Angus
Kas Ob. 128 Posted February 26

12 hours ago, Eric Bonilha said: "Do you have any idea why? Could this be some setting on the OS?"

I have also witnessed this behavior on many Windows and Linux installations hosted on dedicated servers, and it was almost always a problem with the host or with a specific ISP. You need to study the dropped connections, if we are talking about dropped connections rather than accepted ones.

Does your host have some sort of DDoS protection? It might be triggered on their hardware, in front of your server, by an attack on an unrelated server that happens to share the same switch. That can lead to connections being dropped or lost, or new connections being refused, for a few minutes, after which everything comes back and the load returns to normal.

For this case, track and record the times when it happens and ask your host's technical support to confirm whether that is the cause. Also record the IP(s) of the refused or dropped connections and try to geolocate them, to see whether they belong to one ISP or to several closely related ISPs.
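A small log of failed attempts makes that conversation with the host easier. The sketch below is only an illustration in Python; the function names and the CSV columns are assumptions, and the addresses in the usage example are documentation-only placeholders. Note that an application generally never sees connections that are refused before accept, so records like these would have to come from the client side or from a packet capture.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def record_event(rows, peer_ip, outcome, when=None):
    """Append one connection attempt (UTC timestamp, peer IP, outcome) to rows."""
    when = when or datetime.now(timezone.utc)
    rows.append({"time_utc": when.isoformat(),
                 "peer_ip": peer_ip,
                 "outcome": outcome})


def failures_by_ip(rows):
    """Count non-successful attempts per peer IP."""
    return Counter(r["peer_ip"] for r in rows if r["outcome"] != "connected")


def to_csv(rows):
    """Render the log as CSV text that can be sent to the host's support."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["time_utc", "peer_ip", "outcome"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    log = []
    record_event(log, "203.0.113.7", "refused")
    record_event(log, "198.51.100.4", "connected")
    record_event(log, "203.0.113.7", "timeout")
    print(failures_by_ip(log))
    print(to_csv(log))
```

Grouping the failures by address is the first step toward checking whether they cluster around one ISP.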
FPiette 390 Posted Wednesday at 12:40 PM

14 hours ago, Eric Bonilha said: "some connections are straight away refused by the OS (When clients try to access the application)."

After that happens, does the application start working again without being restarted, or once it happens is no other connection ever accepted?

If no other connection is accepted, it could be that the listening socket has been closed unexpectedly. This could happen because of a bug elsewhere that leads to CloseSocket, or even the plain WinAPI CloseHandle function, being called with the socket handle.
Eric Bonilha 3 Posted Wednesday at 07:03 PM

Thanks for all the answers!

I will check the firewall settings, but the test was done locally: the client (basically a camera surveillance system) runs on the same machine as the server and connects locally, so I'm not sure the firewall would affect it.

Here are more thoughts:

Remy: My initial suspicion was the backlog, with connections filling it because the thread is busy and has not yet accepted them all. But I set a value of 200 for the backlog (instead of 15) and it didn't help. I debugged the code and confirmed that ICS is passing the property value (200), not 15, to the Windows API. In our application, the connection is not being rejected when we receive the connection-available event from ICS. Could ICS itself be rejecting the connection when it receives it, before actually triggering the event to us? I will check the ICS source code.

Kas: The DDoS protection could be something to check. It's a Windows Server (not a client version), but I will check whether anything is activated. This could indeed be one reason!

Angus, Piette: The application keeps running normally, and connections are accepted again after a few seconds. So basically, once the main thread has processed everything, new connections are accepted.

Now I'm thinking about DDoS protection, and I have to do some testing to see whether that is the cause or whether it is because the main thread is busy. As I said, we have this running on thousands of systems, and even when the main thread is busy, new connections stay "pending", waiting for the thread to accept them, and are not denied outright as is happening here.

What I noticed is that if I open about 80 clients at the same time, only a few get through (about 15), but a couple of seconds later, when the failed connections are retried, they are accepted!
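Since the refused connections succeed when retried a couple of seconds later, a client can ride out a short burst by retrying with a growing delay. This is a minimal Python sketch of that idea, not the surveillance client's actual code; the function name, the default delays, and the injectable connector and sleep parameters are all assumptions made for illustration and testing.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, initial_delay=0.25,
                       connector=socket.create_connection, sleep=time.sleep):
    """Connect to (host, port), retrying refused attempts with doubling delays.

    Returns the connected socket, or re-raises ConnectionRefusedError
    once all attempts are used up.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connector((host, port), timeout=2.0)
        except ConnectionRefusedError:
            if attempt == attempts:
                raise                    # give up after the last attempt
            sleep(delay)                 # wait before trying again
            delay *= 2                   # back off: 0.25 s, 0.5 s, 1 s, ...


if __name__ == "__main__":
    # Port 9 is only an example; this will likely raise after five attempts.
    try:
        connect_with_retry("127.0.0.1", 9, attempts=5).close()
    except OSError as exc:
        print("gave up:", exc)
```

A retry like this only hides the symptom on the client side; it does not explain why the OS refuses the connections in the first place.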
tgbs 16 Posted Friday at 12:58 PM

Windows 10 or 11? Or which Windows Server version?