turkverisoft

Thread programming without sleep or WaitFor events

We need very high performance when processing UDP socket messages. If we use Sleep(0), Sleep(1) or WaitForSingleObject (event based), it works, but sometimes we get 15-16 ms of latency.
If we don't use these, it works perfectly, but then the thread blocks other operations.
Is there any way to use threads without any latency in Delphi? Maybe by dedicating one of the cores to this thread.

27 minutes ago, turkverisoft said:

sometimes we get 15-16 ms latency

Unless you're running on a Commodore 64 you shouldn't be seeing such delays due to Sleep or WaitForSingleObject. There must be something else affecting the result but it's hard to tell without seeing your source. How many threads do you have running concurrently?

 

Anyway, IO Completion Ports are generally considered the best way to get optimal performance in the scenario you describe. It should be fairly easy to find some examples of how to utilize them with Delphi.

 

...and don't mess with the thread affinity. Leave the thread scheduling to the OS. It's better at it and it shouldn't really be needed for something like this.

 

P.S. Don't use Sleep(0).

Guest

Mentioning Sleep and WaitForSingleObject means you are talking about Windows. Windows is very efficient at socket operations, both TCP and UDP, but mentioning Sleep means you are doing it wrong. By wrong I mean you are using an inefficient design to handle receiving and sending over the socket. That being said, I will recommend a few resources and make a few notes, in the hope of saving you time and giving you a good start.

 

You didn't mention what exactly is needed: is it a one-to-one connection with UDP traffic, or a server with many connections (tens, hundreds, ...)?

IOCP is my recommendation, as IOCP is very powerful and the threading model in such usage is very easy and simple. For that I recommend you have a look at these:

1) A great blog post from Grijjy: https://blog.grijjy.com/2018/08/29/creating-high-performance-udp-servers-on-windows-and-linux/ Here you should be careful with the second note in the Windows part and the paragraph following it: you need to build a list to recognize/associate UDP packets with a session, if that is what you need. You didn't mention whether there will be sessions (identified as users/steady connections) or not.

2) Have a look at these SO questions and their answers:

https://stackoverflow.com/questions/2302267/is-there-a-i-o-completion-port-based-component-for-delphi

https://stackoverflow.com/questions/11361208/using-iocp-with-udp

3) Are you well informed about the reliability of UDP? Dropped packets, repeated packets, corrupted packets (rare, but it happens)... You should be aware of all that before wasting your time.

 

What I recommend is:

1) Use 1-2 threads in the IOCP model to receive on the server side. That will be more than enough to consume more than 100 Mbps with, I don't know, maybe 40k packets per second on an average system (not a server with a Xeon CPU).

2) You can use the same threads allocated for receiving to send packets too, with the same IOCP. That is up to you to measure and test.

3) The most important thing to defeat delays and make things fast: do NOT process any data in the receiving IOCP threads. Those dedicated threads are only there to handle receive and send. After receiving a packet, move it to different threads or another IOCP to handle/process it (decrypt it in the case of an SSL/TLS connection, save it to disk, send it to another client, etc.). Of course, you can process packets on the same threads if the processing is very fast; in that case dedicate more threads to the IOCP (mentioned in 1). You can even assign 64 threads without a noticeable drop in efficiency or higher CPU usage: Windows IOCP will use the threads that are still running and will not put them to sleep while there is work to do.

4) If you are just relaying packets between users/sessions, then 4 threads can handle tens of thousands of packets per second with practically no delay.

5) If you are going to process those packets on the receiving threads (meaning they will be busy even for a fraction of a second) and the incoming UDP traffic to the server is high, then you will have dropped traffic. UDP packets only accumulate in the socket's receive buffer up to its size limit (SO_RCVBUF); once it is full, further packets are dropped and lost.

6) For the receive buffer size use an exact multiple of the memory page size (n * 4 KB = n*4*1024 bytes), and align those buffers to the page size. This can greatly increase incoming traffic speed, as pages can be swapped without an in-memory copy; this is done by the socket stack driver in the kernel.
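The hand-off described in point 3 can be sketched roughly as follows. This is not IOCP and not Delphi: it swaps in a plain blocking UDP socket and Python's `queue.Queue` purely to show the shape of "receive thread only receives, worker thread processes". The names `receiver`, `worker` and `demo`, the loopback address, and the `upper()` call standing in for real processing are all assumptions made for illustration.

```python
import queue
import socket
import threading


def receiver(sock, handoff, stop):
    """Receive datagrams and hand them off; no processing happens here."""
    sock.settimeout(0.2)  # only so the loop can notice `stop`
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        handoff.put((data, addr))


def worker(handoff, results, processed):
    """Do the (possibly slow) per-packet work away from the receive thread."""
    while True:
        item = handoff.get()  # blocks until a packet is queued
        if item is None:      # sentinel: shut down
            break
        data, _addr = item
        results.append(data.upper())  # hypothetical stand-in for real processing
        processed.release()


def demo(n_packets=5):
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    handoff, results = queue.Queue(), []
    stop, processed = threading.Event(), threading.Semaphore(0)
    rx = threading.Thread(target=receiver, args=(server, handoff, stop))
    wk = threading.Thread(target=worker, args=(handoff, results, processed))
    rx.start()
    wk.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(n_packets):
        client.sendto(f"msg{i}".encode(), server.getsockname())
    for _ in range(n_packets):
        processed.acquire(timeout=2.0)  # wait for each packet to be processed

    stop.set()
    rx.join()
    handoff.put(None)
    wk.join()
    client.close()
    server.close()
    return results


if __name__ == "__main__":
    print(demo())
```

With real IOCP the receive side would be completion-driven rather than a blocking `recvfrom`, but the division of labour is the same: the receive thread returns to the socket immediately, so a slow packet delays only the worker.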

 

That is what came to mind for now. While writing, I see Anders confirmed that IOCP is the best way to go, and I second his

24 minutes ago, Anders Melander said:

Sleep has its merits, but definitely not with sockets or any other I/O operation.

 

Good luck.


Hi all,

Forget the UDP port.
I have a list to process without delay. Using Sleep or WaitForSingleObject introduces 16 ms of latency while switching Windows messages.

I can't use IOCP because I have to process the list in FIFO order.

I'm not sure which approach is the right one, or whether there is another way to do this.
My first approach is not to use Sleep or any other wait objects. In this case the CPU usage will be 20%.

My second idea is to dedicate one thread to this job by using SetThreadAffinityMask. This thread will always use one core, without balancing across the other cores. It also uses 20% of total CPU, but the difference is that it is not going to switch between CPUs.

Which one is the best approach? Or do you have another idea?

Note: There are some third-party tools to dedicate a core to specific actions, but I don't want to use them without source code.

 

thanks @Kas Ob. and @Anders Melander
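For the FIFO list described in this post, one option between "spin at 20% CPU" and "timed sleep" is a consumer that blocks until the producer signals it. The sketch below is in Python rather than Delphi and uses `queue.Queue`, whose `get()` waits on a condition instead of polling on a timer; `consume` and `run` are hypothetical names. It also records the hand-off latency per item so the effect can be measured instead of guessed. Whether this removes the 16 ms seen in the original program depends on what causes it, which the thread does not establish.

```python
import queue
import threading
import time


def consume(fifo, out):
    """Single consumer: one thread draining one queue preserves FIFO order."""
    while True:
        item = fifo.get()   # blocks until put() signals; no timed polling
        if item is None:    # sentinel: shut down
            return
        sent_at, payload = item
        out.append((payload, time.perf_counter() - sent_at))


def run(n_items=1000):
    """Push n_items through the queue; return (delivery order, worst latency in s)."""
    fifo, out = queue.Queue(), []
    consumer = threading.Thread(target=consume, args=(fifo, out))
    consumer.start()
    for i in range(n_items):
        fifo.put((time.perf_counter(), i))
    fifo.put(None)
    consumer.join()
    order = [payload for payload, _ in out]
    worst = max(latency for _, latency in out)
    return order, worst


if __name__ == "__main__":
    order, worst = run()
    print("in order:", order == sorted(order))
    print("worst hand-off latency (ms):", worst * 1000.0)
```

The consumer uses no CPU while the list is empty and needs no core of its own; the measured worst case is the number to compare against the 16 ms figure.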

5 hours ago, turkverisoft said:

i have to process the list in an order FIFO

Then you should not be using UDP, since order of packets is not guaranteed. With UDP, every packet should be considered data independent of all others.

Guest

As Dave said, FIFO with UDP is a no-no. You should have your own sorting and ordering mechanism, and of course, if those packets are important, you should have a mechanism to detect lost packets and resend them.
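A minimal sketch of such an ordering mechanism, assuming the sender puts a monotonically increasing sequence number in every packet (the thread does not say whether the real protocol has one). `ReorderBuffer`, `push` and `missing` are hypothetical names; sequence-number wraparound and the resend request itself are left out.

```python
class ReorderBuffer:
    """Release packets in sequence order, drop duplicates, report gaps."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number to deliver
        self._pending = {}         # seq -> payload, waiting for earlier packets

    def push(self, seq, payload):
        """Add one packet; return the payloads that are now deliverable, in order."""
        if seq < self.next_seq or seq in self._pending:
            return []  # duplicate, or already delivered
        self._pending[seq] = payload
        ready = []
        while self.next_seq in self._pending:
            ready.append(self._pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self):
        """Sequence numbers blocking delivery: candidates for a resend request."""
        if not self._pending:
            return []
        highest = max(self._pending)
        return [s for s in range(self.next_seq, highest) if s not in self._pending]


if __name__ == "__main__":
    buf = ReorderBuffer()
    for seq, payload in [(1, b"b"), (0, b"a"), (0, b"a"), (3, b"d"), (2, b"c")]:
        print(seq, buf.push(seq, payload), "missing:", buf.missing())
```

Packets that arrive early are held back until the gap closes, so the consumer still sees strict FIFO order; `missing()` gives the receiver something concrete to ask the sender for after a timeout.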

 

Now, you didn't answer my question about the expected throughput or the number of packets you are receiving. If it is 100 or maybe up to 500, then you don't need any threading in the first place: your main thread can handle those with socket events fast enough. Again, is there a long processing operation? In that case you need to use a background thread for that processing.

If you have many packets, then you need dedicated threads to do that.
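For the low-volume case, "one thread driven by socket events" might look like the sketch below. It uses Python's `selectors` module as a stand-in for whatever event mechanism the Delphi socket library provides; `serve_once` is a hypothetical helper, and the 65535-byte read size and 2-second timeout are arbitrary choices.

```python
import selectors
import socket


def serve_once(sock, expected, timeout=2.0):
    """Single-threaded, event-driven receive: wake only when the socket is readable."""
    sock.setblocking(False)
    received = []
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        while len(received) < expected:
            events = sel.select(timeout)  # blocks until readable or timeout
            if not events:
                break                     # nothing arrived in time; give up
            for key, _mask in events:
                # Drain everything queued so one wake-up handles a whole burst.
                while True:
                    try:
                        data, _addr = key.fileobj.recvfrom(65535)
                    except BlockingIOError:
                        break
                    received.append(data)
    return received


if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(3):
        client.sendto(f"msg{i}".encode(), server.getsockname())
    print(serve_once(server, expected=3))
    client.close()
    server.close()
```

The thread sleeps inside `select()` and is woken by the arrival of data, not by a timer tick, which is the property the original question is after.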

 

You need to take a deep look at your design and redesign it. Find a UDP library and use it, study the examples that come with it, and build along the same lines.

 


You asked the same question on SO: https://stackoverflow.com/questions/61505887/thread-programming-without-sleep-or-waitfor-events

 

It was closed there because it lacked focus.  You seem to be wanting advice to a level of detail that far outstrips the level of detail used to specify your problem, and your current solution.  In my view you are unlikely to get anything much out of a question asked the way you did.  I recommend that you step back and provide a lot more detail and background.  That will give you more hope of getting relevant advice.


By calling Winapi.MMSystem.timeBeginPeriod(1) you can minimize the negative random effect of calling Sleep(). I don't know if it also affects WaitForSingleObject.
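A rough way to check this on a given machine is to measure how long a 1 ms sleep really takes, with and without the timer request. The sketch below calls `timeBeginPeriod`/`timeEndPeriod` from winmm.dll through `ctypes` on Windows and does nothing on other platforms; `timer_resolution` and `worst_sleep_ms` are hypothetical helper names. One caveat: newer Python versions on Windows may already use a high-resolution timer inside `time.sleep`, in which case the two numbers will look alike, so treat this as a measurement harness, not as proof of how Delphi's Sleep behaves.

```python
import contextlib
import ctypes
import sys
import time


@contextlib.contextmanager
def timer_resolution(ms=1):
    """Request a finer system timer on Windows; a no-op elsewhere."""
    on_windows = sys.platform == "win32"
    if on_windows:
        ctypes.windll.winmm.timeBeginPeriod(ms)
    try:
        yield
    finally:
        if on_windows:
            ctypes.windll.winmm.timeEndPeriod(ms)  # always undo the request


def worst_sleep_ms(requested_ms=1.0, rounds=20):
    """Longest observed duration of a short sleep, in milliseconds."""
    worst = 0.0
    for _ in range(rounds):
        start = time.perf_counter()
        time.sleep(requested_ms / 1000.0)
        worst = max(worst, (time.perf_counter() - start) * 1000.0)
    return worst


if __name__ == "__main__":
    print("default timer:  ", worst_sleep_ms())
    with timer_resolution(1):
        print("1 ms requested: ", worst_sleep_ms())
```

Pairing every `timeBeginPeriod` with a matching `timeEndPeriod` keeps the higher timer rate limited to the section that needs it.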

2 hours ago, David Heffernan said:

You asked the same question on SO

WTH? And it was closed before he reposted it verbatim here. I hate that.

 

@turkverisoft If you're just going to ignore the advice you get you will not only be wasting your own time but more importantly the time of the people trying to help you. Now go stand in the corner. I'm out.

2 hours ago, mrepec said:

By calling Winapi.MMSystem.timeBeginPeriod(1) you can minimize the negative random effect of calling Sleep(). I don't know if it also affects WaitForSingleObject.

It surely affects your machine's energy consumption.

 

Even Google fell for that: Once started, Chrome sucked laptop batteries dry, even after all processes had been killed. It stopped after a Windows reboot. Took them years to find out.

Edited by Der schöne Günther

19 hours ago, Anders Melander said:

Anyway, IO Completion Ports are generally considered the best way to get optimal performance in the scenario you describe.

Or Registered I/O.

Quote

The RIO API is a new extension to Windows Sockets (Winsock) and provides an opportunity for you to reduce network latency, increase message rates and improve the predictability of response times for applications that require very high performance, very high message rates, and predictability. RIO API extensions allow applications that process large numbers of small messages to achieve higher I/O operations per second (IOPS) with reduced jitter and latency. Server loads with high message rates and low latency requirements benefit most from RIO API extensions, including applications for financial services trading and high speed market data reception and dissemination. In addition, RIO API extensions provide high IOPS when you deploy many Hyper-V virtual machines (VMs) on a single physical computer.

 

49 minutes ago, Remy Lebeau said:

And the FreePascal/Lazarus forum

And he even double posted his "response". I wonder if it's a bot :classic_dry:

21 minutes ago, Anders Melander said:

Indeed. Unfortunately it requires Win8 or "better" which means it's out of bounds for me at least.

Unlucky for you. We've stopped supporting Windows 7, after all MS doesn't anymore. 

1 hour ago, David Heffernan said:

Unlucky for you. We've stopped supporting Windows 7, after all MS doesn't anymore.

It's not really a problem. I have to support Windows 7 since the majority of my customers are still using it. In six to eight months I think Windows 10 will overtake it, but even then I will still have to support Windows 7 for those that use it.

I've not yet had any needs beyond what Windows 7 provides so like I said it's not a problem for me or my customers.

22 hours ago, Darian Miller said:

Interesting!   Any plans for Indy server-side usage?

Sort of, but mostly no.

 

At this time, there are no plans to move away from the current model of 1-thread-per-connection w/ blocking I/O, since Indy is multi-platform and that model is very portable.  However, on Windows, Indy's IdWinsock2 unit does currently have declarations for the RIO functions, although it does not actually import the functions at runtime.  But you can do that manually by calling WSAIoctl() directly (which Indy does expose access to).  Also, I did check in some updates to Indy a few months ago to allow users to create asynchronous sockets using Indy's API, although Indy itself does not make use of asynchronous sockets.

 

We tried once before (the SuperCore package) to add support for fibers and IOCP to Indy on Windows, but that was an epic failure, it just didn't work right and didn't fit well with the rest of Indy's architecture.  Maybe in the future, we might consider making some new Windows-specific components that are just native IOCP/RIO without trying to shoe-horn IOCP/RIO into the rest of Indy.  Who knows.  If we ever did, that would likely be WAY down the line.  Maybe Indy 12, 13, etc.  We haven't even released Indy 11 yet, and that will just be a maintenance version, no real new features, just mostly code cleanup.  No ETA on that.

1 hour ago, Remy Lebeau said:

Sort of, but mostly no.

 


 

Yeah, I was excited for SuperCore with IOCP years back.  I eventually wrote my own IOCP server but I never put it on production as my first few versions complicated the heck out of a simple server.  My simple tests were easy enough to see huge throughput but when it was time to port an existing server over, it simply didn't turn out to be worth it.

 

Given today's news with FastMM5, maybe you should get an Indy 12 professional ready to go and release it GPL or Commercial.  That would throw some extra wrinkles into Embarcadero's bedsheets...  Delphi has certainly leaned heavily on Indy for a number of years now.

 
