Clément

Service monitoring other services activities


Hi,

I'm using Delphi 12 for this one.

 

There are several Windows service applications handling many different tasks.
For example:
    A Schedule service, with thread managers and worker threads that execute tasks at specific times.
    A Communication service, with several thread managers, each with its own set of worker threads that handle communication with TCP (or UDP) devices.
    A Batch service, with several thread managers, each with a set of worker threads that handle batch...

 

Well, as the application grew over the years, more and more thread managers were added, handling more and more workers...

Sometimes sh*t hits the fan and some threads just stop responding. Sometimes it's a worker, which might be replaced by the manager, but sometimes it's a manager thread that goes bananas.

I would like to write another service, a thread monitoring service, to which I want to "send somehow" a heartbeat from each worker thread (from all the other threads).

I want to know when a worker thread has gone bananas, but mainly when a manager thread goes bazinga.
Some of the errors we detected: out of memory, out of disk space, file in use by another process (usually anti-virus), SQL query error (invalid customer data), SQL query error (invalid instruction), database server in maintenance mode, database not available (communication lost, disk full, backup taking too long), bad Windows Server patch, Windows Update, and the list goes on and on.

 

All of the above are actual problems that lead a worker or a manager to fail. Sometimes we can track down what happened and respond within our SLA. But sometimes it's just a nightmare. Nobody did anything and nobody changed anything...

 

I guess I want the safest IPC in this context. For this to work, the worker thread cannot freeze while sending a heartbeat.
For now, just knowing which thread stopped will be enough.
I suspect a lot of things, but even with a lot of logs it is sometimes very hard to track down what is happening, especially when the customer is eager to blame me.
At least, the idea is to detect a "worker strike" or a "manager riot" as early as possible.

Any tips?
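
One possible reading of the "worker cannot freeze while sending a heartbeat" requirement is that the worker does no IPC at all: it only writes a tick value into a preallocated slot with a single atomic store, and a separate reporter thread (not shown) reads the slots and forwards them over whatever IPC is eventually chosen. The following is only a minimal sketch under that assumption; `THeartbeatSlots`, `Touch` and `LastSeen` are hypothetical names, not part of any existing library.

```delphi
unit HeartbeatSlots;

interface

uses
  System.SyncObjs;

type
  // Hypothetical helper: one preallocated Int64 slot per monitored thread.
  THeartbeatSlots = class
  private
    FTicks: TArray<Int64>;
  public
    constructor Create(ASlotCount: Integer);
    // Called by a worker or manager thread: one atomic store, no locks, no I/O.
    procedure Touch(ASlot: Integer; ANowMs: Int64);
    // Called by the reporter thread that does the actual sending.
    function LastSeen(ASlot: Integer): Int64;
    function SlotCount: Integer;
  end;

implementation

constructor THeartbeatSlots.Create(ASlotCount: Integer);
begin
  inherited Create;
  // SetLength zero-initialises the array, so 0 means "never seen".
  SetLength(FTicks, ASlotCount);
end;

procedure THeartbeatSlots.Touch(ASlot: Integer; ANowMs: Int64);
begin
  TInterlocked.Exchange(FTicks[ASlot], ANowMs);
end;

function THeartbeatSlots.LastSeen(ASlot: Integer): Int64;
begin
  Result := TInterlocked.Read(FTicks[ASlot]);
end;

function THeartbeatSlots.SlotCount: Integer;
begin
  Result := Length(FTicks);
end;

end.
```

A worker would call something like `Slots.Touch(MySlot, GetTickCount64)` once per loop iteration (`GetTickCount64` comes from `Winapi.Windows`). If the reporter thread itself hangs, the monitoring service simply stops receiving data from that process, which is a signal in its own right.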

 

 

 


I did something similar many years ago, then I abandoned this path because all the problems were solved and there was no longer any need for a similar approach.

What I did was use a program external to the application (not a service) using it as a "dumb" TCP server to collect information from all the other applications and their threads.

The server had to receive up-to-date data (for example the number of cycles performed, the status of the connections, the number of polls performed on all devices), and the external application hosting the TCP server ran a rough analysis on it and displayed any alerts. At the time I also took into account the revolutions counted by an encoder (a device that counts the rotations of a mechanical shaft) and matched them against the cycles performed; if they did not match, alarms were sent out from all sides. (It shouldn't really be said, but the application also sent the data privately through its own internal mail client, connected to my company's server, so that we had everything under control, similar to "analytics".)
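
The "rough analysis" of received counters could look something like the sketch below, assuming each application periodically sends the cycle count of each of its threads and the collector compares every report with the previous one from the same source. `TStallDetector` and `Report` are hypothetical names used only for illustration; this is not the code of the tool described above.

```delphi
unit StallDetector;

interface

uses
  System.Generics.Collections;

type
  // Hypothetical collector-side helper: remembers the last cycle count
  // reported by each source and flags sources whose count did not advance.
  TStallDetector = class
  private
    FLastCount: TDictionary<string, Int64>;
  public
    constructor Create;
    destructor Destroy; override;
    // Returns True when ASource reports the same (or a lower) count than in
    // its previous report, i.e. no progress between two samples.
    function Report(const ASource: string; ACycleCount: Int64): Boolean;
  end;

implementation

constructor TStallDetector.Create;
begin
  inherited Create;
  FLastCount := TDictionary<string, Int64>.Create;
end;

destructor TStallDetector.Destroy;
begin
  FLastCount.Free;
  inherited;
end;

function TStallDetector.Report(const ASource: string; ACycleCount: Int64): Boolean;
var
  Previous: Int64;
begin
  // The first report from a source is never a stall: nothing to compare with.
  Result := FLastCount.TryGetValue(ASource, Previous) and (ACycleCount <= Previous);
  FLastCount.AddOrSetValue(ASource, ACycleCount);
end;

end.
```

Comparing counters rather than timestamps means the collector does not need its clock to agree with the sender's, but the reporting interval then has to be longer than the slowest legitimate cycle, otherwise a thread that is merely busy gets flagged.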


ICS has a new Application Monitoring client and server system. I have it running on all my public servers, monitoring my web, FTP and proxy Windows services, and restarting them if they halt, or on request if they experience critical errors. Have a read of: https://wiki.overbyte.eu/wiki/index.php/FAQ_ICS_Application_Monitoring

 

The client part just sends simple TCP PING packets. The hard part is knowing when to send those pings: my first attempt just used a timer, but that started before the server started and never checked whether it had actually started. Things got better over the weeks.

 

The server is currently basic and runs on the same machine, since it needs to restart the Windows services if they stop, but I'm going to add remote monitoring of that server with a WebSocket API so that a remote PC can monitor several servers.

 

Angus

 

2 hours ago, Clément said:

I would like to write another service, a thread monitoring service, to which I want to "send somehow" a heartbeat from each worker thread (from all the other threads).

I want to know when a worker thread has gone bananas, but mainly when a manager thread goes bazinga.

At my last company, I developed and maintained a central Windows service that all of our other products communicated with to 1) write entries in a centralized log file, 2) send out notifications, and 3) track heartbeats.  I used a free-threaded ActiveX/COM object for the communication, and just about every thread of every product made use of this COM object.  Each message identified which product and internal component it belonged to.  Each product would register its heartbeats and then update them at regular intervals until shutdown.  If a heartbeat ever timed out unexpectedly (because a thread had frozen or died, or was just running a task for too long) then the service would send out a notification containing those details to our tech support and/or server admins (usually an email, but other kinds of notifications were also supported).

Edited by Remy Lebeau
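
The register / update / time-out cycle described in the post above could be sketched on the monitoring side roughly as follows. This is not the COM-based service from that post, only a minimal illustration: `THeartbeatRegistry`, its methods and the `'Product.Component'` key format are assumptions, and the transport that delivers the beats is left out entirely.

```delphi
unit HeartbeatRegistry;

interface

uses
  System.SyncObjs, System.Generics.Collections;

type
  // Hypothetical registry: maps a "Product.Component" key to the time
  // (in milliseconds) of its most recent heartbeat.
  THeartbeatRegistry = class
  private
    FLock: TCriticalSection;
    FLastBeat: TDictionary<string, UInt64>;
    FTimeoutMs: UInt64;
  public
    constructor Create(ATimeoutMs: UInt64);
    destructor Destroy; override;
    // Registers the key on first use and refreshes it afterwards.
    procedure Beat(const AKey: string; ANowMs: UInt64);
    // Called on orderly shutdown so that no alert is raised.
    procedure Unregister(const AKey: string);
    // Keys whose last beat is older than the timeout.
    function FindStale(ANowMs: UInt64): TArray<string>;
  end;

implementation

constructor THeartbeatRegistry.Create(ATimeoutMs: UInt64);
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  FLastBeat := TDictionary<string, UInt64>.Create;
  FTimeoutMs := ATimeoutMs;
end;

destructor THeartbeatRegistry.Destroy;
begin
  FLastBeat.Free;
  FLock.Free;
  inherited;
end;

procedure THeartbeatRegistry.Beat(const AKey: string; ANowMs: UInt64);
begin
  FLock.Acquire;
  try
    FLastBeat.AddOrSetValue(AKey, ANowMs);
  finally
    FLock.Release;
  end;
end;

procedure THeartbeatRegistry.Unregister(const AKey: string);
begin
  FLock.Acquire;
  try
    FLastBeat.Remove(AKey);
  finally
    FLock.Release;
  end;
end;

function THeartbeatRegistry.FindStale(ANowMs: UInt64): TArray<string>;
var
  Pair: TPair<string, UInt64>;
  Stale: TList<string>;
begin
  Stale := TList<string>.Create;
  try
    FLock.Acquire;
    try
      for Pair in FLastBeat do
        // Guard against unsigned underflow before subtracting.
        if (ANowMs > Pair.Value) and (ANowMs - Pair.Value > FTimeoutMs) then
          Stale.Add(Pair.Key);
    finally
      FLock.Release;
    end;
    Result := Stale.ToArray;
  finally
    Stale.Free;
  end;
end;

end.
```

The monitoring service would call `FindStale` from a timer and turn each returned key into a notification. Passing the current time in as a parameter keeps the class independent of any particular clock source.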


