Linux watchdog handler

Hi all,

I’m interested in monitoring the processes running on a Linux system and quickly detecting when they are stuck or running endlessly.
Once I detect this, I also want to take some actions (like dumping some debug info, restarting the process, etc.).

I know I can detect stuck processes using systemd, but unfortunately I couldn’t find a way to act on that: where can I specify a script to run when a process misses heartbeats?

Are you aware of other tools that act as watchdog monitors?
(Processes can register with them and start sending heartbeats; if some heartbeats are missed, the tool takes some action.)

I am aware I can write my own tool – I just want to know if there’s anything else offering this functionality.

Thank you,
Andreea

monit

monit is a utility for monitoring and managing daemons or similar programs running on a Unix system. It will start specified programs if they are not running and restart programs not responding.

monit supports:

  • Daemon mode - poll programs at a specified interval
  • Monitoring modes - active, passive or manual
  • Start, stop and restart of programs
  • Group and manage groups of programs
  • Process dependency definition
  • Logging to syslog or own logfile
  • Configuration - comprehensive controlfile
  • Runtime and TCP/IP port checking (tcp and udp)
  • SSL support for port checking
  • Unix domain socket checking
  • Process status and process timeout
  • Process cpu usage
  • Process memory usage
  • Process zombie check
  • Check the system's load average
  • Check a file or directory timestamp
  • Alert, stop or restart a process based on its characteristics
  • MD5 checksum for programs started and stopped by monit
  • Alert notification for program timeout, restart, checksum, stop,
    resource and timestamp error
  • Flexible and customizable email alert messages
  • Protocol verification. HTTP, FTP, SMTP, POP, IMAP, NNTP, SSH, DWP,
    LDAPv2 and LDAPv3
  • An HTTP interface with optional SSL support to make monit
    accessible from a web browser

Or maybe M/Monit if you want a pretty UI.
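
One way the heartbeat idea from the question could map onto the "check a file or directory timestamp" feature above: the monitored process touches a file at regular intervals, and a checker treats an old timestamp as a missed heartbeat. Below is a minimal Python sketch of that pattern only. It is not monit's API or configuration; the names beat, is_stale and check, and the idea of passing the action as a callback, are assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def beat(path: Path) -> None:
    """Record a heartbeat by creating the file or updating its mtime."""
    path.touch()


def is_stale(path: Path, max_age_s: float, now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file is missing or older than max_age_s."""
    if now is None:
        now = time.time()
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        # No heartbeat was ever written: treat it as missed.
        return True
    return now - mtime > max_age_s


def check(path: Path, max_age_s: float, on_stale: Callable[[], None],
          now: Optional[float] = None) -> bool:
    """Run on_stale() if the heartbeat is stale; return whether it ran."""
    if is_stale(path, max_age_s, now):
        on_stale()  # hypothetical action: dump debug info, restart, ...
        return True
    return False
```

The monitored process would call beat() from its main loop, and a separate checker would call check() periodically with whatever action you need. With monit, the checker side could presumably be replaced by a timestamp test on the same file, so only beat() would stay in your own code.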