Solaris Troubleshooting Reference: Using truss to Identify the Signals Sent to a Process

Sample issue: a customer frequently sees the following message in /var/adm/messages:

syslogd: going down on signal 15



This message normally indicates that the system went down through a normal shutdown or
init 6, yet uptime shows that no reboot has occurred.

When syslogd is stopped and restarted, it writes a timestamped message to /var/adm/messages. If you compare the timestamp of this message with
the boot time reported by who -b, you will see that they don’t match. This means that the system was not rebooted; only syslogd was stopped
and restarted.

Apr 25 16:25:55 bubbles syslogd: going down on signal 15

# who -b
.       system boot  Apr 25 14:03
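The comparison above can be scripted. The following is only a minimal sketch: `last_syslogd_stop` is a hypothetical helper name introduced here for illustration, and the log path is assumed to default to /var/adm/messages.

```shell
# Hypothetical helper: print the most recent "going down on signal" line
# that syslogd wrote to the given log (default: /var/adm/messages).
last_syslogd_stop() {
    log=${1:-/var/adm/messages}
    grep 'syslogd: going down on signal' "$log" | tail -1
}

# Usage on the affected host: print both timestamps and compare them by eye.
#   last_syslogd_stop
#   who -b
```

If the syslogd timestamp is later than the boot time, syslogd was stopped while the system stayed up.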

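By the way, "going down on signal 15" means that syslogd received a SIGTERM (signal 15). During a normal shutdown every process is sent a SIGTERM, which is why this message is usually associated with a reboot. All signals are defined in /usr/include/sys/signal.h.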
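To check which signal a number maps to without reading the header file, the `kill -l` built-in can translate it. This is a small check that assumes a POSIX-style shell such as ksh or bash:

```shell
# Translate signal number 15 to its name.
kill -l 15
```

For signal 15 this prints TERM, that is, SIGTERM.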

Investigation using truss:

Under some conditions, syslogd can go down (stop) without any apparent reason. There is typically no core file to analyse. In many cases, syslogd is restarted
automatically.

In some cases, another process sends a signal to the syslogd
process. In these cases, syslogd won’t print any information in the /var/adm/messages file other than that it’s ‘going down on signal XX’.

If the restart is unexpected and causes a problem, it may be important to see which process sends the signal, and why.

First, identify which process (PID) sent the signal. This is not possible from within the ‘syslogd’ daemon itself.

One way is to:

  • run the ‘/usr/bin/truss’ command on syslogd’s process ID (PID),
  • monitor it for a few minutes (or hours, depending on how often the problem occurs),
  • determine which PID sends the signal,
  • analyse why that process sent it.

Here is the simplest form of truss that could be used:

# truss -o /var/tmp/syslog.truss.out -sall -p `pgrep syslogd`
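The backquoted pgrep in the command above passes whatever it finds straight to truss, including nothing at all or several PIDs. A slightly more defensive sketch is shown here. The function name `trace_syslogd_signals` is hypothetical; the truss flags are the same ones used above, and `pgrep -x` is used to match the process name exactly.

```shell
# Hypothetical wrapper around the truss command shown above.
# It refuses to run unless exactly one syslogd PID is found.
trace_syslogd_signals() {
    out=${1:-/var/tmp/syslog.truss.out}
    pids=$(pgrep -x syslogd)      # -x: match the process name exactly
    set -- $pids                  # split the PID list into $1, $2, ...
    if [ $# -ne 1 ]; then
        echo "expected one syslogd PID, found $#" >&2
        return 1
    fi
    truss -o "$out" -s all -p "$1"
}

# Usage (as root): trace_syslogd_signals /var/tmp/syslog.truss.out
```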

In ‘syslog.truss.out’ the source PID that sends the signal can be seen.

An example:

/1: Received signal #15, SIGTERM, in sigtimedwait() [caught]

 /1: siginfo: SIGTERM pid=3093 uid=0

In this case, process 3093, owned by root, sent the signal.
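When the trace runs for hours, the output file can be long. The sketch below pulls only the sender details out of it. It assumes the `siginfo:` lines have the same layout as the example above, and `signal_senders` is a hypothetical helper name.

```shell
# Hypothetical helper: print "SIGNAL PID UID" for every siginfo line in a
# truss output file that records a sending process. siginfo lines without
# a pid= field (signals not sent by another process) are skipped.
signal_senders() {
    sed -n 's/.*siginfo: \([A-Z0-9]*\) pid=\([0-9]*\) uid=\([0-9]*\).*/\1 \2 \3/p' "$1"
}

# Usage: signal_senders /var/tmp/syslog.truss.out
```

For the example output above, this prints `SIGTERM 3093 0`.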

Using ps -ef, the name of the process sending the signal can be located.

# ps -ef | grep 3093

root 3093 2954 0 11:51:24 pts/3 0:00 -ksh
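Note that grep matches 3093 anywhere on the line, so it can also return processes whose parent PID or arguments happen to contain that number. A narrower lookup is sketched below; `describe_pid` is a hypothetical helper name, and it only helps if the sending process is still running when you look.

```shell
# Hypothetical helper: show the full listing for exactly one PID,
# avoiding grep's accidental matches on other columns.
describe_pid() {
    ps -f -p "$1"
}

# Usage: describe_pid 3093
```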

In this case, it was actually an interactive shell. The same method can be used to locate any process sending signals to syslogd. The sending process can then be reviewed to understand why it is sending signals to syslogd.

Ramdev


I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog. Some time later I realized that my learnings might be helpful to other Unix admins if I managed my knowledge base in a more user-friendly format, and the result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

3 Responses

  1. venkat says:

    hi ramdev,

    how to check the process which are running from last one hour……tx in advance

  2. Yogesh Raheja says:

    Hi Venkat, you can identify the processes by ps -eaf command.

  3. Ramdev Ramdev says:

    @venkat, ps -eo pid,comm,etime will give you the process ID, the command name and the elapsed time of each process.

     You then need to use grep with a suitable regular expression to filter out the processes that fall within your time requirement.
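As an alternative to a grep pattern, the elapsed time can be converted to seconds and compared numerically. This is a rough sketch only: `started_within` is a hypothetical helper name, it assumes the elapsed time is the last column and has the form [[dd-]hh:]mm:ss, and on Solaris you would likely need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
# Hypothetical helper: read ps output whose last column is etime and print
# the lines whose elapsed time is at most the given number of seconds
# (default 3600, i.e. one hour). The header line is dropped.
started_within() {
    awk -v limit="${1:-3600}" '
    $NF !~ /^[0-9]/ { next }              # skip the header line
    {
        n = split($NF, t, /[-:]/)         # etime is [[dd-]hh:]mm:ss
        secs = t[n] + 60 * t[n-1]
        if (n >= 3) secs += 3600 * t[n-2]
        if (n == 4) secs += 86400 * t[1]
        if (secs <= limit) print
    }'
}

# Usage: ps -eo pid,comm,etime | started_within 3600
```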
