Solaris Troubleshooting: Using truss to Identify the Process That Sends a Signal to Another Process
Sample issue: a customer frequently sees the following message in /var/adm/messages:
syslogd: going down on signal 15
This message normally appears when the system is shut down cleanly (for example, with shutdown or init 6), yet no reboot has actually occurred, and uptime confirms that the system has not been rebooted.
When syslogd is stopped and restarted, it writes a timestamped message to /var/adm/messages. Comparing the timestamp of this message with the output of who -b (the time of the last boot) shows that they don't match: the system was not rebooted, only syslogd was stopped. For example:
Apr 25 16:25:55 bubbles syslogd: going down on signal 15
# who -b
. system boot Apr 25 14:03
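When /var/adm/messages is long, the syslogd stop entries can be pulled out and lined up against the boot time reported by who -b. The following is a minimal sketch, not a standard tool: the function name syslogd_stops is hypothetical, and it assumes the default syslog line layout (month, day, time, host, then the message text).

```shell
#!/bin/sh
# syslogd_stops (hypothetical helper): read syslog lines on standard input
# and print the timestamp and signal number of every
# "syslogd: going down on signal N" entry.
# Assumes the default layout "Mon DD HH:MM:SS host syslogd: ...",
# so fields 1-3 are the timestamp and the last field is the signal number.
syslogd_stops() {
    awk '/syslogd: going down on signal/ { print $1, $2, $3, "signal", $NF }'
}
```

Typical use is syslogd_stops < /var/adm/messages followed by who -b; any stop entry whose timestamp is later than the boot time was not part of a reboot.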
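Signal 15 is SIGTERM, the standard termination signal. During an orderly shutdown or reboot, running processes such as syslogd are sent SIGTERM so that they can exit cleanly, which is why this message is normally associated with a reboot. Signal numbers are defined in /usr/include/sys/signal.h (and the headers it includes).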
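When the message reports some other signal number, the shell's built-in kill can translate it into a name. A minimal sketch, assuming ksh93, bash or another POSIX-conforming shell (POSIX specifies that kill -l with a signal number prints the name without the SIG prefix); signame is a hypothetical helper name:

```shell
#!/bin/sh
# signame (hypothetical helper): translate a signal number into its SIG name
# using the shell's built-in "kill -l", which prints e.g. "TERM" for 15.
signame() {
    name=$(kill -l "$1") || return 1
    echo "SIG$name"
}
```

With such a shell, signame 15 prints SIGTERM.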
Investigation using truss:
Under some conditions, syslogd stops without any apparent reason. Because it exits in response to a signal rather than crashing, there is typically no core file to analyse. In many cases, syslogd is then restarted.
In some cases, another process sends a signal to the syslogd process. syslogd then writes nothing to /var/adm/messages except that it is ‘going down on signal XX’; it does not record which process sent the signal.
If the restart is unexpected and causes a problem, it may be important to find out which process sends the signal, and why.
First, identify the process (PID) that sent the signal. This cannot be done from within the ‘syslogd’ daemon itself.
One way is to:
- run the ‘/usr/bin/truss’ command against syslogd’s process ID (PID),
- monitor it for a few minutes (or hours, depending on how often the problem occurs),
- determine which PID sends the signal, and
- analyse why that process sent it.
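Here is the simplest form of truss that can be used. The -o option writes the trace to a file, -sall traces all signals, and -p attaches truss to the running process whose PID is returned by pgrep syslogd: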
# truss -o /var/tmp/syslog.truss.out -sall -p `pgrep syslogd`
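Because the signal may arrive only after hours, truss can be left running in the background while a small loop waits for a matching siginfo line to appear in its output file. This is a minimal sketch: wait_for_signal is a hypothetical helper, the polling interval is arbitrary, and matching on "siginfo: SIGNAME" assumes the output format shown below. Filtering by signal name helps because syslogd may also legitimately receive other signals, such as a SIGHUP after log rotation.

```shell
#!/bin/sh
# wait_for_signal FILE [INTERVAL] [SIGNAL]   (hypothetical helper)
# Poll FILE (assumed to be written by a background "truss -o FILE" session)
# every INTERVAL seconds (default 60) until it contains a "siginfo:" line
# for SIGNAL (default: any signal), then print the matching lines.
# Plain POSIX sh has no local variables, so file/interval/pattern are global.
wait_for_signal() {
    file=$1
    interval=${2:-60}
    pattern="siginfo: ${3:-SIG}"
    while :; do
        # grep >/dev/null instead of grep -q, which older grep versions lack
        if [ -f "$file" ] && grep "$pattern" "$file" >/dev/null 2>&1; then
            grep "$pattern" "$file"
            return 0
        fi
        sleep "$interval"
    done
}
```

For example:
# truss -o /var/tmp/syslog.truss.out -sall -p `pgrep syslogd` &
# wait_for_signal /var/tmp/syslog.truss.out 60 SIGTERM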
When a signal arrives, ‘syslog.truss.out’ shows the PID of the process that sent it:
/1: Received signal #15, SIGTERM, in sigtimedwait() [caught]
/1: siginfo: SIGTERM pid=3093 uid=0
In this case, process 3093, owned by root, sent the signal.
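A trace left running for a long time may record several signals. The sketch below reads truss output on standard input and prints the signal name, sender PID and sender UID from each siginfo line that carries a pid= field. signal_senders is a hypothetical name, and the parsing assumes the pid=/uid= format shown above.

```shell
#!/bin/sh
# signal_senders (hypothetical helper): from truss output on standard input,
# print "SIGNAME PID UID" for every siginfo line that names a sending process.
# Lines without a pid= field (e.g. kernel-generated faults) are skipped.
signal_senders() {
    awk '/siginfo:/ {
        sig = ""; pid = ""; uid = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^SIG/) sig = $i
            else if ($i ~ /^pid=/) pid = substr($i, 5)
            else if ($i ~ /^uid=/) uid = substr($i, 5)
        }
        if (pid != "") print sig, pid, uid
    }'
}
```

For the trace above, signal_senders < /var/tmp/syslog.truss.out prints SIGTERM 3093 0.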
The name of the sending process can then be found with ps -ef:
# ps -ef | grep 3093
root 3093 2954 0 11:51:24 pts/3 0:00 -ksh
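grep 3093 also matches lines where 3093 appears in another column, for example as the PPID of a command started from that shell. The sketch below matches only the PID column (field 2 of ps -ef output). proc_line is a hypothetical name, and the sender must still be running for ps to show it.

```shell
#!/bin/sh
# proc_line PID (hypothetical helper): from "ps -ef" output on standard input,
# print only the line whose PID column (field 2) equals PID exactly.
proc_line() {
    awk -v pid="$1" '$2 == pid'
}
```

Usage: ps -ef | proc_line 3093. On Solaris, ptree 3093 additionally shows the parent chain of the process, which can help show how the shell was started (for example, from which login session).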
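In this case, the sender was an interactive root login shell (-ksh on pts/3). Because kill is built into ksh, a signal sent by hand with kill appears to come from the shell itself, so most likely someone ran kill against syslogd from that session. The same method can be used to find any process that sends signals to syslogd (or to any other process); the sending process can then be reviewed to understand why it sends them.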