Solaris Troubleshooting: Using truss to Identify the Signals Sent to a Process
Sample issue: the customer frequently sees the following message in /var/adm/messages:
syslogd: going down on signal 15
This message suggests that the system was rebooted by a normal shutdown or
init 6. Yet no reboot has actually occurred, and uptime confirms that
the system has stayed up.
If you stop syslogd and restart it, it writes a timestamped message to /var/adm/messages. If you compare the timestamp of this message with the output of who -b, you will see that they don't match. This means that the system was not actually rebooted; only syslogd was stopped and restarted.
Apr 25 16:25:55 bubbles syslogd: going down on signal 15
# who -b
. system boot Apr 25 14:03
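By the way, "going down on signal 15" means that syslogd received signal 15, SIGTERM. During an orderly shutdown every process is sent SIGTERM, which is why this message normally appears at reboot time. Signal numbers are defined through /usr/include/sys/signal.h.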
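The timestamp comparison described above can be scripted. The following is a minimal sketch, assuming the default /var/adm/messages location and the message format shown in the example; the function name `syslogd_stops` is hypothetical. It lists every time syslogd logged that it was going down, together with the signal number, so each entry can be checked against the boot time reported by `who -b`.

```shell
#!/bin/sh
# syslogd_stops [FILE]: hypothetical helper that prints each
# "syslogd: going down on signal N" entry as "Mon DD HH:MM:SS signal N".
# FILE defaults to /var/adm/messages (an assumption; adjust it if
# syslog.conf sends these messages elsewhere).
syslogd_stops() {
  awk '/syslogd: going down on signal/ { print $1, $2, $3, "signal", $NF }' \
    "${1:-/var/adm/messages}"
}
```

Run `syslogd_stops` next to `who -b`: a stop logged after the boot time (like the 16:25:55 entry versus the 14:03 boot in the example) means only syslogd was restarted. On shells that support the POSIX `kill -l` form, `kill -l 15` translates the number into its signal name. If the default Solaris /usr/bin/awk rejects the script, nawk or /usr/xpg4/bin/awk can be used instead.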
Investigation using Truss:
Under some conditions, syslogd can go down (stop) for no apparent reason. There is typically no core file to analyse, and in many cases syslogd is restarted
automatically.
In some cases, another process sends a signal to the syslogd
process. syslogd then writes nothing to /var/adm/messages other than the line ‘going down on signal XX’.
If the restart is unexpected and causes a problem, it may be important to find out which process sent the signal, and why.
The first step is to identify which process (PID) sent the signal. syslogd itself cannot report this.
One way is to:
- run the ‘/usr/bin/truss’ command on syslogd’s process ID (PID),
- monitor it for a few minutes (or hours, depending on how often the signal arrives),
- determine which PID sends the signal,
- analyse why that process sent it.
Here is the simplest form of truss that can be used:
# truss -o /var/tmp/syslog.truss.out -sall -p `pgrep syslogd`
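For longer monitoring windows, the command above can be wrapped so that truss detaches after a fixed period instead of running indefinitely. This is a sketch under stated assumptions: the function name `trace_syslogd_signals` and its one-hour default are hypothetical, exactly one syslogd PID is expected from pgrep, and truss is assumed to release syslogd when truss itself is stopped (worth confirming on a test system first).

```shell
#!/bin/sh
# trace_syslogd_signals [SECONDS] [OUTFILE]: hypothetical wrapper around
# the truss command shown above.
trace_syslogd_signals() {
  duration=${1:-3600}                    # assumed default: one hour
  out=${2:-/var/tmp/syslog.truss.out}
  pid=`pgrep -x syslogd`
  # Refuse to continue unless pgrep returned exactly one numeric PID.
  case $pid in
    *[!0-9]*|'')
      echo "expected exactly one syslogd PID, got: $pid" >&2
      return 1 ;;
  esac
  # Same flags as the original command: trace all signals into $out.
  truss -o "$out" -sall -p "$pid" &
  truss_pid=$!
  sleep "$duration"
  # Stop truss; if syslogd was killed meanwhile, truss has already exited
  # and this kill fails silently.
  kill "$truss_pid" 2>/dev/null
  echo "trace written to $out"
}
```

Because the function sleeps for the whole period, it can be started in the background from a root shell, for example `trace_syslogd_signals 7200 &`. If the signal arrives during the window, truss records it and then ends with the old syslogd process; a restarted syslogd is not traced.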
The output file ‘syslog.truss.out’ shows the PID of the process that sent the signal.
An example:
/1: Received signal #15, SIGTERM, in sigtimedwait() [caught]
/1: siginfo: SIGTERM pid=3093 uid=0
In this case, process 3093, owned by root, sent the signal.
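When the trace file is large, the sender details can be pulled out of the siginfo lines instead of searching by eye. A minimal sketch, assuming the line format shown in the example above; the function name `signal_senders` is hypothetical.

```shell
#!/bin/sh
# signal_senders FILE: hypothetical helper that scans truss output for
# "siginfo:" lines and prints "SIGNAME PID UID" for each signal received.
signal_senders() {
  awk '/siginfo:/ {
    sig = ""; pid = ""; uid = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^SIG/)       sig = $i
      else if ($i ~ /^pid=/) pid = substr($i, 5)
      else if ($i ~ /^uid=/) uid = substr($i, 5)
    }
    # Some siginfo lines (e.g. for faults) may carry no sender pid; skip them.
    if (pid != "") print sig, pid, uid
  }' "$1"
}
```

For the example above, `signal_senders /var/tmp/syslog.truss.out` prints `SIGTERM 3093 0`. As before, nawk or /usr/xpg4/bin/awk can replace awk if the default Solaris awk complains.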
Using ps -ef, the name of the sending process can be found:
# ps -ef | grep 3093
root 3093 2954 0 11:51:24 pts/3 0:00 -ksh
In this case, the sender was an interactive Korn shell (-ksh) running as root, so someone working in a root shell sent the signal, for example with kill. The same method can be used to find any process that sends signals to syslogd; the sending process can then be reviewed to understand why it is signalling syslogd.
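A PID can also be looked up directly with ps -p, which avoids a weakness of `ps -ef | grep 3093`: grep also matches the same digits when they appear as another process's parent PID, time or arguments. This is a sketch; the function name `show_sender` is hypothetical, and it only uses the standard `-o` and `-p` options of ps.

```shell
#!/bin/sh
# show_sender PID: hypothetical helper that prints user, PID, parent PID
# and command line for exactly one process.
show_sender() {
  info=`ps -o user= -o pid= -o ppid= -o args= -p "$1" 2>/dev/null`
  if [ -n "$info" ]; then
    echo "$info"
  else
    # The sender may have exited already (e.g. a short-lived command).
    echo "PID $1 is no longer running" >&2
    return 1
  fi
}
```

For the example, `show_sender 3093` shows the root -ksh; calling `show_sender` again on its parent PID (2954) shows what started that shell.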
Hi,
Thanks for the detailed analysis, but I have one issue: on the server, a “configuration restart” message sometimes appears in dmesg. Can you please explain?
Can you check what the alert type is: kernel, user, or an application?
And can you please post the alert along with all the other lines that have the same timestamp?
Hi Ramdev, your posts are very useful. But I have one issue: how can I analyze core dumps or crash dumps?
Hi Prajwala, as system administrators we don't normally dig into these dumps ourselves; instead, we send them to Oracle engineers to analyse and advise on solutions. But sometimes we are curious to look at them using debugging tools like dbx, mdb, etc. The following link is a good reference for your question: “http://developers.sun.com/solaris/articles/manage_core_dump.html”
@Prajwala, in the real world there are tools to analyse core dumps and crash dumps.
Hi,
How can I display the processes started in the last day?
@Venkat, this will display the processes that were started in the last 24 hours: ps -A -o user,pid,etime | egrep -v "-"
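A variant of the command above, as a sketch: ps prints the elapsed time (etime) as [[dd-]hh:]mm:ss, so a "-" appears only once a process has run for at least a day. Filtering on that column alone avoids also dropping lines whose user name contains a hyphen, which `egrep -v "-"` would do. The function names `filter_recent` and `recent_processes` are hypothetical.

```shell
#!/bin/sh
# filter_recent: read "user pid etime" lines on stdin and keep only those
# whose etime has no "dd-" day prefix, i.e. processes younger than 24 hours.
filter_recent() {
  awk '$3 !~ /-/'
}

# recent_processes: list user, PID and elapsed time of recently started
# processes. The "=" after each column name suppresses the header line.
recent_processes() {
  ps -A -o user= -o pid= -o etime= | filter_recent
}
```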
Hi guys,
Could you please explain how to install VCS in VMware Workstation?
Hi Ramdev,
This Unix Admin School is awesome. Where can I register to get the updates?
I'm learning a lot of things from Unix School. Knowledge sharing is a good attitude. Thanks a lot.
Sreedhar, welcome to unixadminschool.com. You can subscribe to our regular Unix knowledge emails using this link: http://forms.aweber.com/form/77/1493755877.htm