Solaris Troubleshooting: Forcing a Core Dump on a Hung x86/x64 Solaris System

Before you start troubleshooting, confirm that you really have a system hang.
 Symptoms of a system hang:
  • System appears to be hung
  • System is not pingable
  • Cannot log in
  • Cannot execute commands
  • Cannot mount shares
  • Cannot start/stop services
  • System is not responding

A hung system does not respond to any command or user interaction; it is no longer usable.

Here are some of the situations which give the appearance of a system hang:

  • Operating system isn’t booted or is rebooting in a loop
  • System running low on memory or is overloaded
  • Network share is lost due to network errors
  • Other network errors
  • Video or console output frozen

Here’s how to eliminate the above issues which give the appearance of a system hang:

1 Verify that the system is powered on and that the OS has booted and is not rebooting in a loop. You can check for a boot loop on the system console.

2 Check the status LED on the system. To verify the power status, use for example ipmitool -H <ip_of_ilom> -U root power status, or platform get power state on V20z/V40z.

3 Verify on the console, or through your Service Processor console, that the operating system is booted. The system is not hung if you see any activity on the console.

4 Wait for a while. Systems that are low on memory (possibly because of a heavy load) swap intensively, so the system may become available again if you wait. Of course, you will then need to investigate what caused the shortage. On Linux or Solaris, try to provoke activity on the console by pressing Ctrl-C.

5 On the Windows 2003 SAC, try issuing the help command to check system availability.

6 Verify that the network infrastructure is healthy and correctly configured. Use the “ping” command to ping the default gateway in the network segment, and ping any name service servers.

If other systems in the same network segment appear to be hung, the network is a good place to start your investigation.

  • For Solaris and Linux, search for “NIS server not responding for domain <domainname>” on the console or in the messages file. Check the availability of your name services (e.g. NIS, DNS, LDAP).
  • For Solaris and Linux, search for “NFS server not responding” on the console or in the messages file. Check the availability of your NFS server.
  • Ask your network administrator for any known issues in the network infrastructure.

7 Verify that all users of the system have the same issue and see a system hang.
On a multiuser system, ask the other users whether they see the same issue or have noticed anything else.
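
As a rough sketch of the log search in step 6, the snippet below greps a messages file for both warnings. The function name scan_messages is made up for illustration, and /var/adm/messages is assumed as the default path (the usual Solaris location; Linux systems typically use /var/log/messages). The pattern allows text between “server” and “not responding”, because the NFS warning normally carries the server name there.

```shell
# Hypothetical helper: print any NIS/NFS "not responding" warnings
# found in a messages file. The default path is an assumption.
scan_messages() {
    logfile="${1:-/var/adm/messages}"
    # Basic regex only, so it also works with older grep implementations.
    # [IF] matches both "NIS" and "NFS"; ".*" tolerates a server name.
    grep 'N[IF]S server .*not responding' "$logfile"
}
```

Call it as scan_messages, or pass another file, for example scan_messages /var/log/messages. No output means neither warning was logged in that file.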

 

If the above steps all check out, chances are you have in fact got a hung system.

Because x86/x64 machines do not use the OpenBoot PROM (OBP), the procedure for forcing a core file differs from that on SPARC platforms. Boot the system with the kernel debugger kadb so that, during a hang, you can drop into kadb and run the appropriate command to obtain a core file.

Note: Under Solaris 10 OS, use kmdb instead of kadb.

IMPORTANT NOTE: kadb and kmdb work in console mode ONLY. If you drop into the debugger from the keyboard while a GUI is running on a monitor attached to the system, there is no way to see the debugger prompt, and it will appear as though the system has frozen. This occurs because, as part of its normal functioning, kadb suspends the system, including any GUI applications. To work around this problem, connect to the console or, if necessary, disable the GUI.

Setup

        1. Ensure that the dump device is set up properly and that savecore is enabled. See the dumpadm(1M) manual page for details.

        2. Boot the system with kadb.
        From a command line, type:

                    # eeprom boot-file=kadb

        and reboot. Alternatively, at the initial boot prompt, when it says “Select (B)oot or (I)nterpreter:”, enter “b kadb -d” and press Return.

        Please note: if Solaris x86 01/06 (newboot) is used, simply add “kadb” (kmdb under Solaris 10) to the end of the kernel line:

                    root (hd0,0,a)
                    kernel /platform/i86pc/multiboot kadb
                    module /platform/i86pc/boot_archive

        3. The next time the system hangs, send a break, which, if it works, will drop to the kadb/kmdb prompt.
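
For step 1 of the setup, a minimal sketch of checking the dump configuration might look like the following. It assumes that dumpadm, run without arguments, prints a “Savecore enabled:” line, and that dumpadm -y turns savecore on at reboot. The function name savecore_enabled is hypothetical, and the sketch assumes a POSIX shell such as ksh or bash.

```shell
# Hypothetical helper: succeed only if dumpadm-style output on stdin
# reports that savecore is enabled.
savecore_enabled() {
    grep 'Savecore enabled: *yes' >/dev/null
}

# On the Solaris host itself (as root), this could be used as:
#   dumpadm | savecore_enabled || dumpadm -y
```

Running dumpadm on its own also shows the configured dump device and savecore directory, which is worth checking before you rely on a forced dump.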

Drop to kadb, generate a core file

 From a directly attached keyboard or serial port connection, type:

          F1-A - press the "F1" and "A" keys, simultaneously. The control-alt-d key sequence also works.
 
If using a Sun AMD Opteron-based platform:

On X4100/X4200 servers, connect to the SP console (start /SP/console from the ILOM prompt), then press Esc followed by Shift+B to send a break.

On V65x Servers, send a break to the console, and then press the key corresponding to the sysrq-command to send.

On V20z and V40z, once connected via the platform console, press ^Ecl0 to send the break. That is, press CONTROL-E, then the letter ‘c’, then the letter ‘l’, then the number ‘0’.

On Blades (B100x, B200x), to send a break from the SC console, type break s’N’, where ‘N’ is the slot number, followed by ‘y’ (yes) when prompted.

NOTE: If using the Serial-over-LAN functionality for the Sun AMD Opteron platform, use the alternate break sequence explained in Technical Instruction 1012587.1.

    Once at the kadb prompt, type:

                    $<systemdump

This will generate a core file, which can be retrieved from:

         /var/crash/`uname -n` (or wherever the local system stores core files)
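
After the reboot, a small sketch like the one below can help locate the newest dump. It assumes that savecore writes numbered unix.N and vmcore.N pairs into the crash directory. The function name latest_vmcore is hypothetical, and the sketch assumes a POSIX shell such as ksh or bash.

```shell
# Hypothetical helper: print the highest-numbered vmcore.N file name
# in a crash directory (default: /var/crash/<hostname>).
latest_vmcore() {
    dir="${1:-/var/crash/$(uname -n)}"
    # Work on bare file names so dots in the directory path cannot
    # confuse the numeric sort on the suffix after "vmcore.".
    ( cd "$dir" 2>/dev/null && ls vmcore.* 2>/dev/null \
        | sort -t. -k2,2n | tail -n 1 )
}
```

The matching unix.N file with the same number belongs to the same dump, so both files should be sent together.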

Send the core to Sun for analysis.

Ramdev


