RHEL 5 : Crash Dump capturing for Red Hat Linux

There are numerous occasions when a crash dump can be a valuable source of information when troubleshooting a system. The most common times are a system hang or a system panic.

Under Solaris[TM] on both SPARC(R) and x86 platforms, the mechanisms for getting a crash dump in these situations are well understood. Under Linux (specifically Red Hat) this situation is less clear. 

This post explains how to get a crash dump from Red Hat Linux to aid in troubleshooting system hangs after the operating system has been loaded. It covers which versions of RHEL are required, and the differences between 32-bit and 64-bit support.

RHEL crash dump Utilities:

The two main options for getting a crash dump (all pages in memory dumped to a file) under RHEL are netdump and diskdump.

Netdump: supplied in RHEL 3 U1 and later. If you are on Update 1, please see RHSA-2004:017-06 from Red Hat, which allows 64-bit OS dumps as well. Netdump sends a vmcore file containing the entire contents of memory over the network to a dedicated netdump-server. It also sends a thread list and register information over the network to a log file, along with any kernel oops information.

This allows for a central netdump-server that can receive dumps and logs from multiple systems on a network and from multiple architectures. This machine can be provisioned with large amounts of disk space and allows for central maintenance.

Security between the client and server is catered for.

There is a bug regarding netdump working across subnets (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90803). Currently the server and client need to be on the same subnet. A fix is scheduled for RHEL3 U5 and RHEL4 U1.

Netdump-server is on CD 1 of RHEL3 and will need to be installed manually (rpm install).

Netdump client and the kernel modules are installed by default.


Diskdump: supplied in RHEL3 U3 and later. This is more familiar to Solaris users and is closer to the savecore facility in Solaris. A dedicated partition is formatted to receive disk dumps. When the system panics, it writes a memory image to this partition. When the system comes back up, the partition is checked and, if it contains a valid dump image, the image is written back out to /var/crash (or another location) on the system. After this has completed, the dump partition is reformatted (which can take a while), ready to take another crash dump.

Diskdump is supplied with RHEL3 U3 or later on both 32-bit and 64-bit.

Both the above methods supply a vmcore image, and a textual stack dump. They do not provide a namelist or symbol table. To analyze the resultant dump image, a kernel needs to be built with debug flags set, that is matched to the kernel the customer will be running. As most RHEL installs will use the default kernel, this isn’t as tricky as might be expected.

The contents of /boot on the customer system should be tar’d up, as it can contain useful system maps to assist in analyzing a Red Hat Linux crash dump.

The crash analysis tool provided with Red Hat Linux, ‘crash’, describes what it requires in its manual page. It can be run against a live kernel image as well.

Forcing Crash Dumps from Hung Linux systems

Setting up RHEL for crash dumps

The main method for forcing a crash dump from a hung Linux system is using the alt-sysrq-<key> combination. This is analogous to STOP-A (or L1-A) on a Sun SPARC system.

echo "h" > /proc/sysrq-trigger will have the same effect as pressing alt-sysrq-h.
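
The same mechanism can be wrapped in a small shell function. This is only a minimal sketch: send_sysrq is a hypothetical helper name, not something shipped with RHEL, and it assumes the trigger file is /proc/sysrq-trigger. The second argument lets you point it at an ordinary file for a dry run.

```shell
# send_sysrq KEY [TRIGGER_FILE]
# Hypothetical helper: writes a single SysRq command key to the trigger file.
# TRIGGER_FILE defaults to /proc/sysrq-trigger (writing there requires root).
send_sysrq() {
    key="$1"
    trigger="${2:-/proc/sysrq-trigger}"
    case "$key" in
        [a-z0-9]) ;;   # exactly one lowercase letter or digit
        *) echo "send_sysrq: expected one lowercase letter or digit" >&2
           return 2 ;;
    esac
    printf '%s\n' "$key" > "$trigger"
}

# Example (as root): print the SysRq help text to the console/kernel log.
#   send_sysrq h
```

Be careful with the key you pass: some keys reboot or deliberately crash the machine.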

Enabling alt-sysrq key sequence

The alt-sysrq key sequence is disabled by default under RHEL. To enable it, edit /etc/sysctl.conf and set kernel.sysrq = 1
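
As a rough sketch of that edit, the function below sets kernel.sysrq = 1 in a sysctl configuration file, replacing an existing kernel.sysrq line if there is one and appending the setting otherwise. set_sysrq_conf is a hypothetical helper name; the file path is passed in so it can be tried on a copy first.

```shell
# set_sysrq_conf FILE
# Hypothetical helper: ensure FILE contains the line "kernel.sysrq = 1".
set_sysrq_conf() {
    conf="$1"
    if grep -q '^[[:space:]]*kernel\.sysrq[[:space:]]*=' "$conf" 2>/dev/null; then
        # Rewrite the existing setting via a temporary file, then copy the
        # result back so the original file's ownership and mode are preserved.
        tmp="$conf.tmp.$$"
        sed 's/^[[:space:]]*kernel\.sysrq[[:space:]]*=.*/kernel.sysrq = 1/' \
            "$conf" > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # No existing setting: append one.
        echo 'kernel.sysrq = 1' >> "$conf"
    fi
}

# Example (as root):
#   set_sysrq_conf /etc/sysctl.conf
#   sysctl -p                      # apply the file without a reboot
#   cat /proc/sys/kernel/sysrq     # check the current value
```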

Netdump configuration

http://www.redhat.com/support/wpapers/redhat/netdump/

Netdump only works on i386, not x86_64: the netconsole.o kernel module is not supplied for x86_64. Even if you roll your own kernel, the module will load but will not dump the memory image over the network in 64-bit mode.

The server and client need to be on the same subnet.
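
Because of this restriction, it can be worth checking the two addresses before configuring netdump. The sketch below assumes IPv4 addresses and a dotted-quad netmask. ip_to_int and same_subnet are hypothetical helper names, and the client address and netmask in the example are illustrative values only.

```shell
# ip_to_int A.B.C.D
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet ADDR1 ADDR2 NETMASK
# Succeeds (exit 0) if both addresses fall in the same subnet.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Example: server 10.0.0.1, client 10.0.0.25 (illustrative), netmask /24.
#   same_subnet 10.0.0.1 10.0.0.25 255.255.255.0 && echo "same subnet"
```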

On the server
chkconfig netdump-server on
service netdump-server start
create user netdump with password

On the client
Edit the file /etc/sysconfig/netdump and add a line like NETDUMPADDR=10.0.0.1
make sure the DEV= line reflects the ethernet adaptor that the server is accessible on (e.g. DEV=eth1)
chkconfig netdump on
service netdump propagate (will require netdump user/password on server)
service netdump start (make sure the module loads ok)

If the server changes IP address or MAC address, then all netdump client modules will need to be unloaded and reloaded.


netdump example output

CPU#0 is frozen.
CPU#1 is executing netdump.
CPU#2 is frozen.
CPU#3 is frozen.
< netdump activated - performing handshake with the client. >
NETDUMP START!
< handshake completed - listening for dump requests. >
0(79500)/

Diskdump configuration

Disk dump requires RHEL3 Update 3 or later. It works under 32bit and 64bit modes.

Create a new partition using fdisk.
NOTE: It MUST be bigger than the amount of physical memory in the system.
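
A quick way to sanity-check the size is to compare MemTotal from /proc/meminfo with the partition's block count in /proc/partitions (both are reported in kilobytes). This is a minimal sketch with hypothetical helper names. MemTotal is slightly lower than the installed physical memory, so treat a narrow pass as too small and leave some headroom.

```shell
# mem_total_kb [MEMINFO_FILE]
# Print MemTotal in kB (defaults to /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# part_size_kb NAME [PARTITIONS_FILE]
# Print the size in 1 kB blocks of partition NAME, e.g. sdb2
# (defaults to /proc/partitions).
part_size_kb() {
    awk -v dev="$1" '$4 == dev { print $3 }' "${2:-/proc/partitions}"
}

# dump_partition_ok MEM_KB PART_KB
# Succeeds (exit 0) only if the partition is strictly larger than memory.
dump_partition_ok() {
    [ "$2" -gt "$1" ]
}

# Example, using the /dev/sdb2 device from the diskdumpfmt example below:
#   dump_partition_ok "$(mem_total_kb)" "$(part_size_kb sdb2)" \
#       && echo "sdb2 is large enough" || echo "sdb2 is too small"
```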

Swap partitions cannot be used for dump devices.

Format the newly created dump partition (a reboot may be required to reread the partition tables on the disk) with diskdumpfmt -f -p <device> (e.g. diskdumpfmt -f /dev/sdb2)

chkconfig diskdump on
service diskdump start

disk dump example output

CPU frozen: #0#1
CPU#1 is executing diskdump.
start dumping
dumping memory...

and on the way back up

INIT: Entering runlevel: 3
Entering non-interactive startup
Saving panic dump: [  OK  ]
Formatting dump device: [  OK  ]
Starting diskdump: [  OK  ]

Ramdev


I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog, and some time later I realized that my learnings might be helpful for other unixadmins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

8 Responses

  1. bigga says:

    I’m not really very much of the on the web viewer to be honest your sites great, continue! I will do not delay- bookmark your site an extra chance later on. Cheers

  2. Yogesh Raheja says:

    @Bigga, Its an honor for our Team. Thanks you very much for your boosters and Welcome to Gurkulindia. 

  3. Selvyn says:

    Hi Yogesh,

    Good job with the Unix/Linux articles. However this one about crash dump seems a bit obsolete. I believe the current versions of RHEL use kdump/kexec for crash dump.

    Selvyn

  4. Yogesh Raheja says:

    Hello Selvyn, yes, that's very true. The newer versions of RHEL have kdump and kexec for crash dumps. Thanks for pointing this out.

  5. Vathinth Raj T says:

    I am a bit confused about this.
    I have set kernel.sysrq = 1 in /etc/sysctl.conf.
    When my server hangs, how will I get the crash dump to analyze the issue?
    Please help me understand.

  6. Ramdev Ramdev says:

    Hi Vathinth, if you want to force a kernel dump when the system hangs, you can use the following key sequence if you are using a regular AT keyboard directly connected to the Linux console.

    Alt+PrintScreen+[CommandKey]

    Command key can be any of the following:

    m – dump information about memory allocation
    t – dump thread state information
    p – dump current CPU registers and flags
    c – intentionally crash the system (useful for forcing a disk or netdump)
    s – immediately sync all mounted filesystems
    u – immediately remount all filesystems read-only
    b – immediately reboot the machine
    o – immediately power off the machine (if configured and supported)
    f – start the Out Of Memory Killer (OOM)
    w – dumps tasks that are in uninterruptible (blocked) state

    >> Alternatively, if you are using an HP iLO, you can force a crash dump using the serial BREAK sequence, i.e. Esc + Ctrl + b, which triggers the Magic SysRq event.

    • Vathinth Raj T says:

      HI Ramdev,

      Thanks for your response.

      After giving the above-mentioned combination of keys, where will the dump be stored?

      And also, after changing the setting in /etc/sysctl.conf, does the machine need a reboot?

      Thanks in advance.
      Vathinth Raj T

      • Ramdev Ramdev says:

        I believe it should be under /cores//….. To activate the new sysctl settings, you can either reboot the machine or run the “sysctl -p” command without a reboot.
