Creating a Centralized Crash Analysis Server for a Red Hat Enterprise Linux Environment

Case Study: In our Red Hat Enterprise Linux environment, transferring large vmcore files to external sites for support purposes can be extremely time consuming or simply unrealistic. At the same time, it is not feasible to install the kernel-debuginfo and crash utility packages on every affected system for local analysis.

To overcome these challenges, we want to create a centralized system that collects vmcores from the other servers on the network and performs the crash analysis.

 

Overall Procedure:

1. kdump must be properly configured and tested on all systems so that each one can generate a vmcore file for analysis.

    For the kdump installation and configuration procedure, please check the post “Configuring KDump for RHEL5/RHEL6/RHEL7”.
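As a quick sanity check before relying on kdump, something like the sketch below can confirm that memory is reserved for the capture kernel and that the service is running. The function names are illustrative and not part of any Red Hat tool; the check assumes the usual crashkernel= boot parameter.

```shell
#!/bin/sh
# Sketch: basic kdump readiness checks. Function names are hypothetical.

# Return 0 if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the running system looks ready to capture a crash dump.
check_kdump_ready() {
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel= is set on the kernel command line"
    else
        echo "crashkernel= is missing; no memory is reserved for kdump" >&2
        return 1
    fi
    # RHEL 7 uses systemd; RHEL 5/6 use the SysV service command.
    if command -v systemctl >/dev/null 2>&1; then
        systemctl is-active kdump
    else
        service kdump status
    fi
}
```

Running check_kdump_ready as root on each client gives a first indication. To test an actual dump on a non-production system, a crash can be forced with "echo c > /proc/sysrq-trigger" once sysrq is enabled.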

2. The crash package must be installed on the centralized system that will be analyzing the core files.

# yum install crash

3. A working directory structure is recommended for storing the necessary files. At minimum, it should include directories for kernel files, scripts, temporary cores and output files. These files will be rather large, so sufficient disk space needs to be allocated. An example would be:

/cas/cores
/cas/kernels
/cas/output
/cas/scripts
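The layout above can be created in one step. The following is a minimal sketch; cas_init_dirs is an illustrative name, and /cas is simply the base path used in this post.

```shell
#!/bin/sh
# Sketch: create the working directories under a base path (default /cas)
# and print the space available on the filesystem that holds them.
# cas_init_dirs is a hypothetical helper name.

cas_init_dirs() {
    base="${1:-/cas}"
    for sub in cores kernels output scripts; do
        mkdir -p "$base/$sub" || return 1
    done
    # -P gives the portable one-line-per-filesystem format, -k reports in KB.
    df -Pk "$base" | awk 'NR == 2 { print "available KB on " $6 ": " $4 }'
}
```

Calling cas_init_dirs with no argument creates the tree under /cas; passing another base path places it on whichever volume has room for the cores.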

4. Kernel-specific vmlinux files matching the systems where the cores were generated must be available on the centralized system for crash analysis. This can be done manually or scripted, depending on the number of different kernels needed.

        It may be necessary to manually create disabled rhel-*-server-debug repos for other versions of RHEL if using the example method below.

a. Download the kernel specific debuginfo RPM

[root@rhel6 kernels]# yumdownloader --disablerepo=\* --enablerepo=rhel-5-server-debug-rpms kernel-debuginfo-2.6.18-371.9.1.el5.x86_64

b. Extract the vmlinux file from the RPM using rpm2cpio

[root@rhel6 kernels]# rpm2cpio kernel-debuginfo-2.6.18-371.9.1.el5.x86_64.rpm | cpio -idv './usr/lib/debug/lib/modules/*/vmlinux'

c. If specifying the kernel version in the cpio path, note that the directory inside the RPM may be named $kernelversion.$arch or just $kernelversion, depending on the package. The following command can be used to find the specific path if needed.

[root@rhel6 kernels]# rpm2cpio kernel-debuginfo-2.6.18-371.9.1.el5.x86_64.rpm | cpio -idvt | grep vmlinux

d. The RPM can be deleted to save space if needed.
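When many kernel versions are involved, steps a, b and d can be scripted. The sketch below is one possible approach, not a definitive tool: the helper names are hypothetical, and the repo ID pattern is an assumption extrapolated from the rhel-5-server-debug-rpms example above, so it may need adjusting for your subscriptions.

```shell
#!/bin/sh
# Sketch: download a kernel-debuginfo RPM and extract its vmlinux.
# All function names are hypothetical helpers.

# "2.6.18-371.9.1.el5" -> "5"
rhel_major() {
    printf '%s\n' "$1" | sed -n 's/.*\.el\([0-9][0-9]*\).*/\1/p'
}

# "2.6.18-371.9.1.el5" "x86_64" -> "kernel-debuginfo-2.6.18-371.9.1.el5.x86_64"
debuginfo_pkg() {
    printf 'kernel-debuginfo-%s.%s\n' "$1" "$2"
}

# Assumed repo naming: rhel-<major>-server-debug-rpms
debug_repo() {
    printf 'rhel-%s-server-debug-rpms\n' "$(rhel_major "$1")"
}

# Runs in a subshell so the caller's working directory is unchanged.
fetch_vmlinux() (
    kver="$1"; arch="${2:-x86_64}"; dest="${3:-/cas/kernels}"
    pkg=$(debuginfo_pkg "$kver" "$arch")
    cd "$dest" &&
    yumdownloader --disablerepo='*' --enablerepo="$(debug_repo "$kver")" "$pkg" &&
    rpm2cpio "$pkg.rpm" | cpio -idv './usr/lib/debug/lib/modules/*/vmlinux' &&
    rm -f "$pkg.rpm"   # step d: remove the RPM to save space
)
```

For example, "fetch_vmlinux 2.6.18-371.9.1.el5" would reproduce the manual commands shown above for the RHEL 5 kernel.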

5. An input file can be very helpful when using the crash utility. Here is an example file, which would need to be modified to match the local directory structure and usage:

[root@rhel6 cores]# cat /cas/scripts/crash-input.txt
!mkdir /cas/output/tmp 2>/dev/null
sys > /cas/output/tmp/sys
bt > /cas/output/tmp/bt
bt -a > /cas/output/tmp/bt-a
ps > /cas/output/tmp/ps
runq > /cas/output/tmp/runq
log > /cas/output/tmp/log
kmem -i > /cas/output/tmp/kmem-i
kmem -f > /cas/output/tmp/kmem-f
mod > /cas/output/tmp/mod
swap > /cas/output/tmp/swap
mount > /cas/output/tmp/mount
!tar zcf /cas/output/crash-analysis-$(date '+%Y%m%d_%H%M%S').tar.gz /cas/output/tmp 2>/dev/null
!echo -e "#"
!echo -e "# Please attach the generated '/cas/output/crash-analysis-DATE.tar.gz'"
!echo -e "# and a sosreport from system that crashed to a Red Hat support case at"
!echo -e "# https://access.redhat.com/support/cases/"
!echo -e "#"
!echo -e "# Once files are uploaded please provide a case comment containing any additional"
!echo -e "# details about the issue as well as the following:"
!echo -e " Please find attached crash analysis archive (which contains basic information"
!echo -e " from the vmcore) along with the sosreport from the affected system.\n"
!echo -e " Bandwidth is limited and transfer of the complete vmcore may take"
!echo -e " prohibitively long. Please let us know what other information we can provide.\n"
quit
quit
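Since the paths in this file must match the local directory structure, one option is to generate it rather than edit it by hand. This is a rough sketch under that assumption; gen_crash_input is a hypothetical name, the command list mirrors the example above, and the closing echo messages are left out for brevity.

```shell
#!/bin/sh
# Sketch: print a crash input file for a chosen output directory.
# gen_crash_input is a hypothetical helper name.

gen_crash_input() {
    out="${1:-/cas/output}"
    printf '!mkdir -p %s/tmp 2>/dev/null\n' "$out"
    for cmd in sys bt "bt -a" ps runq log "kmem -i" "kmem -f" mod swap mount; do
        # Each command writes to a file named after it, e.g. "bt -a" -> "bt-a".
        file=$(printf '%s' "$cmd" | tr -d ' ')
        printf '%s > %s/tmp/%s\n' "$cmd" "$out" "$file"
    done
    # %% yields a literal % so the date format survives into the generated file.
    printf '!tar zcf %s/crash-analysis-$(date +%%Y%%m%%d_%%H%%M%%S).tar.gz %s/tmp 2>/dev/null\n' "$out" "$out"
    printf 'quit\n'
}
```

Redirecting the output, for instance "gen_crash_input /cas/output > /cas/scripts/crash-input.txt", produces a file that can be passed to crash with the -i option.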

 
6. Finally, the crash utility can be run using the input file, extracted vmlinux file and vmcore from the remote system to generate a set of analysis files for upload to Red Hat. Crash can also be run manually to provide specific results or Red Hat staff can gather additional information over a remote session.

[root@rhel6 cores]# crash -s -i /cas/scripts/crash-input.txt /cas/kernels/usr/lib/debug/lib/modules/2.6.18-371.9.1.el5/vmlinux /cas/cores/vmcore-rhel5
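Because the vmlinux directory may or may not carry the architecture suffix (see step 4c), a small wrapper can resolve the path before invoking crash. The following is only a sketch; find_vmlinux and analyze_core are hypothetical names, and the /cas paths are the ones used in this post.

```shell
#!/bin/sh
# Sketch: locate the extracted vmlinux for a kernel version, then run crash.
# find_vmlinux and analyze_core are hypothetical helper names.

# Print the vmlinux path, trying both directory layouts:
#   .../modules/$kernelversion/vmlinux
#   .../modules/$kernelversion.$arch/vmlinux
find_vmlinux() {
    root="$1"; kver="$2"; arch="${3:-x86_64}"
    for dir in "$kver" "$kver.$arch"; do
        candidate="$root/usr/lib/debug/lib/modules/$dir/vmlinux"
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Run the scripted analysis for one core file.
analyze_core() {
    kver="$1"; core="$2"
    vmlinux=$(find_vmlinux /cas/kernels "$kver") || {
        echo "no vmlinux found for $kver under /cas/kernels" >&2
        return 1
    }
    crash -s -i /cas/scripts/crash-input.txt "$vmlinux" "$core"
}
```

With this in place, "analyze_core 2.6.18-371.9.1.el5 /cas/cores/vmcore-rhel5" would be equivalent to the command above.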

 

Sometimes Red Hat support asks only for some pre-analysis instead of an upload of the huge crash dump. In those cases, you can use this procedure to run the crash utility and perform that pre-analysis on the vmcore.

 

Ramdev


