Red Hat Linux: Collecting a System Diagnosis Report for a Support Call

Red Hat Enterprise Linux 4.5 and earlier

On a default installation, the sysreport package should already be installed. If it is not, install the package “sysreport-.rpm” with the following command:

# rpm -ivh sysreport-.rpm

or, if your system is registered with Red Hat Network (RHN), simply run:

# up2date -i sysreport

This will install the latest version of Sysreport on your system.
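
If you are not sure whether a server is registered with RHN, a quick check helps before trying up2date. A registered system keeps its digital ID in /etc/sysconfig/rhn/systemid (generated when the system is registered with up2date --register). The sketch below assumes that path; the function name rhn_registered is only an illustrative helper, not part of any Red Hat tool.

```shell
#!/bin/sh
# Hypothetical helper: report whether this host looks registered with RHN.
# A registered system keeps its digital ID in /etc/sysconfig/rhn/systemid;
# an optional first argument overrides that path (useful for testing).
rhn_registered() {
  idfile=${1:-/etc/sysconfig/rhn/systemid}
  if [ -s "$idfile" ]; then
    echo "registered"       # file exists and is non-empty
    return 0
  fi
  echo "not registered"     # use rpm -ivh with the package file instead
  return 1
}
```

Used as `rhn_registered && up2date -i sysreport`, up2date is only attempted on registered hosts.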

To collect the information needed to start troubleshooting, enter the command:

# sysreport

and follow the on-screen instructions. When it finishes, the script prints the name and location of the compressed archive containing the collected information. Keep this archive for the support case.

Note that sysreport needs some time to collect all the data, depending on the speed of the system and how many packages are installed.

If sysreport seems to hang and does not return after a while, pass the “-norpm” option to the command. This skips the check of the RPM database, which may be broken:

# sysreport -norpm
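
Because the tool differs between releases (sysreport up to RHEL 4.5, sosreport from RHEL 4.6 onwards), it can help to check which one a host actually has before starting. This is a minimal sketch; the function name pick_collector is hypothetical, and it only looks for the two commands on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report which diagnosis tool is installed on this host.
# sosreport is the tool for RHEL 4.6 and later, so prefer it when present.
pick_collector() {
  if command -v sosreport >/dev/null 2>&1; then
    echo "sosreport"
  elif command -v sysreport >/dev/null 2>&1; then
    echo "sysreport"
  else
    echo "none"   # install sysreport or sos as described in this article
  fi
}
```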

Red Hat Enterprise Linux 4.6 and later

The “sosreport” command is a tool that collects information about a Red Hat Enterprise Linux system. To run sosreport, the “sos” package must be installed. It should be installed by default; if it is not, follow the steps below:

Installation on Red Hat Enterprise Linux 4.6 and later

If the system is registered with Red Hat Network (RHN), “sos” can be installed using the up2date command:

# up2date sos

Installation on Red Hat Enterprise Linux 5 and later

If the system is registered with RHN, use the yum command:

# yum install sos

If the system is not registered with RHN, the “sos” package can be downloaded from the RHN website or found on the installation CDs. The RPM command can be used to install the package on any version of Red Hat Enterprise Linux:

# rpm -Uvh sos-..rpm
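
The release rules above can be summarised as a small lookup. This is a sketch under the assumption that the caller already knows the RHEL major and minor release (for example 4 and 5 for RHEL 4.5); the function name install_cmd_for is hypothetical, and it only covers RHN-registered systems.

```shell
#!/bin/sh
# Hypothetical helper: given a RHEL major and minor release (e.g. "4 5"),
# print the RHN install command this article gives for that release.
# Unregistered systems should use rpm with the package file instead.
install_cmd_for() {
  major=$1
  minor=${2:-0}
  if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -le 5 ]; }; then
    echo "up2date -i sysreport"   # RHEL 4.5 and earlier: sysreport
  elif [ "$major" -eq 4 ]; then
    echo "up2date sos"            # RHEL 4.6 and later 4.x: sos via up2date
  else
    echo "yum install sos"        # RHEL 5 and later: sos via yum
  fi
}
```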

To collect the system information needed for troubleshooting, enter the following command and follow the instructions:

# sosreport

sosreport runs for several minutes; depending on the system, it may take considerably longer. When it completes, sosreport writes a compressed .bz2 archive under /tmp. Typically the archive is about 3 MB.
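
To pick up the archive afterwards, a small helper can list the newest one in /tmp. This sketch assumes the archive name starts with "sosreport-" and ends in .tar.bz2 as described above; naming and compression may differ between sos versions, so treat the pattern as an assumption. The function name latest_sosreport is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the newest sosreport archive in a directory
# (default /tmp). Assumes names like sosreport-*.tar.bz2; prints nothing
# if no archive is found.
latest_sosreport() {
  dir=${1:-/tmp}
  # ls -t lists newest first; the error for an unmatched glob is discarded.
  ls -t "$dir"/sosreport-*.tar.bz2 2>/dev/null | head -n 1
}
```

Running `ls -lh "$(latest_sosreport)"` then shows the archive and its size, to compare against the usual 3 MB.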

sosreport uses plugins that can be turned on and off. The following command lists them:

# sosreport -l

If sosreport seems to hang and does not return after a while, pass the option “-k rpm.rpmva=off” to the command. This skips the verification of all installed packages:

# sosreport -k rpm.rpmva=off
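
Both tools have an option to skip the slow RPM step (“-norpm” for sysreport, “-k rpm.rpmva=off” for sosreport). The sketch below just maps a tool name to that command line; the function name nohang_cmd is hypothetical, and the options are only the ones given in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the command line that skips the slow RPM step
# for the given tool, using the options described in this article.
nohang_cmd() {
  case "$1" in
    sysreport) echo "sysreport -norpm" ;;            # skip the RPM database check
    sosreport) echo "sosreport -k rpm.rpmva=off" ;;  # skip verifying all packages
    *) echo "unknown tool: $1" >&2; return 1 ;;
  esac
}
```

The printed line can be run directly as root with `$(nohang_cmd sosreport)`.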

Even though sysreport and sosreport collect most of the data needed for analysis, it is a good idea to also provide the contents of the “/var/log/” directory, so that all relevant data is available (older messages files, service-related log files, mcelog output, etc.).

You can archive this data with the following command:

# tar czvf logfiles.tar.gz /var/log
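
When logs from several servers go into the same support case, a host- and date-stamped name keeps the archives apart. This is a sketch; the function name archive_logs and the naming scheme are illustrative choices, not a Red Hat requirement, and reading every file under /var/log normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: archive a log directory (default /var/log) into a
# tarball named after the host and date, so archives from several servers
# do not overwrite each other. Prints the path of the archive it created.
archive_logs() {
  src=${1:-/var/log}
  outdir=${2:-/tmp}
  out="$outdir/logfiles-$(uname -n)-$(date +%Y%m%d).tar.gz"
  # -C makes the archive hold "log/..." instead of the full absolute path.
  tar czf "$out" -C "$(dirname "$src")" "$(basename "$src")" && echo "$out"
}
```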

Be cautious: one of my real experiences

According to Red Hat, these tools are safe to run on a production system at any time, but I ran into a problem when I ran sosreport on a production machine whose power supply fan had failed.

The actual scenario:

One fine morning, a Linux server on HP hardware (part of a three-node VCS cluster hosting a critical application) threw a hardware alert. According to the iLO logs, the machine had a power supply fan failure. To raise a Red Hat support call we needed the sosreport output, so we started the command during production hours. It generated extra CPU and disk activity, which in turn raised the temperature inside the machine (because the cooling fan had already failed). The CPU overheating caused the server to stop responding to external connections, while it still answered ping on the network.

In the VCS setup, if one node crashes, another node should automatically pick up its applications and continue operating. In this case, because the system was only hung (not responding to external connections) but still answering ping, VCS could not quickly decide to fail the applications over to the running nodes, and as a result all customer connections failed. To recover, we had to forcefully halt the troubled node and manually fail over all the services to a working node.

The whole process took 20 minutes, and afterwards we had to deal with many customer escalations asking “why was the diagnosis run during production hours?”. Since then, sysadmins have been instructed to get permission from the business team before running any diagnostics on a production server during production hours.

Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my personal reference blog. Later I realized that my learnings might be helpful to other Unix admins if I organized my knowledge base in a more user-friendly format, and the result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

16 Responses

  1. MS says:

    Good one Anna…

  2. Muby says:

    Thanks for sharing your experience :)

  3. shekar says:

    sos-1.7-9.35.el4

    I experienced a strange issue running sosreport (sos-1.7-9.35.el4) after a LUN presented from EMC was attached to the host. I ran sosreport and the filesystem mounted on that EMC device disappeared. After that I had to reboot the server and ask the SAN team to re-present the LUN.
    It works fine as long as we don't run sosreport, but when I run sosreport, the filesystem mounted at /Data disappears.

    Any idea where to check in the sosreport, or why it is removing the filesystem?

  4. seema says:

    Hi Shekar, why did you run sosreport? I have run sosreport to send a report to RHEL support. In your case, first check your LUN and filesystem in case you are missing anything. Think: why did you ask the SAN team to re-present the LUN? Was the LUN not visible to the OS before? Was a reboot required? In my experience, sos has nothing to do with storage; it only gathers system information.

  5. shekar says:

    Thanks for your response, Seema. Yes, you are correct: sosreport generates a report for the Linux OS. It affects only the Data filesystem (2 TB). Initially, for Data, we always ask the SAN team to present the LUN to the host, then use the powermt command to generate a pseudo device and mount it (/dev/emcpowera/VGxyz on /Data). It works fine as long as you don't run sosreport. Once I ran sosreport, the /Data filesystem disappeared from the df -k output, and I can't export or import the VG. The only option left is to reboot the box and ask the SAN team to re-present the same LUN. This is a bug; I even reported it to Red Hat and to date there is no solution. I'm still investigating it. This is strange; if anyone has come across it, please let me know. (I'm running RHEL 4.x on a virtual machine.)

    When I run pvscan, pvs, or vgs, I can't find any device other than rootvg.

    Output before running sosreport:
    [root@host1 ~]# pvs
    PV VG Fmt Attr PSize PFree
    /dev/cciss/c0d0p1 rootvg lvm2 a- 4G 0 ( Native disk is not affected )
    /dev/emcpowerg VGxyz lvm2 a- 2T 0 ( affected and dissapear )

    Output after running sosreport:

    [root@host1 Data]# pvs
    PV VG Fmt Attr PSize PFree
    /dev/cciss/c0d0p2 rootvg lvm2 a- 4G 0

    • Ramdev Ramdev says:

      Hi Shekar, can you please run “sosreport -vvv” and let me know the output you see on the screen?
      Hi Seema, thanks for trying to help with this.

  6. Yogesh Raheja says:

    @Shekar, yes, you are right. There was a bug in sosreport on RHEL 4.x and earlier versions, but it was sorted out in RHEL 5.x. As you also stated above, I think Red Hat has not yet provided a bug fix for the older versions.

  7. shekar says:

    Thanks, buddy. If the given LUN is formatted as EFI (partition type ee), it attaches once at the beginning, but once we run sosreport it disappears. If we change the LUN format to native Linux LVM (8e), the filesystem stays mounted even after sosreport runs. This is tested, FYI.

  8. shekar says:

    In the past I have extended other filesystems with this command, after checking the available PPs:

    chfs -a size=+1G /sap

    Similarly, if PPs are available on rootvg and the root (/) filesystem on AIX has 0 space left, can I run the following? Please advise.

    chfs -a size=+1G /

  9. Yogesh Raheja says:

    @Shekar, I have done this in the past on AIX, and it worked perfectly (for root too) as long as space was available: chfs -a size=+1g

  10. Yogesh Raheja says:

    Live example:
    yogesh-AIX#lsvg -o
    rootvg
    yogesh-AIX#lsvg rootvg
    VOLUME GROUP: rootvg VG IDENTIFIER: 00c589e500004c000000013671848d74
    VG STATE: active PP SIZE: 128 megabyte(s)
    VG PERMISSION: read/write TOTAL PPs: 319 (40832 megabytes)
    MAX LVs: 256 FREE PPs: 87 (11136 megabytes)
    LVs: 18 USED PPs: 232 (29696 megabytes)
    OPEN LVs: 17 QUORUM: 1 (Disabled)
    TOTAL PVs: 1 VG DESCRIPTORS: 2
    STALE PVs: 0 STALE PPs: 0
    ACTIVE PVs: 1 AUTO ON: yes
    MAX PPs per VG: 32512
    MAX PPs per PV: 1016 MAX PVs: 32
    LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
    HOT SPARE: no BB POLICY: relocatable
    PV RESTRICTION: none
    yogesh-AIX#lsvg -l rootvg
    rootvg:
    LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
    hd5 boot 1 1 1 closed/syncd N/A
    hd6 paging 64 64 1 open/syncd N/A
    hd8 jfs2log 1 1 1 open/syncd N/A
    hd4 jfs2 4 4 1 open/syncd /
    hd2 jfs2 48 48 1 open/syncd /usr
    hd9var jfs2 8 8 1 open/syncd /var
    hd3 jfs2 12 12 1 open/syncd /tmp
    hd1 jfs2 1 1 1 open/syncd /home
    hd10opt jfs2 9 9 1 open/syncd /opt
    hd11admin jfs2 1 1 1 open/syncd /admin
    livedump jfs2 2 2 1 open/syncd /var/adm/ras/livedump
    rootlv jfs2 2 2 1 open/syncd /home/root
    buildlv jfs2 2 2 1 open/syncd /build
    nmonlv jfs2 8 8 1 open/syncd /nmon
    hpovlv jfs2 4 4 1 open/syncd /var/opt/OV
    mksysblv jfs2 40 40 1 open/syncd /mksysb_image
    openvlv jfs2 16 16 1 open/syncd /usr/openv
    pdumplv sysdump 9 9 1 open/syncd N/A
    yogesh-AIX#
    yogesh-AIX#
    yogesh-AIX#df -g /
    Filesystem GB blocks Free %Used Iused %Iused Mounted on
    /dev/hd4 0.50 0.31 38% 10745 13% /
    yogesh-AIX#
    yogesh-AIX#chfs -a size=+1G /
    Filesystem size changed to 3145728
    yogesh-AIX#
    yogesh-AIX#df -g /
    Filesystem GB blocks Free %Used Iused %Iused Mounted on
    /dev/hd4 1.50 1.31 13% 10745 4% /
    yogesh-AIX#
    yogesh-AIX#lsvg rootvg
    VOLUME GROUP: rootvg VG IDENTIFIER: 00c589e500004c000000013671848d74
    VG STATE: active PP SIZE: 128 megabyte(s)
    VG PERMISSION: read/write TOTAL PPs: 319 (40832 megabytes)
    MAX LVs: 256 FREE PPs: 79 (10112 megabytes)
    LVs: 18 USED PPs: 240 (30720 megabytes)
    OPEN LVs: 17 QUORUM: 1 (Disabled)
    TOTAL PVs: 1 VG DESCRIPTORS: 2
    STALE PVs: 0 STALE PPs: 0
    ACTIVE PVs: 1 AUTO ON: yes
    MAX PPs per VG: 32512
    MAX PPs per PV: 1016 MAX PVs: 32
    LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
    HOT SPARE: no BB POLICY: relocatable
    PV RESTRICTION: none
    yogesh-AIX#
    yogesh-AIX#
    yogesh-AIX#
    yogesh-AIX#chfs -a size=-1G /
    Filesystem size changed to 1048576
    yogesh-AIX#df -g /
    Filesystem GB blocks Free %Used Iused %Iused Mounted on
    /dev/hd4 0.50 0.31 38% 10745 13% /
    yogesh-AIX#
    yogesh-AIX#exit

  11. Yogesh Raheja says:

    @Shekar, but if you have 0 space left, then you won't be able to increase any filesystem in any OS with any LVM. Either it will throw an error or the command will hang.

  12. Kiran M.S says:

    I have a few questions regarding RHN:

    1. Who takes care of the RHN part?
    2. How do I find out whether a server is registered with RHN?
    3. Which RHN account is the server registered under? Is the account ID stored in any file?

    • Ramdev Ramdev says:

      @Kiran:

      >> RHN is taken care of by the infrastructure engineering teams, who certify the new packages/patches that can be used in the environment.
      >> Registered servers have a digital ID stored in the file /etc/sysconfig/rhn/systemid, which is generated when you run the up2date --register command.
      >> /etc/sysconfig/rhn/up2date contains the serverURL setting, which points to the RHN server for your network.

  13. Kiran M.S says:

    Thanks a lot, Anna :)

  1. September 17, 2015

    […] Read – Collecting RedHat Diagnosis report for the Support Call purpose […]
