Linux Admin Troubleshooting Reference – Kernel Panic and System Crash – Red Hat Enterprise Linux (RHEL6)

What is the meaning of a Linux System Crash?

 
Crash is a generic term, usually meaning that the system has come to a halt and no progress is observed. The system appears unresponsive or has already rebooted.

Kernel Panic – A voluntary halt to all system activity when the kernel detects an abnormal situation. A kernel panic is the action an operating system takes upon detecting an internal fatal error from which it cannot safely recover. In Linux, kernel panics can have several causes:

 

    • Hardware: Machine Check Exceptions
    • Error Detection and Correction (EDAC)
    • Non-Maskable Interrupts (NMIs)
      • Hardware NMI Button
      • NMI Watch Dog
      • unknown_nmi_panic
      • panic_on_unrecovered_nmi
      • panic_on_io_nmi
    • Software: The BUG() macro
    • Software: Bad pointer handling
    • Software: Pseudo-hangs
    • Software: Out-of-Memory killer

 

Hardware: Machine Check Exceptions

Machine Check Exceptions are normally caused by component failures that the hardware detects and reports via an exception. They typically look like:

kernel: CPU 0: Machine Check Exception: 4
Bank 0: b278c00000000175
kernel: TSC 4d9eab664a9a60
kernel: Kernel panic – not syncing: Machine check
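
As a rough illustration, the sketch below scans log text for Machine Check Exception lines like the ones above and extracts the CPU number and each bank's status word. The function name find_mce_events and the regular expressions are assumptions modelled only on this sample; real MCE output varies with kernel version and CPU.

```python
import re

# Patterns modelled on the sample above; other kernels may format MCE lines differently.
MCE_HEADER = re.compile(r"CPU (\d+): Machine Check Exception:\s*([0-9a-fA-F]+)")
MCE_BANK = re.compile(r"Bank (\d+): ([0-9a-fA-F]{16})")


def find_mce_events(log_text):
    """Return one dict per MCE header line, with any bank status words that follow it."""
    events = []
    current = None
    for line in log_text.splitlines():
        header = MCE_HEADER.search(line)
        if header:
            current = {"cpu": int(header.group(1)), "code": header.group(2), "banks": {}}
            events.append(current)
            continue
        bank = MCE_BANK.search(line)
        if bank and current is not None:
            # Store the 64-bit bank status as an integer for later decoding.
            current["banks"][int(bank.group(1))] = int(bank.group(2), 16)
    return events


SAMPLE = """\
kernel: CPU 0: Machine Check Exception: 4
Bank 0: b278c00000000175
kernel: TSC 4d9eab664a9a60
kernel: Kernel panic - not syncing: Machine check
"""

for event in find_mce_events(SAMPLE):
    print(event["cpu"], {bank: hex(status) for bank, status in event["banks"].items()})
```

The bank status word is what the hardware vendor usually asks for when diagnosing the failing component.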

Sample Scenario 1 : 

The system hangs or panics with an MCE (Machine Check Exception) logged in /var/log/messages.
The system was not responding; the messages on the netdump server included "Kernel panic - not syncing: Machine check".
The system crashes under load.
The system crashed and rebooted.
A Machine Check Exception panic is reported.

The troubleshooting procedure is posted here: Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2

 

Error Detection and Correction (EDAC)

 

EDAC is a hardware mechanism for detecting and reporting memory chip and PCI transfer errors. The errors are reported under /sys/devices/system/edac/{mc/,pci} and logged by the kernel as:

EDAC MC0: CE page 0x283, offset 0xce0, grain 8,
syndrome 0x6ec3, row 0, channel 1 “DIMM_B1”:
amd76x_edac

Informational EDAC messages (such as a corrected ECC error) are printed to the system log, whereas critical EDAC messages (such as exceeding a hardware-defined temperature threshold) trigger a kernel panic.
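
The counters behind these messages can also be read directly from sysfs. The following is a minimal sketch, assuming the usual per-controller ce_count (corrected errors) and ue_count (uncorrected errors) files exist under the EDAC directory mentioned above; the function name read_edac_counts is hypothetical, and the files are only present when an EDAC driver is loaded.

```python
import glob
import os


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int, "ue_count": int}} for each mc* directory."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(edac_root, "mc[0-9]*"))):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as handle:
                    entry[name] = int(handle.read().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable on this system
        counts[os.path.basename(mc_dir)] = entry
    return counts
```

Calling read_edac_counts() on a system without EDAC support simply returns an empty dictionary. A steadily growing ce_count on one controller is a reasonable hint to check the DIMMs behind it.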

 

 

Sample Scenario 2 : 

The console screen shows messages like the following:

Northbridge Error, node 1, core: -1
K8 ECC error.
EDAC amd64 MC1: CE ERROR_ADDRESS= 0x101a793400
EDAC MC1: INTERNAL ERROR: row out of range (-22 >= 8)
EDAC MC1: CE – no information available: INTERNAL ERROR
EDAC MC1: CE – no information available: amd64_edacError Overflow

The troubleshooting procedure is posted here: Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2

Non-Maskable Interrupts (NMIs)

 

A non-maskable interrupt (NMI) is an interrupt that cannot be ignored or masked out by standard operating system mechanisms. It is generally used only for critical hardware errors; however, recent changes in behavior have added the following functionality:

1) NMI button.

The NMI button can be used to signal the operating system when other standard input mechanisms (keyboard, ssh, network) have ceased to function.
It can be used to create an intentional panic for additional debugging. It may not always be a physical button;
it may be presented through an iLO or DRAC interface.

Unknown NMIs – The kernel has mechanisms to handle certain known NMIs appropriately; unknown ones typically result in kernel log warnings such as:

Uhhuh. NMI received.
Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
Uhhuh. NMI received for unknown reason 32.
Dazed and confused, but trying to continue.
Do you have a strange power saving mode enabled?
 

These unknown NMI messages can be produced by ECC and other hardware problems. The kernel can be configured to panic when they are received through this sysctl:

kernel.unknown_nmi_panic=1

This is generally enabled only for troubleshooting.
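
Before enabling the panic, it can help to see how often unknown NMIs arrive and with which reason codes. The sketch below is one way to count them from log text; the function name count_unknown_nmi_reasons is hypothetical, and the pattern assumes the two-digit hexadecimal reason format seen in the messages in this post.

```python
import re
from collections import Counter

# Matches e.g. "NMI received for unknown reason 21 on CPU 0" (reason is two hex digits).
UNKNOWN_NMI = re.compile(r"NMI received for unknown reason ([0-9a-fA-F]{2})")


def count_unknown_nmi_reasons(log_text):
    """Count unknown-NMI messages in log_text, grouped by reason code."""
    return Counter(match.group(1).lower() for match in UNKNOWN_NMI.finditer(log_text))


SAMPLE = """\
kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0
kernel: Dazed and confused, but trying to continue
kernel: Uhhuh. NMI received for unknown reason 31 on CPU 0.
"""

for reason, hits in sorted(count_unknown_nmi_reasons(SAMPLE).items()):
    print(reason, hits)
```

A single reason code that repeats at a regular interval is worth mentioning to the hardware vendor along with the timestamps.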

Sample Scenario 3:

 

The following error messages appear in /var/log/messages:

kernel: Dazed and confused, but trying to continue
kernel: Do you have a strange power saving mode enabled?
kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0
kernel: Dazed and confused, but trying to continue
kernel: Do you have a strange power saving mode enabled?
kernel: Uhhuh. NMI received for unknown reason 31 on CPU 0.

The troubleshooting procedure is posted here: Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2

2) Watchdog-like software on the system that monitors for perceived system hangs

The NMI watchdog monitors system interrupts and sends an NMI if the system appears to have hung.
On a normal system, hundreds of device and timer interrupts are received per second. If there are no interrupts in a 30-second interval,
the NMI watchdog assumes that the system has hung and sends an NMI to the system to trigger a kernel panic or restart.

How an NMI watchdog works

A standard system level watchdog waits for regular events to fire and reboots the machine if no event is received within a designated timeframe. The NMI watchdog is no different. When using the NMI watchdog the system generates periodic NMI interrupts, and the kernel can monitor whether any CPU has locked up and print out debugging messages if so.

Enabling NMI Watchdog

The Red Hat Enterprise Linux 6 kernel is built with NMI watchdog support on currently supported x86 and x86-64 platforms.

Ensure the NMI watchdog is being used:

For SMP machines and single-processor systems with an IO-APIC, use nmi_watchdog=1.

For single-processor systems without an IO-APIC, use nmi_watchdog=2.

Verifying that the NMI watchdog is working

Boot the system with the parameter stated above and check the /proc/interrupts file for the NMI count line. This value should be non-zero and increase over time. If the value is zero and does not increase over time, the wrong NMI watchdog parameter has been used; change it to the other value and reboot.

If it is still zero, then log a problem; you probably have a processor that needs to be added to the NMI code.

Here is an example from /etc/grub.conf for systems which utilize the GRUB boot loader:

title Red Hat Enterprise Linux Server (2.6.32-358.6.1.el6.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.32-358.6.1.el6.x86_64 ro root=/dev/mapper/vg_worklaptop-lv_root crashkernel=auto rd_LVM_LV=vg_worklaptop/lv_root rhgb quiet nmi_watchdog=1
initrd /initramfs-2.6.32-358.6.1.el6.x86_64.img

To determine if the NMI watchdog was properly activated, check the /proc/interrupts file. The NMI interrupt should display a non-zero value. If the NMI interrupt displays a zero, alter the nmi_watchdog value, restart the system, and examine this file again. If a zero is still displayed, then the processor in the test system is not supported by the NMI watchdog code.

The output, when functioning correctly, should look similar to the following:

[root@work-laptop wmealing]# cat /proc/interrupts | grep ^NMI
NMI: 861 636 377 357 Non-maskable interrupts

Each processor core has an NMI count. These should all be increasing over time. The above example is a quad core system.
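
The "increasing over time" check can be automated by comparing two readings of /proc/interrupts. This is a minimal sketch; the function names parse_nmi_counts and stalled_cpus are hypothetical, and the parsing assumes the NMI line layout shown in the example above.

```python
def parse_nmi_counts(interrupts_text):
    """Return per-CPU NMI counts from the text of /proc/interrupts, or [] if absent."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # the trailing description ("Non-maskable interrupts") starts here
                counts.append(int(field))
            return counts
    return []


def stalled_cpus(before, after):
    """Return indexes of CPUs whose NMI count did not increase between two samples."""
    return [cpu for cpu, (old, new) in enumerate(zip(before, after)) if new <= old]


first = parse_nmi_counts("NMI: 861 636 377 357 Non-maskable interrupts\n")
second = parse_nmi_counts("NMI: 870 636 380 360 Non-maskable interrupts\n")
print(stalled_cpus(first, second))
```

In practice the two strings would come from reading /proc/interrupts twice, a few seconds apart. An empty result means every CPU's NMI count moved between the samples.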

System wide NMI settings

The NMI settings can be configured at runtime by using the sysctl interface.

In the /etc/sysctl.conf, to enable, set:

kernel.nmi_watchdog = 1

To disable, set:

kernel.nmi_watchdog = 0

Note that this sysctl alone does not enable the functionality; the kernel boot parameter is required to correctly enable the NMI watchdog.

unknown_nmi_panic

A feature introduced in kernel 2.6.9 makes it easier to diagnose system hangs on specific hardware.
The feature changes the kernel's behavior when dealing with unknown NMI sources: instead of handling the unknown NMI source, the kernel panics. This feature cannot be used on systems that also use the NMI watchdog or oprofile (or other tools that use performance metric features), as both of these also make use of the undefined NMI interrupt. If unknown_nmi_panic is activated with one of these features present, it will not work.

Note that this is a user-initiated interrupt which is really most useful for helping to diagnose a system that is experiencing system hangs for unknown reasons.

To enable this feature, set the following system control parameter in the /etc/sysctl.conf file as follows:

kernel.unknown_nmi_panic = 1

To disable, set:

kernel.unknown_nmi_panic = 0

Once this change has taken effect, a panic can be forced by pushing the system’s NMI switch. Systems that do not have an NMI switch can still use the NMI Watchdog feature which will automatically generate an NMI if a system hang is detected.

panic_on_unrecovered_nmi

Some systems may generate an NMI based on vendor configuration, such as power management or low battery. It may be important to set this if your system is generating NMIs in a known-working environment.

To enable this feature, set the following system control parameter in the /etc/sysctl.conf file as follows:

kernel.panic_on_unrecovered_nmi = 1

To disable, set:

kernel.panic_on_unrecovered_nmi = 0

panic_on_io_nmi

This setting is available only in Red Hat Enterprise Linux 6. When set, it causes a kernel panic when the kernel receives an NMI caused by an Input/Output error.
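
To see at a glance which of the NMI-related settings discussed above are active, their current values can be read from /proc/sys/kernel. The sketch below assumes each sysctl is exposed as a file of the same name in that directory; the function name read_nmi_sysctls is hypothetical, and a value of None means the kernel does not provide that setting.

```python
import os

# The NMI-related sysctls covered in this post.
NMI_SYSCTLS = (
    "nmi_watchdog",
    "unknown_nmi_panic",
    "panic_on_unrecovered_nmi",
    "panic_on_io_nmi",
)


def read_nmi_sysctls(proc_sys_kernel="/proc/sys/kernel"):
    """Return {"kernel.<name>": int or None} for each NMI-related sysctl."""
    settings = {}
    for name in NMI_SYSCTLS:
        try:
            with open(os.path.join(proc_sys_kernel, name)) as handle:
                settings["kernel." + name] = int(handle.read().strip())
        except (OSError, ValueError):
            settings["kernel." + name] = None  # not available on this kernel
    return settings
```

Printing the result of read_nmi_sysctls() before and after editing /etc/sysctl.conf and running sysctl -p is a quick way to confirm that a change took effect.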

Sample Scenario 4 : 

The console shows the following error message:

NMI: IOCK error (debug interrupt?)
CPU 0
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge mptctl mptbase bonding be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod cdrom hpilo bnx2 serio_raw shpchp pcspkr sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage qla2xxx scsi_transport_fc ata_piix libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-194.17.4.el5 #1
RIP: 0010:[<ffffffff8019d550>] [<ffffffff8019d550>] acpi_processor_idle_simple+0x14c/0x30e
RSP: 0018:ffffffff803fbf58 EFLAGS: 00000046
RAX: 0000000000d4d87e RBX: ffff81061e10a160 RCX: 0000000000000908
RDX: 0000000000000915 RSI: 0000000000000003 RDI: 0000000000000000
RBP: 0000000000d4d87e R08: ffffffff803fa000 R09: 0000000000000039
R10: ffff810001005710 R11: 0000000000000000 R12: 0000000000000000
R13: ffff81061e10a000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff803ca000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000009013954 CR3: 000000060799d000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff803fa000, task ffffffff80308b60)
Stack: ffff81061e10a000 ffffffff8019d404 0000000000000000 ffffffff8019d404
0000000000090000 0000000000000000 0000000000000000 ffffffff8004923a
0000000000200800 ffffffff80405807 0000000000090000 0000000000000000
Call Trace:
[<ffffffff8019d404>] acpi_processor_idle_simple+0x0/0x30e
[<ffffffff8019d404>] acpi_processor_idle_simple+0x0/0x30e
[<ffffffff8004923a>] cpu_idle+0x95/0xb8
[<ffffffff80405807>] start_kernel+0x220/0x225
[<ffffffff8040522f>] _sinittext+0x22f/0x236
 
Code: 89 ca ed ed 41 89 c4 41 8a 45 1c 83 e0 30 3c 30 75 15 f0 ff

The troubleshooting procedure is posted here: Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2

Software: The BUG() macro 

This kind of kernel panic is raised by kernel code when it sees an abnormal situation that indicates a programming error. The output typically looks like:

Kernel BUG at spinlock:118
invalid operand: 0000 [1] SMP
CPU 0
 

Sample Scenario 5:

The NFS client kernel crashes because an async task is already queued, hitting BUG_ON(RPC_IS_QUEUED(task)); in __rpc_execute:
kernel BUG at net/sunrpc/sched.c:616!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 8
Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss pcc_cpufreq sunrpc power_meter hpilo
hpwdt igb mlx4_ib(U) mlx4_en(U) raid0 mlx4_core(U) sg microcode serio_raw iTCO_wdt
iTCO_vendor_support ioatdma dca shpchp ext4 mbcache jbd2 raid1 sd_mod crc_t10dif mpt2sas
scsi_transport_sas raid_class ahci dm_mirror dm_region_hash dm_log dm_mod
[last unloaded: scsi_wait_scan]
 
Pid: 2256, comm: rpciod/8 Not tainted 2.6.32-220.el6.x86_64 #1 HP ProLiant SL250s Gen8/
RIP: 0010:[<ffffffffa01fe458>] [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]

Process rpciod/8 (pid: 2256, threadinfo ffff882016152000, task ffff8820162e80c0)

Call Trace:
[<ffffffffa01fe4d0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[<ffffffffa01fe4e5>] rpc_async_schedule+0x15/0x20 [sunrpc]
[<ffffffff8108b2b0>] worker_thread+0x170/0x2a0
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
Code: db df 2e e1 f6 05 e0 26 02 00 40 0f 84 48 fe ff ff 0f b7 b3 d4 00 00 00 48 c7
c7 94 39 21 a0 31 c0 e8 b9 df 2e e1 e9 2e fe ff ff <0f> 0b eb fe 0f b7 b7 d4 00 00 00
31 c0 48 c7 c7 60 63 21 a0 e8
RIP [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
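
When searching a knowledge base for a known bug, the two most useful search keys in a trace like this are the source location from the "kernel BUG at" line and the function named on the final RIP line. The sketch below extracts both; the function name summarize_bug is hypothetical, and the patterns assume the message layout of the sample above.

```python
import re

# "kernel BUG at <file>:<line>!" (the trailing "!" is absent in some older formats).
BUG_AT = re.compile(r"kernel BUG at (\S+):(\d+)!?", re.IGNORECASE)
# Final "RIP [<address>] function+0xoff/0xsize [module]" line; the module part is optional.
RIP_SYMBOL = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_bug(oops_text):
    """Return the BUG() source location and faulting function found in oops_text."""
    summary = {"file": None, "line": None, "function": None, "module": None}
    bug = BUG_AT.search(oops_text)
    if bug:
        summary["file"] = bug.group(1)
        summary["line"] = int(bug.group(2))
    rip = RIP_SYMBOL.search(oops_text)
    if rip:
        summary["function"] = rip.group(1)
        summary["module"] = rip.group(2)
    return summary


SAMPLE_OOPS = """\
kernel BUG at net/sunrpc/sched.c:616!
invalid opcode: 0000 [#1] SMP
RIP [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
"""

print(summarize_bug(SAMPLE_OOPS))
```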

The troubleshooting procedure is posted here: Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2

 

Software: Bad pointer handling

 

Kernel panics of this kind typically indicate a programming error and normally appear as below:

NULL pointer dereference at 0x1122334455667788 ..
or
Unable to handle kernel paging request at virtual address 0x11223344

One of the most common causes of this kind of error is memory corruption.

 

Sample Scenario 6 : 

  • NFS client kernel panics when doing an ls in the directory of a snapshot that has already been removed.
  • NFS client kernel panics under certain conditions when connected to NFS server either NetApp or Solaris ZFS
  • Kernel crashes with message

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff81192957>] commit_tree+0x77/0x100
PGD 7ff2e69067 PUD 7feaf59067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:07:00.0/vendor
CPU 64
Modules linked in: nls_utf8 fuse mptctl mptbase autofs4 nfs lockd fscache(T) nfs_acl auth_rpcgss bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc smbus(U) ipmi_devintf ipmi_si ipmi_msghandler sunrpc cpufreq_ondemand acpi_cpufreq freq_table nf_conntrack_ftp ipt_REJECT ipt_LOG iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log microcode sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core ixgbe mdio igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi ata_piix megaraid_sas dm_mod [last unloaded: scsi_wait_scan]
 
Modules linked in: nls_utf8 fuse mptctl mptbase autofs4 nfs lockd fscache(T) nfs_acl auth_rpcgss bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc smbus(U) ipmi_devintf ipmi_si ipmi_msghandler sunrpc cpufreq_ondemand acpi_cpufreq freq_table nf_conntrack_ftp ipt_REJECT ipt_LOG iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log microcode sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core ixgbe mdio igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi ata_piix megaraid_sas dm_mod [last unloaded: scsi_wait_scan]
Pid: 79910, comm: ls Tainted: G —————- T 2.6.32-131.6.1.el6.x86_64 #1 PRIMERGY RX900 S1
RIP: 0010:[<ffffffff81192957>] [<ffffffff81192957>] commit_tree+0x77/0x100
RSP: 0018:ffff885f1484dab8 EFLAGS: 00010246
RAX: ffff881f5f43d3e8 RBX: ffff885f1484dab8 RCX: ffff885f1484dab8
RDX: ffff881f5f43d3e8 RSI: ffff881f5f43d3e8 RDI: ffff885f1484dab8
RBP: ffff885f1484dae8 R08: ffff881f5f43d3e8 R09: 0000000000000000
R10: ffff882080440a40 R11: 0000000000000000 R12: 0000000000000000
R13: ffff881f5f43d380 R14: ffff881f5fcba2c0 R15: 0000000000000000
FS: 00007f9b188177a0(0000) GS:ffff88011c700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000007fecaf5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 79910, threadinfo ffff885f1484c000, task ffff881fc4164b00)
Stack:
ffff881f5f43d3e8 ffff881f5f43d3e8 ffff881f5f43d380 ffff885f1484db08
<0> ffff881f5fcba2c0 ffff885f1484ddd8 ffff885f1484db48 ffffffff81192c6f
<0> ffff881c94a4d200 000000001484dbf8 ffff885f1484db08 ffff885f1484db08
Call Trace:
[<ffffffff81192c6f>] attach_recursive_mnt+0x28f/0x2a0
[<ffffffff81192d80>] graft_tree+0x100/0x140
[<ffffffff814dc686>] ? down_write+0x16/0x40
[<ffffffff81192e5f>] do_add_mount+0x9f/0x160
[<ffffffffa045ce2f>] nfs_follow_mountpoint+0x1bf/0x570 [nfs]
[<ffffffff811810a0>] do_follow_link+0x120/0x440
[<ffffffffa03112e0>] ? put_rpccred+0x50/0x150 [sunrpc]
[<ffffffff81180eeb>] __link_path_walk+0x78b/0x820
[<ffffffff8118164a>] path_walk+0x6a/0xe0
[<ffffffff8118181b>] do_path_lookup+0x5b/0xa0
[<ffffffff811819a7>] user_path_at+0x57/0xa0
[<ffffffff81041594>] ? __do_page_fault+0x1e4/0x480
[<ffffffff810ce97d>] ? audit_filter_rules+0x2d/0xa10
[<ffffffff81177cac>] vfs_fstatat+0x3c/0x80
[<ffffffff81177d5e>] vfs_lstat+0x1e/0x20
[<ffffffff81177d84>] sys_newlstat+0x24/0x50
[<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff814e054e>] ? do_page_fault+0x3e/0xa0
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 83 e8 68 eb 12 0f 1f 80 00 00 00 00 4c 89 a0 c0 00 00 00 48 8d 42 98 48 8b 50 68 48 8d 48 68 48 39 cb 0f 18 0a 75 e5 48 8b 45 d0 <49> 8b 54 24 18 48 39 d8 74 15 48 8b 0a 48 8b 5d d8 48 89 50 08
RIP [<ffffffff81192957>] commit_tree+0x77/0x100

RSP <ffff885f1484dab8>
CR2: 0000000000000018
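
Reading a long call trace is easier once the frames marked with "?" are separated out, since the kernel uses that marker for addresses found on the stack that may not belong to the real call chain. The sketch below parses a trace in the layout shown above; the function name parse_call_trace is hypothetical, and the pattern does not handle the older ":module:function" layout used by RHEL 5 kernels.

```python
import re

# "[<address>] [? ]function+0xoffset/0xsize [module]"
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_call_trace(oops_text):
    """Return the frames that follow the first 'Call Trace:' line, innermost first."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if line.strip() == "Call Trace:":
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME.search(line)
        if not match:
            break  # the first non-frame line (for example "Code: ...") ends the trace
        frames.append({
            "address": match.group(1),
            "reliable": match.group(2) is None,  # "?" marks an uncertain frame
            "symbol": match.group(3),
            "module": match.group(4),
        })
    return frames


SAMPLE_TRACE = """\
Call Trace:
[<ffffffff81192c6f>] attach_recursive_mnt+0x28f/0x2a0
[<ffffffff81192d80>] graft_tree+0x100/0x140
[<ffffffff814dc686>] ? down_write+0x16/0x40
[<ffffffff81192e5f>] do_add_mount+0x9f/0x160
[<ffffffffa045ce2f>] nfs_follow_mountpoint+0x1bf/0x570 [nfs]
Code: 83 e8 68 eb 12
"""

for frame in parse_call_trace(SAMPLE_TRACE):
    if frame["reliable"]:
        print(frame["symbol"], frame["module"] or "(core kernel)")
```

Frames that carry a module name, such as nfs here, show which driver or filesystem was involved in the path leading to the fault.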

The troubleshooting procedure is posted here: Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2

Software: Pseudo-hangs

 

These are common situations in which the system appears to be hung but some progress is still being made. There are several reasons for this kind of behaviour:

  • Livelock – if running a realtime kernel, application load could be too high, leading the system into a state where it becomes effectively unresponsive in a “live lock/ busy wait” state. The system is not actually hung, but is moving so slowly that it appears to be hung.
  • Thrashing – continuous swapping with close to no useful processing done
  • Lower zone starvation – on i386 the low memory has a special significance and the system may “hang” even when there’s plenty of free memory
  • Memory starvation in one node in a NUMA system

 

Hangs that are not detected by the hardware are normally trickier to debug:

  • Use [sysrq + t] to collect process stack traces when possible
  • Enable the NMI watchdog which should detect those situations
  • Run hardware diagnostics when it’s a hard hang: memtest86, HP diagnostics
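
When the keyboard is out of reach but a root shell still responds, the same SysRq commands can be sent by writing the command character to /proc/sysrq-trigger. The following is a minimal sketch of that idea; the function name trigger_sysrq is hypothetical, and it must run as root to write to the real trigger file.

```python
def trigger_sysrq(command, trigger_path="/proc/sysrq-trigger"):
    """Send a single-character SysRq command, e.g. "t" to dump task stack traces."""
    if len(command) != 1:
        raise ValueError("SysRq commands are a single character")
    with open(trigger_path, "w") as handle:
        handle.write(command)
```

Calling trigger_sysrq("t") writes the stack trace of every task to the kernel log, where it can be read with dmesg or from /var/log/messages. Be careful with other command characters: some of them, such as "c" and "b", crash or reboot the machine.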

Sample Scenario 7: 

The system hangs frequently, and the following error messages are logged in /var/log/messages while performing I/O operations on the /dev/cciss/xx devices:

INFO: task cmaperfd:5628 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cmaperfd D ffff810009025e20 0 5628 1 5655 5577 (NOTLB)
ffff81081bdc9d18 0000000000000082 0000000000000000 0000000000000000
0000000000000000 0000000000000007 ffff81082250f040 ffff81043e100040
0000d75ba65246a4 0000000001f4db40 ffff81082250f228 0000000828e5ac68
Call Trace:
[<ffffffff8803bccc>] :jbd2:start_this_handle+0x2ed/0x3b7
[<ffffffff800a3c28>] autoremove_wake_function+0x0/0x2e
[<ffffffff8002d0f4>] mntput_no_expire+0x19/0x89
[<ffffffff8803be39>] :jbd2:jbd2_journal_start+0xa3/0xda
[<ffffffff8805e7b0>] :ext4:ext4_dirty_inode+0x1a/0x46
[<ffffffff80013deb>] __mark_inode_dirty+0x29/0x16e
[<ffffffff80041bf5>] inode_setattr+0xfd/0x104
[<ffffffff8805e70c>] :ext4:ext4_setattr+0x2db/0x365
[<ffffffff88055abc>] :ext4:ext4_file_open+0x0/0xf5
[<ffffffff8002cf2b>] notify_change+0x145/0x2f5
[<ffffffff800e45fe>] sys_fchmod+0xb3/0xd7
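
To find out which processes are being reported as blocked, and how often, the "blocked for more than" lines can be pulled out of the log. This is a small sketch under the assumption that the message layout matches the sample above; the function name find_hung_tasks is hypothetical.

```python
import re

# "INFO: task <name>:<pid> blocked for more than <N> seconds."
HUNG_TASK = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(log_text):
    """Return (task name, pid, seconds) for each hung-task warning in log_text."""
    return [
        (match.group(1), int(match.group(2)), int(match.group(3)))
        for match in HUNG_TASK.finditer(log_text)
    ]


SAMPLE = "INFO: task cmaperfd:5628 blocked for more than 120 seconds.\n"
print(find_hung_tasks(SAMPLE))
```

If the same few task names keep appearing and their traces all pass through the same filesystem or driver functions, that layer is a good place to start investigating.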

 

The troubleshooting procedure is posted here: Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2

Software: Out-of-Memory killer

In certain memory starvation cases, the OOM killer is triggered to force the release of some memory by killing a “suitable” process.  In severe starvation cases, the OOM killer may have to panic the system when no killable processes are found: 

Kernel panic – not syncing:
Out of memory and no killable processes…

The kernel can also be configured to always panic during an OOM by setting the vm.panic_on_oom = 1 sysctl.
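
The current setting can be checked by reading /proc/sys/vm/panic_on_oom. The sketch below reads the value and attaches a short description; the function name describe_panic_on_oom is hypothetical, and the descriptions are a paraphrase of the upstream kernel sysctl documentation, which is assumed here to apply to the RHEL 6 kernel as well.

```python
# Paraphrased from upstream kernel documentation; treat as an assumption for RHEL 6.
PANIC_ON_OOM_MEANING = {
    0: "the OOM killer selects a process to kill; no panic is forced",
    1: "the kernel panics on OOM, except when the shortage is confined by a cpuset or memory policy",
    2: "the kernel always panics on OOM",
}


def describe_panic_on_oom(path="/proc/sys/vm/panic_on_oom"):
    """Return (value, description) for the current vm.panic_on_oom setting."""
    with open(path) as handle:
        value = int(handle.read().strip())
    return value, PANIC_ON_OOM_MEANING.get(value, "unrecognised value")
```

Panicking on OOM is mostly useful together with kdump, so that a vmcore is captured at the moment memory runs out instead of after the OOM killer has already removed the evidence.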

 

Sample Scenario 8 : 

When the system panics, kdump starts but then hangs and does not write a vmcore. The following error message appears on the console:

Kernel panic - not syncing: Out of memory and no killable processes...

 

The troubleshooting procedure is posted here: Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2

I am preparing my lab systems for a demo of the kernel crash utility for analysing kernel panic issues, along with the diagnosis and root causes of the kernel panic scenarios discussed in this post.

 

Please let me know if you have experienced any other kind of kernel panic incident that I have missed here, so that it will be useful for others.

 

 

 

Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog. Some time later I realized that my learnings might be helpful for other Unix admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

