RHEL : Examining Red Hat Linux kernel state using Sysrq key combinations

The internal state of a Unix-like kernel can provide valuable information about what the system is currently doing. If a user process or the kernel is hanging, the more information that can be gathered at that point, the greater the chance of a good diagnosis.

Under Solaris on the SPARC platform there are well-known mechanisms for gathering stack traces, processor state and memory state. Under Linux, this can appear to be more of a black art.

This document sets out the information that can be captured, ideally as early as possible, to improve the chances of a good diagnosis.


Comparisons with Sun SPARC systems.

For a Sun system, the Stop-A key sequence (or sending a break from a serial console) drops the system to the ok prompt. From there, crash dumps can be forced, or register and CPU state can be examined.

Under Linux, this ability is built into the kernel and is triggered using Alt-SysRq key sequences.

Enabling Sysrq.

The sysrq feature needs to be enabled before it can be used. It is disabled by default on RHEL 3 and 4.

To enable the feature, edit /etc/sysctl.conf and set the value below to 1. The setting is applied at the next boot, or immediately by running sysctl -p.

 # Controls the System Request debugging functionality of the kernel
kernel.sysrq = 1
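
Before relying on sysrq during a hang, it is worth confirming that the setting is really in place. The following is a minimal sketch, not a supported tool: it parses sysctl.conf text for kernel.sysrq and, separately, reads the value the running kernel is using from /proc/sys/kernel/sysrq. The function names configured_sysrq and running_sysrq are made up for this illustration.

```python
def configured_sysrq(conf_text):
    """Return the last kernel.sysrq value set in sysctl.conf text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments.
        if not line or line[0] in "#;":
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.sysrq":
            value = int(val.strip())
    return value


def running_sysrq(path="/proc/sys/kernel/sysrq"):
    """Return the value the running kernel is using right now."""
    with open(path) as f:
        return int(f.read().strip())


# The fragment shown above, parsed from a string so the sketch runs anywhere.
sample = """# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 1
"""
print(configured_sysrq(sample))
```

On a real system, pass the contents of /etc/sysctl.conf to configured_sysrq and compare the result with running_sysrq(); if they differ, the file has been edited but not yet applied.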

Forcing sysrq

On X4200/X4100 servers, connect to the SP console (start /SP/console from the ILOM prompt), press Esc followed by Shift+B to send a break, and then press the key corresponding to the sysrq command to send.

On V65x Servers, send a break to the console, and then press the key corresponding to the sysrq-command to send.

On V20z and V40z, once connected via the platform console, press ^Ecl0<letter> to send the sysrq-command.

On Blades (B100x, B200x) send a break from the SC console, then press the letter corresponding to the sysrq-command in the serial console session to the blade.

The letter must be typed within 5 seconds of the break being sent. Sending a ? character prints the menu of available options.
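
When a root shell is still responsive, a local alternative to the console break sequences is to write the command letter to /proc/sysrq-trigger. This is a hedged sketch: the text above does not mention that file, so its presence on a given RHEL release is an assumption to check first, and send_sysrq and VALID_KEYS are names invented for this example. The key set is taken from the Linux-2.4.21 menu listed in this document.

```python
import os
import tempfile

# Keys from the 2.4.21 HELP menu: loglevel 0-8 plus the command letters.
VALID_KEYS = set("bceikmoprstuw") | set("012345678")


def send_sysrq(key, trigger="/proc/sysrq-trigger"):
    """Write one lower-case sysrq key to the trigger file (needs root)."""
    if len(key) != 1 or key not in VALID_KEYS:
        raise ValueError("not a valid lower-case sysrq key: %r" % (key,))
    with open(trigger, "w") as f:
        f.write(key)


# Demo against a scratch file so nothing is sent to a real kernel.
scratch = os.path.join(tempfile.mkdtemp(), "sysrq-trigger")
send_sysrq("m", trigger=scratch)
```

Calling send_sysrq("m") with the default path would request the memory dump on a live system. Keys such as b, c and o reboot, crash or power off the machine at once, so they should never be sent casually.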

List of current (Linux-2.4.21) valid key presses

SysRq : HELP : loglevel0-8 reBoot Crash tErm kIll saK showMem Off showPc unRaw Sync showTasks Unmount shoWcpus

Note: The menu above marks each command's key with an upper-case letter, and the list below shows those keys in square brackets. The key must nevertheless be entered in lower case: sysrq does not accept upper-case characters, and sending one simply prints a menu similar to the one above.
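
The upper-case convention in the menu can be turned into a lookup table mechanically. The sketch below is illustrative only (menu_keys is a made-up helper): it takes the HELP line exactly as printed above and maps each command word to the lower-case key that should be sent.

```python
MENU = ("SysRq : HELP : loglevel0-8 reBoot Crash tErm kIll saK showMem Off "
        "showPc unRaw Sync showTasks Unmount shoWcpus")


def menu_keys(menu):
    """Map each command word in a SysRq HELP line to its lower-case key."""
    words = menu.split(":")[-1].split()
    keys = {}
    for word in words:
        upper = [ch for ch in word if ch.isupper()]
        # Words such as loglevel0-8 have no marker letter and are skipped.
        if len(upper) == 1:
            keys[word] = upper[0].lower()
    return keys


for command, key in menu_keys(MENU).items():
    print(command, "->", key)
```

The loglevel entry is skipped because its keys are the digits 0 to 8 rather than a marked letter.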

In the list below, the key to press is shown in square brackets.

reBoot – [B] – This will reboot the system

Crash – [C] – This will force a panic of the system by dereferencing a NULL pointer.

If diskdump or netdump are configured (see Technical Instruction 210668) then a crash dump can be forced.


 va64-v20zc-gmp02 login: [halt sent]
SysRq : Crashing the kernel by request
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
printing rip:
ffffffff801f66b0
PML4 8a1c7067 PGD 89f8e067 PMD 0
Oops: 0002
CPU 0
Pid: 0, comm: swapper Not tainted
RIP: 0010:[<ffffffff801f66b0>]{sysrq_handle_crash+0}
RSP: 0018:ffffffff805e6280  EFLAGS: 00010292
RAX: 000000000000001f RBX: ffffffff80445cd0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff80619f18 RDI: 0000000000000063
RBP: 0000000000000000 R08: 000000000000000d R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000063
R13: 0000000000000000 R14: ffffffff80619f18 R15: 0000000000000006
FS:  0000002a969654c0(0000) GS:ffffffff805e1440(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
 Call Trace: [<ffffffff801f6d12>]{__handle_sysrq_nolock+146}
[<ffffffff801f6c48>]{handle_sysrq+72} [<ffffffff801eedd5>]{receive_chars+485}
[<ffffffff801ef2b6>]{rs_interrupt_single+150} [<ffffffff8011317f>]{handle_IRQ_event+95}
[<ffffffff80113422>]{do_IRQ+274} [<ffffffff8010de20>]{default_idle+0}
[<ffffffff8010de20>]{default_idle+0} [<ffffffff80110807>]{common_interrupt+95}
<EOI> [<ffffffff8011fb45>]{thread_return+0} [<ffffffff8010de3e>]{default_idle+30}
[<ffffffff8010de20>]{default_idle+0} [<ffffffff8010dec9>]{cpu_idle+73} 
 <SNIP>
 CPU frozen: #0#1
CPU#0 is executing diskdump.
start dumping
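
The "Oops: 0002" line in the capture above is an x86 page-fault error code, and its low three bits say how the fault happened. The following sketch decodes them; decode_oops_code is a hypothetical helper, and only the three low bits are interpreted.

```python
def decode_oops_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # Bit 0: 0 = page not present, 1 = protection violation.
        "cause": "protection violation" if code & 1 else "page not present",
        # Bit 1: 0 = read access, 1 = write access.
        "access": "write" if code & 2 else "read",
        # Bit 2: 0 = fault raised in kernel mode, 1 = in user mode.
        "mode": "user" if code & 4 else "kernel",
    }


print(decode_oops_code(0x0002))
```

For the value in the capture this reports a kernel-mode write to a page that is not present, which fits the NULL pointer dereference at virtual address 0 shown in the trace.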

tErm – [E] – Send Term (sig 15) to all processes except init

kIll – [I] – Send Kill (sig 9) to all processes except init

saK – [K] – Kills all processes on the currently active virtual console. This should produce a login prompt that is secure (i.e. not a user process trying to look like a login prompt).

showMem – [M] – This will dump the following memory information; the system will continue running.


 SysRq : Show Memory
 Mem-info:
Zone:DMA freepages:     0 min:     0 low:     0 high:     0
Zone:Normal freepages:358380 min:  1246 low:  8923 high: 12889
Zone:HighMem freepages:     0 min:     0 low:     0 high:     0
Zone:DMA freepages:  2529 min:     0 low:     0 high:     0
Zone:Normal freepages:382475 min:  1278 low:  9149 high: 13212
Zone:HighMem freepages:     0 min:     0 low:     0 high:     0
Free pages:      743384 (     0 HighMem)
( Active: 28480/8679, inactive_laundry: 2665, inactive_clean: 0, free: 743384 )
aa:0 ac:0 id:0 il:0 ic:0 fr:0
aa:676 ac:12917 id:7391 il:2262 ic:0 fr:358381
aa:0 ac:0 id:0 il:0 ic:0 fr:0
aa:0 ac:0 id:0 il:0 ic:0 fr:2529
aa:1446 ac:13441 id:1288 il:403 ic:0 fr:382475
aa:0 ac:0 id:0 il:0 ic:0 fr:0
17981*4kB 51522*8kB 28603*16kB 10636*32kB 2040*64kB 123*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 1433524kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
210925 pages of slabcache
82 pages of kernel stacks
123 lowmem pagetables, 115 highmem pagetables
Free swap:       2040244kB
1032047 pages of RAM
746589 free pages
33834 reserved pages
27394 pages shared
0 pages swap cached
Buffer memory:    74448kB
Cache memory:    76640kB
CLEAN: 3301 buffers, 13183 kbyte, 67 used (last=3301), 0 locked, 0 dirty 0 delay
 Red Hat Enterprise Linux AS release 3 (Taroon Update 4)
Kernel 2.4.21-27.ELsmp on an x86_64
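
One line of the dump above lists free blocks by size as count*size terms followed by a total. As a small worked check, the sketch below re-adds those terms and compares the result with the printed total; the helper names are invented for this example and the regular expressions assume the 2.4.21 layout shown here.

```python
import re

# The free-list line from the Show Memory capture above, copied verbatim.
BUDDY = ("17981*4kB 51522*8kB 28603*16kB 10636*32kB 2040*64kB 123*128kB "
         "2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 1433524kB)")


def buddy_total_kb(line):
    """Sum count*size over every 'N*SkB' term on a free-list line."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


def reported_total_kb(line):
    """Return the total the kernel printed after the '=' sign."""
    return int(re.search(r"=\s*(\d+)kB", line).group(1))


print(buddy_total_kb(BUDDY), reported_total_kb(BUDDY))
```

Both values come out as 1433524 kB for this capture. The per-size counts are also a quick way to see fragmentation: plenty of 4kB blocks with few large ones means large contiguous allocations may be hard to satisfy.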

Off – [O] – Turn the system off (if supported by hardware)

showPc – [P] – shows register state, including the program counter (example from i386 Xeon)

 SysRq : Show Regs
 Pid/TGid: 0/0, comm:              swapper
EIP: 0060:[<c0109129>] CPU: 3
EIP is at default_idle [kernel] 0x29 (2.4.21-27.ELsmp)
ESP: 080b:c01091c2 EFLAGS: 00000246    Not tainted
EAX: 00000000 EBX: c0109100 ECX: c043c680 EDX: c4956000
ESI: c4956000 EDI: c4956000 EBP: c0109100 DS: 0068 ES: 0068 FS: 0000 GS: 0000
CR0: 8005003b CR2: b75f7000 CR3: 062e1f40 CR4: 000006f0
Call Trace:   [<c01091c2>] cpu_idle [kernel] 0x42 (0xc4957fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc4957fcc)
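
The EFLAGS value in the register dump can be decoded into individual x86 status and control flags. This is a minimal sketch covering only the commonly read bits; decode_eflags and the EFLAGS_BITS table are names introduced for this example.

```python
# Bit positions of the commonly inspected x86 EFLAGS bits.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}


def decode_eflags(value):
    """Return the names of the flags set in value, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if value & (1 << bit)]


print(decode_eflags(0x00000246))  # value from the Show Regs capture above
```

For 00000246 this gives PF, ZF and IF. The IF bit is the interesting one when chasing a hang, since it shows that interrupts were enabled on that CPU when the registers were sampled.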

showTasks – [T] – shows all tasks on the system, with stack traces

SysRq : Show State

                          free                        sibling
task             PC    stack   pid father child younger older
init          S 00000002  2604     1      0     6       2       (NOTLB)
Call Trace:   [<c0123f14>] schedule [kernel] 0x2f4 (0xc61f1ea0)
[<c0134f65>] schedule_timeout [kernel] 0x65 (0xc61f1ee4)
[<c015910c>] __get_free_pages [kernel] 0x1c (0xc61f1eec)
[<c0179071>] __pollwait [kernel] 0x31 (0xc61f1ef0)
[<c0134ef0>] process_timeout [kernel] 0x0 (0xc61f1f04)
[<c017933b>] do_select [kernel] 0x13b (0xc61f1f1c)
[<c01797de>] sys_select [kernel] 0x34e (0xc61f1f60)
 migration/0   S 00000000  5500     2      0             3     1 (L-TLB)
Call Trace:   [<c0123f14>] schedule [kernel] 0x2f4 (0xc4955f68)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955f9c)
[<c0125bfb>] migration_task [kernel] 0x30b (0xc4955fac)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955fc4)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955fe0)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xc4955ff0)

<SNIP>

The output contains a full stack trace for every process on the system, and lists what each CPU is running.
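
Because this output covers every task, it is usually easier to read after being reduced to one line per task. The sketch below assumes the i386 2.4.21 layout shown above (other kernels format these lines differently) and uses a made-up function name; it collects each task's name, state, pid and the function names in its call trace.

```python
import re

# Lines copied from the Show State capture above.
SAMPLE = """\
init          S 00000002  2604     1      0     6       2       (NOTLB)
Call Trace:   [<c0123f14>] schedule [kernel] 0x2f4 (0xc61f1ea0)
[<c0134f65>] schedule_timeout [kernel] 0x65 (0xc61f1ee4)
[<c015910c>] __get_free_pages [kernel] 0x1c (0xc61f1eec)
[<c0179071>] __pollwait [kernel] 0x31 (0xc61f1ef0)
[<c0134ef0>] process_timeout [kernel] 0x0 (0xc61f1f04)
[<c017933b>] do_select [kernel] 0x13b (0xc61f1f1c)
[<c01797de>] sys_select [kernel] 0x34e (0xc61f1f60)
 migration/0   S 00000000  5500     2      0             3     1 (L-TLB)
Call Trace:   [<c0123f14>] schedule [kernel] 0x2f4 (0xc4955f68)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955f9c)
"""

# Task header: name, one-letter state, 8-digit PC, free stack, pid.
TASK = re.compile(r"^\s*(\S+)\s+([A-Z])\s+[0-9A-Fa-f]{8}\s+\d+\s+(\d+)\s")
# Stack frame: [<address>] function [module]
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)\s+\[")


def tasks_from_show_state(text):
    """Return one dict per task with its name, state, pid and frames."""
    tasks = []
    for line in text.splitlines():
        m = TASK.match(line)
        if m:
            tasks.append({"name": m.group(1), "state": m.group(2),
                          "pid": int(m.group(3)), "frames": []})
        elif tasks:
            tasks[-1]["frames"].extend(FRAME.findall(line))
    return tasks


for t in tasks_from_show_state(SAMPLE):
    print(t["pid"], t["name"], t["state"], " <- ".join(t["frames"]))
```

In a hang, many tasks whose traces share the same few innermost functions are a useful first clue about what they are all waiting on.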

unRaw – [R] – Takes the keyboard out of raw mode (sets it to XLATE mode)

Sync – [S] – Syncs all mounted filesystems, flushing all pending writes

Unmount – [U] – Syncs all mounted filesystems and then attempts to remount them read-only.

shoWcpus – [W] – shows the stack of each CPU (example from a dual-processor, HT-enabled Xeon)

 SysRq : Show CPUs
CPU2:
c63f5e74 00000002 c01cea1f 00000000 c03b2d34 00000077 00000006 c01cecaa
00000077 c63f5f7c 00000000 00000000 00000000 00000000 c63f5f7c c01cec0d
00000077 c63f5f7c 00000000 00000000 f66d6000 c03ad438 c63f5f1c f7ee1d80
Call Trace:   [<c01cea1f>] sysrq_handle_showcpus [kernel] 0xf (0xc63f5e7c)
[<c01cecaa>] __handle_sysrq_nolock [kernel] 0x7a (0xc63f5e90)
[<c01cec0d>] handle_sysrq [kernel] 0x5d (0xc63f5eb0)
[<c01c5f06>] receive_chars [kernel] 0x1d6 (0xc63f5ed4)
[<c0134933>] update_process_time_intertick [kernel] 0x53 (0xc63f5ef0)
[<c01c64ca>] rs_interrupt_single [kernel] 0x12a (0xc63f5f04)
[<c010dd39>] handle_IRQ_event [kernel] 0x69 (0xc63f5f30)
[<c010df79>] do_IRQ [kernel] 0xb9 (0xc63f5f50)
[<c010dec0>] do_IRQ [kernel] 0x0 (0xc63f5f74)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f5f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f5f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc63f5fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc63f5fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc63f5fcc)
 CPU3:
c4957f64 00000003 c011c91f 00000000 00001f7c c03f2caa c0109100 00000000
c4956000 c4956000 c4956000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0702080b 00000000 00000000 00000000
Call Trace:   [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc4957f6c)
[<c0109100>] default_idle [kernel] 0x0 (0xc4957f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc4957f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc4957fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc4957fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc4957fcc)
 CPU0:
c03f1f88 00000000 c011c91f 00000000 00001fa0 c03f2caa c0109100 c043b280
c03f0000 c03f0000 c03f0000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0002080b 00099800 c0107000 0008e000
Call Trace:   [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc03f1f90)
[<c0109100>] default_idle [kernel] 0x0 (0xc03f1fa0)
[<c0109100>] default_idle [kernel] 0x0 (0xc03f1fb4)
[<c0109129>] default_idle [kernel] 0x29 (0xc03f1fc8)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc03f1fd4)
[<c0107000>] stext [kernel] 0x0 (0xc03f1fe0)
 CPU1:
c63f7f64 00000001 c011c91f 00000000 00001f7c c03f2caa c0109100 c043b280
c63f6000 c63f6000 c63f6000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0102080b 00000000 00000000 00000000
Call Trace:   [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc63f7f6c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f7f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f7f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc63f7fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc63f7fb0)
[<c01292b3>] call_console_drivers [kernel] 0x63 (0xc63f7fc4)
[<c01295e3>] printk [kernel] 0x153 (0xc63f7ffc)

Ramdev

