* [2.4] Lockup detected by NMI Watchdog
@ 2004-06-01  8:01 Oleg Drokin
From: Oleg Drokin @ 2004-06-01  8:01 UTC
  To: linux-kernel

Hello!

   It's been a while since the last lockup detected by the NMI watchdog on this
   box, but it happened again overnight, and in a completely different place too.
   The kernel version is 2.4.26, dual P4 Xeons, HT enabled, spinlock debug
   enabled. 2G of ECC RAM, highmem enabled.

NMI Watchdog detected LOCKUP on CPU1, eip c01168e7, registers:
CPU:    1
EIP:    0010:[<c01168e7>]    Not tainted
EFLAGS: 00000002
eax: 00000001   ebx: 00000000   ecx: 00000000   edx: c2833f78
esi: 00000001   edi: c2832000   ebp: c2833f6c   esp: c2833f58
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c2833000)
Stack: f6465280 f5b36000 c0105460 c2832000 c2832000 c2833fac c010cf5d c2833f78
       c0105460 00000000 c2832000 c2832000 c2832000 c2833fac 00000000 c2830018
       c2830018 ffffffef c010548c 00000010 00000246 c2833fc0 c0105522 0102080b
Call Trace:    [<c0105460>] [<c010cf5d>] [<c0105460>] [<c010548c>] [<c0105522>]
  [<c012088b>] [<c01206b4>]

Code: a8 01 75 f5 31 ff f6 42 32 02 75 06 f6 42 2c 03 74 05 bf 01
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU3, eip c0109457, registers:
 NMI Watchdog detected LOCKUP on CPU2, eip c01168e2, registers:
   
>>EIP; c01168e7 <smp_apic_timer_interrupt+47/140>   <=====
   
>>edx; c2833f78 <_end+243b298/384bc380>
>>edi; c2832000 <_end+2439320/384bc380>
>>ebp; c2833f6c <_end+243b28c/384bc380>
>>esp; c2833f58 <_end+243b278/384bc380>

Trace; c0105460 <default_idle+0/40>
Trace; c010cf5d <call_apic_timer_interrupt+5/10>
Trace; c0105460 <default_idle+0/40>
Trace; c010548c <default_idle+2c/40>
Trace; c0105522 <cpu_idle+52/70>
Trace; c012088b <release_console_sem+11b/120>
Trace; c01206b4 <printk+194/200>
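
For reference, the decoded trace above comes from mapping each raw address to
the nearest symbol at or below it. A minimal sketch of that lookup follows. The
symbol table here is an assumption: its start addresses are back-computed from
the offsets shown in the trace (for example c01168e7 - 0x47), it is not the
real System.map of this kernel, and the resolve() helper is purely illustrative
rather than what ksymoops actually does internally.

```python
import bisect

# Start addresses back-computed from the decoded trace (address minus offset).
# Illustration only: this is not the System.map of the kernel in the report.
SYMBOLS = sorted([
    (0xc0105460, "default_idle"),
    (0xc01054d0, "cpu_idle"),
    (0xc010cf58, "call_apic_timer_interrupt"),
    (0xc01168a0, "smp_apic_timer_interrupt"),
    (0xc0120520, "printk"),
    (0xc0120770, "release_console_sem"),
])
STARTS = [addr for addr, _ in SYMBOLS]


def resolve(addr):
    """Return 'symbol+offset' (hex offset) for the nearest symbol at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        # Address lies below every known symbol; leave it unresolved.
        return hex(addr)
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:x}"


for eip in (0xc01168e7, 0xc01168e2):
    print(f"{eip:08x} {resolve(eip)}")
```

With this table, both the CPU1 and the CPU2 EIP resolve into
smp_apic_timer_interrupt, only a few bytes apart.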

Code;  c01168e7 <smp_apic_timer_interrupt+47/140>
00000000 <_EIP>:
Code;  c01168e7 <smp_apic_timer_interrupt+47/140>   <=====
   0:   a8 01                     test   $0x1,%al   <=====
Code;  c01168e9 <smp_apic_timer_interrupt+49/140>
   2:   75 f5                     jne    fffffff9 <_EIP+0xfffffff9>
Code;  c01168eb <smp_apic_timer_interrupt+4b/140>
   4:   31 ff                     xor    %edi,%edi
Code;  c01168ed <smp_apic_timer_interrupt+4d/140>
   6:   f6 42 32 02               testb  $0x2,0x32(%edx)
Code;  c01168f1 <smp_apic_timer_interrupt+51/140>
   a:   75 06                     jne    12 <_EIP+0x12>
Code;  c01168f3 <smp_apic_timer_interrupt+53/140>
   c:   f6 42 2c 03               testb  $0x3,0x2c(%edx)
Code;  c01168f7 <smp_apic_timer_interrupt+57/140>
  10:   74 05                     je     17 <_EIP+0x17>
Code;  c01168f9 <smp_apic_timer_interrupt+59/140>
  12:   bf 01 00 00 00            mov    $0x1,%edi

Based on what I see in the disassembly, the code seems to be spinning in the
loop in smp_apic_timer_interrupt->irq_enter() (cpu1 and cpu2); cpu3 is in
handle_IRQ_event (in the irq_enter loop as well).
Unfortunately, no data on where cpu0 was executing is available.
(I guess it might not be such a bad idea to print traces from all
available cpus when the NMI watchdog triggers, if possible.)
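
The "looping" reading can be checked with a little arithmetic on the
disassembly: the `75 f5` (jne) right after the faulting `test` is a short
backward jump, and the EIP reported for CPU2 falls inside the range it jumps
over. A rough sketch of that calculation is below. The helper name
jcc_rel8_target is made up for illustration; the addresses and bytes are the
ones from the oops output, and nothing here says what the loop is waiting on.

```python
def jcc_rel8_target(insn_addr, rel8_byte):
    """Target of a 2-byte short conditional jump (opcode + rel8) at insn_addr."""
    # rel8 is a signed 8-bit displacement relative to the next instruction.
    disp = rel8_byte - 0x100 if rel8_byte >= 0x80 else rel8_byte
    return (insn_addr + 2 + disp) & 0xFFFFFFFF


EIP_CPU1 = 0xc01168e7        # test $0x1,%al  (a8 01), from the CPU1 oops
JNE_ADDR = EIP_CPU1 + 2      # jne            (75 f5)
EIP_CPU2 = 0xc01168e2        # EIP reported for CPU2

loop_start = jcc_rel8_target(JNE_ADDR, 0xf5)
loop_end = JNE_ADDR + 2      # first instruction after the jne

# True when CPU2 was caught inside the same backward-jump range as CPU1.
in_loop = loop_start <= EIP_CPU2 < loop_end

print(f"loop spans {loop_start:08x}..{loop_end:08x}, CPU2 inside: {in_loop}")
```

The jump target works out to c01168e0, i.e. seven bytes before the CPU1 EIP,
which matches the `_EIP+0xfffffff9` shown in the decoded output.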

A fairly useless bug report as I see it. I guess I really need to patch in
kgdb and gather more data next time something like this hits.
(Will kgdb even work under such conditions, I wonder?)

Bye,
    Oleg
