public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 2.6.9-rc4-bk2 bug report
@ 2004-10-15 13:36 Nigel Kukard
  2004-10-16  5:53 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Nigel Kukard @ 2004-10-15 13:36 UTC (permalink / raw)
  To: linux-kernel

While booting with 2.6.9rc4-bk2 I seem to get the below OOPS, I copied 
the modules & System.map file over to another box of mine where i used 
serial-console to grab the oops.

Anyone know what I can try to debug the below problem?

The box is a 2.8Ghz P4 with HT enabled (smp).

Kind Regards
Nigel Kukard



ksymoops 2.4.9 on i686 2.6.8-0.23.  Options used
      -V (default)
      -K (specified)
      -L (specified)
      -o /tmp/2.6.9-0.8 (specified)
      -m /tmp/System.map-2.6.9-0.8 (specified)

No modules in ksyms, skipping objects
kernel BUG at kernel/timer.c:414!
invalid operand: 0000 [#1]
EIP:    0060:[<c01279e0>]    Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010007   (2.6.9-0.8)
eax: c1810a60   ebx: f7097d6c   ecx: 0000000f   edx: f7097d6c
esi: c18112e4   edi: c1810a60   ebp: 0000000f   esp: c064afb0
ds: 007b   es: 007b   ss: 0068
Stack: 00000000 c05f9f88 c1810a60 c064afc8 c0127f86 c064afc8 c064afc8 
c064afc8
        c01247c0 00000001 c05f9f88 c0643bc0 00000000 c01244a4 0000000a 
c05fbf90
        00000046 c0641484 006c8007 c01097f4
Call Trace:
  [<c0127f86>] run_timer_softirq+0xe6/0x160
  [<c01247c0>] tasklet_action+0x50/0xc0
  [<c01244a4>] __do_softirq+0x64/0xe0
  [<c01097f4>] do_softirq+0x44/0x4b
  [<c01161ed>] smp_apic_timer_interrupt+0xfd/0x100
  [<c0106fda>] apic_timer_interrupt+0x1a/0x20
  [<c010405a>] default_idle+0x2a/0x40
  [<c01040d4>] cpu_idle+0x24/0x60
  [<c05fc86d>] start_kernel+0x18d/0x1c0
Code: ca 53 89 c7 89 cd 8b 1e eb 11 90 89 da 39 7b 1c 75 19 89 f8 8b 1b 
e8 80 fb ff ff 39 f3 75 ec 89 36 89 76 04 89 e8 5b 5e 5f 5d c3 <0f> 0b 
9e 01 ff 23 4d c0 eb dd 8d b6 00 00 00 00 0f bf 05 7a dc


 >>EIP; c01279e0 <cascade+30/40>   <=====

 >>eax; c1810a60 <pg0+116aa60/3f958400>
 >>ebx; f7097d6c <pg0+369f1d6c/3f958400>
 >>edx; f7097d6c <pg0+369f1d6c/3f958400>
 >>esi; c18112e4 <pg0+116b2e4/3f958400>
 >>edi; c1810a60 <pg0+116aa60/3f958400>
 >>esp; c064afb0 <softirq_stack+fb0/8000>

Trace; c0127f86 <run_timer_softirq+e6/160>
Trace; c01247c0 <tasklet_action+50/c0>
Trace; c01244a4 <__do_softirq+64/e0>
Trace; c01097f4 <do_softirq+44/4b>
Trace; c01161ed <smp_apic_timer_interrupt+fd/100>
Trace; c0106fda <apic_timer_interrupt+1a/20>
Trace; c010405a <default_idle+2a/40>
Trace; c01040d4 <cpu_idle+24/60>
Trace; c05fc86d <start_kernel+18d/1c0>

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code;  c01279b5 <cascade+5/40>
00000000 <_EIP>:
Code;  c01279b5 <cascade+5/40>
    0:   ca 53 89                  lret   $0x8953
Code;  c01279b8 <cascade+8/40>
    3:   c7 89 cd 8b 1e eb 11      movl   $0xda899011,0xeb1e8bcd(%ecx)
Code;  c01279bf <cascade+f/40>
    a:   90 89 da
Code;  c01279c2 <cascade+12/40>
    d:   39 7b 1c                  cmp    %edi,0x1c(%ebx)
Code;  c01279c5 <cascade+15/40>
   10:   75 19                     jne    2b <_EIP+0x2b>
Code;  c01279c7 <cascade+17/40>
   12:   89 f8                     mov    %edi,%eax
Code;  c01279c9 <cascade+19/40>
   14:   8b 1b                     mov    (%ebx),%ebx
Code;  c01279cb <cascade+1b/40>
   16:   e8 80 fb ff ff            call   fffffb9b <_EIP+0xfffffb9b>
Code;  c01279d0 <cascade+20/40>
   1b:   39 f3                     cmp    %esi,%ebx
Code;  c01279d2 <cascade+22/40>
   1d:   75 ec                     jne    b <_EIP+0xb>
Code;  c01279d4 <cascade+24/40>
   1f:   89 36                     mov    %esi,(%esi)
Code;  c01279d6 <cascade+26/40>
   21:   89 76 04                  mov    %esi,0x4(%esi)
Code;  c01279d9 <cascade+29/40>
   24:   89 e8                     mov    %ebp,%eax
Code;  c01279db <cascade+2b/40>
   26:   5b                        pop    %ebx
Code;  c01279dc <cascade+2c/40>
   27:   5e                        pop    %esi
Code;  c01279dd <cascade+2d/40>
   28:   5f                        pop    %edi
Code;  c01279de <cascade+2e/40>
   29:   5d                        pop    %ebp
Code;  c01279df <cascade+2f/40>
   2a:   c3                        ret

This decode from eip onwards should be reliable

Code;  c01279e0 <cascade+30/40>
00000000 <_EIP>:
Code;  c01279e0 <cascade+30/40>   <=====
    0:   0f 0b                     ud2a      <=====
Code;  c01279e2 <cascade+32/40>
    2:   9e                        sahf
Code;  c01279e3 <cascade+33/40>
    3:   01 ff                     add    %edi,%edi
Code;  c01279e5 <cascade+35/40>
    5:   23 4d c0                  and    0xffffffc0(%ebp),%ecx
Code;  c01279e8 <cascade+38/40>
    8:   eb dd                     jmp    ffffffe7 <_EIP+0xffffffe7>
Code;  c01279ea <cascade+3a/40>
    a:   8d b6 00 00 00 00         lea    0x0(%esi),%esi
Code;  c01279f0 <second_overflow+0/260>
   10:   0f                        .byte 0xf
Code;  c01279f1 <second_overflow+1/260>
   11:   bf                        .byte 0xbf
Code;  c01279f2 <second_overflow+2/260>
   12:   05                        .byte 0x5
Code;  c01279f3 <second_overflow+3/260>
   13:   7a dc                     jp     fffffff1 <_EIP+0xfffffff1>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.6.9-rc4-bk2 bug report
  2004-10-15 13:36 2.6.9-rc4-bk2 bug report Nigel Kukard
@ 2004-10-16  5:53 ` Andrew Morton
       [not found]   ` <4170C946.5020800@lbsd.net>
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2004-10-16  5:53 UTC (permalink / raw)
  To: Nigel Kukard; +Cc: linux-kernel

Nigel Kukard <nkukard@lbsd.net> wrote:
>
> While booting with 2.6.9rc4-bk2 I seem to get the below OOPS, I copied 
>  the modules & System.map file over to another box of mine where i used 
>  serial-console to grab the oops.
> 
>  Anyone know what I can try to debug the below problem?

Something appears to have wrecked the system-wide timer queue. 
Possibilties are that some kernel module was unloaded but forgot to remove
a pending timer, or some data structure was freed while holding a pending
timer.

You could try enabling CONFIG_DEBUG_SLAB and CONFIG_DEBUG_PAGEALLOC, but I
doubt if those would shed much light.

I'd suggest that you strip your .config down to the bare minimum which is
needed to boot and see if the crash goes away.  If it does, then it's just
a matter of reintroducing .config options until you find which one caused
the crash.  Code inspection should then lead us to the bug.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.6.9-rc4-bk2 bug report
       [not found]   ` <4170C946.5020800@lbsd.net>
@ 2004-10-16  7:14     ` Andrew Morton
  2004-10-17  9:42       ` Nigel Kukard
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2004-10-16  7:14 UTC (permalink / raw)
  To: Nigel Kukard; +Cc: linux-kernel

Nigel Kukard <nkukard@lbsd.net> wrote:
>
> Andrew Morton wrote:
> 
> >Nigel Kukard <nkukard@lbsd.net> wrote:
> >  
> >
> >>While booting with 2.6.9rc4-bk2 I seem to get the below OOPS, I copied 
> >> the modules & System.map file over to another box of mine where i used 
> >> serial-console to grab the oops.
> >>
> >> Anyone know what I can try to debug the below problem?
> >>    
> >>
> >
> >I'd suggest that you strip your .config down to the bare minimum which is
> >needed to boot and see if the crash goes away.  If it does, then it's just
> >a matter of reintroducing .config options until you find which one caused
> >the crash.  Code inspection should then lead us to the bug.
> >  
> >
> 
> Could it be caused by loading a module?
> 

Yes.  Or by unloading a module.  The oops will happen some time _after_ the
buggy module has done its buggy thing.  Maybe you could change your
modprobe command to be:

	echo running modprobe $*
	modprobe.old $*
	sleep 5
	echo finished modprobe $*

or something like that.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.6.9-rc4-bk2 bug report
  2004-10-16  7:14     ` Andrew Morton
@ 2004-10-17  9:42       ` Nigel Kukard
  0 siblings, 0 replies; 4+ messages in thread
From: Nigel Kukard @ 2004-10-17  9:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:

>>>>While booting with 2.6.9rc4-bk2 I seem to get the below OOPS, I copied 
>>>>the modules & System.map file over to another box of mine where i used 
>>>>serial-console to grab the oops.
>>>>        
>>>>
>>Could it be caused by loading a module?
>>    
>>
>Yes.  Or by unloading a module.  The oops will happen some time _after_ the
>buggy module has done its buggy thing.  Maybe you could change your
>modprobe command to be:
>
>	echo running modprobe $*
>	modprobe.old $*
>	sleep 5
>	echo finished modprobe $*
>
>or something like that.
>  
>
Ok, it seems to be related to specific hardware. I tried loading the 
exact same modules on another box in the exact same order, no crash. The 
box is actually running nicely.

I'm busy compiling the kernel again, this time with a few debug options 
ticked.

As far as I saw 2 days ago, its when I load uhci-hcd that the problem 
manifests itself... this is the order the modules are loaded in...

snd, snd_als4000, crc32, sis900, psmouse, evdev, tsdev, agpgart, 
sis_agp, pci_hotplug, usbcore, ohci_hcd, ub, ehci_hcd, ieee1394, 
ohci1394, uhci_hcd

The OOPS is  triggered from loading uhci_hcd, so it could be ohci1394 
that creates the corruption. I've had problems with ohci1394 in 26.8.1 
aswell, I cannot load it, unload it, and reload it... generates a D 
state for the modprobe.

Any tips on how i can track down the offending line in source?

Regards
Nigel Kukard





^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-10-17  9:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-15 13:36 2.6.9-rc4-bk2 bug report Nigel Kukard
2004-10-16  5:53 ` Andrew Morton
     [not found]   ` <4170C946.5020800@lbsd.net>
2004-10-16  7:14     ` Andrew Morton
2004-10-17  9:42       ` Nigel Kukard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox