public inbox for linux-kernel@vger.kernel.org
* Adaptec driver crashes (3/3)
@ 2009-05-06  7:13 Norman Diamond
  2009-05-08 19:09 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Norman Diamond @ 2009-05-06  7:13 UTC
  To: linux-kernel, linux-scsi

A tougher non-100%-reproducible way to crash a Linux system is as follows.

I don't remember exactly what I did, but for some reason I guessed it might 
happen a second time, so I set the console to a text mode terminal before it 
happened the second time (since Linux doesn't give Blue Screens of Death 
otherwise).  This is with an Adaptec 1480 card, AIC7xxx driver.

I wish I had a wooden table so I wouldn't have to read and type this stuff 
back in by hand.  (In case anyone here doesn't read thedailywtf, ignore the 
part about the wooden table.  I still wish I wouldn't have to read and type 
this stuff back in by hand.) 

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip: c04a50af *pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
 snd_seq_device snd_pcm_oss snd_mixer_oss fuse lp pcspkr snd_intel8x0
 snd_ac97_codec ac97_bus e100 snd_pcm snd_timer snd video mii iTCO_wdt
 soundcore serio_raw iTCO_vendor_support output psmouse evdev pcmcia
 intel_agp agpgart shpchp snd_page_alloc parport_pc parport sg yenta_socket
 rsrc_nonstatic pcmcia_core aufs squashfs sqlzma unlzma

Pid: 3531, comm: klogd Not tainted (2.6.24.3 #1)
EIP: 0060:[<c04a50af>] EFLAGS: 00010046 CPU: 0
EIP is at ahc_handle_scsiint+0xdbf/0xef0
EAX: 00000000 EBX: 00000007 ECX: 00000001 EDX: 0000000d
ESI: ede17e00 EDI: 00000000 EBP: 00000000 ESP: ed507de4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process klogd (pid: 3531, ti=ed506000 task=edd6aaa0 task.ti=ed506000)
Stack: 00000001 00000041 00000001 ee6a6580 d662d853 41410000 000000a0 ead93024
       c01806db 00a0ee08 00000041 00000007 00000000 00000001 00000000 00000000
       ed53b541 00000001 ede17e00 00000064 00000082 0000000b c04b20f9 ede0cd60
Call Trace:
 [<c01806db>] __link_path_walk+0xaab/0xe10
 [<c04b20f9>] ahc_linux_isr+0x1e9/0x260
 [<c0151025>] handle_IRQ_event+0x25/0x50
 [<c01529bc>] handle_level_irq+0x7c/0xf0
 [<c010748b>] do_IRQ+0x3b/0x70
 [<efbe3d90>] aufs_getattr+0x0/0xa0 [aufs]
 [<c01052d3>] common_interrupt+0x23/0x30
 [<efbe3d90>] aufs_getattr+0x0/0xa0 [aufs]
 [<efbe3d9e>] aufs_getattr+0xe/0xa0 [aufs]
 [<c017fa47>] getname+0xa7/0xc0
 [<c03b7acf>] security_inode_getattr+0x1f/0x30
 [<c017a4f8>] vfs_getattr+0x48/0x70
 [<c017a727>] vfs_stat_fd+0x37/0x60
 [<c017a82f>] sys_stat64+0xf/0x30
 [<c01775ee>] vfs_write+0x11e/0x140
 [<c0177c31>] sys_write+0x41/0x70
 [<c012cc1a>] sys_time+0xa/0x30
 [<c0104352>] syscall_call+0x7/0xb
 [<c0700000>] rpcb_getport_prepare+0x10/0x40
 =======================
Code: 24 2c e8 c5 95 ff ff b9 14 00 00 00 89 f0 8d 54 24 2c c7 44 24 04 00 00 00
 00 c7 04 24 b6 d1 80 c0 e8 56 e9 ff ff e9 8d f8 ff ff <8b> 07 89 fa 0f b6 58 1b
 0f b6 c3 89 44 24 1c 89 f0 e8 5b a5 00
EIP: [<c04a50af>] ahc_handle_scsiint+0xdbf/0xef0 SS:ESP 0068:ed507de4
Kernel panic - not syncing : Fatal exception in interrupt


* Re: Adaptec driver crashes (3/3)
  2009-05-06  7:13 Adaptec driver crashes (3/3) Norman Diamond
@ 2009-05-08 19:09 ` Andrew Morton
  2009-05-11 11:24   ` Norman Diamond
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2009-05-08 19:09 UTC
  To: Norman Diamond; +Cc: linux-kernel, linux-scsi

On Wed, 6 May 2009 16:13:53 +0900
"Norman Diamond" <n0diamond@yahoo.co.jp> wrote:

> A tougher non-100%-reproducible way to crash a Linux system is as follows.
> 
> I don't remember exactly what I did, but for some reason I guessed it might 
> happen a second time, so I set the console to a text mode terminal before it 
> happened the second time (since Linux doesn't give Blue Screens of Death 
> otherwise).  This is with an Adaptec 1480 card, AIC7xxx driver.
> 
> I wish I had a wooden table so I wouldn't have to read and type this stuff 
> back in by hand.  (In case anyone here doesn't read thedailywtf, ignore the 
> part about the wooden table.  I still wish I wouldn't have to read and type 
> this stuff back in by hand.) 
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip: c04a50af *pde = 00000000
> Oops: 0000 [#1] SMP
> Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
>  snd_seq_device snd_pcm_oss snd_mixer_oss fuse lp pcspkr snd_intel8x0
>  snd_ac97_codec ac97_bus e100 snd_pcm snd_timer snd video mii iTCO_wdt
>  soundcore serio_raw iTCO_vendor_support output psmouse evdev pcmcia
>  intel_agp agpgart shpchp snd_page_alloc parport_pc parport sg yenta_socket
>  rsrc_nonstatic pcmcia_core aufs squashfs sqlzma unlzma
> 
> Pid: 3531, comm: klogd Not tainted (2.6.24.3 #1)
> EIP: 0060:[<c04a50af>] EFLAGS: 00010046 CPU: 0
> EIP is at ahc_handle_scsiint+0xdbf/0xef0
> EAX: 00000000 EBX: 00000007 ECX: 00000001 EDX: 0000000d
> ESI: ede17e00 EDI: 00000000 EBP: 00000000 ESP: ed507de4
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process klogd (pid: 3531, ti=ed506000 task=edd6aaa0 task.ti=ed506000)
> Stack: 00000001 00000041 00000001 ee6a6580 d662d853 41410000 000000a0 ead93024
>        c01806db 00a0ee08 00000041 00000007 00000000 00000001 00000000 00000000
>        ed53b541 00000001 ede17e00 00000064 00000082 0000000b c04b20f9 ede0cd60
> Call Trace:
>  [<c01806db>] __link_path_walk+0xaab/0xe10
>  [<c04b20f9>] ahc_linux_isr+0x1e9/0x260
>  [<c0151025>] handle_IRQ_event+0x25/0x50
>  [<c01529bc>] handle_level_irq+0x7c/0xf0
>  [<c010748b>] do_IRQ+0x3b/0x70
>  [<efbe3d90>] aufs_getattr+0x0/0xa0 [aufs]
>  [<c01052d3>] common_interrupt+0x23/0x30
>  [<efbe3d90>] aufs_getattr+0x0/0xa0 [aufs]
>  [<efbe3d9e>] aufs_getattr+0xe/0xa0 [aufs]
>  [<c017fa47>] getname+0xa7/0xc0
>  [<c03b7acf>] security_inode_getattr+0x1f/0x30
>  [<c017a4f8>] vfs_getattr+0x48/0x70
>  [<c017a727>] vfs_stat_fd+0x37/0x60
>  [<c017a82f>] sys_stat64+0xf/0x30
>  [<c01775ee>] vfs_write+0x11e/0x140
>  [<c0177c31>] sys_write+0x41/0x70
>  [<c012cc1a>] sys_time+0xa/0x30
>  [<c0104352>] syscall_call+0x7/0xb
>  [<c0700000>] rpcb_getport_prepare+0x10/0x40
>  =======================
> Code: 24 2c e8 c5 95 ff ff b9 14 00 00 00 89 f0 8d 54 24 2c c7 44 24 04 00 00 00
>  00 c7 04 24 b6 d1 80 c0 e8 56 e9 ff ff e9 8d f8 ff ff <8b> 07 89 fa 0f b6 58 1b
>  0f b6 c3 89 44 24 1c 89 f0 e8 5b a5 00
> EIP: [<c04a50af>] ahc_handle_scsiint+0xdbf/0xef0 SS:ESP 0068:ed507de4

ahc_handle_scsiint() is a huge function.  It would help if we can find
the file and line where it is crashing.  If you could do the following,
please.

- Run a more recent kernel: we might have fixed it since 2.6.24!

- Enable CONFIG_DEBUG_INFO

- Reproduce the crash and note the EIP address (c04a50af in this example).

- In your kernel build source directory, do

gdb vmlinux
(gdb) l *0xc04a50af

  (substituting the EIP from your own crash for c04a50af)

Alternatively, try doing this with your current 2.6.24 setup.

Alternatively, see if you can get the poorly-documented
scripts/markup_oops.pl to work.




* Re: Adaptec driver crashes (3/3)
  2009-05-08 19:09 ` Andrew Morton
@ 2009-05-11 11:24   ` Norman Diamond
  0 siblings, 0 replies; 3+ messages in thread
From: Norman Diamond @ 2009-05-11 11:24 UTC
  To: Andrew Morton; +Cc: linux-kernel, linux-scsi

Andrew Morton wrote:
> Norman Diamond wrote:
>>
>> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
>> printing eip: c04a50af *pde = 00000000
>> Oops: 0000 [#1] SMP
>> Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
>>  snd_seq_device snd_pcm_oss snd_mixer_oss fuse lp pcspkr snd_intel8x0
>>  snd_ac97_codec ac97_bus e100 snd_pcm snd_timer snd video mii iTCO_wdt
>>  soundcore serio_raw iTCO_vendor_support output psmouse evdev pcmcia
>>  intel_agp agpgart shpchp snd_page_alloc parport_pc parport sg yenta_socket
>>  rsrc_nonstatic pcmcia_core aufs squashfs sqlzma unlzma
>>
>> Pid: 3531, comm: klogd Not tainted (2.6.24.3 #1)
>> EIP: 0060:[<c04a50af>] EFLAGS: 00010046 CPU: 0
>> EIP is at ahc_handle_scsiint+0xdbf/0xef0
>> EAX: 00000000 EBX: 00000007 ECX: 00000001 EDX: 0000000d
>> ESI: ede17e00 EDI: 00000000 EBP: 00000000 ESP: ed507de4
>>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> Process klogd (pid: 3531, ti=ed506000 task=edd6aaa0 task.ti=ed506000)
>> Stack: 00000001 00000041 00000001 ee6a6580 d662d853 41410000 000000a0 ead93024
>>        c01806db 00a0ee08 00000041 00000007 00000000 00000001 00000000 00000000
>>        ed53b541 00000001 ede17e00 00000064 00000082 0000000b c04b20f9 ede0cd60
>> Call Trace:
>>  [<c01806db>] __link_path_walk+0xaab/0xe10
>>  [<c04b20f9>] ahc_linux_isr+0x1e9/0x260
>>  [<c0151025>] handle_IRQ_event+0x25/0x50
>>  [<c01529bc>] handle_level_irq+0x7c/0xf0
>>  [<c010748b>] do_IRQ+0x3b/0x70
>>  [<efbe3d90>] aufs_getattr+0x0/0xa0 [aufs]
>>  [<c01052d3>] common_interrupt+0x23/0x30
>>  [<efbe3d90>] aufs_getattr+0x0/0xa0 [aufs]
>>  [<efbe3d9e>] aufs_getattr+0xe/0xa0 [aufs]
>>  [<c017fa47>] getname+0xa7/0xc0
>>  [<c03b7acf>] security_inode_getattr+0x1f/0x30
>>  [<c017a4f8>] vfs_getattr+0x48/0x70
>>  [<c017a727>] vfs_stat_fd+0x37/0x60
>>  [<c017a82f>] sys_stat64+0xf/0x30
>>  [<c01775ee>] vfs_write+0x11e/0x140
>>  [<c0177c31>] sys_write+0x41/0x70
>>  [<c012cc1a>] sys_time+0xa/0x30
>>  [<c0104352>] syscall_call+0x7/0xb
>>  [<c0700000>] rpcb_getport_prepare+0x10/0x40
>>  =======================
>> Code: 24 2c e8 c5 95 ff ff b9 14 00 00 00 89 f0 8d 54 24 2c c7 44 24 04 00 00 00
>>  00 c7 04 24 b6 d1 80 c0 e8 56 e9 ff ff e9 8d f8 ff ff <8b> 07 89 fa 0f b6 58 1b
>>  0f b6 c3 89 44 24 1c 89 f0 e8 5b a5 00
>> EIP: [<c04a50af>] ahc_handle_scsiint+0xdbf/0xef0 SS:ESP 0068:ed507de4
>
> ahc_handle_scsiint() is a huge function.  It would help if we can find
> the file and line where it is crashing.  If you could do the following,
> please.
>
> - Run a more recent kernel: we might have fixed it since 2.6.24!

I experimented with a recent Knoppix distro based on a newer kernel, and
results were worse.

> - Enable CONFIG_DEBUG_INFO

Sorry, it's extra difficult to build customized kernels for Slax, and now's
not the time.  The next time I have to do it, I'll try to remember to enable
this.

> gdb vmlinux
> (gdb) l *0xc04a50af
>  (with a suitable value of c04a50af)

It looks like the devel package for this Slax version included gcc but not
gdb.  Next time I have to rebuild a customized Slax, I'll try to add gdb.
Does gdb have to be the same version as gcc, i.e. are they built together
and gdb knows details of the corresponding gcc version?  Or can I grab the
latest gdb that will be available at the time?

The file is aic_7xxx.c but I think you knew that.

My intuitive interpretation of "+0xdbf/0xef0" is that it's somewhere in the
block
  } else if ((status & BUSFREE) != 0
    && (ahc_inb(ahc, SIMODE1) & ENBUSFREE) != 0) {
    [... somewhere ...]
  }
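
For reference, the notation "+0xdbf/0xef0" gives the offset of the faulting
instruction from the start of the function, followed by the function's total
size in bytes. A minimal sketch in C that parses the notation and recovers
the function's start address from the EIP follows. The names
struct oops_location, parse_oops_location() and function_start() are
hypothetical, chosen for illustration only; they are not kernel APIs.

```c
#include <stdio.h>
#include <string.h>

/* One "symbol+0xOFFSET/0xSIZE" location as printed in an oops.
 * Hypothetical helper type, not part of the kernel. */
struct oops_location {
    char symbol[64];
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse text such as "ahc_handle_scsiint+0xdbf/0xef0".
 * Returns 0 on success, -1 if the text does not match the format. */
static int parse_oops_location(const char *text, struct oops_location *loc)
{
    if (text == NULL || loc == NULL)
        return -1;
    /* %lx accepts an optional "0x" prefix, as strtoul() does in base 16. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               loc->symbol, &loc->offset, &loc->size) != 3)
        return -1;
    if (loc->size == 0)
        return -1;
    return 0;
}

/* Address of the first byte of the function, given the faulting EIP. */
static unsigned long function_start(unsigned long eip,
                                    const struct oops_location *loc)
{
    return eip - loc->offset;
}
```

With the values from this oops, function_start(0xc04a50af, &loc) gives
0xc04a42f0, and 0xdbf out of 0xef0 is roughly 92% of the way through the
function's machine code. The compiler is free to reorder blocks, so a
position in the object code need not match the position in the source.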

Without knowing the guts of this driver, here are a few random observed uses
of pointers.

In one place scb is used after testing "if (scb != NULL && some other stuff"
and in another place scb is used without such a test.

In one place ahc_fetch_transinfo is called and resulting pointers are used
without checking for whether it succeeded or not.

I hope it's not something as dumb as
  printf("%s: ", ahc_name(ahc));
with possibly ahc_name producing NULL due to the card having been removed
before being reinserted (or in a more recent kernel, crashing due to being
removed without even needing reinsertion). 
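
Short of gdb, the Code: line can also help narrow down which pointer was
NULL. The byte marked <8b> is where EIP points; 8b 07 is the x86 encoding of
a 32-bit load from the address held in EDI into EAX, and the register dump
shows EDI: 00000000. The sketch below decodes only that simplest form of the
instruction (no SIB byte, no displacement). struct mov_load and
decode_simple_mov_load() are hypothetical names for illustration, not part
of any kernel tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit general registers in x86 ModRM encoding order. */
static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Result of decoding a register-indirect load (hypothetical helper type). */
struct mov_load {
    unsigned dest;  /* index of the destination register */
    unsigned base;  /* index of the register holding the address */
};

/* Decode "8b /r" (MOV r32, r/m32) in its simplest form only:
 * mod == 00 with a plain register base, so no SIB byte (rm == 4)
 * and no 32-bit displacement (rm == 5).
 * Returns 0 on success, -1 for anything else. */
static int decode_simple_mov_load(const uint8_t *code, size_t len,
                                  struct mov_load *out)
{
    if (code == NULL || out == NULL || len < 2 || code[0] != 0x8b)
        return -1;

    uint8_t modrm = code[1];
    unsigned mod = modrm >> 6;
    unsigned reg = (modrm >> 3) & 7;
    unsigned rm = modrm & 7;

    if (mod != 0 || rm == 4 || rm == 5)
        return -1;

    out->dest = reg;
    out->base = rm;
    return 0;
}
```

Applied to the bytes 8b 07, this reports destination eax and base edi. That
agrees with the fault address of 00000000, but it does not by itself say
which C variable the compiler was keeping in EDI.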

