linux-scsi.vger.kernel.org archive mirror
* Kernel crash with AIC94xx
@ 2007-04-24  8:52 Constantin Teodorescu
  2007-04-24 18:37 ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Constantin Teodorescu @ 2007-04-24  8:52 UTC (permalink / raw)
  To: linux-scsi

Hello, I hope I can get a little help from you regarding this kind of
crash!

Hardware:
- server, TYAN Tempest i5000VS S5372 BIOS v1.0.4
- 8 SATA drives Seagate 136 GB attached to an AIC-9410 controller
- one IDE (boot disk and system)
- 8 GB RAM

Software:
- OpenSUSE 10.2 x86_64 (also tried SLES 10, but didn't succeed in
compiling the adp94xx driver from Adaptec)

Kernels: I tried each of these: linux-2.6.20.1, linux-2.6.20.4,
linux-2.6.20.7, linux-2.6.21-rc7
The last one has version 1.0.3 of the aic94xx driver, but the results are
the same :-(

Description:
- the server is running a very heavily loaded PostgreSQL database with
tables spread across those SAS drives, with a lot of writes and reads
- at least 4 or 5 times a day I get some warnings in /var/log/messages
(sas: Enter sas_scsi_recover_host, trying to find task XXX --->
aic94xx: came back from clear nexus), but the system keeps working
- more rarely (about once per day) I get the following BUG in
/var/log/messages and the system crashes: the SAS drivers stop working,
the shutdown command waits forever, and I need to hardware-reset the
system


Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e2c0, task 
0xffff81005bfcb080, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff810047f9dd00, task 
0xffff81007df80cc0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31180, task 
0xffff8101247ad500, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81021b8af380, task 
0xffff81012e550ac0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101698c3940, task 
0xffff8101a3b69b80, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865680, task 
0xffff8101a3b69380, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37340, task 
0xffff8101a3b69580, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31a40, task 
0xffff810058a93dc0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b940, task 
0xffff81005bfcbc80, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37880, task 
0xffff81015856bd00, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81022fa2f940, task 
0xffff8101d2cf87c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b080, task 
0xffff81005bfcb880, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37dc0, task 
0xffff8101d186a940, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620640, task 
0xffff81010d46a940, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae1c0, task 
0xffff81012e9bf4c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae380, task 
0xffff8101d186a740, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e8654c0, task 
0xffff8101247ad100, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620480, task 
0xffff81012e5502c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37180, task 
0xffff8101d2cf89c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81017d5268c0, task 
0xffff8101d186a540, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e800, task 
0xffff81015856b900, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81014f8db600, task 
0xffff81007df808c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865bc0, task 
0xffff81012e550cc0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620100, task 
0xffff8101a3b69980, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: Enter sas_scsi_recover_host
Apr 24 07:22:20 bnd kernel: sas: trying to find task 0xffff81005bfcb080
Apr 24 07:22:20 bnd kernel: sas: sas_scsi_find_task: aborting task 
0xffff81005bfcb080
Apr 24 07:22:25 bnd kernel: aic94xx: tmf timed out
Apr 24 07:22:25 bnd kernel: aic94xx: tmf came back
Apr 24 07:22:25 bnd kernel: aic94xx: task not done, clearing nexus
Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: POST
Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus 
posted, waiting...
Apr 24 07:22:30 bnd kernel: aic94xx: asd_clear_nexus_timedout: here
Apr 24 07:22:35 bnd kernel: aic94xx: came back from clear nexus
Apr 24 07:22:35 bnd kernel: aic94xx: task not done, clearing nexus
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: POST
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus 
posted, waiting...
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: here
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: 
opcode: 0x0
Apr 24 07:22:40 bnd kernel: aic94xx: came back from clear nexus
Apr 24 07:22:40 bnd kernel: ------------[ cut here ]------------
Apr 24 07:22:40 bnd kernel: kernel BUG at 
drivers/scsi/aic94xx/aic94xx_hwi.h:354!
Apr 24 07:22:40 bnd kernel: invalid opcode: 0000 [1] SMP
Apr 24 07:22:40 bnd kernel: CPU 0
Apr 24 07:22:40 bnd kernel: Modules linked in: aic94xx libsas xfs
Apr 24 07:22:40 bnd kernel: Pid: 3504, comm: scsi_eh_0 Not tainted 
2.6.21-rc7_RC7 #1
Apr 24 07:22:40 bnd kernel: RIP: 0010:[<ffffffff88089f51>]  
[<ffffffff88089f51>] :aic94xx:asd_abort_task+0x423/0x54a
Apr 24 07:22:40 bnd kernel: RSP: 0000:ffff81023117fde0  EFLAGS: 00010287
Apr 24 07:22:40 bnd kernel: RAX: 0000000000000000 RBX: ffff810231618000 
RCX: ffff81022f66a800
Apr 24 07:22:40 bnd kernel: RDX: ffffffff88089ebf RSI: ffff81005bfcb080 
RDI: ffff81005bfcb098
Apr 24 07:22:40 bnd kernel: RBP: 0000000000000000 R08: ffff81005bfcb080 
R09: 0000000000000001
Apr 24 07:22:40 bnd kernel: R10: ffffffff88089ea6 R11: ffff81013ba5bf80 
R12: ffff81005bfcb080
Apr 24 07:22:40 bnd kernel: R13: ffff810156e4f580 R14: ffff8101d49fb9c0 
R15: ffff81022f66a800
Apr 24 07:22:40 bnd kernel: FS:  0000000000000000(0000) 
GS:ffffffff80712000(0000) knlGS:0000000000000000
Apr 24 07:22:40 bnd kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 
000000008005003b
Apr 24 07:22:40 bnd kernel: CR2: 00002b110eff3fe8 CR3: 00000001e75f6000 
CR4: 00000000000006e0
Apr 24 07:22:40 bnd kernel: Process scsi_eh_0 (pid: 3504, threadinfo 
ffff81023117e000, task ffff810232274fe0)
Apr 24 07:22:40 bnd kernel: Stack:  ffff81023117dac8 00000000c9f5e2c0 
ffff81023117fe50 ffff81005bfcb080
Apr 24 07:22:40 bnd kernel:  0000000000000000 ffff8101c9f5e2c0 
ffff81005bfcb098 ffffffff88073293
Apr 24 07:22:40 bnd kernel:  ffff810231618010 ffff81023046c000 
ffff8102316181e0 ffff81023046c000
Apr 24 07:22:40 bnd kernel: Call Trace:
Apr 24 07:22:40 bnd kernel:  [<ffffffff88073293>] 
:libsas:sas_scsi_recover_host+0x1c2/0x83b
Apr 24 07:22:40 bnd kernel:  [<ffffffff8023f7d6>] 
keventd_create_kthread+0x0/0x6d
Apr 24 07:22:40 bnd kernel:  [<ffffffff80403b26>] 
scsi_error_handler+0x6e/0x2d7
Apr 24 07:22:40 bnd kernel:  [<ffffffff80403ab8>] 
scsi_error_handler+0x0/0x2d7
Apr 24 07:22:40 bnd kernel:  [<ffffffff8023fa46>] kthread+0xd1/0x103
Apr 24 07:22:40 bnd kernel:  [<ffffffff8020a148>] child_rip+0xa/0x12
Apr 24 07:22:40 bnd kernel:  [<ffffffff8023f7d6>] 
keventd_create_kthread+0x0/0x6d
Apr 24 07:22:40 bnd kernel:  [<ffffffff8023c327>] run_workqueue+0x10/0x179
Apr 24 07:22:40 bnd kernel:  [<ffffffff8023f975>] kthread+0x0/0x103
Apr 24 07:22:40 bnd kernel:  [<ffffffff8020a13e>] child_rip+0x0/0x12
Apr 24 07:22:40 bnd kernel:
Apr 24 07:22:40 bnd kernel:
Apr 24 07:22:40 bnd kernel: Code: 0f 0b eb fe 48 8d bb 68 4b 00 00 e8 38 
df 4a f8 41 8b 95 d0
Apr 24 07:22:40 bnd kernel: RIP  [<ffffffff88089f51>] 
:aic94xx:asd_abort_task+0x423/0x54a
Apr 24 07:22:40 bnd kernel:  RSP <ffff81023117fde0>
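
A log like the one above can be summarized mechanically. The following is a minimal sketch, assuming each syslog record sits on a single line (as it does in /var/log/messages; the copy pasted in this mail is wrapped by the mail client). The function name summarize and both regular expressions are illustrative, written from the message formats visible above, not taken from any kernel tool.

```python
import re

# Patterns written from the log lines shown in this message (assumed formats).
TIMEOUT_RE = re.compile(
    r"sas: command (0x[0-9a-f]+), task (0x[0-9a-f]+), timed out: (\w+)"
)
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")


def summarize(lines):
    """Return (count of timed-out commands, list of (file, line) BUG sites)."""
    timeouts = 0
    bugs = []
    for line in lines:
        if TIMEOUT_RE.search(line):
            timeouts += 1
        match = BUG_RE.search(line)
        if match:
            bugs.append((match.group(1), int(match.group(2))))
    return timeouts, bugs


if __name__ == "__main__":
    # Path taken from the report; reading it usually requires root.
    with open("/var/log/messages", errors="replace") as log:
        print(summarize(log))
```

Counting the timeouts that precede each BUG line makes it easier to compare runs, for example before and after a firmware change.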

-------------------------------------------------------------------------------------------------------------------------------- 

I tried to fetch and compile the
Adaptec_adp94xx-OpenBuild-B11662.i386.rpm driver from Adaptec, but got a
lot of compile errors.
Is there anything I can do to make it work? Do you need more
information that could help you understand the problem?
Please Cc: me at brailateo@flex.ro

Big, BIG, BIG thanks in advance!
Constantin Teodorescu
ROMANIA






* Re: Kernel crash with AIC94xx
  2007-04-24  8:52 Kernel crash with AIC94xx Constantin Teodorescu
@ 2007-04-24 18:37 ` James Bottomley
       [not found]   ` <462E5732.8020408@gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2007-04-24 18:37 UTC (permalink / raw)
  To: brailateo; +Cc: linux-scsi

On Tue, 2007-04-24 at 11:52 +0300, Constantin Teodorescu wrote:
> Hello, I hope I can get a little help from you regarding this kind of 
> crash !
> 
> Hardware:
> - server, TYAN Tempest i5000VS S5372 BIOS v1.0.4
> - 8 SATA drives Seagate 136 Gb attached on a AIC-9410 controller
> - one IDE (boot disk and system)

This configuration doesn't work on the vanilla linux kernel ... you need
the scsi-aic94xxx-sas-2.6 tree as well for this; is that what you're
running with?

> - 8 Gb RAM
> 
> Software:
> - OpenSUSE 10.2 x86_64 (tried also with SLES 10 but didn't succed in 
> compiling adp94xx driver from Adaptec)
> 
> Kernels: i tried with any  of them : linux-2.6.20.1 ,  linux-2.6.20.4 ,  
> linux-2.6.20.7 , linux-2.6.21.rc7
> The last one has the 1.0.3 version of aic94xx driver but the results are 
> the same :-(
> 
> Description:
> - the server is running a very heavy loaded PostgreSQL database with 
> tables spread on those SAS drives, a lot of writes and reads

Are these SAS or SATA drives?

> - at least 4, 5 times a day I got some warnings in /var/log/messages 
> (sas: Enter sas_scsi_recover_host , trying to find task XXX ---> 
> aic94xx: came back from clear nexus) but the system is still working
> - more rarely (once per day) I got the following bug in 
> /var/log/messages and the system is crashed, SAS drivers are not working 
> anymore, shutdown command is waiting forever, need to hardware reset the 
> system
> 
> 
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e2c0, task 
> 0xffff81005bfcb080, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810047f9dd00, task 
> 0xffff81007df80cc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31180, task 
> 0xffff8101247ad500, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81021b8af380, task 
> 0xffff81012e550ac0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101698c3940, task 
> 0xffff8101a3b69b80, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865680, task 
> 0xffff8101a3b69380, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37340, task 
> 0xffff8101a3b69580, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31a40, task 
> 0xffff810058a93dc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b940, task 
> 0xffff81005bfcbc80, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37880, task 
> 0xffff81015856bd00, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81022fa2f940, task 
> 0xffff8101d2cf87c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b080, task 
> 0xffff81005bfcb880, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37dc0, task 
> 0xffff8101d186a940, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620640, task 
> 0xffff81010d46a940, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae1c0, task 
> 0xffff81012e9bf4c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae380, task 
> 0xffff8101d186a740, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e8654c0, task 
> 0xffff8101247ad100, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620480, task 
> 0xffff81012e5502c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37180, task 
> 0xffff8101d2cf89c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81017d5268c0, task 
> 0xffff8101d186a540, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e800, task 
> 0xffff81015856b900, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81014f8db600, task 
> 0xffff81007df808c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865bc0, task 
> 0xffff81012e550cc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620100, task 
> 0xffff8101a3b69980, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: Enter sas_scsi_recover_host
> Apr 24 07:22:20 bnd kernel: sas: trying to find task 0xffff81005bfcb080
> Apr 24 07:22:20 bnd kernel: sas: sas_scsi_find_task: aborting task 
> 0xffff81005bfcb080
> Apr 24 07:22:25 bnd kernel: aic94xx: tmf timed out
> Apr 24 07:22:25 bnd kernel: aic94xx: tmf came back
> Apr 24 07:22:25 bnd kernel: aic94xx: task not done, clearing nexus
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: POST
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus 
> posted, waiting...
> Apr 24 07:22:30 bnd kernel: aic94xx: asd_clear_nexus_timedout: here
> Apr 24 07:22:35 bnd kernel: aic94xx: came back from clear nexus
> Apr 24 07:22:35 bnd kernel: aic94xx: task not done, clearing nexus
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: POST
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus 
> posted, waiting...
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: here
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: 
> opcode: 0x0
> Apr 24 07:22:40 bnd kernel: aic94xx: came back from clear nexus
> Apr 24 07:22:40 bnd kernel: ------------[ cut here ]------------
> Apr 24 07:22:40 bnd kernel: kernel BUG at 
> drivers/scsi/aic94xx/aic94xx_hwi.h:354!

This is an attempted free of an in-flight command.
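
To illustrate what "free of an in-flight command" means, here is a toy model, not driver code. The classes CommandBlock and Pool are hypothetical stand-ins, and the assumption is that the BUG check refuses to release a command block that is still queued to the hardware; the real check in aic94xx_hwi.h is not reproduced here.

```python
class CommandBlock:
    """Hypothetical stand-in for a driver command block."""

    def __init__(self, tag):
        self.tag = tag
        self.in_flight = False
        self.freed = False


class Pool:
    """Tracks which command blocks are still pending on the (imaginary) hardware."""

    def __init__(self):
        self.pending = []

    def post(self, cb):
        # Hand the block to the hardware: it is now in flight.
        cb.in_flight = True
        self.pending.append(cb)

    def complete(self, cb):
        # The hardware finished (or the abort succeeded): unlink the block.
        self.pending.remove(cb)
        cb.in_flight = False

    def free(self, cb):
        # Analogue of the BUG check: freeing a block that is still pending
        # would let the hardware touch released memory, so refuse loudly.
        if cb.in_flight:
            raise RuntimeError(f"attempt to free in-flight command {cb.tag}")
        cb.freed = True
```

In the log above, the abort and both clear-nexus attempts time out, so in this model the block would never reach complete() before free() is called.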

> Apr 24 07:22:40 bnd kernel: invalid opcode: 0000 [1] SMP
> Apr 24 07:22:40 bnd kernel: CPU 0
> Apr 24 07:22:40 bnd kernel: Modules linked in: aic94xx libsas xfs
> Apr 24 07:22:40 bnd kernel: Pid: 3504, comm: scsi_eh_0 Not tainted 
> 2.6.21-rc7_RC7 #1
> Apr 24 07:22:40 bnd kernel: RIP: 0010:[<ffffffff88089f51>]  
> [<ffffffff88089f51>] :aic94xx:asd_abort_task+0x423/0x54a
> Apr 24 07:22:40 bnd kernel: RSP: 0000:ffff81023117fde0  EFLAGS: 00010287
> Apr 24 07:22:40 bnd kernel: RAX: 0000000000000000 RBX: ffff810231618000 
> RCX: ffff81022f66a800
> Apr 24 07:22:40 bnd kernel: RDX: ffffffff88089ebf RSI: ffff81005bfcb080 
> RDI: ffff81005bfcb098
> Apr 24 07:22:40 bnd kernel: RBP: 0000000000000000 R08: ffff81005bfcb080 
> R09: 0000000000000001
> Apr 24 07:22:40 bnd kernel: R10: ffffffff88089ea6 R11: ffff81013ba5bf80 
> R12: ffff81005bfcb080
> Apr 24 07:22:40 bnd kernel: R13: ffff810156e4f580 R14: ffff8101d49fb9c0 
> R15: ffff81022f66a800
> Apr 24 07:22:40 bnd kernel: FS:  0000000000000000(0000) 
> GS:ffffffff80712000(0000) knlGS:0000000000000000
> Apr 24 07:22:40 bnd kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 
> 000000008005003b
> Apr 24 07:22:40 bnd kernel: CR2: 00002b110eff3fe8 CR3: 00000001e75f6000 
> CR4: 00000000000006e0
> Apr 24 07:22:40 bnd kernel: Process scsi_eh_0 (pid: 3504, threadinfo 
> ffff81023117e000, task ffff810232274fe0)
> Apr 24 07:22:40 bnd kernel: Stack:  ffff81023117dac8 00000000c9f5e2c0 
> ffff81023117fe50 ffff81005bfcb080
> Apr 24 07:22:40 bnd kernel:  0000000000000000 ffff8101c9f5e2c0 
> ffff81005bfcb098 ffffffff88073293
> Apr 24 07:22:40 bnd kernel:  ffff810231618010 ffff81023046c000 
> ffff8102316181e0 ffff81023046c000
> Apr 24 07:22:40 bnd kernel: Call Trace:
> Apr 24 07:22:40 bnd kernel:  [<ffffffff88073293>] 
> :libsas:sas_scsi_recover_host+0x1c2/0x83b
> Apr 24 07:22:40 bnd kernel:  [<ffffffff8023f7d6>] 
> keventd_create_kthread+0x0/0x6d
> Apr 24 07:22:40 bnd kernel:  [<ffffffff80403b26>] 
> scsi_error_handler+0x6e/0x2d7
> Apr 24 07:22:40 bnd kernel:  [<ffffffff80403ab8>] 
> scsi_error_handler+0x0/0x2d7
> Apr 24 07:22:40 bnd kernel:  [<ffffffff8023fa46>] kthread+0xd1/0x103
> Apr 24 07:22:40 bnd kernel:  [<ffffffff8020a148>] child_rip+0xa/0x12
> Apr 24 07:22:40 bnd kernel:  [<ffffffff8023f7d6>] 
> keventd_create_kthread+0x0/0x6d
> Apr 24 07:22:40 bnd kernel:  [<ffffffff8023c327>] run_workqueue+0x10/0x179
> Apr 24 07:22:40 bnd kernel:  [<ffffffff8023f975>] kthread+0x0/0x103
> Apr 24 07:22:40 bnd kernel:  [<ffffffff8020a13e>] child_rip+0x0/0x12
> Apr 24 07:22:40 bnd kernel:
> Apr 24 07:22:40 bnd kernel:
> Apr 24 07:22:40 bnd kernel: Code: 0f 0b eb fe 48 8d bb 68 4b 00 00 e8 38 
> df 4a f8 41 8b 95 d0
> Apr 24 07:22:40 bnd kernel: RIP  [<ffffffff88089f51>] 
> :aic94xx:asd_abort_task+0x423/0x54a
> Apr 24 07:22:40 bnd kernel:  RSP <ffff81023117fde0>
> 

James




* Re: Kernel crash with AIC94xx (one step forward, hope it's lucky)
       [not found]   ` <462E5732.8020408@gmail.com>
@ 2007-04-24 19:20     ` James Bottomley
  2007-04-26  9:39       ` Luben Tuikov
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2007-04-24 19:20 UTC (permalink / raw)
  To: brailateo; +Cc: linux-scsi

Please don't cut linux-scsi from the cc list

On Tue, 2007-04-24 at 22:14 +0300, Constantin Teodorescu wrote:
> James Bottomley wrote:
> > This configuration doesn't work on the vanilla linux kernel ... you need
> > the scsi-aic94xxx-sas-2.6 tree as well for this; is that what you're
> > running with?
> >   
> Yes, I am experimenting now on a 2.6.21-RC7 kernel with 1.0.3 version of 
> aic94xx driver.
> 
> > Are these SAS or SATA drives?
> >   
> 8 SAS drives
> 
> I have already received some information from Luben Tuikov and Alexis 
> Bruemmer (he told me that there is a new firmware seq file at Adaptec) 
> and I am sending you the message:
> 
> Luben Tuikov wrote:
> > Constantin,
> >
> > adp94xx is not supported by anyone.
> >
> > The in-kernel aic94xx is supported by linux-scsi mailing list
> > and your OS vendor.
> >
> >     Luben
> >   
> I was afraid of ... :-(
> 
> I already got the news from Andy Warner that told me about that !
> 
> Alexis Bruemmer send me also a message saying :
> > Yep we have seen this issue before.  However the fix involves both a
> > driver update and a sequencer f/w update found at:
> >
> > http://www.adaptec.com/en-US/downloads/linux_source/linux_source_code?productId=SAS-48300&dn=Adaptec+Serial+Attached+SCSI+48300 
> >
> >
> > If you still get this crash with that version of the sequencer let me
> > know
> 
> Digging I discovered that the firmware is different (md5sum differs) and 
> newer (released in 2 Mar 2007) that the firmware that I installed in 
> January 2007.
> 
> bnd:~/# md5sum /lib/firmware/aic94xx-seq.fw old-aic94xx-seq.fw
> fb393f52fde81eb53afa1e204a606c37  /lib/firmware/aic94xx-seq.fw
> 589f442b43ea0cc42fec275d7a612c2e  old-aic94xx-seq.fw
> 
> So I downloaded the new firmware and I will intensively teste it.
> I will keep you informed about the results.
> 
> Thank you again for all your valuable help,
> Best regards from Romania,
> Teo
> 
> 
> 
> 

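The md5sum comparison quoted above can also be scripted, which helps when several machines need checking. This is a minimal sketch using Python's standard hashlib; the helper names md5_of and same_firmware are illustrative, and the two paths in the usage note are the ones from the quoted shell session.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_firmware(path_a, path_b):
    """True if both files have identical MD5 digests."""
    return md5_of(path_a) == md5_of(path_b)
```

For example, same_firmware("/lib/firmware/aic94xx-seq.fw", "old-aic94xx-seq.fw") would return False for the two images listed above, since their digests differ. A matching digest only shows the files are identical; it says nothing about which sequencer version the driver actually loaded.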


* Re: Kernel crash with AIC94xx (one step forward, hope it's lucky)
  2007-04-24 19:20     ` Kernel crash with AIC94xx (one step forward, hope it's lucky) James Bottomley
@ 2007-04-26  9:39       ` Luben Tuikov
  2007-04-26  9:55         ` Constantin Teodorescu
  0 siblings, 1 reply; 7+ messages in thread
From: Luben Tuikov @ 2007-04-26  9:39 UTC (permalink / raw)
  To: James Bottomley, brailateo; +Cc: linux-scsi

--- James Bottomley <James.Bottomley@SteelEye.com> wrote:

> Please don't cut linux-scsi from the cc list
> 
> On Tue, 2007-04-24 at 22:14 +0300, Constantin Teodorescu wrote:
> > James Bottomley wrote:
> > > This configuration doesn't work on the vanilla linux kernel ... you need
> > > the scsi-aic94xxx-sas-2.6 tree as well for this; is that what you're
> > > running with?
> > >   
> > Yes, I am experimenting now on a 2.6.21-RC7 kernel with 1.0.3 version of 
> > aic94xx driver.
> > 
> > > Are these SAS or SATA drives?
> > >   
> > 8 SAS drives
> > 
> > I have already received some information from Luben Tuikov and Alexis 
> > Bruemmer (he told me that there is a new firmware seq file at Adaptec) 
> > and I am sending you the message:
> > 
> > Luben Tuikov wrote:
> > > Constantin,
> > >
> > > adp94xx is not supported by anyone.
> > >
> > > The in-kernel aic94xx is supported by linux-scsi mailing list
> > > and your OS vendor.
> > >
> > >     Luben
> > >   

Having said that, I still support the original version
of the aic94xx and the SAS stack, which now includes
a SAT-1 conformant SATL.

It has allowed some people to upgrade their kernels to
the latest kernel version (as in, from the git repo), and they have
reported that it is more stable (i.e. working as opposed to not, including
supporting SATA devices in SAS domains) than the in-kernel version.

It also keeps the sequencer fw together with the driver source
code, so the end user doesn't have to mix and match the fw version
with the driver (kernel) version.

   Luben

> > I was afraid of ... :-(
> > 
> > I already got the news from Andy Warner that told me about that !
> > 
> > Alexis Bruemmer send me also a message saying :
> > > Yep we have seen this issue before.  However the fix involves both a
> > > driver update and a sequencer f/w update found at:
> > >
> > >
>
> > > http://www.adaptec.com/en-US/downloads/linux_source/linux_source_code?productId=SAS-48300&dn=Adaptec+Serial+Attached+SCSI+48300
> 
> > >
> > >
> > > If you still get this crash with that version of the sequencer let me
> > > know
> > 
> > Digging I discovered that the firmware is different (md5sum differs) and 
> > newer (released in 2 Mar 2007) that the firmware that I installed in 
> > January 2007.
> > 
> > bnd:~/# md5sum /lib/firmware/aic94xx-seq.fw old-aic94xx-seq.fw
> > fb393f52fde81eb53afa1e204a606c37  /lib/firmware/aic94xx-seq.fw
> > 589f442b43ea0cc42fec275d7a612c2e  old-aic94xx-seq.fw
> > 
> > So I downloaded the new firmware and I will intensively teste it.
> > I will keep you informed about the results.
> > 
> > Thank you again for all your valuable help,
> > Best regards from Romania,
> > Teo
> > 
> > 
> > 
> > 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



* Re: Kernel crash with AIC94xx (one step forward, hope it's lucky)
  2007-04-26  9:39       ` Luben Tuikov
@ 2007-04-26  9:55         ` Constantin Teodorescu
  2007-04-26 20:17           ` Darrick J. Wong
  2007-05-01 15:57           ` Darrick J. Wong
  0 siblings, 2 replies; 7+ messages in thread
From: Constantin Teodorescu @ 2007-04-26  9:55 UTC (permalink / raw)
  To: ltuikov; +Cc: James Bottomley, linux-scsi

Luben Tuikov wrote:
> Having said that, I still support the original version
> of the aic94xx and the SAS stack, which now includes
> SAT-1 conformant SATL.
>
> It has allowed some people to upgrade their kernels to
> the latest kernel version (as in from git repo), and reported that
> it is more stable (i.e. working as opposed to not, including
> supporting SATA devices in SAS domains) than the in-kernel version.
>
> It also keeps the sequencer fw together with the driver source
> code, so the end-user wouldn't have to mix and match fw version
> with driver (kernel) version.
>
>   
So ... the latest news, from last night! :-D


I got the Adaptec SAS Sequencer Firmware v30 for the Open-Source
AIC94xx Driver included with Linux kernel 2.6.19 and above from the web
page above.
I am using a 2.6.21-rc7 kernel with AIC94xx version 1.0.3, compiled on
OpenSUSE 10.2 x86_64.

Everything went OK and the driver discovered the disks:

aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
ACPI: PCI Interrupt 0000:05:06.0[A] -> GSI 26 (level, low) -> IRQ 26
aic94xx: found Adaptec AIC-9410W SAS/SATA Host Adapter, device 0000:05:06.0
scsi0 : aic94xx
aic94xx: BIOS present (1,1), 1608
aic94xx: ue num:8, ue size:88
aic94xx: manuf sect SAS_ADDR 500e081000014030
aic94xx: manuf sect PCBA SN
aic94xx: ms: no phy parameters found
aic94xx: ms: Creating default phy parameters
aic94xx: ms: num_phy_desc: 8
aic94xx: ms: phy0: ENABLED
aic94xx: ms: phy1: ENABLED
aic94xx: ms: phy2: ENABLED
aic94xx: ms: phy3: ENABLED
aic94xx: ms: phy4: ENABLED
aic94xx: ms: phy5: ENABLED
aic94xx: ms: phy6: ENABLED
aic94xx: ms: phy7: ENABLED
aic94xx: ms: max_phys:0x8, num_phys:0x8
aic94xx: ms: enabled_phys:0xff
aic94xx: ms: no connector map found
aic94xx: ctrla: phy0: sas_addr: 500e081000014030, sas rate:0x9-0x8, sata 
rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy1: sas_addr: 500e081000014030, sas rate:0x9-0x8, sata 
rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy2: sas_addr: 500e081000014030, sas rate:0x9-0x8, sata 
rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy3: sas_addr: 500e081000014030, sas rate:0x9-0x8, sata 
rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy4: sas_addr: 500e081000014030, sas rate:0x9-0x8, sata 
rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy5: sas_addr: 500e081000014030, sas rate:0x9-0x8, sata 
rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy6: sas_addr: 500e081000014030, sas rate:0x9-0x8, sata 
rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy7: sas_addr: 500e081000014030, sas rate:0x9-0x8, sata 
rate:0x0-0x0, flags:0x0
aic94xx: max_scbs:512, max_ddbs:128


I started the stress tests (a lot of writes and reads on the SAS disks)
and after running for 8 hours I got the error:

03:01:55 kernel: sas: command 0xffff810197b13640, task 
0xffff810218dca580, timed out: EH_NOT_HANDLED
03:01:55 kernel: sas: command 0xffff810169c47b40, task 
0xffff8100bcc92300, timed out: EH_NOT_HANDLED
03:01:55 kernel: sas: command 0xffff81017c92a680, task 
0xffff8102018fac80, timed out: EH_NOT_HANDLED
03:01:55 kernel: sas: command 0xffff81007460d540, task 
0xffff8101a7a7ab40, timed out: EH_NOT_HANDLED
03:01:55 kernel: sas: command 0xffff810196915d40, task 
0xffff8102018fa880, timed out: EH_NOT_HANDLED
03:01:55 kernel: sas: command 0xffff81008ac41700, task 
0xffff8101b679b8c0, timed out: EH_NOT_HANDLED
03:01:55 kernel: sas: command 0xffff810196915b80, task 
0xffff8101b679bcc0, timed out: EH_NOT_HANDLED
03:01:55 kernel: sas: command 0xffff810125460180, task 
0xffff8101b679b0c0, timed out: EH_NOT_HANDLED
03:01:55 kernel: sas: Enter sas_scsi_recover_host
03:01:55 kernel: sas: trying to find task 0xffff8101a7a7a940
03:01:55 kernel: sas: sas_scsi_find_task: aborting task 0xffff8101a7a7a940
03:02:00 kernel: aic94xx: tmf timed out
03:02:00 kernel: aic94xx: tmf came back
03:02:00 kernel: aic94xx: task not done, clearing nexus
03:02:00 kernel: aic94xx: asd_clear_nexus_index: PRE
03:02:00 kernel: aic94xx: asd_clear_nexus_index: POST
03:02:00 kernel: aic94xx: asd_clear_nexus_index: clear nexus posted, 
waiting...
03:02:05 kernel: aic94xx: asd_clear_nexus_timedout: here
03:02:10 kernel: aic94xx: came back from clear nexus
03:02:10 kernel: aic94xx: task not done, clearing nexus
03:02:10 kernel: aic94xx: asd_clear_nexus_index: PRE
03:02:10 kernel: aic94xx: asd_clear_nexus_index: POST
03:02:10 kernel: aic94xx: asd_clear_nexus_index: clear nexus posted, 
waiting...
03:02:10 kernel: aic94xx: asd_clear_nexus_tasklet_complete: here
03:02:10 kernel: aic94xx: asd_clear_nexus_tasklet_complete: opcode: 0x0
03:02:15 kernel: aic94xx: came back from clear nexus
03:02:15 kernel: ------------[ cut here ]------------
03:02:15 kernel: kernel BUG at drivers/scsi/aic94xx/aic94xx_hwi.h:354!
03:02:15 kernel: invalid opcode: 0000 [1] SMP
03:02:15 kernel: CPU 0
03:02:15 kernel: Modules linked in: aic94xx libsas xfs
03:02:15 kernel: Pid: 3498, comm: scsi_eh_0 Not tainted 2.6.21-rc7_RC7 #1
03:02:15 kernel: RIP: 0010:[<ffffffff88089f51>]  [<ffffffff88089f51>] 
:aic94xx:asd_abort_task+0x423/0x54a
03:02:15 kernel: RSP: 0018:ffff81022ffdbde0  EFLAGS: 00010287
03:02:15 kernel: RAX: 0000000000000000 RBX: ffff810232c50000 RCX: 
ffff8102312fa8f0
03:02:15 kernel: RDX: 0000000000000000 RSI: ffff8101a7a7a940 RDI: 
ffff8101a7a7a958
03:02:15 kernel: RBP: 0000000000000000 R08: ffff8101a7a7a940 R09: 
0000000000000001
03:02:15 kernel: R10: ffffffff88089ea6 R11: 0000000000000004 R12: 
ffff8101a7a7a940
03:02:15 kernel: R13: ffff8101e0b9d3c0 R14: ffff810185a0e880 R15: 
ffff81022f669000
03:02:15 kernel: FS:  0000000000000000(0000) GS:ffffffff80712000(0000) 
knlGS:0000000000000000
03:02:15 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
03:02:15 kernel: CR2: 00002aab4098b000 CR3: 000000007e70d000 CR4: 
00000000000006e0
03:02:15 kernel: Process scsi_eh_0 (pid: 3498, threadinfo 
ffff81022ffda000, task ffff8102312fa240)
03:02:15 kernel: Stack:  ffff810231025ac8 0000000025460c00 
ffff81022ffdbe50 ffff8101a7a7a940
03:02:15 kernel:  0000000000000000 ffff810125460c00 ffff8101a7a7a958 
ffffffff88073293
03:02:15 kernel:  ffff810232c50010 ffff810231698000 ffff810232c501e0 
ffff810231698000
03:02:15 kernel: Call Trace:
03:02:15 kernel:  [<ffffffff88073293>] 
:libsas:sas_scsi_recover_host+0x1c2/0x83b
03:02:15 kernel:  [<ffffffff8023f7d6>] keventd_create_kthread+0x0/0x6d
03:02:15 kernel:  [<ffffffff80403b26>] scsi_error_handler+0x6e/0x2d7
03:02:15 kernel:  [<ffffffff80403ab8>] scsi_error_handler+0x0/0x2d7
03:02:15 kernel:  [<ffffffff8023fa46>] kthread+0xd1/0x103
03:02:15 kernel:  [<ffffffff8020a148>] child_rip+0xa/0x12
03:02:15 kernel:  [<ffffffff8023f7d6>] keventd_create_kthread+0x0/0x6d
03:02:15 kernel:  [<ffffffff8023c327>] run_workqueue+0x10/0x179
03:02:15 kernel:  [<ffffffff8023f975>] kthread+0x0/0x103
03:02:15 kernel:  [<ffffffff8020a13e>] child_rip+0x0/0x12

and the machine became unusable (I can't shut it down).

I have also tried an updated driver based on the Adaptec SAS HostRAID SHIM
package v1.4.11662 (on the same Adaptec page), sent to me by
Alexander Lavrinenko, compiled for OpenSUSE 10.2 x86_64.
I configured those 8 SAS disks as 2 RAID 10 arrays and tried it.
The Linux kernel saw the arrays as /dev/sda and /dev/sdb.
I started the tests ... after 2 hours I got the same type of errors ....
I didn't have time to wait for a general machine freeze :-D

So ... should I ask for a quotation for another controller?
Could you recommend a good SAS controller with 8 internal ports,
supporting Linux, with 99.9999% reliability? :-)

I have the following options: Intel® RAID Controller SRCSAS18E
(Parowan) and LSI MegaRAID SAS 8408E

so ... your bet? :-)

Best regards,
Teo



* Re: Kernel crash with AIC94xx (one step forward, hope it's lucky)
  2007-04-26  9:55         ` Constantin Teodorescu
@ 2007-04-26 20:17           ` Darrick J. Wong
  2007-05-01 15:57           ` Darrick J. Wong
  1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2007-04-26 20:17 UTC (permalink / raw)
  To: brailateo; +Cc: ltuikov, James Bottomley, linux-scsi

Constantin Teodorescu wrote:

> So ... should I ask for other controller quotation ?
> Could you recommend me a good SAS controller, with 8 internal ports,
> supporting Linux , with 99.9999% reliability ? :-)
> 
> I have the following options : Intel® RAID Controller SRCSAS18E
> (Parowan)  and   LSI MegaRAID SAS 8408E
> 
> so ... your bet ? :-)

I don't know anything about either of those controllers, though the LSI
1068E has worked quite reliably for me.  I decline to make any
statements about 99.9999% reliability, however.

--D



* Re: Kernel crash with AIC94xx (one step forward, hope it's lucky)
  2007-04-26  9:55         ` Constantin Teodorescu
  2007-04-26 20:17           ` Darrick J. Wong
@ 2007-05-01 15:57           ` Darrick J. Wong
  1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2007-05-01 15:57 UTC (permalink / raw)
  To: brailateo; +Cc: ltuikov, James Bottomley, linux-scsi

Constantin Teodorescu wrote:

> 03:02:15 kernel: ------------[ cut here ]------------
> 03:02:15 kernel: kernel BUG at drivers/scsi/aic94xx/aic94xx_hwi.h:354!

On the odd chance you still have this controller (and have the time to
test out patches), would you mind applying this patch:

http://sweaglesw.net/~djwong/docs/17-aic94xx-hwi-bugon_1.patch

and reporting back to me what happens?

--D
