* Kernel crash with AIC94xx
From: Constantin Teodorescu @ 2007-04-24 8:52 UTC
To: linux-scsi
Hello, I hope I can get a little help from you regarding this kind of
crash!
Hardware:
- server, TYAN Tempest i5000VS S5372 BIOS v1.0.4
- 8 SATA drives, Seagate 136 GB, attached to an AIC-9410 controller
- one IDE drive (boot disk and system)
- 8 GB RAM
Software:
- OpenSUSE 10.2 x86_64 (I also tried SLES 10 but didn't succeed in
compiling the adp94xx driver from Adaptec)
Kernels: I tried all of these: linux-2.6.20.1, linux-2.6.20.4,
linux-2.6.20.7, linux-2.6.21-rc7
The last one has version 1.0.3 of the aic94xx driver, but the results are
the same :-(
Description:
- the server is running a very heavily loaded PostgreSQL database with
tables spread across those SAS drives, with a lot of writes and reads
- at least 4 or 5 times a day I get some warnings in /var/log/messages
(sas: Enter sas_scsi_recover_host , trying to find task XXX --->
aic94xx: came back from clear nexus) but the system keeps working
- more rarely (about once a day) I get the following bug in
/var/log/messages and the system crashes: the SAS drivers stop working,
the shutdown command waits forever, and I need to hardware-reset the
system
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e2c0, task
0xffff81005bfcb080, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff810047f9dd00, task
0xffff81007df80cc0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31180, task
0xffff8101247ad500, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81021b8af380, task
0xffff81012e550ac0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101698c3940, task
0xffff8101a3b69b80, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865680, task
0xffff8101a3b69380, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37340, task
0xffff8101a3b69580, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31a40, task
0xffff810058a93dc0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b940, task
0xffff81005bfcbc80, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37880, task
0xffff81015856bd00, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81022fa2f940, task
0xffff8101d2cf87c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b080, task
0xffff81005bfcb880, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37dc0, task
0xffff8101d186a940, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620640, task
0xffff81010d46a940, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae1c0, task
0xffff81012e9bf4c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae380, task
0xffff8101d186a740, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e8654c0, task
0xffff8101247ad100, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620480, task
0xffff81012e5502c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37180, task
0xffff8101d2cf89c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81017d5268c0, task
0xffff8101d186a540, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e800, task
0xffff81015856b900, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81014f8db600, task
0xffff81007df808c0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865bc0, task
0xffff81012e550cc0, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620100, task
0xffff8101a3b69980, timed out: EH_NOT_HANDLED
Apr 24 07:22:20 bnd kernel: sas: Enter sas_scsi_recover_host
Apr 24 07:22:20 bnd kernel: sas: trying to find task 0xffff81005bfcb080
Apr 24 07:22:20 bnd kernel: sas: sas_scsi_find_task: aborting task
0xffff81005bfcb080
Apr 24 07:22:25 bnd kernel: aic94xx: tmf timed out
Apr 24 07:22:25 bnd kernel: aic94xx: tmf came back
Apr 24 07:22:25 bnd kernel: aic94xx: task not done, clearing nexus
Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: POST
Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus
posted, waiting...
Apr 24 07:22:30 bnd kernel: aic94xx: asd_clear_nexus_timedout: here
Apr 24 07:22:35 bnd kernel: aic94xx: came back from clear nexus
Apr 24 07:22:35 bnd kernel: aic94xx: task not done, clearing nexus
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: POST
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus
posted, waiting...
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: here
Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete:
opcode: 0x0
Apr 24 07:22:40 bnd kernel: aic94xx: came back from clear nexus
Apr 24 07:22:40 bnd kernel: ------------[ cut here ]------------
Apr 24 07:22:40 bnd kernel: kernel BUG at
drivers/scsi/aic94xx/aic94xx_hwi.h:354!
Apr 24 07:22:40 bnd kernel: invalid opcode: 0000 [1] SMP
Apr 24 07:22:40 bnd kernel: CPU 0
Apr 24 07:22:40 bnd kernel: Modules linked in: aic94xx libsas xfs
Apr 24 07:22:40 bnd kernel: Pid: 3504, comm: scsi_eh_0 Not tainted
2.6.21-rc7_RC7 #1
Apr 24 07:22:40 bnd kernel: RIP: 0010:[<ffffffff88089f51>]
[<ffffffff88089f51>] :aic94xx:asd_abort_task+0x423/0x54a
Apr 24 07:22:40 bnd kernel: RSP: 0000:ffff81023117fde0 EFLAGS: 00010287
Apr 24 07:22:40 bnd kernel: RAX: 0000000000000000 RBX: ffff810231618000
RCX: ffff81022f66a800
Apr 24 07:22:40 bnd kernel: RDX: ffffffff88089ebf RSI: ffff81005bfcb080
RDI: ffff81005bfcb098
Apr 24 07:22:40 bnd kernel: RBP: 0000000000000000 R08: ffff81005bfcb080
R09: 0000000000000001
Apr 24 07:22:40 bnd kernel: R10: ffffffff88089ea6 R11: ffff81013ba5bf80
R12: ffff81005bfcb080
Apr 24 07:22:40 bnd kernel: R13: ffff810156e4f580 R14: ffff8101d49fb9c0
R15: ffff81022f66a800
Apr 24 07:22:40 bnd kernel: FS: 0000000000000000(0000)
GS:ffffffff80712000(0000) knlGS:0000000000000000
Apr 24 07:22:40 bnd kernel: CS: 0010 DS: 0018 ES: 0018 CR0:
000000008005003b
Apr 24 07:22:40 bnd kernel: CR2: 00002b110eff3fe8 CR3: 00000001e75f6000
CR4: 00000000000006e0
Apr 24 07:22:40 bnd kernel: Process scsi_eh_0 (pid: 3504, threadinfo
ffff81023117e000, task ffff810232274fe0)
Apr 24 07:22:40 bnd kernel: Stack: ffff81023117dac8 00000000c9f5e2c0
ffff81023117fe50 ffff81005bfcb080
Apr 24 07:22:40 bnd kernel: 0000000000000000 ffff8101c9f5e2c0
ffff81005bfcb098 ffffffff88073293
Apr 24 07:22:40 bnd kernel: ffff810231618010 ffff81023046c000
ffff8102316181e0 ffff81023046c000
Apr 24 07:22:40 bnd kernel: Call Trace:
Apr 24 07:22:40 bnd kernel: [<ffffffff88073293>]
:libsas:sas_scsi_recover_host+0x1c2/0x83b
Apr 24 07:22:40 bnd kernel: [<ffffffff8023f7d6>]
keventd_create_kthread+0x0/0x6d
Apr 24 07:22:40 bnd kernel: [<ffffffff80403b26>]
scsi_error_handler+0x6e/0x2d7
Apr 24 07:22:40 bnd kernel: [<ffffffff80403ab8>]
scsi_error_handler+0x0/0x2d7
Apr 24 07:22:40 bnd kernel: [<ffffffff8023fa46>] kthread+0xd1/0x103
Apr 24 07:22:40 bnd kernel: [<ffffffff8020a148>] child_rip+0xa/0x12
Apr 24 07:22:40 bnd kernel: [<ffffffff8023f7d6>]
keventd_create_kthread+0x0/0x6d
Apr 24 07:22:40 bnd kernel: [<ffffffff8023c327>] run_workqueue+0x10/0x179
Apr 24 07:22:40 bnd kernel: [<ffffffff8023f975>] kthread+0x0/0x103
Apr 24 07:22:40 bnd kernel: [<ffffffff8020a13e>] child_rip+0x0/0x12
Apr 24 07:22:40 bnd kernel:
Apr 24 07:22:40 bnd kernel:
Apr 24 07:22:40 bnd kernel: Code: 0f 0b eb fe 48 8d bb 68 4b 00 00 e8 38
df 4a f8 41 8b 95 d0
Apr 24 07:22:40 bnd kernel: RIP [<ffffffff88089f51>]
:aic94xx:asd_abort_task+0x423/0x54a
Apr 24 07:22:40 bnd kernel: RSP <ffff81023117fde0>
--------------------------------------------------------------------------------------------------------------------------------
I tried to fetch and compile the
Adaptec_adp94xx-OpenBuild-B11662.i386.rpm driver from Adaptec but got a
lot of stupid compile errors.
Is there anything that I can do to make it work? Would you
need more information that could help you understand the problem?
Please Cc: me at brailateo@flex.ro
Big, BIG, BIG thanks in advance!
Constantin Teodorescu
ROMANIA
* Re: Kernel crash with AIC94xx
From: James Bottomley @ 2007-04-24 18:37 UTC
To: brailateo; +Cc: linux-scsi
On Tue, 2007-04-24 at 11:52 +0300, Constantin Teodorescu wrote:
> Hello, I hope I can get a little help from you regarding this kind of
> crash!
>
> Hardware:
> - server, TYAN Tempest i5000VS S5372 BIOS v1.0.4
> - 8 SATA drives, Seagate 136 GB, attached to an AIC-9410 controller
> - one IDE drive (boot disk and system)
This configuration doesn't work on the vanilla linux kernel ... you need
the scsi-aic94xxx-sas-2.6 tree as well for this; is that what you're
running with?
> - 8 GB RAM
>
> Software:
> - OpenSUSE 10.2 x86_64 (I also tried SLES 10 but didn't succeed in
> compiling the adp94xx driver from Adaptec)
>
> Kernels: I tried all of these: linux-2.6.20.1, linux-2.6.20.4,
> linux-2.6.20.7, linux-2.6.21-rc7
> The last one has version 1.0.3 of the aic94xx driver, but the results are
> the same :-(
>
> Description:
> - the server is running a very heavily loaded PostgreSQL database with
> tables spread across those SAS drives, with a lot of writes and reads
Are these SAS or SATA drives?
> - at least 4 or 5 times a day I get some warnings in /var/log/messages
> (sas: Enter sas_scsi_recover_host , trying to find task XXX --->
> aic94xx: came back from clear nexus) but the system keeps working
> - more rarely (about once a day) I get the following bug in
> /var/log/messages and the system crashes: the SAS drivers stop working,
> the shutdown command waits forever, and I need to hardware-reset the
> system
>
>
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e2c0, task
> 0xffff81005bfcb080, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810047f9dd00, task
> 0xffff81007df80cc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31180, task
> 0xffff8101247ad500, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81021b8af380, task
> 0xffff81012e550ac0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101698c3940, task
> 0xffff8101a3b69b80, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865680, task
> 0xffff8101a3b69380, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37340, task
> 0xffff8101a3b69580, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31a40, task
> 0xffff810058a93dc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b940, task
> 0xffff81005bfcbc80, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37880, task
> 0xffff81015856bd00, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81022fa2f940, task
> 0xffff8101d2cf87c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b080, task
> 0xffff81005bfcb880, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37dc0, task
> 0xffff8101d186a940, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620640, task
> 0xffff81010d46a940, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae1c0, task
> 0xffff81012e9bf4c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae380, task
> 0xffff8101d186a740, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e8654c0, task
> 0xffff8101247ad100, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620480, task
> 0xffff81012e5502c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37180, task
> 0xffff8101d2cf89c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81017d5268c0, task
> 0xffff8101d186a540, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e800, task
> 0xffff81015856b900, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81014f8db600, task
> 0xffff81007df808c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865bc0, task
> 0xffff81012e550cc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620100, task
> 0xffff8101a3b69980, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: Enter sas_scsi_recover_host
> Apr 24 07:22:20 bnd kernel: sas: trying to find task 0xffff81005bfcb080
> Apr 24 07:22:20 bnd kernel: sas: sas_scsi_find_task: aborting task
> 0xffff81005bfcb080
> Apr 24 07:22:25 bnd kernel: aic94xx: tmf timed out
> Apr 24 07:22:25 bnd kernel: aic94xx: tmf came back
> Apr 24 07:22:25 bnd kernel: aic94xx: task not done, clearing nexus
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: POST
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus
> posted, waiting...
> Apr 24 07:22:30 bnd kernel: aic94xx: asd_clear_nexus_timedout: here
> Apr 24 07:22:35 bnd kernel: aic94xx: came back from clear nexus
> Apr 24 07:22:35 bnd kernel: aic94xx: task not done, clearing nexus
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: POST
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus
> posted, waiting...
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: here
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete:
> opcode: 0x0
> Apr 24 07:22:40 bnd kernel: aic94xx: came back from clear nexus
> Apr 24 07:22:40 bnd kernel: ------------[ cut here ]------------
> Apr 24 07:22:40 bnd kernel: kernel BUG at
> drivers/scsi/aic94xx/aic94xx_hwi.h:354!
This BUG is the attempted free of an in-flight command.
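For readers following along, here is a minimal user-space sketch of the
kind of guard being described: a free routine that refuses to release a
command descriptor while it is still linked on a pending list. It
assumes the check at aic94xx_hwi.h:354 is of this general shape, which
this thread does not confirm. list_node, cmd_desc, cmd_free_checked and
demo_refused_frees are hypothetical names, not driver symbols, and the
sketch returns false where a driver would hit BUG().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Circular doubly linked list node, loosely modelled on list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n->prev = n;
}

/* A node that points at itself is not linked on any list. */
static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the node and leave it self-pointing, so it tests as empty. */
static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical command descriptor; 'link' is on the pending list
 * for as long as the command is in flight. */
struct cmd_desc {
	struct list_node link;
	int tag;
};

/* Free the descriptor only if it is no longer in flight.
 * Returns true if freed, false if the free was refused. */
static bool cmd_free_checked(struct cmd_desc *cmd)
{
	if (!list_is_empty(&cmd->link))
		return false;	/* still pending: a driver would BUG() here */
	free(cmd);
	return true;
}

/* Submit one command, try to free it too early, then complete it and
 * free it properly. Returns the number of refused frees, or -1 if the
 * allocation fails. */
static int demo_refused_frees(void)
{
	struct list_node pending;
	struct cmd_desc *cmd = malloc(sizeof(*cmd));
	int refused = 0;

	if (!cmd)
		return -1;
	list_init(&pending);
	list_init(&cmd->link);
	cmd->tag = 1;

	list_add_tail(&cmd->link, &pending);	/* command submitted */
	if (!cmd_free_checked(cmd))
		refused++;			/* premature free caught */

	list_del_init(&cmd->link);		/* command completed or aborted */
	if (!cmd_free_checked(cmd))
		refused++;

	return refused;
}
```

Called from a main(), demo_refused_frees() refuses the first free and
allows the second. In the trace, the error handler appears to reach the
free while the task is still reported as "not done" after the abort and
the clear nexus attempts, which would be consistent with this reading.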
> Apr 24 07:22:40 bnd kernel: invalid opcode: 0000 [1] SMP
> Apr 24 07:22:40 bnd kernel: CPU 0
> Apr 24 07:22:40 bnd kernel: Modules linked in: aic94xx libsas xfs
> Apr 24 07:22:40 bnd kernel: Pid: 3504, comm: scsi_eh_0 Not tainted
> 2.6.21-rc7_RC7 #1
> Apr 24 07:22:40 bnd kernel: RIP: 0010:[<ffffffff88089f51>]
> [<ffffffff88089f51>] :aic94xx:asd_abort_task+0x423/0x54a
> Apr 24 07:22:40 bnd kernel: RSP: 0000:ffff81023117fde0 EFLAGS: 00010287
> Apr 24 07:22:40 bnd kernel: RAX: 0000000000000000 RBX: ffff810231618000
> RCX: ffff81022f66a800
> Apr 24 07:22:40 bnd kernel: RDX: ffffffff88089ebf RSI: ffff81005bfcb080
> RDI: ffff81005bfcb098
> Apr 24 07:22:40 bnd kernel: RBP: 0000000000000000 R08: ffff81005bfcb080
> R09: 0000000000000001
> Apr 24 07:22:40 bnd kernel: R10: ffffffff88089ea6 R11: ffff81013ba5bf80
> R12: ffff81005bfcb080
> Apr 24 07:22:40 bnd kernel: R13: ffff810156e4f580 R14: ffff8101d49fb9c0
> R15: ffff81022f66a800
> Apr 24 07:22:40 bnd kernel: FS: 0000000000000000(0000)
> GS:ffffffff80712000(0000) knlGS:0000000000000000
> Apr 24 07:22:40 bnd kernel: CS: 0010 DS: 0018 ES: 0018 CR0:
> 000000008005003b
> Apr 24 07:22:40 bnd kernel: CR2: 00002b110eff3fe8 CR3: 00000001e75f6000
> CR4: 00000000000006e0
> Apr 24 07:22:40 bnd kernel: Process scsi_eh_0 (pid: 3504, threadinfo
> ffff81023117e000, task ffff810232274fe0)
> Apr 24 07:22:40 bnd kernel: Stack: ffff81023117dac8 00000000c9f5e2c0
> ffff81023117fe50 ffff81005bfcb080
> Apr 24 07:22:40 bnd kernel: 0000000000000000 ffff8101c9f5e2c0
> ffff81005bfcb098 ffffffff88073293
> Apr 24 07:22:40 bnd kernel: ffff810231618010 ffff81023046c000
> ffff8102316181e0 ffff81023046c000
> Apr 24 07:22:40 bnd kernel: Call Trace:
> Apr 24 07:22:40 bnd kernel: [<ffffffff88073293>]
> :libsas:sas_scsi_recover_host+0x1c2/0x83b
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f7d6>]
> keventd_create_kthread+0x0/0x6d
> Apr 24 07:22:40 bnd kernel: [<ffffffff80403b26>]
> scsi_error_handler+0x6e/0x2d7
> Apr 24 07:22:40 bnd kernel: [<ffffffff80403ab8>]
> scsi_error_handler+0x0/0x2d7
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023fa46>] kthread+0xd1/0x103
> Apr 24 07:22:40 bnd kernel: [<ffffffff8020a148>] child_rip+0xa/0x12
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f7d6>]
> keventd_create_kthread+0x0/0x6d
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023c327>] run_workqueue+0x10/0x179
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f975>] kthread+0x0/0x103
> Apr 24 07:22:40 bnd kernel: [<ffffffff8020a13e>] child_rip+0x0/0x12
> Apr 24 07:22:40 bnd kernel:
> Apr 24 07:22:40 bnd kernel:
> Apr 24 07:22:40 bnd kernel: Code: 0f 0b eb fe 48 8d bb 68 4b 00 00 e8 38
> df 4a f8 41 8b 95 d0
> Apr 24 07:22:40 bnd kernel: RIP [<ffffffff88089f51>]
> :aic94xx:asd_abort_task+0x423/0x54a
> Apr 24 07:22:40 bnd kernel: RSP <ffff81023117fde0>
>
James
* Re: Kernel crash with AIC94xx
From: Brian King @ 2007-04-24 13:06 UTC
To: brailateo, SCSI Mailing List
Copying linux-scsi...
-Brian
Constantin Teodorescu wrote:
> Hello, I hope I can get a little help from you regarding this kind of
> crash!
>
> Hardware:
> - server, TYAN Tempest i5000VS S5372 BIOS v1.0.4
> - 8 SATA drives, Seagate 136 GB, attached to an AIC-9410 controller
> - one IDE drive (boot disk and system)
> - 8 GB RAM
>
> Software:
> - OpenSUSE 10.2 x86_64 (I also tried SLES 10 but didn't succeed in
> compiling the adp94xx driver from Adaptec)
>
> Kernels: I tried all of these: linux-2.6.20.1, linux-2.6.20.4,
> linux-2.6.20.7, linux-2.6.21-rc7
> The last one has version 1.0.3 of the aic94xx driver, but the results are
> the same :-(
>
> Description:
> - the server is running a very heavily loaded PostgreSQL database with
> tables spread across those SAS drives, with a lot of writes and reads
> - at least 4 or 5 times a day I get some warnings in /var/log/messages
> (sas: Enter sas_scsi_recover_host , trying to find task XXX --->
> aic94xx: came back from clear nexus) but the system keeps working
> - more rarely (about once a day) I get the following bug in
> /var/log/messages and the system crashes: the SAS drivers stop working,
> the shutdown command waits forever, and I need to hardware-reset the
> system
>
>
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e2c0, task
> 0xffff81005bfcb080, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810047f9dd00, task
> 0xffff81007df80cc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31180, task
> 0xffff8101247ad500, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81021b8af380, task
> 0xffff81012e550ac0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101698c3940, task
> 0xffff8101a3b69b80, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865680, task
> 0xffff8101a3b69380, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37340, task
> 0xffff8101a3b69580, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31a40, task
> 0xffff810058a93dc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b940, task
> 0xffff81005bfcbc80, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37880, task
> 0xffff81015856bd00, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81022fa2f940, task
> 0xffff8101d2cf87c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b080, task
> 0xffff81005bfcb880, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37dc0, task
> 0xffff8101d186a940, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620640, task
> 0xffff81010d46a940, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae1c0, task
> 0xffff81012e9bf4c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae380, task
> 0xffff8101d186a740, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e8654c0, task
> 0xffff8101247ad100, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620480, task
> 0xffff81012e5502c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37180, task
> 0xffff8101d2cf89c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81017d5268c0, task
> 0xffff8101d186a540, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e800, task
> 0xffff81015856b900, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81014f8db600, task
> 0xffff81007df808c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865bc0, task
> 0xffff81012e550cc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620100, task
> 0xffff8101a3b69980, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: Enter sas_scsi_recover_host
> Apr 24 07:22:20 bnd kernel: sas: trying to find task 0xffff81005bfcb080
> Apr 24 07:22:20 bnd kernel: sas: sas_scsi_find_task: aborting task
> 0xffff81005bfcb080
> Apr 24 07:22:25 bnd kernel: aic94xx: tmf timed out
> Apr 24 07:22:25 bnd kernel: aic94xx: tmf came back
> Apr 24 07:22:25 bnd kernel: aic94xx: task not done, clearing nexus
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: POST
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus
> posted, waiting...
> Apr 24 07:22:30 bnd kernel: aic94xx: asd_clear_nexus_timedout: here
> Apr 24 07:22:35 bnd kernel: aic94xx: came back from clear nexus
> Apr 24 07:22:35 bnd kernel: aic94xx: task not done, clearing nexus
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: POST
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus
> posted, waiting...
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: here
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete:
> opcode: 0x0
> Apr 24 07:22:40 bnd kernel: aic94xx: came back from clear nexus
> Apr 24 07:22:40 bnd kernel: ------------[ cut here ]------------
> Apr 24 07:22:40 bnd kernel: kernel BUG at
> drivers/scsi/aic94xx/aic94xx_hwi.h:354!
> Apr 24 07:22:40 bnd kernel: invalid opcode: 0000 [1] SMP
> Apr 24 07:22:40 bnd kernel: CPU 0
> Apr 24 07:22:40 bnd kernel: Modules linked in: aic94xx libsas xfs
> Apr 24 07:22:40 bnd kernel: Pid: 3504, comm: scsi_eh_0 Not tainted
> 2.6.21-rc7_RC7 #1
> Apr 24 07:22:40 bnd kernel: RIP: 0010:[<ffffffff88089f51>]
> [<ffffffff88089f51>] :aic94xx:asd_abort_task+0x423/0x54a
> Apr 24 07:22:40 bnd kernel: RSP: 0000:ffff81023117fde0 EFLAGS: 00010287
> Apr 24 07:22:40 bnd kernel: RAX: 0000000000000000 RBX: ffff810231618000
> RCX: ffff81022f66a800
> Apr 24 07:22:40 bnd kernel: RDX: ffffffff88089ebf RSI: ffff81005bfcb080
> RDI: ffff81005bfcb098
> Apr 24 07:22:40 bnd kernel: RBP: 0000000000000000 R08: ffff81005bfcb080
> R09: 0000000000000001
> Apr 24 07:22:40 bnd kernel: R10: ffffffff88089ea6 R11: ffff81013ba5bf80
> R12: ffff81005bfcb080
> Apr 24 07:22:40 bnd kernel: R13: ffff810156e4f580 R14: ffff8101d49fb9c0
> R15: ffff81022f66a800
> Apr 24 07:22:40 bnd kernel: FS: 0000000000000000(0000)
> GS:ffffffff80712000(0000) knlGS:0000000000000000
> Apr 24 07:22:40 bnd kernel: CS: 0010 DS: 0018 ES: 0018 CR0:
> 000000008005003b
> Apr 24 07:22:40 bnd kernel: CR2: 00002b110eff3fe8 CR3: 00000001e75f6000
> CR4: 00000000000006e0
> Apr 24 07:22:40 bnd kernel: Process scsi_eh_0 (pid: 3504, threadinfo
> ffff81023117e000, task ffff810232274fe0)
> Apr 24 07:22:40 bnd kernel: Stack: ffff81023117dac8 00000000c9f5e2c0
> ffff81023117fe50 ffff81005bfcb080
> Apr 24 07:22:40 bnd kernel: 0000000000000000 ffff8101c9f5e2c0
> ffff81005bfcb098 ffffffff88073293
> Apr 24 07:22:40 bnd kernel: ffff810231618010 ffff81023046c000
> ffff8102316181e0 ffff81023046c000
> Apr 24 07:22:40 bnd kernel: Call Trace:
> Apr 24 07:22:40 bnd kernel: [<ffffffff88073293>]
> :libsas:sas_scsi_recover_host+0x1c2/0x83b
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f7d6>]
> keventd_create_kthread+0x0/0x6d
> Apr 24 07:22:40 bnd kernel: [<ffffffff80403b26>]
> scsi_error_handler+0x6e/0x2d7
> Apr 24 07:22:40 bnd kernel: [<ffffffff80403ab8>]
> scsi_error_handler+0x0/0x2d7
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023fa46>] kthread+0xd1/0x103
> Apr 24 07:22:40 bnd kernel: [<ffffffff8020a148>] child_rip+0xa/0x12
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f7d6>]
> keventd_create_kthread+0x0/0x6d
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023c327>] run_workqueue+0x10/0x179
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f975>] kthread+0x0/0x103
> Apr 24 07:22:40 bnd kernel: [<ffffffff8020a13e>] child_rip+0x0/0x12
> Apr 24 07:22:40 bnd kernel:
> Apr 24 07:22:40 bnd kernel:
> Apr 24 07:22:40 bnd kernel: Code: 0f 0b eb fe 48 8d bb 68 4b 00 00 e8 38
> df 4a f8 41 8b 95 d0
> Apr 24 07:22:40 bnd kernel: RIP [<ffffffff88089f51>]
> :aic94xx:asd_abort_task+0x423/0x54a
> Apr 24 07:22:40 bnd kernel: RSP <ffff81023117fde0>
>
> --------------------------------------------------------------------------------------------------------------------------------
>
> I tried to fetch and compile the
> Adaptec_adp94xx-OpenBuild-B11662.i386.rpm driver from Adaptec but got a
> lot of stupid compile errors.
> Is there anything that I can do to make it work? Would you
> need more information that could help you understand the problem?
> Please Cc: me at brailateo@gmail.com
>
> Big, BIG, BIG thanks in advance!
> Constantin Teodorescu
> ROMANIA
>
>
--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center