* Fw: [Bugme-new] [Bug 4473] New: QLogic 2100: SCSI timeouts, device resets, and crashes kernel
From: Andrew Morton @ 2005-04-11 20:51 UTC
To: linux-scsi; +Cc: gregsurbey
Begin forwarded message:
Date: Mon, 11 Apr 2005 13:03:16 -0700
From: bugme-daemon@osdl.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 4473] New: QLogic 2100: SCSI timeouts, device resets, and crashes kernel
http://bugme.osdl.org/show_bug.cgi?id=4473
Summary: QLogic 2100: SCSI timeouts, device resets, and crashes kernel
Kernel Version: 2.6.10 2.6.11.6.6 2.6.11-gentoo-r5
Status: NEW
Severity: blocking
Owner: andmike@us.ibm.com
Submitter: gregsurbey@hotmail.com
Distribution: Gentoo 2005.0
Hardware Environment: Dell PowerEdge 600SC
Configuration: md RAID-5 11 disks 1 spare
Card Description:
QLogic PCI to Fibre Channel Host Adapter for QLA2100:
Firmware version 1.19.24 TP, Driver version 8.00.02b4-k
ISP: ISP2100, Serial# A61828
Request Queue = 0x1fce8000, Response Queue = 0x1fce2000
Request Queue count = 128, Response Queue count = 64
Total number of active commands = 0
Total number of interrupts = 77339
Device queue depth = 0x10
Number of free request entries = 59
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 0
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state = <READY>, flags = 0x1c93
Dpc flags = 0x80000
MBX flags = 0x0
Link down Timeout = 000
Port down retry = 030
Login retry count = 030
Commands retried with dropped frame(s) = 0
Product ID = 4953 5020 2020 0001
During the heavy use of md RAID reconstruction the card creates output in dmesg
which I will attach to this bug report in my next post.
Steps to reproduce:
Use a Qlogic 2100 card with the qla2xxx driver
* Re: Fw: [Bugme-new] [Bug 4473] New: QLogic 2100: SCSI timeouts, device resets, and crashes kernel
From: Mike Anderson @ 2005-04-11 21:18 UTC
To: Andrew Morton, Andrew Vasquez; +Cc: linux-scsi, gregsurbey
Greg,
I will add the same comment to the bug.
Did this work on a previous version of the kernel? Just checking to
understand if your connectivity to the storage unit or the unit itself
could be an issue.
It appears we are receiving timeouts, but on abort the qla is indicating
that the IO has already been completed. We could have IOs that are taking
near max timeout and then the error handler races with the completion of
the IO.
A debug step you could try is to raise the default timeout from 30 to
something like 60 seconds to see if this affects the error. To do this,
just echo "60" > /sys/block/sd${N}/device/timeout. Also, you can run iostat
during your testing to see what your IO times / queue depths look like.
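With 11 disks in the array, doing this by hand for each device gets tedious. A minimal sketch of a helper that applies the same echo to several devices is below. The function name set_scsi_timeout and the SYSFS_ROOT override are made up for illustration (the override only exists so the function can be dry-run against a scratch directory), and the device names in the usage line are placeholders for whatever your array members are actually called.

```shell
#!/bin/sh
# Sketch: write a new SCSI command timeout (in seconds) to each named sd device.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
set_scsi_timeout() {
    secs=$1
    shift
    root=${SYSFS_ROOT:-/sys}
    for dev in "$@"; do
        f="$root/block/$dev/device/timeout"
        if [ -w "$f" ]; then
            # Show the previous value so it can be restored later.
            printf 'old %s: %s\n' "$dev" "$(cat "$f")"
            echo "$secs" > "$f"
        else
            echo "skip $dev: $f not writable" >&2
        fi
    done
}
```

Something like "set_scsi_timeout 60 sda sdb sdc" (run as root) would then raise the timeout on those three devices. The change does not persist across a reboot, so it is safe to experiment with.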
Andrew Vasquez may be able to add more info.
Andrew Morton [akpm@osdl.org] wrote:
>
>
> Begin forwarded message:
>
> Date: Mon, 11 Apr 2005 13:03:16 -0700
> From: bugme-daemon@osdl.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 4473] New: QLogic 2100: SCSI timeouts, device resets, and crashes kernel
>
>
> http://bugme.osdl.org/show_bug.cgi?id=4473
>
> Summary: QLogic 2100: SCSI timeouts, device resets, and crashes
> kernel
> Kernel Version: 2.6.10 2.6.11.6.6 2.6.11-gentoo-r5
> Status: NEW
> Severity: blocking
> Owner: andmike@us.ibm.com
> Submitter: gregsurbey@hotmail.com
>
>
> Distribution: Gentoo 2005.0
> Hardware Environment: Dell PowerEdge 600SC
> Configuration: md RAID-5 11 disks 1 spare
>
> Card Description:
> QLogic PCI to Fibre Channel Host Adapter for QLA2100:
> Firmware version 1.19.24 TP, Driver version 8.00.02b4-k
> ISP: ISP2100, Serial# A61828
> Request Queue = 0x1fce8000, Response Queue = 0x1fce2000
> Request Queue count = 128, Response Queue count = 64
> Total number of active commands = 0
> Total number of interrupts = 77339
> Device queue depth = 0x10
> Number of free request entries = 59
> Number of mailbox timeouts = 0
> Number of ISP aborts = 0
> Number of loop resyncs = 0
> Number of retries for empty slots = 0
> Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
> Host adapter:loop state = <READY>, flags = 0x1c93
> Dpc flags = 0x80000
> MBX flags = 0x0
> Link down Timeout = 000
> Port down retry = 030
> Login retry count = 030
> Commands retried with dropped frame(s) = 0
> Product ID = 4953 5020 2020 0001
>
> During the heavy use of md RAID reconstruction the card creates output in dmesg
> which I will attach to this bug report in my next post.
>
> Steps to reproduce:
> Use a Qlogic 2100 card with the qla2xxx driver
>
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: Fw: [Bugme-new] [Bug 4473] New: QLogic 2100: SCSI timeouts, device resets, and crashes kernel
From: Andrew Vasquez @ 2005-04-11 22:15 UTC
To: gregsurbey; +Cc: Andrew Morton, linux-scsi, Mike Anderson
On Mon, 11 Apr 2005, Mike Anderson wrote:
> I will add the same comment to the bug.
>
> Did this work on a previous version of the kernel? Just checking to
> understand if your connectivity to the storage unit or the unit itself
> could be an issue.
>
> It appears we are receiving timeouts, but on abort the qla is indicating
> that the IO has already been completed. We could have IOs that are taking
> near max timeout and then the error handler races with the completion of
> the IO.
>
> A debug step you could try is to raise the default timeout from 30 to
> something like 60 seconds to see if this affects the error. To do this,
> just echo "60" > /sys/block/sd${N}/device/timeout. Also, you can run iostat
> during your testing to see what your IO times / queue depths look like.
>
> Andrew Vasquez may be able to add more info.
>
Greg,
The logs seem to indicate some (additional) problems with the ISP
after the device-reset completes:
qla2100 0000:00:06.0: scsi(0:0:1:0): DEVICE RESET ISSUED.
qla2100 0000:00:06.0: scsi(0:0:1:0): DEVICE RESET SUCCEEDED.
qla2100 0000:00:06.0: ISP System Error - mbx1=7737h mbx2=dc5h mbx3=0h.
qla2100 0000:00:06.0: Firmware dump saved to temp buffer (0/dcec0000).
There is a small tool available (qla_dmp.sh):
ftp://ftp.qlogic.com/outgoing/linux/beta/8.x/test/qla_dmp.sh
which I'd like you to use if the machine is still in a somewhat usable
state and you see a message similar to the following:
qla2100 0000:00:06.0: ISP System Error - mbx1=7737h mbx2=dc5h mbx3=0h.
qla2100 0000:00:06.0: Firmware dump saved to temp buffer (0/dcec0000).
Execute the following command:
# ./qla_dmp.sh 0
The value passed to qla_dmp.sh should be the same as the first integer
in the 'saved to temp buffer' string (in this example, 0). If the
operation was successful, a message like the following should be logged
in the messages file:
Firmware dumped to file fw_dump_20041217_023222.txt
Compress the file (in this example):
# bzip2 fw_dump_20041217_023222.txt
and forward the compressed file, along with the /var/log/messages
file.
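If it helps, the value to pass can be pulled out of the log line rather than read by eye. A minimal sketch, assuming the message format matches the example above; fw_dump_host is a hypothetical helper name and is not part of qla_dmp.sh:

```shell
#!/bin/sh
# Sketch: read log lines on stdin and print the first integer inside the
# parentheses of the most recent 'Firmware dump saved to temp buffer (N/addr)'
# message. Prints nothing if no such message is present.
fw_dump_host() {
    sed -n 's/.*Firmware dump saved to temp buffer (\([0-9][0-9]*\)\/.*/\1/p' | tail -n 1
}
```

For example, host=$(dmesg | fw_dump_host) followed by ./qla_dmp.sh "$host" would hand the most recent value to the tool, provided the message is still in the kernel ring buffer and $host is not empty.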
--
av