* Weird SCSI error, can anyone interpret?
From: Kevin P. Fleming @ 2004-03-04 4:06 UTC
To: linux-scsi
I've got a client with a server running 2.6.3-rc2, using a QLogic
ISP2100 connected to a CMD FC-to-SCSI RAID controller. On Saturday
morning, the system lost access to the RAID array, with thousands of
messages like these in the log:
Feb 28 03:15:46 alpha kernel: SCSI error : <2 0 0 0> return code = 0x20000
Feb 28 03:15:46 alpha kernel: end_request: I/O error, dev sdc, sector 278362544
Feb 28 03:15:51 alpha kernel: SCSI error : <2 0 0 0> return code = 0x20000
Feb 28 03:15:51 alpha kernel: end_request: I/O error, dev sdc, sector 304356160
Feb 28 03:16:06 alpha kernel: SCSI error : <2 0 0 0> return code = 0x20000
Feb 28 03:16:06 alpha kernel: end_request: I/O error, dev sdc, sector 360710592
Feb 28 03:16:06 alpha kernel: SCSI error : <2 0 0 0> return code = 0x20000
Feb 28 03:16:06 alpha kernel: end_request: I/O error, dev sdc, sector 327384736
There are also some with "return code = 0x20008".
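The "return code" the midlayer prints is a packed result word: driver byte, host byte, message byte and SCSI status byte, from most to least significant. A minimal Python sketch, assuming that 2.6-era layout and the midlayer's DID_* host-byte names (the table below lists only the common values), splits the two codes seen in this log:

```python
# Host-byte values used by the Linux SCSI midlayer (DID_* names).
HOST_BYTE = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}

# Common SCSI (SAM) status values carried raw in the low byte.
STATUS_BYTE = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}


def decode_scsi_result(result):
    """Split a midlayer result word: driver | host | message | status."""
    driver = (result >> 24) & 0xFF
    host = (result >> 16) & 0xFF
    msg = (result >> 8) & 0xFF
    status = result & 0xFF
    return {
        "driver": driver,
        # Unknown values fall back to their hex form.
        "host": HOST_BYTE.get(host, f"0x{host:02x}"),
        "msg": msg,
        "status": STATUS_BYTE.get(status, f"0x{status:02x}"),
    }


for code in (0x20000, 0x20008):
    print(hex(code), decode_scsi_result(code))
```

Read this way, 0x20000 is host byte DID_BUS_BUSY with GOOD status, and 0x20008 is DID_BUS_BUSY with a BUSY SCSI status. The decoding alone does not say which adapter or driver condition led qla2xxx to report DID_BUS_BUSY.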
Restarting the Linux system did not help; power cycling the RAID array
did help. I'd like to get some clue as to whether this points to a
problem with the RAID controller, the FC host adapter, or the kernel's
QLogic driver. This system had previously been using the QLogic driver
from SourceForge for months on a 2.6.0-pre(something) kernel, but was
recently upgraded once the QLogic driver was merged into the main
kernel distribution.
Please CC me on any replies, I'm not subscribed to linux-scsi. Thanks!
* Re: Weird SCSI error, can anyone interpret?
From: Andrew Vasquez @ 2004-03-05 6:07 UTC
To: linux-scsi; +Cc: kpfleming, Andrew Vasquez
On Wed, 03 Mar 2004, Kevin P. Fleming wrote:
> I've got a client with a server running 2.6.3-rc2, using a QLogic
> ISP2100 connected to a CMD FC-to-SCSI RAID controller. On Saturday
> morning, the system lost access to the RAID array, with thousands of
> messages like these in the log:
>
> Feb 28 03:15:46 alpha kernel: SCSI error : <2 0 0 0> return code = 0x20000
> Feb 28 03:15:46 alpha kernel: end_request: I/O error, dev sdc, sector 278362544
Before the deluge of I/O error messages, does the messages file
contain anything useful, e.g. the driver posting messages about
link failures, devices going away, etc.?
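Finding what was logged just before the first error can be scripted. A minimal sketch, assuming the error-line format quoted above; the optional keyword filter is a hypothetical convenience, and any words passed to it are guesses rather than known driver message text:

```python
import re

# The midlayer error line quoted in this thread.
SCSI_ERROR = re.compile(r"SCSI error : <\d+ \d+ \d+ \d+> return code = 0x[0-9a-fA-F]+")


def lines_before_first_error(lines, context=50, keywords=None):
    """Return up to `context` lines logged just before the first SCSI error.

    `keywords` is an optional, case-insensitive substring filter
    (hypothetical; choose words to match your own log).
    """
    for i, line in enumerate(lines):
        if SCSI_ERROR.search(line):
            window = lines[max(0, i - context):i]
            if keywords:
                wanted = [k.lower() for k in keywords]
                window = [ln for ln in window if any(k in ln.lower() for k in wanted)]
            return window
    return []  # no SCSI error in the supplied lines
```

Fed the result of `readlines()` on /var/log/messages, it prints nothing when the first error is also the first line of interest, which is itself an answer to the question above.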
> Restarting the Linux system did not help;
>
Did the driver not recognize the RAID box on the loop?
> power cycling the RAID array did help.
Now that's interesting...
> I'd like to get some clue as to whether this points to a
> problem with the RAID controller, the FC host adapter, or the kernel's
> QLogic driver.
>
Well, given the information presented, I'd say it seems rather
suspicious that the RAID box needed a power cycle to be restored to a
functioning state. What type of I/O patterns were being run to the
RAID box? How many concurrent commands were being queued?
Could you enable some additional debug in the driver and rerun your
test? Set DEBUG_QLA2100 to 1 in qla_settings.h and define
QL_DEBUG_LEVEL_2 in qla_dbg.h. Recompile the driver, then run your
test. If a failure occurs, send me the resultant /var/log/messages
file and the output of the following command:
# cat /proc/scsi/qla2xxx/*
This should provide some clues as to what's occurring on the loop.
Regards,
Andrew Vasquez
QLogic Corporation
* Re: Weird SCSI error, can anyone interpret?
From: Kevin P. Fleming @ 2004-03-05 14:47 UTC
To: Andrew Vasquez; +Cc: linux-scsi
Andrew Vasquez wrote:
> Before the deluge of I/O error messages, does the messages file
> contain anything useful, e.g. the driver posting messages about
> link failures, devices going away, etc.?
Nope, there were no other SCSI-related messages before these (all the
way back to the last kernel boot, about 18 hours earlier).
> Did the driver not recognize the RAID box on the loop?
No, the loop appeared to come up, but there were no devices present.
Thanks for asking; I hadn't checked the log file to that level of detail
yet. That's a pretty important sign that the problem was the RAID
controller, given that the ISP2100 and the CMD-7220 are the only devices
on the loop (direct cable between them).
> Now that's interesting...
Yes, that's why I think this may be a RAID controller problem (we've
already had one of the original two boards die a year or so ago).
> Well, given the information presented, I'd say it seems rather
> suspicious that the RAID box needed a power cycle to be restored to a
> functioning state. What type of I/O patterns were being run to the
> RAID box? How many concurrent commands were being queued?
At that time of day there would have been almost zero activity. The
"flood" of error messages I referred to continued over the next 56 hours
or so, from when the problem occurred until users started trying to
reach the server on Monday morning.
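The length of the flood, and the split between the two return codes, can be read back from the log. A minimal sketch, assuming classic syslog timestamps (which omit the year, so the caller supplies it) and the error-line format quoted earlier in the thread:

```python
import re
from collections import Counter
from datetime import datetime

# Syslog prefix ("Feb 28 03:15:46 host") followed by the midlayer error text.
ERROR_LINE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: SCSI error : <[\d ]+> "
    r"return code = (0x[0-9a-fA-F]+)"
)


def summarize_errors(lines, year):
    """Count SCSI errors per return code and measure first-to-last span."""
    counts = Counter()
    stamps = []
    for line in lines:
        m = ERROR_LINE.match(line)
        if not m:
            continue  # e.g. the end_request lines that follow each error
        stamp, code = m.groups()
        counts[code] += 1
        # The year must be supplied because syslog does not record it.
        stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    span = max(stamps) - min(stamps) if stamps else None
    return counts, span
```

Passing the right year matters for this window: 2004 is a leap year, so a span running from Feb 28 into March includes Feb 29.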
> Could you enable some additional debug in the driver and rerun your
> test? Set DEBUG_QLA2100 to 1 in qla_settings.h and define
> QL_DEBUG_LEVEL_2 in qla_dbg.h. Recompile the driver, then run your
> test. If a failure occurs, send me the resultant /var/log/messages
> file and the output of the following command:
>
> # cat /proc/scsi/qla2xxx/*
I can try that this afternoon. I can't promise when (or if) the problem
will occur again, and I will need to have the customer ready to issue
the cat command, but they are capable of that.
Thanks for your help.