* puzzling scsi return code 20000
From: Yann Dupont @ 2004-07-05 12:07 UTC (permalink / raw)
To: linux-scsi
Hello. I have some problems with a SAN here. I don't know if the problem
lies in the SCSI code/driver or somewhere else...
The setup is:
a QLA2300 (Fibre Channel) HBA with the 6.0.64 driver from the QLogic site,
2.4.26 kernel, device-mapper 1.0.17 applied,
and XFS or ReiserFS on top of LVM volumes (using EVMS).
This setup has seemed fine for some months now, with gigabytes of data
moved every day, but I noticed this in the log:
Jun 22 16:45:15 talisker kernel: SCSI disk error : host 3 channel 0 id 2 lun 1 return code = 20000
Jun 22 16:45:15 talisker kernel: I/O error: dev 08:10, sector 145423312
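The second log line names the failing device only as `08:10`. The minimal sketch below assumes the 2.4 convention that `kdevname()` prints the major and minor numbers in hexadecimal, and the standard Linux assignment for SCSI disks (block major 8 covers `sda` to `sdp`, with 16 minors per disk). Under those assumptions `08:10` is major 8, minor 0x10 = 16, i.e. the whole disk `/dev/sdb` on that host. The helper `decode_kdevname` is hypothetical and is not a kernel or util-linux function.

```python
# Minimal sketch, assuming 2.4-era conventions: kdevname() prints
# "major:minor" in hex, SCSI disk major 8 covers sda..sdp, and each disk
# owns 16 consecutive minors (0 = whole disk, 1-15 = partitions).
SCSI_DISK0_MAJOR = 8
MINORS_PER_DISK = 16


def decode_kdevname(dev: str) -> str:
    """Translate a hex 'MM:mm' device string into an sd name (hypothetical helper)."""
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != SCSI_DISK0_MAJOR:
        # Other majors (IDE, additional SCSI disk majors, ...) are out of scope here.
        return f"major {major}, minor {minor}"
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)  # disk 0 -> sda, disk 1 -> sdb, ...
    return name if part == 0 else f"{name}{part}"


print(decode_kdevname("08:10"))  # prints sdb
```

A partition minor such as `08:11` would decode to `sdb1` under the same assumptions.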
I have had two of these errors in the last month. What is the meaning of
this return code? I ran badblocks on the /dev/evms volume, and everything
is OK.
I also performed a disk scan on the RAID array (an IFT 7250F), and there
are no errors.
The machine can be quite heavily loaded. Should I suspect the qla driver?
Should I suspect the underlying LVM? The LVM has been resized in the
past... Could an error come from there too?
I'm very puzzled...
This is just a summary. I can provide more information if needed.
--
Yann Dupont, Cri de l'université de Nantes
Tel: 02.51.12.53.91 - Fax: 02.51.12.58.60 - Yann.Dupont@univ-nantes.fr
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: puzzling scsi return code 20000
From: Andrew Vasquez @ 2004-07-06 18:46 UTC (permalink / raw)
To: Yann Dupont; +Cc: linux-scsi
On Mon, 05 Jul 2004, Yann Dupont wrote:
> Hello. I have some problems with a SAN here. I don't know if the problem
> lies in the SCSI code/driver or somewhere else...
>
> The setup is:
> a QLA2300 (Fibre Channel) HBA with the 6.0.64 driver from the QLogic site,
>
I'm not familiar with that version number (are you sure it's 6.0.64,
perhaps 6.06.64?). I believe the latest driver submitted to DVT has
been posted to the website (7.00.03).
> 2.4.26 kernel, device-mapper 1.0.17 applied,
> and XFS or ReiserFS on top of LVM volumes (using EVMS).
>
> This setup has seemed fine for some months now, with gigabytes of data
> moved every day, but I noticed this in the log:
>
> Jun 22 16:45:15 talisker kernel: SCSI disk error : host 3 channel 0 id 2 lun 1 return code = 20000
> Jun 22 16:45:15 talisker kernel: I/O error: dev 08:10, sector 145423312
>
> I have had two of these errors in the last month. What is the meaning of
> this return code? I ran badblocks on the /dev/evms volume, and everything
> is OK.
>
20000 == DID_BUS_BUSY. One possibility: the QLogic driver returned a
command with that status, the mid-layer ran out of retries, and it
subsequently returned the status to the upper-layer driver. But that
doesn't help in finding out what went wrong.
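To see why `20000` reads as DID_BUS_BUSY: the value is printed in hexadecimal, and the SCSI mid-layer packs four byte-wide fields into the 32-bit result word, namely the driver byte (bits 24-31), the host byte (bits 16-23), the message byte (bits 8-15) and the raw SCSI status from the target (bits 0-7). The minimal sketch below splits the word and maps the host byte to the `DID_*` names from the kernel's `scsi.h`; only codes 0x00 to 0x09 are listed, and the helper names are hypothetical.

```python
# Minimal sketch: split a Linux SCSI mid-layer result word into its four
# byte-wide fields and name the host byte. Helper names are hypothetical.

# Host-byte codes as defined in the kernel's scsi.h (only 0x00-0x09 listed here).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result: int) -> dict:
    """Return the four byte-wide fields packed into a 32-bit SCSI result."""
    return {
        "driver_byte": (result >> 24) & 0xFF,  # bits 24-31
        "host_byte": (result >> 16) & 0xFF,    # bits 16-23: set by the low-level (HBA) driver
        "msg_byte": (result >> 8) & 0xFF,      # bits 8-15
        "scsi_status": result & 0xFF,          # bits 0-7: raw status from the target
    }


def host_byte_name(result: int) -> str:
    """Map the host byte of a result word to its DID_* name, if listed above."""
    host = decode_scsi_result(result)["host_byte"]
    return HOST_BYTE_NAMES.get(host, f"unlisted host byte 0x{host:02x}")


if __name__ == "__main__":
    # The log prints "return code = 20000"; parse it as hexadecimal.
    code = int("20000", 16)
    print(decode_scsi_result(code))
    print(host_byte_name(code))  # prints DID_BUS_BUSY
```

A non-zero host byte with a zero status byte means the failure was flagged on the host-adapter side by the low-level driver, not returned as a SCSI status by the disk itself.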
> I also performed a disk scan on the RAID array (an IFT 7250F), and there
> are no errors.
>
> The machine can be quite heavily loaded. Should I suspect the qla driver?
> Should I suspect the underlying LVM? The LVM has been resized in the
> past... Could an error come from there too?
> I'm very puzzled...
>
> This is just a summary. I can provide more information if needed.
>
I'd suggest you try a more recent driver (noted above). If the I/O
errors persist, report the problem to tech support at:
http://connection.qlogic.com/support/report/index.asp?id=csg
with a complete description.
Regards,
Andrew Vasquez
QLogic Corporation
* Re: puzzling scsi return code 20000
From: Yann Dupont @ 2004-07-07 10:04 UTC (permalink / raw)
To: linux-scsi
Andrew Vasquez wrote:
>On Mon, 05 Jul 2004, Yann Dupont wrote:
Hello Andrew, thanks for taking the time to respond.
>>Hello. I have some problems with a SAN here. I don't know if the problem
>>lies in the SCSI code/driver or somewhere else...
>>
>>The setup is:
>>a QLA2300 (Fibre Channel) HBA with the 6.0.64 driver from the QLogic site,
>
>I'm not familiar with that version number (are you sure it's 6.0.64,
>perhaps 6.06.64?).
Of course it's 6.06.64, sorry for the mistake. This is the "IBM-approved"
version.
>I believe the latest driver submitted to DVT has been posted to the
>website (7.00.03).
Yes, I've seen that. But I had problems with that version (see below).
>>2.4.26 kernel, device-mapper 1.0.17 applied,
>>and XFS or ReiserFS on top of LVM volumes (using EVMS).
>>
>>This setup has seemed fine for some months now, with gigabytes of data
>>moved every day, but I noticed this in the log:
>>
>>Jun 22 16:45:15 talisker kernel: SCSI disk error : host 3 channel 0 id 2 lun 1 return code = 20000
>>Jun 22 16:45:15 talisker kernel: I/O error: dev 08:10, sector 145423312
>>
>>I have had two of these errors in the last month. What is the meaning of
>>this return code? I ran badblocks on the /dev/evms volume, and everything
>>is OK.
>>
>
>20000 == DID_BUS_BUSY. One possibility: the QLogic driver returned a
>command with that status, the mid-layer ran out of retries, and it
>subsequently returned the status to the upper-layer driver. But that
>doesn't help in finding out what went wrong.
>
>>I also performed a disk scan on the RAID array (an IFT 7250F), and there
>>are no errors.
>>
>>The machine can be quite heavily loaded. Should I suspect the qla driver?
>>Should I suspect the underlying LVM? The LVM has been resized in the
>>past... Could an error come from there too?
>>I'm very puzzled...
>>
>>This is just a summary. I can provide more information if needed.
>>
>
>I'd suggest you try a more recent driver (noted above). If the I/O
>errors persist, report the problem to tech support at:
>
Well, the 7.00.03 version didn't see any of the disks on the second
channels of the controllers, until I realised that failover is enabled by
default. Appending ql2xfailover=0 did the trick.
Anyway, yesterday the problem began to show up every second on another
machine, on a volume with lots of data...
So I spent a lot of time finding out where the problem is.
Summary: my problem has been solved since this morning, but I'm not quite
sure the QLogic driver is involved.
Quick recap:
2.4.26 + dm 0.17 + qla 6.06.64 = SCSI errors
2.4.26 + dm 0.17 + qla 7.00.03 = hang (no errors, but all SCSI/qla operations just hang), then (after a long time) SCSI errors
-> upgraded the SAN RAID to the very latest firmware, same kernels as above: same errors
2.6.7 (with built-in dm & qla 8.xx.x) = SCSI errors too (I tried that because I know a lot of work has been done on the SCSI, qla & dm layers)
2.4.27-rc3 + dm 0.19 + qla 7.00.03 = no errors!
As my problem has only arisen on LVM volumes that have been resized by
EVMS, and the only difference between failing and non-failing operation
is the device-mapper version, I'm beginning to wonder whether the culprit
lies there. What surprises me, though, is that it shows up as a SCSI
error...
Anyway, I'm quite pleased that the problem is solved, but quite perplexed
about where the *real* problem lies.
One note about 7.00.03: it seems to work well, but isn't this version
prone to locking too much? I tried a raw copy between two SAN volumes
(using dd) and the machine was very unresponsive; that is, during the
transfer the machine seems to hang, then after the data is copied, the
machine starts to respond again. This is not something I've seen with the
other drivers.
The machine is a dual Xeon 2.8 GHz with 2 GB of RAM...
> http://connection.qlogic.com/support/report/index.asp?id=csg
Thanks for this URL, and thanks for the answer.
Sincerely yours,
* Re: puzzling scsi return code 20000
From: Yann Dupont @ 2004-07-09 9:21 UTC (permalink / raw)
To: Andrew Vasquez; +Cc: linux-scsi
Yann Dupont wrote:
> 2.4.27-rc3 + dm 0.19 + qla 7.00.03 = no errors!
Just to say that 2.4.27-rc3 + dm 0.19 + qla 6.06.64 is OK too.
So maybe it's time to subscribe to the dm-devel list :) as it really seems
the problem was there.
Anyway, one last thing on this subject. Maybe this list is not the right
place for this, and I should go directly to QLogic support...
The 7.00.03 QLogic driver seems okay when reading, but when writing large
chunks of data, the server hangs until all the data is flushed to disk. Is
anyone else seeing this behaviour?
Going back to 6.06.64 fixed the problem.
Any thoughts?
* Re: puzzling scsi return code 20000
From: Andrew Vasquez @ 2004-07-09 14:57 UTC (permalink / raw)
To: Yann Dupont; +Cc: linux-scsi
On Fri, 09 Jul 2004, Yann Dupont wrote:
>
> Anyway, one last thing on this subject. Maybe this list is not the right
> place for this, and I should go directly to QLogic support...
>
> The 7.00.03 QLogic driver seems okay when reading, but when writing
> large chunks of data, the server hangs until all the data is flushed to
> disk. Is anyone else seeing this behaviour?
>
Please write up your results and post them to QLogic tech support so that
we can get a clearer picture of this observation.
Regards,
Andrew Vasquez
QLogic Corporation
* Re: puzzling scsi return code 20000
From: Yann Dupont @ 2004-07-29 16:11 UTC (permalink / raw)
To: Andrew Vasquez, linux-scsi
Andrew Vasquez wrote:
>On Mon, 05 Jul 2004, Yann Dupont wrote:
>>Hello. I have some problems with a SAN here. I don't know if the problem
>>lies in the SCSI code/driver or somewhere else...
OK, final words.
My problem is solved. In fact, my problemS ARE solved.
Thanks a lot to Andrew Vasquez, who took the time to explain lots of
things to me, and for the invaluable tips he gave me on putting the
driver in debug mode. That explained lots and lots of things.
So, in fact, I had two distinct problems:
One FC HBA which was dying (lots of underrun statuses when the driver is
in debug mode).
One SFP (GBIC) which was causing problems on the second channel of a RAID.
That's why I was seeing SCSI errors on two hosts. In fact, the problems
were not related at all :-(. It took me a while to figure that out!