From: Yann Dupont
Subject: Re: puzzling scsi return code 20000
Date: Wed, 07 Jul 2004 12:04:11 +0200
Message-ID: <40EBCA9B.90402@univ-nantes.fr>
References: <40E9448F.1090700@univ-nantes.fr> <20040706184626.GA30970@praka.san.rr.com>
In-Reply-To: <20040706184626.GA30970@praka.san.rr.com>
List-Id: linux-scsi@vger.kernel.org
To: linux-scsi@vger.kernel.org

Andrew Vasquez wrote:

>On Mon, 05 Jul 2004, Yann Dupont wrote:
>

Hello Andrew, thanks for taking the time to respond.

>>hello. I have some problems with a SAN here. Don't know if the problems
>>lies in scsi code/driver or other thing...
>>
>>The setup is :
>>a QLA2300 (Fibre channel) with 6.0.64 driver from qlogic site,
>>
>
>I'm not familiar with that version number (are you sure it's 6.0.64,
>perhaps 6.06.64?). I believe the latest driver submitted to DVT has

Of course it's 6.06.64, sorry for the mistake. This is the 'IBM approved'
version.

>been posted to the website (7.00.03).
>

Yes, I've seen that.
But I had problems with that version (see below).

>>2.4.26 kernel, device mapper 1.0.17 applied,
>>and xfs or reiserfs on top of lvm volumes (using evms)
>>
>>This setup seemed fine since some months now, with giga of data moved
>>every day, but I noticed this on the log:
>>
>>Jun 22 16:45:15 talisker kernel: SCSI disk error : host 3 channel 0 id 2
>>lun 1 return code = 20000
>>Jun 22 16:45:15 talisker kernel: I/O error: dev 08:10, sector 145423312
>>
>>I have two of these errors in the last month. What's the meaning of this
>>return code ? I made
>>badblock on the /dev/evms volume, and all is OK.
>>
>
>20000 == DID_BUS_BUSY -- one possibility, the QLogic driver returned a
>command with the status, mid-layer ran out of retries, subsequently
>returning the status to the upper-layer driver. But that doesn't
>help in finding out what went wrong.
>
>>I also performed a disk scan on the raid array, (an IFT 7250F) and there
>>are no errors.
>>
>>The machine can be quite heavily loaded. May I suspect the qla driver ?
>>May I suspect the underlying LVM. The LVM has been resized by the
>>past... Can an error happens here too ?
>>I'm very puzzled...
>>
>>This is just a summary . I can furnish more information if needed.
>>
>
>I'd suggest you try a more recent driver (noted above), if the I/O
>errors persist, then report the problem to tech-support at:
>

Well, the 7.00.03 version didn't see any of the disks on the second
channels of the controllers, until I realised that it does failover by
default. Appending ql2xfailover=0 did the trick.

Anyway, yesterday the problem began to show up every second on another
machine, on a volume with lots of data... So I spent a lot of time
tracking down where the problem is.
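For anyone else meeting this log line: as far as I understand it, the return code is printed in hex and packs four bytes (status, message, host, driver), which is how 20000 maps to DID_BUS_BUSY. Here is a minimal sketch of that decoding. The function names are my own and purely illustrative, the host-byte table is copied from my reading of the kernel's scsi.h, and the interpretation of "dev 08:10" as hex major:minor is an assumption about how 2.4 prints device numbers.

```python
# Sketch only: helper names are hypothetical, not a kernel or library API.
# Host-byte values as I read them in the Linux kernel's scsi.h.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four raw bytes."""
    return {
        "status": result & 0xFF,          # raw SCSI status byte from the target
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": (result >> 16) & 0xFF,    # set by the low-level (HBA) driver
        "driver": (result >> 24) & 0xFF,  # set by the mid-layer
    }


def host_byte_name(result):
    """Return the symbolic name of the host byte, or a hex fallback."""
    host = decode_scsi_result(result)["host"]
    return HOST_BYTE_NAMES.get(host, "unknown (0x%02x)" % host)


def parse_kdev(dev):
    """Parse 'MM:mm' as hex major:minor (assumed 2.4 log format)."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    return major, minor


if __name__ == "__main__":
    result = int("20000", 16)  # "return code = 20000" from the log
    print(decode_scsi_result(result), host_byte_name(result))
    print(parse_kdev("08:10"))
```

If the assumption about the device format holds, major 8 with minor 16 would be the second sd disk as a whole (no partition), which fits an I/O error reported against the SAN LUN itself rather than against a partition.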
Summary: my problem has been solved since this morning, but I'm not quite
sure the qlogic driver is involved.

Quick recap:

2.4.26 + dm 0.17 + qla 6.06.64 = scsi error
2.4.26 + dm 0.17 + qla 7.00.03 = hang (no errors, but all scsi/qla
operations just hang) then (after a long time) scsi errors

-> upgraded the SAN RAID to the very latest firmware, same kernels as
above: same errors

2.6.7 (with embedded dm & qla 8.xx.x) = scsi error too (tried that
because I know lots of work has been done on the scsi, qla & dm layers)

2.4.27-rc3 + dm 0.19 + qla 7.00.03 = No errors !!!

As my problem has only arisen on lvm volumes that have been resized by
evms, and the only difference between the failing & non-failing setups
is the device mapper version, I am beginning to wonder whether the
culprit is there. What surprises me anyway is that it shows up as a
scsi error...

Anyway, I'm quite pleased that the problem is solved, but quite
perplexed about where the *real* problem lies.

One note about 7.00.03: it seems to work well, but isn't this version
prone to holding locks for too long? I tried a raw copy between 2 SAN
volumes (using dd) and the machine was very unresponsive. That is,
during the transfer the machine seems to hang, then once the data is
copied the machine starts to respond again. This is not something I've
seen with the other drivers. The machine is a dual Xeon 2.8 GHz with
2 GB of RAM...

> http://connection.qlogic.com/support/report/index.asp?id=csg
>

Thanks for this URL & thanks for the answer.

Sincerely yours,

--
Yann Dupont, Cri de l'université de Nantes
Tel: 02.51.12.53.91 - Fax: 02.51.12.58.60 - Yann.Dupont@univ-nantes.fr