From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Scott M. Ferris"
Subject: Re: Request for review of Linux iSCSI driver version 4.0.0.1
Date: Mon, 1 Dec 2003 11:19:15 -0600 (CST)
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <20031201171915.0F9DB5DC7E@bambi.visi.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Return-path:
Received: from conn.mc.mpls.visi.com ([208.42.156.2]:20434 "EHLO conn.mc.mpls.visi.com")
	by vger.kernel.org with ESMTP id S263887AbTLARTR (ORCPT );
	Mon, 1 Dec 2003 12:19:17 -0500
In-Reply-To: from Roman Zippel at "Dec 1, 2003 10:22:36 am"
List-Id: linux-scsi@vger.kernel.org
To: Roman Zippel
Cc: Naveen Burmi, hch@infradead.org, linux-scsi@vger.kernel.org

Roman Zippel wrote:
> Hi,
>
> On Mon, 1 Dec 2003, Naveen Burmi wrote:
>
> > In linux kernel there is a rare occurrence of data corruption. This data can
> > be buffer cache data as well as raw I/O data. Don't know the actual root
> > cause of the problem, but the fix that we put under macro
> > "PREVENT_DATA_CORRUPTION" is capable of resolving this problem.
>
> You probably mean the page cache and it's normal that a page can be
> modified, while it's written out.
>
> > On iSCSI target, 5428-2, upon detecting the CRC data error by the QlogicFC
> > card it generates the sense data with sense key as HARDWARE ERROR
> > and Additional sense data as LOGICAL UNIT COMMUNICATION FAILURE.
> > Upon receiving this sense data iSCSI initiator driver is not retrying this
> > command and failing the command to the upper layers of SCSI subsystem.
> > But linux SCSI subsystem does not perform any error recovery for this kind of
> > sense data.
>
> The answer from the target is weird, why does it react with a scsi error
> to an iSCSI transport problem?

I don't think he's describing an iSCSI transport problem.  It's not
the iSCSI CRC that was bad.
A Fibre Channel HBA in an iSCSI<->FCP gateway detected a
Fibre Channel CRC error, and the gateway decided to report it as a
HARDWARE ERROR to the iSCSI initiator.  Linux doesn't retry HARDWARE
ERRORs, so they have a problem.

I think that's a bad choice of sense key by the gateway.  ABORTED
COMMAND (sense key 0Bh) would be a better choice if the gateway wants
the initiator to retry the command.  0B/47xx and 0B/48xx are commonly
used to report CRC and parity errors.  Fixing the gateway seems like
the best solution to me.

Copying the data is not a good solution to anything.  If they think
there's a bug in the TCP stack that is corrupting the data, they ought
to go fix the TCP stack.  If the TCP stack isn't corrupting the data,
then copying the data is a waste of time, and puts memory allocation
in the I/O path, which is really not a good idea.

-- 
Scott M. Ferris, sferris@acm.org
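
To make the sense key point above concrete, here is a rough sketch of
how an initiator might pull the sense key and ASC/ASCQ out of
fixed-format sense data and decide whether to retry.  It is only an
illustration, not the actual Linux SCSI error handling or the iSCSI
driver's code: parse_fixed_sense(), should_retry() and struct
sense_info are hypothetical names, and the "retry only on ABORTED
COMMAND" policy is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sense key values from the SCSI Primary Commands standard. */
enum sense_key {
    SK_HARDWARE_ERROR  = 0x04,
    SK_ABORTED_COMMAND = 0x0B
};

/* Fields of interest from fixed-format sense data (hypothetical struct). */
struct sense_info {
    uint8_t key;   /* byte 2, low nibble */
    uint8_t asc;   /* byte 12: additional sense code */
    uint8_t ascq;  /* byte 13: additional sense code qualifier */
};

/*
 * Hypothetical helper: parse fixed-format sense data (response code
 * 70h or 71h).  Returns false if the buffer is too short to hold the
 * ASC/ASCQ bytes or is not in fixed format.
 */
static bool parse_fixed_sense(const uint8_t *buf, size_t len,
                              struct sense_info *out)
{
    if (buf == NULL || out == NULL || len < 14)
        return false;

    uint8_t code = buf[0] & 0x7F;   /* mask off the VALID bit */
    if (code != 0x70 && code != 0x71)
        return false;

    out->key  = buf[2] & 0x0F;
    out->asc  = buf[12];
    out->ascq = buf[13];
    return true;
}

/*
 * Assumed policy for illustration only: retry when the target reports
 * ABORTED COMMAND, and do not retry anything else (including
 * HARDWARE ERROR).
 */
static bool should_retry(const struct sense_info *s)
{
    return s->key == SK_ABORTED_COMMAND;
}
```

With this policy, sense data carrying 04/0800 (HARDWARE ERROR, LOGICAL
UNIT COMMUNICATION FAILURE) is failed upward, while the same CRC error
reported as 0B/4700 would be retried.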