From: "Darrick J. Wong"
Subject: Re: [PATCH 00/12] Roll-up of sas_ata patches
Date: Sun, 04 Feb 2007 01:21:05 -0800
Message-ID: <45C5A581.1070504@us.ibm.com>
References: <200701300915.l0U9FuPu010198@d01av04.pok.ibm.com> <1170541934.3345.69.camel@mulgrave.il.steeleye.com>
In-Reply-To: <1170541934.3345.69.camel@mulgrave.il.steeleye.com>
List-Id: linux-scsi@vger.kernel.org
To: James Bottomley
Cc: linux-scsi, Alexis Bruemmer

James Bottomley wrote:
> There's a problem somewhere with your error handler changes (which I
> picked up thanks to the problems with the V28 firmware).
> What I see without your changes is that for a directly attached SATA
> device, when the firmware begins its death spiral, the commands all
> return and eventually send I/O errors to the filesystem. With your patch
> series applied, it just loops forever giving messages like:
>
> Feb 3 12:07:06 localhost kernel: aic94xx: escb_tasklet_complete: phy5: LINK_RESET_ERROR
> Feb 3 12:07:06 localhost kernel: aic94xx: phy5: Receive FIS timeout
> Feb 3 12:07:06 localhost kernel: aic94xx: phy5: retries:0 performing link reset seq
> Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: aic94xx: control_phy_tasklet_complete: phy5, lrate:0x8, proto:0xe
> Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host

Interesting, since the opposite happens with SAS disks. :)

The infinite loop is usually what happens if a scsi_cmnd gets pulled off
the eh queue without being passed to scsi_eh_finish_cmd(). Can you send
me the whole dmesg? It's possible that we're trying to abort a command,
which of course fails for a SATA disk, so we try bigger and bigger
hammers... and the big hammers don't call scsi_eh_finish_cmd().

Did these SATA link reset errors only start showing up after the V28
firmware patch, or has this always happened? I've noticed lately that I
get link reset errors if I run a short exercise on an ext3 filesystem on
a SATA disk, yet a dd exercise runs just fine. But I had assumed it was
just my flaky hardware. :)

--D
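The failure mode described in this reply (abort fails on SATA, recovery escalates to bigger hammers, and those hammers never finish the command, so recovery re-enters forever) can be illustrated with a small userspace toy model. This is a sketch under stated assumptions, not libsas or SCSI midlayer code: every name below is hypothetical, and it assumes that only "finishing" a command decrements the host's failed-command count, which is what lets the error handler stop re-running recovery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: a failed command and whether it
 * has been "finished" (handed back to the midlayer). */
struct model_cmd {
    int id;
    bool finished;
};

/* Hypothetical escalation ladder, smallest hammer first. */
enum hammer { ABORT_TASK, DEVICE_RESET, BUS_RESET, HOST_RESET, NUM_HAMMERS };

/* Assumption taken from the text: aborts fail on a SATA disk,
 * while any reset-class hammer succeeds. */
static bool model_try_hammer(enum hammer h)
{
    return h != ABORT_TASK;
}

/* Models the finish step: mark the command done and drop the host's
 * failed-command count. */
static void model_finish_cmd(struct model_cmd *cmd, int *host_failed)
{
    cmd->finished = true;
    (*host_failed)--;
}

/* One recovery pass. If big_hammers_finish is false, a successful reset
 * forgets to finish the command, reproducing the suspected bug. */
static void model_recover_host(struct model_cmd *cmds, int n,
                               int *host_failed, bool big_hammers_finish)
{
    for (int i = 0; i < n; i++) {
        if (cmds[i].finished)
            continue;
        for (int h = ABORT_TASK; h < NUM_HAMMERS; h++) {
            if (!model_try_hammer((enum hammer)h))
                continue; /* this hammer failed: escalate */
            if (h == ABORT_TASK || big_hammers_finish)
                model_finish_cmd(&cmds[i], host_failed);
            break; /* a hammer succeeded; stop escalating */
        }
    }
}

/* Re-run recovery while any command is still counted as failed. The
 * max_passes cap exists only so the buggy case terminates in a demo;
 * the real loop has no such cap and would spin forever. Returns the
 * number of recovery passes performed. */
static int model_error_handler(struct model_cmd *cmds, int n,
                               bool big_hammers_finish, int max_passes)
{
    assert(max_passes > 0);
    int host_failed = n;
    int passes = 0;
    while (host_failed > 0 && passes < max_passes) {
        model_recover_host(cmds, n, &host_failed, big_hammers_finish);
        passes++;
    }
    return passes;
}
```

With three failed commands, `model_error_handler(cmds, 3, true, 1000)` completes in a single pass, while `model_error_handler(cmds, 3, false, 1000)` keeps re-entering recovery until it hits the 1000-pass cap, mirroring the repeated Enter/Exit sas_scsi_recover_host lines in the quoted log.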