From: Patrick Mansfield
Subject: Re: Mid-layer handling of NOT_READY conditions...
Date: Mon, 31 Jan 2005 09:36:29 -0800
Message-ID: <20050131173629.GA29928@us.ibm.com>
References: <0B1E13B586976742A7599D71A6AC733C02F326@xbl3.ma.emulex.com>
In-Reply-To: <0B1E13B586976742A7599D71A6AC733C02F326@xbl3.ma.emulex.com>
List-Id: linux-scsi@vger.kernel.org
To: James.Smart@Emulex.Com
Cc: andrew.vasquez@qlogic.com, James.Bottomley@SteelEye.com, linux-scsi@vger.kernel.org

On Mon, Jan 31, 2005 at 11:56:02AM -0500, James.Smart@Emulex.Com wrote:
> > On Sat, 2005-01-29 at 11:34 -0800, Patrick Mansfield wrote:
> > >
> > > Why not just set scmd->retries to zero in scsi_requeue_command()?
> > >
> >
> > This is exactly what I was thinking would be a fairly straight-forward
> > approach at solving the problem...
>
> This is ultimately a hack, and raises the potential for the retries value
> to perpetually be rezero'd. The better solution is the use the block
> primitives available to avoid the i/o being issued at all if the transport
> can't handle it.

No, it does not change the potential to retry forever: someone still has
to requeue the IO again outside of the NEEDS_RETRY/scsi_retry_command case
for that to happen.

We only check retries in scsi_decide_disposition (not counting error
handling), and if we hit the limit, we return SUCCESS. The change is that
we reset retries to zero if the command is *not* retried via
NEEDS_RETRY/scsi_retry_command. It would be even clearer to zero retries
in scsi_decide_disposition.

For NOT_READY, we would be better off always using the
scsi_requeue_command path: get rid of the check in scsi_check_sense, as
the command will be requeued via the scsi_io_completion code. This would
have to happen even if we delay retries to NOT_READY devices.

But yes, it is better to stop IO if the transport can't handle it, and
that would likely avoid the problem (if we only got NOT_READYs and never
returned DID_BUS_BUSY). But it is still a bug to not reset retries.

Maybe I need to hack scsi_debug to demonstrate the problem ...

-- Patrick Mansfield
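
As a rough illustration of the retries problem described in this message,
here is a toy user-space model. It is not kernel code and not a proposed
patch. The toy_* names, the reset_retries flag and the allowed value of 5
are assumptions made up for the sketch; only the idea (a retry counter
that is bumped and compared against a limit when deciding the disposition,
and that a requeue outside that path either keeps or zeroes) comes from
the discussion.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two dispositions named in the thread. */
enum toy_disposition { TOY_NEEDS_RETRY, TOY_SUCCESS };

/* Hypothetical, stripped-down command: just the retry bookkeeping. */
struct toy_cmd {
    int retries;   /* retries consumed so far */
    int allowed;   /* retry limit */
};

/* Toy version of the limit check: bump the counter, compare to the limit. */
static enum toy_disposition toy_decide_disposition(struct toy_cmd *cmd)
{
    if (++cmd->retries < cmd->allowed)
        return TOY_NEEDS_RETRY;
    return TOY_SUCCESS;        /* limit hit: hand back to the upper level */
}

/* Toy requeue outside the NEEDS_RETRY path; optionally zero the counter. */
static void toy_requeue_command(struct toy_cmd *cmd, int reset_retries)
{
    if (reset_retries)
        cmd->retries = 0;
}

/*
 * Exhaust the retry budget once, requeue the command, then count how many
 * retries the requeued command is still granted.
 */
int retries_granted_after_requeue(int reset_retries)
{
    struct toy_cmd cmd = { .retries = 0, .allowed = 5 };
    int granted = 0;

    assert(cmd.allowed > 0);

    /* First pass: keep retrying until the limit is reached. */
    while (toy_decide_disposition(&cmd) == TOY_NEEDS_RETRY)
        ;

    /* The completion path requeues the command. */
    toy_requeue_command(&cmd, reset_retries);

    /* Second pass: count retries granted to the requeued command. */
    while (toy_decide_disposition(&cmd) == TOY_NEEDS_RETRY)
        granted++;

    return granted;
}

/* Print both cases side by side. */
void toy_print_comparison(void)
{
    printf("retries kept:   %d retries after requeue\n",
           retries_granted_after_requeue(0));
    printf("retries zeroed: %d retries after requeue\n",
           retries_granted_after_requeue(1));
}
```

Calling toy_print_comparison() from a main() shows the difference: with
the counter left alone the requeued command gets no further retries, and
with the counter zeroed it gets its full budget again. In this model the
counter cannot be zeroed perpetually unless something keeps calling
toy_requeue_command(), which is the point made above about requeueing
outside the NEEDS_RETRY path.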