From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luben Tuikov
Subject: Re: [PATCH]: Flexible timeout infrastructure
Date: Wed, 16 Jun 2004 11:48:39 -0400
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <40D06BD7.1050605@adaptec.com>
References: <40CF0F9F.4050902@adaptec.com>
	<1087313492.1796.37.camel@mulgrave>
	<40CF4A15.9060005@adaptec.com>
	<1087329285.2048.94.camel@mulgrave>
	<20040616152758.GB4288@us.ibm.com>
	<1087400228.1747.16.camel@mulgrave>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from magic.adaptec.com ([216.52.22.17]:40411 "EHLO magic.adaptec.com")
	by vger.kernel.org with ESMTP id S264061AbUFPPtC (ORCPT );
	Wed, 16 Jun 2004 11:49:02 -0400
In-Reply-To: <1087400228.1747.16.camel@mulgrave>
List-Id: linux-scsi@vger.kernel.org
To: James Bottomley
Cc: Mike Anderson, SCSI Mailing List

James Bottomley wrote:
> On Wed, 2004-06-16 at 10:27, Mike Anderson wrote:
> > Does this mean scsi_times_out will complete the command by calling a
> > SCSI mid layer internal form of the scsi_done function (less the
> > scsi_delete_timer call) or that the LLDD will call scsi_done and we will
> > need to modify scsi_done to accept these no timer running cases.
>
> Yes. We'll just abstract all of scsi_done() bar the timer check into
> __scsi_done, which will be private, and called in this instance.

So, now, there will be a 2nd, "fuzzy" way of returning a command
back to SCSI Core:

a) LLDD calls scsi_done() when all went well, an antagonist to
the one and only queuecommand(), XOR

b) command timed out, LLDD's eh_cmd_timed_out() was called and returned
EH_HANDLED, and then _SCSI_Core_ calls __scsi_done().

I.e. in b) the LLDD _never_ gets to call scsi_done() (or a completion
method) on that command.

Anyway, do we have a patch for *this* solution?

> > > c. I need more time, reset the timer and notify me again when it
> > > fails.
> > >
> > > For (c), I propose that we use the same timeout period, but increment
> > > the retry count (and do this up to allowed retries plus one [so that
> > > no-retry commands have one crack at being recovered by the LLD]) when
> > > retries are exhausted, normal error handling would proceed on timer
> > > expiry leading to certain failure of the command since it would be
> > > ineligible to be retried.
> >
> > The comment on the no-retry commands appears counter to the intent of
> > FASTFAIL. On a multi-ported device if there really is a port / controller
> > issue we have increased the failover time 2x the timeout value which
> > IIRC was one case that FASTFAIL wished to address.
>
> Well ... perhaps the solution's to shorten the timers then for this
> case?

-- 
Luben
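
For reference, the two completion paths discussed in this message, a) and
b), can be modelled in a few lines of user-space C. This is only a sketch
under stated assumptions, not the patch being asked for: every sketch_*
name is hypothetical, the struct is a stand-in rather than the kernel's
struct scsi_cmnd, and the "more time" case simply re-arms the timer
without the retry counting proposed for (c).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum sketch_eh_result {
	SKETCH_EH_NOT_HANDLED,	/* fall through to normal error handling */
	SKETCH_EH_HANDLED,	/* LLDD is finished with the command */
	SKETCH_EH_MORE_TIME	/* option (c): re-arm the timer */
};

struct sketch_cmd {
	bool timer_running;
	bool completed;
	enum sketch_eh_result (*eh_cmd_timed_out)(struct sketch_cmd *cmd);
};

/* Models the private __scsi_done(): completion with no timer check. */
static void sketch_core_done(struct sketch_cmd *cmd)
{
	assert(!cmd->completed);	/* a command is returned exactly once */
	cmd->completed = true;
}

/* Path a): models scsi_done() as called by the LLDD. Returns true only
 * if this call completed the command. */
static bool sketch_done(struct sketch_cmd *cmd)
{
	if (!cmd->timer_running)	/* timer already fired: timeout path owns it */
		return false;
	cmd->timer_running = false;	/* models the scsi_delete_timer() step */
	sketch_core_done(cmd);
	return true;
}

/* Path b): models scsi_times_out() running after the timer has fired. */
static bool sketch_times_out(struct sketch_cmd *cmd)
{
	enum sketch_eh_result res = SKETCH_EH_NOT_HANDLED;

	cmd->timer_running = false;
	if (cmd->eh_cmd_timed_out)
		res = cmd->eh_cmd_timed_out(cmd);

	if (res == SKETCH_EH_HANDLED) {
		sketch_core_done(cmd);	/* SCSI Core completes, not the LLDD */
		return true;
	}
	if (res == SKETCH_EH_MORE_TIME)
		cmd->timer_running = true;	/* same period, notify again */
	return false;
}

/* Example LLDD handlers (hypothetical). */
static enum sketch_eh_result sketch_lld_handled(struct sketch_cmd *cmd)
{
	(void)cmd;
	return SKETCH_EH_HANDLED;
}

static enum sketch_eh_result sketch_lld_more_time(struct sketch_cmd *cmd)
{
	(void)cmd;
	return SKETCH_EH_MORE_TIME;
}
```

In this model a late sketch_done() after a handled timeout is ignored, so
the command goes back to the core through a) or b) but never both.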