From mboxrd@z Thu Jan 1 00:00:00 1970 From: James Bottomley Subject: Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime Date: Wed, 23 Oct 2013 07:46:00 +0000 Message-ID: <1382514359.2081.33.camel@dabdike.quadriga.com> References: <1372661455-122384-1-git-send-email-hare@suse.de> <1372661455-122384-8-git-send-email-hare@suse.de> <1381936290.1864.11.camel@dabdike> <526795FC.6050303@suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7BIT Return-path: Received: from mx2.parallels.com ([199.115.105.18]:39779 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751705Ab3JWHqG convert rfc822-to-8bit (ORCPT ); Wed, 23 Oct 2013 03:46:06 -0400 In-Reply-To: <526795FC.6050303@suse.de> Content-Language: en-US Content-ID: <665A6E71E8B32B40A8633757EAAE9A9F@sw.swsoft.com> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: Hannes Reinecke Cc: "linux-scsi@vger.kernel.org" , Ewan Milne , Ren Mingxin , Bart van Assche , Joern Engel On Wed, 2013-10-23 at 11:25 +0200, Hannes Reinecke wrote: > On 10/16/2013 09:22 PM, James Bottomley wrote: > > On Mon, 2013-07-01 at 08:50 +0200, Hannes Reinecke wrote: > >> This patchs adds an 'eh_deadline' sysfs attribute to the scsi > >> host which limits the overall runtime of the SCSI EH. > >> The 'eh_deadline' value is stored in the now obsolete field > >> 'resetting'. > >> When a command is failed the start time of the EH is stored > >> in 'last_reset'. If the overall runtime of the SCSI EH is longer > >> than last_reset + eh_deadline, the EH is short-circuited and > >> falls through to issue a host reset only. > > > > OK, so the specific problem with this one is that potentially it will > > spend all its time mucking about with aborts (which most often time out > > on non FC kit because of the issue problems) and then proceed to host > > reset, which mostly does nothing for failing devices. > > > > If you want to impose a deadline, then we need to spend only 50% of the > > time attempting aborts and the rest of the time escalating the resets. > > > > [...] > >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > >> index f43de1e..84369f2 100644 > >> --- a/drivers/scsi/scsi_error.c > >> +++ b/drivers/scsi/scsi_error.c > >> @@ -89,6 +89,18 @@ void scsi_schedule_eh(struct Scsi_Host *shost) > >> } > >> EXPORT_SYMBOL_GPL(scsi_schedule_eh); > >> > >> +static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) > >> +{ > >> + if (!shost->last_reset || !shost->eh_deadline) > >> + return 0; > >> + > >> + if (time_before(jiffies, > >> + shost->last_reset + shost->eh_deadline)) > >> + return 0; > >> + > >> + return 1; > >> +} > >> + > > > > What about instead: > > > > static int scsi_host_eh_past_deadline(struct Scsi_Host *shost, int percent) { > > if (!shost->last_reset || !shost->eh_deadline) > > return 0; > > > > if (time_before(jiffies, > > shost->last_reset + shost->eh_deadline * percent/100)) > > return 0; > > > > return 1; > > } > > > > which allows us to have > > > > if (scsi_host_eh_past_deadline(shost, 50)) { > > > > in scsi_eh_abort_cmds() > > > > if (scsi_host_eh_past_deadline(shost, 66) { > > > > in scsi_eh_bus_device_reset() > > > > say 83 in target reset, and 100 in bus reset. > > > > Thus ensuring we at least get a crack at the reset chain? > > > Alternatively we could just escalate directly to LUN reset once the > first abort fails. That looks like a cleaner solution here. I'm OK with that ... if you want this in the current merge window, you have about a week to code it up ... James