From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hannes Reinecke Subject: Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime Date: Wed, 23 Oct 2013 11:25:16 +0200 Message-ID: <526795FC.6050303@suse.de> References: <1372661455-122384-1-git-send-email-hare@suse.de> <1372661455-122384-8-git-send-email-hare@suse.de> <1381936290.1864.11.camel@dabdike> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Received: from cantor2.suse.de ([195.135.220.15]:51126 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751111Ab3JWHXU (ORCPT ); Wed, 23 Oct 2013 03:23:20 -0400 In-Reply-To: <1381936290.1864.11.camel@dabdike> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: James Bottomley Cc: "linux-scsi@vger.kernel.org" , Ewan Milne , Ren Mingxin , Bart van Assche , Joern Engel On 10/16/2013 09:22 PM, James Bottomley wrote: > On Mon, 2013-07-01 at 08:50 +0200, Hannes Reinecke wrote: >> This patchs adds an 'eh_deadline' sysfs attribute to the scsi >> host which limits the overall runtime of the SCSI EH. >> The 'eh_deadline' value is stored in the now obsolete field >> 'resetting'. >> When a command is failed the start time of the EH is stored >> in 'last_reset'. If the overall runtime of the SCSI EH is longer >> than last_reset + eh_deadline, the EH is short-circuited and >> falls through to issue a host reset only. > > OK, so the specific problem with this one is that potentially it will > spend all its time mucking about with aborts (which most often time o= ut > on non FC kit because of the issue problems) and then proceed to host > reset, which mostly does nothing for failing devices. > > If you want to impose a deadline, then we need to spend only 50% of t= he > time attempting aborts and the rest of the time escalating the resets= =2E > > [...] >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index f43de1e..84369f2 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -89,6 +89,18 @@ void scsi_schedule_eh(struct Scsi_Host *shost) >> } >> EXPORT_SYMBOL_GPL(scsi_schedule_eh); >> >> +static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) >> +{ >> + if (!shost->last_reset || !shost->eh_deadline) >> + return 0; >> + >> + if (time_before(jiffies, >> + shost->last_reset + shost->eh_deadline)) >> + return 0; >> + >> + return 1; >> +} >> + > > What about instead: > > static int scsi_host_eh_past_deadline(struct Scsi_Host *shost, int pe= rcent) { > if (!shost->last_reset || !shost->eh_deadline) > return 0; > > if (time_before(jiffies, > shost->last_reset + shost->eh_deadline * percent/100)) > return 0; > > return 1; > } > > which allows us to have > > if (scsi_host_eh_past_deadline(shost, 50)) { > > in scsi_eh_abort_cmds() > > if (scsi_host_eh_past_deadline(shost, 66) { > > in scsi_eh_bus_device_reset() > > say 83 in target reset, and 100 in bus reset. > > Thus ensuring we at least get a crack at the reset chain? > Alternatively we could just escalate directly to LUN reset once the=20 first abort fails. That looks like a cleaner solution here. Cheers, Hannes --=20 Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=FCrnberg GF: J. Hawn, J. Guild, F. Imend=F6rffer, HRB 16746 (AG N=FCrnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" i= n the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html