From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ewan Milne Subject: Re: [PATCH 8/8] scsi_error: Escalate to LUN reset if abort fails Date: Wed, 23 Oct 2013 09:53:55 -0400 Message-ID: <1382536435.3812.156.camel@localhost.localdomain> References: <1382518282-107606-1-git-send-email-hare@suse.de> <1382518282-107606-9-git-send-email-hare@suse.de> Reply-To: emilne@redhat.com Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Return-path: Received: from mx1.redhat.com ([209.132.183.28]:8929 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750820Ab3JWNyl (ORCPT ); Wed, 23 Oct 2013 09:54:41 -0400 In-Reply-To: <1382518282-107606-9-git-send-email-hare@suse.de> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: Hannes Reinecke Cc: James Bottomley , linux-scsi@vger.kernel.org, Ren Mingxin On Wed, 2013-10-23 at 10:51 +0200, Hannes Reinecke wrote: > If a command abort fails there is a fair chance that all other > aborts will be failing, too. > So we should be calling LUN reset directly after the first failed > abort and skip aborting the remaining commands. > > Signed-off-by: Hannes Reinecke > --- > drivers/scsi/scsi_error.c | 16 +++++++++------- > 1 file changed, 9 insertions(+), 7 deletions(-) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index edae9e2..e8bee9f 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -1192,18 +1192,20 @@ static int scsi_eh_abort_cmds(struct list_head *work_q, > "0x%p\n", current->comm, > scmd)); > rtn = scsi_try_to_abort_cmd(shost->hostt, scmd); > - if (rtn == SUCCESS || rtn == FAST_IO_FAIL) { > - scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD; > - if (rtn == FAST_IO_FAIL) > - scsi_eh_finish_cmd(scmd, done_q); > - else > - list_move_tail(&scmd->eh_entry, &check_list); > - } else > + if (rtn == FAILED) { > SCSI_LOG_ERROR_RECOVERY(3, printk("%s: aborting" > " cmd failed:" > "0x%p\n", > current->comm, > scmd)); > + list_splice_init(&check_list, work_q); > + return list_empty(work_q); > + } > + scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD; > + if (rtn == FAST_IO_FAIL) > + scsi_eh_finish_cmd(scmd, done_q); > + else > + list_move_tail(&scmd->eh_entry, &check_list); > } > > return scsi_eh_test_devices(&check_list, work_q, done_q, 0); This looks like it should be OK, assuming that we do in fact want to stop issuing aborts once a call to ->eh_abort_handler() returns FAILED. I guess it depends upon whether drivers might occasionally return FAILED due to some temporary condition as opposed to e.g. loss of link. (I've been having a discussion with the vmw_pvscsi driver maintainer regarding their implementation of ->eh_abort_handler(), which might have to change to return FAILED if the hypervisor side doesn't respond to the abort request fast enough, instead of returning SUCCESS and then later completing the request when the abort actually completes. So I think that a temporary condition that results in returning FAILED is possible...) Note that this portion of the patch set would change the behavior regardless of whether eh_deadline was enabled. -Ewan