Re: [PATCH 1/5] Fix to cleanup aborted IO to avoid device being offlined by mid-layer

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

From: "Ewan D. Milne" <emilne@redhat.com>
To: "Satish Kharat (satishkh)" <satishkh@cisco.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH 1/5] Fix to cleanup aborted IO to avoid device being offlined by mid-layer
Date: Thu, 17 Mar 2016 13:51:35 -0400	[thread overview]
Message-ID: <1458237095.17965.16.camel@localhost.localdomain> (raw)
In-Reply-To: <1458167903098.91502@cisco.com>

On Wed, 2016-03-16 at 22:38 +0000, Satish Kharat (satishkh) wrote:
> Thanks for the review comments. I incorporated the comments, below is the updated patch: 
> 
> 
> If an I/O times out and an abort issued by host, if the abort is
> successful we need to set scsi status as DID_ABORT. Or else the
> mid-layer error handler which looks for this error code, will
> offline the device. Also if the original I/O is not found in fnic
> firmware, we will consider the abort as successful.
> The start_time assignment is moved because of the new goto.
> Fnic driver version changed from 1.6.0.17a to 1.6.0.19,
> version 1.6.0.18 has been skipped
> 
> Signed-off-by: Satish Kharat <satishkh@cisco.com>
> ---
>  drivers/scsi/fnic/fnic.h      |  2 +-
>  drivers/scsi/fnic/fnic_scsi.c | 35 +++++++++++++++++++++++++++++------
>  2 files changed, 30 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/scsi/fnic/fnic.h b/drivers/scsi/fnic/fnic.h
> index ce129e5..52a53f8 100644
> --- a/drivers/scsi/fnic/fnic.h
> +++ b/drivers/scsi/fnic/fnic.h
> @@ -39,7 +39,7 @@
>  
>  #define DRV_NAME		"fnic"
>  #define DRV_DESCRIPTION		"Cisco FCoE HBA Driver"
> -#define DRV_VERSION		"1.6.0.17a"
> +#define DRV_VERSION		"1.6.0.19"
>  #define PFX			DRV_NAME ": "
>  #define DFX                     DRV_NAME "%d: "
>  
> diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> index 266b909..b732fa3 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
> @@ -1092,6 +1092,11 @@ static void fnic_fcpio_itmf_cmpl_handler(struct fnic *fnic,
>  				atomic64_inc(
>  					&term_stats->terminate_fw_timeouts);
>  			break;
> +		case FCPIO_ITMF_REJECTED:
> +			FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host,
> +				"abort reject recd. id %d\n",
> +				(int)(id & FNIC_TAG_MASK));
> +			break;
>  		case FCPIO_IO_NOT_FOUND:
>  			if (CMD_FLAGS(sc) & FNIC_IO_ABTS_ISSUED)
>  				atomic64_inc(&abts_stats->abort_io_not_found);
> @@ -1112,9 +1117,15 @@ static void fnic_fcpio_itmf_cmpl_handler(struct fnic *fnic,
>  			spin_unlock_irqrestore(io_lock, flags);
>  			return;
>  		}
> -		CMD_ABTS_STATUS(sc) = hdr_status;
> +
>  		CMD_FLAGS(sc) |= FNIC_IO_ABT_TERM_DONE;
>  
> +		/* If the status is IO not found consider it as success */
> +		if (hdr_status == FCPIO_IO_NOT_FOUND)
> +			CMD_ABTS_STATUS(sc) = FCPIO_SUCCESS;
> +		else
> +			CMD_ABTS_STATUS(sc) = hdr_status;
> +
>  		atomic64_dec(&fnic_stats->io_stats.active_ios);
>  		if (atomic64_read(&fnic->io_cmpl_skip))
>  			atomic64_dec(&fnic->io_cmpl_skip);
> @@ -1927,21 +1938,33 @@ int fnic_abort_cmd(struct scsi_cmnd *sc)
>  
>  	CMD_STATE(sc) = FNIC_IOREQ_ABTS_COMPLETE;
>  
> +	start_time = io_req->start_time;
>  	/*
>  	 * firmware completed the abort, check the status,
> -	 * free the io_req irrespective of failure or success
> +	 * free the io_req if successful. If abort fails,
> +	 * Device reset will clean the I/O.
>  	 */
> -	if (CMD_ABTS_STATUS(sc) != FCPIO_SUCCESS)
> +	if (CMD_ABTS_STATUS(sc) == FCPIO_SUCCESS) {
> +		CMD_SP(sc) = NULL;
> +	}
> +	else
> +	{
>  		ret = FAILED;
> -
> -	CMD_SP(sc) = NULL;
> +		spin_unlock_irqrestore(io_lock, flags);
> +		goto fnic_abort_cmd_end;
> +	}
>  
>  	spin_unlock_irqrestore(io_lock, flags);
>  
> -	start_time = io_req->start_time;
>  	fnic_release_ioreq_buf(fnic, io_req, sc);
>  	mempool_free(io_req, fnic->io_req_pool);
>  
> +	if (sc->scsi_done) {
> +	/* Call SCSI completion function to complete the IO */
> +		sc->result = (DID_ABORT << 16);
> +		sc->scsi_done(sc);
> +	}
> +
>  fnic_abort_cmd_end:
>  	FNIC_TRACE(fnic_abort_cmd, sc->device->host->host_no,
>  		  sc->request->tag, sc,
> -- 
> 2.4.3

Reviewed-by: Ewan D. Milne <emilne@redhat.com>

> 
> ________________________________________
> From: Ewan Milne <emilne@redhat.com>
> Sent: Monday, March 14, 2016 12:59 PM
> To: Satish Kharat (satishkh)
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: [PATCH 1/5] Fix to cleanup aborted IO to avoid device being offlined by mid-layer
> 
> On Mon, 2016-03-14 at 11:14 -0700, Satish Kharat wrote:
> > If an I/O times out and an abort issued by host, if the abort is
> > successful we need to set scsi status as DID_ABORT. Or else the
> > mid-layer error handler which looks for this error code, will
> > offline the device. Also if the original I/O is not found in fnic
> > firmware, we will consider the abort as successful.
> > Fnic driver version changed from 1.6.0.17a to 1.6.0.19,
> > version 1.6.0.18 has been skipped
> >
> > Signed-off-by: Satish Kharat <satishkh@cisco.com>
> > ---
> >  drivers/scsi/fnic/fnic.h      |  2 +-
> >  drivers/scsi/fnic/fnic_scsi.c | 32 +++++++++++++++++++++++++++-----
> >  2 files changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/scsi/fnic/fnic.h b/drivers/scsi/fnic/fnic.h
> > index ce129e5..52a53f8 100644
> > --- a/drivers/scsi/fnic/fnic.h
> > +++ b/drivers/scsi/fnic/fnic.h
> > @@ -39,7 +39,7 @@
> >
> >  #define DRV_NAME             "fnic"
> >  #define DRV_DESCRIPTION              "Cisco FCoE HBA Driver"
> > -#define DRV_VERSION          "1.6.0.17a"
> > +#define DRV_VERSION          "1.6.0.19"
> >  #define PFX                  DRV_NAME ": "
> >  #define DFX                     DRV_NAME "%d: "
> >
> > diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> > index 266b909..b9732b1 100644
> > --- a/drivers/scsi/fnic/fnic_scsi.c
> > +++ b/drivers/scsi/fnic/fnic_scsi.c
> > @@ -1092,6 +1092,11 @@ static void fnic_fcpio_itmf_cmpl_handler(struct fnic *fnic,
> >                               atomic64_inc(
> >                                       &term_stats->terminate_fw_timeouts);
> >                       break;
> > +             case FCPIO_ITMF_REJECTED:
> > +                     FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host,
> > +                             "abort reject recd. id %d\n",
> > +                             (int)(id & FNIC_TAG_MASK));
> > +                     break;
> >               case FCPIO_IO_NOT_FOUND:
> >                       if (CMD_FLAGS(sc) & FNIC_IO_ABTS_ISSUED)
> >                               atomic64_inc(&abts_stats->abort_io_not_found);
> > @@ -1112,9 +1117,14 @@ static void fnic_fcpio_itmf_cmpl_handler(struct fnic *fnic,
> >                       spin_unlock_irqrestore(io_lock, flags);
> >                       return;
> >               }
> > +
> >               CMD_ABTS_STATUS(sc) = hdr_status;
> >               CMD_FLAGS(sc) |= FNIC_IO_ABT_TERM_DONE;
> >
> > +             /* If the status is IO not found consider it as success */
> > +             if (hdr_status == FCPIO_IO_NOT_FOUND)
> > +                     CMD_ABTS_STATUS(sc) = FCPIO_SUCCESS;
> > +
> 
> Minor nitpick:  I would have done:
> 
>                 if (hdr_status == FCP_IO_NOT_FOUND)
>                         CMD_ABTS_STATUS(sc) = FCPIO_SUCCESS;
>                 else
>                         CMD_ABTS_STATUS(sc) = hdr_status;
> 
> rather than overwriting the assignment made several lines above, it is
> more explicit and less prone to errors if the code is later changed.
> 
> >               atomic64_dec(&fnic_stats->io_stats.active_ios);
> >               if (atomic64_read(&fnic->io_cmpl_skip))
> >                       atomic64_dec(&fnic->io_cmpl_skip);
> > @@ -1927,21 +1937,33 @@ int fnic_abort_cmd(struct scsi_cmnd *sc)
> >
> >       CMD_STATE(sc) = FNIC_IOREQ_ABTS_COMPLETE;
> >
> > +     start_time = io_req->start_time;
> >       /*
> >        * firmware completed the abort, check the status,
> > -      * free the io_req irrespective of failure or success
> > +      * free the io_req if successful. If abort fails,
> > +      * Device reset will clean the I/O.
> >        */
> > -     if (CMD_ABTS_STATUS(sc) != FCPIO_SUCCESS)
> > +     if (CMD_ABTS_STATUS(sc) == FCPIO_SUCCESS) {
> > +             CMD_SP(sc) = NULL;
> > +     }
> > +     else
> > +     {
> >               ret = FAILED;
> > -
> > -     CMD_SP(sc) = NULL;
> > +             spin_unlock_irqrestore(io_lock, flags);
> > +             goto fnic_abort_cmd_end;
> > +     }
> >
> >       spin_unlock_irqrestore(io_lock, flags);
> >
> > -     start_time = io_req->start_time;
> 
> What was the point of moving the assignment of start_time?
> Other places in the code still set it just before the call to
> fnic_release_ioreq_buf().
> 
> >       fnic_release_ioreq_buf(fnic, io_req, sc);
> >       mempool_free(io_req, fnic->io_req_pool);
> >
> > +     if (sc->scsi_done) {
> > +     /* Call SCSI completion function to complete the IO */
> > +             sc->result = (DID_ABORT << 16);
> > +             sc->scsi_done(sc);
> > +     }
> > +
> >  fnic_abort_cmd_end:
> >       FNIC_TRACE(fnic_abort_cmd, sc->device->host->host_no,
> >                 sc->request->tag, sc,
> 
> Despite the above comments, I guess this looks OK.
> 
> Reviewed-by: Ewan D. Milne <emilne@redhat.com>
>

     prev parent reply	other threads:[~2016-03-17 17:51 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-14 18:14 [PATCH 1/5] Fix to cleanup aborted IO to avoid device being offlined by mid-layer Satish Kharat
2016-03-14 18:14 ` [PATCH 2/5] Cleanup the I/O that has timed out and is used to issue LUN reset Satish Kharat
2016-03-14 20:14   ` Ewan Milne
2016-03-15  9:53   ` Christoph Hellwig
2016-03-14 18:14 ` [PATCH 3/5] Using rport->dd_data to check rport online instead of rport_lookup Satish Kharat
2016-03-14 20:15   ` Ewan Milne
2016-03-14 18:14 ` [PATCH 4/5] Setting scsi host template to indicate that fnic does not support multiqueue Satish Kharat
2016-03-14 20:16   ` Ewan Milne
2016-03-15  9:46   ` Johannes Thumshirn
2016-03-15  9:51   ` Christoph Hellwig
2016-03-14 18:14 ` [PATCH 5/5] Leftshift returned scsi_cmnd error code 16 bits Satish Kharat
2016-03-14 20:17   ` Ewan Milne
2016-03-14 19:59 ` [PATCH 1/5] Fix to cleanup aborted IO to avoid device being offlined by mid-layer Ewan Milne
2016-03-16 22:38   ` Satish Kharat (satishkh)
2016-03-17 17:51     ` Ewan D. Milne [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1458237095.17965.16.camel@localhost.localdomain \
    --to=emilne@redhat.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=satishkh@cisco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox