Re: [PATCH 3/4] scsi: improved eh timeout handler


From: "Jörn Engel" <joern@logfs.org>
To: Hannes Reinecke <hare@suse.de>
Cc: James Bottomley <jbottomley@parallels.com>,
	linux-scsi@vger.kernel.org, Ewan Milne <emilne@redhat.com>,
	James Smart <james.smart@emulex.com>,
	Ren Mingxin <renmx@cn.fujitsu.com>,
	Roland Dreier <roland@purestorage.com>,
	Bryn Reeves <bmr@redhat.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH 3/4] scsi: improved eh timeout handler
Date: Thu, 6 Jun 2013 12:23:20 -0400
Message-ID: <20130606162320.GC16616@logfs.org>
In-Reply-To: <1370511835-50072-4-git-send-email-hare@suse.de>

On Thu, 6 June 2013 11:43:54 +0200, Hannes Reinecke wrote:
> 
> When a command runs into a timeout we need to send an 'ABORT TASK'
> TMF. This is typically done by the 'eh_abort_handler' LLDD callback.
> 
> Conceptually, however, this function is a normal SCSI command, so
> there is no need to enter the error handler.
> 
> This patch implements a new scsi_abort_command() function which
> invokes an asynchronous function scsi_eh_abort_handler() to
> abort the commands via 'eh_abort_handler'.
> 
> If the 'eh_abort_handler' returns SUCCESS or FAST_IO_FAIL the
> command will be retried if possible. If no retries are allowed
> the command will be returned immediately, as we have to assume
> the TMF succeeded and the command is completed with the LLDD.
> For any other return code from 'eh_abort_handler' the command
> will be pushed onto the existing SCSI EH handler, or aborted
> with DID_TIME_OUT if that fails.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/scsi_error.c        | 79 ++++++++++++++++++++++++++++++++++++++++
>  drivers/scsi/scsi_scan.c         |  3 ++
>  drivers/scsi/scsi_transport_fc.c |  3 +-
>  include/scsi/scsi_cmnd.h         |  1 +
>  include/scsi/scsi_device.h       |  2 +
>  5 files changed, 87 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 96b4bb6..0a6b21c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -55,6 +55,8 @@ static void scsi_eh_done(struct scsi_cmnd *scmd);
>  #define HOST_RESET_SETTLE_TIME  (10)
>  
>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
> +static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
> +				 struct scsi_cmnd *scmd);
>  
>  /* called with shost->host_lock held */
>  void scsi_eh_wakeup(struct Scsi_Host *shost)
> @@ -90,6 +92,83 @@ void scsi_schedule_eh(struct Scsi_Host *shost)
>  EXPORT_SYMBOL_GPL(scsi_schedule_eh);
>  
>  /**
> + * scsi_eh_abort_handler - Handle command aborts
> + * @work:	sdev on which commands should be aborted.
> + */
> +void
> +scsi_eh_abort_handler(struct work_struct *work)
> +{
> +	struct scsi_device *sdev =
> +		container_of(work, struct scsi_device, abort_work);
> +	struct Scsi_Host *shost = sdev->host;
> +	struct scsi_cmnd *scmd, *tmp;
> +	unsigned long flags;
> +	int rtn;
> +
> +	spin_lock_irqsave(&sdev->list_lock, flags);
> +	list_for_each_entry_safe(scmd, tmp, &sdev->eh_abort_list, eh_entry) {
> +		list_del_init(&scmd->eh_entry);

The _init bit is not needed.  I prefer list_del, as the poisoning
sometimes helps catch bugs.
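
To illustrate the difference, here is a minimal userspace sketch.  It
is not the kernel's list.h: the struct, the helpers and the
SKETCH_POISON values are stand-ins written for this example (the
kernel uses LIST_POISON1/LIST_POISON2).  After list_del() a stale
entry carries obviously bad pointers, while after list_del_init() it
looks like a perfectly valid empty list, so a reuse bug can go
unnoticed.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Stand-in poison values; the kernel uses LIST_POISON1/LIST_POISON2. */
#define SKETCH_POISON1 ((struct list_head *)0x100)
#define SKETCH_POISON2 ((struct list_head *)0x200)

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from its neighbours without touching entry itself. */
static void __list_del_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Unlink and poison: a stale use hits an obviously bad pointer. */
static void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = SKETCH_POISON1;
	entry->prev = SKETCH_POISON2;
}

/* Unlink and re-initialise: the entry looks like a valid empty list. */
static void list_del_init(struct list_head *entry)
{
	__list_del_entry(entry);
	INIT_LIST_HEAD(entry);
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}
```

list_del_init() is only required when something later tests the entry
with list_empty() or deletes it a second time.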

> +		spin_unlock_irqrestore(&sdev->list_lock, flags);
> +		SCSI_LOG_ERROR_RECOVERY(3,
> +			scmd_printk(KERN_INFO, scmd,
> +				    "aborting command %p\n", scmd));
> +		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
> +		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
> +			if (((scmd->request->cmd_flags & REQ_FAILFAST_DEV) ||

Am I being stupid again or should this be negated?
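
To make the question concrete, here is a small userspace sketch of
the two readings.  struct sketch_cmd and both function names are
made up for this example; the two booleans stand in for the
REQ_FAILFAST_DEV test and the REQ_TYPE_BLOCK_PC test in the quoted
condition.  Which of the two is intended is exactly the open
question.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields the quoted condition reads. */
struct sketch_cmd {
	bool failfast_dev;	/* REQ_FAILFAST_DEV set on the request */
	bool block_pc;		/* cmd_type == REQ_TYPE_BLOCK_PC */
	int retries;
	int allowed;
};

/* The condition as posted: retry only failfast or BLOCK_PC commands. */
static bool retry_as_posted(struct sketch_cmd *c)
{
	return (c->failfast_dev || c->block_pc) &&
	       (++c->retries <= c->allowed);
}

/* The negated variant: retry only commands that are neither. */
static bool retry_negated(struct sketch_cmd *c)
{
	return !(c->failfast_dev || c->block_pc) &&
	       (++c->retries <= c->allowed);
}
```

In both variants the retry counter is only bumped when the first half
of the condition holds, because && short-circuits.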

> +			     (scmd->request->cmd_type == REQ_TYPE_BLOCK_PC)) &&
> +			    (++scmd->retries <= scmd->allowed)) {
> +				SCSI_LOG_ERROR_RECOVERY(3,
> +					scmd_printk(KERN_WARNING, scmd,
> +						    "retry aborted command\n"));
> +
> +				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
> +			} else {
> +				SCSI_LOG_ERROR_RECOVERY(3,
> +					scmd_printk(KERN_WARNING, scmd,
> +						    "fast fail aborted command\n"));
> +				scmd->result |= DID_TRANSPORT_FAILFAST << 16;
> +				scsi_finish_command(scmd);
> +			}
> +		} else {
> +			if (!scsi_eh_scmd_add(scmd, 0)) {
> +				SCSI_LOG_ERROR_RECOVERY(3,
> +					scmd_printk(KERN_WARNING, scmd,
> +						    "terminate aborted command\n"));
> +				scmd->result |= DID_TIME_OUT << 16;
> +				scsi_finish_command(scmd);
> +			}
> +		}
> +		spin_lock_irqsave(&sdev->list_lock, flags);
> +	}
> +	spin_unlock_irqrestore(&sdev->list_lock, flags);
> +}
> +
> +/**
> + * scsi_abort_command - schedule a command abort
> + * @scmd:	scmd to abort.
> + *
> + * We only need to abort commands after a command timeout
> + */
> +void
> +scsi_abort_command(struct scsi_cmnd *scmd)
> +{
> +	unsigned long flags;
> +	int kick_worker = 0;
> +	struct scsi_device *sdev = scmd->device;
> +
> +	spin_lock_irqsave(&sdev->list_lock, flags);
> +	if (list_empty(&sdev->eh_abort_list))
> +		kick_worker = 1;
> +	list_add(&scmd->eh_entry, &sdev->eh_abort_list);
> +	SCSI_LOG_ERROR_RECOVERY(3,
> +		scmd_printk(KERN_INFO, scmd, "adding to eh_abort_list\n"));

The printk can be moved outside the spinlock.  Who knows, maybe this
will become a scalability bottleneck before the millennium is over.
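
Roughly what I have in mind, as a userspace sketch.  Everything
prefixed sketch_ is invented for the example: the lock is a mock that
only records whether it is held, pending stands in for the length of
eh_abort_list, and sketch_log() stands in for the
SCSI_LOG_ERROR_RECOVERY() call.  Only the list manipulation sits
inside the critical section.

```c
#include <stdbool.h>

/* Hypothetical mock of a lock; it only tracks whether it is held. */
struct sketch_lock {
	bool held;
};

struct sketch_dev {
	struct sketch_lock list_lock;
	int pending;		/* stand-in for the length of eh_abort_list */
	int logs_under_lock;	/* log calls that ran with the lock held */
	int logs_total;
};

static void sketch_lock_acquire(struct sketch_lock *l) { l->held = true; }
static void sketch_lock_release(struct sketch_lock *l) { l->held = false; }

/* Stand-in for the SCSI_LOG_ERROR_RECOVERY()/scmd_printk() call. */
static void sketch_log(struct sketch_dev *d)
{
	d->logs_total++;
	if (d->list_lock.held)
		d->logs_under_lock++;
}

/* Returns true when the caller should kick the worker. */
static bool sketch_abort_command(struct sketch_dev *d)
{
	bool kick_worker;

	sketch_lock_acquire(&d->list_lock);
	kick_worker = (d->pending == 0);	/* the list_empty() check */
	d->pending++;				/* the list_add() */
	sketch_lock_release(&d->list_lock);

	sketch_log(d);				/* logging after the unlock */
	return kick_worker;
}
```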

> +	spin_unlock_irqrestore(&sdev->list_lock, flags);
> +	if (kick_worker)
> +		schedule_work(&sdev->abort_work);
> +}
> +EXPORT_SYMBOL_GPL(scsi_abort_command);
> +
> +/**
>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>   * @scmd:	scmd to run eh on.
>   * @eh_flag:	optional SCSI_EH flag.
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 3e58b22..f9cc6fc 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -231,6 +231,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>  	extern void scsi_evt_thread(struct work_struct *work);
>  	extern void scsi_requeue_run_queue(struct work_struct *work);
> +	extern void scsi_eh_abort_handler(struct work_struct *work);

Function declarations in a .c file?  Ick!
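
A sketch of the alternative, squeezed into one compilation unit so it
can be built on its own.  The header guard name, the placement in a
private SCSI header and sketch_init_work() are assumptions made for
this example, and struct work_struct is a userspace stand-in.  The
point is only that the prototype lives in one shared place, so the
definition and every user are checked against the same declaration.

```c
/* --- would live in a shared header (hypothetical placement) --- */
#ifndef SKETCH_SCSI_PRIV_H
#define SKETCH_SCSI_PRIV_H

struct work_struct;	/* a forward declaration suffices for a prototype */
void scsi_eh_abort_handler(struct work_struct *work);

#endif

/* --- scsi_error.c side: the definition sees the same prototype --- */
struct work_struct {
	void (*func)(struct work_struct *work);	/* userspace stand-in */
};

void scsi_eh_abort_handler(struct work_struct *work)
{
	(void)work;	/* body elided in this sketch */
}

/* --- scsi_scan.c side: no local extern, just use the declared name --- */
static void sketch_init_work(struct work_struct *w)
{
	w->func = scsi_eh_abort_handler;	/* roughly what INIT_WORK() sets */
}
```

With a local extern in the .c file, a later change to the handler's
signature would not be caught by the compiler at the use site.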

>  
>  	sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
>  		       GFP_ATOMIC);
> @@ -251,9 +252,11 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>  	INIT_LIST_HEAD(&sdev->cmd_list);
>  	INIT_LIST_HEAD(&sdev->starved_entry);
>  	INIT_LIST_HEAD(&sdev->event_list);
> +	INIT_LIST_HEAD(&sdev->eh_abort_list);
>  	spin_lock_init(&sdev->list_lock);
>  	INIT_WORK(&sdev->event_work, scsi_evt_thread);
>  	INIT_WORK(&sdev->requeue_work, scsi_requeue_run_queue);
> +	INIT_WORK(&sdev->abort_work, scsi_eh_abort_handler);
>  
>  	sdev->sdev_gendev.parent = get_device(&starget->dev);
>  	sdev->sdev_target = starget;
> diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
> index e106c27..09237bf 100644
> --- a/drivers/scsi/scsi_transport_fc.c
> +++ b/drivers/scsi/scsi_transport_fc.c
> @@ -2079,7 +2079,8 @@ fc_timed_out(struct scsi_cmnd *scmd)
>  	if (rport->port_state == FC_PORTSTATE_BLOCKED)
>  		return BLK_EH_RESET_TIMER;
>  
> -	return BLK_EH_NOT_HANDLED;
> +	scsi_abort_command(scmd);
> +	return BLK_EH_SCHEDULED;
>  }
>  
>  /*
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index de5f5d8..460e514 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -144,6 +144,7 @@ extern void scsi_put_command(struct scsi_cmnd *);
>  extern void __scsi_put_command(struct Scsi_Host *, struct scsi_cmnd *,
>  			       struct device *);
>  extern void scsi_finish_command(struct scsi_cmnd *cmd);
> +extern void scsi_abort_command(struct scsi_cmnd *cmd);
>  
>  extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
>  				 size_t *offset, size_t *len);
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index cc64587..e03d379 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -80,6 +80,7 @@ struct scsi_device {
>  	spinlock_t list_lock;
>  	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
>  	struct list_head starved_entry;
> +	struct list_head eh_abort_list;
>  	struct scsi_cmnd *current_cmnd;	/* currently active command */
>  	unsigned short queue_depth;	/* How deep of a queue we want */
>  	unsigned short max_queue_depth;	/* max queue depth */
> @@ -180,6 +181,7 @@ struct scsi_device {
>  
>  	struct execute_work	ew; /* used to get process context on put */
>  	struct work_struct	requeue_work;
> +	struct work_struct	abort_work;
>  
>  	struct scsi_dh_data	*scsi_dh_data;
>  	enum scsi_device_state sdev_state;
> -- 
> 1.7.12.4
> 

Jörn

--
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike
