From: Mike Christie <michaelc@cs.wisc.edu>
To: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
	James Bottomley <jbottomley@parallels.com>,
	Jun'ichi Nomura <j-nomura@ce.jp.nec.com>,
	Stefan Richter <stefanr@s5r6.in-berlin.de>,
	Tomas Henzl <thenzl@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>
Subject: Re: [PATCH 2/3] Stop accepting SCSI requests before removing a device
Date: Thu, 31 May 2012 22:13:55 -0500
Message-ID: <4FC83373.3050709@cs.wisc.edu>
In-Reply-To: <4FC67C74.4040209@acm.org>

On 05/30/2012 03:00 PM, Bart Van Assche wrote:
> On 05/30/12 17:27, Mike Christie wrote:
> 
>> It should be waiting now if the scsi_cmnd has a request backing it,
>> shouldn't it? We will allocate a request struct with blk_get_request or
>> one of the other blk helpers for each scsi_cmnd, and that will increment
>> the q->rq.count. If we then go down the error path because a cmd timed
>> out or because scsi_decide_disposition returned FAILED, then we will
>> still have that request backing the scsi cmnd and the count should still
>> be incremented for it. When we call scsi_send_eh_cmnd for eh operations
>> the request is then still there and not freed yet. The request will get
>> freed later when scsi_eh_flush_done_q is called. In there we will either
>> retry or call scsi_finish_command which will go through the normal
>> completion process and eventually call __blk_put_request and freed_request.
> 
> 
> OK, that means that the counter manipulation code can be left out.
> Skipping the queuecommand() call once device removal has started is still
> useful, though: without that, scsi_remove_host() sometimes takes much
> longer than expected. A call stack I obtained via
> "echo w > /proc/sysrq-trigger" while scsi_remove_host() took longer than
> expected is as follows:
> 
>  [<ffffffff81404799>] schedule+0x29/0x70
>  [<ffffffff81063c55>] async_synchronize_cookie_domain+0x75/0x120
>  [<ffffffff8105c940>] ? wake_up_bit+0x40/0x40
>  [<ffffffff812c88dc>] ? __pm_runtime_resume+0x6c/0xa0
>  [<ffffffff81063d15>] async_synchronize_cookie+0x15/0x20
>  [<ffffffff81063d3c>] async_synchronize_full+0x1c/0x40
>  [<ffffffffa015aaf6>] sd_remove+0x36/0xc0 [sd_mod]
>  [<ffffffff812bce1c>] __device_release_driver+0x7c/0xe0
>  [<ffffffff812bd00f>] device_release_driver+0x2f/0x50
>  [<ffffffff812bc6cb>] bus_remove_device+0xfb/0x170
>  [<ffffffff812b97cd>] device_del+0x12d/0x1c0
>  [<ffffffffa003e714>] __scsi_remove_device+0xd4/0xe0 [scsi_mod]
>  [<ffffffffa003d10f>] scsi_forget_host+0x6f/0x80 [scsi_mod]
>  [<ffffffffa003266a>] scsi_remove_host+0x7a/0x130 [scsi_mod]
>  [<ffffffffa0564096>] srp_remove_target+0xa6/0x100 [ib_srp]
>  [<ffffffffa05642d4>] srp_remove_work+0x64/0x90 [ib_srp]
>  [<ffffffff81054f98>] process_one_work+0x1a8/0x530
>  [<ffffffff81054f29>] ? process_one_work+0x139/0x530
>  [<ffffffffa0564270>] ? srp_remove_one+0x180/0x180 [ib_srp]
>  [<ffffffff81056cea>] worker_thread+0x16a/0x350
>  [<ffffffff81056b80>] ? manage_workers+0x250/0x250
>  [<ffffffff8105c12e>] kthread+0xae/0xc0
>  [<ffffffff8140f514>] kernel_thread_helper+0x4/0x10
> 
> With the patch below these delays do not occur:
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 386f0c5..0d6ab69 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -791,14 +791,15 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>  
>  	scsi_log_send(scmd);
>  	scmd->scsi_done = scsi_eh_done;
> -	shost->hostt->queuecommand(shost, scmd);
> -
> -	timeleft = wait_for_completion_timeout(&done, timeout);
> -
> +	if (sdev->sdev_state != SDEV_DEL &&
> +	    shost->hostt->queuecommand(shost, scmd) == 0) {
> +		timeleft = wait_for_completion_timeout(&done, timeout);
> +		scsi_log_completion(scmd, SUCCESS);
> +	} else {
> +		timeleft = 0;
> +	}
>  	shost->eh_action = NULL;
>  
> -	scsi_log_completion(scmd, SUCCESS);
> -
>  	SCSI_LOG_ERROR_RECOVERY(3,
>  		printk("%s: scmd: %p, timeleft: %ld\n",
>  			__func__, scmd, timeleft));
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 42c35ff..f32757c 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -955,24 +955,30 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
>  void __scsi_remove_device(struct scsi_device *sdev)
>  {
>  	struct device *dev = &sdev->sdev_gendev;
> +	struct request_queue *q = sdev->request_queue;
>  
>  	if (sdev->is_visible) {
>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>  			return;
>  
> -		bsg_unregister_queue(sdev->request_queue);
> +		bsg_unregister_queue(q);
>  		device_unregister(&sdev->sdev_dev);
>  		transport_remove_device(dev);
>  		device_del(dev);
>  	} else
>  		put_device(&sdev->sdev_dev);
> +
> +	/*
> +	 * Stop accepting new requests and wait until all queuecommand()
> +	 * invocations have finished before tearing down the device.
> +	 */
>  	scsi_device_set_state(sdev, SDEV_DEL);
> +	blk_cleanup_queue(q);
> +
>  	if (sdev->host->hostt->slave_destroy)
>  		sdev->host->hostt->slave_destroy(sdev);
>  	transport_destroy_device(dev);
>  
> -	/* Freeing the queue signals to block that we're done */
> -	blk_cleanup_queue(sdev->request_queue);
>  	put_device(dev);
>  }
>  

Looks ok to me.
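
For readers following the scsi_send_eh_cmnd() hunk, here is a rough
userspace sketch of the control flow it introduces: the command is only
handed to queuecommand() when the device is not in SDEV_DEL, and the wait
is skipped (timeleft = 0) otherwise. This is not kernel code. The fake_*
names, the struct layout, and the pretend "half the timeout remains"
completion are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's device states (subset only). */
enum fake_sdev_state { SDEV_RUNNING, SDEV_CANCEL, SDEV_DEL };

/* Hypothetical stand-in for struct scsi_device. */
struct fake_sdev {
	enum fake_sdev_state state;
	int queued;	/* how many times the fake queuecommand ran */
};

/* Stand-in for hostt->queuecommand(); returns 0 when the command is accepted. */
static int fake_queuecommand(struct fake_sdev *sdev)
{
	sdev->queued++;
	return 0;
}

/*
 * Stand-in for wait_for_completion_timeout(). The real function returns
 * the remaining jiffies or 0 on timeout; here we simply pretend the
 * command completed with half of the timeout left.
 */
static long fake_wait_for_completion_timeout(long timeout)
{
	return timeout / 2;
}

/*
 * Mirrors the branch structure of the patched hunk: skip queuecommand()
 * and the wait entirely once the device has reached SDEV_DEL, and also
 * skip the wait when queuecommand() rejects the command.
 */
long send_eh_cmnd(struct fake_sdev *sdev, long timeout)
{
	long timeleft;

	assert(sdev != NULL);

	if (sdev->state != SDEV_DEL && fake_queuecommand(sdev) == 0)
		timeleft = fake_wait_for_completion_timeout(timeout);
	else
		timeleft = 0;

	return timeleft;
}
```

With a device in SDEV_RUNNING the sketch queues the command and reports
time left; after the state is set to SDEV_DEL it returns 0 immediately
without touching the fake queuecommand, which is the behaviour that avoids
waiting out the full eh timeout during removal.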