linux-scsi.vger.kernel.org archive mirror
From: Bart Van Assche <bvanassche@acm.org>
To: Mike Christie <michaelc@cs.wisc.edu>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
	James Bottomley <jbottomley@parallels.com>,
	Jun'ichi Nomura <j-nomura@ce.jp.nec.com>,
	Stefan Richter <stefanr@s5r6.in-berlin.de>,
	Tomas Henzl <thenzl@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>
Subject: Re: [PATCH 2/3] Stop accepting SCSI requests before removing a device
Date: Wed, 30 May 2012 20:00:52 +0000
Message-ID: <4FC67C74.4040209@acm.org>
In-Reply-To: <4FC65888.3000907@cs.wisc.edu>

On 05/30/12 17:27, Mike Christie wrote:

> It should be waiting now if the scsi_cmnd has a request backing it,
> shouldn't it? We will allocate a request struct with blk_get_request or
> one of the other blk helpers for each scsi_cmnd, and that will increment
> the q->rq.count. If we then go down the error path because a cmd timed
> out or because scsi_decide_disposition returned FAILED, then we will
> still have that request backing the scsi cmnd and the count should still
> be incremented for it. When we call scsi_send_eh_cmnd for eh operations
> the request is then still there and not freed yet. The request will get
> freed later when scsi_eh_flush_done_q is called. In there we will either
> retry or call scsi_finish_command which will go through the normal
> completion process and eventually call __blk_put_request and freed_request.
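
The accounting described in the quoted paragraph can be pictured with a
small user-space model. This is only a sketch: every rqmodel_* name is
hypothetical, a plain int stands in for q->rq.count, and nothing here is
the block layer's actual code. It shows only that the count taken at
allocation stays held while error handling reuses the command and is
dropped when the done queue is flushed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue; rq_count models q->rq.count. */
struct rqmodel_queue {
	int rq_count;
};

/* Hypothetical stand-in for the request backing a SCSI command. */
struct rqmodel_request {
	struct rqmodel_queue *q;
};

/* Models allocating a request: the queue's count goes up by one. */
static void rqmodel_get_request(struct rqmodel_queue *q,
				struct rqmodel_request *rq)
{
	rq->q = q;
	q->rq_count++;
}

/* Models sending an EH command: the request is reused, not released. */
static void rqmodel_send_eh_cmnd(struct rqmodel_request *rq)
{
	assert(rq->q != NULL);	/* the request still backs the command */
}

/* Models flushing the EH done queue: only now is the count dropped. */
static void rqmodel_eh_flush_done(struct rqmodel_request *rq)
{
	rq->q->rq_count--;
	rq->q = NULL;
}
```

In this model, a caller that waits for rq_count to reach zero also waits
for commands that are still in error handling, which is the point made in
the quoted text.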


OK, that means that the counter manipulation code can be left out.
Skipping the queuecommand() call once device removal has started is
still useful, though: without that check scsi_remove_host() sometimes
takes much longer than expected. Below is a call stack I obtained via
"echo w > /proc/sysrq-trigger" while scsi_remove_host() was taking
longer than expected:

 [<ffffffff81404799>] schedule+0x29/0x70
 [<ffffffff81063c55>] async_synchronize_cookie_domain+0x75/0x120
 [<ffffffff8105c940>] ? wake_up_bit+0x40/0x40
 [<ffffffff812c88dc>] ? __pm_runtime_resume+0x6c/0xa0
 [<ffffffff81063d15>] async_synchronize_cookie+0x15/0x20
 [<ffffffff81063d3c>] async_synchronize_full+0x1c/0x40
 [<ffffffffa015aaf6>] sd_remove+0x36/0xc0 [sd_mod]
 [<ffffffff812bce1c>] __device_release_driver+0x7c/0xe0
 [<ffffffff812bd00f>] device_release_driver+0x2f/0x50
 [<ffffffff812bc6cb>] bus_remove_device+0xfb/0x170
 [<ffffffff812b97cd>] device_del+0x12d/0x1c0
 [<ffffffffa003e714>] __scsi_remove_device+0xd4/0xe0 [scsi_mod]
 [<ffffffffa003d10f>] scsi_forget_host+0x6f/0x80 [scsi_mod]
 [<ffffffffa003266a>] scsi_remove_host+0x7a/0x130 [scsi_mod]
 [<ffffffffa0564096>] srp_remove_target+0xa6/0x100 [ib_srp]
 [<ffffffffa05642d4>] srp_remove_work+0x64/0x90 [ib_srp]
 [<ffffffff81054f98>] process_one_work+0x1a8/0x530
 [<ffffffff81054f29>] ? process_one_work+0x139/0x530
 [<ffffffffa0564270>] ? srp_remove_one+0x180/0x180 [ib_srp]
 [<ffffffff81056cea>] worker_thread+0x16a/0x350
 [<ffffffff81056b80>] ? manage_workers+0x250/0x250
 [<ffffffff8105c12e>] kthread+0xae/0xc0
 [<ffffffff8140f514>] kernel_thread_helper+0x4/0x10

With the patch below these delays do not occur:

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 386f0c5..0d6ab69 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -791,14 +791,15 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 
 	scsi_log_send(scmd);
 	scmd->scsi_done = scsi_eh_done;
-	shost->hostt->queuecommand(shost, scmd);
-
-	timeleft = wait_for_completion_timeout(&done, timeout);
-
+	if (sdev->sdev_state != SDEV_DEL &&
+	    shost->hostt->queuecommand(shost, scmd) == 0) {
+		timeleft = wait_for_completion_timeout(&done, timeout);
+		scsi_log_completion(scmd, SUCCESS);
+	} else {
+		timeleft = 0;
+	}
 	shost->eh_action = NULL;
 
-	scsi_log_completion(scmd, SUCCESS);
-
 	SCSI_LOG_ERROR_RECOVERY(3,
 		printk("%s: scmd: %p, timeleft: %ld\n",
 			__func__, scmd, timeleft));
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 42c35ff..f32757c 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -955,24 +955,30 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	struct request_queue *q = sdev->request_queue;
 
 	if (sdev->is_visible) {
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
 			return;
 
-		bsg_unregister_queue(sdev->request_queue);
+		bsg_unregister_queue(q);
 		device_unregister(&sdev->sdev_dev);
 		transport_remove_device(dev);
 		device_del(dev);
 	} else
 		put_device(&sdev->sdev_dev);
+
+	/*
+	 * Stop accepting new requests and wait until all queuecommand()
+	 * invocations have finished before tearing down the device.
+	 */
 	scsi_device_set_state(sdev, SDEV_DEL);
+	blk_cleanup_queue(q);
+
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
 
-	/* Freeing the queue signals to block that we're done */
-	blk_cleanup_queue(sdev->request_queue);
 	put_device(dev);
 }
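
For readers who want to see the control flow of the scsi_send_eh_cmnd()
hunk in isolation, here is a minimal user-space sketch. All model_* and
MODEL_* names are hypothetical stand-ins, the fake wait simply returns
half of the timeout, and none of this is kernel code. It illustrates
only the decision the patch adds: a device in the DEL state never
reaches queuecommand(), and a skipped or rejected dispatch reports
timeleft == 0 without waiting.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's device states. */
enum model_sdev_state {
	MODEL_SDEV_RUNNING,
	MODEL_SDEV_CANCEL,
	MODEL_SDEV_DEL
};

/* Hypothetical host model that records how often it was called. */
struct model_host {
	int queue_calls;	/* number of queuecommand invocations */
	int queue_result;	/* value the fake queuecommand returns */
};

/* Fake queuecommand: counts the call and returns the preset result. */
static int model_queuecommand(struct model_host *h)
{
	h->queue_calls++;
	return h->queue_result;
}

/* Fake wait: pretend the command completed with half the timeout left. */
static long model_wait_for_completion_timeout(long timeout)
{
	return timeout / 2;
}

/* Mirrors the patched branch: dispatch and wait only if not deleted. */
static long model_send_eh_cmnd(enum model_sdev_state state,
			       struct model_host *h, long timeout)
{
	long timeleft;

	if (state != MODEL_SDEV_DEL && model_queuecommand(h) == 0)
		timeleft = model_wait_for_completion_timeout(timeout);
	else
		timeleft = 0;

	return timeleft;
}
```

Because the state test comes first in the && expression, the fake
queuecommand is not invoked at all for a deleted device; a nonzero
return from it takes the same timeleft = 0 path without any wait.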
 
