[PATCH 0/4 v9] SCSI device removal fixes

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4 v9] SCSI device removal fixes
@ 2012-06-25 18:12 Bart Van Assche
  2012-06-25 18:14 ` [PATCH 1/4] block: Fix blk_execute_rq_nowait() dead queue handling Bart Van Assche
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Bart Van Assche @ 2012-06-25 18:12 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

This is version nine of the SCSI device removal patch series. This
patch series has been tested by triggering a large number of removals of
a SCSI device controlled by the ib_srp LLD and at the same time running
an I/O integrity test with fio on a dm device on top of the SRP SCSI device.

Changes compared to v8:
- Replaced the block layer patch (1/4) by a more comprehensive patch
  from Muthukumar Ratty.
- Added "Reviewed-by: Mike Christie" in patches 2/4..4/4.

Changes compared to v7:
- Made sure that __scsi_queue_insert() processes the request that has
  been passed to that function. Added a comment in that function too.
- Fixed a typo in the description of the first patch in this series
  and removed "CC: stable".

Changes compared to v6:
- Added a fix for a race in the block layer.
- Moved a BUG_ON(!sdev) statement up since it's a precondition check.
- Changed return type of scsi_queue_insert() from int to void.
- Added a cancel_work_sync(&sdev->requeue_work) call in
  __scsi_remove_device().

Changes compared to v5:
- Removed the function scsi_free_queue() and inlined that function
  in its callers.
- Added two additional patches.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] block: Fix blk_execute_rq_nowait() dead queue handling
  2012-06-25 18:12 [PATCH 0/4 v9] SCSI device removal fixes Bart Van Assche
@ 2012-06-25 18:14 ` Bart Van Assche
  2012-06-25 18:41   ` Tejun Heo
  2012-06-25 18:15 ` [PATCH 2/4] scsi: Fix device removal NULL pointer dereference Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Bart Van Assche @ 2012-06-25 18:14 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter, Muthu Kumar,
	Muthukumar Ratty

From: Muthukumar Ratty <muthur@gmail.com>

If the queue is dead blk_execute_rq_nowait() doesn't invoke the done()
callback function. That will result in blk_execute_rq() being stuck
in wait_for_completion(). Avoid this by initializing rq->end_io to the
done() callback before we check the queue state. Also, make sure the
queue lock is held around the invocation of the done() callback. Found
this through source code review.

Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
---
 block/blk-exec.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/blk-exec.c b/block/blk-exec.c
index fb2cbd5..8b6dc5b 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -43,6 +43,9 @@ static void blk_end_sync_rq(struct request *rq, int error)
  * Description:
  *    Insert a fully prepared request at the back of the I/O scheduler queue
  *    for execution.  Don't wait for completion.
+ *
+ * Note:
+ *    This function will invoke @done directly if the queue is dead.
  */
 void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 			   struct request *rq, int at_head,
@@ -51,18 +54,20 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 	int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
 
 	WARN_ON(irqs_disabled());
+
+	rq->rq_disk = bd_disk;
+	rq->end_io = done;
+
 	spin_lock_irq(q->queue_lock);
 
 	if (unlikely(blk_queue_dead(q))) {
-		spin_unlock_irq(q->queue_lock);
 		rq->errors = -ENXIO;
 		if (rq->end_io)
 			rq->end_io(rq, rq->errors);
+		spin_unlock_irq(q->queue_lock);
 		return;
 	}
 
-	rq->rq_disk = bd_disk;
-	rq->end_io = done;
 	__elv_add_request(q, rq, where);
 	__blk_run_queue(q);
 	/* the queue is stopped so it won't be run */

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] block: Fix blk_execute_rq_nowait() dead queue handling
  2012-06-25 18:14 ` [PATCH 1/4] block: Fix blk_execute_rq_nowait() dead queue handling Bart Van Assche
@ 2012-06-25 18:41   ` Tejun Heo
  2012-06-25 19:18     ` Muthu Kumar
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2012-06-25 18:41 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Jun'ichi Nomura, Stefan Richter, Muthu Kumar,
	Muthukumar Ratty

On Mon, Jun 25, 2012 at 06:14:15PM +0000, Bart Van Assche wrote:
> From: Muthukumar Ratty <muthur@gmail.com>
> 
> If the queue is dead blk_execute_rq_nowait() doesn't invoke the done()
> callback function. That will result in blk_execute_rq() being stuck
> in wait_for_completion(). Avoid this by initializing rq->end_io to the
> done() callback before we check the queue state. Also, make sure the
> queue lock is held around the invocation of the done() callback. Found
> this through source code review.
> 
> Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: James Bottomley <James.Bottomley@hansenpartnership.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks a lot!

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] block: Fix blk_execute_rq_nowait() dead queue handling
  2012-06-25 18:41   ` Tejun Heo
@ 2012-06-25 19:18     ` Muthu Kumar
  0 siblings, 0 replies; 22+ messages in thread
From: Muthu Kumar @ 2012-06-25 19:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Bart Van Assche, linux-scsi, James Bottomley, Mike Christie,
	Jens Axboe, Jun'ichi Nomura, Stefan Richter, Muthukumar Ratty

On Mon, Jun 25, 2012 at 11:41 AM, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Jun 25, 2012 at 06:14:15PM +0000, Bart Van Assche wrote:
>> From: Muthukumar Ratty <muthur@gmail.com>
>>
>> If the queue is dead blk_execute_rq_nowait() doesn't invoke the done()
>> callback function. That will result in blk_execute_rq() being stuck
>> in wait_for_completion(). Avoid this by initializing rq->end_io to the
>> done() callback before we check the queue state. Also, make sure the
>> queue lock is held around the invocation of the done() callback. Found
>> this through source code review.
>>
>> Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
>
> Acked-by: Tejun Heo <tj@kernel.org>
>
> Thanks a lot!
>

Yep.. Thanks for the help Bart.


> --
> tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 18:12 [PATCH 0/4 v9] SCSI device removal fixes Bart Van Assche
  2012-06-25 18:14 ` [PATCH 1/4] block: Fix blk_execute_rq_nowait() dead queue handling Bart Van Assche
@ 2012-06-25 18:15 ` Bart Van Assche
  2012-06-25 20:36   ` Tejun Heo
  2012-06-25 21:14   ` James Bottomley
  2012-06-25 18:16 ` [PATCH 3/4] scsi: Change return type of scsi_queue_insert() into void Bart Van Assche
  2012-06-25 18:17 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
  3 siblings, 2 replies; 22+ messages in thread
From: Bart Van Assche @ 2012-06-25 18:15 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

Since scsi_prep_fn() may be invoked concurrently with
__scsi_remove_device(), keep the queuedata pointer in
__scsi_remove_device(). This patch fixes a kernel oops that
can be triggered by USB device removal. See also
http://www.spinics.net/lists/linux-scsi/msg56254.html.

Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org>
---
 drivers/scsi/hosts.c      |    8 +++++++-
 drivers/scsi/scsi_lib.c   |   35 ++++++++---------------------------
 drivers/scsi/scsi_priv.h  |    1 -
 drivers/scsi/scsi_sysfs.c |    5 +----
 4 files changed, 16 insertions(+), 33 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index a3a056a..6b9d89a 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -299,9 +299,15 @@ static void scsi_host_dev_release(struct device *dev)
 		destroy_workqueue(shost->work_q);
 	q = shost->uspace_req_q;
 	if (q) {
+		/*
+		 * Note: freeing queuedata before invoking blk_cleanup_queue()
+		 * is safe here because no request function is associated with
+		 * uspace_req_q. See also the __scsi_alloc_queue() call in
+		 * drivers/scsi/scsi_tgt_lib.c.
+		 */
 		kfree(q->queuedata);
 		q->queuedata = NULL;
-		scsi_free_queue(q);
+		blk_cleanup_queue(q);
 	}
 
 	scsi_destroy_command_freelist(shost);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6dfb978..c26ef49 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
 	LIST_HEAD(starved_list);
 	unsigned long flags;
 
-	/* if the device is dead, sdev will be NULL, so no queue to run */
-	if (!sdev)
-		return;
-
+	BUG_ON(!sdev);
 	shost = sdev->host;
 	if (scsi_target(sdev)->single_lun)
 		scsi_single_lun_run(sdev);
@@ -1370,16 +1367,18 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
  * may be changed after request stacking drivers call the function,
  * regardless of taking lock or not.
  *
- * When scsi can't dispatch I/Os anymore and needs to kill I/Os
- * (e.g. !sdev), scsi needs to return 'not busy'.
- * Otherwise, request stacking drivers may hold requests forever.
+ * When scsi can't dispatch I/Os anymore and needs to kill I/Os scsi
+ * needs to return 'not busy'. Otherwise, request stacking drivers
+ * may hold requests forever.
  */
 static int scsi_lld_busy(struct request_queue *q)
 {
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost;
 
-	if (!sdev)
+	BUG_ON(!sdev);
+
+	if (blk_queue_dead(q))
 		return 0;
 
 	shost = sdev->host;
@@ -1490,11 +1489,7 @@ static void scsi_request_fn(struct request_queue *q)
 	struct scsi_cmnd *cmd;
 	struct request *req;
 
-	if (!sdev) {
-		while ((req = blk_peek_request(q)) != NULL)
-			scsi_kill_request(req, q);
-		return;
-	}
+	BUG_ON(!sdev);
 
 	if(!get_device(&sdev->sdev_gendev))
 		/* We must be tearing the block queue down already */
@@ -1697,20 +1692,6 @@ struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
 	return q;
 }
 
-void scsi_free_queue(struct request_queue *q)
-{
-	unsigned long flags;
-
-	WARN_ON(q->queuedata);
-
-	/* cause scsi_request_fn() to kill all non-finished requests */
-	spin_lock_irqsave(q->queue_lock, flags);
-	q->request_fn(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
-	blk_cleanup_queue(q);
-}
-
 /*
  * Function:    scsi_block_requests()
  *
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 07ce3f5..2b8d8b5 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -84,7 +84,6 @@ extern void scsi_next_command(struct scsi_cmnd *cmd);
 extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
 extern void scsi_run_host_queues(struct Scsi_Host *shost);
 extern struct request_queue *scsi_alloc_queue(struct scsi_device *sdev);
-extern void scsi_free_queue(struct request_queue *q);
 extern int scsi_init_queue(void);
 extern void scsi_exit_queue(void);
 struct request_queue;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 04c2a27..42c35ff 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -971,11 +971,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
 
-	/* cause the request function to reject all I/O requests */
-	sdev->request_queue->queuedata = NULL;
-
 	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
+	blk_cleanup_queue(sdev->request_queue);
 	put_device(dev);
 }
 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 18:15 ` [PATCH 2/4] scsi: Fix device removal NULL pointer dereference Bart Van Assche
@ 2012-06-25 20:36   ` Tejun Heo
  2012-06-25 21:14   ` James Bottomley
  1 sibling, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2012-06-25 20:36 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Jun'ichi Nomura, Stefan Richter

Hello,

This generally looks good to me but a couple comments.

On Mon, Jun 25, 2012 at 06:15:21PM +0000, Bart Van Assche wrote:
>  	if (q) {
> +		/*
> +		 * Note: freeing queuedata before invoking blk_cleanup_queue()
> +		 * is safe here because no request function is associated with
> +		 * uspace_req_q. See also the __scsi_alloc_queue() call in
> +		 * drivers/scsi/scsi_tgt_lib.c.
> +		 */
>  		kfree(q->queuedata);
>  		q->queuedata = NULL;
> -		scsi_free_queue(q);
> +		blk_cleanup_queue(q);

Why not just do blk_cleanup_queue() first and then free queuedata
using a local variable?  It's easier to grasp and less error-prone to
shut down the queue before destroying it.

>  	}
>  
>  	scsi_destroy_command_freelist(shost);
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6dfb978..c26ef49 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
>  	LIST_HEAD(starved_list);
>  	unsigned long flags;
>  
> -	/* if the device is dead, sdev will be NULL, so no queue to run */
> -	if (!sdev)
> -		return;
> -
> +	BUG_ON(!sdev);
>  	shost = sdev->host;
>  	if (scsi_target(sdev)->single_lun)
>  		scsi_single_lun_run(sdev);
> @@ -1370,16 +1367,18 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>   * may be changed after request stacking drivers call the function,
>   * regardless of taking lock or not.
>   *
> - * When scsi can't dispatch I/Os anymore and needs to kill I/Os
> - * (e.g. !sdev), scsi needs to return 'not busy'.
> - * Otherwise, request stacking drivers may hold requests forever.
> + * When scsi can't dispatch I/Os anymore and needs to kill I/Os scsi
> + * needs to return 'not busy'. Otherwise, request stacking drivers
> + * may hold requests forever.
>   */
>  static int scsi_lld_busy(struct request_queue *q)
>  {
>  	struct scsi_device *sdev = q->queuedata;
>  	struct Scsi_Host *shost;
>  
> -	if (!sdev)
> +	BUG_ON(!sdev);

Do these BUG_ON()s actually add anything?  Kernel is gonna oops in a
few lines on NULL sdev anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 18:15 ` [PATCH 2/4] scsi: Fix device removal NULL pointer dereference Bart Van Assche
  2012-06-25 20:36   ` Tejun Heo
@ 2012-06-25 21:14   ` James Bottomley
  2012-06-25 21:21     ` Tejun Heo
                       ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: James Bottomley @ 2012-06-25 21:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On Mon, 2012-06-25 at 18:15 +0000, Bart Van Assche wrote:
> Since scsi_prep_fn() may be invoked concurrently with
> __scsi_remove_device(), keep the queuedata pointer in
> __scsi_remove_device(). This patch fixes a kernel oops that
> can be triggered by USB device removal. See also
> http://www.spinics.net/lists/linux-scsi/msg56254.html.

This patch causes a subtle change of semantics: you're substituting our
signal for dead queue as !sdev with a check of blk_queue_dead(). 

> Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
> Cc: James Bottomley <JBottomley@parallels.com>
> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
> Cc: <stable@kernel.org>
> ---
>  drivers/scsi/hosts.c      |    8 +++++++-
>  drivers/scsi/scsi_lib.c   |   35 ++++++++---------------------------
>  drivers/scsi/scsi_priv.h  |    1 -
>  drivers/scsi/scsi_sysfs.c |    5 +----
>  4 files changed, 16 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index a3a056a..6b9d89a 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -299,9 +299,15 @@ static void scsi_host_dev_release(struct device *dev)
>  		destroy_workqueue(shost->work_q);
>  	q = shost->uspace_req_q;
>  	if (q) {
> +		/*
> +		 * Note: freeing queuedata before invoking blk_cleanup_queue()
> +		 * is safe here because no request function is associated with
> +		 * uspace_req_q. See also the __scsi_alloc_queue() call in
> +		 * drivers/scsi/scsi_tgt_lib.c.
> +		 */

This comment doesn't really make a whole lot of sense.  What I think
it's saying is that it's OK for the commands executed by the drain in
blk_cleanup_queue to have a NULL queuedata and by the time we reach
this, there can be no concurrent racing calls to queuecommand?  Is this
true, and why?

>  		kfree(q->queuedata);
>  		q->queuedata = NULL;
> -		scsi_free_queue(q);
> +		blk_cleanup_queue(q);
>  	}
>  
>  	scsi_destroy_command_freelist(shost);
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6dfb978..c26ef49 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
>  	LIST_HEAD(starved_list);
>  	unsigned long flags;
>  
> -	/* if the device is dead, sdev will be NULL, so no queue to run */
> -	if (!sdev)
> -		return;
> -
> +	BUG_ON(!sdev);

Needs to be a blk_queue_dead() check as well.

>  	shost = sdev->host;
>  	if (scsi_target(sdev)->single_lun)
>  		scsi_single_lun_run(sdev);
> @@ -1370,16 +1367,18 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>   * may be changed after request stacking drivers call the function,
>   * regardless of taking lock or not.
>   *
> - * When scsi can't dispatch I/Os anymore and needs to kill I/Os
> - * (e.g. !sdev), scsi needs to return 'not busy'.
> - * Otherwise, request stacking drivers may hold requests forever.
> + * When scsi can't dispatch I/Os anymore and needs to kill I/Os scsi
> + * needs to return 'not busy'. Otherwise, request stacking drivers
> + * may hold requests forever.
>   */
>  static int scsi_lld_busy(struct request_queue *q)
>  {
>  	struct scsi_device *sdev = q->queuedata;
>  	struct Scsi_Host *shost;
>  
> -	if (!sdev)
> +	BUG_ON(!sdev);
> +
> +	if (blk_queue_dead(q))
>  		return 0;
>  
>  	shost = sdev->host;
> @@ -1490,11 +1489,7 @@ static void scsi_request_fn(struct request_queue *q)
>  	struct scsi_cmnd *cmd;
>  	struct request *req;
>  
> -	if (!sdev) {
> -		while ((req = blk_peek_request(q)) != NULL)
> -			scsi_kill_request(req, q);
> -		return;
> -	}

That means that this hunk of code has to stay, but needs to be gated on
blk_queue_dead(q); there's still a race where this can occur.


> +	BUG_ON(!sdev);

I'm with Tejun, these BUG_ON's are now pretty pointless.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 21:14   ` James Bottomley
@ 2012-06-25 21:21     ` Tejun Heo
  2012-06-25 21:35       ` James Bottomley
  2012-06-25 22:05     ` Mike Christie
  2012-06-26  6:46     ` Bart Van Assche
  2 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2012-06-25 21:21 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, linux-scsi, Mike Christie, Jens Axboe,
	Jun'ichi Nomura, Stefan Richter

Hey, James.

On Mon, Jun 25, 2012 at 10:14:49PM +0100, James Bottomley wrote:
> > @@ -1490,11 +1489,7 @@ static void scsi_request_fn(struct request_queue *q)
> >  	struct scsi_cmnd *cmd;
> >  	struct request *req;
> >  
> > -	if (!sdev) {
> > -		while ((req = blk_peek_request(q)) != NULL)
> > -			scsi_kill_request(req, q);
> > -		return;
> > -	}
> 
> That means that this hunk of code has to stay, but needs to be gated on
> blk_queue_dead(q); there's still a race where this can occur.

Wouldn't the scsi_device_online() check down below be enough?  Block
layer drain is gonna loop until all requests are done, so the looping
is handled from block layer.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 21:21     ` Tejun Heo
@ 2012-06-25 21:35       ` James Bottomley
  2012-06-26  7:02         ` Bart Van Assche
  0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2012-06-25 21:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Bart Van Assche, linux-scsi, Mike Christie, Jens Axboe,
	Jun'ichi Nomura, Stefan Richter

On Mon, 2012-06-25 at 14:21 -0700, Tejun Heo wrote:
> Hey, James.
> 
> On Mon, Jun 25, 2012 at 10:14:49PM +0100, James Bottomley wrote:
> > > @@ -1490,11 +1489,7 @@ static void scsi_request_fn(struct request_queue *q)
> > >  	struct scsi_cmnd *cmd;
> > >  	struct request *req;
> > >  
> > > -	if (!sdev) {
> > > -		while ((req = blk_peek_request(q)) != NULL)
> > > -			scsi_kill_request(req, q);
> > > -		return;
> > > -	}
> > 
> > That means that this hunk of code has to stay, but needs to be gated on
> > blk_queue_dead(q); there's still a race where this can occur.
> 
> Wouldn't the scsi_device_online() check down below be enough?  Block
> layer drain is gonna loop until all requests are done, so the looping
> is handled from block layer.

It might be ... in theory the teardown is supposed to happen in
SDEV_CANCEL and be done by SDEV_DEL.  However, I'm not sure that's
entirely true now.  blk_queue_dead() is safer since we know we just
killed the queue.  Another reason for doing it like this is that the
kill on queue dead isn't noisy ... the one on !online is ... and logs
were getting stuffed with messages about killing requests to dead
queues.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 21:35       ` James Bottomley
@ 2012-06-26  7:02         ` Bart Van Assche
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2012-06-26  7:02 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, linux-scsi, Mike Christie, Jens Axboe,
	Jun'ichi Nomura, Stefan Richter

On 06/25/12 21:35, James Bottomley wrote:

> On Mon, 2012-06-25 at 14:21 -0700, Tejun Heo wrote:
>> Hey, James.
>>
>> On Mon, Jun 25, 2012 at 10:14:49PM +0100, James Bottomley wrote:
>>>> @@ -1490,11 +1489,7 @@ static void scsi_request_fn(struct request_queue *q)
>>>>  	struct scsi_cmnd *cmd;
>>>>  	struct request *req;
>>>>  
>>>> -	if (!sdev) {
>>>> -		while ((req = blk_peek_request(q)) != NULL)
>>>> -			scsi_kill_request(req, q);
>>>> -		return;
>>>> -	}
>>>
>>> That means that this hunk of code has to stay, but needs to be gated on
>>> blk_queue_dead(q); there's still a race where this can occur.
>>
>> Wouldn't the scsi_device_online() check down below be enough?  Block
>> layer drain is gonna loop until all requests are done, so the looping
>> is handled from block layer.
> 
> It might be ... in theory the teardown is supposed to happen in
> SDEV_CANCEL and be done by SDEV_DEL.  However, I'm not sure that's
> entirely true now.  blk_queue_dead() is safer since we know we just
> killed the queue.  Another reason for doing it like this is that the
> kill on queue dead isn't noisy ... the one on !online is ... and logs
> were getting stuffed with messages about killing requests to dead
> queues.


That log filling was fixed by commit 7457181. Without patch 2/4 of this
series a single message is printed when a request is killed because the
queue is dead ("killing request"). With patch 2/4 two messages are
printed ("rejecting I/O to offline device" + "killing request"). I can
suppress the first message by inserting an additional if statement if you
want, e.g. as follows (compile-tested only):

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dcef9b8..e307314 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1506,7 +1506,8 @@ static void scsi_request_fn(struct request_queue *q)
                        break;
 
                if (unlikely(!scsi_device_online(sdev))) {
-                       sdev_printk(KERN_ERR, sdev,
+                       if (!blk_queue_dead(q))
+                               sdev_printk(KERN_ERR, sdev,
                                    "rejecting I/O to offline device\n");
                        scsi_kill_request(req, q);
                        continue;


Bart.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 21:14   ` James Bottomley
  2012-06-25 21:21     ` Tejun Heo
@ 2012-06-25 22:05     ` Mike Christie
  2012-06-26  7:19       ` James Bottomley
  2012-06-26  6:46     ` Bart Van Assche
  2 siblings, 1 reply; 22+ messages in thread
From: Mike Christie @ 2012-06-25 22:05 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, linux-scsi, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On 06/25/2012 04:14 PM, James Bottomley wrote:
> On Mon, 2012-06-25 at 18:15 +0000, Bart Van Assche wrote:
>> Since scsi_prep_fn() may be invoked concurrently with
>> __scsi_remove_device(), keep the queuedata pointer in
>> __scsi_remove_device(). This patch fixes a kernel oops that
>> can be triggered by USB device removal. See also
>> http://www.spinics.net/lists/linux-scsi/msg56254.html.
> 
> This patch causes a subtle change of semantics: you're substituting our
> signal for dead queue as !sdev with a check of blk_queue_dead(). 
> 
>> Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
>> Cc: James Bottomley <JBottomley@parallels.com>
>> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
>> Cc: <stable@kernel.org>
>> ---
>>  drivers/scsi/hosts.c      |    8 +++++++-
>>  drivers/scsi/scsi_lib.c   |   35 ++++++++---------------------------
>>  drivers/scsi/scsi_priv.h  |    1 -
>>  drivers/scsi/scsi_sysfs.c |    5 +----
>>  4 files changed, 16 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index a3a056a..6b9d89a 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -299,9 +299,15 @@ static void scsi_host_dev_release(struct device *dev)
>>  		destroy_workqueue(shost->work_q);
>>  	q = shost->uspace_req_q;
>>  	if (q) {
>> +		/*
>> +		 * Note: freeing queuedata before invoking blk_cleanup_queue()
>> +		 * is safe here because no request function is associated with
>> +		 * uspace_req_q. See also the __scsi_alloc_queue() call in
>> +		 * drivers/scsi/scsi_tgt_lib.c.
>> +		 */
> 
> This comment doesn't really make a whole lot of sense.  What I think
> it's saying is that it's OK for the commands executed by the drain in
> blk_cleanup_queue to have a NULL queuedata and by the time we reach
> this, there can be no concurrent racing calls to queuecommand?  Is this
> true, and why?

Only the scsi_tgt_lib.c code uses the uses the uspace_req_q. That code
does not use it in a traditional way. It passes __scsi_alloc_queue NULL
for the request_fn function argument, so this request queue does not
have a function that pulls requests off a queue like is done for the
initiator path.

That target code mostly uses the queue struct so that it can use the
block/scsi functions that need a request queue to build scatterlists,
cmds, bios, etc. When that code was made originally we were going to use
the queue in a more tradition way or break out the queue limits into
another struct and pass them around. I think we have the latter now, but
the target code has not been converted.

But so that is why we cannot hit a race like you are thinking about in
the initiator path.

It is also a weird use of the queue so it is also why it does not make
sense :)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 22:05     ` Mike Christie
@ 2012-06-26  7:19       ` James Bottomley
  2012-06-26  7:26         ` Bart Van Assche
  0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2012-06-26  7:19 UTC (permalink / raw)
  To: Mike Christie
  Cc: Bart Van Assche, linux-scsi, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On Mon, 2012-06-25 at 17:05 -0500, Mike Christie wrote:
> On 06/25/2012 04:14 PM, James Bottomley wrote:
> > On Mon, 2012-06-25 at 18:15 +0000, Bart Van Assche wrote:
> >> Since scsi_prep_fn() may be invoked concurrently with
> >> __scsi_remove_device(), keep the queuedata pointer in
> >> __scsi_remove_device(). This patch fixes a kernel oops that
> >> can be triggered by USB device removal. See also
> >> http://www.spinics.net/lists/linux-scsi/msg56254.html.
> > 
> > This patch causes a subtle change of semantics: you're substituting our
> > signal for dead queue as !sdev with a check of blk_queue_dead(). 
> > 
> >> Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> >> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> >> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
> >> Cc: James Bottomley <JBottomley@parallels.com>
> >> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
> >> Cc: <stable@kernel.org>
> >> ---
> >>  drivers/scsi/hosts.c      |    8 +++++++-
> >>  drivers/scsi/scsi_lib.c   |   35 ++++++++---------------------------
> >>  drivers/scsi/scsi_priv.h  |    1 -
> >>  drivers/scsi/scsi_sysfs.c |    5 +----
> >>  4 files changed, 16 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> >> index a3a056a..6b9d89a 100644
> >> --- a/drivers/scsi/hosts.c
> >> +++ b/drivers/scsi/hosts.c
> >> @@ -299,9 +299,15 @@ static void scsi_host_dev_release(struct device *dev)
> >>  		destroy_workqueue(shost->work_q);
> >>  	q = shost->uspace_req_q;
> >>  	if (q) {
> >> +		/*
> >> +		 * Note: freeing queuedata before invoking blk_cleanup_queue()
> >> +		 * is safe here because no request function is associated with
> >> +		 * uspace_req_q. See also the __scsi_alloc_queue() call in
> >> +		 * drivers/scsi/scsi_tgt_lib.c.
> >> +		 */
> > 
> > This comment doesn't really make a whole lot of sense.  What I think
> > it's saying is that it's OK for the commands executed by the drain in
> > blk_cleanup_queue to have a NULL queuedata and by the time we reach
> > this, there can be no concurrent racing calls to queuecommand?  Is this
> > true, and why?
> 
> Only the scsi_tgt_lib.c code uses the uses the uspace_req_q. That code
> does not use it in a traditional way. It passes __scsi_alloc_queue NULL
> for the request_fn function argument, so this request queue does not
> have a function that pulls requests off a queue like is done for the
> initiator path.
> 
> That target code mostly uses the queue struct so that it can use the
> block/scsi functions that need a request queue to build scatterlists,
> cmds, bios, etc. When that code was made originally we were going to use
> the queue in a more tradition way or break out the queue limits into
> another struct and pass them around. I think we have the latter now, but
> the target code has not been converted.
> 
> But so that is why we cannot hit a race like you are thinking about in
> the initiator path.
> 
> It is also a weird use of the queue so it is also why it does not make
> sense :)

OK, could we encapsulate all that in the comment so I don't have to ask
again in a year's time ... ?

Thanks,

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-26  7:19       ` James Bottomley
@ 2012-06-26  7:26         ` Bart Van Assche
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2012-06-26  7:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, linux-scsi, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On 06/26/12 07:19, James Bottomley wrote:

> On Mon, 2012-06-25 at 17:05 -0500, Mike Christie wrote:
>> Only the scsi_tgt_lib.c code uses the uses the uspace_req_q. That code
>> does not use it in a traditional way. It passes __scsi_alloc_queue NULL
>> for the request_fn function argument, so this request queue does not
>> have a function that pulls requests off a queue like is done for the
>> initiator path.
>>
>> That target code mostly uses the queue struct so that it can use the
>> block/scsi functions that need a request queue to build scatterlists,
>> cmds, bios, etc. When that code was made originally we were going to use
>> the queue in a more tradition way or break out the queue limits into
>> another struct and pass them around. I think we have the latter now, but
>> the target code has not been converted.
>>
>> But so that is why we cannot hit a race like you are thinking about in
>> the initiator path.
>>
>> It is also a weird use of the queue so it is also why it does not make
>> sense :)
> 
> OK, could we encapsulate all that in the comment so I don't have to ask
> again in a year's time ... ?


Maybe it's easier to implement Tejun's suggestion (swapping the
blk_cleanup_queue() call and freeing the queuedata) ? That should be
readable for everyone and shouldn't need a comment.

Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-25 21:14   ` James Bottomley
  2012-06-25 21:21     ` Tejun Heo
  2012-06-25 22:05     ` Mike Christie
@ 2012-06-26  6:46     ` Bart Van Assche
  2012-06-26  7:25       ` James Bottomley
  2012-06-26  9:13       ` Mike Christie
  2 siblings, 2 replies; 22+ messages in thread
From: Bart Van Assche @ 2012-06-26  6:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On 06/25/12 21:14, James Bottomley wrote:

> On Mon, 2012-06-25 at 18:15 +0000, Bart Van Assche wrote:
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 6dfb978..c26ef49 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
>>  	LIST_HEAD(starved_list);
>>  	unsigned long flags;
>>  
>> -	/* if the device is dead, sdev will be NULL, so no queue to run */
>> -	if (!sdev)
>> -		return;
>> -
>> +	BUG_ON(!sdev);
> 
> Needs to be a blk_queue_dead() check as well.


Callers of scsi_run_queue() don't hold the queue lock. Does it make
sense to test whether the queue is dead without the queue lock being held ?

>> +	BUG_ON(!sdev);
> 
> I'm with Tejun, these BUG_ON's are now pretty pointless.


OK, I'll remove these.

Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-26  6:46     ` Bart Van Assche
@ 2012-06-26  7:25       ` James Bottomley
  2012-06-26 10:00         ` Bart Van Assche
  2012-06-26  9:13       ` Mike Christie
  1 sibling, 1 reply; 22+ messages in thread
From: James Bottomley @ 2012-06-26  7:25 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On Tue, 2012-06-26 at 06:46 +0000, Bart Van Assche wrote:
> On 06/25/12 21:14, James Bottomley wrote:
> 
> > On Mon, 2012-06-25 at 18:15 +0000, Bart Van Assche wrote:
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 6dfb978..c26ef49 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
> >>  	LIST_HEAD(starved_list);
> >>  	unsigned long flags;
> >>  
> >> -	/* if the device is dead, sdev will be NULL, so no queue to run */
> >> -	if (!sdev)
> >> -		return;
> >> -
> >> +	BUG_ON(!sdev);
> > 
> > Needs to be a blk_queue_dead() check as well.
> 
> 
> Callers of scsi_run_queue() don't hold the queue lock. Does it make
> sense to test whether the queue is dead without the queue lock being held ?

I'm not entirely sure I understand the question, but I think you think
that locking is required to read the queue state?  That's emphatically
not correct.  In fact, you don't even need locking to write the state
provied the state variable is a natcural processor witdth (i.e. u32 on
32 bit or u64 or u32 on 64 bit) and you don't do dependent state
variable updates or other atomic sequences.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-26  7:25       ` James Bottomley
@ 2012-06-26 10:00         ` Bart Van Assche
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2012-06-26 10:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On 06/26/12 07:25, James Bottomley wrote:

> On Tue, 2012-06-26 at 06:46 +0000, Bart Van Assche wrote:
>> On 06/25/12 21:14, James Bottomley wrote:
>>
>>> On Mon, 2012-06-25 at 18:15 +0000, Bart Van Assche wrote:
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>> index 6dfb978..c26ef49 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
>>>>  	LIST_HEAD(starved_list);
>>>>  	unsigned long flags;
>>>>  
>>>> -	/* if the device is dead, sdev will be NULL, so no queue to run */
>>>> -	if (!sdev)
>>>> -		return;
>>>> -
>>>> +	BUG_ON(!sdev);
>>>
>>> Needs to be a blk_queue_dead() check as well.
>>
>> Callers of scsi_run_queue() don't hold the queue lock. Does it make
>> sense to test whether the queue is dead without the queue lock being held ?
> 
> I'm not entirely sure I understand the question, but I think you think
> that locking is required to read the queue state?  That's emphatically
> not correct.  In fact, you don't even need locking to write the state
> provied the state variable is a natcural processor witdth (i.e. u32 on
> 32 bit or u64 or u32 on 64 bit) and you don't do dependent state
> variable updates or other atomic sequences.


Sorry if my comment was not clear enough. What I meant is that since
callers of scsi_run_queue() don't hold the queue lock that the queue can
become dead even after having checked the queue state at the start of
that function. Since scsi_run_queue() can already handle queue state
changes while it is in progress, it is fine to leave out the queue state
check at the start of that function.

Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-26  6:46     ` Bart Van Assche
  2012-06-26  7:25       ` James Bottomley
@ 2012-06-26  9:13       ` Mike Christie
  2012-06-26 10:03         ` Bart Van Assche
  2012-06-26 15:03         ` Hannes Reinecke
  1 sibling, 2 replies; 22+ messages in thread
From: Mike Christie @ 2012-06-26  9:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, linux-scsi, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On 06/26/2012 01:46 AM, Bart Van Assche wrote:
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> >> index 6dfb978..c26ef49 100644
>>> >> --- a/drivers/scsi/scsi_lib.c
>>> >> +++ b/drivers/scsi/scsi_lib.c
>>> >> @@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
>>> >>  	LIST_HEAD(starved_list);
>>> >>  	unsigned long flags;
>>> >>  
>>> >> -	/* if the device is dead, sdev will be NULL, so no queue to run */
>>> >> -	if (!sdev)
>>> >> -		return;
>>> >> -
>>> >> +	BUG_ON(!sdev);
>> > 
>> > Needs to be a blk_queue_dead() check as well.
> 
> Callers of scsi_run_queue() don't hold the queue lock. Does it make
> sense to test whether the queue is dead without the queue lock being held ?
> 

I think there is still another bug in this path when it is called from
the requeue path. If scsi_requeue_command requeues a command and that
gets executed by some other thread before scsi_requeue_command calls
scsi_run_queue then we could end up accessing freed memory.

It looks possible some other thread is doing
blk_cleanup_queue->blk_drain_queue and that calls blk_run_queue and that
kills/fails the IO that scsi_requeue_command had queued. Then
blk_cleanup_queue could complete and we could end up doing the last put
on the device and freeing the queue and sdev before scsi_requeue_command
can call scsi_run_queue. scsi_run_queue could then be accessing freed
memory.

I think we need a get/put:

scsi_requeue_command....

get_device(&sdev->sdev_gendev);

spin_lock_irqsave(q->queue_lock, flags);
scsi_unprep_request(req);
blk_requeue_request(q, req);
spin_unlock_irqrestore(q->queue_lock, flags);

scsi_run_queue(q);

put_device(&sdev->sdev_gendev);

This will prevent some other path from freeing the queue/sdev from under us.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-26  9:13       ` Mike Christie
@ 2012-06-26 10:03         ` Bart Van Assche
  2012-06-26 15:03         ` Hannes Reinecke
  1 sibling, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2012-06-26 10:03 UTC (permalink / raw)
  To: Mike Christie
  Cc: James Bottomley, linux-scsi, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

On 06/26/12 09:13, Mike Christie wrote:

> I think there is still another bug in this path when it is called from
> the requeue path. If scsi_requeue_command requeues a command and that
> gets executed by some other thread before scsi_requeue_command calls
> scsi_run_queue then we could end up accessing freed memory.
> 
> It looks possible some other thread is doing
> blk_cleanup_queue->blk_drain_queue and that calls blk_run_queue and that
> kills/fails the IO that scsi_requeue_command had queued. Then
> blk_cleanup_queue could complete and we could end up doing the last put
> on the device and freeing the queue and sdev before scsi_requeue_command
> can call scsi_run_queue. scsi_run_queue could then be accessing freed
> memory.
> 
> I think we need a get/put:
> 
> scsi_requeue_command....
> 
> get_device(&sdev->sdev_gendev);
> 
> spin_lock_irqsave(q->queue_lock, flags);
> scsi_unprep_request(req);
> blk_requeue_request(q, req);
> spin_unlock_irqrestore(q->queue_lock, flags);
> 
> scsi_run_queue(q);
> 
> put_device(&sdev->sdev_gendev);
> 
> This will prevent some other path from freeing the queue/sdev from under us.


The above makes a lot of sense to me - I'll have a closer look at it.

However, it seems to me that we are paying a price for
scsi_dispatch_cmd() being called without the queue lock held, namely
increased complexity of the SCSI core.

Bart.

 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
  2012-06-26  9:13       ` Mike Christie
  2012-06-26 10:03         ` Bart Van Assche
@ 2012-06-26 15:03         ` Hannes Reinecke
  1 sibling, 0 replies; 22+ messages in thread
From: Hannes Reinecke @ 2012-06-26 15:03 UTC (permalink / raw)
  To: Mike Christie
  Cc: Bart Van Assche, James Bottomley, linux-scsi, Jens Axboe,
	Tejun Heo, Jun'ichi Nomura, Stefan Richter

On 06/26/2012 11:13 AM, Mike Christie wrote:
> On 06/26/2012 01:46 AM, Bart Van Assche wrote:
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>>> index 6dfb978..c26ef49 100644
>>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>>> @@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
>>>>>>  	LIST_HEAD(starved_list);
>>>>>>  	unsigned long flags;
>>>>>>  
>>>>>> -	/* if the device is dead, sdev will be NULL, so no queue to run */
>>>>>> -	if (!sdev)
>>>>>> -		return;
>>>>>> -
>>>>>> +	BUG_ON(!sdev);
>>>>
>>>> Needs to be a blk_queue_dead() check as well.
>>
>> Callers of scsi_run_queue() don't hold the queue lock. Does it make
>> sense to test whether the queue is dead without the queue lock being held ?
>>
> 
> I think there is still another bug in this path when it is called from
> the requeue path. If scsi_requeue_command requeues a command and that
> gets executed by some other thread before scsi_requeue_command calls
> scsi_run_queue then we could end up accessing freed memory.
> 
> It looks possible some other thread is doing
> blk_cleanup_queue->blk_drain_queue and that calls blk_run_queue and that
> kills/fails the IO that scsi_requeue_command had queued. Then
> blk_cleanup_queue could complete and we could end up doing the last put
> on the device and freeing the queue and sdev before scsi_requeue_command
> can call scsi_run_queue. scsi_run_queue could then be accessing freed
> memory.
> 
> I think we need a get/put:
> 
> scsi_requeue_command....
> 
> get_device(&sdev->sdev_gendev);
> 
> spin_lock_irqsave(q->queue_lock, flags);
> scsi_unprep_request(req);
> blk_requeue_request(q, req);
> spin_unlock_irqrestore(q->queue_lock, flags);
> 
> scsi_run_queue(q);
> 
> put_device(&sdev->sdev_gendev);
> 
> This will prevent some other path from freeing the queue/sdev from under us.

<clapping hands>
Congrats, Mike.

This looks like the bug I have been hunting since nearly a year now.
(and which I've been pestering folks at the Storage Summit :-)

So yeah, definitely a good idea.
I'll give it a shout and see if it improves things.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/4] scsi: Change return type of scsi_queue_insert() into void
  2012-06-25 18:12 [PATCH 0/4 v9] SCSI device removal fixes Bart Van Assche
  2012-06-25 18:14 ` [PATCH 1/4] block: Fix blk_execute_rq_nowait() dead queue handling Bart Van Assche
  2012-06-25 18:15 ` [PATCH 2/4] scsi: Fix device removal NULL pointer dereference Bart Van Assche
@ 2012-06-25 18:16 ` Bart Van Assche
  2012-06-25 18:17 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
  3 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2012-06-25 18:16 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter

The return value of scsi_queue_insert() is ignored by all its
callers, hence change the return type of this function into
void.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: <stable@kernel.org>
---
 drivers/scsi/scsi_lib.c  |    8 +++-----
 drivers/scsi/scsi_priv.h |    2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c26ef49..082c1e5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -109,7 +109,7 @@ static void scsi_unprep_request(struct request *req)
  * for a requeue after completion, which should only occur in this
  * file.
  */
-static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
+static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 {
 	struct Scsi_Host *host = cmd->device->host;
 	struct scsi_device *device = cmd->device;
@@ -162,8 +162,6 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	kblockd_schedule_work(q, &device->requeue_work);
-
-	return 0;
 }
 
 /*
@@ -185,9 +183,9 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
  * Notes:       This could be called either from an interrupt context or a
  *              normal process context.
  */
-int scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
+void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 {
-	return __scsi_queue_insert(cmd, reason, 1);
+	__scsi_queue_insert(cmd, reason, 1);
 }
 /**
  * scsi_execute - insert request and wait for the result
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 2b8d8b5..cacb0e7 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -79,7 +79,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd);
 /* scsi_lib.c */
 extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
 extern void scsi_device_unbusy(struct scsi_device *sdev);
-extern int scsi_queue_insert(struct scsi_cmnd *cmd, int reason);
+extern void scsi_queue_insert(struct scsi_cmnd *cmd, int reason);
 extern void scsi_next_command(struct scsi_cmnd *cmd);
 extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
 extern void scsi_run_host_queues(struct Scsi_Host *shost);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
  2012-06-25 18:12 [PATCH 0/4 v9] SCSI device removal fixes Bart Van Assche
                   ` (2 preceding siblings ...)
  2012-06-25 18:16 ` [PATCH 3/4] scsi: Change return type of scsi_queue_insert() into void Bart Van Assche
@ 2012-06-25 18:17 ` Bart Van Assche
  2012-06-25 20:42   ` Tejun Heo
  3 siblings, 1 reply; 22+ messages in thread
From: Bart Van Assche @ 2012-06-25 18:17 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Tejun Heo,
	Jun'ichi Nomura, Stefan Richter, Joe Lawrence

Avoid that the code for requeueing SCSI requests triggers a
crash by making sure that that code isn't scheduled anymore
after a device has been removed.

Also, source code inspection of __scsi_remove_device() revealed
a race condition in this function: no new SCSI requests must be
accepted for a SCSI device after device removal started.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joe Lawrence <jdl1291@gmail.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: <stable@kernel.org>
---
 drivers/scsi/scsi_lib.c   |    7 ++++---
 drivers/scsi/scsi_sysfs.c |   11 +++++++++--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 082c1e5..fc2b9f4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -155,13 +155,14 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 
 	/*
 	 * Requeue this command.  It will go before all other commands
-	 * that are already in the queue.
+	 * that are already in the queue. Schedule requeue work under
+	 * lock such that the kblockd_schedule_work() call happens
+	 * before blk_cleanup_queue() finishes.
 	 */
 	spin_lock_irqsave(q->queue_lock, flags);
 	blk_requeue_request(q, cmd->request);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
 	kblockd_schedule_work(q, &device->requeue_work);
+	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
 /*
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 42c35ff..efffc92 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -966,13 +966,20 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		device_del(dev);
 	} else
 		put_device(&sdev->sdev_dev);
+
+	/*
+	 * Stop accepting new requests and wait until all queuecommand() and
+	 * scsi_run_queue() invocations have finished before tearing down the
+	 * device.
+	 */
 	scsi_device_set_state(sdev, SDEV_DEL);
+	blk_cleanup_queue(sdev->request_queue);
+	cancel_work_sync(&sdev->requeue_work);
+
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
 
-	/* Freeing the queue signals to block that we're done */
-	blk_cleanup_queue(sdev->request_queue);
 	put_device(dev);
 }
 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
  2012-06-25 18:17 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
@ 2012-06-25 20:42   ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2012-06-25 20:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Jun'ichi Nomura, Stefan Richter, Joe Lawrence

On Mon, Jun 25, 2012 at 06:17:18PM +0000, Bart Van Assche wrote:
> Avoid that the code for requeueing SCSI requests triggers a
> crash by making sure that that code isn't scheduled anymore
> after a device has been removed.
> 
> Also, source code inspection of __scsi_remove_device() revealed
> a race condition in this function: no new SCSI requests must be
> accepted for a SCSI device after device removal started.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
> Cc: James Bottomley <JBottomley@parallels.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Joe Lawrence <jdl1291@gmail.com>
> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Cc: <stable@kernel.org>
> ---
>  drivers/scsi/scsi_lib.c   |    7 ++++---
>  drivers/scsi/scsi_sysfs.c |   11 +++++++++--
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 082c1e5..fc2b9f4 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -155,13 +155,14 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>  
>  	/*
>  	 * Requeue this command.  It will go before all other commands
> -	 * that are already in the queue.
> +	 * that are already in the queue. Schedule requeue work under
> +	 * lock such that the kblockd_schedule_work() call happens
> +	 * before blk_cleanup_queue() finishes.
>  	 */
>  	spin_lock_irqsave(q->queue_lock, flags);
>  	blk_requeue_request(q, cmd->request);
> -	spin_unlock_irqrestore(q->queue_lock, flags);
> -
>  	kblockd_schedule_work(q, &device->requeue_work);
> +	spin_unlock_irqrestore(q->queue_lock, flags);

This is rather subtle but yeah it would achieve proper
synchronization.  Can't think of a better way ATM.

Aside from the nits, for all patches in this series

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-06-26 15:03 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-25 18:12 [PATCH 0/4 v9] SCSI device removal fixes Bart Van Assche
2012-06-25 18:14 ` [PATCH 1/4] block: Fix blk_execute_rq_nowait() dead queue handling Bart Van Assche
2012-06-25 18:41   ` Tejun Heo
2012-06-25 19:18     ` Muthu Kumar
2012-06-25 18:15 ` [PATCH 2/4] scsi: Fix device removal NULL pointer dereference Bart Van Assche
2012-06-25 20:36   ` Tejun Heo
2012-06-25 21:14   ` James Bottomley
2012-06-25 21:21     ` Tejun Heo
2012-06-25 21:35       ` James Bottomley
2012-06-26  7:02         ` Bart Van Assche
2012-06-25 22:05     ` Mike Christie
2012-06-26  7:19       ` James Bottomley
2012-06-26  7:26         ` Bart Van Assche
2012-06-26  6:46     ` Bart Van Assche
2012-06-26  7:25       ` James Bottomley
2012-06-26 10:00         ` Bart Van Assche
2012-06-26  9:13       ` Mike Christie
2012-06-26 10:03         ` Bart Van Assche
2012-06-26 15:03         ` Hannes Reinecke
2012-06-25 18:16 ` [PATCH 3/4] scsi: Change return type of scsi_queue_insert() into void Bart Van Assche
2012-06-25 18:17 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
2012-06-25 20:42   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).