* [PATCH 0/4 v7] Fixes for SCSI device removal
@ 2012-06-05 17:08 Bart Van Assche
2012-06-05 17:10 ` [PATCH 1/4] block: Fix race on request_queue.end_io invocations Bart Van Assche
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-05 17:08 UTC (permalink / raw)
To: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter
This is version seven of the SCSI device removal patch series. This
version of this patch series has been tested in the same way as the
previous series: by triggering a large number of removals of a SCSI
device controlled by the ib_srp LLD and also by checking that dm devices
controlled by multipathd still work fine.
Changes compared to v6:
- Added a fix for a race in the block layer.
- Moved a BUG_ON(!sdev) statement up since it's a precondition check.
- Changed return type of scsi_queue_insert() from int to void.
- Added a cancel_work_sync(&sdev->requeue_work) call in
__scsi_remove_device().
Changes compared to v5:
- Removed the function scsi_free_queue() and inlined that function
in its callers.
- Added two additional patches.
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 1/4] block: Fix race on request_queue.end_io invocations
2012-06-05 17:08 [PATCH 0/4 v7] Fixes for SCSI device removal Bart Van Assche
@ 2012-06-05 17:10 ` Bart Van Assche
2012-06-05 21:32 ` Tejun Heo
2012-06-06 12:45 ` Jens Axboe
2012-06-05 17:11 ` [PATCH 2/4] scsi: Fix device removal NULL pointer dereference Bart Van Assche
` (2 subsequent siblings)
3 siblings, 2 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-05 17:10 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter, Jens Axboe, Tejun Heo
Some request_queue.end_io implementations can be called safely
without the queue lock held while several other implementations
assume that the queue lock is held. So let's play it safe and
make sure that the queue lock is held around end_io invocations.
Found this through source code review.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
---
block/blk-exec.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/block/blk-exec.c b/block/blk-exec.c
index fb2cbd5..6724fab 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -54,10 +54,10 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
spin_lock_irq(q->queue_lock);
if (unlikely(blk_queue_dead(q))) {
- spin_unlock_irq(q->queue_lock);
rq->errors = -ENXIO;
if (rq->end_io)
rq->end_io(rq, rq->errors);
+ spin_unlock_irq(q->queue_lock);
return;
}
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 2/4] scsi: Fix device removal NULL pointer dereference
2012-06-05 17:08 [PATCH 0/4 v7] Fixes for SCSI device removal Bart Van Assche
2012-06-05 17:10 ` [PATCH 1/4] block: Fix race on request_queue.end_io invocations Bart Van Assche
@ 2012-06-05 17:11 ` Bart Van Assche
2012-06-05 17:12 ` [PATCH 3/4] scsi: Change return type of scsi_queue_insert() into void Bart Van Assche
2012-06-05 17:14 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
3 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-05 17:11 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter
Since scsi_prep_fn() may be invoked concurrently with
__scsi_remove_device(), keep the queuedata pointer in
__scsi_remove_device(). This patch fixes a kernel oops that
can be triggered by USB device removal. See also
http://www.spinics.net/lists/linux-scsi/msg56254.html.
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org>
---
drivers/scsi/hosts.c | 8 +++++++-
drivers/scsi/scsi_lib.c | 35 ++++++++---------------------------
drivers/scsi/scsi_priv.h | 1 -
drivers/scsi/scsi_sysfs.c | 5 +----
4 files changed, 16 insertions(+), 33 deletions(-)
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index a3a056a..6b9d89a 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -299,9 +299,15 @@ static void scsi_host_dev_release(struct device *dev)
destroy_workqueue(shost->work_q);
q = shost->uspace_req_q;
if (q) {
+ /*
+ * Note: freeing queuedata before invoking blk_cleanup_queue()
+ * is safe here because no request function is associated with
+ * uspace_req_q. See also the __scsi_alloc_queue() call in
+ * drivers/scsi/scsi_tgt_lib.c.
+ */
kfree(q->queuedata);
q->queuedata = NULL;
- scsi_free_queue(q);
+ blk_cleanup_queue(q);
}
scsi_destroy_command_freelist(shost);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6dfb978..c26ef49 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -406,10 +406,7 @@ static void scsi_run_queue(struct request_queue *q)
LIST_HEAD(starved_list);
unsigned long flags;
- /* if the device is dead, sdev will be NULL, so no queue to run */
- if (!sdev)
- return;
-
+ BUG_ON(!sdev);
shost = sdev->host;
if (scsi_target(sdev)->single_lun)
scsi_single_lun_run(sdev);
@@ -1370,16 +1367,18 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
* may be changed after request stacking drivers call the function,
* regardless of taking lock or not.
*
- * When scsi can't dispatch I/Os anymore and needs to kill I/Os
- * (e.g. !sdev), scsi needs to return 'not busy'.
- * Otherwise, request stacking drivers may hold requests forever.
+ * When scsi can't dispatch I/Os anymore and needs to kill I/Os scsi
+ * needs to return 'not busy'. Otherwise, request stacking drivers
+ * may hold requests forever.
*/
static int scsi_lld_busy(struct request_queue *q)
{
struct scsi_device *sdev = q->queuedata;
struct Scsi_Host *shost;
- if (!sdev)
+ BUG_ON(!sdev);
+
+ if (blk_queue_dead(q))
return 0;
shost = sdev->host;
@@ -1490,11 +1489,7 @@ static void scsi_request_fn(struct request_queue *q)
struct scsi_cmnd *cmd;
struct request *req;
- if (!sdev) {
- while ((req = blk_peek_request(q)) != NULL)
- scsi_kill_request(req, q);
- return;
- }
+ BUG_ON(!sdev);
if(!get_device(&sdev->sdev_gendev))
/* We must be tearing the block queue down already */
@@ -1697,20 +1692,6 @@ struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
return q;
}
-void scsi_free_queue(struct request_queue *q)
-{
- unsigned long flags;
-
- WARN_ON(q->queuedata);
-
- /* cause scsi_request_fn() to kill all non-finished requests */
- spin_lock_irqsave(q->queue_lock, flags);
- q->request_fn(q);
- spin_unlock_irqrestore(q->queue_lock, flags);
-
- blk_cleanup_queue(q);
-}
-
/*
* Function: scsi_block_requests()
*
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 07ce3f5..2b8d8b5 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -84,7 +84,6 @@ extern void scsi_next_command(struct scsi_cmnd *cmd);
extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
extern void scsi_run_host_queues(struct Scsi_Host *shost);
extern struct request_queue *scsi_alloc_queue(struct scsi_device *sdev);
-extern void scsi_free_queue(struct request_queue *q);
extern int scsi_init_queue(void);
extern void scsi_exit_queue(void);
struct request_queue;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 04c2a27..42c35ff 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -971,11 +971,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
- /* cause the request function to reject all I/O requests */
- sdev->request_queue->queuedata = NULL;
-
/* Freeing the queue signals to block that we're done */
- scsi_free_queue(sdev->request_queue);
+ blk_cleanup_queue(sdev->request_queue);
put_device(dev);
}
--
1.7.7
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 3/4] scsi: Change return type of scsi_queue_insert() into void
2012-06-05 17:08 [PATCH 0/4 v7] Fixes for SCSI device removal Bart Van Assche
2012-06-05 17:10 ` [PATCH 1/4] block: Fix race on request_queue.end_io invocations Bart Van Assche
2012-06-05 17:11 ` [PATCH 2/4] scsi: Fix device removal NULL pointer dereference Bart Van Assche
@ 2012-06-05 17:12 ` Bart Van Assche
2012-06-05 17:14 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
3 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-05 17:12 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter
The return value of scsi_queue_insert() is ignored by all its
callers, hence change the return type of this function into
void.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: <stable@kernel.org>
---
drivers/scsi/scsi_lib.c | 8 +++-----
drivers/scsi/scsi_priv.h | 2 +-
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c26ef49..082c1e5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -109,7 +109,7 @@ static void scsi_unprep_request(struct request *req)
* for a requeue after completion, which should only occur in this
* file.
*/
-static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
+static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
{
struct Scsi_Host *host = cmd->device->host;
struct scsi_device *device = cmd->device;
@@ -162,8 +162,6 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
spin_unlock_irqrestore(q->queue_lock, flags);
kblockd_schedule_work(q, &device->requeue_work);
-
- return 0;
}
/*
@@ -185,9 +183,9 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
* Notes: This could be called either from an interrupt context or a
* normal process context.
*/
-int scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
+void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
{
- return __scsi_queue_insert(cmd, reason, 1);
+ __scsi_queue_insert(cmd, reason, 1);
}
/**
* scsi_execute - insert request and wait for the result
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 2b8d8b5..cacb0e7 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -79,7 +79,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd);
/* scsi_lib.c */
extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
extern void scsi_device_unbusy(struct scsi_device *sdev);
-extern int scsi_queue_insert(struct scsi_cmnd *cmd, int reason);
+extern void scsi_queue_insert(struct scsi_cmnd *cmd, int reason);
extern void scsi_next_command(struct scsi_cmnd *cmd);
extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
extern void scsi_run_host_queues(struct Scsi_Host *shost);
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-05 17:08 [PATCH 0/4 v7] Fixes for SCSI device removal Bart Van Assche
` (2 preceding siblings ...)
2012-06-05 17:12 ` [PATCH 3/4] scsi: Change return type of scsi_queue_insert() into void Bart Van Assche
@ 2012-06-05 17:14 ` Bart Van Assche
2012-06-05 21:36 ` Mike Christie
2012-06-05 22:08 ` Mike Christie
3 siblings, 2 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-05 17:14 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter, Jens Axboe, Joe Lawrence
Avoid that the code for requeueing SCSI requests triggers a
crash by making sure that that code isn't scheduled anymore
after a device has been removed.
Also, source code inspection of __scsi_remove_device() revealed
a race condition in this function: no new SCSI requests must be
accepted for a SCSI device after device removal started.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joe Lawrence <jdl1291@gmail.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: <stable@kernel.org>
---
drivers/scsi/scsi_lib.c | 7 ++++---
drivers/scsi/scsi_sysfs.c | 11 +++++++++--
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 082c1e5..b722a8b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
* that are already in the queue.
*/
spin_lock_irqsave(q->queue_lock, flags);
- blk_requeue_request(q, cmd->request);
+ if (!blk_queue_dead(q)) {
+ blk_requeue_request(q, cmd->request);
+ kblockd_schedule_work(q, &device->requeue_work);
+ }
spin_unlock_irqrestore(q->queue_lock, flags);
-
- kblockd_schedule_work(q, &device->requeue_work);
}
/*
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 42c35ff..efffc92 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -966,13 +966,20 @@ void __scsi_remove_device(struct scsi_device *sdev)
device_del(dev);
} else
put_device(&sdev->sdev_dev);
+
+ /*
+ * Stop accepting new requests and wait until all queuecommand() and
+ * scsi_run_queue() invocations have finished before tearing down the
+ * device.
+ */
scsi_device_set_state(sdev, SDEV_DEL);
+ blk_cleanup_queue(sdev->request_queue);
+ cancel_work_sync(&sdev->requeue_work);
+
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
- /* Freeing the queue signals to block that we're done */
- blk_cleanup_queue(sdev->request_queue);
put_device(dev);
}
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 1/4] block: Fix race on request_queue.end_io invocations
2012-06-05 17:10 ` [PATCH 1/4] block: Fix race on request_queue.end_io invocations Bart Van Assche
@ 2012-06-05 21:32 ` Tejun Heo
2012-06-06 12:44 ` Bart Van Assche
2012-06-06 12:45 ` Jens Axboe
1 sibling, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2012-06-05 21:32 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter, Jens Axboe
On Tue, Jun 05, 2012 at 05:10:15PM +0000, Bart Van Assche wrote:
> Some request_queue.end_io implementations can be called safely
> without the queue lock held while several other implementations
> assume that the queue lock is held. So let's play it safe and
> make sure that the queue lock is held around end_io invocations.
> Found this through source code review.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: <stable@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Not sure about stable@ tho. This doesn't fix any visible issue at the
moment and any change carries some risk.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-05 17:14 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
@ 2012-06-05 21:36 ` Mike Christie
2012-06-06 12:17 ` Bart Van Assche
2012-06-05 22:08 ` Mike Christie
1 sibling, 1 reply; 26+ messages in thread
From: Mike Christie @ 2012-06-05 21:36 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/05/2012 12:14 PM, Bart Van Assche wrote:
> Avoid that the code for requeueing SCSI requests triggers a
> crash by making sure that that code isn't scheduled anymore
> after a device has been removed.
>
> Also, source code inspection of __scsi_remove_device() revealed
> a race condition in this function: no new SCSI requests must be
> accepted for a SCSI device after device removal started.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: James Bottomley <JBottomley@parallels.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Joe Lawrence <jdl1291@gmail.com>
> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Cc: <stable@kernel.org>
> ---
> drivers/scsi/scsi_lib.c | 7 ++++---
> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 082c1e5..b722a8b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
> * that are already in the queue.
> */
> spin_lock_irqsave(q->queue_lock, flags);
> - blk_requeue_request(q, cmd->request);
> + if (!blk_queue_dead(q)) {
> + blk_requeue_request(q, cmd->request);
> + kblockd_schedule_work(q, &device->requeue_work);
> + }
If we do not requeue what eventually frees the request?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-05 17:14 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
2012-06-05 21:36 ` Mike Christie
@ 2012-06-05 22:08 ` Mike Christie
2012-06-06 12:25 ` Bart Van Assche
1 sibling, 1 reply; 26+ messages in thread
From: Mike Christie @ 2012-06-05 22:08 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/05/2012 12:14 PM, Bart Van Assche wrote:
> Avoid that the code for requeueing SCSI requests triggers a
> crash by making sure that that code isn't scheduled anymore
> after a device has been removed.
>
> Also, source code inspection of __scsi_remove_device() revealed
> a race condition in this function: no new SCSI requests must be
> accepted for a SCSI device after device removal started.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: James Bottomley <JBottomley@parallels.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Joe Lawrence <jdl1291@gmail.com>
> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Cc: <stable@kernel.org>
> ---
> drivers/scsi/scsi_lib.c | 7 ++++---
> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 082c1e5..b722a8b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
> * that are already in the queue.
> */
> spin_lock_irqsave(q->queue_lock, flags);
> - blk_requeue_request(q, cmd->request);
> + if (!blk_queue_dead(q)) {
> + blk_requeue_request(q, cmd->request);
> + kblockd_schedule_work(q, &device->requeue_work);
> + }
> spin_unlock_irqrestore(q->queue_lock, flags);
> -
> - kblockd_schedule_work(q, &device->requeue_work);
If we do not have the part of the patch above, but have your other
patches and the code below, will we be ok?
I think we will requeue the request, then blk_drain_queue will end up
running the queue (blk_drain_queue will not return while req is requeued
because the rq count is still incremented). Then scsi_request_fn will be
run. blk_peek_request will give us the requeued request. We will hit the
scsi_device_online check with the state being SDEV_DEL, and then call we
call scsi_kill_request. This should then lead to the scsi_cmnd and its
other scsi stuff, like the scatterlist, and sense bufffer, to be
released. And then the request struct will be finished and freed and
then the rq count decremented?
> }
>
> /*
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 42c35ff..efffc92 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -966,13 +966,20 @@ void __scsi_remove_device(struct scsi_device *sdev)
> device_del(dev);
> } else
> put_device(&sdev->sdev_dev);
> +
> + /*
> + * Stop accepting new requests and wait until all queuecommand() and
> + * scsi_run_queue() invocations have finished before tearing down the
> + * device.
> + */
> scsi_device_set_state(sdev, SDEV_DEL);
> + blk_cleanup_queue(sdev->request_queue);
> + cancel_work_sync(&sdev->requeue_work);
I agree we do still need this part of the patch with the cancel, because
the workstruct could still be queued but something could run the queue
without dequeueing the workstruct. That would free the request and that
could lead to blk_cleanup_queue running and us freeing the scsi_device
from under the workstruct.
> +
> if (sdev->host->hostt->slave_destroy)
> sdev->host->hostt->slave_destroy(sdev);
> transport_destroy_device(dev);
>
> - /* Freeing the queue signals to block that we're done */
> - blk_cleanup_queue(sdev->request_queue);
> put_device(dev);
> }
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-05 21:36 ` Mike Christie
@ 2012-06-06 12:17 ` Bart Van Assche
2012-06-06 13:29 ` Mike Christie
0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2012-06-06 12:17 UTC (permalink / raw)
To: Mike Christie
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/05/12 21:36, Mike Christie wrote:
> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>> Avoid that the code for requeueing SCSI requests triggers a
>> crash by making sure that that code isn't scheduled anymore
>> after a device has been removed.
>>
>> Also, source code inspection of __scsi_remove_device() revealed
>> a race condition in this function: no new SCSI requests must be
>> accepted for a SCSI device after device removal started.
>>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>> Cc: James Bottomley <JBottomley@parallels.com>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: Joe Lawrence <jdl1291@gmail.com>
>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>> Cc: <stable@kernel.org>
>> ---
>> drivers/scsi/scsi_lib.c | 7 ++++---
>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 082c1e5..b722a8b 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>> * that are already in the queue.
>> */
>> spin_lock_irqsave(q->queue_lock, flags);
>> - blk_requeue_request(q, cmd->request);
>> + if (!blk_queue_dead(q)) {
>> + blk_requeue_request(q, cmd->request);
>> + kblockd_schedule_work(q, &device->requeue_work);
>> + }
>
> If we do not requeue what eventually frees the request?
As far as I can see any request passed to __scsi_queue_insert() has
already been started. So if it isn't requeued it's timer remains active
and hence will fire eventually.
Bart.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-05 22:08 ` Mike Christie
@ 2012-06-06 12:25 ` Bart Van Assche
2012-06-06 13:43 ` Mike Christie
0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2012-06-06 12:25 UTC (permalink / raw)
To: Mike Christie
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/05/12 22:08, Mike Christie wrote:
> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>> Avoid that the code for requeueing SCSI requests triggers a
>> crash by making sure that that code isn't scheduled anymore
>> after a device has been removed.
>>
>> Also, source code inspection of __scsi_remove_device() revealed
>> a race condition in this function: no new SCSI requests must be
>> accepted for a SCSI device after device removal started.
>>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>> Cc: James Bottomley <JBottomley@parallels.com>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: Joe Lawrence <jdl1291@gmail.com>
>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>> Cc: <stable@kernel.org>
>> ---
>> drivers/scsi/scsi_lib.c | 7 ++++---
>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 082c1e5..b722a8b 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>> * that are already in the queue.
>> */
>> spin_lock_irqsave(q->queue_lock, flags);
>> - blk_requeue_request(q, cmd->request);
>> + if (!blk_queue_dead(q)) {
>> + blk_requeue_request(q, cmd->request);
>> + kblockd_schedule_work(q, &device->requeue_work);
>> + }
>> spin_unlock_irqrestore(q->queue_lock, flags);
>> -
>> - kblockd_schedule_work(q, &device->requeue_work);
>
> If we do not have the part of the patch above, but have your other
> patches and the code below, will we be ok?
I'm not sure. Without the above part the request could get killed after
the blk_requeue_request() call finished but before the requeue_work is
scheduled, e.g. because the request timer fired or due to a
blk_abort_queue() call.
Bart.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/4] block: Fix race on request_queue.end_io invocations
2012-06-05 21:32 ` Tejun Heo
@ 2012-06-06 12:44 ` Bart Van Assche
0 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-06 12:44 UTC (permalink / raw)
To: Tejun Heo
Cc: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter, Jens Axboe
On 06/05/12 21:32, Tejun Heo wrote:
> On Tue, Jun 05, 2012 at 05:10:15PM +0000, Bart Van Assche wrote:
>> Some request_queue.end_io implementations can be called safely
>> without the queue lock held while several other implementations
>> assume that the queue lock is held. So let's play it safe and
>> make sure that the queue lock is held around end_io invocations.
>> Found this through source code review.
>>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: <stable@kernel.org>
>
> Acked-by: Tejun Heo <tj@kernel.org>
>
> Not sure about stable@ tho. This doesn't fix any visible issue at the
> moment and any change carries some risk.
I don't have a strong opinion about cc-ing stable - leaving out that tag
is fine for me.
Bart.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/4] block: Fix race on request_queue.end_io invocations
2012-06-05 17:10 ` [PATCH 1/4] block: Fix race on request_queue.end_io invocations Bart Van Assche
2012-06-05 21:32 ` Tejun Heo
@ 2012-06-06 12:45 ` Jens Axboe
2012-06-06 13:10 ` Bart Van Assche
1 sibling, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2012-06-06 12:45 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter, Tejun Heo
On 06/05/2012 07:10 PM, Bart Van Assche wrote:
> Some request_queue.end_io implementations can be called safely
> without the queue lock held while several other implementations
> assume that the queue lock is held. So let's play it safe and
> make sure that the queue lock is held around end_io invocations.
> Found this through source code review.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: <stable@kernel.org>
> ---
> block/blk-exec.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/block/blk-exec.c b/block/blk-exec.c
> index fb2cbd5..6724fab 100644
> --- a/block/blk-exec.c
> +++ b/block/blk-exec.c
> @@ -54,10 +54,10 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
> spin_lock_irq(q->queue_lock);
>
> if (unlikely(blk_queue_dead(q))) {
> - spin_unlock_irq(q->queue_lock);
> rq->errors = -ENXIO;
> if (rq->end_io)
> rq->end_io(rq, rq->errors);
> + spin_unlock_irq(q->queue_lock);
> return;
> }
I'm assuming you checked any in-kernel users of rq->end_io to ensure
that it is fine? If so, patch looks fine to me. And I agree, it's not
stable material.
--
Jens Axboe
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/4] block: Fix race on request_queue.end_io invocations
2012-06-06 12:45 ` Jens Axboe
@ 2012-06-06 13:10 ` Bart Van Assche
0 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-06 13:10 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-scsi, James Bottomley, Mike Christie, Jun'ichi Nomura,
Stefan Richter, Tejun Heo
On 06/06/12 12:45, Jens Axboe wrote:
> On 06/05/2012 07:10 PM, Bart Van Assche wrote:
>> Some request_queue.end_io implementations can be called safely
>> without the queue lock held while several other implementations
>> assume that the queue lock is held. So let's play it safe and
>> make sure that the queue lock is held around end_io invocations.
>> Found this through source code review.
>>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: <stable@kernel.org>
>> ---
>> block/blk-exec.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/block/blk-exec.c b/block/blk-exec.c
>> index fb2cbd5..6724fab 100644
>> --- a/block/blk-exec.c
>> +++ b/block/blk-exec.c
>> @@ -54,10 +54,10 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
>> spin_lock_irq(q->queue_lock);
>>
>> if (unlikely(blk_queue_dead(q))) {
>> - spin_unlock_irq(q->queue_lock);
>> rq->errors = -ENXIO;
>> if (rq->end_io)
>> rq->end_io(rq, rq->errors);
>> + spin_unlock_irq(q->queue_lock);
>> return;
>> }
>
> I'm assuming you checked any in-kernel users of rq->end_io to ensure
> that it is fine? If so, patch looks fine to me. And I agree, it's not
> stable material.
The in-tree request.end_io implementations can be found as follows:
$ git grep -nHE 'end_io\(struct request .*[^;]$'
block/blk-flush.c:194:static void flush_end_io(struct request *flush_rq, int error)
block/blk-flush.c:275:static void flush_data_end_io(struct request *rq, int error)
block/bsg.c:336:static void bsg_rq_end_io(struct request *rq, int uptodate)
drivers/scsi/sg.c:1279:static void sg_rq_end_io(struct request *rq, int uptodate)
To me it looks like flush_end_io() and flush_data_end_io() need to be
called with the queue lock held since these access the queue state. For
bsg_rq_end_io() and sg_rq_end_io() this patch will trigger nested
locking. As far as I can see that should be fine though.
Bart.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 12:17 ` Bart Van Assche
@ 2012-06-06 13:29 ` Mike Christie
2012-06-06 14:53 ` Bart Van Assche
0 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2012-06-06 13:29 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/2012 07:17 AM, Bart Van Assche wrote:
> On 06/05/12 21:36, Mike Christie wrote:
>
>> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>>> Avoid that the code for requeueing SCSI requests triggers a
>>> crash by making sure that that code isn't scheduled anymore
>>> after a device has been removed.
>>>
>>> Also, source code inspection of __scsi_remove_device() revealed
>>> a race condition in this function: no new SCSI requests must be
>>> accepted for a SCSI device after device removal started.
>>>
>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>>> Cc: James Bottomley <JBottomley@parallels.com>
>>> Cc: Jens Axboe <axboe@kernel.dk>
>>> Cc: Joe Lawrence <jdl1291@gmail.com>
>>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>>> Cc: <stable@kernel.org>
>>> ---
>>> drivers/scsi/scsi_lib.c | 7 ++++---
>>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index 082c1e5..b722a8b 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>>> * that are already in the queue.
>>> */
>>> spin_lock_irqsave(q->queue_lock, flags);
>>> - blk_requeue_request(q, cmd->request);
>>> + if (!blk_queue_dead(q)) {
>>> + blk_requeue_request(q, cmd->request);
>>> + kblockd_schedule_work(q, &device->requeue_work);
>>> + }
>>
>> If we do not requeue what eventually frees the request?
>
> As far as I can see any request passed to __scsi_queue_insert() has
> already been started. So if it isn't requeued it's timer remains active
> and hence will fire eventually.
This is true in the scsi_dispatch_cmd path, but not others.
If we were requeueing from the error handler scsi_eh_flush_done_q then
the timer would have been stopped because that is how we got into the eh.
If we are coming from the completion path then the timer will not be
running basically. For example if we requeue from scsi_softirq_done or
scsi_io_completion then we will have called blk_complete_request in
scsi_done already, so the blk_mark_rq_complete call in there would
prevent the timer code from running on the request.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 12:25 ` Bart Van Assche
@ 2012-06-06 13:43 ` Mike Christie
2012-06-06 14:01 ` Mike Christie
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Mike Christie @ 2012-06-06 13:43 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/2012 07:25 AM, Bart Van Assche wrote:
> On 06/05/12 22:08, Mike Christie wrote:
>
>> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>>> Avoid that the code for requeueing SCSI requests triggers a
>>> crash by making sure that that code isn't scheduled anymore
>>> after a device has been removed.
>>>
>>> Also, source code inspection of __scsi_remove_device() revealed
>>> a race condition in this function: no new SCSI requests must be
>>> accepted for a SCSI device after device removal started.
>>>
>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>>> Cc: James Bottomley <JBottomley@parallels.com>
>>> Cc: Jens Axboe <axboe@kernel.dk>
>>> Cc: Joe Lawrence <jdl1291@gmail.com>
>>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>>> Cc: <stable@kernel.org>
>>> ---
>>> drivers/scsi/scsi_lib.c | 7 ++++---
>>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index 082c1e5..b722a8b 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>>> * that are already in the queue.
>>> */
>>> spin_lock_irqsave(q->queue_lock, flags);
>>> - blk_requeue_request(q, cmd->request);
>>> + if (!blk_queue_dead(q)) {
>>> + blk_requeue_request(q, cmd->request);
>>> + kblockd_schedule_work(q, &device->requeue_work);
>>> + }
>>> spin_unlock_irqrestore(q->queue_lock, flags);
>>> -
>>> - kblockd_schedule_work(q, &device->requeue_work);
>>
>> If we do not have the part of the patch above, but have your other
>> patches and the code below, will we be ok?
>
>
> I'm not sure. Without the above part the request could get killed after
> the blk_requeue_request() call finished but before the requeue_work is
> scheduled, e.g. because the request timer fired or due to a
> blk_abort_queue() call.
>
You are right.
What if we moved the requeue work struct to the request queue, then have
blk_cleanup_queue or blk_drain_queue call cancel_work_sync before the
queue is freed. That way that code could make sure the queue and work is
flushed and drained, and it can make sure it is flushed and drained
before freeing the queue?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 13:43 ` Mike Christie
@ 2012-06-06 14:01 ` Mike Christie
2012-06-06 14:12 ` Mike Christie
2012-06-06 15:07 ` Bart Van Assche
2 siblings, 0 replies; 26+ messages in thread
From: Mike Christie @ 2012-06-06 14:01 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/2012 08:43 AM, Mike Christie wrote:
> On 06/06/2012 07:25 AM, Bart Van Assche wrote:
>> On 06/05/12 22:08, Mike Christie wrote:
>>
>>> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>>>> Avoid that the code for requeueing SCSI requests triggers a
>>>> crash by making sure that that code isn't scheduled anymore
>>>> after a device has been removed.
>>>>
>>>> Also, source code inspection of __scsi_remove_device() revealed
>>>> a race condition in this function: no new SCSI requests must be
>>>> accepted for a SCSI device after device removal started.
>>>>
>>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>>>> Cc: James Bottomley <JBottomley@parallels.com>
>>>> Cc: Jens Axboe <axboe@kernel.dk>
>>>> Cc: Joe Lawrence <jdl1291@gmail.com>
>>>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>>>> Cc: <stable@kernel.org>
>>>> ---
>>>> drivers/scsi/scsi_lib.c | 7 ++++---
>>>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>> index 082c1e5..b722a8b 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>>>> * that are already in the queue.
>>>> */
>>>> spin_lock_irqsave(q->queue_lock, flags);
>>>> - blk_requeue_request(q, cmd->request);
>>>> + if (!blk_queue_dead(q)) {
>>>> + blk_requeue_request(q, cmd->request);
>>>> + kblockd_schedule_work(q, &device->requeue_work);
>>>> + }
>>>> spin_unlock_irqrestore(q->queue_lock, flags);
>>>> -
>>>> - kblockd_schedule_work(q, &device->requeue_work);
>>>
>>> If we do not have the part of the patch above, but have your other
>>> patches and the code below, will we be ok?
>>
>>
>> I'm not sure. Without the above part the request could get killed after
>> the blk_requeue_request() call finished but before the requeue_work is
>> scheduled, e.g. because the request timer fired or due to a
>> blk_abort_queue() call.
>>
>
> You are right.
>
> What if we moved the requeue work struct to the request queue, then have
> blk_cleanup_queue or blk_drain_queue call cancel_work_sync before the
> queue is freed. That way that code could make sure the queue and work is
> flushed and drained, and it can make sure it is flushed and drained
> before freeing the queue?
Oh yeah, one clarification. With the above proposal we would do the part
of your patch where we do the requeue and schedule under the queue lock.
We would just do not do the blk_queue_dead check. We would just always
requeue like before.
I think no matter what blk_abort_queue is going to be a problem. I wish
we would have removed the code so btrfs was not using it now. As you
know from that other thread we removed it from dm-multipath use, because
bad things can happen if things like blk_abort_queue runs at the same
time blk_requeue_request was from the queueing path. Same goes for the
race where a cmd times out in scsi_dispatch_cmd (so if you did something
evil like set a timeout of 0 seconds you hit a similar problem as with
blk_abort_queue use).
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 13:43 ` Mike Christie
2012-06-06 14:01 ` Mike Christie
@ 2012-06-06 14:12 ` Mike Christie
2012-06-06 15:04 ` Bart Van Assche
2012-06-06 15:07 ` Bart Van Assche
2 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2012-06-06 14:12 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/2012 08:43 AM, Mike Christie wrote:
> On 06/06/2012 07:25 AM, Bart Van Assche wrote:
>> On 06/05/12 22:08, Mike Christie wrote:
>>
>>> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>>>> Avoid that the code for requeueing SCSI requests triggers a
>>>> crash by making sure that that code isn't scheduled anymore
>>>> after a device has been removed.
>>>>
>>>> Also, source code inspection of __scsi_remove_device() revealed
>>>> a race condition in this function: no new SCSI requests must be
>>>> accepted for a SCSI device after device removal started.
>>>>
>>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>>>> Cc: James Bottomley <JBottomley@parallels.com>
>>>> Cc: Jens Axboe <axboe@kernel.dk>
>>>> Cc: Joe Lawrence <jdl1291@gmail.com>
>>>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>>>> Cc: <stable@kernel.org>
>>>> ---
>>>> drivers/scsi/scsi_lib.c | 7 ++++---
>>>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>> index 082c1e5..b722a8b 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>>>> * that are already in the queue.
>>>> */
>>>> spin_lock_irqsave(q->queue_lock, flags);
>>>> - blk_requeue_request(q, cmd->request);
>>>> + if (!blk_queue_dead(q)) {
>>>> + blk_requeue_request(q, cmd->request);
>>>> + kblockd_schedule_work(q, &device->requeue_work);
>>>> + }
>>>> spin_unlock_irqrestore(q->queue_lock, flags);
>>>> -
>>>> - kblockd_schedule_work(q, &device->requeue_work);
>>>
>>> If we do not have the part of the patch above, but have your other
>>> patches and the code below, will we be ok?
>>
>>
>> I'm not sure. Without the above part the request could get killed after
>> the blk_requeue_request() call finished but before the requeue_work is
>> scheduled, e.g. because the request timer fired or due to a
>> blk_abort_queue() call.
>>
>
> You are right.
>
> What if we moved the requeue work struct to the request queue, then have
> blk_cleanup_queue or blk_drain_queue call cancel_work_sync before the
> queue is freed. That way that code could make sure the queue and work is
> flushed and drained, and it can make sure it is flushed and drained
> before freeing the queue?
Or, in scsi_requeue_run_queue could we just add a check for the
scsi_device being in the SDEV_DEL state. That combined with your cancel
call in __scsi_remove_device would prevent us from running a cleaned up
queue, right?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 13:29 ` Mike Christie
@ 2012-06-06 14:53 ` Bart Van Assche
2012-06-06 15:21 ` Mike Christie
0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2012-06-06 14:53 UTC (permalink / raw)
To: Mike Christie
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/12 13:29, Mike Christie wrote:
> On 06/06/2012 07:17 AM, Bart Van Assche wrote:
>> On 06/05/12 21:36, Mike Christie wrote:
>>
>>> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>>>> Avoid that the code for requeueing SCSI requests triggers a
>>>> crash by making sure that that code isn't scheduled anymore
>>>> after a device has been removed.
>>>>
>>>> Also, source code inspection of __scsi_remove_device() revealed
>>>> a race condition in this function: no new SCSI requests must be
>>>> accepted for a SCSI device after device removal started.
>>>>
>>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>>>> Cc: James Bottomley <JBottomley@parallels.com>
>>>> Cc: Jens Axboe <axboe@kernel.dk>
>>>> Cc: Joe Lawrence <jdl1291@gmail.com>
>>>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>>>> Cc: <stable@kernel.org>
>>>> ---
>>>> drivers/scsi/scsi_lib.c | 7 ++++---
>>>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>> index 082c1e5..b722a8b 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>>>> * that are already in the queue.
>>>> */
>>>> spin_lock_irqsave(q->queue_lock, flags);
>>>> - blk_requeue_request(q, cmd->request);
>>>> + if (!blk_queue_dead(q)) {
>>>> + blk_requeue_request(q, cmd->request);
>>>> + kblockd_schedule_work(q, &device->requeue_work);
>>>> + }
>>>
>>> If we do not requeue what eventually frees the request?
>>
>> As far as I can see any request passed to __scsi_queue_insert() has
>> already been started. So if it isn't requeued it's timer remains active
>> and hence will fire eventually.
>
> This is true in the scsi_dispatch_cmd path, but not others.
>
> If we were requeueing from the error handler scsi_eh_flush_done_q then
> the timer would have been stopped because that is how we got into the eh.
Given the above I propose to add a __blk_end_request_all(req, -ENXIO)
call if the queue is dead.
Bart.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 14:12 ` Mike Christie
@ 2012-06-06 15:04 ` Bart Van Assche
2012-06-06 15:28 ` Mike Christie
0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2012-06-06 15:04 UTC (permalink / raw)
To: Mike Christie
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/12 14:12, Mike Christie wrote:
> On 06/06/2012 08:43 AM, Mike Christie wrote:
>> On 06/06/2012 07:25 AM, Bart Van Assche wrote:
>>> On 06/05/12 22:08, Mike Christie wrote:
>>>
>>>> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>>>>> Avoid that the code for requeueing SCSI requests triggers a
>>>>> crash by making sure that that code isn't scheduled anymore
>>>>> after a device has been removed.
>>>>>
>>>>> Also, source code inspection of __scsi_remove_device() revealed
>>>>> a race condition in this function: no new SCSI requests must be
>>>>> accepted for a SCSI device after device removal started.
>>>>>
>>>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>>>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>>>>> Cc: James Bottomley <JBottomley@parallels.com>
>>>>> Cc: Jens Axboe <axboe@kernel.dk>
>>>>> Cc: Joe Lawrence <jdl1291@gmail.com>
>>>>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>>>>> Cc: <stable@kernel.org>
>>>>> ---
>>>>> drivers/scsi/scsi_lib.c | 7 ++++---
>>>>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>>>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>> index 082c1e5..b722a8b 100644
>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>>>>> * that are already in the queue.
>>>>> */
>>>>> spin_lock_irqsave(q->queue_lock, flags);
>>>>> - blk_requeue_request(q, cmd->request);
>>>>> + if (!blk_queue_dead(q)) {
>>>>> + blk_requeue_request(q, cmd->request);
>>>>> + kblockd_schedule_work(q, &device->requeue_work);
>>>>> + }
>>>>> spin_unlock_irqrestore(q->queue_lock, flags);
>>>>> -
>>>>> - kblockd_schedule_work(q, &device->requeue_work);
>>>>
>>>> If we do not have the part of the patch above, but have your other
>>>> patches and the code below, will we be ok?
>>>
>>>
>>> I'm not sure. Without the above part the request could get killed after
>>> the blk_requeue_request() call finished but before the requeue_work is
>>> scheduled, e.g. because the request timer fired or due to a
>>> blk_abort_queue() call.
>>>
>>
>> You are right.
>>
>> What if we moved the requeue work struct to the request queue, then have
>> blk_cleanup_queue or blk_drain_queue call cancel_work_sync before the
>> queue is freed. That way that code could make sure the queue and work is
>> flushed and drained, and it can make sure it is flushed and drained
>> before freeing the queue?
>
> Or, in scsi_requeue_run_queue could we just add a check for the
> scsi_device being in the SDEV_DEL state. That combined with your cancel
> call in __scsi_remove_device would prevent us from running a cleaned up
> queue, right?
I'm not sure. If a requeued request times out before blk_cleanup_queue()
is invoked then it's possible that the requeue_work is started after the
struct scsi_device has already been deleted.
Bart.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 13:43 ` Mike Christie
2012-06-06 14:01 ` Mike Christie
2012-06-06 14:12 ` Mike Christie
@ 2012-06-06 15:07 ` Bart Van Assche
2 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-06 15:07 UTC (permalink / raw)
To: Mike Christie
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/12 13:43, Mike Christie wrote:
> What if we moved the requeue work struct to the request queue, then have
> blk_cleanup_queue or blk_drain_queue call cancel_work_sync before the
> queue is freed. That way that code could make sure the queue and work is
> flushed and drained, and it can make sure it is flushed and drained
> before freeing the queue?
Sounds like a good idea to me. However, personally I'd prefer with this
patch set to focus on fixing bugs and to leave such restructuring for a
subsequent patch.
Bart.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 14:53 ` Bart Van Assche
@ 2012-06-06 15:21 ` Mike Christie
0 siblings, 0 replies; 26+ messages in thread
From: Mike Christie @ 2012-06-06 15:21 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/2012 09:53 AM, Bart Van Assche wrote:
> On 06/06/12 13:29, Mike Christie wrote:
>
>> On 06/06/2012 07:17 AM, Bart Van Assche wrote:
>>> On 06/05/12 21:36, Mike Christie wrote:
>>>
>>>> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>>>>> Avoid that the code for requeueing SCSI requests triggers a
>>>>> crash by making sure that that code isn't scheduled anymore
>>>>> after a device has been removed.
>>>>>
>>>>> Also, source code inspection of __scsi_remove_device() revealed
>>>>> a race condition in this function: no new SCSI requests must be
>>>>> accepted for a SCSI device after device removal started.
>>>>>
>>>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>>>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>>>>> Cc: James Bottomley <JBottomley@parallels.com>
>>>>> Cc: Jens Axboe <axboe@kernel.dk>
>>>>> Cc: Joe Lawrence <jdl1291@gmail.com>
>>>>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>>>>> Cc: <stable@kernel.org>
>>>>> ---
>>>>> drivers/scsi/scsi_lib.c | 7 ++++---
>>>>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>>>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>> index 082c1e5..b722a8b 100644
>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>>>>> * that are already in the queue.
>>>>> */
>>>>> spin_lock_irqsave(q->queue_lock, flags);
>>>>> - blk_requeue_request(q, cmd->request);
>>>>> + if (!blk_queue_dead(q)) {
>>>>> + blk_requeue_request(q, cmd->request);
>>>>> + kblockd_schedule_work(q, &device->requeue_work);
>>>>> + }
>>>>
>>>> If we do not requeue what eventually frees the request?
>>>
>>> As far as I can see any request passed to __scsi_queue_insert() has
>>> already been started. So if it isn't requeued it's timer remains active
>>> and hence will fire eventually.
>>
>> This is true in the scsi_dispatch_cmd path, but not others.
>>
>> If we were requeueing from the error handler scsi_eh_flush_done_q then
>> the timer would have been stopped because that is how we got into the eh.
>
>
> Given the above I propose to add a __blk_end_request_all(req, -ENXIO)
> call if the queue is dead.
>
Seems ok to me. Maybe a little messy. Don't forget to free the scsi
stuff like the scatterlist, scsi_cmnd, etc. If it is REQ_TYPE_BLOCK_PC
set req->errors.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 15:04 ` Bart Van Assche
@ 2012-06-06 15:28 ` Mike Christie
2012-06-06 16:18 ` Bart Van Assche
0 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2012-06-06 15:28 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/2012 10:04 AM, Bart Van Assche wrote:
> On 06/06/12 14:12, Mike Christie wrote:
>
>> On 06/06/2012 08:43 AM, Mike Christie wrote:
>>> On 06/06/2012 07:25 AM, Bart Van Assche wrote:
>>>> On 06/05/12 22:08, Mike Christie wrote:
>>>>
>>>>> On 06/05/2012 12:14 PM, Bart Van Assche wrote:
>>>>>> Avoid that the code for requeueing SCSI requests triggers a
>>>>>> crash by making sure that that code isn't scheduled anymore
>>>>>> after a device has been removed.
>>>>>>
>>>>>> Also, source code inspection of __scsi_remove_device() revealed
>>>>>> a race condition in this function: no new SCSI requests must be
>>>>>> accepted for a SCSI device after device removal started.
>>>>>>
>>>>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>>>>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>>>>>> Cc: James Bottomley <JBottomley@parallels.com>
>>>>>> Cc: Jens Axboe <axboe@kernel.dk>
>>>>>> Cc: Joe Lawrence <jdl1291@gmail.com>
>>>>>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>>>>>> Cc: <stable@kernel.org>
>>>>>> ---
>>>>>> drivers/scsi/scsi_lib.c | 7 ++++---
>>>>>> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
>>>>>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>>> index 082c1e5..b722a8b 100644
>>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>>> @@ -158,10 +158,11 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>>>>>> * that are already in the queue.
>>>>>> */
>>>>>> spin_lock_irqsave(q->queue_lock, flags);
>>>>>> - blk_requeue_request(q, cmd->request);
>>>>>> + if (!blk_queue_dead(q)) {
>>>>>> + blk_requeue_request(q, cmd->request);
>>>>>> + kblockd_schedule_work(q, &device->requeue_work);
>>>>>> + }
>>>>>> spin_unlock_irqrestore(q->queue_lock, flags);
>>>>>> -
>>>>>> - kblockd_schedule_work(q, &device->requeue_work);
>>>>>
>>>>> If we do not have the part of the patch above, but have your other
>>>>> patches and the code below, will we be ok?
>>>>
>>>>
>>>> I'm not sure. Without the above part the request could get killed after
>>>> the blk_requeue_request() call finished but before the requeue_work is
>>>> scheduled, e.g. because the request timer fired or due to a
>>>> blk_abort_queue() call.
>>>>
>>>
>>> You are right.
>>>
>>> What if we moved the requeue work struct to the request queue, then have
>>> blk_cleanup_queue or blk_drain_queue call cancel_work_sync before the
>>> queue is freed. That way that code could make sure the queue and work is
>>> flushed and drained, and it can make sure it is flushed and drained
>>> before freeing the queue?
>>
>> Or, in scsi_requeue_run_queue could we just add a check for the
>> scsi_device being in the SDEV_DEL state. That combined with your cancel
>> call in __scsi_remove_device would prevent us from running a cleaned up
>> queue, right?
>
>
> I'm not sure. If a requeued request times out before blk_cleanup_queue()
> is invoked then it's possible that the requeue_work is started after the
> struct scsi_device has already been deleted.
>
Won't the cancel_work_sync call you are adding prevent that? After
blk_cleanup_queue has returned we know that no IO is running or in the
eh, and we know no new IO will be started. And then, after the
cancel_work_sync call you are adding has returned we know that there
will not be any workstruct queued or running.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-06 15:28 ` Mike Christie
@ 2012-06-06 16:18 ` Bart Van Assche
0 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-06 16:18 UTC (permalink / raw)
To: Mike Christie
Cc: linux-scsi, James Bottomley, Jun'ichi Nomura, Stefan Richter,
Jens Axboe, Joe Lawrence
On 06/06/12 15:28, Mike Christie wrote:
> On 06/06/2012 10:04 AM, Bart Van Assche wrote:
>> I'm not sure. If a requeued request times out before blk_cleanup_queue()
>> is invoked then it's possible that the requeue_work is started after the
>> struct scsi_device has already been deleted.
>
> Won't the cancel_work_sync call you are adding prevent that? After
> blk_cleanup_queue has returned we know that no IO is running or in the
> eh, and we know no new IO will be started. And then, after the
> cancel_work_sync call you are adding has returned we know that there
> will not be any workstruct queued or running.
We've probably each made different assumptions about whether or not the
requeue work scheduling happens under the queue lock, so we're probably
both right. I'll prepare, test and post a new patch set, that will make
the discussion easier.
Bart.
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-07 18:39 [PATCH 0/4 v8] Fixes for SCSI device removal Bart Van Assche
@ 2012-06-07 18:46 ` Bart Van Assche
0 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2012-06-07 18:46 UTC (permalink / raw)
Cc: linux-scsi, James Bottomley, Joe Lawrence, Jun'ichi Nomura,
Mike Christie, Jens Axboe
Avoid that the code for requeueing SCSI requests triggers a
crash by making sure that that code isn't scheduled anymore
after a device has been removed.
Also, source code inspection of __scsi_remove_device() revealed
a race condition in this function: no new SCSI requests must be
accepted for a SCSI device after device removal started.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joe Lawrence <jdl1291@gmail.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: <stable@kernel.org>
---
drivers/scsi/scsi_lib.c | 7 ++++---
drivers/scsi/scsi_sysfs.c | 11 +++++++++--
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 082c1e5..fc2b9f4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -155,13 +155,14 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
/*
* Requeue this command. It will go before all other commands
- * that are already in the queue.
+ * that are already in the queue. Schedule requeue work under
+ * lock such that the kblockd_schedule_work() call happens
+ * before blk_cleanup_queue() finishes.
*/
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, cmd->request);
- spin_unlock_irqrestore(q->queue_lock, flags);
-
kblockd_schedule_work(q, &device->requeue_work);
+ spin_unlock_irqrestore(q->queue_lock, flags);
}
/*
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 42c35ff..efffc92 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -966,13 +966,20 @@ void __scsi_remove_device(struct scsi_device *sdev)
device_del(dev);
} else
put_device(&sdev->sdev_dev);
+
+ /*
+ * Stop accepting new requests and wait until all queuecommand() and
+ * scsi_run_queue() invocations have finished before tearing down the
+ * device.
+ */
scsi_device_set_state(sdev, SDEV_DEL);
+ blk_cleanup_queue(sdev->request_queue);
+ cancel_work_sync(&sdev->requeue_work);
+
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
- /* Freeing the queue signals to block that we're done */
- blk_cleanup_queue(sdev->request_queue);
put_device(dev);
}
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-25 18:12 [PATCH 0/4 v9] SCSI device removal fixes Bart Van Assche
@ 2012-06-25 18:17 ` Bart Van Assche
2012-06-25 20:42 ` Tejun Heo
0 siblings, 1 reply; 26+ messages in thread
From: Bart Van Assche @ 2012-06-25 18:17 UTC (permalink / raw)
To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Tejun Heo,
Jun'ichi Nomura, Stefan Richter, Joe Lawrence
Avoid that the code for requeueing SCSI requests triggers a
crash by making sure that that code isn't scheduled anymore
after a device has been removed.
Also, source code inspection of __scsi_remove_device() revealed
a race condition in this function: no new SCSI requests must be
accepted for a SCSI device after device removal started.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joe Lawrence <jdl1291@gmail.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: <stable@kernel.org>
---
drivers/scsi/scsi_lib.c | 7 ++++---
drivers/scsi/scsi_sysfs.c | 11 +++++++++--
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 082c1e5..fc2b9f4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -155,13 +155,14 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
/*
* Requeue this command. It will go before all other commands
- * that are already in the queue.
+ * that are already in the queue. Schedule requeue work under
+ * lock such that the kblockd_schedule_work() call happens
+ * before blk_cleanup_queue() finishes.
*/
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, cmd->request);
- spin_unlock_irqrestore(q->queue_lock, flags);
-
kblockd_schedule_work(q, &device->requeue_work);
+ spin_unlock_irqrestore(q->queue_lock, flags);
}
/*
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 42c35ff..efffc92 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -966,13 +966,20 @@ void __scsi_remove_device(struct scsi_device *sdev)
device_del(dev);
} else
put_device(&sdev->sdev_dev);
+
+ /*
+ * Stop accepting new requests and wait until all queuecommand() and
+ * scsi_run_queue() invocations have finished before tearing down the
+ * device.
+ */
scsi_device_set_state(sdev, SDEV_DEL);
+ blk_cleanup_queue(sdev->request_queue);
+ cancel_work_sync(&sdev->requeue_work);
+
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
- /* Freeing the queue signals to block that we're done */
- blk_cleanup_queue(sdev->request_queue);
put_device(dev);
}
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device
2012-06-25 18:17 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
@ 2012-06-25 20:42 ` Tejun Heo
0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2012-06-25 20:42 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
Jun'ichi Nomura, Stefan Richter, Joe Lawrence
On Mon, Jun 25, 2012 at 06:17:18PM +0000, Bart Van Assche wrote:
> Avoid that the code for requeueing SCSI requests triggers a
> crash by making sure that that code isn't scheduled anymore
> after a device has been removed.
>
> Also, source code inspection of __scsi_remove_device() revealed
> a race condition in this function: no new SCSI requests must be
> accepted for a SCSI device after device removal started.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
> Cc: James Bottomley <JBottomley@parallels.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Joe Lawrence <jdl1291@gmail.com>
> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Cc: <stable@kernel.org>
> ---
> drivers/scsi/scsi_lib.c | 7 ++++---
> drivers/scsi/scsi_sysfs.c | 11 +++++++++--
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 082c1e5..fc2b9f4 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -155,13 +155,14 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
>
> /*
> * Requeue this command. It will go before all other commands
> - * that are already in the queue.
> + * that are already in the queue. Schedule requeue work under
> + * lock such that the kblockd_schedule_work() call happens
> + * before blk_cleanup_queue() finishes.
> */
> spin_lock_irqsave(q->queue_lock, flags);
> blk_requeue_request(q, cmd->request);
> - spin_unlock_irqrestore(q->queue_lock, flags);
> -
> kblockd_schedule_work(q, &device->requeue_work);
> + spin_unlock_irqrestore(q->queue_lock, flags);
This is rather subtle but yeah it would achieve proper
synchronization. Can't think of a better way ATM.
Aside from the nits, for all patches in this series
Acked-by: Tejun Heo <tj@kernel.org>
Thanks!
--
tejun
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2012-06-25 20:42 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-05 17:08 [PATCH 0/4 v7] Fixes for SCSI device removal Bart Van Assche
2012-06-05 17:10 ` [PATCH 1/4] block: Fix race on request_queue.end_io invocations Bart Van Assche
2012-06-05 21:32 ` Tejun Heo
2012-06-06 12:44 ` Bart Van Assche
2012-06-06 12:45 ` Jens Axboe
2012-06-06 13:10 ` Bart Van Assche
2012-06-05 17:11 ` [PATCH 2/4] scsi: Fix device removal NULL pointer dereference Bart Van Assche
2012-06-05 17:12 ` [PATCH 3/4] scsi: Change return type of scsi_queue_insert() into void Bart Van Assche
2012-06-05 17:14 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
2012-06-05 21:36 ` Mike Christie
2012-06-06 12:17 ` Bart Van Assche
2012-06-06 13:29 ` Mike Christie
2012-06-06 14:53 ` Bart Van Assche
2012-06-06 15:21 ` Mike Christie
2012-06-05 22:08 ` Mike Christie
2012-06-06 12:25 ` Bart Van Assche
2012-06-06 13:43 ` Mike Christie
2012-06-06 14:01 ` Mike Christie
2012-06-06 14:12 ` Mike Christie
2012-06-06 15:04 ` Bart Van Assche
2012-06-06 15:28 ` Mike Christie
2012-06-06 16:18 ` Bart Van Assche
2012-06-06 15:07 ` Bart Van Assche
-- strict thread matches above, loose matches on Subject: below --
2012-06-07 18:39 [PATCH 0/4 v8] Fixes for SCSI device removal Bart Van Assche
2012-06-07 18:46 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
2012-06-25 18:12 [PATCH 0/4 v9] SCSI device removal fixes Bart Van Assche
2012-06-25 18:17 ` [PATCH 4/4] scsi: Stop accepting SCSI requests before removing a device Bart Van Assche
2012-06-25 20:42 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).