* [PATCH 1/4] Fix race between starved list and device removal
2013-07-02 13:04 [PATCH v13 0/4] SCSI device removal fixes Bart Van Assche
@ 2013-07-02 13:05 ` Bart Van Assche
2013-07-02 13:06 ` [PATCH 2/4] Avoid calling __scsi_remove_device() twice Bart Van Assche
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2013-07-02 13:05 UTC
To: James Bottomley
Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
linux-scsi, David Milburn, Tejun Heo
From: James Bottomley <JBottomley@Parallels.com>
scsi_run_queue() examines all SCSI devices that are present on
the starved list. Since scsi_run_queue() drops the SCSI host
lock, a SCSI device can be removed after it has been taken off
the starved list and before its queue is run. Protect against
that race condition by holding a reference on the queue while
running it.
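
The pin-before-unlock pattern described above can be sketched in user
space as follows. This is only a toy model, not kernel code: the
toy_queue type and all toy_* helpers are hypothetical stand-ins for
struct request_queue, blk_get_queue(), blk_put_queue(),
blk_cleanup_queue() and blk_run_queue(), and the "dying" flag mirrors
the QUEUE_FLAG_DYING behavior mentioned in the patch comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request_queue. */
struct toy_queue {
	int refcount;	/* number of holders; "freed" when it hits zero */
	bool dying;	/* set by cleanup, like QUEUE_FLAG_DYING */
	bool freed;	/* set when the last reference is dropped */
	int runs;	/* how often the queue was actually run */
};

/* Like blk_get_queue(): refuse new references once the queue is dying. */
static bool toy_get_queue(struct toy_queue *q)
{
	if (q->dying)
		return false;
	q->refcount++;
	return true;
}

/* Like blk_put_queue(): the last put releases the object. */
static void toy_put_queue(struct toy_queue *q)
{
	if (--q->refcount == 0)
		q->freed = true;
}

/* Like blk_cleanup_queue(): mark dying, drop the owner's reference. */
static void toy_cleanup_queue(struct toy_queue *q)
{
	q->dying = true;
	toy_put_queue(q);
}

/* Like blk_run_queue() in this model: a dying queue is not run. */
static void toy_run_queue(struct toy_queue *q)
{
	if (!q->dying)
		q->runs++;
}

/*
 * One loop iteration: pin the queue, "drop the host lock" (device
 * removal may race here), run the queue, unpin it. Returns whether the
 * queue was still allocated while it was being run.
 */
static bool toy_run_starved(struct toy_queue *q, bool removal_races)
{
	bool alive_during_run;

	if (!toy_get_queue(q))
		return false;
	/* The host lock would be dropped at this point. */
	if (removal_races)
		toy_cleanup_queue(q);
	alive_during_run = !q->freed;
	toy_run_queue(q);
	toy_put_queue(q);
	return alive_during_run;
}
```

With the extra reference, a racing removal only marks the queue as
dying; the memory stays valid until the final put in
toy_run_starved().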
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Chanho Min <chanho.min@lge.com>
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
drivers/scsi/scsi_lib.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 86d5220..df8bd5a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -434,6 +434,8 @@ static void scsi_run_queue(struct request_queue *q)
list_splice_init(&shost->starved_list, &starved_list);
while (!list_empty(&starved_list)) {
+ struct request_queue *slq;
+
/*
* As long as shost is accepting commands and we have
* starved queues, call blk_run_queue. scsi_request_fn
@@ -456,11 +458,25 @@ static void scsi_run_queue(struct request_queue *q)
continue;
}
- spin_unlock(shost->host_lock);
- spin_lock(sdev->request_queue->queue_lock);
- __blk_run_queue(sdev->request_queue);
- spin_unlock(sdev->request_queue->queue_lock);
- spin_lock(shost->host_lock);
+ /*
+ * Once we drop the host lock, a racing scsi_remove_device()
+ * call may remove the sdev from the starved list and destroy
+ * it and the queue. Mitigate by taking a reference to the
+ * queue and never touching the sdev again after we drop the
+ * host lock. Note: if __scsi_remove_device() invokes
+ * blk_cleanup_queue() before the queue is run from this
+ * function then blk_run_queue() will return immediately since
+ * blk_cleanup_queue() marks the queue with QUEUE_FLAG_DYING.
+ */
+ slq = sdev->request_queue;
+ if (!blk_get_queue(slq))
+ continue;
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
+ blk_run_queue(slq);
+ blk_put_queue(slq);
+
+ spin_lock_irqsave(shost->host_lock, flags);
}
/* put any unprocessed entries back */
list_splice(&starved_list, &shost->starved_list);
--
1.7.10.4
* [PATCH 2/4] Avoid calling __scsi_remove_device() twice
From: Bart Van Assche @ 2013-07-02 13:06 UTC
To: James Bottomley
Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
linux-scsi, David Milburn, Tejun Heo
If something goes wrong during LUN scanning, e.g. a transport layer
failure occurs, then __scsi_remove_device() can get invoked by the
LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK and
before the SCSI device has been added to sysfs (is_visible == 0).
Make sure that even in this case the transition into state SDEV_DEL
occurs. This prevents scsi_forget_host() from invoking
__scsi_remove_device() a second time when it runs in a different
thread than the one that performs LUN scanning.
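
The effect of the added transition can be illustrated with a small
model. It is a sketch under assumptions, not the kernel code: the lun_*
names are hypothetical, only the old states visible in the hunk below
are modelled, and the second caller is assumed to skip devices that
already reached the deleted state, as the description above implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the SCSI device states. */
enum lun_state {
	LUN_CREATED_BLOCK,
	LUN_OFFLINE,
	LUN_TRANSPORT_OFFLINE,
	LUN_CANCEL,
	LUN_DEL,
};

struct lun_dev {
	enum lun_state state;
	int removals;	/* how often the removal work was executed */
};

/* May a device in state 'old' move to LUN_DEL? */
static bool lun_can_enter_del(enum lun_state old, bool patched)
{
	switch (old) {
	case LUN_OFFLINE:
	case LUN_TRANSPORT_OFFLINE:
	case LUN_CANCEL:
		return true;
	case LUN_CREATED_BLOCK:
		return patched;	/* the transition this patch adds */
	default:
		return false;
	}
}

/* Models one removal attempt; callers skip already deleted devices. */
static void lun_remove(struct lun_dev *d, bool patched)
{
	if (d->state == LUN_DEL)
		return;
	d->removals++;
	if (lun_can_enter_del(d->state, patched))
		d->state = LUN_DEL;
}
```

Calling lun_remove() twice on a device in LUN_CREATED_BLOCK performs
the removal work twice without the new transition and once with it.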
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
---
drivers/scsi/scsi_lib.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index df8bd5a..124392f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2193,6 +2193,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
case SDEV_OFFLINE:
case SDEV_TRANSPORT_OFFLINE:
case SDEV_CANCEL:
+ case SDEV_CREATED_BLOCK:
break;
default:
goto illegal;
--
1.7.10.4
* [PATCH 3/4] Avoid re-enabling I/O after the transport became offline
From: Bart Van Assche @ 2013-07-02 13:07 UTC
To: James Bottomley
Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
linux-scsi, David Milburn, Tejun Heo
Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition such
that no I/O is sent to devices for which the transport is offline.
Notes:
- Functions like sd_shutdown() use scsi_execute_req() and hence
set the REQ_PREEMPT flag. Such requests are passed to the LLD
queuecommand callback in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these
drivers invoke fc_remote_port_chkready() before submitting a SCSI
request to the HBA. That prevents a timeout from occurring in
state SDEV_CANCEL if the transport is offline.
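
The resulting check in the removal path can be modelled as below. This
is a simplified sketch: the rm_* names are hypothetical, only a few
states are modelled, and rm_set_cancel() stands in for
scsi_device_set_state(sdev, SDEV_CANCEL), which is assumed to leave the
state unchanged and return nonzero for a disallowed transition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the SCSI device states. */
enum rm_state { RM_RUNNING, RM_TRANSPORT_OFFLINE, RM_CANCEL, RM_DEL };

/* Try to enter the cancel state; 0 on success, -1 if not allowed. */
static int rm_set_cancel(enum rm_state *s)
{
	switch (*s) {
	case RM_RUNNING:
		*s = RM_CANCEL;
		return 0;
	default:
		/* Includes RM_TRANSPORT_OFFLINE after this patch. */
		return -1;
	}
}

/* Mirrors the patched condition at the top of the removal path. */
static bool rm_removal_proceeds(enum rm_state *s)
{
	if (rm_set_cancel(s) != 0 && *s != RM_TRANSPORT_OFFLINE)
		return false;
	return true;
}
```

A device in RM_TRANSPORT_OFFLINE is still removed, but it never passes
through the cancel state in which preempt requests would reach the LLD.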
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
---
drivers/scsi/scsi_lib.c | 1 -
drivers/scsi/scsi_sysfs.c | 9 ++++++++-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 124392f..a0fb56b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2178,7 +2178,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
case SDEV_RUNNING:
case SDEV_QUIESCE:
case SDEV_OFFLINE:
- case SDEV_TRANSPORT_OFFLINE:
case SDEV_BLOCK:
break;
default:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 931a7d9..1711617 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -955,7 +955,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
struct device *dev = &sdev->sdev_gendev;
if (sdev->is_visible) {
- if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
+ /*
+ * The transition from SDEV_TRANSPORT_OFFLINE into
+ * SDEV_CANCEL is not allowed since this transition would
+ * reenable I/O. However, if the device state was already
+ * SDEV_TRANSPORT_OFFLINE, proceed with device removal.
+ */
+ if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0 &&
+ sdev->sdev_state != SDEV_TRANSPORT_OFFLINE)
return;
bsg_unregister_queue(sdev->request_queue);
--
1.7.10.4
* Re: [PATCH 3/4] Avoid re-enabling I/O after the transport became offline
From: James Bottomley @ 2013-07-02 13:44 UTC
To: Bart Van Assche
Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
linux-scsi, David Milburn, Tejun Heo
On Tue, 2013-07-02 at 15:07 +0200, Bart Van Assche wrote:
> Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition such
> that no I/O is sent to devices for which the transport is offline.
> Notes:
> - Functions like sd_shutdown() use scsi_execute_req() and hence
> set the REQ_PREEMPT flag. Such requests are passed to the LLD
> queuecommand callback in the SDEV_CANCEL state.
> - This patch does not affect Fibre Channel LLD drivers since these
> drivers invoke fc_remote_port_chkready() before submitting a SCSI
> request to the HBA. That prevents a timeout to occur in state
> SDEV_CANCEL if the transport is offline.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Tejun Heo <tj@kernel.org>
> ---
> drivers/scsi/scsi_lib.c | 1 -
> drivers/scsi/scsi_sysfs.c | 9 ++++++++-
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 124392f..a0fb56b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2178,7 +2178,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
> case SDEV_RUNNING:
> case SDEV_QUIESCE:
> case SDEV_OFFLINE:
> - case SDEV_TRANSPORT_OFFLINE:
> case SDEV_BLOCK:
> break;
> default:
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 931a7d9..1711617 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -955,7 +955,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
> struct device *dev = &sdev->sdev_gendev;
>
> if (sdev->is_visible) {
> - if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> + /*
> + * The transition from SDEV_TRANSPORT_OFFLINE into
> + * SDEV_CANCEL is not allowed since this transition would
> + * reenable I/O. However, if the device state was already
> + * SDEV_TRANSPORT_OFFLINE, proceed with device removal.
> + */
> + if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0 &&
> + sdev->sdev_state != SDEV_TRANSPORT_OFFLINE)
This isn't the right way to do this, because it's adding uncharted state
to the state model. What should happen is that this should be reflected
in the actual state model. It sounds like we need a CANCEL_OFFLINE
state to which TRANSPORT_OFFLINE (and possibly OFFLINE) can transition.
The comment on the transition should state that CANCEL_OFFLINE won't
allow any I/O.
James
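
One possible reading of the suggestion above, expressed as a toy state
model. Everything here is an assumption made for illustration: no
CANCEL_OFFLINE state exists in the posted patches, the co_* names are
hypothetical, and the real transition table has more states than are
shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state set including the proposed CANCEL_OFFLINE. */
enum co_state {
	CO_RUNNING,
	CO_OFFLINE,
	CO_TRANSPORT_OFFLINE,
	CO_CANCEL,
	CO_CANCEL_OFFLINE,
	CO_DEL,
};

/* Explicit transition table: is the move from 'from' to 'to' legal? */
static bool co_transition_ok(enum co_state from, enum co_state to)
{
	switch (to) {
	case CO_CANCEL:
		return from == CO_RUNNING || from == CO_OFFLINE;
	case CO_CANCEL_OFFLINE:
		/* CANCEL_OFFLINE won't allow any I/O. */
		return from == CO_TRANSPORT_OFFLINE;
	case CO_DEL:
		return from == CO_CANCEL || from == CO_CANCEL_OFFLINE;
	default:
		return false;
	}
}

/* May any request, preempt requests included, reach the LLD? */
static bool co_io_allowed(enum co_state s)
{
	return s == CO_RUNNING || s == CO_CANCEL;
}
```

In this model the removal path never has to inspect the previous state
after a failed transition; it picks the target state that matches the
current one.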
* [PATCH 4/4] Disallow changing the device state via sysfs into "deleted"
From: Bart Van Assche @ 2013-07-02 13:08 UTC
To: James Bottomley
Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
linux-scsi, David Milburn, Tejun Heo
Changing the state of a SCSI device via sysfs into "cancel" or
"deleted" prevents removal of these devices by scsi_remove_host().
Hence do not allow this.
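
The filtering done by the patched store function can be sketched as
follows. This is a user-space approximation: the ss_* names and the
reduced name table are hypothetical, and the final call to
scsi_device_set_state() is left out, so every remaining known state is
simply accepted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of device states; 0 means "no match". */
enum ss_state { SS_NONE = 0, SS_RUNNING, SS_CANCEL, SS_DEL, SS_OFFLINE };

static const struct {
	enum ss_state value;
	const char *name;
} ss_names[] = {
	{ SS_RUNNING, "running" },
	{ SS_CANCEL,  "cancel"  },
	{ SS_DEL,     "deleted" },
	{ SS_OFFLINE, "offline" },
};

/* Parse a sysfs-style write; returns 0 or -EINVAL. */
static int ss_store_state(const char *buf, enum ss_state *cur)
{
	enum ss_state state = SS_NONE;
	size_t i;

	for (i = 0; i < sizeof(ss_names) / sizeof(ss_names[0]); i++) {
		size_t len = strlen(ss_names[i].name);

		if (strncmp(ss_names[i].name, buf, len) == 0 &&
		    (buf[len] == '\n' || buf[len] == '\0')) {
			state = ss_names[i].value;
			break;
		}
	}
	/* Unknown names and the two removal states are rejected. */
	if (state == SS_NONE || state == SS_CANCEL || state == SS_DEL)
		return -EINVAL;
	*cur = state;
	return 0;
}
```

For example, writing "offline\n" changes the state, while "cancel\n"
and "deleted\n" are rejected and leave the state untouched.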
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: David Milburn <dmilburn@redhat.com>
---
drivers/scsi/scsi_sysfs.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 1711617..292df85 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -605,10 +605,8 @@ store_state_field(struct device *dev, struct device_attribute *attr,
break;
}
}
- if (!state)
- return -EINVAL;
-
- if (scsi_device_set_state(sdev, state))
+ if (state == 0 || state == SDEV_CANCEL || state == SDEV_DEL ||
+ scsi_device_set_state(sdev, state))
return -EINVAL;
return count;
}
--
1.7.10.4