[PATCH v12 0/6] SCSI device removal fixes

* [PATCH v12 0/6] SCSI device removal fixes
@ 2013-06-27 14:51 Bart Van Assche
  2013-06-27 14:52 ` [PATCH v12 1/6] Fix race between starved list and device removal Bart Van Assche
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Bart Van Assche @ 2013-06-27 14:51 UTC
  To: James Bottomley
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn

Fix a few issues related to SCSI device removal:
- Fix a race between starved list processing and device removal that
   can trigger a kernel oops.
- Prevent __scsi_remove_device() from being called twice for the same
   SCSI device, which can also cause a kernel oops.
- Restrict the SCSI device state changes allowed via sysfs.
- Make concurrent invocations of scsi_device_set_state() safe.
- Avoid re-enabling I/O after the transport layer became offline.

Changes compared to v11:
- Left out a patch that was not a device removal bug fix.
- Left out the patches on which there is no agreement yet.

Changes compared to v10:
- Rebased and retested on top of Linux kernel v3.10-rc5.

Changes compared to v9:
- Changed one WARN_ON() statement into a WARN() statement.

Changes compared to v8:
- Addressed the feedback from Joe Lawrence - dropped the patch that
   makes scsi_remove_host() wait until the last sdev user is gone.
- Eliminated Scsi_Host.tmf_in_progress since it duplicates state
   information available in Scsi_Host.eh_active.
- Added a patch to avoid reenabling I/O after the transport layer
   became offline.

Changes compared to v7:
- Addressed the review comments posted by Hannes Reinecke and Rolf Eike
   Beer.
- Modified patch "Make scsi_remove_host() wait until error handling
   finished" such that it is also safe for SCSI timeout values below
   the maximum LLD response time by modifying scsi_send_eh_cmnd() such
   that it does not invoke any LLD code after scsi_remove_host() started.
- Added a patch to save and restore the host_scribble field.
- Refined / clarified several patch descriptions.
- Rebased and retested on top of kernel v3.8-rc6.

Changes compared to v6:
- Dropped the first six patches since Jens queued these for 3.8.
- Added a patch to prevent __scsi_remove_device() from being invoked twice.
- Restore error recovery in the SHOST_CANCEL state.

Changes compared to v5:
- Prevent block layer work from being scheduled on a dead queue.
- Do not invoke any SCSI LLD callback after scsi_remove_host() finished.
- Stop error handling as soon as scsi_remove_host() started.
- Remove the unused function bsg_goose_queue().
- Prevent scsi_device_set_state() from triggering a race condition.

Changes compared to v4:
- Moved queue_flag_set(QUEUE_FLAG_DEAD, q) from blk_drain_queue() into
   blk_cleanup_queue().
- Declared the new __blk_run_queue_uncond() function inline. Checked in
   the generated assembler code that this function is really inlined in
   __blk_run_queue().
- Elaborated several patch descriptions.
- Added sparse annotations to scsi_request_fn().
- Split several patches.

Changes compared to v3:
- Fixed a race condition by setting QUEUE_FLAG_DEAD earlier.
- Added a patch for fixing a race between starved list processing
   and device removal to this series.

Changes compared to v2:
- Split second patch into two patches.
- Refined patch descriptions.

Changes compared to v1:
- Included a patch to rename QUEUE_FLAG_DEAD.
- Refined the descriptions of the __blk_run_queue_uncond() and
   blk_cleanup_queue() functions.



* [PATCH v12 1/6] Fix race between starved list and device removal
  2013-06-27 14:51 [PATCH v12 0/6] SCSI device removal fixes Bart Van Assche
@ 2013-06-27 14:52 ` Bart Van Assche
  2013-06-27 14:53 ` [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice Bart Van Assche
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2013-06-27 14:52 UTC
  Cc: James Bottomley, Mike Christie, Hannes Reinecke, Chanho Min,
	Joe Lawrence, linux-scsi, David Milburn, Tejun Heo

From: James Bottomley <JBottomley@Parallels.com>

scsi_run_queue() examines all SCSI devices that are present on
the starved list. Since scsi_run_queue() unlocks the SCSI host
lock a SCSI device can get removed after it has been removed
from the starved list and before its queue is run. Protect
against that race condition by holding a reference on the
queue while running it.

Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Chanho Min <chanho.min@lge.com>
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_lib.c |   26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 86d5220..df8bd5a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -434,6 +434,8 @@ static void scsi_run_queue(struct request_queue *q)
 	list_splice_init(&shost->starved_list, &starved_list);
 
 	while (!list_empty(&starved_list)) {
+		struct request_queue *slq;
+
 		/*
 		 * As long as shost is accepting commands and we have
 		 * starved queues, call blk_run_queue. scsi_request_fn
@@ -456,11 +458,25 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
-		spin_unlock(shost->host_lock);
-		spin_lock(sdev->request_queue->queue_lock);
-		__blk_run_queue(sdev->request_queue);
-		spin_unlock(sdev->request_queue->queue_lock);
-		spin_lock(shost->host_lock);
+		/*
+		 * Once we drop the host lock, a racing scsi_remove_device()
+		 * call may remove the sdev from the starved list and destroy
+		 * it and the queue.  Mitigate by taking a reference to the
+		 * queue and never touching the sdev again after we drop the
+		 * host lock.  Note: if __scsi_remove_device() invokes
+		 * blk_cleanup_queue() before the queue is run from this
+		 * function then blk_run_queue() will return immediately since
+		 * blk_cleanup_queue() marks the queue with QUEUE_FLAG_DYING.
+		 */
+		slq = sdev->request_queue;
+		if (!blk_get_queue(slq))
+			continue;
+		spin_unlock_irqrestore(shost->host_lock, flags);
+
+		blk_run_queue(slq);
+		blk_put_queue(slq);
+
+		spin_lock_irqsave(shost->host_lock, flags);
 	}
 	/* put any unprocessed entries back */
 	list_splice(&starved_list, &shost->starved_list);
-- 
1.7.10.4
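
A minimal user-space sketch of the "take a reference before dropping the lock" pattern described in this patch. It is not kernel code: struct model_queue and all model_* functions are hypothetical stand-ins, and the racing removal is simulated by a flag instead of a second thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct request_queue. */
struct model_queue {
	int refcount;	/* number of holders keeping the queue alive */
	bool dying;	/* set once teardown has started */
	int runs;	/* how many times the queue was actually run */
};

/* Models blk_get_queue(): fails once the queue is being torn down. */
static bool model_get_queue(struct model_queue *q)
{
	if (q->dying)
		return false;
	q->refcount++;
	return true;
}

/* Models blk_put_queue(): drops the reference taken above. */
static void model_put_queue(struct model_queue *q)
{
	assert(q->refcount > 0);
	q->refcount--;
}

/* Models the part of blk_cleanup_queue() that marks the queue dying. */
static void model_cleanup_queue(struct model_queue *q)
{
	q->dying = true;
}

/* Models blk_run_queue(): a dying queue is not run. */
static void model_run_queue(struct model_queue *q)
{
	if (!q->dying)
		q->runs++;
}

/*
 * Models one iteration of the starved-list loop. remove_in_window
 * simulates a removal that happens after the host lock was dropped.
 * Returns false if the queue was skipped.
 */
static bool model_run_starved(struct model_queue *q, bool remove_in_window)
{
	if (!model_get_queue(q))
		return false;		/* queue already going away */
	/* the host lock would be dropped here */
	if (remove_in_window)
		model_cleanup_queue(q);
	model_run_queue(q);		/* safe: we still hold a reference */
	model_put_queue(q);
	/* the host lock would be re-taken here */
	return true;
}
```

In this model the reference keeps the queue memory valid across the unlocked window, while the dying flag turns the run into a no-op if removal got there first.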



* [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice
  2013-06-27 14:51 [PATCH v12 0/6] SCSI device removal fixes Bart Van Assche
  2013-06-27 14:52 ` [PATCH v12 1/6] Fix race between starved list and device removal Bart Van Assche
@ 2013-06-27 14:53 ` Bart Van Assche
  2013-07-01  7:05   ` James Bottomley
  2013-06-27 14:54 ` [PATCH v12 3/6] Restrict device state changes allowed via sysfs Bart Van Assche
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2013-06-27 14:53 UTC
  Cc: James Bottomley, Mike Christie, Hannes Reinecke, Chanho Min,
	Joe Lawrence, linux-scsi, David Milburn, Tejun Heo

If something goes wrong during LUN scanning, e.g. a transport layer
failure occurs, then __scsi_remove_device() can get invoked by the
LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK. If
this happens then the SCSI device has not yet been added to sysfs
(is_visible == 0).  Make sure that in that case the transition into
state SDEV_DEL occurs. This prevents __scsi_remove_device() from being
invoked a second time by scsi_forget_host().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
---
 drivers/scsi/scsi_lib.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index df8bd5a..124392f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2193,6 +2193,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_CANCEL:
+		case SDEV_CREATED_BLOCK:
 			break;
 		default:
 			goto illegal;
-- 
1.7.10.4
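
A reduced sketch of the transition check this patch touches, written as stand-alone C. Only the predecessor states visible in the diff context are modelled; MODEL_OTHER is a hypothetical placeholder for every state not shown, and the allow_created_block switch exists only to compare the behaviour before and after the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, reduced copy of the device states named in the diff. */
enum model_state {
	MODEL_OFFLINE = 1,
	MODEL_TRANSPORT_OFFLINE,
	MODEL_CANCEL,
	MODEL_CREATED_BLOCK,
	MODEL_OTHER,	/* placeholder: any state not shown in the diff */
	MODEL_DEL,
};

/*
 * Models only the "new state is DEL" branch of the transition check.
 * allow_created_block selects the behaviour before (0) or after (1)
 * the patch. Returns 0 on success, -EINVAL for an illegal transition.
 */
static int model_set_del(enum model_state *state, int allow_created_block)
{
	switch (*state) {
	case MODEL_OFFLINE:
	case MODEL_TRANSPORT_OFFLINE:
	case MODEL_CANCEL:
		break;
	case MODEL_CREATED_BLOCK:
		if (allow_created_block)
			break;
		return -EINVAL;
	default:
		return -EINVAL;
	}
	*state = MODEL_DEL;
	return 0;
}
```

With the extra case label, a device that failed LUN scanning while blocked can reach the deleted state on the first removal attempt in this model.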



* [PATCH v12 3/6] Restrict device state changes allowed via sysfs
  2013-06-27 14:51 [PATCH v12 0/6] SCSI device removal fixes Bart Van Assche
  2013-06-27 14:52 ` [PATCH v12 1/6] Fix race between starved list and device removal Bart Van Assche
  2013-06-27 14:53 ` [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice Bart Van Assche
@ 2013-06-27 14:54 ` Bart Van Assche
  2013-07-01  8:23   ` Hannes Reinecke
  2013-07-01 14:51   ` James Bottomley
  2013-06-27 14:55 ` [PATCH v12 4/6] Avoid saving/restoring interrupt state inside scsi_remove_host() Bart Van Assche
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Bart Van Assche @ 2013-06-27 14:54 UTC
  To: Bart Van Assche
  Cc: James Bottomley, Mike Christie, Hannes Reinecke, Chanho Min,
	Joe Lawrence, linux-scsi, David Milburn, Tejun Heo

Restrict the SCSI device state changes allowed via sysfs to the
OFFLINE<>RUNNING transitions. Other transitions may confuse
the SCSI mid-layer. As an example, changing the state of a SCSI
device via sysfs into "cancel" or "deleted" prevents removal of
a SCSI device by scsi_remove_host().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: David Milburn <dmilburn@redhat.com>
---
 drivers/scsi/scsi_sysfs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 931a7d9..013c6de 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -605,7 +605,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 			break;
 		}
 	}
-	if (!state)
+	if (state != SDEV_OFFLINE && state != SDEV_RUNNING)
 		return -EINVAL;
 
 	if (scsi_device_set_state(sdev, state))
-- 
1.7.10.4
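
The effect of the changed condition can be sketched in stand-alone C as follows. This is an illustration under assumptions, not the kernel's store_state_field(): the model_* names and the four-entry name table are hypothetical, and input is matched with a plain strcmp() instead of the kernel's parsing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum model_state {
	MODEL_NONE = 0,	/* no name matched */
	MODEL_RUNNING,
	MODEL_CANCEL,
	MODEL_DEL,
	MODEL_OFFLINE,
};

/* Hypothetical, reduced name table for the states in this sketch. */
static const struct {
	enum model_state value;
	const char *name;
} model_names[] = {
	{ MODEL_RUNNING, "running" },
	{ MODEL_CANCEL,  "cancel"  },
	{ MODEL_DEL,     "deleted" },
	{ MODEL_OFFLINE, "offline" },
};

/*
 * Returns the requested state if user space may set it,
 * or -EINVAL otherwise.
 */
static int model_parse_state(const char *buf)
{
	enum model_state state = MODEL_NONE;
	size_t i;

	for (i = 0; i < sizeof(model_names) / sizeof(model_names[0]); i++) {
		if (strcmp(buf, model_names[i].name) == 0) {
			state = model_names[i].value;
			break;
		}
	}

	/* The patched check: only OFFLINE and RUNNING pass. */
	if (state != MODEL_OFFLINE && state != MODEL_RUNNING)
		return -EINVAL;

	return state;
}
```

The old check (!state) rejected only unknown names; the new one also rejects known names such as "cancel" and "deleted".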



* [PATCH v12 4/6] Avoid saving/restoring interrupt state inside scsi_remove_host()
  2013-06-27 14:51 [PATCH v12 0/6] SCSI device removal fixes Bart Van Assche
                   ` (2 preceding siblings ...)
  2013-06-27 14:54 ` [PATCH v12 3/6] Restrict device state changes allowed via sysfs Bart Van Assche
@ 2013-06-27 14:55 ` Bart Van Assche
  2013-06-27 14:56 ` [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race Bart Van Assche
  2013-06-27 14:57 ` [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline Bart Van Assche
  5 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2013-06-27 14:55 UTC
  To: James Bottomley
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

Since it is not allowed to invoke scsi_remove_host() with interrupts
disabled, avoid saving and restoring the interrupt state inside
scsi_remove_host(). This patch does not change the functionality of
the function scsi_remove_host().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@Parallels.com>
---
 drivers/scsi/hosts.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index df0c3c7..034a567 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -156,27 +156,25 @@ EXPORT_SYMBOL(scsi_host_set_state);
  **/
 void scsi_remove_host(struct Scsi_Host *shost)
 {
-	unsigned long flags;
-
 	mutex_lock(&shost->scan_mutex);
-	spin_lock_irqsave(shost->host_lock, flags);
+	spin_lock_irq(shost->host_lock);
 	if (scsi_host_set_state(shost, SHOST_CANCEL))
 		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) {
-			spin_unlock_irqrestore(shost->host_lock, flags);
+			spin_unlock_irq(shost->host_lock);
 			mutex_unlock(&shost->scan_mutex);
 			return;
 		}
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	spin_unlock_irq(shost->host_lock);
 
 	scsi_autopm_get_host(shost);
 	scsi_forget_host(shost);
 	mutex_unlock(&shost->scan_mutex);
 	scsi_proc_host_rm(shost);
 
-	spin_lock_irqsave(shost->host_lock, flags);
+	spin_lock_irq(shost->host_lock);
 	if (scsi_host_set_state(shost, SHOST_DEL))
 		BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY));
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	spin_unlock_irq(shost->host_lock);
 
 	transport_unregister_device(&shost->shost_gendev);
 	device_unregister(&shost->shost_dev);
-- 
1.7.10.4



* [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race
  2013-06-27 14:51 [PATCH v12 0/6] SCSI device removal fixes Bart Van Assche
                   ` (3 preceding siblings ...)
  2013-06-27 14:55 ` [PATCH v12 4/6] Avoid saving/restoring interrupt state inside scsi_remove_host() Bart Van Assche
@ 2013-06-27 14:56 ` Bart Van Assche
  2013-07-01 14:49   ` James Bottomley
  2013-06-27 14:57 ` [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline Bart Van Assche
  5 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2013-06-27 14:56 UTC
  To: Bart Van Assche
  Cc: James Bottomley, Mike Christie, Hannes Reinecke, Chanho Min,
	Joe Lawrence, linux-scsi, David Milburn, Tejun Heo

Make concurrent invocations of scsi_device_set_state() safe.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Hannes Reinecke <hare@suse.de>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_error.c |    4 ++++
 drivers/scsi/scsi_lib.c   |   43 ++++++++++++++++++++++++++++++++++---------
 drivers/scsi/scsi_scan.c  |   15 ++++++++-------
 drivers/scsi/scsi_sysfs.c |   24 +++++++++++++++++++-----
 4 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f43de1e..7006359 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1380,7 +1380,11 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
 			    "not ready after error recovery\n");
+
+		spin_lock_irq(scmd->device->host->host_lock);
 		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
+		spin_unlock_irq(scmd->device->host->host_lock);
+
 		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
 			/*
 			 * FIXME: Handle lost cmds.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 124392f..6a4fde7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2096,7 +2096,9 @@ EXPORT_SYMBOL(scsi_test_unit_ready);
  *	@state:	state to change to.
  *
  *	Returns zero if unsuccessful or an error if the requested 
- *	transition is illegal.
+ *	transition is illegal. It is the responsibility of the caller to make
+ *      sure that a call of this function does not race with other code that
+ *      accesses the device state, e.g. by holding the host lock.
  */
 int
 scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
@@ -2374,7 +2376,13 @@ EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
 int
 scsi_device_quiesce(struct scsi_device *sdev)
 {
-	int err = scsi_device_set_state(sdev, SDEV_QUIESCE);
+	struct Scsi_Host *host = sdev->host;
+	int err;
+
+	spin_lock_irq(host->host_lock);
+	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
+	spin_unlock_irq(host->host_lock);
+
 	if (err)
 		return err;
 
@@ -2398,13 +2406,21 @@ EXPORT_SYMBOL(scsi_device_quiesce);
  */
 void scsi_device_resume(struct scsi_device *sdev)
 {
+	struct Scsi_Host *host = sdev->host;
+	int err;
+
 	/* check if the device state was mutated prior to resume, and if
 	 * so assume the state is being managed elsewhere (for example
 	 * device deleted during suspend)
 	 */
-	if (sdev->sdev_state != SDEV_QUIESCE ||
-	    scsi_device_set_state(sdev, SDEV_RUNNING))
+	spin_lock_irq(host->host_lock);
+	err = sdev->sdev_state == SDEV_QUIESCE ?
+		scsi_device_set_state(sdev, SDEV_RUNNING) : -EINVAL;
+	spin_unlock_irq(host->host_lock);
+
+	if (err)
 		return;
+
 	scsi_run_queue(sdev->request_queue);
 }
 EXPORT_SYMBOL(scsi_device_resume);
@@ -2454,17 +2470,19 @@ EXPORT_SYMBOL(scsi_target_resume);
 int
 scsi_internal_device_block(struct scsi_device *sdev)
 {
+	struct Scsi_Host *host = sdev->host;
 	struct request_queue *q = sdev->request_queue;
 	unsigned long flags;
 	int err = 0;
 
+	spin_lock_irqsave(host->host_lock, flags);
 	err = scsi_device_set_state(sdev, SDEV_BLOCK);
-	if (err) {
+	if (err)
 		err = scsi_device_set_state(sdev, SDEV_CREATED_BLOCK);
+	spin_unlock_irqrestore(host->host_lock, flags);
 
-		if (err)
-			return err;
-	}
+	if (err)
+		return err;
 
 	/* 
 	 * The device has transitioned to SDEV_BLOCK.  Stop the
@@ -2499,13 +2517,16 @@ int
 scsi_internal_device_unblock(struct scsi_device *sdev,
 			     enum scsi_device_state new_state)
 {
+	struct Scsi_Host *host = sdev->host;
 	struct request_queue *q = sdev->request_queue; 
 	unsigned long flags;
+	int ret = 0;
 
 	/*
 	 * Try to transition the scsi device to SDEV_RUNNING or one of the
 	 * offlined states and goose the device queue if successful.
 	 */
+	spin_lock_irqsave(host->host_lock, flags);
 	if ((sdev->sdev_state == SDEV_BLOCK) ||
 	    (sdev->sdev_state == SDEV_TRANSPORT_OFFLINE))
 		sdev->sdev_state = new_state;
@@ -2517,7 +2538,11 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
 			sdev->sdev_state = SDEV_CREATED;
 	} else if (sdev->sdev_state != SDEV_CANCEL &&
 		 sdev->sdev_state != SDEV_OFFLINE)
-		return -EINVAL;
+		ret = -EINVAL;
+	spin_unlock_irqrestore(host->host_lock, flags);
+
+	if (ret)
+		return ret;
 
 	spin_lock_irqsave(q->queue_lock, flags);
 	blk_start_queue(q);
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..5041aa8 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -898,18 +898,19 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	if (*bflags & BLIST_USE_10_BYTE_MS)
 		sdev->use_10_for_ms = 1;
 
+	spin_lock_irq(sdev->host->host_lock);
 	/* set the device running here so that slave configure
 	 * may do I/O */
 	ret = scsi_device_set_state(sdev, SDEV_RUNNING);
-	if (ret) {
+	if (ret)
 		ret = scsi_device_set_state(sdev, SDEV_BLOCK);
+	spin_unlock_irq(sdev->host->host_lock);
 
-		if (ret) {
-			sdev_printk(KERN_ERR, sdev,
-				    "in wrong state %s to complete scan\n",
-				    scsi_device_state_name(sdev->sdev_state));
-			return SCSI_SCAN_NO_RESPONSE;
-		}
+	if (ret) {
+		sdev_printk(KERN_ERR, sdev,
+			    "in wrong state %s to complete scan\n",
+			    scsi_device_state_name(sdev->sdev_state));
+		return SCSI_SCAN_NO_RESPONSE;
 	}
 
 	if (*bflags & BLIST_MS_192_BYTES_FOR_3F)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 013c6de..dfbaa34 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -593,7 +593,7 @@ static ssize_t
 store_state_field(struct device *dev, struct device_attribute *attr,
 		  const char *buf, size_t count)
 {
-	int i;
+	int i, ret;
 	struct scsi_device *sdev = to_scsi_device(dev);
 	enum scsi_device_state state = 0;
 
@@ -608,9 +608,11 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 	if (state != SDEV_OFFLINE && state != SDEV_RUNNING)
 		return -EINVAL;
 
-	if (scsi_device_set_state(sdev, state))
-		return -EINVAL;
-	return count;
+	spin_lock_irq(sdev->host->host_lock);
+	ret = scsi_device_set_state(sdev, state);
+	spin_unlock_irq(sdev->host->host_lock);
+
+	return ret < 0 ? ret : count;
 }
 
 static ssize_t
@@ -870,7 +872,10 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 	struct request_queue *rq = sdev->request_queue;
 	struct scsi_target *starget = sdev->sdev_target;
 
+	spin_lock_irq(sdev->host->host_lock);
 	error = scsi_device_set_state(sdev, SDEV_RUNNING);
+	spin_unlock_irq(sdev->host->host_lock);
+
 	if (error)
 		return error;
 
@@ -952,10 +957,16 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 
 void __scsi_remove_device(struct scsi_device *sdev)
 {
+	struct Scsi_Host *shost = sdev->host;
 	struct device *dev = &sdev->sdev_gendev;
+	int res;
 
 	if (sdev->is_visible) {
-		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
+		spin_lock_irq(shost->host_lock);
+		res = scsi_device_set_state(sdev, SDEV_CANCEL);
+		spin_unlock_irq(shost->host_lock);
+
+		if (res != 0)
 			return;
 
 		bsg_unregister_queue(sdev->request_queue);
@@ -970,7 +981,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	 * scsi_run_queue() invocations have finished before tearing down the
 	 * device.
 	 */
+	spin_lock_irq(shost->host_lock);
 	scsi_device_set_state(sdev, SDEV_DEL);
+	spin_unlock_irq(shost->host_lock);
+
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
-- 
1.7.10.4
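
A single-threaded sketch of the locking contract this patch introduces: the state check and the state change happen inside one critical section, and the setter expects its caller to hold the lock. Everything here is a hypothetical model (struct model_lock only records whether the lock is held, and the transition table is reduced to RUNNING <-> QUIESCE); it does not demonstrate real concurrency.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the host lock: only tracks "held". */
struct model_lock {
	bool held;
};

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

enum model_state { MODEL_RUNNING = 1, MODEL_QUIESCE, MODEL_DEL };

struct model_sdev {
	struct model_lock host_lock;
	enum model_state state;
};

/* Caller must hold host_lock, mirroring the documented contract. */
static int model_set_state(struct model_sdev *sdev, enum model_state new_state)
{
	assert(sdev->host_lock.held);
	if (sdev->state == new_state)
		return 0;
	/* Reduced table: only RUNNING <-> QUIESCE is legal in this model. */
	if ((sdev->state == MODEL_RUNNING && new_state == MODEL_QUIESCE) ||
	    (sdev->state == MODEL_QUIESCE && new_state == MODEL_RUNNING)) {
		sdev->state = new_state;
		return 0;
	}
	return -EINVAL;
}

/* Models the patched resume path: check and set under one lock hold. */
static int model_resume(struct model_sdev *sdev)
{
	int err;

	model_lock_acquire(&sdev->host_lock);
	err = sdev->state == MODEL_QUIESCE ?
		model_set_state(sdev, MODEL_RUNNING) : -EINVAL;
	model_lock_release(&sdev->host_lock);

	return err;
}
```

Keeping the comparison with MODEL_QUIESCE inside the locked region is the point of the sketch: another path can no longer change the state between the check and the update.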



* [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline
  2013-06-27 14:51 [PATCH v12 0/6] SCSI device removal fixes Bart Van Assche
                   ` (4 preceding siblings ...)
  2013-06-27 14:56 ` [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race Bart Van Assche
@ 2013-06-27 14:57 ` Bart Van Assche
  2013-07-01  8:27   ` Hannes Reinecke
  5 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2013-06-27 14:57 UTC
  To: James Bottomley
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition such
that no I/O is sent to devices for which the transport is offline.
Notes:
- Functions like sd_shutdown() use scsi_execute_req() and hence
  set the REQ_PREEMPT flag. Such requests are passed to the LLD
  queuecommand callback in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these
  drivers invoke fc_remote_port_chkready() before submitting a SCSI
  request to the HBA. That prevents a timeout from occurring in state
  SDEV_CANCEL if the transport is offline.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
---
 drivers/scsi/scsi_lib.c   |    1 -
 drivers/scsi/scsi_sysfs.c |    4 +++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6a4fde7..63875c3 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2180,7 +2180,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_RUNNING:
 		case SDEV_QUIESCE:
 		case SDEV_OFFLINE:
-		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_BLOCK:
 			break;
 		default:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index dfbaa34..666b741 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -959,14 +959,16 @@ void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct Scsi_Host *shost = sdev->host;
 	struct device *dev = &sdev->sdev_gendev;
+	enum scsi_device_state sdev_state;
 	int res;
 
 	if (sdev->is_visible) {
 		spin_lock_irq(shost->host_lock);
+		sdev_state = sdev->sdev_state;
 		res = scsi_device_set_state(sdev, SDEV_CANCEL);
 		spin_unlock_irq(shost->host_lock);
 
-		if (res != 0)
+		if (res != 0 && sdev_state != SDEV_TRANSPORT_OFFLINE)
 			return;
 
 		bsg_unregister_queue(sdev->request_queue);
-- 
1.7.10.4



* Re: [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice
  2013-06-27 14:53 ` [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice Bart Van Assche
@ 2013-07-01  7:05   ` James Bottomley
  2013-07-01  7:14     ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2013-07-01  7:05 UTC
  To: Bart Van Assche
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo


On Thu, 2013-06-27 at 16:53 +0200, Bart Van Assche wrote:
> If something goes wrong during LUN scanning, e.g. a transport layer
> failure occurs, then __scsi_remove_device() can get invoked by the
> LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK. If
> this happens then the SCSI device has not yet been added to sysfs
> (is_visible == 0).  Make sure that in that case the transition into
> state SDEV_DEL occurs. This prevents __scsi_remove_device() from being
> invoked a second time by scsi_forget_host().

The patch summary of this one isn't true.  How about "enable destruction
of blocked devices which fail LUN scanning"

James




* Re: [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice
  2013-07-01  7:05   ` James Bottomley
@ 2013-07-01  7:14     ` Bart Van Assche
  2013-07-01 14:38       ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2013-07-01  7:14 UTC
  To: James Bottomley
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On 07/01/13 09:05, James Bottomley wrote:
>
> On Thu, 2013-06-27 at 16:53 +0200, Bart Van Assche wrote:
>> If something goes wrong during LUN scanning, e.g. a transport layer
>> failure occurs, then __scsi_remove_device() can get invoked by the
>> LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK. If
>> this happens then the SCSI device has not yet been added to sysfs
>> (is_visible == 0).  Make sure that in that case the transition into
>> state SDEV_DEL occurs. This prevents __scsi_remove_device() from being
>> invoked a second time by scsi_forget_host().
>
> The patch summary of this one isn't true.  How about "enable destruction
> of blocked devices which fail LUN scanning"

Hello James,

Do you want me to repost the patch series or is this something you can 
fix up ?

Thanks,

Bart.



* Re: [PATCH v12 3/6] Restrict device state changes allowed via sysfs
  2013-06-27 14:54 ` [PATCH v12 3/6] Restrict device state changes allowed via sysfs Bart Van Assche
@ 2013-07-01  8:23   ` Hannes Reinecke
  2013-07-01 14:51   ` James Bottomley
  1 sibling, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2013-07-01  8:23 UTC
  To: Bart Van Assche
  Cc: James Bottomley, Mike Christie, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On 06/27/2013 04:54 PM, Bart Van Assche wrote:
> Restrict the SCSI device state changes allowed via sysfs to the
> OFFLINE<>RUNNING transitions. Other transitions may confuse
> the SCSI mid-layer. As an example, changing the state of a SCSI
> device via sysfs into "cancel" or "deleted" prevents removal of
> a SCSI device by scsi_remove_host().
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: David Milburn <dmilburn@redhat.com>
> ---
>  drivers/scsi/scsi_sysfs.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 931a7d9..013c6de 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -605,7 +605,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>  			break;
>  		}
>  	}
> -	if (!state)
> +	if (state != SDEV_OFFLINE && state != SDEV_RUNNING)
>  		return -EINVAL;
>  
>  	if (scsi_device_set_state(sdev, state))
> 
Acked-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline
  2013-06-27 14:57 ` [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline Bart Van Assche
@ 2013-07-01  8:27   ` Hannes Reinecke
  2013-07-01 12:05     ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2013-07-01  8:27 UTC
  To: Bart Van Assche
  Cc: James Bottomley, Mike Christie, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On 06/27/2013 04:57 PM, Bart Van Assche wrote:
> Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition such
> that no I/O is sent to devices for which the transport is offline.
> Notes:
> - Functions like sd_shutdown() use scsi_execute_req() and hence
>   set the REQ_PREEMPT flag. Such requests are passed to the LLD
>   queuecommand callback in the SDEV_CANCEL state.
> - This patch does not affect Fibre Channel LLD drivers since these
>   drivers invoke fc_remote_port_chkready() before submitting a SCSI
>   request to the HBA. That prevents a timeout from occurring in state
>   SDEV_CANCEL if the transport is offline.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Tejun Heo <tj@kernel.org>
> ---
>  drivers/scsi/scsi_lib.c   |    1 -
>  drivers/scsi/scsi_sysfs.c |    4 +++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6a4fde7..63875c3 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2180,7 +2180,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>  		case SDEV_RUNNING:
>  		case SDEV_QUIESCE:
>  		case SDEV_OFFLINE:
> -		case SDEV_TRANSPORT_OFFLINE:
>  		case SDEV_BLOCK:
>  			break;
>  		default:
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index dfbaa34..666b741 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -959,14 +959,16 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  {
>  	struct Scsi_Host *shost = sdev->host;
>  	struct device *dev = &sdev->sdev_gendev;
> +	enum scsi_device_state sdev_state;
>  	int res;
>  
>  	if (sdev->is_visible) {
>  		spin_lock_irq(shost->host_lock);
> +		sdev_state = sdev->sdev_state;
>  		res = scsi_device_set_state(sdev, SDEV_CANCEL);
>  		spin_unlock_irq(shost->host_lock);
>  
> -		if (res != 0)
> +		if (res != 0 && sdev_state != SDEV_TRANSPORT_OFFLINE)
>  			return;
>  
>  		bsg_unregister_queue(sdev->request_queue);
> 
Hmm. This is really subtle. Do you mind inserting a comment
here on why this is required?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline
  2013-07-01  8:27   ` Hannes Reinecke
@ 2013-07-01 12:05     ` Bart Van Assche
  2013-07-01 12:09       ` Hannes Reinecke
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2013-07-01 12:05 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, Mike Christie, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On 07/01/13 10:27, Hannes Reinecke wrote:
> On 06/27/2013 04:57 PM, Bart Van Assche wrote:
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index dfbaa34..666b741 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -959,14 +959,16 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>   {
>>   	struct Scsi_Host *shost = sdev->host;
>>   	struct device *dev = &sdev->sdev_gendev;
>> +	enum scsi_device_state sdev_state;
>>   	int res;
>>   
>>   	if (sdev->is_visible) {
>>   		spin_lock_irq(shost->host_lock);
>> +		sdev_state = sdev->sdev_state;
>>   		res = scsi_device_set_state(sdev, SDEV_CANCEL);
>>   		spin_unlock_irq(shost->host_lock);
>>   
>> -		if (res != 0)
>> +		if (res != 0 && sdev_state != SDEV_TRANSPORT_OFFLINE)
>>   			return;
>>   
>>   		bsg_unregister_queue(sdev->request_queue);
>>
> Hmm. This is really subtle. Do you mind inserting a comment
> here on why this is required?

How about inserting the following comment just above the last if-statement
in the code cited above?

		/*
		 * The transition from SDEV_TRANSPORT_OFFLINE into SDEV_CANCEL
		 * is not allowed since this transition would re-enable I/O. If
		 * the device state was already SDEV_TRANSPORT_OFFLINE,
		 * proceed with device removal.
		 */
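
A minimal user-space sketch of the logic described by that comment. The
enum, struct, and function names are hypothetical stand-ins, not the
kernel definitions, and the transition rule is reduced to the single
case discussed here (the real scsi_device_set_state() accepts more
source states for SDEV_CANCEL):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum sdev_state_model {
	MODEL_RUNNING,
	MODEL_TRANSPORT_OFFLINE,
	MODEL_CANCEL,
	MODEL_DEL,
};

struct sdev_model {
	enum sdev_state_model state;
};

/*
 * Assumed rule for this sketch: CANCEL may only be entered from RUNNING,
 * so TRANSPORT_OFFLINE -> CANCEL is rejected. Other targets are accepted
 * unconditionally to keep the model short.
 * Returns 0 on success, -1 for a rejected transition.
 */
static int model_set_state(struct sdev_model *sdev,
			   enum sdev_state_model new_state)
{
	assert(sdev != NULL);
	if (new_state == MODEL_CANCEL && sdev->state != MODEL_RUNNING)
		return -1;
	sdev->state = new_state;
	return 0;
}

/* Returns true if removal proceeds past the state check. */
static bool model_remove_proceeds(struct sdev_model *sdev)
{
	/* Sample the old state before attempting the transition. */
	enum sdev_state_model old_state = sdev->state;
	int res = model_set_state(sdev, MODEL_CANCEL);

	/*
	 * A rejected transition normally aborts removal, except when the
	 * device was already in TRANSPORT_OFFLINE.
	 */
	if (res != 0 && old_state != MODEL_TRANSPORT_OFFLINE)
		return false;
	return true;
}
```

In this model a device in MODEL_TRANSPORT_OFFLINE keeps its state (so
I/O stays disabled) while removal still proceeds, whereas a device
already in MODEL_DEL causes an early return.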

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline
  2013-07-01 12:05     ` Bart Van Assche
@ 2013-07-01 12:09       ` Hannes Reinecke
  0 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2013-07-01 12:09 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Mike Christie, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On 07/01/2013 02:05 PM, Bart Van Assche wrote:
> On 07/01/13 10:27, Hannes Reinecke wrote:
>> On 06/27/2013 04:57 PM, Bart Van Assche wrote:
>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>>> index dfbaa34..666b741 100644
>>> --- a/drivers/scsi/scsi_sysfs.c
>>> +++ b/drivers/scsi/scsi_sysfs.c
>>> @@ -959,14 +959,16 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>>   {
>>>   	struct Scsi_Host *shost = sdev->host;
>>>   	struct device *dev = &sdev->sdev_gendev;
>>> +	enum scsi_device_state sdev_state;
>>>   	int res;
>>>   
>>>   	if (sdev->is_visible) {
>>>   		spin_lock_irq(shost->host_lock);
>>> +		sdev_state = sdev->sdev_state;
>>>   		res = scsi_device_set_state(sdev, SDEV_CANCEL);
>>>   		spin_unlock_irq(shost->host_lock);
>>>   
>>> -		if (res != 0)
>>> +		if (res != 0 && sdev_state != SDEV_TRANSPORT_OFFLINE)
>>>   			return;
>>>   
>>>   		bsg_unregister_queue(sdev->request_queue);
>>>
>> Hmm. This is really subtle. Do you mind inserting a comment
>> here on why this is required?
> 
> How about inserting the following comment just above the last if-statement
> in the code cited above?
> 
> 		/*
> 		 * The transition from SDEV_TRANSPORT_OFFLINE into SDEV_CANCEL
> 		 * is not allowed since this transition would re-enable I/O. If
> 		 * the device state was already SDEV_TRANSPORT_OFFLINE,
> 		 * proceed with device removal.
> 		 */
> 
> Bart.
> 
Perfect.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice
  2013-07-01  7:14     ` Bart Van Assche
@ 2013-07-01 14:38       ` James Bottomley
  0 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2013-07-01 14:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On Mon, 2013-07-01 at 09:14 +0200, Bart Van Assche wrote:
> On 07/01/13 09:05, James Bottomley wrote:
> >
> > On Thu, 2013-06-27 at 16:53 +0200, Bart Van Assche wrote:
> >> If something goes wrong during LUN scanning, e.g. a transport layer
> >> failure occurs, then __scsi_remove_device() can get invoked by the
> >> LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK. If
> >> this happens then the SCSI device has not yet been added to sysfs
> >> (is_visible == 0).  Make sure that in that case the transition into
> >> state SDEV_DEL occurs. This avoids that __scsi_remove_device() gets
> >> invoked a second time by scsi_forget_host().
> >
> > The patch summary of this one isn't true.  How about "enable destruction
> > of blocked devices which fail LUN scanning"
> 
> Hello James,
> 
> Do you want me to repost the patch series or is this something you can 
> fix up ?

I can fix it up, but if you repost, please change it.

Thanks,

James


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race
  2013-06-27 14:56 ` [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race Bart Van Assche
@ 2013-07-01 14:49   ` James Bottomley
  2013-07-01 15:17     ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2013-07-01 14:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On Thu, 2013-06-27 at 16:56 +0200, Bart Van Assche wrote:
> Make concurrent invocations of scsi_device_set_state() safe.

Firstly, I don't understand from this where you think the races are.
Secondly, shouldn't this be the device lock?  And thirdly, if we accept
that locking is required, encapsulate it in the function: having the
callers manage locking is asking for trouble.  The latter may require a
new lock for the state to avoid entanglement.
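
A minimal user-space sketch of what "encapsulate it in the function"
could look like, using a pthread mutex as a stand-in for a kernel lock.
All names here (lk_dev, state_lock, lk_set_state) and the transition
rule are hypothetical and only illustrate the shape of the idea:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum lk_state { LK_RUNNING, LK_OFFLINE, LK_BLOCK };

struct lk_dev {
	pthread_mutex_t state_lock;	/* hypothetical dedicated state lock */
	enum lk_state state;
};

static void lk_dev_init(struct lk_dev *dev)
{
	pthread_mutex_init(&dev->state_lock, NULL);
	dev->state = LK_RUNNING;
}

/*
 * The check and the update happen under one lock that is taken inside
 * the function, so callers do not manage any locking themselves.
 * Assumed rule for this sketch: OFFLINE and BLOCK may only be entered
 * from RUNNING. Returns 0 on success, -1 for a rejected transition.
 */
static int lk_set_state(struct lk_dev *dev, enum lk_state new_state)
{
	int res = 0;

	assert(dev != NULL);
	pthread_mutex_lock(&dev->state_lock);
	if (new_state != LK_RUNNING && dev->state != LK_RUNNING)
		res = -1;
	else
		dev->state = new_state;
	pthread_mutex_unlock(&dev->state_lock);
	return res;
}
```

With the lock internal to the function, a second caller always sees the
state written by the first one, and its transition is validated against
that state rather than a stale one.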

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v12 3/6] Restrict device state changes allowed via sysfs
  2013-06-27 14:54 ` [PATCH v12 3/6] Restrict device state changes allowed via sysfs Bart Van Assche
  2013-07-01  8:23   ` Hannes Reinecke
@ 2013-07-01 14:51   ` James Bottomley
  1 sibling, 0 replies; 19+ messages in thread
From: James Bottomley @ 2013-07-01 14:51 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On Thu, 2013-06-27 at 16:54 +0200, Bart Van Assche wrote:
> Restrict the SCSI device state changes allowed via sysfs to the
> OFFLINE<>RUNNING transitions. Other transitions may confuse
> the SCSI mid-layer. As an example, changing the state of a SCSI
> device via sysfs into "cancel" or "deleted" prevents removal of
> a SCSI device by scsi_remove_host().

This one's not ready for application.  I would like a debate on what we
should be doing.  Currently we don't apply any sanity checking at all.
Should we?  And should we police user state changes?  If so, what
changes should we allow?  I opine that really only OFFLINE <-> RUNNING
make sense, but I've no idea what people actually use this field for (if
they use it at all).
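
A minimal sketch of the policy under discussion, assuming only the
OFFLINE <-> RUNNING transitions are accepted from user space. The enum
and the function name are hypothetical and do not correspond to the
actual sysfs store handler:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of device states, for illustration only. */
enum user_state {
	USER_RUNNING,
	USER_OFFLINE,
	USER_CANCEL,
	USER_DEL,
};

/*
 * Returns true only for the two transitions considered sensible for a
 * user-initiated state change: RUNNING -> OFFLINE and OFFLINE -> RUNNING.
 */
static bool user_transition_allowed(enum user_state from, enum user_state to)
{
	if (from == USER_RUNNING && to == USER_OFFLINE)
		return true;
	if (from == USER_OFFLINE && to == USER_RUNNING)
		return true;
	return false;
}
```

Under this rule a user request to move a device into "cancel" or
"deleted" would be refused, which is the case the patch description
names as confusing for the mid-layer.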

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race
  2013-07-01 14:49   ` James Bottomley
@ 2013-07-01 15:17     ` Bart Van Assche
  2013-07-01 16:52       ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2013-07-01 15:17 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On 07/01/13 16:49, James Bottomley wrote:
> On Thu, 2013-06-27 at 16:56 +0200, Bart Van Assche wrote:
>> Make concurrent invocations of scsi_device_set_state() safe.
>
> Firstly, I don't understand from this where you think the races are.
> Secondly, shouldn't this be the device lock? and thirdly, if we accept
> that locking is required, encapsulate it in the function: Having the
> callers manage locking is asking for trouble.  The latter may require a
> new lock for the state to avoid entanglement.

Today there is no guarantee that scsi_device_set_state() calls are 
serialized, so two scsi_device_set_state() invocations may be in 
progress concurrently. It is e.g. possible that both calls report 
"device state has been changed successfully" to their callers although 
only one of these two state changes will be effective due to the race.
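
A minimal sketch of the interleaving described above, replayed
deterministically in a single thread. The names and the transition rule
are hypothetical; the point is only that splitting "validate" and
"commit" without serialization lets both callers see success:

```c
#include <assert.h>
#include <stddef.h>

enum state_model { ST_RUNNING, ST_OFFLINE, ST_BLOCK };

struct dev_model {
	enum state_model state;
};

/*
 * Step 1 of a state change: validate against the state visible now.
 * Assumed rule for this sketch: OFFLINE and BLOCK may only be entered
 * from RUNNING.
 */
static int check_transition(const struct dev_model *dev,
			    enum state_model new_state)
{
	assert(dev != NULL);
	if (new_state != ST_RUNNING && dev->state != ST_RUNNING)
		return -1;
	return 0;
}

/* Step 2 of a state change: commit the new state. */
static void commit_transition(struct dev_model *dev,
			      enum state_model new_state)
{
	dev->state = new_state;
}

/*
 * Replays one possible unserialized interleaving: both callers validate
 * before either one commits. Stores each caller's result in res_a and
 * res_b and returns the final device state.
 */
static enum state_model replay_race(int *res_a, int *res_b)
{
	struct dev_model dev = { ST_RUNNING };

	*res_a = check_transition(&dev, ST_OFFLINE);	/* caller A validates */
	*res_b = check_transition(&dev, ST_BLOCK);	/* caller B validates */
	if (*res_a == 0)
		commit_transition(&dev, ST_OFFLINE);	/* caller A commits */
	if (*res_b == 0)
		commit_transition(&dev, ST_BLOCK);	/* caller B overwrites A */
	return dev.state;
}
```

In this replay both results are 0, yet caller A's change is overwritten
by caller B. Had the two calls been serialized, B's request would have
been validated against the state A left behind and rejected under the
assumed rule.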

At the time I wrote this patch I think there was a caller that invoked 
scsi_device_set_state() with the host lock held. Hence my choice for the 
host lock. However, I can't find that caller anymore. So the suggestion 
to use the device lock instead makes sense to me. I'll double-check
that no callers of that function already hold the device lock.

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race
  2013-07-01 15:17     ` Bart Van Assche
@ 2013-07-01 16:52       ` James Bottomley
  2013-07-02  6:42         ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2013-07-01 16:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On Mon, 2013-07-01 at 17:17 +0200, Bart Van Assche wrote:
> On 07/01/13 16:49, James Bottomley wrote:
> > On Thu, 2013-06-27 at 16:56 +0200, Bart Van Assche wrote:
> >> Make concurrent invocations of scsi_device_set_state() safe.
> >
> > Firstly, I don't understand from this where you think the races are.
> > Secondly, shouldn't this be the device lock? and thirdly, if we accept
> > that locking is required, encapsulate it in the function: Having the
> > callers manage locking is asking for trouble.  The latter may require a
> > new lock for the state to avoid entanglement.
> 
> Today there is no guarantee that scsi_device_set_state() calls are 
> serialized, so two scsi_device_set_state() invocations may be in 
> progress concurrently. It is e.g. possible that both calls report 
> "device state has been changed successfully" to their callers although 
> only one of these two state changes will be effective due to the race.

We could say the above about a significant fraction of the functions in
the kernel; it's not a reason to add fine grained locking to them all.

I want to know what the actual races you're trying to fix are; what
causes them and, in particular, is adding yet another fine grained lock
going to mitigate them effectively or should they be mediated in a
different way.

James


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race
  2013-07-01 16:52       ` James Bottomley
@ 2013-07-02  6:42         ` Bart Van Assche
  0 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2013-07-02  6:42 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, Hannes Reinecke, Chanho Min, Joe Lawrence,
	linux-scsi, David Milburn, Tejun Heo

On 07/01/13 18:52, James Bottomley wrote:
> On Mon, 2013-07-01 at 17:17 +0200, Bart Van Assche wrote:
>> On 07/01/13 16:49, James Bottomley wrote:
>>> On Thu, 2013-06-27 at 16:56 +0200, Bart Van Assche wrote:
>>>> Make concurrent invocations of scsi_device_set_state() safe.
>>>
>>> Firstly, I don't understand from this where you think the races are.
>>> Secondly, shouldn't this be the device lock? and thirdly, if we accept
>>> that locking is required, encapsulate it in the function: Having the
>>> callers manage locking is asking for trouble.  The latter may require a
>>> new lock for the state to avoid entanglement.
>>
>> Today there is no guarantee that scsi_device_set_state() calls are
>> serialized, so two scsi_device_set_state() invocations may be in
>> progress concurrently. It is e.g. possible that both calls report
>> "device state has been changed successfully" to their callers although
>> only one of these two state changes will be effective due to the race.
>
> We could say the above about a significant fraction of the functions in
> the kernel; it's not a reason to add fine grained locking to them all.
>
> I want to know what the actual races you're trying to fix are; what
> causes them and, in particular, is adding yet another fine grained lock
> going to mitigate them effectively or should they be mediated in a
> different way.

Since this patch is something I came up with as the result of source
reading, maybe I should defer it to a later time so that it
doesn't slow down acceptance of this patch series.

Bart.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-07-02  6:42 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-27 14:51 [PATCH v12 0/6] SCSI device removal fixes Bart Van Assche
2013-06-27 14:52 ` [PATCH v12 1/6] Fix race between starved list and device removal Bart Van Assche
2013-06-27 14:53 ` [PATCH v12 2/6] Avoid calling __scsi_remove_device() twice Bart Van Assche
2013-07-01  7:05   ` James Bottomley
2013-07-01  7:14     ` Bart Van Assche
2013-07-01 14:38       ` James Bottomley
2013-06-27 14:54 ` [PATCH v12 3/6] Restrict device state changes allowed via sysfs Bart Van Assche
2013-07-01  8:23   ` Hannes Reinecke
2013-07-01 14:51   ` James Bottomley
2013-06-27 14:55 ` [PATCH v12 4/6] Avoid saving/restoring interrupt state inside scsi_remove_host() Bart Van Assche
2013-06-27 14:56 ` [PATCH v12 5/6] Avoid that scsi_device_set_state() triggers a race Bart Van Assche
2013-07-01 14:49   ` James Bottomley
2013-07-01 15:17     ` Bart Van Assche
2013-07-01 16:52       ` James Bottomley
2013-07-02  6:42         ` Bart Van Assche
2013-06-27 14:57 ` [PATCH v12 6/6] Avoid re-enabling I/O after the transport became offline Bart Van Assche
2013-07-01  8:27   ` Hannes Reinecke
2013-07-01 12:05     ` Bart Van Assche
2013-07-01 12:09       ` Hannes Reinecke
