[PATCH 4/7] Fix various bugs in the target code

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 4/7] Fix various bugs in the target code
@ 2009-08-10 15:08 Alan Stern
  2009-08-10 15:53 ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Stern @ 2009-08-10 15:08 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

This patch (as1276) fixes some bugs in the SCSI target code:

	In scsi_alloc_target(), a newly-created target was added to
	the host's list of targets before the host's target_alloc()
	method was called.

	In the same routine, if a match was found with an old target
	in the DEL state then that target's reap_ref was mistakenly
	incremented.  This is harmless, but it shouldn't happen.

	If we have to wait for an old target to disappear, instead of
	calling flush_scheduled_work() the patch calls
	schedule_timeout_uninterruptible(1).  After all, whatever is
	pinning the old target might not have anything to do with a
	workqueue.  Besides, flush_scheduled_work() is prone to
	deadlocks and should never be used by drivers.

	scsi_target_reap() changed starget->state outside the
	protection of the host lock.

	__scsi_add_device() called scsi_alloc_target() outside the
	protection of the host's scan_mutex, meaning that it might
	find an incompletely-initialized target or it might create
	a duplicate target.

	scsi_sysfs_add_sdev() would call transport_configure_device()
	for a target each time a new device was added under that
	target.  The call has been moved to scsi_target_add(), where
	it will be made only once.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

Index: usb-2.6/drivers/scsi/scsi_scan.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_scan.c
+++ usb-2.6/drivers/scsi/scsi_scan.c
@@ -376,6 +376,9 @@ static struct scsi_target *__scsi_find_t
  *
  * The target is returned with an incremented reference, so the caller
  * is responsible for both reaping and doing a last put
+ *
+ * This routine should be called only under the protection of the host's
+ * scan_mutex.
  */
 static struct scsi_target *scsi_alloc_target(struct device *parent,
 					     int channel, uint id)
@@ -418,7 +421,6 @@ static struct scsi_target *scsi_alloc_ta
 	if (found_target)
 		goto found;
 
-	list_add_tail(&starget->siblings, &shost->__targets);
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	/* allocate and add */
 	transport_setup_device(dev);
@@ -433,22 +435,26 @@ static struct scsi_target *scsi_alloc_ta
 	}
 	spin_lock_irqsave(shost->host_lock, flags);
 	starget->state = STARGET_CREATED;
+	list_add_tail(&starget->siblings, &shost->__targets);
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	get_device(dev);
 
 	return starget;
 
  found:
-	found_target->reap_ref++;
-	spin_unlock_irqrestore(shost->host_lock, flags);
 	if (found_target->state != STARGET_DEL) {
+		found_target->reap_ref++;
+		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_target_reap(starget);
 		return found_target;
 	}
-	/* Unfortunately, we found a dying target; need to
-	 * wait until it's dead before we can get a new one */
+
+	/* Unfortunately, we found a dying target; we need to
+	 * wait until it's gone before we can get a new one.
+	 */
+	spin_unlock_irqrestore(shost->host_lock, flags);
 	put_device(&found_target->dev);
-	flush_scheduled_work();
+	schedule_timeout_uninterruptible(1);
 	goto retry;
 }
 
@@ -471,13 +477,14 @@ void scsi_target_reap(struct scsi_target
 	spin_lock_irqsave(shost->host_lock, flags);
 	state = starget->state;
 	empty = --starget->reap_ref == 0;
+	if (empty)
+		starget->state = STARGET_DEL;
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	if (!empty)
 		return;
 
 	BUG_ON(state == STARGET_DEL || !list_empty(&starget->devices));
-	starget->state = STARGET_DEL;
 	if (state == STARGET_RUNNING) {
 		transport_remove_device(dev);
 		device_del(dev);
@@ -1487,27 +1494,29 @@ static int scsi_report_lun_scan(struct s
 struct scsi_device *__scsi_add_device(struct Scsi_Host *shost, uint channel,
 				      uint id, uint lun, void *hostdata)
 {
-	struct scsi_device *sdev = ERR_PTR(-ENODEV);
+	struct scsi_device *sdev = ERR_PTR(-ENOMEM);
 	struct device *parent = &shost->shost_gendev;
-	struct scsi_target *starget;
+	struct scsi_target *starget = NULL;
 
 	if (strncmp(scsi_scan_type, "none", 4) == 0)
 		return ERR_PTR(-ENODEV);
 
-	starget = scsi_alloc_target(parent, channel, id);
-	if (!starget)
-		return ERR_PTR(-ENOMEM);
-
 	mutex_lock(&shost->scan_mutex);
 	if (!shost->async_scan)
 		scsi_complete_async_scans();
 
-	if (scsi_host_scan_allowed(shost))
-		scsi_probe_and_add_lun(starget, lun, NULL, &sdev, 1, hostdata);
+	if (scsi_host_scan_allowed(shost)) {
+		starget = scsi_alloc_target(parent, channel, id);
+		if (starget)
+			scsi_probe_and_add_lun(starget, lun, NULL, &sdev,
+					1, hostdata);
+	}
 	mutex_unlock(&shost->scan_mutex);
-	scsi_target_reap(starget);
-	put_device(&starget->dev);
 
+	if (starget) {
+		scsi_target_reap(starget);
+		put_device(&starget->dev);
+	}
 	return sdev;
 }
 EXPORT_SYMBOL(__scsi_add_device);
Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -822,6 +822,7 @@ static int scsi_target_add(struct scsi_t
 	}
 	transport_add_device(&starget->dev);
 	starget->state = STARGET_RUNNING;
+	transport_configure_device(&starget->dev);
 
 	return 0;
 }
@@ -850,7 +851,6 @@ int scsi_sysfs_add_sdev(struct scsi_devi
 	if (error)
 		return error;
 
-	transport_configure_device(&starget->dev);
 	error = device_add(&sdev->sdev_gendev);
 	if (error) {
 		put_device(sdev->sdev_gendev.parent);


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/7] Fix various bugs in the target code
  2009-08-10 15:08 [PATCH 4/7] Fix various bugs in the target code Alan Stern
@ 2009-08-10 15:53 ` James Bottomley
  2009-08-10 19:05   ` Alan Stern
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: James Bottomley @ 2009-08-10 15:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Mon, 2009-08-10 at 11:08 -0400, Alan Stern wrote:
> This patch (as1276) fixes some bugs in the SCSI target code:
> 
> 	In scsi_alloc_target(), a newly-created target was added to
> 	the host's list of targets before the host's target_alloc()
> 	method was called.

So this one is the design way the target code works:  allocate and add
first so the transport and target code have a fully usable device.

> 	In the same routine, if a match was found with an old target
> 	in the DEL state then that target's reap_ref was mistakenly
> 	incremented.  This is harmless, but it shouldn't happen.

It's required for the state != DEL case ... the reap_ref loses
significance after the state moves to DEL, so it's by design.

> 	If we have to wait for an old target to disappear, instead of
> 	calling flush_scheduled_work() the patch calls
> 	schedule_timeout_uninterruptible(1).  After all, whatever is
> 	pinning the old target might not have anything to do with a
> 	workqueue.  Besides, flush_scheduled_work() is prone to
> 	deadlocks and should never be used by drivers.

I don't really buy this; it's not (yet) a documented restriction of
flush_scheduled_work() and we have a few drivers using it.  It only
deadlocks if you call it from a running workqueue.

> 	scsi_target_reap() changed starget->state outside the
> 	protection of the host lock.

So does everything else that manipulates the target state.

> 	__scsi_add_device() called scsi_alloc_target() outside the
> 	protection of the host's scan_mutex, meaning that it might
> 	find an incompletely-initialized target or it might create
> 	a duplicate target.

scan mutex isn't used as a determinant for this.  There can be a slight
race in initialisation, but really, it isn't important.

> 	scsi_sysfs_add_sdev() would call transport_configure_device()
> 	for a target each time a new device was added under that
> 	target.  The call has been moved to scsi_target_add(), where
> 	it will be made only once.

That, unfortunately is an SPI required feature ... we can't actually
configure the target until we have a device because of the way SPI
parameters work.

James

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/7] Fix various bugs in the target code
  2009-08-10 15:53 ` James Bottomley
@ 2009-08-10 19:05   ` Alan Stern
  2009-08-11 15:12   ` Alan Stern
  2009-08-12 18:31   ` Alan Stern
  2 siblings, 0 replies; 7+ messages in thread
From: Alan Stern @ 2009-08-10 19:05 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

Obviously this patch needs to be revised.  I'll redo it, taking your 
comments into account.

On Mon, 10 Aug 2009, James Bottomley wrote:

> On Mon, 2009-08-10 at 11:08 -0400, Alan Stern wrote:
> > This patch (as1276) fixes some bugs in the SCSI target code:
> > 
> > 	In scsi_alloc_target(), a newly-created target was added to
> > 	the host's list of targets before the host's target_alloc()
> > 	method was called.
> 
> So this one is the design way the target code works:  allocate and add
> first so the transport and target code have a fully usable device.

Sorry, I don't understand.  In what way is the target not fully usable 
before it is added to the host's list of targets?

I have carried out a search through the entire kernel source for
occurrences of "__targets".  The only places it occurs are in
__scsi_find_target() and scsi_alloc_target() (plus the initialization
in scsi_host_alloc() and the declaration in include/scsi/scsi_hosts.h).  
Doesn't this prove that a target is no less usable before it is added
to the list than afterward?

However, if you still this part of the patch should be removed then I
will do so, since this whole region of code is supposed to be
serialized by the host's scan_mutex.

> > 	In the same routine, if a match was found with an old target
> > 	in the DEL state then that target's reap_ref was mistakenly
> > 	incremented.  This is harmless, but it shouldn't happen.
> 
> It's required for the state != DEL case ... the reap_ref loses
> significance after the state moves to DEL, so it's by design.

Okay, it's harmless, so I'll remove this part of the patch.

> > 	If we have to wait for an old target to disappear, instead of
> > 	calling flush_scheduled_work() the patch calls
> > 	schedule_timeout_uninterruptible(1).  After all, whatever is
> > 	pinning the old target might not have anything to do with a
> > 	workqueue.  Besides, flush_scheduled_work() is prone to
> > 	deadlocks and should never be used by drivers.
> 
> I don't really buy this; it's not (yet) a documented restriction of
> flush_scheduled_work() and we have a few drivers using it.  It only
> deadlocks if you call it from a running workqueue.

Or if you call it while holding a mutex or semaphore that's needed by
one of the items on the workqueue.  Since the keventd workqueue is
likely to contain just about _anything_, you can have no idea what
mutexes the work items will require.

Since scsi_alloc_target() is called while the host's scan_mutex is
held, it cannot safely call flush_scheduled_work().

(For that matter, how do you know that scsi_alloc_target() won't be 
called from within a workqueue routine?)

Besides, what if there is nothing on the workqueue to flush?  You still
want to delay or give up the CPU briefly, so that the old target can be
deallocated.  Doesn't this mean schedule_timeout_uninterruptible(1) is
more appropriate than flush_scheduled_work() in any case?

> > 	scsi_target_reap() changed starget->state outside the
> > 	protection of the host lock.
> 
> So does everything else that manipulates the target state.

Hmm, that's true.  Didn't you mention in an earlier email that the
target state should be changed only under the protection of the host
lock?

Regardless, the important part is where "empty" gets set, and that's
already protected by the host lock.  So I'll remove this from the
patch.

> > 	__scsi_add_device() called scsi_alloc_target() outside the
> > 	protection of the host's scan_mutex, meaning that it might
> > 	find an incompletely-initialized target or it might create
> > 	a duplicate target.
> 
> scan mutex isn't used as a determinant for this.  There can be a slight
> race in initialisation, but really, it isn't important.

I'm surprised you say that.  It means you can end up trying to use a
target before the transport code has finished setting it up (if two
threads both try to register devices below the same target at the same
time).  Besides, a major part of the purpose of the scan_mutex is to
prevent the creation of new targets or devices after the host has left
the SHOST_RUNNING state.  So it definitely should cover all calls to
scsi_alloc_target().

Conversely, if as you say, this isn't important, then you shouldn't
have any real objection to moving this call into the mutex-protected
region.  So either way, there's no reason not to accept this part of 
the patch.

> > 	scsi_sysfs_add_sdev() would call transport_configure_device()
> > 	for a target each time a new device was added under that
> > 	target.  The call has been moved to scsi_target_add(), where
> > 	it will be made only once.
> 
> That, unfortunately is an SPI required feature ... we can't actually
> configure the target until we have a device because of the way SPI
> parameters work.

Okay, I'll put it back and add an explanatory comment.  After all these 
revisions the patch will be a lot shorter.  :-)

Sound good?

Alan Stern

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/7] Fix various bugs in the target code
  2009-08-10 15:53 ` James Bottomley
  2009-08-10 19:05   ` Alan Stern
@ 2009-08-11 15:12   ` Alan Stern
  2009-08-11 15:15     ` James Bottomley
  2009-08-12 18:31   ` Alan Stern
  2 siblings, 1 reply; 7+ messages in thread
From: Alan Stern @ 2009-08-11 15:12 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

On Mon, 10 Aug 2009, James Bottomley wrote:

> > 	scsi_sysfs_add_sdev() would call transport_configure_device()
> > 	for a target each time a new device was added under that
> > 	target.  The call has been moved to scsi_target_add(), where
> > 	it will be made only once.
> 
> That, unfortunately is an SPI required feature ... we can't actually
> configure the target until we have a device because of the way SPI
> parameters work.

Would it be sufficient to call transport_configure_device() only the
first time a device is added?

Alan Stern


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/7] Fix various bugs in the target code
  2009-08-11 15:12   ` Alan Stern
@ 2009-08-11 15:15     ` James Bottomley
  0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2009-08-11 15:15 UTC (permalink / raw)
  To: Alan Stern; +Cc: SCSI development list

On Tue, 2009-08-11 at 11:12 -0400, Alan Stern wrote:
> On Mon, 10 Aug 2009, James Bottomley wrote:
> 
> > > 	scsi_sysfs_add_sdev() would call transport_configure_device()
> > > 	for a target each time a new device was added under that
> > > 	target.  The call has been moved to scsi_target_add(), where
> > > 	it will be made only once.
> > 
> > That, unfortunately is an SPI required feature ... we can't actually
> > configure the target until we have a device because of the way SPI
> > parameters work.
> 
> Would it be sufficient to call transport_configure_device() only the
> first time a device is added?

No, because of renegotiations ... it would be sufficient only to call it
the last time a device is added, but we don't know when that will be.

James



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/7] Fix various bugs in the target code
  2009-08-10 15:53 ` James Bottomley
  2009-08-10 19:05   ` Alan Stern
  2009-08-11 15:12   ` Alan Stern
@ 2009-08-12 18:31   ` Alan Stern
  2 siblings, 0 replies; 7+ messages in thread
From: Alan Stern @ 2009-08-12 18:31 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

On Mon, 10 Aug 2009, James Bottomley wrote:

> > 	If we have to wait for an old target to disappear, instead of
> > 	calling flush_scheduled_work() the patch calls
> > 	schedule_timeout_uninterruptible(1).  After all, whatever is
> > 	pinning the old target might not have anything to do with a
> > 	workqueue.  Besides, flush_scheduled_work() is prone to
> > 	deadlocks and should never be used by drivers.
> 
> I don't really buy this; it's not (yet) a documented restriction of
> flush_scheduled_work() and we have a few drivers using it.  It only
> deadlocks if you call it from a running workqueue.

Since you are so opposed to this patch, I will drop it from the series.
You didn't comment on any of the first three patches in the series; 
does that mean they are acceptable as is?

And what about the following patches?  I'll redo them based on omitting 
4/7; are there any parts of them you don't like?

Alan Stern


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/7] Fix various bugs in the target code
@ 2009-08-13 14:25 Alan Stern
  0 siblings, 0 replies; 7+ messages in thread
From: Alan Stern @ 2009-08-13 14:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI development list

James:

I have revised patch 7/7, based on omitting patch 4/7.  The other two, 
5/7 and 6/7, didn't need any significant changes -- they will apply 
okay with minor fuzz.

Alan Stern

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-08-13 14:25 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-10 15:08 [PATCH 4/7] Fix various bugs in the target code Alan Stern
2009-08-10 15:53 ` James Bottomley
2009-08-10 19:05   ` Alan Stern
2009-08-11 15:12   ` Alan Stern
2009-08-11 15:15     ` James Bottomley
2009-08-12 18:31   ` Alan Stern
  -- strict thread matches above, loose matches on Subject: below --
2009-08-13 14:25 Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox