linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] scsi: restart list search after unlock in scsi_remove_target
@ 2015-10-19 14:35 Christoph Hellwig
  2015-10-19 15:36 ` Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Christoph Hellwig @ 2015-10-19 14:35 UTC (permalink / raw)
  To: linux-scsi, James.Bottomley; +Cc: jthumshirn

When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us.  Without this
we can get stuck in an endless loop.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index b333389f..d3b34d8 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1158,31 +1158,23 @@ static void __scsi_remove_target(struct scsi_target *starget)
 void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
-	struct scsi_target *starget, *last = NULL;
+	struct scsi_target *starget;
 	unsigned long flags;
 
-	/* remove targets being careful to lookup next entry before
-	 * deleting the last
-	 */
+restart:
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_for_each_entry(starget, &shost->__targets, siblings) {
 		if (starget->state == STARGET_DEL)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
-			/* assuming new targets arrive at the end */
 			kref_get(&starget->reap_ref);
 			spin_unlock_irqrestore(shost->host_lock, flags);
-			if (last)
-				scsi_target_reap(last);
-			last = starget;
 			__scsi_remove_target(starget);
-			spin_lock_irqsave(shost->host_lock, flags);
+			scsi_target_reap(starget);
+			goto restart;
 		}
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
-
-	if (last)
-		scsi_target_reap(last);
 }
 EXPORT_SYMBOL(scsi_remove_target);
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: restart list search after unlock in scsi_remove_target
  2015-10-19 14:35 [PATCH] scsi: restart list search after unlock in scsi_remove_target Christoph Hellwig
@ 2015-10-19 15:36 ` Bart Van Assche
  2015-10-19 15:56   ` Christoph Hellwig
  2015-10-26  8:35 ` Johannes Thumshirn
  2015-10-27 16:33 ` Johannes Thumshirn
  2 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2015-10-19 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, linux-scsi, James.Bottomley; +Cc: jthumshirn

On 10/19/2015 07:35 AM, Christoph Hellwig wrote:
> When dropping a lock while iterating a list we must restart the search
> as other threads could have manipulated the list under us.  Without this
> we can get stuck in an endless loop.

Hello Christoph,

Thanks for looking into this. However, I think we need a motivation in 
the patch description why this patch does not reintroduce the soft 
lockup documented in patch "scsi_remove_target: fix softlockup 
regression on hot remove" (commit bc3f02a795d3).

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: restart list search after unlock in scsi_remove_target
  2015-10-19 15:36 ` Bart Van Assche
@ 2015-10-19 15:56   ` Christoph Hellwig
  2015-10-19 16:01     ` James Bottomley
  2015-10-19 17:21     ` Dan Williams
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2015-10-19 15:56 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, linux-scsi, James.Bottomley, jthumshirn,
	Dan Williams

On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote:
> Thanks for looking into this. However, I think we need a motivation in the 
> patch description why this patch does not reintroduce the soft lockup 
> documented in patch "scsi_remove_target: fix softlockup regression on hot 
> remove" (commit bc3f02a795d3).

Interesting.  I tried to find the original report and what state
changes would cause an endless loop here.  Dan, do you remember any
details about this bug report?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: restart list search after unlock in scsi_remove_target
  2015-10-19 15:56   ` Christoph Hellwig
@ 2015-10-19 16:01     ` James Bottomley
  2015-10-19 17:21     ` Dan Williams
  1 sibling, 0 replies; 9+ messages in thread
From: James Bottomley @ 2015-10-19 16:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Bart Van Assche, linux-scsi, jthumshirn, Dan Williams

On Mon, 2015-10-19 at 17:56 +0200, Christoph Hellwig wrote:
> On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote:
> > Thanks for looking into this. However, I think we need a motivation in the 
> > patch description why this patch does not reintroduce the soft lockup 
> > documented in patch "scsi_remove_target: fix softlockup regression on hot 
> > remove" (commit bc3f02a795d3).
> 
> Interesting.  I tried to find the original report and what state
> changes would cause an endless loop here.

The original report is this one:

http://thread.gmane.org/gmane.linux.kernel/1348679

James

>   Dan, do you remember any
> details about this bug report?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: restart list search after unlock in scsi_remove_target
  2015-10-19 15:56   ` Christoph Hellwig
  2015-10-19 16:01     ` James Bottomley
@ 2015-10-19 17:21     ` Dan Williams
  1 sibling, 0 replies; 9+ messages in thread
From: Dan Williams @ 2015-10-19 17:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Bart Van Assche, linux-scsi, Jej B, jthumshirn

On Mon, Oct 19, 2015 at 8:56 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote:
>> Thanks for looking into this. However, I think we need a motivation in the
>> patch description why this patch does not reintroduce the soft lockup
>> documented in patch "scsi_remove_target: fix softlockup regression on hot
>> remove" (commit bc3f02a795d3).
>
> Interesting.  I tried to find the original report and what state
> changes would cause an endless loop here.  Dan, do you remember any
> details about this bug report?

I believe the issue I was seeing back then might have been fixed or at
least modulated by "f2495e228fce [SCSI] dual scan thread bug fix"
which came a few years later.  The original problem was hot-remove
racing hot-add and that scsi_target_reap() was not guaranteed to
advance the state of the target if it was in the process of being
scanned when a removal event arrived.  However the comment in that
change:

+       /*
+        * if we get here and the target is still in the CREATED state that
+        * means it was allocated but never made visible (because a scan
+        * turned up no LUNs), so don't call device_del() on it.
+        */

...is not what I was seeing.  The target was in the CREATED state
because it had not yet completed the initial scan before tear down was
initiated.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: restart list search after unlock in scsi_remove_target
  2015-10-19 14:35 [PATCH] scsi: restart list search after unlock in scsi_remove_target Christoph Hellwig
  2015-10-19 15:36 ` Bart Van Assche
@ 2015-10-26  8:35 ` Johannes Thumshirn
  2015-10-27 20:14   ` Bart Van Assche
  2015-10-27 16:33 ` Johannes Thumshirn
  2 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2015-10-26  8:35 UTC (permalink / raw)
  To: Christoph Hellwig, linux-scsi, James.Bottomley

On Mon, 2015-10-19 at 16:35 +0200, Christoph Hellwig wrote:
> When dropping a lock while iterating a list we must restart the
> search
> as other threads could have manipulated the list under us.  Without
> this
> we can get stuck in an endless loop.
> 
> Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index b333389f..d3b34d8 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1158,31 +1158,23 @@ static void __scsi_remove_target(struct
> scsi_target *starget)
>  void scsi_remove_target(struct device *dev)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(dev->parent);
> -	struct scsi_target *starget, *last = NULL;
> +	struct scsi_target *starget;
>  	unsigned long flags;
>  
> -	/* remove targets being careful to lookup next entry before
> -	 * deleting the last
> -	 */
> +restart:
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	list_for_each_entry(starget, &shost->__targets, siblings) {
>  		if (starget->state == STARGET_DEL)
>  			continue;
>  		if (starget->dev.parent == dev || &starget->dev ==
> dev) {
> -			/* assuming new targets arrive at the end */
>  			kref_get(&starget->reap_ref);
>  			spin_unlock_irqrestore(shost->host_lock,
> flags);
> -			if (last)
> -				scsi_target_reap(last);
> -			last = starget;
>  			__scsi_remove_target(starget);
> -			spin_lock_irqsave(shost->host_lock, flags);
> +			scsi_target_reap(starget);
> +			goto restart;
>  		}
>  	}
>  	spin_unlock_irqrestore(shost->host_lock, flags);
> -
> -	if (last)
> -		scsi_target_reap(last);
>  }
>  EXPORT_SYMBOL(scsi_remove_target);

Hi Christoph,

I haven't heard anything from the original reporter of the lockup but
my test's went all O.K., so 

Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" 
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: restart list search after unlock in scsi_remove_target
  2015-10-19 14:35 [PATCH] scsi: restart list search after unlock in scsi_remove_target Christoph Hellwig
  2015-10-19 15:36 ` Bart Van Assche
  2015-10-26  8:35 ` Johannes Thumshirn
@ 2015-10-27 16:33 ` Johannes Thumshirn
  2 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2015-10-27 16:33 UTC (permalink / raw)
  To: Christoph Hellwig, linux-scsi, James.Bottomley

On Mon, 2015-10-19 at 16:35 +0200, Christoph Hellwig wrote:
> When dropping a lock while iterating a list we must restart the
> search
> as other threads could have manipulated the list under us.  Without
> this
> we can get stuck in an endless loop.
> 
> Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I have an official acknowledge from the original author that a 48h long
term test on their side didn't reproduce the lockup as well.

James can we please get the patch included in 4.4 and stable?

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: restart list search after unlock in scsi_remove_target
  2015-10-26  8:35 ` Johannes Thumshirn
@ 2015-10-27 20:14   ` Bart Van Assche
  2015-10-30  8:26     ` Johannes Thumshirn
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2015-10-27 20:14 UTC (permalink / raw)
  To: Johannes Thumshirn, Christoph Hellwig, linux-scsi,
	James.Bottomley

On 10/26/2015 01:35 AM, Johannes Thumshirn wrote:
> I haven't heard anything from the original reporter of the lockup but
> my test's went all O.K., so
> 
> Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

Hello Christoph and Johannes,

How about the patch below, which is a variant of Christoph's patch ?

Thanks,

Bart.

[PATCH] scsi: Restart list search after unlock in scsi_remove_target

When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us.  Without this
we can get stuck in an endless loop. Introduce a new member variable
"visible" in struct scsi_target to track presence in sysfs.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_scan.c   | 16 +++-------------
 drivers/scsi/scsi_sysfs.c  | 19 ++++++-------------
 include/scsi/scsi_device.h |  1 +
 3 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index f9f3f82..7b74470 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -314,7 +314,6 @@ static void scsi_target_destroy(struct scsi_target *starget)
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
 	unsigned long flags;
 
-	starget->state = STARGET_DEL;
 	transport_destroy_device(dev);
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (shost->hostt->target_destroy)
@@ -379,12 +378,8 @@ static void scsi_target_reap_ref_release(struct kref *kref)
 	struct scsi_target *starget
 		= container_of(kref, struct scsi_target, reap_ref);
 
-	/*
-	 * if we get here and the target is still in the CREATED state that
-	 * means it was allocated but never made visible (because a scan
-	 * turned up no LUNs), so don't call device_del() on it.
-	 */
-	if (starget->state != STARGET_CREATED) {
+	if (starget->visible) {
+		starget->visible = false;
 		transport_remove_device(&starget->dev);
 		device_del(&starget->dev);
 	}
@@ -507,12 +502,7 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
  */
 void scsi_target_reap(struct scsi_target *starget)
 {
-	/*
-	 * serious problem if this triggers: STARGET_DEL is only set in the if
-	 * the reap_ref drops to zero, so we're trying to do another final put
-	 * on an already released kref
-	 */
-	BUG_ON(starget->state == STARGET_DEL);
+	starget->state = STARGET_DEL;
 	scsi_target_reap_ref_put(starget);
 }
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 5e085d4..de8e202 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -974,7 +974,7 @@ static int scsi_target_add(struct scsi_target *starget)
 {
 	int error;
 
-	if (starget->state != STARGET_CREATED)
+	if (starget->visible)
 		return 0;
 
 	error = device_add(&starget->dev);
@@ -983,6 +983,7 @@ static int scsi_target_add(struct scsi_target *starget)
 		return error;
 	}
 	transport_add_device(&starget->dev);
+	starget->visible = true;
 	starget->state = STARGET_RUNNING;
 
 	pm_runtime_set_active(&starget->dev);
@@ -1166,31 +1167,23 @@ static void __scsi_remove_target(struct scsi_target *starget)
 void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
-	struct scsi_target *starget, *last = NULL;
+	struct scsi_target *starget;
 	unsigned long flags;
 
-	/* remove targets being careful to lookup next entry before
-	 * deleting the last
-	 */
+restart:
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_for_each_entry(starget, &shost->__targets, siblings) {
 		if (starget->state == STARGET_DEL)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
-			/* assuming new targets arrive at the end */
 			kref_get(&starget->reap_ref);
 			spin_unlock_irqrestore(shost->host_lock, flags);
-			if (last)
-				scsi_target_reap(last);
-			last = starget;
 			__scsi_remove_target(starget);
-			spin_lock_irqsave(shost->host_lock, flags);
+			scsi_target_reap(starget);
+			goto restart;
 		}
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
-
-	if (last)
-		scsi_target_reap(last);
 }
 EXPORT_SYMBOL(scsi_remove_target);
 
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index fe89d7c..d5a5f21 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -267,6 +267,7 @@ struct scsi_target {
 	unsigned int		expecting_lun_change:1;	/* A device has reported
 						 * a 3F/0E UA, other devices on
 						 * the same target will also. */
+	unsigned int		visible:1; /* visible in sysfs */
 	/* commands actually active on LLD. */
 	atomic_t		target_busy;
 	atomic_t		target_blocked;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: restart list search after unlock in scsi_remove_target
  2015-10-27 20:14   ` Bart Van Assche
@ 2015-10-30  8:26     ` Johannes Thumshirn
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2015-10-30  8:26 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig, linux-scsi, James.Bottomley

Hi Bart,
On Tue, 2015-10-27 at 13:14 -0700, Bart Van Assche wrote:
> On 10/26/2015 01:35 AM, Johannes Thumshirn wrote:
> > I haven't heard anything from the original reporter of the lockup
> > but
> > my test's went all O.K., so
> > 
> > Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
> > Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> 
> Hello Christoph and Johannes,
> 
> How about the patch below, which is a variant of Christoph's patch ?
> 
> Thanks,
> 
> Bart.


I'm OK with it and I ran tests with it over night, no lockups happened.
I feel a bit reluctant of giving the patch to mt downsream reporter as
it's not their business to validate patches for me.

So from my POV it's OK as well. So please feel free to add

Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

on the official submission.

Thanks,
	Johannes


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-10-30  8:26 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-19 14:35 [PATCH] scsi: restart list search after unlock in scsi_remove_target Christoph Hellwig
2015-10-19 15:36 ` Bart Van Assche
2015-10-19 15:56   ` Christoph Hellwig
2015-10-19 16:01     ` James Bottomley
2015-10-19 17:21     ` Dan Williams
2015-10-26  8:35 ` Johannes Thumshirn
2015-10-27 20:14   ` Bart Van Assche
2015-10-30  8:26     ` Johannes Thumshirn
2015-10-27 16:33 ` Johannes Thumshirn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).