linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove
@ 2012-08-29  5:12 Dan Williams
  2012-08-29  6:50 ` Bart Van Assche
  2012-09-01 16:01 ` Stefan Ring
  0 siblings, 2 replies; 6+ messages in thread
From: Dan Williams @ 2012-08-29  5:12 UTC (permalink / raw)
  To: JBottomley; +Cc: John Drescher, stable, linux-scsi, Jack Wang

John reports:
 BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
 [..]
 Call Trace:
  [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
  [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
  [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
  [<ffffffff81421e35>] sas_port_delete+0x25/0x160
  [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270

...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".

Don't restart lookup of more stargets in the multi-target case, just
arrange to traverse the list once, on the assumption that new targets
are always added at the end.  There is no guarantee that the target will
change state in scsi_target_reap() so we can end up spinning if we
restart.

Cc: <stable@vger.kernel.org>
Acked-by: Jack Wang <jack_wang@usish.com>
LKML-Reference: <CAEhu1-6wq1YsNiscGMwP4ud0Q+MrViRzv=kcWCQSBNc8c68N5Q@mail.gmail.com>
Reported-by: John Drescher <drescherjm@gmail.com>
Tested-by: John Drescher <drescherjm@gmail.com>
Signed-off-by: Dan Williams <djbw@fb.com>
---

Jack, thanks for the reminder to cc stable!

 drivers/scsi/scsi_sysfs.c |   30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 093d4f6..ce5224c 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1031,33 +1031,31 @@ static void __scsi_remove_target(struct scsi_target *starget)
 void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
-	struct scsi_target *starget, *found;
+	struct scsi_target *starget, *last = NULL;
 	unsigned long flags;
 
- restart:
-	found = NULL;
+	/* remove targets being careful to lookup next entry before
+	 * deleting the last
+	 */
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_for_each_entry(starget, &shost->__targets, siblings) {
 		if (starget->state == STARGET_DEL)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
-			found = starget;
-			found->reap_ref++;
-			break;
+			/* assuming new targets arrive at the end */
+			starget->reap_ref++;
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			if (last)
+				scsi_target_reap(last);
+			last = starget;
+			__scsi_remove_target(starget);
+			spin_lock_irqsave(shost->host_lock, flags);
 		}
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
-	if (found) {
-		__scsi_remove_target(found);
-		scsi_target_reap(found);
-		/* in the case where @dev has multiple starget children,
-		 * continue removing.
-		 *
-		 * FIXME: does such a case exist?
-		 */
-		goto restart;
-	}
+	if (last)
+		scsi_target_reap(last);
 }
 EXPORT_SYMBOL(scsi_remove_target);
 


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove
  2012-08-29  5:12 [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove Dan Williams
@ 2012-08-29  6:50 ` Bart Van Assche
  2012-08-29 14:59   ` Dan Williams
  2012-09-01 16:01 ` Stefan Ring
  1 sibling, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2012-08-29  6:50 UTC (permalink / raw)
  To: Dan Williams; +Cc: JBottomley, John Drescher, stable, linux-scsi, Jack Wang

On 08/29/12 05:12, Dan Williams wrote:
> John reports:
>  BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
>  [..]
>  Call Trace:
>   [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
>   [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
>   [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
>   [<ffffffff81421e35>] sas_port_delete+0x25/0x160
>   [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270
> 
> ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".

Including that call stack in the patch description may create the
misleading impression that this only occurs with the mptsas driver. This
lockup also happens with at least the iSCSI initiator. See also
http://lkml.org/lkml/2012/8/24/340.

By the way, in order to get a patch in the stable tree the proper "Cc:"
tag should be added in the patch description but the
stable@vger.kernel.org e-mail address should be left out from the
Cc-list of the e-mail with the patch.

Bart.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove
  2012-08-29  6:50 ` Bart Van Assche
@ 2012-08-29 14:59   ` Dan Williams
  2012-09-04 14:02     ` John Drescher
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Williams @ 2012-08-29 14:59 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: JBottomley, John Drescher, stable, linux-scsi, Jack Wang

On Wed, 2012-08-29 at 06:50 +0000, Bart Van Assche wrote:
> On 08/29/12 05:12, Dan Williams wrote:
> > John reports:
> >  BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
> >  [..]
> >  Call Trace:
> >   [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
> >   [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
> >   [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
> >   [<ffffffff81421e35>] sas_port_delete+0x25/0x160
> >   [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270
> > 
> > ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".
> 
> Including that call stack in the patch description may create the
> misleading impression that this only occurs with the mptsas driver. This
> lockup also happens with at least the iSCSI initiator. See also
> http://lkml.org/lkml/2012/8/24/340.

I don't think it does that.  The title is pretty generic, but you're
right the impact is potentially all scsi_remove_target() users.

> By the way, in order to get a patch in the stable tree the proper "Cc:"
> tag should be added in the patch description but the
> stable@vger.kernel.org e-mail address should be left out from the
> Cc-list of the e-mail with the patch.

No, we talked about that at kernel summit.  It's ok for the stable@
alias to get a few extra mails.  The patch won't be applied until it
hits mainline and in the meantime it gives a heads up to the -stable
folks, or anyone that wants to follow up on stable patches making their
way to mainline.

--
Dan



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove
  2012-08-29  5:12 [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove Dan Williams
  2012-08-29  6:50 ` Bart Van Assche
@ 2012-09-01 16:01 ` Stefan Ring
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Ring @ 2012-09-01 16:01 UTC (permalink / raw)
  To: linux-scsi

Dan Williams <djbw <at> fb.com> writes:

> John reports:
>  BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
>  [..]
> 
> ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".
> 
> Don't restart lookup of more stargets in the multi-target case, just
> arrange to traverse the list once, on the assumption that new targets
> are always added at the end.  There is no guarantee that the target will
> change state in scsi_target_reap() so we can end up spinning if we
> restart.

Patch works for me.

See also <https://bugzilla.redhat.com/show_bug.cgi?id=848425> for
"hardware-less" reproduction instructions.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove
  2012-08-29 14:59   ` Dan Williams
@ 2012-09-04 14:02     ` John Drescher
  2012-09-04 23:01       ` Dan Williams
  0 siblings, 1 reply; 6+ messages in thread
From: John Drescher @ 2012-09-04 14:02 UTC (permalink / raw)
  To: Dan Williams; +Cc: Bart Van Assche, JBottomley, stable, linux-scsi, Jack Wang

On Wed, Aug 29, 2012 at 10:59 AM, Dan Williams <djbw@fb.com> wrote:
> On Wed, 2012-08-29 at 06:50 +0000, Bart Van Assche wrote:
>> On 08/29/12 05:12, Dan Williams wrote:
>> > John reports:
>> >  BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
>> >  [..]
>> >  Call Trace:
>> >   [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
>> >   [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
>> >   [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
>> >   [<ffffffff81421e35>] sas_port_delete+0x25/0x160
>> >   [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270
>> >
>> > ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".
>>
>> Including that call stack in the patch description may create the
>> misleading impression that this only occurs with the mptsas driver. This
>> lockup also happens with at least the iSCSI initiator. See also
>> http://lkml.org/lkml/2012/8/24/340.
>
> I don't think it does that.  The title is pretty generic, but you're
> right the impact is potentially all scsi_remove_target() users.
>
>> By the way, in order to get a patch in the stable tree the proper "Cc:"
>> tag should be added in the patch description but the
>> stable@vger.kernel.org e-mail address should be left out from the
>> Cc-list of the e-mail with the patch.
>
> No, we talked about that at kernel summit.  It's ok for the stable@
> alias to get a few extra mails.  The patch won't be applied until it
> hits mainline and in the meantime it gives a heads up to the -stable
> folks, or anyone that wants to follow up on stable patches making their
> way to mainline.
>

It appears that this did not get into 3.6 rc4 (unless I am reading the
changlog wrong). Do I have to file an official bug report to get this
noticed?

John

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove
  2012-09-04 14:02     ` John Drescher
@ 2012-09-04 23:01       ` Dan Williams
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Williams @ 2012-09-04 23:01 UTC (permalink / raw)
  To: John Drescher; +Cc: Bart Van Assche, JBottomley, stable, linux-scsi, Jack Wang

On Tue, Sep 4, 2012 at 7:02 AM, John Drescher <drescherjm@gmail.com> wrote:
> On Wed, Aug 29, 2012 at 10:59 AM, Dan Williams <djbw@fb.com> wrote:
>> On Wed, 2012-08-29 at 06:50 +0000, Bart Van Assche wrote:
>>> On 08/29/12 05:12, Dan Williams wrote:
>>> > John reports:
>>> >  BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
>>> >  [..]
>>> >  Call Trace:
>>> >   [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
>>> >   [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
>>> >   [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
>>> >   [<ffffffff81421e35>] sas_port_delete+0x25/0x160
>>> >   [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270
>>> >
>>> > ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".
>>>
>>> Including that call stack in the patch description may create the
>>> misleading impression that this only occurs with the mptsas driver. This
>>> lockup also happens with at least the iSCSI initiator. See also
>>> http://lkml.org/lkml/2012/8/24/340.
>>
>> I don't think it does that.  The title is pretty generic, but you're
>> right the impact is potentially all scsi_remove_target() users.
>>
>>> By the way, in order to get a patch in the stable tree the proper "Cc:"
>>> tag should be added in the patch description but the
>>> stable@vger.kernel.org e-mail address should be left out from the
>>> Cc-list of the e-mail with the patch.
>>
>> No, we talked about that at kernel summit.  It's ok for the stable@
>> alias to get a few extra mails.  The patch won't be applied until it
>> hits mainline and in the meantime it gives a heads up to the -stable
>> folks, or anyone that wants to follow up on stable patches making their
>> way to mainline.
>>
>
> It appears that this did not get into 3.6 rc4 (unless I am reading the
> changlog wrong). Do I have to file an official bug report to get this
> noticed?

I know James was travelling last week, and I think he wanted some more
time to look over the removal of the restart logic... but I think we
are ok.  Prior to commit 3b661a9 we had

       get_device(dev);
       device_for_each_child(dev, NULL, __remove_child);
       put_device(dev);

Where that device_for_each_child would not restart, but also would not
consider stargets that were not yet proper children of dev.  With the
fix we find all stargets and make sure to be immune to the regression
case where scsi_remove_target() is not the final source of
scsi_target_reap().

--
Dan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-09-04 23:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-29  5:12 [resend PATCH] scsi_remove_target: fix softlockup regression on hot remove Dan Williams
2012-08-29  6:50 ` Bart Van Assche
2012-08-29 14:59   ` Dan Williams
2012-09-04 14:02     ` John Drescher
2012-09-04 23:01       ` Dan Williams
2012-09-01 16:01 ` Stefan Ring

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).