All of lore.kernel.org
 help / color / mirror / Atom feed
* PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
@ 2009-12-01 21:45 Michael Reed
  2009-12-01 21:49 ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Reed @ 2009-12-01 21:45 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Bottomley, Hannes Reinecke, Jeremy Higdon, James Smart

Prevent delays and hangs due to sd_remove() waiting for the completion of
async threads executing sd_probe_async of disks on unrelated host adapters.
This patch executes every sd_probe_async in its own async domain allowing
sd_remove() to wait for just the completion of the async thread associated with
the scsi_disk being removed.

Found via fault insertion on a large fibre channel fabric.

Applies to 2.6.32-rc8.

Signed-off-by: Michael Reed <mdr@sgi.com>


--- linux-2.6.32-rc8/drivers/scsi/sd.h	2009-11-19 16:32:38.000000000 -0600
+++ linux-2.6.32-rc8-modified/drivers/scsi/sd.h	2009-12-01 15:25:33.686651715 -0600
@@ -60,6 +60,7 @@ struct scsi_disk {
 	unsigned	RCD : 1;	/* state of disk RCD bit, unused */
 	unsigned	DPOFUA : 1;	/* state of disk DPOFUA bit */
 	unsigned	first_scan : 1;
+	struct list_head async_domain;	/* for sd_probe_async */
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
--- linux-2.6.32-rc8/drivers/scsi/sd.c	2009-11-19 16:32:38.000000000 -0600
+++ linux-2.6.32-rc8-modified/drivers/scsi/sd.c	2009-12-01 15:26:17.686653817 -0600
@@ -2171,7 +2171,8 @@ static int sd_probe(struct device *dev)
 	get_device(&sdp->sdev_gendev);
 
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule(sd_probe_async, sdkp);
+	INIT_LIST_HEAD(&sdkp->async_domain);
+	async_schedule_domain(sd_probe_async, sdkp, &sdkp->async_domain);
 
 	return 0;
 
@@ -2202,8 +2203,9 @@ static int sd_remove(struct device *dev)
 {
 	struct scsi_disk *sdkp;
 
-	async_synchronize_full();
 	sdkp = dev_get_drvdata(dev);
+	async_synchronize_full_domain(&sdkp->async_domain);
+
 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
 	device_del(&sdkp->dev);
 	del_gendisk(sdkp->disk);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
  2009-12-01 21:45 PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads Michael Reed
@ 2009-12-01 21:49 ` James Bottomley
  2009-12-01 22:28   ` Michael Reed
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2009-12-01 21:49 UTC (permalink / raw)
  To: Michael Reed; +Cc: linux-scsi, Hannes Reinecke, Jeremy Higdon, James Smart

On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
> Prevent delays and hangs due to sd_remove() waiting for the completion of
> async threads executing sd_probe_async of disks on unrelated host adapters.
> This patch executes every sd_probe_async in its own async domain allowing
> sd_remove() to wait for just the completion of the async thread associated with
> the scsi_disk being removed.

This patch was thought of a while ago. Unfortunately, some of the
unrelated threads we end up waiting on are libata ones.  you confine sd
to only its own probes, we end up unsynchronised with respect to libata
probes and we might cause ordering problems amongst the ata devices.

James



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
  2009-12-01 21:49 ` James Bottomley
@ 2009-12-01 22:28   ` Michael Reed
  2009-12-01 22:50     ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Reed @ 2009-12-01 22:28 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Jeremy Higdon, James Smart



James Bottomley wrote:
> On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
>> Prevent delays and hangs due to sd_remove() waiting for the completion of
>> async threads executing sd_probe_async of disks on unrelated host adapters.
>> This patch executes every sd_probe_async in its own async domain allowing
>> sd_remove() to wait for just the completion of the async thread associated with
>> the scsi_disk being removed.
> 
> This patch was thought of a while ago. Unfortunately, some of the
> unrelated threads we end up waiting on are libata ones.  you confine sd
> to only its own probes, we end up unsynchronised with respect to libata
> probes and we might cause ordering problems amongst the ata devices.
> 

Isn't sd_remove() only concerned with the removal of a a single scsi_disk?
Shouldn't libata use reference counting if it has is an issue with a scsi_disk
being prematurely removed?  Or is this a concern about "sd" naming?  Or something
else that I admittedly don't understand (but should)?

I would truly like to better understand the issue.  Would someone mind expanding
upon the concern about ata ordering issues associated with the removal of a
single scsi_disk?

Thanks,
 Mike

> James
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
  2009-12-01 22:28   ` Michael Reed
@ 2009-12-01 22:50     ` James Bottomley
  2009-12-02 14:27       ` Michael Reed
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2009-12-01 22:50 UTC (permalink / raw)
  To: Michael Reed; +Cc: linux-scsi, Hannes Reinecke, Jeremy Higdon, James Smart

On Tue, 2009-12-01 at 16:28 -0600, Michael Reed wrote:
> 
> James Bottomley wrote:
> > On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
> >> Prevent delays and hangs due to sd_remove() waiting for the completion of
> >> async threads executing sd_probe_async of disks on unrelated host adapters.
> >> This patch executes every sd_probe_async in its own async domain allowing
> >> sd_remove() to wait for just the completion of the async thread associated with
> >> the scsi_disk being removed.
> > 
> > This patch was thought of a while ago. Unfortunately, some of the
> > unrelated threads we end up waiting on are libata ones.  you confine sd
> > to only its own probes, we end up unsynchronised with respect to libata
> > probes and we might cause ordering problems amongst the ata devices.
> > 
> 
> Isn't sd_remove() only concerned with the removal of a a single scsi_disk?
> Shouldn't libata use reference counting if it has is an issue with a scsi_disk
> being prematurely removed?  Or is this a concern about "sd" naming?  Or something
> else that I admittedly don't understand (but should)?

Well, no ... the sync on remove is preventing us removing a device whose
async part is still running.  That async part for libata includes pieces
kicked off from the sd probe.

> I would truly like to better understand the issue.  Would someone mind expanding
> upon the concern about ata ordering issues associated with the removal of a
> single scsi_disk?

The problem isn't removal per se ... it's the fact that remove can't
complete until any async pieces remaining from probe have run. 

James



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
  2009-12-01 22:50     ` James Bottomley
@ 2009-12-02 14:27       ` Michael Reed
  2009-12-02 14:44         ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Reed @ 2009-12-02 14:27 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Jeremy Higdon, James Smart



James Bottomley wrote:
> On Tue, 2009-12-01 at 16:28 -0600, Michael Reed wrote:
>> James Bottomley wrote:
>>> On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
>>>> Prevent delays and hangs due to sd_remove() waiting for the completion of
>>>> async threads executing sd_probe_async of disks on unrelated host adapters.
>>>> This patch executes every sd_probe_async in its own async domain allowing
>>>> sd_remove() to wait for just the completion of the async thread associated with
>>>> the scsi_disk being removed.
>>> This patch was thought of a while ago. Unfortunately, some of the
>>> unrelated threads we end up waiting on are libata ones.  you confine sd
>>> to only its own probes, we end up unsynchronised with respect to libata
>>> probes and we might cause ordering problems amongst the ata devices.
>>>
>> Isn't sd_remove() only concerned with the removal of a a single scsi_disk?
>> Shouldn't libata use reference counting if it has is an issue with a scsi_disk
>> being prematurely removed?  Or is this a concern about "sd" naming?  Or something
>> else that I admittedly don't understand (but should)?
> 
> Well, no ... the sync on remove is preventing us removing a device whose
> async part is still running.  That async part for libata includes pieces
> kicked off from the sd probe.

What does the call stack look like that spawns the async part of libata?

What kind of hardware do I need to demonstrate this?

> 
>> I would truly like to better understand the issue.  Would someone mind expanding
>> upon the concern about ata ordering issues associated with the removal of a
>> single scsi_disk?
> 
> The problem isn't removal per se ... it's the fact that remove can't
> complete until any async pieces remaining from probe have run. 

Yes, I understand that.  I didn't realize that the sd_probe resulted in any
async work other than sd_probe_async().  It does complicate serialization
at removal.

I'll try to capture the "motivation" from within my fibre channel centric world
for the change and see if anyone's got some ideas on how to resolve the issue.

Mike


> 
> James

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
  2009-12-02 14:27       ` Michael Reed
@ 2009-12-02 14:44         ` James Bottomley
  2009-12-02 15:18           ` Michael Reed
  2009-12-02 17:06           ` Michael Reed
  0 siblings, 2 replies; 9+ messages in thread
From: James Bottomley @ 2009-12-02 14:44 UTC (permalink / raw)
  To: Michael Reed; +Cc: linux-scsi, Hannes Reinecke, Jeremy Higdon, James Smart

On Wed, 2009-12-02 at 08:27 -0600, Michael Reed wrote:
> James Bottomley wrote:
> > On Tue, 2009-12-01 at 16:28 -0600, Michael Reed wrote:
> >> James Bottomley wrote:
> >>> On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
> >>>> Prevent delays and hangs due to sd_remove() waiting for the completion of
> >>>> async threads executing sd_probe_async of disks on unrelated host adapters.
> >>>> This patch executes every sd_probe_async in its own async domain allowing
> >>>> sd_remove() to wait for just the completion of the async thread associated with
> >>>> the scsi_disk being removed.
> >>> This patch was thought of a while ago. Unfortunately, some of the
> >>> unrelated threads we end up waiting on are libata ones.  you confine sd
> >>> to only its own probes, we end up unsynchronised with respect to libata
> >>> probes and we might cause ordering problems amongst the ata devices.
> >>>
> >> Isn't sd_remove() only concerned with the removal of a a single scsi_disk?
> >> Shouldn't libata use reference counting if it has is an issue with a scsi_disk
> >> being prematurely removed?  Or is this a concern about "sd" naming?  Or something
> >> else that I admittedly don't understand (but should)?
> > 
> > Well, no ... the sync on remove is preventing us removing a device whose
> > async part is still running.  That async part for libata includes pieces
> > kicked off from the sd probe.
> 
> What does the call stack look like that spawns the async part of libata?

it's in libata-core (and some in ahci) just do a git grep async in
drivers/ata

> What kind of hardware do I need to demonstrate this?

To demonstrate what?  The out of order sequencing that can arise?  any
ata based system with > 1 device should do.

There's actually another problem with the previous patch ...
scsi_wait_scan() also needs to synchronise with the sd probes.

> >> I would truly like to better understand the issue.  Would someone mind expanding
> >> upon the concern about ata ordering issues associated with the removal of a
> >> single scsi_disk?
> > 
> > The problem isn't removal per se ... it's the fact that remove can't
> > complete until any async pieces remaining from probe have run. 
> 
> Yes, I understand that.  I didn't realize that the sd_probe resulted in any
> async work other than sd_probe_async().  It does complicate serialization
> at removal.
> 
> I'll try to capture the "motivation" from within my fibre channel centric world
> for the change and see if anyone's got some ideas on how to resolve the issue.

OK ... actually describing the problem would be helpful.  The async
schedule is only in sd_remove() to guard against add/remove races ...
usually when we do removal, the async probing parts should be long
finished so I don't understand why you think we would be waiting for
stuff.

James



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
  2009-12-02 14:44         ` James Bottomley
@ 2009-12-02 15:18           ` Michael Reed
  2009-12-02 15:35             ` James Bottomley
  2009-12-02 17:06           ` Michael Reed
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Reed @ 2009-12-02 15:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Jeremy Higdon, James Smart



James Bottomley wrote:
> On Wed, 2009-12-02 at 08:27 -0600, Michael Reed wrote:
>> James Bottomley wrote:
>>> On Tue, 2009-12-01 at 16:28 -0600, Michael Reed wrote:
>>>> James Bottomley wrote:
>>>>> On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
>>>>>> Prevent delays and hangs due to sd_remove() waiting for the completion of
>>>>>> async threads executing sd_probe_async of disks on unrelated host adapters.
>>>>>> This patch executes every sd_probe_async in its own async domain allowing
>>>>>> sd_remove() to wait for just the completion of the async thread associated with
>>>>>> the scsi_disk being removed.
>>>>> This patch was thought of a while ago. Unfortunately, some of the
>>>>> unrelated threads we end up waiting on are libata ones.  you confine sd
>>>>> to only its own probes, we end up unsynchronised with respect to libata
>>>>> probes and we might cause ordering problems amongst the ata devices.
>>>>>
>>>> Isn't sd_remove() only concerned with the removal of a a single scsi_disk?
>>>> Shouldn't libata use reference counting if it has is an issue with a scsi_disk
>>>> being prematurely removed?  Or is this a concern about "sd" naming?  Or something
>>>> else that I admittedly don't understand (but should)?
>>> Well, no ... the sync on remove is preventing us removing a device whose
>>> async part is still running.  That async part for libata includes pieces
>>> kicked off from the sd probe.
>> What does the call stack look like that spawns the async part of libata?
> 
> it's in libata-core (and some in ahci) just do a git grep async in
> drivers/ata
> 
>> What kind of hardware do I need to demonstrate this?
> 
> To demonstrate what?  The out of order sequencing that can arise?  any
> ata based system with > 1 device should do.
> 
> There's actually another problem with the previous patch ...
> scsi_wait_scan() also needs to synchronise with the sd probes.


Thank you for pointing this out.


> 
>>>> I would truly like to better understand the issue.  Would someone mind expanding
>>>> upon the concern about ata ordering issues associated with the removal of a
>>>> single scsi_disk?
>>> The problem isn't removal per se ... it's the fact that remove can't
>>> complete until any async pieces remaining from probe have run. 
>> Yes, I understand that.  I didn't realize that the sd_probe resulted in any
>> async work other than sd_probe_async().  It does complicate serialization
>> at removal.
>>
>> I'll try to capture the "motivation" from within my fibre channel centric world
>> for the change and see if anyone's got some ideas on how to resolve the issue.
> 
> OK ... actually describing the problem would be helpful.  The async
> schedule is only in sd_remove() to guard against add/remove races ...
> usually when we do removal, the async probing parts should be long
> finished so I don't understand why you think we would be waiting for
> stuff.

It appears that the async_synchronize_full() waits for all async threads
from whatever source.  The issue arises when there is sd_probe_async() 
work active for other luns.

Lots and LOTS of luns can delay the sd_probe_async work.  I've got
over 7000 total luns on my test system via 13 different FC host adapters.
What runs quickly on a small system can take a while on this config.
Add fibre channel link up / down events or fabric changes can cause
fc_remote_port_delete() and fc_remote_port_add() calls (and associated
target scans), the time needed to process these events can exceed the
device removal timeout resulting in removes while scan is still running
and there's lots of async work pending.

I'll try to provide a more detailed problem description later today.
(I knew I should have saved those backtraces....)

Thanks for your help in explaining the issues associated with the patch.

Mike


> 
> James
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
  2009-12-02 15:18           ` Michael Reed
@ 2009-12-02 15:35             ` James Bottomley
  0 siblings, 0 replies; 9+ messages in thread
From: James Bottomley @ 2009-12-02 15:35 UTC (permalink / raw)
  To: Michael Reed; +Cc: linux-scsi, Hannes Reinecke, Jeremy Higdon, James Smart

On Wed, 2009-12-02 at 09:18 -0600, Michael Reed wrote:
> James Bottomley wrote:
> > On Wed, 2009-12-02 at 08:27 -0600, Michael Reed wrote:
> >> James Bottomley wrote:
> >>> The problem isn't removal per se ... it's the fact that remove can't
> >>> complete until any async pieces remaining from probe have run. 
> >> Yes, I understand that.  I didn't realize that the sd_probe resulted in any
> >> async work other than sd_probe_async().  It does complicate serialization
> >> at removal.
> >>
> >> I'll try to capture the "motivation" from within my fibre channel centric world
> >> for the change and see if anyone's got some ideas on how to resolve the issue.
> > 
> > OK ... actually describing the problem would be helpful.  The async
> > schedule is only in sd_remove() to guard against add/remove races ...
> > usually when we do removal, the async probing parts should be long
> > finished so I don't understand why you think we would be waiting for
> > stuff.
> 
> It appears that the async_synchronize_full() waits for all async threads
> from whatever source.  The issue arises when there is sd_probe_async() 
> work active for other luns.

Ah ... I'm afraid trying to fix that currently can't be done.  The
reason why we do a global synchronise is so that even with the async
pieces, everything shows up in-order (so the luns get sequentially
lettered in the sd<X> space).

Unfortunately, we're not yet to the point where we can expect devices to
show up entirely randomly on each boot and have the user cope.  Even
though udev can help with this, there are still too many non-udev or
simply just /dev/sd<X> using systems out there to dump the sequential
scan order just yet.

> Lots and LOTS of luns can delay the sd_probe_async work.  I've got
> over 7000 total luns on my test system via 13 different FC host adapters.
> What runs quickly on a small system can take a while on this config.
> Add fibre channel link up / down events or fabric changes can cause
> fc_remote_port_delete() and fc_remote_port_add() calls (and associated
> target scans), the time needed to process these events can exceed the
> device removal timeout resulting in removes while scan is still running
> and there's lots of async work pending.

Is the problem one of thread/work scheduling?  As in we just have too
much work to do and the scheduler isn't coping? ... figures comparing
what goes on with the fully sync case would be helpful.

> I'll try to provide a more detailed problem description later today.
> (I knew I should have saved those backtraces....)
> 
> Thanks for your help in explaining the issues associated with the patch.

You're welcome,

James



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
  2009-12-02 14:44         ` James Bottomley
  2009-12-02 15:18           ` Michael Reed
@ 2009-12-02 17:06           ` Michael Reed
  1 sibling, 0 replies; 9+ messages in thread
From: Michael Reed @ 2009-12-02 17:06 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, Hannes Reinecke, Jeremy Higdon, James Smart



James Bottomley wrote:
> On Wed, 2009-12-02 at 08:27 -0600, Michael Reed wrote:
>> James Bottomley wrote:
>>> On Tue, 2009-12-01 at 16:28 -0600, Michael Reed wrote:
>>>> James Bottomley wrote:
>>>>> On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
>>>>>> Prevent delays and hangs due to sd_remove() waiting for the completion of
>>>>>> async threads executing sd_probe_async of disks on unrelated host adapters.
>>>>>> This patch executes every sd_probe_async in its own async domain allowing
>>>>>> sd_remove() to wait for just the completion of the async thread associated with
>>>>>> the scsi_disk being removed.
>>>>> This patch was thought of a while ago. Unfortunately, some of the
>>>>> unrelated threads we end up waiting on are libata ones.  you confine sd
>>>>> to only its own probes, we end up unsynchronised with respect to libata
>>>>> probes and we might cause ordering problems amongst the ata devices.
>>>>>
>>>> Isn't sd_remove() only concerned with the removal of a a single scsi_disk?
>>>> Shouldn't libata use reference counting if it has is an issue with a scsi_disk
>>>> being prematurely removed?  Or is this a concern about "sd" naming?  Or something
>>>> else that I admittedly don't understand (but should)?
>>> Well, no ... the sync on remove is preventing us removing a device whose
>>> async part is still running.  That async part for libata includes pieces
>>> kicked off from the sd probe.
>> What does the call stack look like that spawns the async part of libata?
> 
> it's in libata-core (and some in ahci) just do a git grep async in
> drivers/ata
> 
>> What kind of hardware do I need to demonstrate this?
> 
> To demonstrate what?  The out of order sequencing that can arise?  any
> ata based system with > 1 device should do.
> 
> There's actually another problem with the previous patch ...
> scsi_wait_scan() also needs to synchronise with the sd probes.
> 
>>>> I would truly like to better understand the issue.  Would someone mind expanding
>>>> upon the concern about ata ordering issues associated with the removal of a
>>>> single scsi_disk?
>>> The problem isn't removal per se ... it's the fact that remove can't
>>> complete until any async pieces remaining from probe have run. 
>> Yes, I understand that.  I didn't realize that the sd_probe resulted in any
>> async work other than sd_probe_async().  It does complicate serialization
>> at removal.
>>
>> I'll try to capture the "motivation" from within my fibre channel centric world
>> for the change and see if anyone's got some ideas on how to resolve the issue.
> 
> OK ... actually describing the problem would be helpful.  The async
> schedule is only in sd_remove() to guard against add/remove races ...
> usually when we do removal, the async probing parts should be long
> finished so I don't understand why you think we would be waiting for
> stuff.

As I've been working out of 2.6.31 and things have changed pretty dramatically
in scsi-misc, I'm going to update my test environment to scsi-misc and see
if I continue to observe the same behavior.

Moving targets are good.  Moving targets are good.  Moving targets are good.

:)

Mike

> 
> James
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-12-02 17:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-01 21:45 PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads Michael Reed
2009-12-01 21:49 ` James Bottomley
2009-12-01 22:28   ` Michael Reed
2009-12-01 22:50     ` James Bottomley
2009-12-02 14:27       ` Michael Reed
2009-12-02 14:44         ` James Bottomley
2009-12-02 15:18           ` Michael Reed
2009-12-02 15:35             ` James Bottomley
2009-12-02 17:06           ` Michael Reed

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.