All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Reed <mdr@sgi.com>
To: James Bottomley <James.Bottomley@suse.de>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
	Hannes Reinecke <hare@suse.de>, Jeremy Higdon <jeremy@sgi.com>,
	James Smart <James.Smart@Emulex.Com>
Subject: Re: PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads
Date: Wed, 02 Dec 2009 11:06:04 -0600	[thread overview]
Message-ID: <4B169E7C.8030003@sgi.com> (raw)
In-Reply-To: <1259765040.5430.19.camel@mulgrave.site>



James Bottomley wrote:
> On Wed, 2009-12-02 at 08:27 -0600, Michael Reed wrote:
>> James Bottomley wrote:
>>> On Tue, 2009-12-01 at 16:28 -0600, Michael Reed wrote:
>>>> James Bottomley wrote:
>>>>> On Tue, 2009-12-01 at 15:45 -0600, Michael Reed wrote:
>>>>>> Prevent delays and hangs due to sd_remove() waiting for the completion of
>>>>>> async threads executing sd_probe_async of disks on unrelated host adapters.
>>>>>> This patch executes every sd_probe_async in its own async domain allowing
>>>>>> sd_remove() to wait for just the completion of the async thread associated with
>>>>>> the scsi_disk being removed.
>>>>> This patch was thought of a while ago. Unfortunately, some of the
>>>>> unrelated threads we end up waiting on are libata ones.  you confine sd
>>>>> to only its own probes, we end up unsynchronised with respect to libata
>>>>> probes and we might cause ordering problems amongst the ata devices.
>>>>>
>>>> Isn't sd_remove() only concerned with the removal of a a single scsi_disk?
>>>> Shouldn't libata use reference counting if it has is an issue with a scsi_disk
>>>> being prematurely removed?  Or is this a concern about "sd" naming?  Or something
>>>> else that I admittedly don't understand (but should)?
>>> Well, no ... the sync on remove is preventing us removing a device whose
>>> async part is still running.  That async part for libata includes pieces
>>> kicked off from the sd probe.
>> What does the call stack look like that spawns the async part of libata?
> 
> it's in libata-core (and some in ahci) just do a git grep async in
> drivers/ata
> 
>> What kind of hardware do I need to demonstrate this?
> 
> To demonstrate what?  The out of order sequencing that can arise?  any
> ata based system with > 1 device should do.
> 
> There's actually another problem with the previous patch ...
> scsi_wait_scan() also needs to synchronise with the sd probes.
> 
>>>> I would truly like to better understand the issue.  Would someone mind expanding
>>>> upon the concern about ata ordering issues associated with the removal of a
>>>> single scsi_disk?
>>> The problem isn't removal per se ... it's the fact that remove can't
>>> complete until any async pieces remaining from probe have run. 
>> Yes, I understand that.  I didn't realize that the sd_probe resulted in any
>> async work other than sd_probe_async().  It does complicate serialization
>> at removal.
>>
>> I'll try to capture the "motivation" from within my fibre channel centric world
>> for the change and see if anyone's got some ideas on how to resolve the issue.
> 
> OK ... actually describing the problem would be helpful.  The async
> schedule is only in sd_remove() to guard against add/remove races ...
> usually when we do removal, the async probing parts should be long
> finished so I don't understand why you think we would be waiting for
> stuff.

As I've been working out of 2.6.31 and things have changed pretty dramatically
in scsi-misc, I'm going to update my test environment to scsi-misc and see
if I continue to observe the same behavior.

Moving targets are good.  Moving targets are good.  Moving targets are good.

:)

Mike

> 
> James
> 
> 

      parent reply	other threads:[~2009-12-02 17:06 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-01 21:45 PATCH [1/1]: sd_remove() hangs waiting on async_synchronize of unrelated threads Michael Reed
2009-12-01 21:49 ` James Bottomley
2009-12-01 22:28   ` Michael Reed
2009-12-01 22:50     ` James Bottomley
2009-12-02 14:27       ` Michael Reed
2009-12-02 14:44         ` James Bottomley
2009-12-02 15:18           ` Michael Reed
2009-12-02 15:35             ` James Bottomley
2009-12-02 17:06           ` Michael Reed [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B169E7C.8030003@sgi.com \
    --to=mdr@sgi.com \
    --cc=James.Bottomley@suse.de \
    --cc=James.Smart@Emulex.Com \
    --cc=hare@suse.de \
    --cc=jeremy@sgi.com \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.