linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Keith Busch <keith.busch@intel.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: [PATCH] PCI: Device removal rescan deadlock
Date: Mon, 24 Sep 2018 09:03:54 -0600	[thread overview]
Message-ID: <20180924150353.GA9223@localhost.localdomain> (raw)
In-Reply-To: <20180922081117.n5geis3u3tuerkav@wunner.de>

On Sat, Sep 22, 2018 at 10:11:17AM +0200, Lukas Wunner wrote:
> On Fri, Sep 21, 2018 at 02:57:52PM -0600, Keith Busch wrote:
> > The pci_rescan_remove_lock provides single-threaded enumeration. This
> > can deadlock if a task requests this lock to enumerate below a device
> > while another task wants to remove that device.
> > 
> > A simple example to create this dead lock from sysfs looks something
> > like the following:
> > 
> >   # echo 1 > /sys/bus/pci/devices/<parent>/remove & \
> >     echo 1 > /sys/bus/pci/devices/<child>/rescan &
> > 
> > If the parent removal locks the pci_rescan_remove_lock first, the child
> > thread is blocked from forward progress, but the parent can't release
> > the lock until the child thread releases the device.
> > 
> > Similar deadlocks are possible in the surprise removal path due to
> > deadlocked interrupt handlers.
> > 
> > This patch provides an alternate locking method that exits in failure
> > if the caller wants to scan below a device being removed.
> 
> For context, alternative patches to tackle this problem were submitted
> earlier by Xiongfeng Wang and myself:
> 
> https://patchwork.ozlabs.org/patch/877835/
> https://patchwork.ozlabs.org/patch/930403/
> 
> The solution you've found lools good in principle, this might work.
> 
> However.  I had withdrawn my solution because I came to the conclusion
> that pci_lock_rescan_remove() is basically the equivalent to the
> Big Kernel Lock in the PCI subsystem.  It's way too coarse-grained and
> the proper solution, it seems, is to move to more fine-grained locking.
>
> Specifically, device removal comprises two phases, unbinding of the
> driver (pci_stop_bus_device()) and destruction of the pci_dev
> (pci_remove_bus_device()).  Currently pci_lock_rescan_remove()
> protects both phases.  If we make it so that it only protects the
> second phase, such that the first phase runs lockless, those deadlocks
> automatically go away.  I've explained this in a little more detail
> in this reply to Mika:
> https://www.spinics.net/lists/linux-pci/msg75960.html
> 
> Making unbinding of the driver lockless requires careful examination
> of core PCI enumeration code.  E.g., you've moved the call to
> pci_dev_assign_added() up in pci_stop_dev().  I don't know if that's
> safe.  Changing the "added" flag has consequences which I haven't
> fully understood yet, e.g. it's used to determine newly discovered
> devices during rescan, but I recall noticing that it also has other
> meanings.  This is historically grown code which needs to be untangled
> to make it run lockless, but that takes time.
> 
> The problem is real, so the conundrum is whether to apply a fix now
> which essentially amounts to duct tape, but helps users, or fixing
> the root cause, which takes more time and would leave users exposed
> to the problem for now.  Applying duct tape reduces pressure to
> address the root cause and makes untangling more difficult
> in the long run.

Yep, I agree with all the points you've made.

The coarse locking is not the only issue either. The pci driver internals
are a bit too exposed: many archs and other drivers directly access what
should be private pointers to other parts of the hierarchy without
proper locking, and then uses them without reference accounting. Many
desirable changes may not necessarily be isolated to just pci core.

I originally set out to try to remove the Big PCI Lock, but it started
touching too many subsystems I wouldn't be able to test, so reached for
the duct tape instead. :(

      reply	other threads:[~2018-09-24 15:11 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-21 20:57 [PATCH] PCI: Device removal rescan deadlock Keith Busch
2018-09-22  8:11 ` Lukas Wunner
2018-09-24 15:03   ` Keith Busch [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180924150353.GA9223@localhost.localdomain \
    --to=keith.busch@intel.com \
    --cc=bhelgaas@google.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).