From: Keith Busch <kbusch@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Keith Busch <kbusch@meta.com>,
linux-pci@vger.kernel.org, bhelgaas@google.com
Subject: Re: [PATCHv2 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock
Date: Thu, 27 Jun 2024 08:55:43 -0600 [thread overview]
Message-ID: <Zn19b3oPiDF9UnPx@kbusch-mbp> (raw)
In-Reply-To: <Zn0Y-UhMqCo2PCtM@wunner.de>
On Thu, Jun 27, 2024 at 09:47:05AM +0200, Lukas Wunner wrote:
> On Wed, Jun 12, 2024 at 11:16:24AM -0700, Keith Busch wrote:
> > PCIe hotplug events modify the topology in their IRQ thread once it can
> > acquire the global pci_rescan_remove_lock.
> >
> > If a different removal event happens to acquire that lock first, and
> > that removal event is for the parent device of the bridge processing the
> > other hotplug event, then we are deadlocked: the parent removal will
> > wait indefinitely on the child's IRQ thread because the parent is
> > holding the global lock the child thread needs to make forward progress.
>
> Yes, that's a known problem. I submitted a fix for it in 2018:
>
> https://lore.kernel.org/all/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/
>
> The patch I proposed was similar to yours, but was smaller and
> confined to pciehp_pci.c. It was part of a larger series and
> when respinning that series I dropped the patch, which is the
> reason it never got applied. I explained the rationale for
> dropping it in this message:
>
> https://lore.kernel.org/all/20180906162634.ylyp3ydwujf5wuxx@wunner.de/
>
> Basically all these proposals (both mine and yours) are not great
> because they add another layer of duct tape without tackling the
> underlying problem -- that pci_lock_rescan_remove() is way too
> coarse-grained and needs to be replaced by finer-grained locking.
> That however is a complex task that we haven't made significant
> forward progress on in the last couple of years. Something else
> always seemed more important.
>
> So I don't really know what to say. I recognize it's a problem
> but I'm hesitant to endorse a duct tape fix. :(
>
> In the second link above I mentioned that my approach doesn't solve
> another race discovered by Xiongfeng Wang. pciehp has been refactored
> considerably since then, so I'm not sure if that particular issue
> still exists...
Thanks for comments. I agree the current locking scheme is the real
problem here. But I would still selfishly take a duct tape solution at
this point! :) There's so many direct pci_lock_rescan_remove() users
with hardware I don't have, it may be difficult to properly test a
significant change to the locking.
The sysfs race appears to still exist today. My patch wouldn't help with
that either.
next prev parent reply other threads:[~2024-06-27 14:55 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-12 18:16 [PATCHv2 0/2] pcie hotplug and error fixes Keith Busch
2024-06-12 18:16 ` [PATCHv2 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock Keith Busch
2024-06-27 7:47 ` Lukas Wunner
2024-06-27 14:55 ` Keith Busch [this message]
2024-11-11 7:38 ` Lukas Wunner
2024-11-11 8:00 ` Lukas Wunner
2024-11-11 17:59 ` Keith Busch
2024-06-12 18:16 ` [PATCHv2 2/2] PCI: err: ensure stable topology during handling Keith Busch
2024-06-15 14:09 ` Lukas Wunner
2024-06-17 17:04 ` Keith Busch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zn19b3oPiDF9UnPx@kbusch-mbp \
--to=kbusch@kernel.org \
--cc=bhelgaas@google.com \
--cc=kbusch@meta.com \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).