All of lore.kernel.org
 help / color / mirror / Atom feed
From: Keith Busch <kbusch@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Keith Busch <kbusch@meta.com>,
	linux-pci@vger.kernel.org, bhelgaas@google.com
Subject: Re: [PATCHv2 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock
Date: Thu, 27 Jun 2024 08:55:43 -0600	[thread overview]
Message-ID: <Zn19b3oPiDF9UnPx@kbusch-mbp> (raw)
In-Reply-To: <Zn0Y-UhMqCo2PCtM@wunner.de>

On Thu, Jun 27, 2024 at 09:47:05AM +0200, Lukas Wunner wrote:
> On Wed, Jun 12, 2024 at 11:16:24AM -0700, Keith Busch wrote:
> > PCIe hotplug events modify the topology in their IRQ thread once it can
> > acquire the global pci_rescan_remove_lock.
> > 
> > If a different removal event happens to acquire that lock first, and
> > that removal event is for the parent device of the bridge processing the
> > other hotplug event, then we are deadlocked: the parent removal will
> > wait indefinitely on the child's IRQ thread because the parent is
> > holding the global lock the child thread needs to make forward progress.
> 
> Yes, that's a known problem.  I submitted a fix for it in 2018:
> 
> https://lore.kernel.org/all/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/
> 
> The patch I proposed was similar to yours, but was smaller and
> confined to pciehp_pci.c.  It was part of a larger series and
> when respinning that series I dropped the patch, which is the
> reason it never got applied.  I explained the rationale for
> dropping it in this message:
> 
> https://lore.kernel.org/all/20180906162634.ylyp3ydwujf5wuxx@wunner.de/
> 
> Basically all these proposals (both mine and yours) are not great
> because they add another layer of duct tape without tackling the
> underlying problem -- that pci_lock_rescan_remove() is way too
> coarse-grained and needs to be replaced by finer-grained locking.
> That however is a complex task that we haven't made significant
> forward progress on in the last couple of years.  Something else
> always seemed more important.
> 
> So I don't really know what to say.  I recognize it's a problem
> but I'm hesitant to endorse a duct tape fix. :(
> 
> In the second link above I mentioned that my approach doesn't solve
> another race discovered by Xiongfeng Wang.  pciehp has been refactored
> considerably since then, so I'm not sure if that particular issue
> still exists...

Thanks for comments. I agree the current locking scheme is the real
problem here. But I would still selfishly take a duct tape solution at
this point! :) There's so many direct pci_lock_rescan_remove() users
with hardware I don't have, it may be difficult to properly test a
significant change to the locking.

The sysfs race appears to still exist today. My patch wouldn't help with
that either.

  reply	other threads:[~2024-06-27 14:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-12 18:16 [PATCHv2 0/2] pcie hotplug and error fixes Keith Busch
2024-06-12 18:16 ` [PATCHv2 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock Keith Busch
2024-06-27  7:47   ` Lukas Wunner
2024-06-27 14:55     ` Keith Busch [this message]
2024-11-11  7:38     ` Lukas Wunner
2024-11-11  8:00       ` Lukas Wunner
2024-11-11 17:59         ` Keith Busch
2024-06-12 18:16 ` [PATCHv2 2/2] PCI: err: ensure stable topology during handling Keith Busch
2024-06-15 14:09   ` Lukas Wunner
2024-06-17 17:04     ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zn19b3oPiDF9UnPx@kbusch-mbp \
    --to=kbusch@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=kbusch@meta.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.