All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Keith Busch <kbusch@meta.com>
Cc: linux-pci@vger.kernel.org, bhelgaas@google.com,
	Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCHv2 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock
Date: Thu, 27 Jun 2024 09:47:05 +0200	[thread overview]
Message-ID: <Zn0Y-UhMqCo2PCtM@wunner.de> (raw)
In-Reply-To: <20240612181625.3604512-2-kbusch@meta.com>

On Wed, Jun 12, 2024 at 11:16:24AM -0700, Keith Busch wrote:
> PCIe hotplug events modify the topology in their IRQ thread once it can
> acquire the global pci_rescan_remove_lock.
> 
> If a different removal event happens to acquire that lock first, and
> that removal event is for the parent device of the bridge processing the
> other hotplug event, then we are deadlocked: the parent removal will
> wait indefinitely on the child's IRQ thread because the parent is
> holding the global lock the child thread needs to make forward progress.

Yes, that's a known problem.  I submitted a fix for it in 2018:

https://lore.kernel.org/all/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/

The patch I proposed was similar to yours, but was smaller and
confined to pciehp_pci.c.  It was part of a larger series and
when respinning that series I dropped the patch, which is the
reason it never got applied.  I explained the rationale for
dropping it in this message:

https://lore.kernel.org/all/20180906162634.ylyp3ydwujf5wuxx@wunner.de/

Basically all these proposals (both mine and yours) are not great
because they add another layer of duct tape without tackling the
underlying problem -- that pci_lock_rescan_remove() is way too
coarse-grained and needs to be replaced by finer-grained locking.
That however is a complex task that we haven't made significant
forward progress on in the last couple of years.  Something else
always seemed more important.

So I don't really know what to say.  I recognize it's a problem
but I'm hesitant to endorse a duct tape fix. :(

In the second link above I mentioned that my approach doesn't solve
another race discovered by Xiongfeng Wang.  pciehp has been refactored
considerably since then, so I'm not sure if that particular issue
still exists...

Thanks,

Lukas

  reply	other threads:[~2024-06-27  7:47 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-12 18:16 [PATCHv2 0/2] pcie hotplug and error fixes Keith Busch
2024-06-12 18:16 ` [PATCHv2 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock Keith Busch
2024-06-27  7:47   ` Lukas Wunner [this message]
2024-06-27 14:55     ` Keith Busch
2024-11-11  7:38     ` Lukas Wunner
2024-11-11  8:00       ` Lukas Wunner
2024-11-11 17:59         ` Keith Busch
2024-06-12 18:16 ` [PATCHv2 2/2] PCI: err: ensure stable topology during handling Keith Busch
2024-06-15 14:09   ` Lukas Wunner
2024-06-17 17:04     ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zn0Y-UhMqCo2PCtM@wunner.de \
    --to=lukas@wunner.de \
    --cc=bhelgaas@google.com \
    --cc=kbusch@kernel.org \
    --cc=kbusch@meta.com \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.