public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Nam Cao <namcao@linutronix.de>
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
	"Yinghai Lu" <yinghai@kernel.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"Mika Westerberg" <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails
Date: Mon, 6 May 2024 21:36:44 +0200	[thread overview]
Message-ID: <ZjkxTGaAc48jPzqC@wunner.de> (raw)
In-Reply-To: <20240506083701.NZNifFGn@linutronix.de>

On Mon, May 06, 2024 at 10:37:01AM +0200, Nam Cao wrote:
> I just discovered a new crash scenario with shpchp:
> 
> First, hot-add a bridge:
> (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1
> 
> But with the current patch, the hot-plug is still successful, just that the
> bridge is not properly configured:
> $ lspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-02.0
>            +-03.0-[01-02]----00.0-[02]----01.0--  <-- new hot-added bridge
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 
> But! Now I leave the hot-added bridge as is, and hot-add an ethernet card:
> (qemu) device_add e1000,bus=br1,id=eth0,addr=2
> 
> this command crashes the kernel (full crash log below).
> 
> The problem is because during hot-plugging of this ethernet card,
> pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't
> been marked as "added" yet, so the driver attempts to add this bridge
> again, leading to the crash.
> 
> Now for pciehp, this scenario cannot happen, because there is only a single
> slot, so we cannot hot-plug another device. But still, this makes me think
> perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured
> state as this patch is doing. I cannot think of a scenario that would crash pciehp
> similarly to shpchp. But that's just me, maybe there is a rare scenario
> that can crash pciehp, but I just haven't seen yet.
> 
> My point is: better safe than sorry. I propose going back to the original
> solution: calling pci_stop_and_remove_bus_device() and return a negative
> error, and get rid of this improperly configured bridge. It is useless
> anyways without a subordinate bus number.
> 
> What do you think?

When the kernel runs out of BAR address space for devices downstream of
a bridge, it doesn't de-enumerate the bridge.  The devices are just
unusable.

So my first intuition is that running out of bus numbers shouldn't result
in de-enumeration either.

Remind me, how exactly does the NULL pointer deref occur?  I think it's
because no struct pci_bus was allocated for the subordinate bus of the
hot-plugged bridge, right?  Because AFAICS that would happen in

pci_hp_add_bridge()
  pci_can_bridge_extend()
    pci_add_new_bus()
      pci_alloc_child_bus()

but we never get that far because pci_hp_add_bridge() bails out with -1.
So the subordinate pointer remains a NULL pointer.

Perhaps we should allocate an empty struct pci_bus instead.
Or check for a NULL subordinate pointer instead of crashing.

I can't really say off the top of my head which solution is appropriate
here, it requires going through the code paths and identifying the
complexity associated with each solution.

Thanks,

Lukas

  reply	other threads:[~2024-05-06 19:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-04 16:15 [PATCH v2 0/2] abort hot-plug if pci_hp_add_bridge() fails Nam Cao
2024-05-04 16:15 ` [PATCH v2 1/2] PCI: shpchp: Abort " Nam Cao
2024-05-04 16:15 ` [PATCH v2 2/2] PCI: pciehp: " Nam Cao
2024-05-05  5:45   ` Lukas Wunner
2024-05-05  7:14     ` Nam Cao
2024-05-06  8:37       ` Nam Cao
2024-05-06 19:36         ` Lukas Wunner [this message]
2024-05-07 14:27           ` Nam Cao
2024-05-27  9:15             ` Lukas Wunner
2024-05-27  9:23               ` Nam Cao
2024-05-27 12:33                 ` Lukas Wunner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjkxTGaAc48jPzqC@wunner.de \
    --to=lukas@wunner.de \
    --cc=bhelgaas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=namcao@linutronix.de \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox