Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Nirmal Patel <nirmal.patel@linux.intel.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v2] PCI: vmd: Enable Hotplug based on BIOS setting on VMD rootports
Date: Tue, 5 Dec 2023 18:27:18 -0600	[thread overview]
Message-ID: <20231206002718.GA692432@bhelgaas> (raw)
In-Reply-To: <2728ad9d38442510c5e0c8174d0f7aae6ff247ac.camel@linux.intel.com>

On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> On Fri, 2023-12-01 at 18:07 -0600, Bjorn Helgaas wrote:
> > On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote:
> > > Currently VMD copies root bridge setting to enable Hotplug on its
> > > rootports. This mechanism works fine for Host OS and no issue has
> > > been observed. However in case of VM, all the HyperVisors don't
> > > pass the Hotplug setting to the guest BIOS which results in
> > > assigning default values and disabling Hotplug capability in the
> > > guest which have been observed by many OEMs.
> > 
> > Can we make this a little more specific?  I'm lacking a lot of
> > virtualization background here, so I'm going to ask a bunch of stupid
> > questions here:
> > 
> > "VMD copies root bridge setting" refers to _OSC and the copying done
> > in vmd_copy_host_bridge_flags()?  Obviously this must be in the Host
> > OS.
>
> Yes. we are talking about vmd_copy_host_bridge_flags. It gets called in
> both Host and Guest. But it assigns different values. In Host, it
> assigns correct values, while in guest it assigns power-on default
> value 0 which in simple term disable everything including hotplug.

If vmd_copy_host_bridge_flags() is called both in the Host and the
Guest, I guess that means both the VMD endpoint and the VMD Root Ports
below it are passed through to the Guest?  I expected only the VMD
Root Ports would be passed through, but maybe that's my
misunderstanding.

The VMD endpoint is under host bridge A, and the VMD creates a new PCI
domain under a new host bridge B.  vmd_copy_host_bridge_flags() copies
all the feature ownership flags from A to B.

On ACPI systems (like both Host and Guest) all these flags are
initially cleared for host bridge A in acpi_pci_root_create() because
these features are owned by the platform until _OSC grants ownership
to the OS.  They are initially *set* for host bridge B in
pci_init_host_bridge() because it's created by the vmd driver instead
of being enumerated via ACPI.

If the Host _OSC grants ownership to the OS, A's native_pcie_hotplug
will be set, and vmd_copy_host_bridge_flags() leaves it set for B as
well.  If the Host _OSC does *not* grant hotplug ownership to the OS,
native_pcie_hotplug will be cleared for both A and B.

Apparently the Guest doesn't supply _OSC (seems like a spec violation;
could tell from the dmesg), so A's native_pcie_hotplug remains
cleared, which means vmd_copy_host_bridge_flags() will also clear it
for B.

[The lack of a Guest BIOS _OSC would also mean that if you passed
through a normal non-VMD PCIe Switch Downstream Port to the Guest, the
Guest OS would never be able to manage hotplug below that Port.  Maybe
nobody passes through Switch Ports.]

> > "This mechanism works fine for Host OS and no issue has been
> > observed."  I guess this means that if the platform grants hotplug
> > control to the Host OS via _OSC, pciehp in the Host OS works fine to
> > manage hotplug on Root Ports below the VMD?
>
> Yes. When platform hotplug setting is enabled, then _OSC assigns
> correct value to vmd root ports via vmd_copy_host_bridge_flags.
>
> > If the platform does *not* grant hotplug control to the Host OS, I
> > guess it means the Host OS pciehp can *not* manage hotplug below VMD?
> > Is that also what you expect?
> > 
> > "However in case of VM, all the HyperVisors don't pass the Hotplug
> > setting to the guest BIOS"  What is the mechanism by which they would
> > pass the hotplug setting?  I guess the guest probably doesn't see the
> > VMD endpoint itself, right?  The guest sees either virtualized or
> > passed-through VMD Root Ports?
>
> In guest, vmd is passthrough including pci topology behind vmd. The way
> we verify Hotplug capability is to read Slot Capabilities registers'
> hotplug enabled bit set on vmd root ports in Guest.

I guess maybe this refers to set_pcie_hotplug_bridge(), which checks
PCI_EXP_SLTCAP_HPC?  If it's not set, pciehp won't bind to the device.
This behavior is the same in Host and Guest.

> The values copied in vmd_copy_host_bridge_flags are different in Host
> than in Guest. I do not know what component is responsible for this in
> every HyperVisor. But I tested this in ESXi, HyperV, KVM and they all
> had the same behavior where the _OSC flags are set to power-on default
> values of 0 by vmd_copy_host_bridge_flags in Guest OS which is
> disabling hotplug.
>
> > I assume the guest OS sees a constructed ACPI system (different from
> > the ACPI environment the host sees), so it sees a PNP0A03 host bridge
> > with _OSC?  And presumably VMD Root Ports below that host bridge?
> > 
> > So are you saying the _OSC seen by the guest doesn't grant hotplug
> > control to the guest OS?  And it doesn't do that because the guest
> > BIOS hasn't implemented _OSC?  Or maybe the guest BIOS relies on the
> > Slot Capabilities registers of VMD Root Ports to decide whether _OSC
> > should grant hotplug control?  And those Slot Capabilities don't
> > advertise hotplug support?
>
> Currently Hotplug is controlled by _OSC in both host and Guest. In case
> of guest, it seems guest BIOS hasn't implemented _OSC since all the
> flags are set to 0 which is not the case in host.

I think you want pciehp to work on the VMD Root Ports in the Guest,
no matter what, on the assumption that any _OSC that applies to host
bridge A does not apply to the host bridge created by the vmd driver.

If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC
on PCIe features").  Since host bridge B was not enumerated via ACPI,
the OS owns all those features under B by default, so reverting
04b12ef163d1 would leave that state.

Obviously we'd have to first figure out another solution for the AER
message flood that 04b12ef163d1 resolved.

> VMD is passthrough in Guest so the slot capabilities registers are
> still same as what Host driver would see. That is how we can still
> control hotplug on vmd on both Guest and Host.
> > 
> > "... which results in assigning default values and disabling Hotplug
> > capability in the guest."  What default values are these?
>
> 0. Everything is disabled in guest. So basically _OSC is writing wrong
> values in guest.
>
> > Is this talking about the default host_bridge->native_pcie_hotplug
> > value set in acpi_pci_root_create(), where the OS assumes hotplug
> > is owned by the platform unless _OSC grants control to the OS?
>
> vmd driver calls pci_create_root_bus which eventually ends up
> calling pci_init_host_bridge where native_pcie_hotplug is set to 1.
> Now vmd_copy_host_bridge_flags overwrites the native_pcie_hotplug to
> _OSC setting of 0 in guest.
>
> > > VMD Hotplug can be enabled or disabled based on the VMD
> > > rootports' Hotplug configuration in BIOS. is_hotplug_bridge is
> > > set on each VMD rootport based on Hotplug capable bit in SltCap
> > > in probe.c.  Check is_hotplug_bridge and enable or disable
> > > native_pcie_hotplug based on that value.
> > > 
> > > This patch will make sure that Hotplug is enabled properly in
> > > Host as well as in VM while honoring _OSC settings as well as
> > > VMD hotplug setting.
> > 
> > "Hotplug is enabled properly in Host as well as in VM" sounds like
> > we're talking about both the host OS and the guest OS.
>
> Yes. VM means guest.
> > 
> > In the host OS, this overrides whatever was negotiated via _OSC, I
> > guess on the principle that _OSC doesn't apply because the host BIOS
> > doesn't know about the Root Ports below the VMD.  (I'm not sure it's
> > 100% resolved that _OSC doesn't apply to them, so we should mention
> > that assumption here.)
>
> _OSC still controls every flag including Hotplug. I have observed
> that slot capabilities register has hotplug enabled only when
> platform has enabled the hotplug. So technically not overriding it
> in the host.
>
> It overrides in guest because _OSC is passing wrong/different
> information than the _OSC information in Host.  Also like I
> mentioned, slot capabilities registers are not changed in Guest
> because vmd is passthrough.
> > 
> > In the guest OS, this still overrides whatever was negotiated via
> > _OSC, but presumably the guest BIOS *does* know about the Root
> > Ports, so I don't think the same principle about overriding _OSC
> > applies there.
> > 
> > I think it's still a problem that we handle
> > host_bridge->native_pcie_hotplug differently than all the other
> > features negotiated via _OSC.  We should at least explain this
> > somehow.
>
> Right now this is the only way to know in Guest OS if platform has
> enabled Hotplug or not. We have many customers complaining about
> Hotplug being broken in Guest. So just trying to honor _OSC but also
> fixing its limitation in Guest.
> > 
> > I suspect part of the reason for doing it differently is to avoid
> > the interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd:
> > Honor ACPI _OSC on PCIe features").  If so, I think 04b12ef163d1
> > should be mentioned here because it's an important piece of the
> > picture.
>
> I can add a note about this patch as well. Let me know if you want
> me to add something specific.
> 
> Thank you for taking the time. This is a very critical issue for us
> and we have many customers complaining about it, I would appreciate
> to get it resolved as soon as possible.

> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > ---
> > > v1->v2: Updating commit message.
> > > ---
> > >  drivers/pci/controller/vmd.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/controller/vmd.c
> > > b/drivers/pci/controller/vmd.c
> > > index 769eedeb8802..e39eaef5549a 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev
> > > *vmd, unsigned long features)
> > >  	resource_size_t membar2_offset = 0x2000;
> > >  	struct pci_bus *child;
> > >  	struct pci_dev *dev;
> > > +	struct pci_host_bridge *vmd_bridge;
> > >  	int ret;
> > >  
> > >  	/*
> > > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev
> > > *vmd, unsigned long features)
> > >  	 * and will fail pcie_bus_configure_settings() early. It can
> > > instead be
> > >  	 * run on each of the real root ports.
> > >  	 */
> > > -	list_for_each_entry(child, &vmd->bus->children, node)
> > > +	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
> > > +	list_for_each_entry(child, &vmd->bus->children, node) {
> > >  		pcie_bus_configure_settings(child);
> > > +		/*
> > > +		 * When Hotplug is enabled on vmd root-port, enable it
> > > on vmd
> > > +		 * bridge.
> > > +		 */
> > > +		if (child->self->is_hotplug_bridge)
> > > +			vmd_bridge->native_pcie_hotplug = 1;
> > 
> > "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, which
> > means "the hardware of this slot is hot-plug *capable*."
> > 
> > Whether hotplug is *enabled* is a separate policy decision about
> > whether the OS is allowed to operate the hotplug functionality, so I
> > think saying "When Hotplug is enabled" in the comment is a little bit
> > misleading.
>
> I will rewrite the comment.
> > 
> > Bjorn
> > 
> > > +	}
> > >  
> > >  	pci_bus_add_devices(vmd->bus);
> > >  
> > > -- 
> > > 2.31.1
> > > 
> 

  reply	other threads:[~2023-12-06  0:27 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-27 21:17 [PATCH v2] PCI: vmd: Enable Hotplug based on BIOS setting on VMD rootports Nirmal Patel
2023-12-02  0:07 ` Bjorn Helgaas
2023-12-05 22:20   ` Nirmal Patel
2023-12-06  0:27     ` Bjorn Helgaas [this message]
2023-12-11 23:05       ` Nirmal Patel
2023-12-12 21:13         ` Bjorn Helgaas
2023-12-14  1:07           ` Nirmal Patel
2023-12-14 19:23             ` Bjorn Helgaas
2023-12-14 22:22               ` Nirmal Patel
2024-01-12  0:02                 ` Nirmal Patel
2024-01-12 22:55                 ` Bjorn Helgaas
2024-01-16 20:37                   ` Nirmal Patel
2024-01-17  0:49                     ` Bjorn Helgaas
2024-02-01 21:16                       ` Bjorn Helgaas
2024-02-01 18:38                     ` Nirmal Patel
2024-02-01 23:00                       ` Bjorn Helgaas
2024-02-07  0:27                         ` Nirmal Patel
2024-02-07 18:55                           ` Bjorn Helgaas
2024-02-13 17:47                             ` Nirmal Patel
2024-03-06 22:27                               ` Nirmal Patel
2024-03-07  6:44                                 ` Kai-Heng Feng
2024-03-08  0:09                                   ` Nirmal Patel
2024-03-15  1:29                                     ` Kai-Heng Feng
2024-03-22 20:57                                       ` Nirmal Patel
2024-03-22 21:37                                         ` Bjorn Helgaas
2024-03-22 22:43                                           ` Paul M Stillwell Jr
2024-03-22 23:36                                             ` Bjorn Helgaas
2024-03-25 15:10                                               ` Paul M Stillwell Jr
2024-03-26  0:17                                               ` Nirmal Patel
2024-03-26  1:59                                                 ` Kai-Heng Feng
2024-03-26 15:51                                                   ` Paul M Stillwell Jr
2024-03-26 16:03                                                     ` Paul M Stillwell Jr
2024-03-26 21:08                                                     ` Bjorn Helgaas
2024-04-02 16:10                                                       ` Paul M Stillwell Jr
2024-02-01 22:22                     ` Bjorn Helgaas
2024-01-16 23:54     ` Bjorn Helgaas
2024-02-14 13:43     ` Lorenzo Pieralisi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231206002718.GA692432@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=nirmal.patel@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox