From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: <linux-pci@vger.kernel.org>, Keith Busch <kbusch@kernel.org>,
"Kai-Heng Feng" <kai.heng.feng@canonical.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Subject: Re: [PATCH] Documentation: PCI: add vmd documentation
Date: Mon, 22 Apr 2024 14:39:16 -0700
Message-ID: <5ba96aab-10ee-41b4-988b-2609b4d39f33@intel.com>
In-Reply-To: <20240422202740.GA415030@bhelgaas>
On 4/22/2024 1:27 PM, Bjorn Helgaas wrote:
> On Fri, Apr 19, 2024 at 03:18:19PM -0700, Paul M Stillwell Jr wrote:
>> On 4/19/2024 2:14 PM, Bjorn Helgaas wrote:
>>> On Thu, Apr 18, 2024 at 02:51:19PM -0700, Paul M Stillwell Jr wrote:
>>>> On 4/18/2024 11:26 AM, Bjorn Helgaas wrote:
>>>>> On Wed, Apr 17, 2024 at 01:15:42PM -0700, Paul M Stillwell Jr wrote:
>>>>>> Adding documentation for the Intel VMD driver and updating the index
>>>>>> file to include it.
>
>>>>> - Which devices are passed through to a virtual guest and enumerated
>>>>> there?
>>>>
>>>> All devices under VMD are passed to a virtual guest
>>>
>>> So the guest will see the VMD Root Ports, but not the VMD RCiEP
>>> itself?
>>
>> The guest will see the VMD device and then the vmd driver in the guest will
>> enumerate the devices behind it is my understanding
>>
>>>>> - Where does the vmd driver run (host or guest or both)?
>>>>
>>>> I believe the answer is both.
>>>
>>> If the VMD RCiEP isn't passed through to the guest, how can the vmd
>>> driver do anything in the guest?
>>
>> The VMD device is passed through to the guest. It works just like bare metal
>> in that the guest OS detects the VMD device and loads the vmd driver which
>> then enumerates the devices into the guest
>
> I guess it's obvious that the VMD RCiEP must be passed through to the
> guest because the whole point of
> https://lore.kernel.org/linux-pci/20240408183927.135-1-paul.m.stillwell.jr@intel.com/
> is to do something in the guest.
>
> It does puzzle me that we have two copies of the vmd driver (one in
> the host OS and another in the guest OS) that think they own the same
> physical device. I'm not a virtualization guru but that sounds
> potentially problematic.
>
>>> IIUC, the current situation is "regardless of what firmware said, in
>>> the VMD domain we want AER disabled and hotplug enabled."
>>
>> We aren't saying we want AER disabled, we are just saying we want hotplug
>> enabled. The observation is that in a hypervisor scenario AER is going to be
>> disabled because the _OSC bits are all 0.
>
> 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") is saying
> we want AER disabled for the VMD domain, isn't it?
>
I don't see it that way, but maybe I'm misunderstanding something. Here
is the code from that commit (only the portion of interest):
+/*
+ * Since VMD is an aperture to regular PCIe root ports, only allow it to
+ * control features that the OS is allowed to control on the physical
+ * PCI bus.
+ */
+static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
+				       struct pci_host_bridge *vmd_bridge)
+{
+	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
+	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
+	vmd_bridge->native_aer = root_bridge->native_aer;
+	vmd_bridge->native_pme = root_bridge->native_pme;
+	vmd_bridge->native_ltr = root_bridge->native_ltr;
+	vmd_bridge->native_dpc = root_bridge->native_dpc;
+}
+
When I look at this, we are copying whatever is in the root_bridge to
the vmd_bridge. In a bare metal scenario this is correct, and AER will
be whatever the BIOS set up (IIRC, AER is enabled on my bare metal
system). In a hypervisor scenario the root_bridge bits will all be 0,
which effectively disables AER and the similar features.

Prior to this commit, all the native_* bits were set to 1 (because
pci_init_host_bridge() sets them all to 1), so we were enabling AER
et al. regardless of what _OSC reported. With the commit, we set the
bits to whatever the BIOS set them to.
>>> It seems like the only clear option is to say "the vmd driver owns all
>>> PCIe services in the VMD domain, the platform does not supply _OSC for
>>> the VMD domain, the platform can't do anything with PCIe services in
>>> the VMD domain, and the vmd driver needs to explicitly enable/disable
>>> services as it needs."
>>
>> I actually looked at this as well :) I had an idea to set the _OSC bits to 0
>> when the vmd driver created the domain. The look at all the root ports
>> underneath it and see if AER and PM were set. If any root port underneath
>> VMD set AER or PM then I would set the _OSC bit for the bridge to 1. That
>> way if any root port underneath VMD had enabled AER (as an example) then
>> that feature would still work. I didn't test this in a hypervisor scenario
>> though so not sure what I would see.
>
> _OSC negotiates ownership of features between platform firmware and
> OSPM. The "native_pcie_hotplug" and similar bits mean that "IF a
> device advertises the feature, the OS can use it." We clear those
> native_* bits if the platform retains ownership via _OSC.
>
> If BIOS doesn't enable the VMD host bridge and doesn't supply _OSC for
> the domain below it, why would we assume that BIOS retains ownership
> of the features negotiated by _OSC? I think we have to assume the OS
> owns them, which is what happened before 04b12ef163d1.
>
Sorry, this confuses me :) If the BIOS doesn't enable VMD (i.e. VMD is
disabled), then all the root ports and devices underneath VMD are
visible to the BIOS and the OS, so ACPI would run on all of them and
the _OSC bits should be set correctly.
There was a previous suggestion to print what VMD is owning. I think
that is a good idea and I can add it to my patch. It would look like
this (following the way OS support gets printed):

  VMD supports PCIeHotplug
  VMD supports SHPCHotplug
  VMD now owns PCIeHotplug
  VMD now owns SHPCHotplug

We could probably do the same thing for the other bits (based on the
native_* bits). That might make it clearer...
Paul
> Bjorn
>