From: Bjorn Helgaas <helgaas@kernel.org>
To: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Cc: linux-pci@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
Kai-Heng Feng <kai.heng.feng@canonical.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Subject: Re: [PATCH] Documentation: PCI: add vmd documentation
Date: Wed, 12 Jun 2024 17:25:52 -0500 [thread overview]
Message-ID: <20240612222552.GA1042342@bhelgaas> (raw)
In-Reply-To: <de0260ed-dfc0-47d4-a8d0-806bbe7b1e4b@intel.com>
On Wed, Jun 12, 2024 at 02:52:38PM -0700, Paul M Stillwell Jr wrote:
> On 4/26/2024 2:46 PM, Paul M Stillwell Jr wrote:
> > On 4/26/2024 2:36 PM, Bjorn Helgaas wrote:
> > > On Thu, Apr 25, 2024 at 04:32:21PM -0700, Paul M Stillwell Jr wrote:
> > > > On 4/25/2024 3:32 PM, Bjorn Helgaas wrote:
> > > > > On Thu, Apr 25, 2024 at 02:43:07PM -0700, Paul M Stillwell Jr wrote:
> > > > > > On 4/25/2024 10:24 AM, Bjorn Helgaas wrote:
> > > > > > > On Wed, Apr 24, 2024 at 02:29:16PM -0700, Paul M Stillwell Jr wrote:
> > > > > > > > On 4/23/2024 5:47 PM, Bjorn Helgaas wrote:
> > > > > > ...
> > > > >
> > > > > > > > > _OSC is the only mechanism for negotiating ownership of these
> > > > > > > > > features, and PCI Firmware r3.3, sec 4.5.1,
> > > > > > > > > is pretty clear that _OSC
> > > > > > > > > only applies to the hierarchy originated by
> > > > > > > > > the PNP0A03/PNP0A08 host
> > > > > > > > > bridge that contains the _OSC method. AFAICT, there's no
> > > > > > > > > global/device-specific thing here.
> > > > > > > > >
> > > > > > > > > The BIOS may have a single user-visible
> > > > > > > > > setting, and it may apply that
> > > > > > > > > setting to all host bridge _OSC methods, but
> > > > > > > > > that's just part of the
> > > > > > > > > BIOS UI, not part of the firmware/OS interface.
> > > > > > > >
> > > > > > > > Fair, but we are still left with the question of
> > > > > > > > how to set the _OSC bits
> > > > > > > > for the VMD bridge. This would normally happen
> > > > > > > > using ACPI AFAICT and we
> > > > > > > > don't have that for the devices behind VMD.
> > > > > > >
> > > > > > > In the absence of a mechanism for negotiating
> > > > > > > ownership, e.g., an ACPI
> > > > > > > host bridge device for the hierarchy, the OS owns all the PCIe
> > > > > > > features.
> > > > > >
> > > > > > I'm new to this space so I don't know what it means for the OS to
> > > > > > own the features. In other words, how would the vmd driver figure
> > > > > > out what features are supported?
> > > > >
> > > > > There are three things that go into this:
> > > > >
> > > > > - Does the OS support the feature, e.g., is CONFIG_PCIEAER enabled?
> > > > >
> > > > > - Has the platform granted permission to the OS to use the feature,
> > > > > either explicitly via _OSC or implicitly because there's no
> > > > > mechanism to negotiate ownership?
> > > > >
> > > > > - Does the device advertise the feature, e.g., does it have an AER
> > > > > Capability?
> > > > >
> > > > > If all three are true, Linux enables the feature.
> > > > >
> > > > > I think vmd has implicit ownership of all features because there is no
> > > > > ACPI host bridge device for the VMD domain, and (IMO) that means there
> > > > > is no way to negotiate ownership in that domain.
> > > > >
> > > > > So the VMD domain starts with all the native_* bits set, meaning Linux
> > > > > owns the features. If the vmd driver doesn't want some feature to be
> > > > > used, it could clear the native_* bit for it.
> > > > >
> > > > > I don't think vmd should unilaterally claim ownership of features by
> > > > > *setting* native_* bits because that will lead to conflicts with
> > > > > firmware.
> > > >
> > > > This is the crux of the problem IMO. I'm happy to set the native_* bits
> > > > using some knowledge about what the firmware wants, but we don't have a
> > > > mechanism to do it AFAICT. I think that's what commit
> > > > 04b12ef163d1 ("PCI:
> > > > vmd: Honor ACPI _OSC on PCIe features") was trying to do: use a
> > > > domain that
> > > > ACPI had run on and negotiated features and apply them to the
> > > > vmd domain.
> > >
> > > Yes, this is the problem. We have no information about what firmware
> > > wants for the VMD domain because there is no ACPI host bridge device.
> > >
> > > We have information about what firmware wants for *other* domains.
> > > 04b12ef163d1 assumes that also applies to the VMD domain, but I don't
> > > think that's a good idea.
> > >
> > > > Using the 3 criteria you described above, could we do this for
> > > > the hotplug
> > > > feature for VMD:
> > > >
> > > > 1. Use IS_ENABLED(CONFIG_<whatever hotplug setting we need>) to
> > > > check to see
> > > > if the hotplug feature is enabled
> > >
> > > That's already there.
> > >
> > > > 2. We know that for VMD we want hotplug enabled so that is the implicit
> > > > permission
> > >
> > > The VMD domain starts with all native_* bits set. All you have to do
> > > is avoid *clearing* them.
> > >
> > > The problem (IMO) is that 04b12ef163d1 clears bits based on the _OSC
> > > for some other domain.
> > >
> > > > 3. Look at the root ports below VMD and see if hotplug capability is set
> > >
> > > This is already there, too.
> > >
> > > > If 1 & 3 are true, then we set the native_* bits for hotplug (we
> > > > can look
> > > > for surprise hotplug as well in the capability to set the
> > > > native_shpc_hotplug bit corrrectly) to 1. This feels like it
> > > > would solve the
> > > > problem of "what if there is a hotplug issue on the platform"
> > > > because the
> > > > user would have disabled hotplug for VMD and the root ports
> > > > below it would
> > > > have the capability turned off.
> > > >
> > > > In theory we could do this same thing for all the features, but we don't
> > > > know what the firmware wants for features other than hotplug (because we
> > > > implicitly know that vmd wants hotplug). I feel like
> > > > 04b12ef163d1 is a good
> > > > compromise for the other features, but I hear your issues with it.
> > > >
> > > > I'm happy to "do the right thing" for the other features, I just can't
> > > > figure out what that thing is :)
> > >
> > > 04b12ef163d1 was motivated by a flood of Correctable Errors.
> > >
> > > Kai-Heng says the errors occur even when booting with
> > > "pcie_ports=native" and VMD turned off, i.e., when the VMD RCiEP is
> > > disabled and the NVMe devices appear under plain Root Ports in domain
> > > 0000. That suggests that they aren't related to VMD at all.
> > >
> > > I think there's a significant chance that those errors are caused by a
> > > software defect, e.g., ASPM configuration. There are many similar
> > > reports of Correctable Errors where "pcie_aspm=off" is a workaround.
> > >
> > > If we can nail down the root cause of those Correctable Errors, we may
> > > be able to fix it and just revert 04b12ef163d1. That would leave all
> > > the PCIe features owned and enabled by Linux in the VMD domain. AER
> > > would be enabled and not a problem, hotplug would be enabled as you
> > > need, etc.
> > >
> > > There are a zillion reports of these errors and I added comments to
> > > some to see if anybody can help us get to a root cause.
> >
> > OK, sounds like the plan for me is to sit tight for now WRT a patch to
> > fix hotplug. I'll submit a v2 for the documentation.
>
> Any update on fixes for the errors?
Ah, sorry, I need to get back to this. I don't think it's
ASPM-related; I think it's just that disabling ASPM also causes AER to
be disabled, so the errors probably still occur but we ignore them.
Bjorn
prev parent reply other threads:[~2024-06-12 22:25 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-17 20:15 [PATCH] Documentation: PCI: add vmd documentation Paul M Stillwell Jr
2024-04-17 23:51 ` Keith Busch
2024-04-18 15:07 ` Paul M Stillwell Jr
2024-04-18 23:34 ` Keith Busch
2024-04-18 18:26 ` Bjorn Helgaas
2024-04-18 21:51 ` Paul M Stillwell Jr
2024-04-19 21:14 ` Bjorn Helgaas
2024-04-19 22:18 ` Paul M Stillwell Jr
2024-04-22 20:27 ` Bjorn Helgaas
2024-04-22 21:39 ` Paul M Stillwell Jr
2024-04-22 22:52 ` Bjorn Helgaas
2024-04-22 23:39 ` Paul M Stillwell Jr
2024-04-23 21:26 ` Bjorn Helgaas
2024-04-23 23:10 ` Paul M Stillwell Jr
2024-04-24 0:47 ` Bjorn Helgaas
2024-04-24 21:29 ` Paul M Stillwell Jr
2024-04-25 17:24 ` Bjorn Helgaas
2024-04-25 21:43 ` Paul M Stillwell Jr
2024-04-25 22:32 ` Bjorn Helgaas
2024-04-25 23:32 ` Paul M Stillwell Jr
2024-04-26 21:36 ` Bjorn Helgaas
2024-04-26 21:46 ` Paul M Stillwell Jr
2024-06-12 21:52 ` Paul M Stillwell Jr
2024-06-12 22:25 ` Bjorn Helgaas [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240612222552.GA1042342@bhelgaas \
--to=helgaas@kernel.org \
--cc=kai.heng.feng@canonical.com \
--cc=kbusch@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lorenzo.pieralisi@arm.com \
--cc=paul.m.stillwell.jr@intel.com \
--cc=rafael.j.wysocki@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox