Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: <linux-pci@vger.kernel.org>, Keith Busch <kbusch@kernel.org>,
	"Kai-Heng Feng" <kai.heng.feng@canonical.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Subject: Re: [PATCH] Documentation: PCI: add vmd documentation
Date: Wed, 24 Apr 2024 14:29:16 -0700	[thread overview]
Message-ID: <56d5ca30-de62-4f7e-a43e-2affd2d1059a@intel.com> (raw)
In-Reply-To: <20240424004733.GA476130@bhelgaas>

On 4/23/2024 5:47 PM, Bjorn Helgaas wrote:
> On Tue, Apr 23, 2024 at 04:10:37PM -0700, Paul M Stillwell Jr wrote:
>> On 4/23/2024 2:26 PM, Bjorn Helgaas wrote:
>>> On Mon, Apr 22, 2024 at 04:39:19PM -0700, Paul M Stillwell Jr wrote:
>>>> On 4/22/2024 3:52 PM, Bjorn Helgaas wrote:
>>>>> On Mon, Apr 22, 2024 at 02:39:16PM -0700, Paul M Stillwell Jr wrote:
>>>>>> On 4/22/2024 1:27 PM, Bjorn Helgaas wrote:
>>>> ...
>>>
>>>>>>> _OSC negotiates ownership of features between platform firmware and
>>>>>>> OSPM.  The "native_pcie_hotplug" and similar bits mean that "IF a
>>>>>>> device advertises the feature, the OS can use it."  We clear those
>>>>>>> native_* bits if the platform retains ownership via _OSC.
>>>>>>>
>>>>>>> If BIOS doesn't enable the VMD host bridge and doesn't supply _OSC for
>>>>>>> the domain below it, why would we assume that BIOS retains ownership
>>>>>>> of the features negotiated by _OSC?  I think we have to assume the OS
>>>>>>> owns them, which is what happened before 04b12ef163d1.
>>>>>>
>>>>>> Sorry, this confuses me :) If BIOS doesn't enable VMD (i.e. VMD is disabled)
>>>>>> then all the root ports and devices underneath VMD are visible to the BIOS
>>>>>> and OS so ACPI would run on all of them and the _OSC bits should be set
>>>>>> correctly.
>>>>>
>>>>> Sorry, that was confusing.  I think there are two pieces to enabling
>>>>> VMD:
>>>>>
>>>>>      1) There's the BIOS "VMD enable" switch.  If set, the VMD device
>>>>>      appears as an RCiEP and the devices behind it are invisible to the
>>>>>      BIOS.  If cleared, VMD doesn't exist; the VMD RCiEP is hidden and
>>>>>      the devices behind it appear as normal Root Ports with devices below
>>>>>      them.
>>>>>
>>>>>      2) When the BIOS "VMD enable" is set, the OS vmd driver configures
>>>>>      the VMD RCiEP and enumerates things below the VMD host bridge.
>>>>>
>>>>>      In this case, BIOS enables the VMD RCiEP, but it doesn't have a
>>>>>      driver for it and it doesn't know how to enumerate the VMD Root
>>>>>      Ports, so I don't think it makes sense for BIOS to own features for
>>>>>      devices it doesn't know about.
>>>>
>>>> That makes sense to me. It sounds like VMD should own all the features, I
>>>> just don't know how the vmd driver would set the bits other than hotplug
>>>> correctly... We know leaving them on is problematic, but I'm not sure what
>>>> method to use to decide which of the other bits should be set or not.
>>>
>>> My starting assumption would be that we'd handle the VMD domain the
>>> same as other PCI domains: if a device advertises a feature, the
>>> kernel includes support for it, and the kernel owns it, we enable it.
>>
>> I've been poking around and it seems like some things (I was looking for
>> AER) are global to the platform. In my investigation (which is a small
>> sample size of machines) it looks like there is a single entry in the BIOS
>> to enable/disable AER so whatever is in one domain should be the same in all
>> the domains. I couldn't find settings for LTR or the other bits, but I'm not
>> sure what to look for in the BIOS for those.
>>
>> So it seems that there are 2 categories: platform global and device
>> specific. AER and probably some of the others are global and can be copied
>> from one domain to another, but things like hotplug are device specific and
>> should be handled that way.
> 
> _OSC is the only mechanism for negotiating ownership of these
> features, and PCI Firmware r3.3, sec 4.5.1, is pretty clear that _OSC
> only applies to the hierarchy originated by the PNP0A03/PNP0A08 host
> bridge that contains the _OSC method.  AFAICT, there's no
> global/device-specific thing here.
> 
> The BIOS may have a single user-visible setting, and it may apply that
> setting to all host bridge _OSC methods, but that's just part of the
> BIOS UI, not part of the firmware/OS interface.
> 

Fair, but we are still left with the question of how to set the _OSC 
bits for the VMD bridge. This would normally happen using ACPI AFAICT 
and we don't have that for the devices behind VMD.

>>> If a device advertises a feature but there's a hardware problem with
>>> it, the usual approach is to add a quirk to work around the problem.
>>> The Correctable Error issue addressed by 04b12ef163d1 ("PCI: vmd:
>>> Honor ACPI _OSC on PCIe features"), looks like it might be in this
>>> category.
>>
>> I don't think we had a hardware problem with these Samsung (IIRC) devices;
>> the issue was that the vmd driver were incorrectly enabling AER because
>> those native_* bits get set automatically.
> 
> Where do all the Correctable Errors come from?  IMO they're either
> caused by some hardware issue or by a software error in programming
> AER.  It's possible we forget to clear the errors and we just see the
> same error reported over and over.  But I don't think the answer is
> to copy the AER ownership from a different domain.
> 

I looked back at the original bugzilla and I feel like the AER errors 
are a red herring. AER was *supposed* to be disabled, but was 
incorrectly enabled by VMD so we are seeing errors. Yes, they may be 
real errors, but my point is that the user had disabled AER so they 
didn't care if there were errors or not (i.e. if AER had been correctly 
disabled by VMD then the user would not have AER errors in the dmesg 
output).

Kai-Heng even says this in one of his responses here 
https://lore.kernel.org/linux-pci/CAAd53p6hATV8TOcJ9Qi2rMwVi=y_9+tQu6KhDkAm6Y8=cQ_xoA@mail.gmail.com/. 
A quote from his reply "To be more precise, AER is disabled by the 
platform vendor in BIOS to paper over the issue."

Paul

  reply	other threads:[~2024-04-24 21:29 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-17 20:15 [PATCH] Documentation: PCI: add vmd documentation Paul M Stillwell Jr
2024-04-17 23:51 ` Keith Busch
2024-04-18 15:07   ` Paul M Stillwell Jr
2024-04-18 23:34     ` Keith Busch
2024-04-18 18:26 ` Bjorn Helgaas
2024-04-18 21:51   ` Paul M Stillwell Jr
2024-04-19 21:14     ` Bjorn Helgaas
2024-04-19 22:18       ` Paul M Stillwell Jr
2024-04-22 20:27         ` Bjorn Helgaas
2024-04-22 21:39           ` Paul M Stillwell Jr
2024-04-22 22:52             ` Bjorn Helgaas
2024-04-22 23:39               ` Paul M Stillwell Jr
2024-04-23 21:26                 ` Bjorn Helgaas
2024-04-23 23:10                   ` Paul M Stillwell Jr
2024-04-24  0:47                     ` Bjorn Helgaas
2024-04-24 21:29                       ` Paul M Stillwell Jr [this message]
2024-04-25 17:24                         ` Bjorn Helgaas
2024-04-25 21:43                           ` Paul M Stillwell Jr
2024-04-25 22:32                             ` Bjorn Helgaas
2024-04-25 23:32                               ` Paul M Stillwell Jr
2024-04-26 21:36                                 ` Bjorn Helgaas
2024-04-26 21:46                                   ` Paul M Stillwell Jr
2024-06-12 21:52                                     ` Paul M Stillwell Jr
2024-06-12 22:25                                       ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56d5ca30-de62-4f7e-a43e-2affd2d1059a@intel.com \
    --to=paul.m.stillwell.jr@intel.com \
    --cc=helgaas@kernel.org \
    --cc=kai.heng.feng@canonical.com \
    --cc=kbusch@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=rafael.j.wysocki@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox