From: Bjorn Helgaas <helgaas@kernel.org>
To: Vidya Sagar <vidyas@nvidia.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>,
	Vikram Sethi <vsethi@nvidia.com>,
	Thierry Reding <treding@nvidia.com>,
	Jonathan Hunter <jonathanh@nvidia.com>,
	Krishna Thota <kthota@nvidia.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Question: Clearing error bits in the root port post enumeration
Date: Fri, 3 Nov 2023 13:20:53 -0500
Message-ID: <20231103182053.GA160440@bhelgaas>
In-Reply-To: <BN8PR12MB29005068F39DE028F19084EDB8A0A@BN8PR12MB2900.namprd12.prod.outlook.com>

On Tue, Oct 31, 2023 at 12:26:31PM +0000, Vidya Sagar wrote:
> Hi folks,
>
> I would like to know your comments on the following scenario where
> we are observing the root port logging errors because of the
> enumeration flow being followed.
>
> DUT information:
> - Has a root port and an endpoint connected to it
> - Uses ECAM mechanism to access the configuration space
> - Booted through ACPI flow
> - Has a Firmware-First approach for handling the errors
> - System is configured to treat Unsupported Requests as
>   Advisory Non-Fatal errors
>
> As we all know, when a configuration read request comes in for a
> device number that is not implemented, a UR would be returned as per
> the PCIe spec.
>
> As part of the enumeration flow on DUT, when the kernel reads offset
> 0x0 of B:D:F=0:0:0, the root port responds with its valid Vendor-ID
> and Device-ID values. But, when B:D:F=0:1:0 is probed, since there
> is no device present there, the root port responds with an
> Unsupported Request and simultaneously logs the same in the Device
> Status register (i.e. bit-3). Because of it, there is a UR logged
> in the Device Status register of the RP by the time enumeration is
> complete.
>
> In the case of AER capability natively owned by the kernel, the AER
> driver's init call would clear all such pending bits.
>
> Since we are going with the Firmware-First approach, and the system
> is configured to treat Unsupported Requests as Advisory Non-Fatal
> errors, only a correctable error interrupt can be raised to the
> Firmware, which takes care of clearing the corresponding status
> registers. The firmware can't know about the UnsupReq bit being set
> because the interrupt it received is for a correctable error, hence
> it clears only the bits related to correctable errors.
>
> All these events leave a freshly booted system with the following
> bits set.
>
> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- (MAbort)
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend- (UnsupReq)
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol- (UnsupReq)
>
> Since the reason for UR is well understood at this point, I would
> like to weigh in on the idea of clearing the aforementioned bits in
> the root port once the enumeration is done particularly to cater to
> the configurations where Firmware-First approach is in place.
> Please let me know your comments on this approach.

I think Secondary status (PCI_SEC_STATUS) is always owned by the OS
and is not affected by _OSC negotiation, right? Linux does basically
nothing with that today, but I think it *could* clear the "Received
Master Abort" bit.
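
To illustrate what clearing that bit would involve, here is a minimal
user-space sketch. It is not kernel code: struct fake_bridge, the
SEC_STATUS_RW1C_MASK value, and the helper names are hypothetical
stand-ins for real config space accesses. Only PCI_SEC_STATUS and
PCI_STATUS_REC_MASTER_ABORT are taken from
include/uapi/linux/pci_regs.h. The sketch models the write-1-to-clear
behavior of the error bits and clears only Received Master Abort.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_SEC_STATUS			0x1e	/* Secondary status register */
#define PCI_STATUS_REC_MASTER_ABORT	0x2000	/* Received Master Abort */

/* Assumption for this sketch: bit 8 and bits 11-15 are the RW1C error bits */
#define SEC_STATUS_RW1C_MASK		0xf900

/* Hypothetical stand-in for a bridge's 16-bit Secondary Status register */
struct fake_bridge {
	uint16_t sec_status;
};

static uint16_t sec_status_read(const struct fake_bridge *b)
{
	return b->sec_status;
}

/* Model RW1C: writing 1 clears an error bit, writing 0 leaves it alone */
static void sec_status_write(struct fake_bridge *b, uint16_t val)
{
	b->sec_status &= (uint16_t)~(val & SEC_STATUS_RW1C_MASK);
}

/* Clear only Received Master Abort; other status bits are not touched */
static void clear_rec_master_abort(struct fake_bridge *b)
{
	if (sec_status_read(b) & PCI_STATUS_REC_MASTER_ABORT)
		sec_status_write(b, PCI_STATUS_REC_MASTER_ABORT);
}
```

Writing back only the one bit, rather than the whole value that was
read, avoids clearing other RW1C bits that may have been set for an
unrelated reason.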

I'm not very familiar with Advisory Non-Fatal errors. I'm curious
about the UESta situation: why can't firmware know about UnsupReq
being set? I assume PCI_ERR_COR_ADV_NFAT is the Correctable Error
Status bit the firmware *does* see and clear.

But isn't the whole point of Advisory Non-Fatal errors that an error
that is logged as an Uncorrectable Error and that normally would be
signaled with ERR_NONFATAL is signaled with ERR_COR instead? So
doesn't PCI_ERR_COR_ADV_NFAT being set imply that some
PCI_ERR_UNCOR_STATUS must be set as well? If so, I would think
firmware *could* figure that out and clear the PCI_ERR_UNCOR_STATUS
bit.
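
A rough user-space sketch of the kind of firmware logic I have in
mind, under some loud assumptions: struct fake_rp and
handle_correctable() are hypothetical, the registers are plain
variables rather than real config space, and every pending
non-fatal-severity Uncorrectable Error Status bit is treated as the
advisory one, which real firmware might want to narrow down further.
Clearing the Device Status UnsupReq bit is likewise an assumption about
what the firmware would choose to do. Only the bit values come from
include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_DLP		0x00000010	/* Data Link Protocol */
#define PCI_ERR_UNC_UNSUP	0x00100000	/* Unsupported Request */
#define PCI_ERR_COR_ADV_NFAT	0x00002000	/* Advisory Non-Fatal */
#define PCI_EXP_DEVSTA_URD	0x0008		/* Unsupported Request Detected */

/* Hypothetical model of the root port registers involved */
struct fake_rp {
	uint32_t uncor_status;	/* PCI_ERR_UNCOR_STATUS (RW1C) */
	uint32_t uncor_sever;	/* PCI_ERR_UNCOR_SEVER: bit set = fatal */
	uint32_t cor_status;	/* PCI_ERR_COR_STATUS (RW1C) */
	uint16_t devsta;	/* PCI_EXP_DEVSTA (RW1C error bits) */
};

/*
 * Handle an ERR_COR notification.  Returns the Uncorrectable Error
 * Status bits that were cleared because they were treated as advisory.
 */
static uint32_t handle_correctable(struct fake_rp *rp)
{
	uint32_t advisory = 0;

	if (rp->cor_status & PCI_ERR_COR_ADV_NFAT) {
		/* Assumption: pending non-fatal-severity bits are the advisory ones */
		advisory = rp->uncor_status & ~rp->uncor_sever;

		/* Model the RW1C write to PCI_ERR_UNCOR_STATUS */
		rp->uncor_status &= ~advisory;

		if (advisory & PCI_ERR_UNC_UNSUP)
			rp->devsta &= (uint16_t)~PCI_EXP_DEVSTA_URD;
	}

	/* Model writing back the correctable status that was read */
	rp->cor_status = 0;

	return advisory;
}
```

With UnsupReq pending at non-fatal severity and Advisory Non-Fatal set
in the correctable status, this clears UnsupReq in both the AER and
Device Status registers and leaves any fatal-severity bits alone.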

Bjorn