From: Bjorn Helgaas <helgaas@kernel.org>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-pci@vger.kernel.org,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>,
Sean Kelley <sean.v.kelley@linux.intel.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
linuxarm@huawei.com, Austin Bolen <Austin.Bolen@dell.com>
Subject: Re: [PATCH v2] PCI/AER: Do not reset the port device status if doing firmware first handling.
Date: Fri, 17 Jul 2020 15:01:29 -0500
Message-ID: <20200717200129.GA671299@bjorn-Precision-5520>
In-Reply-To: <20200622113523.891666-1-Jonathan.Cameron@huawei.com>
[+cc Austin, _OSC expert]
On Mon, Jun 22, 2020 at 07:35:23PM +0800, Jonathan Cameron wrote:
> pci_aer_clear_device_status() currently resets the device status
> (PCI_EXP_DEVSTA) on the downstream port above a device, or the port itself
> if the port is the reported AER error source. This happens even when error
> handling is firmware first.
>
> Our interpretation is that firmware first handling means that the firmware
> will deal with clearing all relevant error reporting registers
> including this one.
IMO "firmware-first" is meaningless to the kernel. I see the bit
defined in the ACPI HEST records (ACPI v6.3, sec 18.3.2.4), but there
is no indication of anything the OS needs to *do* with it. It does
not influence the result of pcie_aer_is_native(). So I don't want to
mention it in the subject or commit log.
But I think the _OSC negotiation for AER ownership is relevant,
and that's what your patch tests, so I think this is the right thing
to do.
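To make the distinction concrete, here is a minimal userspace sketch of the ownership decision. It is not the kernel's implementation: `struct model_dev`, its fields, and `model_aer_is_native()` are hypothetical names invented for illustration. It assumes the OS treats AER as native when the device has an AER capability and either _OSC granted control or the user forced native handling. The HEST firmware-first flag is carried in the struct but deliberately never consulted.

```c
#include <stdbool.h>

/* Hypothetical model of the inputs; not a kernel structure. */
struct model_dev {
	bool has_aer_cap;         /* device exposes an AER capability */
	bool osc_granted_aer;     /* _OSC negotiation gave the OS control of AER */
	bool force_native;        /* assumed user override of the _OSC result */
	bool hest_firmware_first; /* HEST flag; intentionally unused below */
};

/* Returns true when, in this model, the OS owns AER for the device. */
static bool model_aer_is_native(const struct model_dev *dev)
{
	if (!dev->has_aer_cap)
		return false;

	/* Only the _OSC outcome (or the override) matters, not the HEST flag. */
	return dev->force_native || dev->osc_granted_aer;
}
```

In this model, flipping `hest_firmware_first` never changes the answer, while flipping `osc_granted_aer` does (when `force_native` is not set).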
So I applied this as below to pci/error for v5.8, thanks a lot!
Oh, I also propose a preliminary patch (posted and cc'd to you) to
rename pci_aer_clear_device_status():
https://lore.kernel.org/r/20200717195619.766662-1-helgaas@kernel.org
commit d6c8d24e3d5d ("PCI/ERR: Clear PCIe Device Status errors only if OS owns AER")
Author: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Mon Jun 22 19:35:23 2020 +0800
PCI/ERR: Clear PCIe Device Status errors only if OS owns AER
pcie_clear_device_status() resets the error bits in the PCIe Device Status
Register (PCI_EXP_DEVSTA).
Previously we did this unconditionally, but on ACPI systems, the _OSC AER
bit negotiates control of the AER capability. Per sec 4.5.1 of the System
Firmware Intermediary _OSC and DPC Updates ECN [1], this bit also covers
other error enable/status bits including the following:
  Correctable Error Reporting Enable
  Non-Fatal Error Reporting Enable
  Fatal Error Reporting Enable
  Unsupported Request Reporting Enable
These bits are all in the PCIe Device Control register (the ECN omitted
"Reporting", but I think that's a typo), so by implication the _OSC AER bit
also applies to the error status bits in the PCIe Device Status register:
  Correctable Error Detected
  Non-Fatal Error Detected
  Fatal Error Detected
  Unsupported Request Detected
Clear the PCIe Device Status error bits only when the OS controls the AER
capability and related error enable/status bits. If platform firmware
controls the AER capability, firmware is responsible for clearing these
bits.
One call path leading here is:
  ghes_do_proc
    ghes_handle_aer
      aer_recover_queue
        schedule_work(&aer_recover_work)
  ...
  aer_recover_work_func
    pcie_do_recovery
      pcie_clear_device_status
[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
2020, affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/14076
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20200622113523.891666-1-Jonathan.Cameron@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d3ea667c8520..34bfea5c52b3 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -245,6 +245,9 @@ void pcie_clear_device_status(struct pci_dev *dev)
 {
 	u16 sta;
 
+	if (!pcie_aer_is_native(dev))
+		return;
+
 	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
 	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
 }
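
The read-then-write-back pattern in the hunk above works because the Device Status error bits are write-1-to-clear (RW1C). The following is a small userspace sketch of that behavior together with the new ownership gate. It is only a model under stated assumptions: `struct model_port` and the `model_*` functions are hypothetical, the `os_owns_aer` flag stands in for the result of pcie_aer_is_native(), and the bit values mirror the PCI_EXP_DEVSTA_* definitions in the kernel's pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device Status bit values, mirroring PCI_EXP_DEVSTA_* in pci_regs.h. */
#define DEVSTA_CED   0x0001 /* Correctable Error Detected */
#define DEVSTA_NFED  0x0002 /* Non-Fatal Error Detected */
#define DEVSTA_FED   0x0004 /* Fatal Error Detected */
#define DEVSTA_URD   0x0008 /* Unsupported Request Detected */
#define DEVSTA_TRPND 0x0020 /* Transactions Pending (read-only) */
#define DEVSTA_RW1C  (DEVSTA_CED | DEVSTA_NFED | DEVSTA_FED | DEVSTA_URD)

/* Hypothetical model of a port; not a kernel structure. */
struct model_port {
	uint16_t devsta;  /* simulated Device Status register */
	bool os_owns_aer; /* stand-in for pcie_aer_is_native() */
};

static uint16_t model_read_devsta(const struct model_port *p)
{
	return p->devsta;
}

static void model_write_devsta(struct model_port *p, uint16_t val)
{
	/* RW1C: each 1 written clears that error bit; other bits are untouched. */
	p->devsta &= (uint16_t)~(val & DEVSTA_RW1C);
}

/* Mirrors the shape of the patched function: bail out unless the OS owns AER. */
static void model_clear_device_status(struct model_port *p)
{
	uint16_t sta;

	if (!p->os_owns_aer)
		return;

	sta = model_read_devsta(p);
	model_write_devsta(p, sta);
}
```

With `os_owns_aer` set, calling `model_clear_device_status()` on a port whose register holds `DEVSTA_CED | DEVSTA_URD | DEVSTA_TRPND` leaves only `DEVSTA_TRPND`. With it clear, the register is left as-is for firmware to handle.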