public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-pci@vger.kernel.org,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Sean Kelley <sean.v.kelley@linux.intel.com>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	linuxarm@huawei.com, Austin Bolen <Austin.Bolen@dell.com>
Subject: Re: [PATCH v2] PCI/AER: Do not reset the port device status if doing firmware first handling.
Date: Fri, 17 Jul 2020 15:01:29 -0500	[thread overview]
Message-ID: <20200717200129.GA671299@bjorn-Precision-5520> (raw)
In-Reply-To: <20200622113523.891666-1-Jonathan.Cameron@huawei.com>

[+cc Austin, _OSC expert]

On Mon, Jun 22, 2020 at 07:35:23PM +0800, Jonathan Cameron wrote:
> pci_aer_clear_device_status() currently resets the device status
> (PCI_EXP_DEVSTA) on the downstream port above a device, or the port itself
> if the port is the reported AER error source.  This happens even when error
> handling is firmware first.
> 
> Our interpretation is that firmware first handling means that the firmware
> will deal with clearing all relevant error reporting registers
> including this one.

IMO "firmware-first" is meaningless to the kernel.  I see the bit
defined in the ACPI HEST records (ACPI v6.3, sec 18.3.2.4), but there
is no indication of anything the OS needs to *do* with it.  It does
not influence the result of pcie_aer_is_native().  So I don't want to
mention it in the subject or commit log.

But I think what the _OSC negotiation for AER ownership is relevant,
and that's what your patch tests, so I think this is the right thing
to do.

So I applied this as below to pci/error for v5.8, thanks a lot!

Oh, I also propose a preliminary patch (posted and cc'd to you) to
rename pci_aer_clear_device_status():

  https://lore.kernel.org/r/20200717195619.766662-1-helgaas@kernel.org

commit d6c8d24e3d5d ("PCI/ERR: Clear PCIe Device Status errors only if OS owns AER")
Author: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date:   Mon Jun 22 19:35:23 2020 +0800

    PCI/ERR: Clear PCIe Device Status errors only if OS owns AER
    
    pcie_clear_device_status() resets the error bits in the PCIe Device Status
    Register (PCI_EXP_DEVSTA).
    
    Previously we did this unconditionally, but on ACPI systems, the _OSC AER
    bit negotiates control of the AER capability.  Per sec 4.5.1 of the System
    Firmware Intermediary _OSC and DPC Updates ECN [1], this bit also covers
    other error enable/status bits including the following:
    
      Correctable Error Reporting Enable
      Non-Fatal Error Reporting Enable
      Fatal Error Reporting Enable
      Unsupported Request Reporting Enable
    
    These bits are all in the PCIe Device Control register (the ECN omitted
    "Reporting", but I think that's a typo), so by implication the _OSC AER bit
    also applies to the error status bits in the PCIe Device Status register:
    
      Correctable Error Detected
      Non-Fatal Error Detected
      Fatal Error Detected
      Unsupported Request Detected
    
    Clear the PCIe Device Status error bits only when the OS controls the AER
    capability and related error enable/status bits.  If platform firmware
    controls the AER capability, firmware is responsible for clearing these
    bits.
    
    One call path leading here is:
    
      ghes_do_proc
        ghes_handle_aer
          aer_recover_queue
            schedule_work(&aer_recover_work)
      ...
      aer_recover_work_func
        pcie_do_recovery
          pcie_clear_device_status
    
    [1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
        2020, affecting PCI Firmware Specification, Rev. 3.2
        https://members.pcisig.com/wg/PCI-SIG/document/14076
    [bhelgaas: commit log]
    Link: https://lore.kernel.org/r/20200622113523.891666-1-Jonathan.Cameron@huawei.com
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d3ea667c8520..34bfea5c52b3 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -245,6 +245,9 @@ void pcie_clear_device_status(struct pci_dev *dev)
 {
 	u16 sta;
 
+	if (!pcie_aer_is_native(dev))
+		return;
+
 	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
 	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
 }

  reply	other threads:[~2020-07-17 20:01 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-22 11:35 [PATCH v2] PCI/AER: Do not reset the port device status if doing firmware first handling Jonathan Cameron
2020-07-17 20:01 ` Bjorn Helgaas [this message]
2020-07-17 21:39   ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200717200129.GA671299@bjorn-Precision-5520 \
    --to=helgaas@kernel.org \
    --cc=Austin.Bolen@dell.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=sean.v.kelley@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox