linux-pci.vger.kernel.org archive mirror
From: Lukas Wunner <lukas@wunner.de>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Bjorn Helgaas <bhelgaas@google.com>,
	oohall@gmail.com, Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Yazen Ghannam <yazen.ghannam@amd.com>,
	Fontenot Nathan <Nathan.Fontenot@amd.com>
Subject: Re: [PATCH 1/2] PCI: pciehp: Add support for OS-First Hotplug and AER/DPC
Date: Thu, 11 May 2023 17:23:26 +0200
Message-ID: <20230511152326.GA16215@wunner.de>
In-Reply-To: <20230510201937.GA11550@wunner.de>

On Wed, May 10, 2023 at 10:19:37PM +0200, Lukas Wunner wrote:
> Below please find a patch which
> sets the Surprise Down Error mask bit.  Could you test if this fixes
> the issue for you?

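Sorry, I failed to appreciate that pcie_capability_set_dword()
can't be used to read-modify-write the AER capability:  It only
accesses registers in the PCI Express Capability, whereas AER is an
Extended Capability.  Replacement patch below.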

-- >8 --

From: Lukas Wunner <lukas@wunner.de>
Subject: [PATCH] PCI: pciehp: Disable Surprise Down Error reporting

On hotplug ports capable of surprise removal, Surprise Down Errors are
expected and no reason for AER or DPC to spring into action.  Although
a Surprise Down event might be caused by an error, software cannot
discern that from regular surprise removal.

Any well-behaved BIOS should mask such errors, but Smita reports a case
where hot-removing an Intel NVMe SSD [8086:0a54] from an AMD Root Port
[1022:14ab] results in irritating AER log messages and a delay of more
than 1 second caused by DPC handling:

  pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
  pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
  pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
  pcieport 0000:00:01.4:   device [1022:14ab] error status/mask=00000020/04004000
  pcieport 0000:00:01.4:    [ 5] SDES (First)
  nvme nvme2: frozen state error detected, reset controller
  pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
  pcieport 0000:00:01.4: AER: subordinate device reset failed
  pcieport 0000:00:01.4: AER: device recovery failed
  pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
  nvme2n1: detected capacity change from 1953525168 to 0
  pci 0000:04:00.0: Removing from iommu group 49

Avoid this by masking Surprise Down Errors on hotplug ports capable of
surprise removal.

Mask them even if AER or DPC is handled by firmware:  Once hotplug
control has been granted to the operating system, the OS owns hotplug
and thus Surprise Down events, so firmware has no business reporting
or reacting to them.

Reported-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Link: https://lore.kernel.org/all/20221101000719.36828-2-Smita.KoralahalliChannabasappa@amd.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_hpc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index f8c70115b691..40a721f3b713 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -984,8 +984,9 @@ static inline int pcie_hotplug_depth(struct pci_dev *dev)
 struct controller *pcie_init(struct pcie_device *dev)
 {
 	struct controller *ctrl;
-	u32 slot_cap, slot_cap2, link_cap;
+	u32 slot_cap, slot_cap2, link_cap, aer_cap;
 	u8 poweron;
+	u16 aer;
 	struct pci_dev *pdev = dev->port;
 	struct pci_bus *subordinate = pdev->subordinate;
 
@@ -1030,6 +1031,17 @@ struct controller *pcie_init(struct pcie_device *dev)
 	if (dmi_first_match(inband_presence_disabled_dmi_table))
 		ctrl->inband_presence_disabled = 1;
 
+	/*
+	 * Surprise Down Errors are par for the course on Hot-Plug Surprise
+	 * capable ports, so disable reporting in case BIOS left it enabled.
+	 */
+	aer = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
+	if (aer && slot_cap & PCI_EXP_SLTCAP_HPS) {
+		pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_MASK, &aer_cap);
+		aer_cap |= PCI_ERR_UNC_SURPDN;
+		pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_MASK, aer_cap);
+	}
+
 	/* Check if Data Link Layer Link Active Reporting is implemented */
 	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
 
-- 
2.39.2

