From: Breno Leitao <leitao@debian.org>
To: Jonathan Corbet <corbet@lwn.net>,
Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Oliver O'Halloran <oohall@gmail.com>,
Bjorn Helgaas <bhelgaas@google.com>,
kbusch@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
dcostantino@meta.com, rneu@meta.com, kernel-team@meta.com,
Breno Leitao <leitao@debian.org>
Subject: [PATCH] PCI/AER: Add option to panic on unrecoverable errors
Date: Fri, 06 Feb 2026 10:23:11 -0800
Message-ID: <20260206-pci-v1-1-85160f02d956@debian.org>
When a device has no driver that provides an error_detected callback,
AER recovery fails and the device is left in a disconnected state. This
can mask serious hardware issues during development and testing.

Add a module parameter 'aer_unrecoverable_fatal' that panics the kernel
instead, making such failures immediately visible. The parameter
defaults to false to preserve existing behavior.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
In environments where all hardware must be fully operational, silently
leaving a device in a disconnected state after an AER recovery failure
is unacceptable. This is common in high-reliability systems, production
servers, and testing infrastructure where a degraded system should not
continue running.

This patch adds a module parameter that allows administrators to enforce
a strict policy: if a device cannot recover from an AER error, the
kernel panics instead of continuing with degraded hardware. This ensures
that hardware failures are immediately visible and can trigger
appropriate remediation (restart, failover, alerting).
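
For reviewers who want to reason about the policy without booting a
kernel, the following is a minimal userspace sketch of the decision made
in the patched branch of report_error_detected(). It is an illustration
under stated assumptions, not kernel code: the enum, both structs and
model_report_error_detected() are hypothetical names invented here, the
"has a callback" case is collapsed into a single placeholder vote, and
branches outside the quoted err.c hunk are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the votes named in the err.c hunk. */
enum ers_vote {
	VOTE_CALLBACK_DECIDES,	/* placeholder: the driver callback would vote */
	VOTE_NONE,		/* bridge without an error_detected callback */
	VOTE_NO_AER_DRIVER,	/* non-bridge without an error_detected callback */
};

/* Hypothetical reduction of a PCI device to the two facts the hunk tests. */
struct model_dev {
	bool is_bridge;
	bool has_error_detected;
};

struct model_result {
	enum ers_vote vote;
	bool would_panic;	/* true where the patch would call panic() */
};

/*
 * Model of the patched branch: only a non-bridge device without an
 * error_detected callback can trigger the panic, and only when the
 * aer_unrecoverable_fatal parameter is set.
 */
struct model_result
model_report_error_detected(const struct model_dev *dev,
			    bool aer_unrecoverable_fatal)
{
	struct model_result res = { VOTE_CALLBACK_DECIDES, false };

	assert(dev != NULL);

	if (dev->has_error_detected)
		return res;

	if (dev->is_bridge) {
		res.vote = VOTE_NONE;
		return res;
	}

	res.vote = VOTE_NO_AER_DRIVER;
	res.would_panic = aer_unrecoverable_fatal;
	return res;
}
```

In this model a bridge never triggers the panic, which matches the hunk:
the new check sits only in the PCI_ERS_RESULT_NO_AER_DRIVER arm.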
---
Documentation/admin-guide/kernel-parameters.txt | 9 +++++++++
drivers/pci/pcie/err.c | 3 +++
drivers/pci/pcie/portdrv.c | 7 +++++++
drivers/pci/pcie/portdrv.h | 1 +
4 files changed, 20 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1058f2a6d6a8c..ff95c24280e3c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5240,6 +5240,15 @@ Kernel parameters
nomsi Do not use MSI for native PCIe PME signaling (this makes
all PCIe root ports use INTx for all services).
+ pcieportdrv.aer_unrecoverable_fatal=
+ [PCIE] Panic on unrecoverable AER errors:
+ 0 Log the error and leave the device in a disconnected
+ state (default).
+ 1 Panic the kernel when a device cannot recover from an
+ AER error (no error_detected callback). Useful for
+ high-reliability systems where degraded hardware is
+ unacceptable.
+
pcmv= [HW,PCMCIA] BadgePAD 4
pd_ignore_unused
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index bebe4bc111d75..788484791902e 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -73,6 +73,9 @@ static int report_error_detected(struct pci_dev *dev,
 		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
 			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
 			pci_info(dev, "can't recover (no error_detected callback)\n");
+			if (aer_unrecoverable_fatal)
+				panic("AER: %s: no error_detected callback\n",
+				      pci_name(dev));
 		} else {
 			vote = PCI_ERS_RESULT_NONE;
 		}
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 38a41ccf79b9a..a411f60ff50ce 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -22,6 +22,13 @@
 #include "../pci.h"
 #include "portdrv.h"
 
+#ifdef CONFIG_PCIEAER
+bool aer_unrecoverable_fatal;
+module_param(aer_unrecoverable_fatal, bool, 0644);
+MODULE_PARM_DESC(aer_unrecoverable_fatal,
+		 "Panic if a device cannot recover from an AER error (default: false)");
+#endif
+
 /*
  * The PCIe Capability Interrupt Message Number (PCIe r3.1, sec 7.8.2) must
  * be one of the first 32 MSI-X entries.  Per PCI r3.0, sec 6.8.3.1, MSI
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index bd29d1cc7b8bd..6c67b18de93c9 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -29,6 +29,7 @@ extern bool pcie_ports_dpc_native;
#ifdef CONFIG_PCIEAER
int pcie_aer_init(void);
+extern bool aer_unrecoverable_fatal;
#else
static inline int pcie_aer_init(void) { return 0; }
#endif
---
base-commit: 6bd9ed02871f22beb0e50690b0c3caf457104f7c
change-id: 20260206-pci-362cf172187f
Best regards,
--
Breno Leitao <leitao@debian.org>