From: Bjorn Helgaas <helgaas@kernel.org>
To: Terry Bowman <terry.bowman@amd.com>
Cc: dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
dan.j.williams@intel.com, bhelgaas@google.com,
shiju.jose@huawei.com, ming.li@zohomail.com,
Smita.KoralahalliChannabasappa@amd.com, rrichter@amd.com,
dan.carpenter@linaro.org, PradeepVineshReddy.Kodamati@amd.com,
lukas@wunner.de, Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
linux-cxl@vger.kernel.org, alucerop@amd.com, ira.weiny@intel.com,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [RESEND v13 15/25] CXL/PCI: Introduce PCI_ERS_RESULT_PANIC
Date: Tue, 4 Nov 2025 13:03:53 -0600 [thread overview]
Message-ID: <20251104190353.GA1865360@bhelgaas> (raw)
In-Reply-To: <20251104170305.4163840-16-terry.bowman@amd.com>
On Tue, Nov 04, 2025 at 11:02:55AM -0600, Terry Bowman wrote:
> The CXL driver's error handling for uncorrectable errors (UCE) will be
> updated in the future. A required change is for the error handlers to
> to force a system panic when a UCE is detected.
>
> Introduce PCI_ERS_RESULT_PANIC as a 'enum pci_ers_result' type. This will
> be used by CXL UCE fatal and non-fatal recovery in future patches. Update
> PCIe recovery documentation with details of PCI_ERS_RESULT_PANIC.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
This patch doesn't actually *do* anything. There's no possibility of a
bisect landing on it. I think it would be better to combine this with
something that *uses* PCI_ERS_RESULT_PANIC, maybe the merge_result()
update?
Suggest possible subject prefix of "PCI/ERR" since this really isn't
CXL-specific; it just so happens that you don't know of uses outside
CXL.
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -102,6 +102,8 @@ Possible return values are::
> PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> + PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
"AER capabilities" is confusingly similar to the PCIe AER Capability.
I think this really means "there's no
pci_error_handlers.error_detected() callback".
> + PCI_ERS_RESULT_PANIC, /* System is unstable, panic. Is CXL specific */
> };
>
> A driver does not have to implement all of these callbacks; however,
> @@ -116,6 +118,10 @@ The actual steps taken by a platform to recover from a PCI error
> event will be platform-dependent, but will follow the general
> sequence described below.
>
> +PCI_ERS_RESULT_PANIC is currently unique to CXL and handled in CXL
> +cxl_do_recovery(). The PCI pcie_do_recovery() routine does not report or
> +handle PCI_ERS_RESULT_PANIC.
I'm not sure all these mentions of being CXL specific are really
helpful. I don't think they are actionable to driver writers.
> STEP 0: Error Event
> -------------------
> A PCI bus error is detected by the PCI hardware. On powerpc, the slot
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 5c4759078d2f..cffa5535f28d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -890,6 +890,9 @@ enum pci_ers_result {
>
> /* No AER capabilities registered for the driver */
> PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> +
> + /* System is unstable, panic. Is CXL specific */
> + PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
> };
>
> /* PCI bus error event callbacks */
> --
> 2.34.1
>
next prev parent reply other threads:[~2025-11-04 19:03 UTC|newest]
Thread overview: 104+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-04 17:02 [RESEND v13 00/25] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2025-11-04 17:02 ` [RESEND v13 01/25] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
2025-11-04 17:50 ` Jonathan Cameron
2025-11-19 3:19 ` dan.j.williams
2025-12-08 18:04 ` Bjorn Helgaas
2025-12-08 22:13 ` Bowman, Terry
2025-11-04 17:02 ` [RESEND v13 02/25] PCI/CXL: Introduce pcie_is_cxl() Terry Bowman
2025-11-04 17:52 ` Jonathan Cameron
2025-11-19 3:19 ` dan.j.williams
2025-11-19 15:55 ` Bowman, Terry
2025-11-19 23:34 ` dan.j.williams
2025-11-21 20:31 ` Gregory Price
2025-11-04 17:02 ` [RESEND v13 03/25] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
2025-11-04 17:53 ` Jonathan Cameron
2025-11-19 3:20 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 04/25] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
2025-11-19 3:20 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 05/25] cxl: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c Terry Bowman
2025-11-19 3:20 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 06/25] cxl: Move CXL driver's RCH error handling into core/ras_rch.c Terry Bowman
2025-11-04 18:03 ` Jonathan Cameron
2025-11-19 3:20 ` dan.j.williams
2025-11-19 16:07 ` Bowman, Terry
2025-11-04 17:02 ` [RESEND v13 07/25] CXL/AER: Replace device_lock() in cxl_rch_handle_error_iter() with guard() lock Terry Bowman
2025-11-04 18:05 ` Jonathan Cameron
2025-11-04 19:53 ` Dave Jiang
2025-11-19 3:20 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 08/25] CXL/AER: Move AER drivers RCH error handling into pcie/aer_cxl_rch.c Terry Bowman
2025-11-19 3:20 ` dan.j.williams
2025-11-19 8:26 ` Lukas Wunner
2025-11-19 23:36 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 09/25] PCI/AER: Report CXL or PCIe bus error type in trace logging Terry Bowman
2025-11-04 18:08 ` Jonathan Cameron
2025-11-04 18:26 ` Bjorn Helgaas
2025-11-04 17:02 ` [RESEND v13 10/25] cxl/pci: Update RAS handler interfaces to also support CXL Ports Terry Bowman
2025-11-04 18:10 ` Jonathan Cameron
2025-11-11 8:17 ` Alison Schofield
2025-11-19 3:19 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 11/25] cxl/pci: Log message if RAS registers are unmapped Terry Bowman
2025-11-19 3:27 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 12/25] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
2025-11-19 21:23 ` dan.j.williams
2025-11-19 22:02 ` Bowman, Terry
2025-11-19 23:40 ` dan.j.williams
2025-11-21 14:56 ` Bowman, Terry
2025-11-04 17:02 ` [RESEND v13 13/25] cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors Terry Bowman
2025-11-05 8:30 ` Alejandro Lucero Palau
2025-11-19 22:00 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 14/25] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2025-11-04 18:15 ` Jonathan Cameron
2025-11-04 20:03 ` Dave Jiang
2025-11-11 8:23 ` Alison Schofield
2025-11-04 17:02 ` [RESEND v13 15/25] CXL/PCI: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
2025-11-04 19:03 ` Bjorn Helgaas [this message]
2025-11-20 0:17 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 16/25] CXL/AER: Introduce pcie/aer_cxl_vh.c in AER driver for forwarding CXL errors Terry Bowman
2025-11-20 0:44 ` dan.j.williams
2025-11-20 0:53 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver Terry Bowman
2025-11-05 17:51 ` Gregory Price
2025-11-05 19:03 ` Gregory Price
2025-11-05 22:26 ` Gregory Price
2025-11-06 17:11 ` Gregory Price
2025-11-06 23:32 ` Bowman, Terry
2025-11-11 8:33 ` Alison Schofield
2025-11-13 21:42 ` Alison Schofield
2025-11-13 22:39 ` Bowman, Terry
2025-11-20 1:24 ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 18/25] cxl: Change CXL handlers to use guard() instead of scoped_guard() Terry Bowman
2025-11-04 18:18 ` Jonathan Cameron
2025-11-04 20:15 ` Dave Jiang
2025-11-04 17:02 ` [RESEND v13 19/25] cxl/pci: Introduce CXL protocol error handlers for Endpoints Terry Bowman
2025-11-04 18:29 ` Jonathan Cameron
2025-11-04 19:09 ` Bjorn Helgaas
2025-11-04 17:03 ` [RESEND v13 20/25] CXL/PCI: Introduce CXL Port protocol error handlers Terry Bowman
2025-11-04 18:32 ` Jonathan Cameron
2025-11-04 21:20 ` Dave Jiang
2025-11-04 21:27 ` Bowman, Terry
2025-11-04 23:39 ` Dave Jiang
2025-11-04 17:03 ` [RESEND v13 21/25] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2025-11-04 18:40 ` Jonathan Cameron
2025-11-04 18:45 ` Bjorn Helgaas
2025-11-20 3:33 ` dan.j.williams
2025-11-04 17:03 ` [RESEND v13 22/25] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result() Terry Bowman
2025-11-04 18:41 ` Jonathan Cameron
2025-11-04 19:03 ` Bjorn Helgaas
2025-11-14 15:20 ` Bowman, Terry
2025-11-14 16:09 ` Jonathan Cameron
2025-11-04 17:03 ` [RESEND v13 23/25] CXL/PCI: Introduce CXL uncorrectable protocol error recovery Terry Bowman
2025-11-04 18:47 ` Jonathan Cameron
2025-11-04 23:43 ` Dave Jiang
2025-11-05 14:59 ` Bowman, Terry
2025-11-05 16:10 ` Dave Jiang
2025-11-11 8:37 ` Alison Schofield
2025-12-08 18:40 ` Bjorn Helgaas
2025-11-04 17:03 ` [RESEND v13 24/25] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
2025-11-04 17:03 ` [RESEND v13 25/25] CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup Terry Bowman
2025-11-20 3:10 ` dan.j.williams
2025-12-04 17:08 ` Bowman, Terry
2025-11-04 19:11 ` [RESEND v13 00/25] Enable CXL PCIe Port Protocol Error handling and logging Bjorn Helgaas
2025-11-04 21:54 ` Bowman, Terry
2025-11-04 22:12 ` Bjorn Helgaas
2025-12-04 17:30 ` Bowman, Terry
2025-12-08 18:42 ` Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251104190353.GA1865360@bhelgaas \
--to=helgaas@kernel.org \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=alucerop@amd.com \
--cc=bhelgaas@google.com \
--cc=dan.carpenter@linaro.org \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=ming.li@zohomail.com \
--cc=rrichter@amd.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=shiju.jose@huawei.com \
--cc=terry.bowman@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.