From: "Bowman, Terry" <terry.bowman@amd.com>
To: Dan Williams <dan.j.williams@intel.com>,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, nifan.cxl@gmail.com,
dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, bhelgaas@google.com,
mahesh@linux.ibm.com, ira.weiny@intel.com, oohall@gmail.com,
Benjamin.Cheatham@amd.com, rrichter@amd.com,
nathan.fontenot@amd.com, Smita.KoralahalliChannabasappa@amd.com,
lukas@wunner.de, ming.li@zohomail.com,
PradeepVineshReddy.Kodamati@amd.com
Subject: Re: [PATCH v7 06/17] PCI/AER: Add CXL PCIe Port uncorrectable error recovery in AER service driver
Date: Fri, 14 Feb 2025 13:36:35 -0600 [thread overview]
Message-ID: <ef4e438a-ae10-4757-90ad-642fcb2dc8d6@amd.com> (raw)
In-Reply-To: <67abea44d80f3_2d1e294ca@dwillia2-xfh.jf.intel.com.notmuch>
On 2/11/2025 6:24 PM, Dan Williams wrote:
> Terry Bowman wrote:
>> Existing recovery procedure for PCIe uncorrectable errors (UCE) does not
>> apply to CXL devices. Recovery can not be used for CXL devices because of
>> potential corruption on what can be system memory. Also, current PCIe UCE
>> recovery, in the case of a Root Port (RP) or Downstream Switch Port (DSP),
>> does not begin at the RP/DSP but begins at the first downstream device.
>> This will miss handling CXL Protocol Errors in a CXL RP or DSP. A separate
>> CXL recovery is needed because of the different handling requirements
>>
>> Add a new function, cxl_do_recovery() using the following.
>>
>> Add cxl_walk_bridge() to iterate the detected error's sub-topology.
>> cxl_walk_bridge() is similar to pci_walk_bridge() but the CXL flavor
>> will begin iteration at the RP or DSP rather than beginning at the
>> first downstream device.
>>
>> pci_walk_bridge() is candidate to possibly reuse cxl_walk_bridge() but
>> needs further investigation. This will be left for future improvement
>> to make the CXL and PCI handling paths more common.
>>
>> Add cxl_report_error_detected() as an analog to report_error_detected().
>> It will call pci_driver::cxl_err_handlers for each iterated downstream
>> device. The pci_driver::cxl_err_handler's UCE handler returns a boolean
>> indicating if there was a UCE error detected during handling.
>>
>> cxl_do_recovery() uses the status from cxl_report_error_detected() to
>> determine how to proceed. Non-fatal CXL UCE errors will be treated as
>> fatal. If a UCE was present during handling then cxl_do_recovery()
>> will kernel panic.
> For what this is:
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
> ...and perhaps it addresses my concern on the prior patch that
> ->error_detected() is responsible for the safety of checking that in
> fact a CXL internal error / UCE was detected.
Thanks Dan
next prev parent reply other threads:[~2025-02-14 19:36 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-11 19:24 [PATCH v7 00/17] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2025-02-11 19:24 ` [PATCH v7 01/17] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver' Terry Bowman
2025-02-11 20:25 ` Bjorn Helgaas
2025-02-11 20:42 ` Dan Williams
2025-02-11 19:24 ` [PATCH v7 02/17] PCI/AER: Rename AER driver's interfaces to also indicate CXL PCIe Port support Terry Bowman
2025-02-11 20:26 ` Bjorn Helgaas
2025-02-11 20:44 ` Dan Williams
2025-02-11 19:24 ` [PATCH v7 03/17] CXL/PCI: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port() Terry Bowman
2025-02-11 20:28 ` Bjorn Helgaas
2025-02-11 20:38 ` Bowman, Terry
2025-02-11 22:33 ` Dan Williams
2025-02-12 19:07 ` Bowman, Terry
2025-02-12 19:51 ` Dan Williams
2025-03-04 19:11 ` Ira Weiny
2025-02-11 19:24 ` [PATCH v7 04/17] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type Terry Bowman
2025-02-11 20:28 ` Bjorn Helgaas
2025-02-11 23:47 ` Dan Williams
2025-02-12 19:15 ` Bowman, Terry
2025-02-12 19:57 ` Dan Williams
2025-02-12 21:08 ` Bowman, Terry
2025-02-12 21:17 ` Lukas Wunner
2025-02-11 19:24 ` [PATCH v7 05/17] PCI/AER: Add CXL PCIe Port correctable error support in AER service driver Terry Bowman
2025-02-11 20:28 ` Bjorn Helgaas
2025-02-11 23:58 ` Dan Williams
2025-02-12 21:52 ` Bowman, Terry
2025-02-11 19:24 ` [PATCH v7 06/17] PCI/AER: Add CXL PCIe Port uncorrectable error recovery " Terry Bowman
2025-02-11 20:29 ` Bjorn Helgaas
2025-02-11 21:59 ` Dave Jiang
2025-02-12 0:02 ` Gregory Price
2025-02-12 0:24 ` Dan Williams
2025-02-14 19:36 ` Bowman, Terry [this message]
2025-02-14 15:11 ` Jonathan Cameron
2025-02-18 15:43 ` Bowman, Terry
2025-02-14 17:36 ` Fan Ni
2025-02-11 19:24 ` [PATCH v7 07/17] cxl/pci: Map CXL PCIe Root Port and Downstream Switch Port RAS registers Terry Bowman
2025-02-11 22:40 ` Dave Jiang
2025-02-12 1:23 ` Dan Williams
2025-02-13 15:43 ` Bowman, Terry
2025-02-14 21:24 ` Dan Williams
2025-02-14 22:23 ` Bowman, Terry
2025-02-14 22:42 ` Dan Williams
2025-02-12 22:28 ` Alison Schofield
2025-02-12 22:37 ` Bowman, Terry
2025-02-11 19:24 ` [PATCH v7 08/17] cxl/pci: Map CXL PCIe Upstream " Terry Bowman
2025-02-11 23:02 ` Dave Jiang
2025-02-12 2:00 ` Dan Williams
2025-02-14 19:46 ` Bowman, Terry
2025-02-14 21:29 ` Dan Williams
2025-02-14 15:15 ` Jonathan Cameron
2025-02-14 19:50 ` Bowman, Terry
2025-02-11 19:24 ` [PATCH v7 09/17] cxl/pci: Update RAS handler interfaces to also support CXL PCIe Ports Terry Bowman
2025-02-11 23:26 ` Dave Jiang
2025-02-14 15:19 ` Jonathan Cameron
2025-02-11 19:24 ` [PATCH v7 10/17] cxl/pci: Add log message and add type check in existing RAS handlers Terry Bowman
2025-02-11 23:28 ` Dave Jiang
2025-02-12 22:59 ` Dan Williams
2025-02-13 0:08 ` Bowman, Terry
2025-02-14 15:28 ` Jonathan Cameron
2025-02-11 19:24 ` [PATCH v7 11/17] cxl/pci: Change find_cxl_port() to non-static Terry Bowman
2025-02-11 23:42 ` Dave Jiang
2025-02-13 23:15 ` Dan Williams
2025-02-11 19:24 ` [PATCH v7 12/17] cxl/pci: Add error handler for CXL PCIe Port RAS errors Terry Bowman
2025-02-12 0:11 ` Dave Jiang
2025-02-12 16:34 ` Bowman, Terry
2025-02-14 2:18 ` Dan Williams
2025-02-14 21:43 ` Bowman, Terry
2025-02-15 0:20 ` Dan Williams
2025-02-18 15:33 ` Bowman, Terry
2025-02-18 17:15 ` Dan Williams
2025-03-05 0:22 ` Ira Weiny
2025-03-06 13:50 ` Bowman, Terry
2025-03-06 13:50 ` Bowman, Terry
2025-02-11 19:24 ` [PATCH v7 13/17] cxl/pci: Add trace logging " Terry Bowman
2025-02-12 0:11 ` Gregory Price
2025-02-12 0:17 ` Dave Jiang
2025-02-12 0:19 ` Dave Jiang
2025-02-12 16:23 ` Bowman, Terry
2025-02-14 2:21 ` Dan Williams
2025-02-14 15:34 ` Jonathan Cameron
2025-02-11 19:24 ` [PATCH v7 14/17] cxl/pci: Update CXL Port RAS logging to also display PCIe SBDF Terry Bowman
2025-02-12 0:21 ` Dave Jiang
2025-02-12 16:20 ` Bowman, Terry
2025-02-12 23:30 ` Alison Schofield
2025-02-12 23:34 ` Bowman, Terry
2025-02-11 19:24 ` [PATCH v7 15/17] cxl/pci: Add support to assign and clear pci_driver::cxl_err_handlers Terry Bowman
2025-02-12 0:38 ` Dave Jiang
2025-02-14 2:29 ` Dan Williams
2025-02-11 19:24 ` [PATCH v7 16/17] PCI/AER: Enable internal errors for CXL Upstream and Downstream Switch Ports Terry Bowman
2025-02-11 20:25 ` Bjorn Helgaas
2025-02-11 20:40 ` Bowman, Terry
2025-02-12 0:40 ` Dave Jiang
2025-02-14 2:35 ` Dan Williams
2025-02-11 19:24 ` [PATCH v7 17/17] cxl/pci: Handle CXL Endpoint and RCH Protocol Errors separately from PCIe errors Terry Bowman
2025-02-14 2:43 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ef4e438a-ae10-4757-90ad-642fcb2dc8d6@amd.com \
--to=terry.bowman@amd.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mahesh@linux.ibm.com \
--cc=ming.li@zohomail.com \
--cc=nathan.fontenot@amd.com \
--cc=nifan.cxl@gmail.com \
--cc=oohall@gmail.com \
--cc=rrichter@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.