From: Bjorn Helgaas <helgaas@kernel.org>
To: Robert Richter <rrichter@amd.com>
Cc: alison.schofield@intel.com, dave.jiang@intel.com,
Terry Bowman <terry.bowman@amd.com>,
vishal.l.verma@intel.com, linuxppc-dev@lists.ozlabs.org,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-cxl@vger.kernel.org, bhelgaas@google.com,
Oliver O'Halloran <oohall@gmail.com>,
Jonathan.Cameron@huawei.com, bwidawsk@kernel.org,
dan.j.williams@intel.com, ira.weiny@intel.com
Subject: Re: [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler
Date: Thu, 25 May 2023 17:01:01 -0500 [thread overview]
Message-ID: <ZG/anZ78FukSpERs@bhelgaas> (raw)
In-Reply-To: <ZG/TKOgMTSljryHN@rric.localdomain>
On Thu, May 25, 2023 at 11:29:58PM +0200, Robert Richter wrote:
> eOn 24.05.23 16:32:35, Bjorn Helgaas wrote:
> > On Tue, May 23, 2023 at 06:22:13PM -0500, Terry Bowman wrote:
> > > From: Robert Richter <rrichter@amd.com>
> > >
> > > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > > to an RCEC.
> > >
> > > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > > Corrected Internal Errors (CIE).
> >
> > From the parallelism with RCD above, I first thought that RCH devices
> > were non-RCD mode and *were* enumerated as part of the PCIe hierarchy,
> > but actually I suspect it's more like the following?
> >
> > ... but CXL downstream and upstream ports are not enumerated and not
> > visible in the PCIe hierarchy.
> >
> > Protocol and link errors from these non-enumerated ports are
> > signaled as internal AER errors ... via a CXL RCEC.
>
> Exactly, except the RCEC is standard PCIe and also must not
> necessarily on the same PCI bus as the CXL RCiEPs are.
So make it "RCEC" instead of "CXL RCEC", I guess? PCIe r6.0, sec
7.9.10.3, allows an RCEC to be associated with RCiEPs on different
buses, so nothing to see there.
> > > The error source is the id of the RCEC.
> >
> > This seems odd; I assume this refers to the RCEC's AER Error Source
> > Identification register, and the ERR_COR or ERR_FATAL/NONFATAL Source
> > Identification would ordinarily be the Requester ID of the RCiEP that
> > "sent" the Error Message. But you're saying it's actually the ID of
> > the *RCEC*, not the RCiEP?
>
> Right, the downstream port has its own AER ext capability in
> non-config (io mapped) RCRB register range. Errors originating from
> there are signaled as internal AER errors via the RCEC *with* the
> RCEC's Requester ID. Code walks through all associated CXL endpoints,
> determines the dport and checks its AER.
>
> There is also an RDPAS structure defined in CXL but that is only a
> different way to provide the RCEC to dport association instead of
> using the RCEC's Endpoint Association Extended Capability. In the end
> we get all associated RCHs and check the AER of all their dports.
>
> The upstream port is signaled using the RCiEP's AER. CXL spec is
> strict here: "Upstream Port RCRB shall not implement the AER Extended
> Capability." The RCiEP's requestor ID is used then and its config
> space the AER is in.
>
> CXL.cachemem errors are reported with the RCiEP as requester
> too. Status is in the CXL RAS cap and the UIE or CIE is set
> respectively in the AER status of the RCiEP.
>
> > We're going to call pci_aer_handle_error() as well, to handle the
> > non-internal errors, and I'm pretty sure that path expects the RCiEP
> > ID there.
> >
> > Whatever the answer, I'm not sure this sentence is actually relevant
> > to this patch, since this patch doesn't read PCI_ERR_ROOT_ERR_SRC or
> > look at struct aer_err_source.id.
>
> The source id is used in aer_process_err_devices() which finally calls
> handle_error_source() for the device with the requestor id. This is
> the place where cxl_rch_handle_error() checks if it is an RCEC that
> received an internal error and has cxl devices connected to it. Then,
> the request is forwarded to the cxl_mem handler which also needs to
> check the dport now. That is, pcie_walk_rcec() in
> cxl_rch_handle_error() is called with the RCEC's pci handle,
> cxl_rch_handle_error_iter() with the RCiEP's pci handle.
I'm still not sure this is relevant. Isn't that last sentence just
the way we always use pcie_walk_rcec()?
If there's something *different* here about CXL, and it's important to
this patch, sure. But I don't see that yet. Maybe a comment in the
code if you think it's important to clarify something there.
Bjorn
WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <helgaas@kernel.org>
To: Robert Richter <rrichter@amd.com>
Cc: alison.schofield@intel.com, dave.jiang@intel.com,
ira.weiny@intel.com, Terry Bowman <terry.bowman@amd.com>,
linux-pci@vger.kernel.org, Jonathan.Cameron@huawei.com,
linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
bwidawsk@kernel.org, Oliver O'Halloran <oohall@gmail.com>,
vishal.l.verma@intel.com, bhelgaas@google.com,
dan.j.williams@intel.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler
Date: Thu, 25 May 2023 17:01:01 -0500 [thread overview]
Message-ID: <ZG/anZ78FukSpERs@bhelgaas> (raw)
In-Reply-To: <ZG/TKOgMTSljryHN@rric.localdomain>
On Thu, May 25, 2023 at 11:29:58PM +0200, Robert Richter wrote:
> eOn 24.05.23 16:32:35, Bjorn Helgaas wrote:
> > On Tue, May 23, 2023 at 06:22:13PM -0500, Terry Bowman wrote:
> > > From: Robert Richter <rrichter@amd.com>
> > >
> > > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > > to an RCEC.
> > >
> > > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > > Corrected Internal Errors (CIE).
> >
> > From the parallelism with RCD above, I first thought that RCH devices
> > were non-RCD mode and *were* enumerated as part of the PCIe hierarchy,
> > but actually I suspect it's more like the following?
> >
> > ... but CXL downstream and upstream ports are not enumerated and not
> > visible in the PCIe hierarchy.
> >
> > Protocol and link errors from these non-enumerated ports are
> > signaled as internal AER errors ... via a CXL RCEC.
>
> Exactly, except the RCEC is standard PCIe and also must not
> necessarily on the same PCI bus as the CXL RCiEPs are.
So make it "RCEC" instead of "CXL RCEC", I guess? PCIe r6.0, sec
7.9.10.3, allows an RCEC to be associated with RCiEPs on different
buses, so nothing to see there.
> > > The error source is the id of the RCEC.
> >
> > This seems odd; I assume this refers to the RCEC's AER Error Source
> > Identification register, and the ERR_COR or ERR_FATAL/NONFATAL Source
> > Identification would ordinarily be the Requester ID of the RCiEP that
> > "sent" the Error Message. But you're saying it's actually the ID of
> > the *RCEC*, not the RCiEP?
>
> Right, the downstream port has its own AER ext capability in
> non-config (io mapped) RCRB register range. Errors originating from
> there are signaled as internal AER errors via the RCEC *with* the
> RCEC's Requester ID. Code walks through all associated CXL endpoints,
> determines the dport and checks its AER.
>
> There is also an RDPAS structure defined in CXL but that is only a
> different way to provide the RCEC to dport association instead of
> using the RCEC's Endpoint Association Extended Capability. In the end
> we get all associated RCHs and check the AER of all their dports.
>
> The upstream port is signaled using the RCiEP's AER. CXL spec is
> strict here: "Upstream Port RCRB shall not implement the AER Extended
> Capability." The RCiEP's requestor ID is used then and its config
> space the AER is in.
>
> CXL.cachemem errors are reported with the RCiEP as requester
> too. Status is in the CXL RAS cap and the UIE or CIE is set
> respectively in the AER status of the RCiEP.
>
> > We're going to call pci_aer_handle_error() as well, to handle the
> > non-internal errors, and I'm pretty sure that path expects the RCiEP
> > ID there.
> >
> > Whatever the answer, I'm not sure this sentence is actually relevant
> > to this patch, since this patch doesn't read PCI_ERR_ROOT_ERR_SRC or
> > look at struct aer_err_source.id.
>
> The source id is used in aer_process_err_devices() which finally calls
> handle_error_source() for the device with the requestor id. This is
> the place where cxl_rch_handle_error() checks if it is an RCEC that
> received an internal error and has cxl devices connected to it. Then,
> the request is forwarded to the cxl_mem handler which also needs to
> check the dport now. That is, pcie_walk_rcec() in
> cxl_rch_handle_error() is called with the RCEC's pci handle,
> cxl_rch_handle_error_iter() with the RCiEP's pci handle.
I'm still not sure this is relevant. Isn't that last sentence just
the way we always use pcie_walk_rcec()?
If there's something *different* here about CXL, and it's important to
this patch, sure. But I don't see that yet. Maybe a comment in the
code if you think it's important to clarify something there.
Bjorn
next prev parent reply other threads:[~2023-05-25 22:01 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-23 23:21 [PATCH v4 00/23] cxl/pci: Add support for RCH RAS error handling Terry Bowman
2023-05-23 23:21 ` [PATCH v4 01/23] cxl/acpi: Probe RCRB later during RCH downstream port creation Terry Bowman
2023-06-01 10:13 ` Jonathan Cameron
2023-06-02 14:16 ` Robert Richter
2023-05-23 23:21 ` [PATCH v4 02/23] cxl/rch: Prepare for caching the MMIO mapped PCIe AER capability Terry Bowman
2023-06-01 10:38 ` Jonathan Cameron
2023-06-02 14:53 ` Robert Richter
2023-05-23 23:21 ` [PATCH v4 03/23] cxl: Rename member @dport of struct cxl_dport to @dev Terry Bowman
2023-06-01 10:41 ` Jonathan Cameron
2023-05-23 23:21 ` [PATCH v4 04/23] cxl/core/regs: Add @dev to cxl_register_map Terry Bowman
2023-06-01 10:49 ` Jonathan Cameron
2023-06-02 15:11 ` Robert Richter
2023-05-23 23:21 ` [PATCH v4 05/23] cxl/pci: Refactor component register discovery for reuse Terry Bowman
2023-06-01 10:52 ` Jonathan Cameron
2023-05-23 23:21 ` [PATCH v4 06/23] cxl/acpi: Moving add_host_bridge_uport() around Terry Bowman
2023-06-01 10:54 ` Jonathan Cameron
2023-05-23 23:21 ` [PATCH v4 07/23] cxl/acpi: Directly bind the CEDT detected CHBCR to the Host Bridge's port Terry Bowman
2023-06-01 12:45 ` Jonathan Cameron
2023-06-02 15:42 ` Robert Richter
2023-05-23 23:21 ` [PATCH v4 08/23] cxl/regs: Remove early capability checks in Component Register setup Terry Bowman
2023-06-01 12:49 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 09/23] cxl/pci: Early setup RCH dport component registers from RCRB Terry Bowman
2023-06-01 12:59 ` Jonathan Cameron
2023-06-02 15:45 ` Robert Richter
2023-05-23 23:22 ` [PATCH v4 10/23] cxl/port: Store the port's Component Register mappings in struct cxl_port Terry Bowman
2023-06-01 13:06 ` Jonathan Cameron
2023-06-02 15:58 ` Robert Richter
2023-05-23 23:22 ` [PATCH v4 11/23] cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport Terry Bowman
2023-06-01 13:07 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 12/23] cxl/pci: Store the endpoint's Component Register mappings in struct cxl_dev_state Terry Bowman
2023-06-01 13:07 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 13/23] cxl/hdm: Use stored Component Register mappings to map HDM decoder capability Terry Bowman
2023-05-24 1:12 ` kernel test robot
2023-05-24 9:49 ` Robert Richter
2023-05-25 20:23 ` kernel test robot
2023-06-01 13:11 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 14/23] cxl/port: Remove Component Register base address from struct cxl_port Terry Bowman
2023-06-01 13:27 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 15/23] cxl/port: Remove Component Register base address from struct cxl_dport Terry Bowman
2023-06-01 13:28 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 16/23] cxl/pci: Remove Component Register base address from struct cxl_dev_state Terry Bowman
2023-06-01 13:28 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 17/23] cxl/pci: Add RCH downstream port AER register discovery Terry Bowman
2023-06-01 13:36 ` Jonathan Cameron
2023-06-01 13:38 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 18/23] PCI/AER: Refactor cper_print_aer() for use by CXL driver module Terry Bowman
2023-05-24 16:55 ` Bjorn Helgaas
2023-05-25 21:38 ` Terry Bowman
2023-05-23 23:22 ` [PATCH v4 19/23] cxl/pci: Update CXL error logging to use RAS register address Terry Bowman
2023-06-01 13:42 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 20/23] cxl/pci: Prepare for logging RCH downstream port protocol errors Terry Bowman
2023-06-01 13:49 ` Jonathan Cameron
2023-06-01 14:06 ` Terry Bowman
2023-06-01 14:12 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 21/23] cxl/pci: Add RCH downstream port error logging Terry Bowman
2023-06-01 14:03 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler Terry Bowman
2023-05-23 23:22 ` Terry Bowman
2023-05-24 21:32 ` Bjorn Helgaas
2023-05-24 21:32 ` Bjorn Helgaas
2023-05-25 21:29 ` Robert Richter
2023-05-25 21:29 ` Robert Richter
2023-05-25 22:01 ` Bjorn Helgaas [this message]
2023-05-25 22:01 ` Bjorn Helgaas
2023-05-25 22:28 ` Robert Richter
2023-05-25 22:28 ` Robert Richter
2023-06-01 14:06 ` Jonathan Cameron
2023-06-01 14:06 ` Jonathan Cameron
2023-05-23 23:22 ` [PATCH v4 23/23] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling Terry Bowman
2023-05-24 21:45 ` Bjorn Helgaas
2023-05-25 22:08 ` Robert Richter
2023-05-26 20:31 ` Bjorn Helgaas
2023-06-01 14:11 ` Jonathan Cameron
2023-06-02 16:41 ` Robert Richter
2023-05-23 23:29 ` [PATCH v4 00/23] cxl/pci: Add support for RCH RAS error handling - CHANGELOG Terry Bowman
2023-05-24 1:39 ` [PATCH v4 00/23] cxl/pci: Add support for RCH RAS error handling Terry Bowman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZG/anZ78FukSpERs@bhelgaas \
--to=helgaas@kernel.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=bwidawsk@kernel.org \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=oohall@gmail.com \
--cc=rrichter@amd.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.