From: Terry Bowman <Terry.Bowman@amd.com>
To: "Li, Ming" <ming4.li@intel.com>,
dan.j.williams@intel.com, rrichter@amd.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 3/6] PCI/AER: Enable RCEC to report internal error for CXL root port
Date: Tue, 16 Apr 2024 09:46:00 -0500
Message-ID: <91ddd182-cc71-480f-a1b2-e7c31b29a549@amd.com>
In-Reply-To: <b4a721c2-567d-4ab3-8a85-963e3f323e61@intel.com>
Hi Ming,
On 4/16/24 02:27, Li, Ming wrote:
> On 3/26/2024 3:42 AM, Terry Bowman wrote:
>> Hi Li,
>>
>> I added comments below.
>>
>> On 3/13/24 03:35, Li Ming wrote:
>>> Per CXL r3.1 section 12.2.2, CXL.cachemem protocol errors detected by CXL
>>> root port could be logged in RCEC AER Extended Capability as
>>> PCI_ERR_UNC_INTN or PCI_ERR_COR_INTERNAL. Unmask these errors for that
>>> case.
>>>
>>> Signed-off-by: Li Ming <ming4.li@intel.com>
>>> ---
>>> drivers/pci/pcie/aer.c | 24 +++++++++++++++++-------
>>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>> index 42a3bd35a3e1..364c74e47273 100644
>>> --- a/drivers/pci/pcie/aer.c
>>> +++ b/drivers/pci/pcie/aer.c
>>> @@ -985,7 +985,7 @@ static bool cxl_error_is_native(struct pci_dev *dev)
>>> {
>>> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
>>>
>>> - return (pcie_ports_native || host->native_aer);
>>> + return (pcie_ports_native || host->native_aer) && host->is_cxl;
>>> }
>>>
>>> static bool is_internal_error(struct aer_err_info *info)
>>> @@ -1041,8 +1041,13 @@ static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
>>> {
>>> bool *handles_cxl = data;
>>>
>>> - if (!*handles_cxl)
>>> - *handles_cxl = is_cxl_mem_dev(dev) && cxl_error_is_native(dev);
>>> + if (!*handles_cxl && cxl_error_is_native(dev)) {
>>> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END &&
>>> + dev->rcec && is_cxl_mem_dev(dev))
>>> + *handles_cxl = true;
>>> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
>>> + *handles_cxl = true;
>>> + }
>> I understand a root port can be found under an RCEC. It's possible. But, does the downstream
>> root port forward AER to the upstream RCEC? My understanding is AER is handled and processed
>> at the first root port/RCEC upstream from the device/RCH/USP/DSP.
>>
>> Regards,
>> Terry
>>
>
> CXL r3.1 section 12.2.2 mentions this:
>
> "If the CXL.cachemem protocol errors detected by a CXL root port are logged as
> CIEs or UIEs in an RCEC’s AER Extended Capability, it is recommended that the System
> Firmware populate an RDPAS record (see Section 9.18.1.5) to establish the association
> between the RCEC and the root port."
>
> I think it means that it is possible for a CXL root port to forward its AER to an RCEC.
>
> Thanks
> Ming
>
Thanks for pointing to the spec details.
In our testing here, we used the root port as the agent that consumes root port CXL
protocol errors. The logic to handle root port errors requires little to no change to
the AER driver. The result is that a root port consumes VH protocol errors and an RCEC
consumes RCD protocol errors. The RCEC and the root port both use the PCIe port bus
driver's AER service driver, in separate instances: one for RCEC-RCD and one for
root-port-VH.
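As a rough illustration of that split (a user-space sketch only, not kernel code; the
enum and function names below are hypothetical and do not exist in
drivers/pci/pcie/aer.c):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two error-consuming port types. */
enum port_type { PORT_ROOT_PORT, PORT_RC_EC };

/* Hypothetical CXL topology modes: RCD (restricted CXL device) vs. VH (virtual hierarchy). */
enum cxl_mode { CXL_MODE_RCD, CXL_MODE_VH };

/*
 * Return which port type's AER service instance consumes a CXL protocol
 * error, under the assumption described above: VH errors are consumed by
 * the root port, RCD errors by the RCEC.
 */
enum port_type aer_consumer_for(enum cxl_mode mode)
{
	return mode == CXL_MODE_VH ? PORT_ROOT_PORT : PORT_RC_EC;
}

/* True if the AER instance bound to 'port' is the consumer for 'mode'. */
bool aer_instance_handles(enum port_type port, enum cxl_mode mode)
{
	return aer_consumer_for(mode) == port;
}
```

With this mapping, the RCEC instance never sees VH protocol errors, which is the
simplification argued for here.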
The driver support is much simpler if the RCEC does not handle VH protocol errors. Is
there a reason to forward root port VH mode protocol errors to an RCEC rather than
consume them in the root port's AER driver and forward them to the CXL error handler?
Regards,
Terry
>>>
>>> /* Non-zero terminates iteration */
>>> return *handles_cxl;
>>> @@ -1054,13 +1059,18 @@ static bool handles_cxl_errors(struct pci_dev *rcec)
>>>
>>> if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
>>> pcie_aer_is_native(rcec))
>>> - pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
>>> + pcie_walk_rcec_all(rcec, handles_cxl_error_iter, &handles_cxl);
>>>
>>> return handles_cxl;
>>> }
>>>
>>> -static void cxl_rch_enable_rcec(struct pci_dev *rcec)
>>> +static void cxl_enable_rcec(struct pci_dev *rcec)
>>> {
>>> + /*
>>> + * Enable RCEC's internal error report for two cases:
>>> + * 1. RCiEP detected CXL.cachemem protocol errors
>>> + * 2. CXL root port detected CXL.cachemem protocol errors.
>>> + */
>>> if (!handles_cxl_errors(rcec))
>>> return;
>>>
>>> @@ -1069,7 +1079,7 @@ static void cxl_rch_enable_rcec(struct pci_dev *rcec)
>>> }
>>>
>>> #else
>>> -static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
>>> +static inline void cxl_enable_rcec(struct pci_dev *dev) { }
>>> static inline void cxl_rch_handle_error(struct pci_dev *dev,
>>> struct aer_err_info *info) { }
>>> #endif
>>> @@ -1494,7 +1504,7 @@ static int aer_probe(struct pcie_device *dev)
>>> return status;
>>> }
>>>
>>> - cxl_rch_enable_rcec(port);
>>> + cxl_enable_rcec(port);
>>> aer_enable_rootport(rpc);
>>> pci_info(port, "enabled with IRQ %d\n", dev->irq);
>>> return 0;
>