From: "Bowman, Terry" <terry.bowman@amd.com>
To: Li Ming <ming.li@zohomail.com>,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, nifan.cxl@gmail.com,
dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, mahesh@linux.ibm.com, ira.weiny@intel.com,
oohall@gmail.com, Benjamin.Cheatham@amd.com, rrichter@amd.com,
nathan.fontenot@amd.com, Smita.KoralahalliChannabasappa@amd.com,
lukas@wunner.de, PradeepVineshReddy.Kodamati@amd.com
Subject: Re: [PATCH v4 04/15] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type
Date: Thu, 12 Dec 2024 13:59:01 -0600
Message-ID: <208e6639-a394-428f-bfe9-a3b8d48d6144@amd.com>
In-Reply-To: <ef7d45cc-d5ed-4a76-a9af-52c2a423ead0@zohomail.com>
On 12/11/2024 7:34 PM, Li Ming wrote:
> On 12/12/2024 7:39 AM, Terry Bowman wrote:
>> The AER driver and aer_event tracing currently log 'PCIe Bus Type'
>> for all errors.
>>
>> Update the driver and aer_event tracing to log 'CXL Bus Type' for CXL
>> device errors.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Fan Ni <fan.ni@samsung.com>
>> ---
>> drivers/pci/pcie/aer.c | 14 ++++++++------
>> include/ras/ras_event.h | 9 ++++++---
>> 2 files changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index fe6edf26279e..53e9a11f6c0f 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -699,13 +699,14 @@ static void __aer_print_error(struct pci_dev *dev,
>>
>> void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>> {
>> + const char *bus_type = pcie_is_cxl(dev) ? "CXL" : "PCIe";
>> int layer, agent;
>> int id = pci_dev_id(dev);
>> const char *level;
>>
>> if (!info->status) {
>> - pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
>> - aer_error_severity_string[info->severity]);
>> + pci_err(dev, "%s Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
>> + bus_type, aer_error_severity_string[info->severity]);
>> goto out;
>> }
>>
>> @@ -714,8 +715,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>>
>> level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
>>
>> - pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
>> - aer_error_severity_string[info->severity],
>> + pci_printk(level, dev, "%s Bus Error: severity=%s, type=%s, (%s)\n",
>> + bus_type, aer_error_severity_string[info->severity],
>> aer_error_layer[layer], aer_agent_string[agent]);
>>
>> pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
>> @@ -730,7 +731,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>> if (info->id && info->error_dev_num > 1 && info->id == id)
>> pci_err(dev, " Error of this Agent is reported first\n");
>>
>> - trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask),
>> + trace_aer_event(dev_name(&dev->dev), bus_type, (info->status & ~info->mask),
>> info->severity, info->tlp_header_valid, &info->tlp);
>> }
>>
>> @@ -764,6 +765,7 @@ EXPORT_SYMBOL_GPL(cper_severity_to_aer);
>> void pci_print_aer(struct pci_dev *dev, int aer_severity,
>> struct aer_capability_regs *aer)
>> {
>> + const char *bus_type = pcie_is_cxl(dev) ? "CXL" : "PCIe";
>> int layer, agent, tlp_header_valid = 0;
>> u32 status, mask;
>> struct aer_err_info info;
>> @@ -798,7 +800,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
>> if (tlp_header_valid)
>> __print_tlp_header(dev, &aer->header_log);
>>
>> - trace_aer_event(dev_name(&dev->dev), (status & ~mask),
>> + trace_aer_event(dev_name(&dev->dev), bus_type, (status & ~mask),
>> aer_severity, tlp_header_valid, &aer->header_log);
>> }
>> EXPORT_SYMBOL_NS_GPL(pci_print_aer, CXL);
>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
>> index e5f7ee0864e7..1bf8e7050ba8 100644
>> --- a/include/ras/ras_event.h
>> +++ b/include/ras/ras_event.h
>> @@ -297,15 +297,17 @@ TRACE_EVENT(non_standard_event,
>>
>> TRACE_EVENT(aer_event,
>> TP_PROTO(const char *dev_name,
>> + const char *bus_type,
>> const u32 status,
>> const u8 severity,
>> const u8 tlp_header_valid,
>> struct pcie_tlp_log *tlp),
>>
>> - TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp),
>> + TP_ARGS(dev_name, bus_type, status, severity, tlp_header_valid, tlp),
>>
>> TP_STRUCT__entry(
>> __string( dev_name, dev_name )
>> + __string( bus_type, bus_type )
>> __field( u32, status )
>> __field( u8, severity )
>> __field( u8, tlp_header_valid)
>> @@ -314,6 +316,7 @@ TRACE_EVENT(aer_event,
>>
>> TP_fast_assign(
>> __assign_str(dev_name);
>> + __assign_str(bus_type);
>> __entry->status = status;
>> __entry->severity = severity;
>> __entry->tlp_header_valid = tlp_header_valid;
>> @@ -325,8 +328,8 @@ TRACE_EVENT(aer_event,
>> }
>> ),
>>
>> - TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s\n",
>> - __get_str(dev_name),
>> + TP_printk("%s %s Bus Error: severity=%s, %s, TLP Header=%s\n",
>> + __get_str(dev_name), __get_str(bus_type),
>> __entry->severity == AER_CORRECTABLE ? "Corrected" :
>> __entry->severity == AER_FATAL ?
>> "Fatal" : "Uncorrected, non-fatal",
> Hi Terry,
>
>
> Patch #3 is using flexbus dvsec to identify CXL RP/USP/DSP. But per CXL r3.1 section 9.12.3 "Enumerating CXL RPs and DSPs", there may be a flexbus dvsec if CXL RP/DSP is in disconnect state or connecting to a PCIe device.
>
> If a PCIe device connects to a CXL RP/DSP, and the CXL RP/DSP reports an error, the error log will be also "CXL Bus Type", is it expected? My understanding is that the CXL RP/DSP is working on PCIe mode.
>
> If not, I think that setting "pci_dev->is_cxl" during cxl port enumeration and CXL device probing is another option.
>
>
> Thanks
>
> Ming
>
Hi Ming,

aer_print_error() logs the AER details (including the bus type) for the device that detected the error,
not for the RP AER reporting agent, unless the error is detected in the RP itself. The bus type is
determined from the 'dev' parameter, which in your example is a PCIe device, not a CXL device.
aer_print_error() will log "PCIe Bus Error" because the flexbus DVSEC will not be present in the
config space of 'dev'.

I agree that in your example the RP and the downstream device will train to PCIe mode and not CXL mode.
But the flexbus DVSEC will still be present in the RP's PCIe configuration space. The pci_dev::is_cxl
structure member indicates CXL support and does not reflect the current training state.

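To illustrate the point, here is a minimal user-space sketch, not kernel code. It assumes a
hypothetical 'struct fake_pci_dev' standing in for 'struct pci_dev', with 'is_cxl' set when the
flexbus DVSEC is found in config space. bus_type_str() mirrors the ternary added by this patch;
format_bus_error() is a hypothetical helper that only imitates the shape of the first log line.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct fake_pci_dev {
	const char *name;
	/* Assumed: true when the flexbus DVSEC exists in config space.
	 * This describes CXL capability, not the trained link mode. */
	bool is_cxl;
};

/* Same selection as the patch: keyed on the device passed in as 'dev'. */
static const char *bus_type_str(const struct fake_pci_dev *dev)
{
	return dev->is_cxl ? "CXL" : "PCIe";
}

/* Hypothetical helper: build the first log line for the detecting device. */
static int format_bus_error(char *buf, size_t len,
			    const struct fake_pci_dev *dev,
			    const char *severity)
{
	return snprintf(buf, len, "%s: %s Bus Error: severity=%s",
			dev->name, bus_type_str(dev), severity);
}
```

With this model, a PCIe endpoint below a CXL-capable RP is reported with the "PCIe" string when the
endpoint is the detecting device, and the RP is reported with the "CXL" string when the RP itself
detects the error, regardless of the mode the link trained to.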
Regards,
Terry