From: "Bowman, Terry" <terry.bowman@amd.com>
To: Shiju Jose <shiju.jose@huawei.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"ming.li@zohomail.com" <ming.li@zohomail.com>,
	"Smita.KoralahalliChannabasappa@amd.com"
	<Smita.KoralahalliChannabasappa@amd.com>,
	"rrichter@amd.com" <rrichter@amd.com>,
	"dan.carpenter@linaro.org" <dan.carpenter@linaro.org>,
	"PradeepVineshReddy.Kodamati@amd.com"
	<PradeepVineshReddy.Kodamati@amd.com>,
	"lukas@wunner.de" <lukas@wunner.de>,
	"Benjamin.Cheatham@amd.com" <Benjamin.Cheatham@amd.com>,
	"sathyanarayanan.kuppuswamy@linux.intel.com"
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v10 12/17] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports
Date: Wed, 2 Jul 2025 16:56:14 -0500
Message-ID: <cc98e985-3c91-403a-8317-c160128fd5c7@amd.com>
In-Reply-To: <6b8b65df7c334043863b1464e04957db@huawei.com>



On 6/27/2025 7:22 AM, Shiju Jose wrote:
>> -----Original Message-----
>> From: Terry Bowman <terry.bowman@amd.com>
>> Sent: 26 June 2025 23:43
>> To: dave@stgolabs.net; Jonathan Cameron <jonathan.cameron@huawei.com>;
>> dave.jiang@intel.com; alison.schofield@intel.com; dan.j.williams@intel.com;
>> bhelgaas@google.com; Shiju Jose <shiju.jose@huawei.com>;
>> ming.li@zohomail.com; Smita.KoralahalliChannabasappa@amd.com;
>> rrichter@amd.com; dan.carpenter@linaro.org;
>> PradeepVineshReddy.Kodamati@amd.com; lukas@wunner.de;
>> Benjamin.Cheatham@amd.com;
>> sathyanarayanan.kuppuswamy@linux.intel.com; terry.bowman@amd.com;
>> linux-cxl@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org
>> Subject: [PATCH v10 12/17] cxl/pci: Unify CXL trace logging for CXL Endpoints
>> and CXL Ports
>>
>> CXL currently has separate trace routines for CXL Port errors and CXL Endpoint
>> errors. This is inconvenient for the user because they must enable
>> 2 sets of trace routines. Make updates to the trace logging such that a single
>> trace routine logs both CXL Endpoint and CXL Port protocol errors.
>>
>> Keep the trace log fields 'memdev' and 'host'. While these are not accurate for
>> non-Endpoints the fields will remain as-is to prevent breaking userspace RAS
>> trace consumers.
>>
>> Add serial number parameter to the trace logging. This is used for EPs and 0 is
>> provided for CXL port devices without a serial number.
>>
>> Below is output of correctable and uncorrectable protocol error logging.
>> CXL Root Port and CXL Endpoint examples are included below.
>>
>> Root Port:
>> cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0
>> status='CRC Threshold Hit'
>> cxl_aer_uncorrectable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0
>> status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity
>> Error'
>>
>> Endpoint:
>> cxl_aer_correctable_error: memdev=mem3 host=0000:0f:00.0 serial=0
>> status='CRC Threshold Hit'
>> cxl_aer_uncorrectable_error: memdev=mem3 host=0000:0f:00.0 serial: 0 status:
>> 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error'
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> ---
>> drivers/cxl/core/pci.c   | 19 ++++-----
>> drivers/cxl/core/ras.c   | 14 ++++---
>> drivers/cxl/core/trace.h | 84 +++++++++-------------------------------
>> 3 files changed, 37 insertions(+), 80 deletions(-)
>>
> [...]
>> static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
>> index 25ebfbc1616c..494d6db461a7 100644
>> --- a/drivers/cxl/core/trace.h
>> +++ b/drivers/cxl/core/trace.h
>> @@ -48,49 +48,22 @@
>> 	{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" }			  \
>> )
>>
>> -TRACE_EVENT(cxl_port_aer_uncorrectable_error,
>> -	TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),
>> -	TP_ARGS(dev, status, fe, hl),
>> -	TP_STRUCT__entry(
>> -		__string(device, dev_name(dev))
>> -		__string(host, dev_name(dev->parent))
>> -		__field(u32, status)
>> -		__field(u32, first_error)
>> -		__array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
>> -	),
>> -	TP_fast_assign(
>> -		__assign_str(device);
>> -		__assign_str(host);
>> -		__entry->status = status;
>> -		__entry->first_error = fe;
>> -		/*
>> -		 * Embed the 512B headerlog data for user app retrieval and
>> -		 * parsing, but no need to print this in the trace buffer.
>> -		 */
>> -		memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
>> -	),
>> -	TP_printk("device=%s host=%s status: '%s' first_error: '%s'",
>> -		  __get_str(device), __get_str(host),
>> -		  show_uc_errs(__entry->status),
>> -		  show_uc_errs(__entry->first_error)
>> -	)
>> -);
>> -
>> TRACE_EVENT(cxl_aer_uncorrectable_error,
>> -	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
>> -	TP_ARGS(cxlmd, status, fe, hl),
>> +	TP_PROTO(struct device *dev, u64 serial, u32 status, u32 fe,
>> +		 u32 *hl),
>> +	TP_ARGS(dev, serial, status, fe, hl),
>> 	TP_STRUCT__entry(
>> -		__string(memdev, dev_name(&cxlmd->dev))
>> -		__string(host, dev_name(cxlmd->dev.parent))
>> +		__string(name, dev_name(dev))
>> +		__string(parent, dev_name(dev->parent))
> Hi Terry,
>
> Thanks for considering the feedback given in v9 regarding the compatibility issue
> with the rasdaemon.
> https://lore.kernel.org/all/959acc682e6e4b52ac0283b37ee21026@huawei.com/
>
> There was probably some confusion regarding the feedback.
> Unfortunately, TP_printk(...) is not the ABI that we need to keep stable;
> it's this structure, TP_STRUCT__entry(...), that matters to rasdaemon.
>
>> 		

Oh, apologies. I didn't realize TP_STRUCT__entry() was an ABI requirement. I will
change the TP_STRUCT__entry() field names back as well.

-Terry
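
To illustrate why the TP_STRUCT__entry() field names matter more than the
TP_printk() text, here is a minimal sketch of a by-name field lookup. It
assumes a consumer such as rasdaemon resolves fields by name from the event
format rather than parsing the printed string. The field_desc type, the two
layout tables, the find_field() helper, and the offsets are all hypothetical
and only for illustration; they are not the kernel's or rasdaemon's actual
code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-event field metadata a consumer reads. */
struct field_desc {
	const char *name;	/* name as declared in TP_STRUCT__entry() */
	size_t offset;		/* illustrative byte offset in the record */
	size_t size;		/* illustrative field size in bytes */
};

/* Field names before this patch: 'memdev' and 'host'. */
static const struct field_desc layout_original[] = {
	{ "memdev",  0, 4 },
	{ "host",    4, 4 },
	{ "serial",  8, 8 },
	{ "status", 16, 4 },
};

/* Field names as renamed in v10: 'name' and 'parent'. */
static const struct field_desc layout_renamed[] = {
	{ "name",    0, 4 },
	{ "parent",  4, 4 },
	{ "serial",  8, 8 },
	{ "status", 16, 4 },
};

#define N_FIELDS(a) (sizeof(a) / sizeof((a)[0]))

/* Return the descriptor whose name matches, or NULL if there is none. */
static const struct field_desc *find_field(const struct field_desc *fields,
					   size_t n, const char *name)
{
	assert(fields != NULL && name != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(fields[i].name, name) == 0)
			return &fields[i];
	}
	return NULL;
}
```

With these tables, a lookup of "memdev" succeeds against layout_original but
returns NULL against layout_renamed, even though the record bytes and the
TP_printk() output are unchanged. Unrenamed fields such as "serial" still
resolve in both layouts.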

>> __field(u64, serial)
>> 		__field(u32, status)
>> 		__field(u32, first_error)
>> 		__array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
>> 	),
>> 	TP_fast_assign(
>> -		__assign_str(memdev);
>> -		__assign_str(host);
>> -		__entry->serial = cxlmd->cxlds->serial;
>> +		__assign_str(name);
>> +		__assign_str(parent);
>> +		__entry->serial = serial;
>> 		__entry->status = status;
>> 		__entry->first_error = fe;
>> 		/*
>> @@ -99,8 +72,8 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>> 		 */
>> 		memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
>> 	),
>> -	TP_printk("memdev=%s host=%s serial=%lld: status: '%s' first_error: '%s'",
>> -		  __get_str(memdev), __get_str(host), __entry->serial,
>> +	TP_printk("memdev=%s host=%s serial=%lld status='%s' first_error='%s'",
>> +		  __get_str(name), __get_str(parent), __entry->serial,
>> 		  show_uc_errs(__entry->status),
>> 		  show_uc_errs(__entry->first_error)
>> 	)
>> @@ -124,42 +97,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>> 	{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" }	\
>> )
>>
>> -TRACE_EVENT(cxl_port_aer_correctable_error,
>> -	TP_PROTO(struct device *dev, u32 status),
>> -	TP_ARGS(dev, status),
>> -	TP_STRUCT__entry(
>> -		__string(device, dev_name(dev))
>> -		__string(host, dev_name(dev->parent))
>> -		__field(u32, status)
>> -	),
>> -	TP_fast_assign(
>> -		__assign_str(device);
>> -		__assign_str(host);
>> -		__entry->status = status;
>> -	),
>> -	TP_printk("device=%s host=%s status='%s'",
>> -		  __get_str(device), __get_str(host),
>> -		  show_ce_errs(__entry->status)
>> -	)
>> -);
>> -
>> TRACE_EVENT(cxl_aer_correctable_error,
>> -	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
>> -	TP_ARGS(cxlmd, status),
>> +	TP_PROTO(struct device *dev, u64 serial, u32 status),
>> +	TP_ARGS(dev, serial, status),
>> 	TP_STRUCT__entry(
>> -		__string(memdev, dev_name(&cxlmd->dev))
>> -		__string(host, dev_name(cxlmd->dev.parent))
>> +		__string(name, dev_name(dev))
>> +		__string(parent, dev_name(dev->parent))
> Same as above.
>> 		__field(u64, serial)
>> 		__field(u32, status)
>> 	),
>> 	TP_fast_assign(
>> -		__assign_str(memdev);
>> -		__assign_str(host);
>> -		__entry->serial = cxlmd->cxlds->serial;
>> +		__assign_str(name);
>> +		__assign_str(parent);
>> +		__entry->serial = serial;
>> 		__entry->status = status;
>> 	),
>> -	TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
>> -		  __get_str(memdev), __get_str(host), __entry->serial,
>> +	TP_printk("memdev=%s host=%s serial=%lld status='%s'",
>> +		  __get_str(name), __get_str(parent), __entry->serial,
>> 		  show_ce_errs(__entry->status)
>> 	)
>> );
>> --
>> 2.34.1
> Thanks,
> Shiju

