linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dave Jiang <dave.jiang@intel.com>
Cc: <linux-cxl@vger.kernel.org>, <linux-pci@vger.kernel.org>,
	<dan.j.williams@intel.com>, <ira.weiny@intel.com>,
	<vishal.l.verma@intel.com>, <alison.schofield@intel.com>,
	<rostedt@goodmis.org>, <terry.bowman@amd.com>,
	<bhelgaas@google.com>
Subject: Re: [PATCH v3 08/11] cxl/pci: add tracepoint events for CXL RAS
Date: Mon, 21 Nov 2022 11:37:43 +0000	[thread overview]
Message-ID: <20221121113743.000075b6@Huawei.com> (raw)
In-Reply-To: <166879132997.674819.12112190531427523276.stgit@djiang5-desk3.ch.intel.com>

On Fri, 18 Nov 2022 10:08:49 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Add tracepoint events for recording the CXL uncorrectable and correctable
> errors. For uncorrectable errors, there is additional data of 512B from
> the header log register (CXL spec rev3 8.2.4.16.7). The trace event will
> intake a dynamic array that will dump the entire Header Log data. If
> multiple errors are set in the status register, then the
> 'first error' field (CXL spec rev3 v8.2.4.16.6) is read from the Error
> Capabilities and Control Register in order to determine the error.
> 
> This implementation does not include CXL IDE Error details.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
With the stuff Steven raised tidied up this looks good to me now.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



> ---
>  drivers/cxl/pci.c          |    2 +
>  include/trace/events/cxl.h |  110 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 112 insertions(+)
>  create mode 100644 include/trace/events/cxl.h
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 9428f3e0d99b..0f36a5861a7b 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -13,6 +13,8 @@
>  #include "cxlmem.h"
>  #include "cxlpci.h"
>  #include "cxl.h"
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl.h>
>  
>  /**
>   * DOC: cxl pci
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..f8e95d977133
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,110 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl
> +
> +#if !defined(_CXL_EVENTS_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_EVENTS_H
> +
> +#include <linux/tracepoint.h>
> +
> +#define CXL_HEADERLOG_SIZE		SZ_512
> +#define CXL_HEADERLOG_SIZE_U32		SZ_512 / sizeof(u32)
> +
> +#define CXL_RAS_UC_CACHE_DATA_PARITY	BIT(0)
> +#define CXL_RAS_UC_CACHE_ADDR_PARITY	BIT(1)
> +#define CXL_RAS_UC_CACHE_BE_PARITY	BIT(2)
> +#define CXL_RAS_UC_CACHE_DATA_ECC	BIT(3)
> +#define CXL_RAS_UC_MEM_DATA_PARITY	BIT(4)
> +#define CXL_RAS_UC_MEM_ADDR_PARITY	BIT(5)
> +#define CXL_RAS_UC_MEM_BE_PARITY	BIT(6)
> +#define CXL_RAS_UC_MEM_DATA_ECC		BIT(7)
> +#define CXL_RAS_UC_REINIT_THRESH	BIT(8)
> +#define CXL_RAS_UC_RSVD_ENCODE		BIT(9)
> +#define CXL_RAS_UC_POISON		BIT(10)
> +#define CXL_RAS_UC_RECV_OVERFLOW	BIT(11)
> +#define CXL_RAS_UC_INTERNAL_ERR		BIT(14)
> +#define CXL_RAS_UC_IDE_TX_ERR		BIT(15)
> +#define CXL_RAS_UC_IDE_RX_ERR		BIT(16)
> +
> +#define show_uc_errs(status)	__print_flags(status, " | ",		  \
> +	{ CXL_RAS_UC_CACHE_DATA_PARITY, "Cache Data Parity Error" },	  \
> +	{ CXL_RAS_UC_CACHE_ADDR_PARITY, "Cache Address Parity Error" },	  \
> +	{ CXL_RAS_UC_CACHE_BE_PARITY, "Cache Byte Enable Parity Error" }, \
> +	{ CXL_RAS_UC_CACHE_DATA_ECC, "Cache Data ECC Error" },		  \
> +	{ CXL_RAS_UC_MEM_DATA_PARITY, "Memory Data Parity Error" },	  \
> +	{ CXL_RAS_UC_MEM_ADDR_PARITY, "Memory Address Parity Error" },	  \
> +	{ CXL_RAS_UC_MEM_BE_PARITY, "Memory Byte Enable Parity Error" },  \
> +	{ CXL_RAS_UC_MEM_DATA_ECC, "Memory Data ECC Error" },		  \
> +	{ CXL_RAS_UC_REINIT_THRESH, "REINIT Threshold Hit" },		  \
> +	{ CXL_RAS_UC_RSVD_ENCODE, "Received Unrecognized Encoding" },	  \
> +	{ CXL_RAS_UC_POISON, "Received Poison From Peer" },		  \
> +	{ CXL_RAS_UC_RECV_OVERFLOW, "Receiver Overflow" },		  \
> +	{ CXL_RAS_UC_INTERNAL_ERR, "Component Specific Error" },	  \
> +	{ CXL_RAS_UC_IDE_TX_ERR, "IDE Tx Error" },			  \
> +	{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" }			  \
> +)
> +
> +TRACE_EVENT(cxl_aer_uncorrectable_error,
> +	TP_PROTO(const char *dev_name, u32 status, u32 fe, u32 *hl),
> +	TP_ARGS(dev_name, status, fe, hl),
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(u32, status)
> +		__field(u32, first_error)
> +		__dynamic_array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
> +	),
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->status = status;
> +		__entry->first_error = fe;
> +		/*
> +		 * Embed the 512B headerlog data for user app retrieval and
> +		 * parsing, but no need to print this in the trace buffer.
> +		 */
> +		memcpy(__get_dynamic_array(header_log), hl, CXL_HEADERLOG_SIZE);
> +	),
> +	TP_printk("%s: status: '%s' first_error: '%s'",
> +		  __get_str(dev_name),
> +		  show_uc_errs(__entry->status),
> +		  show_uc_errs(__entry->first_error)
> +	)
> +);
> +
> +#define CXL_RAS_CE_CACHE_DATA_ECC	BIT(0)
> +#define CXL_RAS_CE_MEM_DATA_ECC		BIT(1)
> +#define CXL_RAS_CE_CRC_THRESH		BIT(2)
> +#define CXL_RAS_CE_CACHE_POISON		BIT(3)
> +#define CXL_RAS_CE_MEM_POISON		BIT(4)
> +#define CXL_RAS_CE_PHYS_LAYER_ERR	BIT(5)
> +
> +#define show_ce_errs(status)	__print_flags(status, " | ",			\
> +	{ CXL_RAS_CE_CACHE_DATA_ECC, "Cache Data ECC Error" },			\
> +	{ CXL_RAS_CE_MEM_DATA_ECC, "Memory Data Ecc Error" },			\
> +	{ CXL_RAS_CE_CRC_THRESH, "CRC Threshold Hit" },				\
> +	{ CXL_RAS_CE_CACHE_POISON, "Received Cache Poison From Peer" },		\
> +	{ CXL_RAS_CE_MEM_POISON, "Received Memory Poison From Peer" },		\
> +	{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" }	\
> +)
> +
> +TRACE_EVENT(cxl_aer_correctable_error,
> +	TP_PROTO(const char *dev_name, u32 status),
> +	TP_ARGS(dev_name, status),
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(u32, status)
> +	),
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->status = status;
> +	),
> +	TP_printk("%s: status: '%s'",
> +		  __get_str(dev_name), show_ce_errs(__entry->status)
> +	)
> +);
> +
> +#endif /* _CXL_EVENTS_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl
> +#include <trace/define_trace.h>
> 
> 


  parent reply	other threads:[~2022-11-21 11:39 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-18 17:08 [PATCH v3 00/11] cxl/pci: Add fundamental error handling Dave Jiang
2022-11-18 17:08 ` [PATCH v3 01/11] cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers Dave Jiang
2022-11-18 17:08 ` [PATCH v3 02/11] cxl/pci: Cleanup cxl_map_device_regs() Dave Jiang
2022-11-18 17:08 ` [PATCH v3 03/11] cxl/pci: Kill cxl_map_regs() Dave Jiang
2022-11-18 17:08 ` [PATCH v3 04/11] cxl/core/regs: Make cxl_map_{component, device}_regs() device generic Dave Jiang
2022-11-18 17:08 ` [PATCH v3 05/11] cxl/port: Limit the port driver to just the HDM Decoder Capability Dave Jiang
2022-11-18 17:08 ` [PATCH v3 06/11] cxl/pci: Prepare for mapping RAS Capability Structure Dave Jiang
2022-11-18 17:08 ` [PATCH v3 07/11] cxl/pci: Find and map the " Dave Jiang
2022-11-18 17:08 ` [PATCH v3 08/11] cxl/pci: add tracepoint events for CXL RAS Dave Jiang
2022-11-18 17:17   ` Steven Rostedt
2022-11-18 17:31     ` Dave Jiang
2022-11-21 11:37   ` Jonathan Cameron [this message]
2022-11-21 13:08   ` Shiju Jose
2022-11-28 17:54     ` Dave Jiang
2022-11-18 17:08 ` [PATCH v3 09/11] cxl/pci: Add (hopeful) error handling support Dave Jiang
2022-11-21 11:56   ` Jonathan Cameron
2022-11-18 17:09 ` [PATCH v3 10/11] PCI/AER: Add optional logging callback for correctable error Dave Jiang
2022-11-19  1:08   ` Sathyanarayanan Kuppuswamy
2022-11-28 18:19     ` Dave Jiang
2022-11-21 12:05   ` Jonathan Cameron
2022-11-21 12:17     ` Jonathan Cameron
2022-11-28 21:01     ` Dave Jiang
2022-11-18 17:09 ` [PATCH v3 11/11] cxl/pci: Add callback to log AER " Dave Jiang
2022-11-21 12:21   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221121113743.000075b6@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=terry.bowman@amd.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).