From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Terry Bowman <terry.bowman@amd.com>
Cc: <ming4.li@intel.com>, <linux-cxl@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,
<dave@stgolabs.net>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
<dan.j.williams@intel.com>, <bhelgaas@google.com>,
<mahesh@linux.ibm.com>, <ira.weiny@intel.com>, <oohall@gmail.com>,
<Benjamin.Cheatham@amd.com>, <rrichter@amd.com>,
<nathan.fontenot@amd.com>,
<Smita.KoralahalliChannabasappa@amd.com>, <shiju.jose@huawei.com>,
<M.Chehab@huawei.com>
Subject: Re: [PATCH v2 13/14] cxl/pci: Add trace logging for CXL PCIe port RAS errors
Date: Wed, 30 Oct 2024 16:07:26 +0000 [thread overview]
Message-ID: <20241030160726.0000533e@Huawei.com> (raw)
In-Reply-To: <20241025210305.27499-14-terry.bowman@amd.com>
On Fri, 25 Oct 2024 16:03:04 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> The CXL drivers use kernel trace functions for logging endpoint and
> RCH downstream port RAS errors. Similar functionality is
> required for CXL root ports, CXL downstream switch ports, and CXL
> upstream switch ports.
>
> Introduce trace logging functions for both RAS correctable and
> uncorrectable errors specific to CXL PCIe ports. Additionally, update
> the PCIe port error handlers to invoke these new trace functions.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
+CC Mauro and Shiju to give the tracepoint a sanity check and for
awareness that we have something new to feed rasdaemon :)
Jonathan
> ---
> drivers/cxl/core/pci.c | 16 ++++++++++----
> drivers/cxl/core/trace.h | 47 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 59 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index adb184d346ae..eeb4a64ba5b5 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -661,10 +661,14 @@ static void __cxl_handle_cor_ras(struct device *dev,
>
> addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> status = readl(addr);
> - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> + if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK))
> + return;
> + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> +
> + if (is_cxl_memdev(dev))
> trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
> - }
> + else if (dev_is_pci(dev))
How would you get here otherwise? Is it useful to know it is a pci device
here?
> + trace_cxl_port_aer_correctable_error(dev, status);
> }
>
> static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
> @@ -720,7 +724,11 @@ static bool __cxl_handle_ras(struct device *dev, void __iomem *ras_base)
> }
>
> header_log_copy(ras_base, hl);
> - trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
> + if (is_cxl_memdev(dev))
> + trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
> + else if (dev_is_pci(dev))
as above.
> + trace_cxl_port_aer_uncorrectable_error(dev, status, fe, hl);
> +
> writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
>
> return true;
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index 8672b42ee4d1..1c4368a7b50b 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -48,6 +48,34 @@
> { CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" } \
> )
>
> +TRACE_EVENT(cxl_port_aer_uncorrectable_error,
> + TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),
> + TP_ARGS(dev, status, fe, hl),
> + TP_STRUCT__entry(
> + __string(devname, dev_name(dev))
> + __string(host, dev_name(dev->parent))
> + __field(u32, status)
> + __field(u32, first_error)
> + __array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
> + ),
> + TP_fast_assign(
> + __assign_str(devname);
> + __assign_str(host);
> + __entry->status = status;
> + __entry->first_error = fe;
> + /*
> + * Embed the 512B headerlog data for user app retrieval and
> + * parsing, but no need to print this in the trace buffer.
I'm not sure any printing as such goes on in the trace buffer. It is from
the data in the trace buffer I think.
> + */
> + memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
> + ),
> + TP_printk("device=%s host=%s status: '%s' first_error: '%s'",
> + __get_str(devname), __get_str(host),
> + show_uc_errs(__entry->status),
> + show_uc_errs(__entry->first_error)
> + )
> +);
> +
> TRACE_EVENT(cxl_aer_uncorrectable_error,
> TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
> TP_ARGS(cxlmd, status, fe, hl),
> @@ -96,6 +124,25 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
> { CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" } \
> )
>
> +TRACE_EVENT(cxl_port_aer_correctable_error,
> + TP_PROTO(struct device *dev, u32 status),
> + TP_ARGS(dev, status),
> + TP_STRUCT__entry(
> + __string(devname, dev_name(dev))
> + __string(host, dev_name(dev->parent))
> + __field(u32, status)
> + ),
> + TP_fast_assign(
> + __assign_str(devname);
> + __assign_str(host);
> + __entry->status = status;
> + ),
> + TP_printk("device=%s host=%s status='%s'",
> + __get_str(devname), __get_str(host),
> + show_ce_errs(__entry->status)
> + )
> +);
> +
> TRACE_EVENT(cxl_aer_correctable_error,
> TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
> TP_ARGS(cxlmd, status),
next prev parent reply other threads:[~2024-10-30 16:07 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-25 21:02 [PATCH v2 0/14] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2024-10-25 21:02 ` [PATCH v2 01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver' Terry Bowman
2024-10-30 15:14 ` Jonathan Cameron
2024-10-30 15:15 ` Bowman, Terry
2024-10-31 16:20 ` Dave Jiang
2024-10-31 20:24 ` Fan Ni
2024-10-25 21:02 ` [PATCH v2 02/14] PCI/AER: Rename AER driver's interfaces to also indicate CXL PCIe port support Terry Bowman
2024-10-30 15:13 ` Jonathan Cameron
2024-10-31 16:21 ` Dave Jiang
2024-10-31 20:25 ` Fan Ni
2024-10-25 21:02 ` [PATCH v2 03/14] cxl/pci: Introduce helper functions pcie_is_cxl() and pcie_is_cxl_port() Terry Bowman
2024-10-30 14:57 ` Jonathan Cameron
2024-10-31 16:25 ` Dave Jiang
2024-10-31 21:22 ` Fan Ni
2024-10-25 21:02 ` [PATCH v2 04/14] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type Terry Bowman
2024-10-30 14:56 ` Jonathan Cameron
2024-10-31 16:27 ` Dave Jiang
2024-10-31 21:27 ` Fan Ni
2024-10-25 21:02 ` [PATCH v2 05/14] PCI/AER: Add CXL PCIe port correctable error support in AER service driver Terry Bowman
2024-10-30 15:13 ` Jonathan Cameron
2024-10-30 15:51 ` Bowman, Terry
2024-11-04 21:50 ` Dan Williams
2024-11-04 22:05 ` Bowman, Terry
2024-10-31 16:37 ` Dave Jiang
2024-10-25 21:02 ` [PATCH v2 06/14] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices Terry Bowman
2024-10-30 15:37 ` Jonathan Cameron
2024-10-31 16:58 ` Dave Jiang
2024-11-01 13:30 ` Bowman, Terry
2024-10-25 21:02 ` [PATCH v2 07/14] PCI/AER: Add CXL PCIe port uncorrectable error recovery in AER service driver Terry Bowman
2024-10-30 15:42 ` Jonathan Cameron
2024-10-25 21:02 ` [PATCH v2 08/14] cxl/pci: Change find_cxl_ports() to non-static Terry Bowman
2024-10-30 15:45 ` Jonathan Cameron
2024-10-30 15:54 ` Bowman, Terry
2024-10-25 21:03 ` [PATCH v2 09/14] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers Terry Bowman
2024-10-30 15:55 ` Jonathan Cameron
2024-10-25 21:03 ` [PATCH v2 10/14] cxl/pci: Map CXL PCIe upstream " Terry Bowman
2024-10-30 15:56 ` Jonathan Cameron
2024-10-25 21:03 ` [PATCH v2 11/14] cxl/pci: Rename RAS handler interfaces to also indicate CXL PCIe port support Terry Bowman
2024-10-30 15:59 ` Jonathan Cameron
2024-10-25 21:03 ` [PATCH v2 12/14] cxl/pci: Add error handler for CXL PCIe port RAS errors Terry Bowman
2024-10-30 16:03 ` Jonathan Cameron
2024-10-25 21:03 ` [PATCH v2 13/14] cxl/pci: Add trace logging " Terry Bowman
2024-10-30 16:07 ` Jonathan Cameron [this message]
2024-10-30 21:30 ` Bowman, Terry
2024-10-25 21:03 ` [PATCH v2 14/14] cxl/pci: Add support to assign and clear pci_driver::cxl_err_handlers Terry Bowman
2024-10-30 16:11 ` Jonathan Cameron
2024-10-30 21:34 ` Bowman, Terry
2024-10-27 16:59 ` [PATCH v2 0/14] Applies to Base commit: 8cf0b93919e1 (tag: v6.12-rc2) Linux 6.12-rc2 Bowman, Terry
2024-10-28 1:05 ` [PATCH v2 0/14] Enable CXL PCIe port protocol error handling and logging Bowman, Terry
2024-11-01 18:00 ` Fan Ni
2024-11-01 18:28 ` Bowman, Terry
2024-11-01 19:11 ` Fan Ni
2024-11-01 22:11 ` Fan Ni
2024-11-04 21:25 ` Bowman, Terry
2024-11-04 21:48 ` Fan Ni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241030160726.0000533e@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=M.Chehab@huawei.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=mahesh@linux.ibm.com \
--cc=ming4.li@intel.com \
--cc=nathan.fontenot@amd.com \
--cc=oohall@gmail.com \
--cc=rrichter@amd.com \
--cc=shiju.jose@huawei.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox