From: Dan Williams <dan.j.williams@intel.com>
To: "Bowman, Terry" <terry.bowman@amd.com>,
Dan Williams <dan.j.williams@intel.com>, <dave@stgolabs.net>,
<jonathan.cameron@huawei.com>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <bhelgaas@google.com>,
<shiju.jose@huawei.com>, <ming.li@zohomail.com>,
<Smita.KoralahalliChannabasappa@amd.com>, <rrichter@amd.com>,
<dan.carpenter@linaro.org>, <PradeepVineshReddy.Kodamati@amd.com>,
<lukas@wunner.de>, <Benjamin.Cheatham@amd.com>,
<sathyanarayanan.kuppuswamy@linux.intel.com>,
<linux-cxl@vger.kernel.org>, <vishal.l.verma@intel.com>,
<alucerop@amd.com>, <ira.weiny@intel.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v16 07/10] cxl: Update error handlers to support CXL Port devices
Date: Mon, 30 Mar 2026 19:11:39 -0700 [thread overview]
Message-ID: <69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch> (raw)
In-Reply-To: <e840b60f-2fec-4645-95d5-27c14ae33853@amd.com>
Bowman, Terry wrote:
> On 3/29/2026 8:07 PM, Dan Williams wrote:
> > Terry Bowman wrote:
> >> CXL Protocol trace logging is called for Endpoints in cxl_handle_ras() and
> >> cxl_handle_cor_ras(). Trace logging support for CXL Port devices is missing.
> >>
> >> CXL Endpoint trace logging utilizes a separate trace routine than CXL Port
> >> device handling. Using is_cxl_memdev(), determine if the device is a CXL EP
> >> or one of the CXL Port devices.
> >>
> >> Update cxl_handle_ras() and cxl_handle_cor_ras() to call the CXL Port trace
> >> logging function. Change cxl_handle_ras() return values to be pci_ers_result_t
> >> type.
> >>
> >> Check for invalid ras_base and add log messages if NULL.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> >>
> >> ---
> >>
> >> Changes in v15 -> v16:
> >> - New commit
> >> ---
> >> drivers/cxl/core/core.h | 10 ++++++----
> >> drivers/cxl/core/ras.c | 36 +++++++++++++++++++++++++-----------
> >> 2 files changed, 31 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> >> index 76d2593e68c6..984cc37be186 100644
> >> --- a/drivers/cxl/core/core.h
> >> +++ b/drivers/cxl/core/core.h
> >> @@ -6,6 +6,7 @@
> >>
> >> #include <cxl/mailbox.h>
> >> #include <linux/rwsem.h>
> >> +#include <linux/pci.h>
> >>
> >> extern const struct device_type cxl_nvdimm_bridge_type;
> >> extern const struct device_type cxl_nvdimm_type;
> >> @@ -181,7 +182,8 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)
> >> #ifdef CONFIG_CXL_RAS
> >> int cxl_ras_init(void);
> >> void cxl_ras_exit(void);
> >> -bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
> >> +pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial,
> >> + void __iomem *ras_base);
> >
> > This change is unrelated to the conversion to call different tracepoint
> > handlers.
> >
> >> void cxl_handle_cor_ras(struct device *dev, u64 serial,
> >> void __iomem *ras_base);
> >> void cxl_dport_map_rch_aer(struct cxl_dport *dport);
> >> @@ -195,10 +197,10 @@ static inline int cxl_ras_init(void)
> >> return 0;
> >> }
> >> static inline void cxl_ras_exit(void) { }
> >> -static inline bool cxl_handle_ras(struct device *dev, u64 serial,
> >> - void __iomem *ras_base)
> >> +static inline pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial,
> >> + void __iomem *ras_base)
> >> {
> >> - return false;
> >> + return PCI_ERS_RESULT_NONE;
> >> }
> >> static inline void cxl_handle_cor_ras(struct device *dev, u64 serial,
> >> void __iomem *ras_base) { }
> >> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> >> index 48d3ef7cbb92..254144d19764 100644
> >> --- a/drivers/cxl/core/ras.c
> >> +++ b/drivers/cxl/core/ras.c
> >> @@ -291,15 +291,22 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> >> void __iomem *addr;
> >> u32 status;
> >>
> >> - if (!ras_base)
> >> + if (!ras_base) {
> >> + pr_err_ratelimited("%s: CXL RAS registers aren't mapped\n",
> >> + dev_name(dev));
> >
> > What does this new error print have to do with trace logging?
>
> The error print will be logged when the trace is absent. This is an indication
> to the user why the expected trace is missing and will help in debugging where
> there is no other details otherwise.
Imagine you have a device that does not support the RAS capbality. How
do we even get to this far into the code without that detail having
already been reported? For something discovered statically at init time
why does the kernel need to warn over and over again?
So no, I am not seeing a need to spam the log for this, especially since
cxl_handle_cor_ras() has never needed this in the past.
> >> return;
> >> + }
> >>
> >> addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> >> status = readl(addr);
> >> - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> >> - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> >> + if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK))
> >> + return;
> >
> > No need for this thrash if the tracepoint is kept unified... more below.
> >
> >> + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> >> + if (is_cxl_memdev(dev))
> >
> > Wait, the whole point of patch 3 was to make this tracepoint generic.
> >
> > The format string in tracepoints are not supposed to be ABI. Incremental
> > fixup below. Now, there is a difference between "not supposed to be ABI"
> > and "whoops someone had a dependency". We can always come back and
> > restore the old format string if someone screams. It would also be nice
> > to delete trace_cxl_port_aer_correctable_error(). That one might be more
> > dicey from an ABI regression standpoint, but worth it if you only need
> > one tracepoint for all CXL protocol errors regardless of source.
> >
>
> Would you like that logic moved to the trace routine? The trace routine would
> require additional changes to determine if the device is a PCI device. This is
> required inorder to format the displayed strings. The Endpoint displays 'memdev'
> which doesn't apply to non-Endpoint devices.
No, as I showed below, just change the format string to drop the
assumption that the device is a 'struct cxl_memdev'.
I will wrap this diff I pasted earlier into a fixup patch.
> > - TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
> > - __get_str(memdev), __get_str(host), __entry->serial,
> > + TP_printk("device=%s host=%s serial=%lld: status: '%s'",
> > + __get_str(dev), __get_str(host), __entry->serial,
> > show_ce_errs(__entry->status)
> > )
> > );
next prev parent reply other threads:[~2026-03-31 2:11 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-02 20:36 [PATCH v16 00/10] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-03-02 20:36 ` [PATCH v16 01/10] PCI/AER: Introduce AER-CXL Kfifo Terry Bowman
2026-03-09 12:20 ` Jonathan Cameron
2026-03-28 0:28 ` Dan Williams
2026-03-29 20:33 ` Dan Williams
2026-03-30 15:33 ` Bowman, Terry
2026-03-30 15:15 ` Bowman, Terry
2026-03-02 20:36 ` [PATCH v16 02/10] PCI/CXL: Update unregistration for AER-CXL and CPER-CXL kfifos Terry Bowman
2026-03-09 12:27 ` Jonathan Cameron
2026-03-11 15:03 ` Bowman, Terry
2026-03-09 18:30 ` Dave Jiang
2026-03-29 21:27 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 03/10] cxl: Update CXL Endpoint tracing Terry Bowman
2026-03-29 21:44 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 04/10] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
2026-03-29 21:57 ` Dan Williams
2026-03-30 16:40 ` Bowman, Terry
2026-03-02 20:36 ` [PATCH v16 05/10] PCI: Establish common CXL Port protocol error flow Terry Bowman
2026-03-09 12:45 ` [PATCH v16 05/10] PCI: Establish common CXL Port protocol error flowUIRE Jonathan Cameron
2026-03-30 0:08 ` [PATCH v16 05/10] PCI: Establish common CXL Port protocol error flow Dan Williams
2026-03-02 20:36 ` [PATCH v16 06/10] PCI/CXL: Add RCH support to CXL handlers Terry Bowman
2026-03-09 14:00 ` Jonathan Cameron
2026-03-11 15:21 ` Bowman, Terry
2026-03-30 0:31 ` Dan Williams
2026-03-30 17:02 ` Bowman, Terry
2026-03-02 20:36 ` [PATCH v16 07/10] cxl: Update error handlers to support CXL Port devices Terry Bowman
2026-03-09 14:05 ` Jonathan Cameron
2026-03-11 15:37 ` Bowman, Terry
2026-03-12 13:05 ` Jonathan Cameron
2026-03-30 1:07 ` Dan Williams
2026-03-30 16:31 ` Bowman, Terry
2026-03-31 2:11 ` Dan Williams [this message]
2026-03-02 20:36 ` [PATCH v16 08/10] cxl: Update Endpoint AER uncorrectable handler Terry Bowman
2026-03-09 14:12 ` Jonathan Cameron
2026-03-11 15:58 ` Bowman, Terry
2026-03-30 1:22 ` Dan Williams
2026-03-31 18:52 ` Bowman, Terry
2026-03-31 19:23 ` Dan Williams
2026-03-31 19:52 ` Bowman, Terry
2026-04-02 3:39 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 09/10] cxl: Remove Endpoint AER correctable handler Terry Bowman
2026-03-09 14:13 ` Jonathan Cameron
2026-03-09 18:55 ` Dave Jiang
2026-03-30 1:24 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 10/10] cxl: Enable CXL protocol error reporting Terry Bowman
2026-03-30 1:41 ` Dan Williams
2026-03-31 13:31 ` Bowman, Terry
2026-03-31 19:16 ` Dan Williams
2026-03-31 20:50 ` Bowman, Terry
2026-03-31 21:12 ` Bowman, Terry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch \
--to=dan.j.williams@intel.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=alucerop@amd.com \
--cc=bhelgaas@google.com \
--cc=dan.carpenter@linaro.org \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=ming.li@zohomail.com \
--cc=rrichter@amd.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=shiju.jose@huawei.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox