From: Dan Williams <dan.j.williams@intel.com>
To: Terry Bowman <terry.bowman@amd.com>, <dave@stgolabs.net>,
<jonathan.cameron@huawei.com>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <dan.j.williams@intel.com>,
<bhelgaas@google.com>, <shiju.jose@huawei.com>,
<ming.li@zohomail.com>, <Smita.KoralahalliChannabasappa@amd.com>,
<rrichter@amd.com>, <dan.carpenter@linaro.org>,
<PradeepVineshReddy.Kodamati@amd.com>, <lukas@wunner.de>,
<Benjamin.Cheatham@amd.com>,
<sathyanarayanan.kuppuswamy@linux.intel.com>,
<linux-cxl@vger.kernel.org>, <vishal.l.verma@intel.com>,
<alucerop@amd.com>, <ira.weiny@intel.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,
<terry.bowman@amd.com>
Subject: Re: [PATCH v16 07/10] cxl: Update error handlers to support CXL Port devices
Date: Sun, 29 Mar 2026 18:07:08 -0700 [thread overview]
Message-ID: <69c9ccbc3d1cb_178904100cf@dwillia2-mobl4.notmuch> (raw)
In-Reply-To: <20260302203648.2886956-8-terry.bowman@amd.com>
Terry Bowman wrote:
> CXL Protocol trace logging is called for Endpoints in cxl_handle_ras() and
> cxl_handle_cor_ras(). Trace logging support for CXL Port devices is missing.
>
> CXL Endpoint trace logging utilizes a separate trace routine than CXL Port
> device handling. Using is_cxl_memdev(), determine if the device is a CXL EP
> or one of the CXL Port devices.
>
> Update cxl_handle_ras() and cxl_handle_cor_ras() to call the CXL Port trace
> logging function. Change cxl_handle_ras() return values to be pci_ers_result_t
> type.
>
> Check for invalid ras_base and add log messages if NULL.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
>
> Changes in v15 -> v16:
> - New commit
> ---
> drivers/cxl/core/core.h | 10 ++++++----
> drivers/cxl/core/ras.c | 36 +++++++++++++++++++++++++-----------
> 2 files changed, 31 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 76d2593e68c6..984cc37be186 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -6,6 +6,7 @@
>
> #include <cxl/mailbox.h>
> #include <linux/rwsem.h>
> +#include <linux/pci.h>
>
> extern const struct device_type cxl_nvdimm_bridge_type;
> extern const struct device_type cxl_nvdimm_type;
> @@ -181,7 +182,8 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)
> #ifdef CONFIG_CXL_RAS
> int cxl_ras_init(void);
> void cxl_ras_exit(void);
> -bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);
> +pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial,
> + void __iomem *ras_base);
This change is unrelated to the conversion to call different tracepoint
handlers.
> void cxl_handle_cor_ras(struct device *dev, u64 serial,
> void __iomem *ras_base);
> void cxl_dport_map_rch_aer(struct cxl_dport *dport);
> @@ -195,10 +197,10 @@ static inline int cxl_ras_init(void)
> return 0;
> }
> static inline void cxl_ras_exit(void) { }
> -static inline bool cxl_handle_ras(struct device *dev, u64 serial,
> - void __iomem *ras_base)
> +static inline pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial,
> + void __iomem *ras_base)
> {
> - return false;
> + return PCI_ERS_RESULT_NONE;
> }
> static inline void cxl_handle_cor_ras(struct device *dev, u64 serial,
> void __iomem *ras_base) { }
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 48d3ef7cbb92..254144d19764 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -291,15 +291,22 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
> void __iomem *addr;
> u32 status;
>
> - if (!ras_base)
> + if (!ras_base) {
> + pr_err_ratelimited("%s: CXL RAS registers aren't mapped\n",
> + dev_name(dev));
What does this new error print have to do with trace logging?
> return;
> + }
>
> addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> status = readl(addr);
> - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> + if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK))
> + return;
No need for this thrash if the tracepoint is kept unified... more below.
> + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> + if (is_cxl_memdev(dev))
Wait, the whole point of patch 3 was to make this tracepoint generic.
The format string in tracepoints are not supposed to be ABI. Incremental
fixup below. Now, there is a difference between "not supposed to be ABI"
and "whoops someone had a dependency". We can always come back and
restore the old format string if someone screams. It would also be nice
to delete trace_cxl_port_aer_correctable_error(). That one might be more
dicey from an ABI regression standpoint, but worth it if you only need
one tracepoint for all CXL protocol errors regardless of source.
---
drivers/cxl/core/trace.h | 14 +++++++-------
drivers/cxl/core/ras.c | 25 ++++++-------------------
2 files changed, 13 insertions(+), 26 deletions(-)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 5f630543b720..70c9e65bfa08 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -145,22 +145,22 @@ TRACE_EVENT(cxl_port_aer_correctable_error,
);
TRACE_EVENT(cxl_aer_correctable_error,
- TP_PROTO(const struct device *cxlmd, u32 status, u64 serial),
- TP_ARGS(cxlmd, status, serial),
+ TP_PROTO(const struct device *dev, u32 status, u64 serial),
+ TP_ARGS(dev, status, serial),
TP_STRUCT__entry(
- __string(memdev, dev_name(cxlmd))
- __string(host, dev_name(cxlmd->parent))
+ __string(dev, dev_name(dev))
+ __string(host, dev_name(dev->parent))
__field(u64, serial)
__field(u32, status)
),
TP_fast_assign(
- __assign_str(memdev);
+ __assign_str(dev);
__assign_str(host);
__entry->serial = serial;
__entry->status = status;
),
- TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
- __get_str(memdev), __get_str(host), __entry->serial,
+ TP_printk("device=%s host=%s serial=%lld: status: '%s'",
+ __get_str(dev), __get_str(host), __entry->serial,
show_ce_errs(__entry->status)
)
);
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 254144d19764..7a1319ce0f84 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -291,22 +291,15 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
void __iomem *addr;
u32 status;
- if (!ras_base) {
- pr_err_ratelimited("%s: CXL RAS registers aren't mapped\n",
- dev_name(dev));
+ if (!ras_base)
return;
- }
addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
status = readl(addr);
- if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK))
- return;
-
- writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- if (is_cxl_memdev(dev))
+ if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
+ writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
trace_cxl_aer_correctable_error(dev, status, serial);
- else
- trace_cxl_port_aer_correctable_error(dev, status);
+ }
}
/* CXL spec rev3.0 8.2.4.16.1 */
@@ -338,11 +331,8 @@ cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
u32 status;
u32 fe;
- if (!ras_base) {
- pr_err_ratelimited("%s: CXL RAS registers aren't mapped\n",
- dev_name(dev));
+ if (!ras_base)
return PCI_ERS_RESULT_NONE;
- }
addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
status = readl(addr);
@@ -361,10 +351,7 @@ cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)
}
header_log_copy(ras_base, hl);
- if (is_cxl_memdev(dev))
- trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
- else
- trace_cxl_port_aer_uncorrectable_error(dev, status, fe, hl);
+ trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
return PCI_ERS_RESULT_PANIC;
--
2.53.0
next prev parent reply other threads:[~2026-03-30 1:07 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-02 20:36 [PATCH v16 00/10] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-03-02 20:36 ` [PATCH v16 01/10] PCI/AER: Introduce AER-CXL Kfifo Terry Bowman
2026-03-09 12:20 ` Jonathan Cameron
2026-03-28 0:28 ` Dan Williams
2026-03-29 20:33 ` Dan Williams
2026-03-30 15:33 ` Bowman, Terry
2026-03-30 15:15 ` Bowman, Terry
2026-03-02 20:36 ` [PATCH v16 02/10] PCI/CXL: Update unregistration for AER-CXL and CPER-CXL kfifos Terry Bowman
2026-03-09 12:27 ` Jonathan Cameron
2026-03-11 15:03 ` Bowman, Terry
2026-03-09 18:30 ` Dave Jiang
2026-03-29 21:27 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 03/10] cxl: Update CXL Endpoint tracing Terry Bowman
2026-03-29 21:44 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 04/10] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
2026-03-29 21:57 ` Dan Williams
2026-03-30 16:40 ` Bowman, Terry
2026-03-02 20:36 ` [PATCH v16 05/10] PCI: Establish common CXL Port protocol error flow Terry Bowman
2026-03-09 12:45 ` [PATCH v16 05/10] PCI: Establish common CXL Port protocol error flowUIRE Jonathan Cameron
2026-03-30 0:08 ` [PATCH v16 05/10] PCI: Establish common CXL Port protocol error flow Dan Williams
2026-03-02 20:36 ` [PATCH v16 06/10] PCI/CXL: Add RCH support to CXL handlers Terry Bowman
2026-03-09 14:00 ` Jonathan Cameron
2026-03-11 15:21 ` Bowman, Terry
2026-03-30 0:31 ` Dan Williams
2026-03-30 17:02 ` Bowman, Terry
2026-03-02 20:36 ` [PATCH v16 07/10] cxl: Update error handlers to support CXL Port devices Terry Bowman
2026-03-09 14:05 ` Jonathan Cameron
2026-03-11 15:37 ` Bowman, Terry
2026-03-12 13:05 ` Jonathan Cameron
2026-03-30 1:07 ` Dan Williams [this message]
2026-03-30 16:31 ` Bowman, Terry
2026-03-31 2:11 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 08/10] cxl: Update Endpoint AER uncorrectable handler Terry Bowman
2026-03-09 14:12 ` Jonathan Cameron
2026-03-11 15:58 ` Bowman, Terry
2026-03-30 1:22 ` Dan Williams
2026-03-31 18:52 ` Bowman, Terry
2026-03-31 19:23 ` Dan Williams
2026-03-31 19:52 ` Bowman, Terry
2026-04-02 3:39 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 09/10] cxl: Remove Endpoint AER correctable handler Terry Bowman
2026-03-09 14:13 ` Jonathan Cameron
2026-03-09 18:55 ` Dave Jiang
2026-03-30 1:24 ` Dan Williams
2026-03-02 20:36 ` [PATCH v16 10/10] cxl: Enable CXL protocol error reporting Terry Bowman
2026-03-30 1:41 ` Dan Williams
2026-03-31 13:31 ` Bowman, Terry
2026-03-31 19:16 ` Dan Williams
2026-03-31 20:50 ` Bowman, Terry
2026-03-31 21:12 ` Bowman, Terry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=69c9ccbc3d1cb_178904100cf@dwillia2-mobl4.notmuch \
--to=dan.j.williams@intel.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=alucerop@amd.com \
--cc=bhelgaas@google.com \
--cc=dan.carpenter@linaro.org \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=ming.li@zohomail.com \
--cc=rrichter@amd.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=shiju.jose@huawei.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox