public inbox for linux-cxl@vger.kernel.org
* [ndctl PATCH 0/3] Enable CXL protocol testing
@ 2026-04-08 20:32 Terry Bowman
  2026-04-08 20:32 ` [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject Terry Bowman
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Terry Bowman @ 2026-04-08 20:32 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	dan.j.williams, shiju.jose, ming.li,
	Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
	PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
	sathyanarayanan.kuppuswamy, nvdimm, alucerop, ira.weiny
  Cc: linux-cxl, terry.bowman

Current CXL error injection (EINJ) supports protocol error injection only at
Root Ports, but a method to test all CXL devices is needed. This series
outlines methods to update both the kernel and the 'aer-inject' tool, without
relying on EINJ, to exercise CXL RAS protocol error handling across all CXL
devices.

The first patch provides the scripts to enable and trigger AER protocol
errors. This patch also includes the README.md with install details.

The second patch enables correctable and uncorrectable AER internal error
handling in the aer-inject tool.

The third patch is a kernel patch that forces the RAS status in the handlers.

Terry Bowman (3):
  test/cxl: Enable CXL protocol error testing using aer-inject
  test/aer-inject: Add aer-inject correctable and uncorrectable internal
    error support
  test/cxl: Force RAS status in cxl_handle_cor_ras() and
    cxl_handle_ras()

 test/contrib/cxl-aer-einj/README.md           | 80 ++++++++++++++++
 ...Add-internal-error-injection-support.patch | 91 +++++++++++++++++++
 ...AS-status-in-cxl_handle_cor_ras-and-.patch | 51 +++++++++++
 .../cxl-aer-einj/scripts/ds-ce-inject.sh      |  4 +
 .../cxl-aer-einj/scripts/ds-uce-inject.sh     |  4 +
 .../cxl-aer-einj/scripts/enable-trace.sh      |  5 +
 .../cxl-aer-einj/scripts/ep-ce-inject.sh      |  4 +
 .../cxl-aer-einj/scripts/ep-uce-inject.sh     |  4 +
 .../cxl-aer-einj/scripts/root-ce-inject.sh    |  4 +
 .../cxl-aer-einj/scripts/root-uce-inject.sh   |  4 +
 .../cxl-aer-einj/scripts/us-ce-inject.sh      |  4 +
 .../cxl-aer-einj/scripts/us-uce-inject.sh     |  4 +
 12 files changed, 259 insertions(+)
 create mode 100644 test/contrib/cxl-aer-einj/README.md
 create mode 100644 test/contrib/cxl-aer-einj/patches/0001-aer-inject-Add-internal-error-injection-support.patch
 create mode 100644 test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch
 create mode 100755 test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/enable-trace.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh


base-commit: 8ad90e54f0ff4f7291e7f21d44d769d10f24e2b6
-- 
2.34.1



* [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject
  2026-04-08 20:32 [ndctl PATCH 0/3] Enable CXL protocol testing Terry Bowman
@ 2026-04-08 20:32 ` Terry Bowman
  2026-04-08 21:39   ` Cheatham, Benjamin
  2026-04-08 20:32 ` [ndctl PATCH 2/3] test/aer-inject: Add aer-inject correctable and uncorrectable internal error support Terry Bowman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Terry Bowman @ 2026-04-08 20:32 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	dan.j.williams, shiju.jose, ming.li,
	Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
	PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
	sathyanarayanan.kuppuswamy, nvdimm, alucerop, ira.weiny
  Cc: linux-cxl, terry.bowman

CXL protocol errors are signaled to the kernel's CXL drivers via PCIe
Advanced Error Reporting (AER) internal error types: Uncorrectable Internal
Errors (UIE) and Correctable Internal Errors (CIE). These errors in turn
trigger RAS handling paths in the CXL core. The `aer-inject` tool has
lacked the ability to generate AER UIE and CIE events, making it difficult
to verify kernel handling without actual hardware protocol error conditions.
To address this testing gap, this patch introduces tooling and scripts that
allow for injected CXL protocol errors to be delivered to the CXL core.

This change adds a new `test/contrib/cxl-aer-einj` directory containing:

 - A README with instructions, prerequisites, and caveats for simulating CXL
   protocol errors using UIE/CIE AER injection.
 - Example sed commands for hardcoding ECC cache bits in CXL RAS handlers as
   a debug workaround for testing with zero hardware status.
 - Scripts to enable CXL tracing and to invoke CE/UCE injections for root
   ports, upstream switches, downstream ports, and endpoints.

The following patches are required to complete support and follow in this series:

 - Patch (`0001-aer-inject-Add-internal-error-injection-support.patch`) for
   `aer-inject` to support UIE and CIE injection by defining new constants and
   updating parser rules.
 - Kernel test patch ('0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch')
   to force setting RAS status.

With internal error injection support in `aer-inject`, developers can trigger
RAS paths reliably in the CXL core and validate their protocol error handling
logic without relying on physical fault conditions.
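
As an illustration of how the per-device scripts could be driven from a single
entry point, here is a minimal sketch. The function names `example_for` and
`inject_cmd` are hypothetical and not part of this series; the example file
paths are the ones used by the posted scripts. The helper only prints the
command line, so it can be dry-run on a system without `aer-inject` installed.

```bash
#!/bin/bash
# Hypothetical helper (not part of the posted series): build the aer-inject
# command line from an error class and a PCI BDF instead of hardcoding both.

# Map an error class to the example file used by the posted scripts.
example_for() {
	case "$1" in
	ce)  echo "examples/correctable.internal" ;;
	uce) echo "examples/fatal.internal" ;;
	*)   return 1 ;;
	esac
}

# Print the aer-inject command for <ce|uce> <bdf>.
# Printing rather than executing keeps the helper safe to dry-run.
inject_cmd() {
	local example
	example=$(example_for "$1") || { echo "unknown error class: $1" >&2; return 1; }
	[ -n "$2" ] || { echo "missing BDF" >&2; return 1; }
	echo "aer-inject -s $2 $example"
}
```

For instance, `inject_cmd ce 0e:00.0` prints
`aer-inject -s 0e:00.0 examples/correctable.internal`, which matches what
ds-ce-inject.sh runs; the BDF is still system-specific.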

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 test/contrib/cxl-aer-einj/README.md           | 80 +++++++++++++++++++
 .../cxl-aer-einj/scripts/ds-ce-inject.sh      |  4 +
 .../cxl-aer-einj/scripts/ds-uce-inject.sh     |  4 +
 .../cxl-aer-einj/scripts/enable-trace.sh      |  5 ++
 .../cxl-aer-einj/scripts/ep-ce-inject.sh      |  4 +
 .../cxl-aer-einj/scripts/ep-uce-inject.sh     |  4 +
 .../cxl-aer-einj/scripts/root-ce-inject.sh    |  4 +
 .../cxl-aer-einj/scripts/root-uce-inject.sh   |  4 +
 .../cxl-aer-einj/scripts/us-ce-inject.sh      |  4 +
 .../cxl-aer-einj/scripts/us-uce-inject.sh     |  4 +
 10 files changed, 117 insertions(+)
 create mode 100644 test/contrib/cxl-aer-einj/README.md
 create mode 100755 test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/enable-trace.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh
 create mode 100755 test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh

diff --git a/test/contrib/cxl-aer-einj/README.md b/test/contrib/cxl-aer-einj/README.md
new file mode 100644
index 0000000..d31b572
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/README.md
@@ -0,0 +1,80 @@
+**Testing CXL Protocol Errors Using AER Injection**
+
+The `aer-inject` tool currently does not support injecting internal errors such as Correctable Internal Errors (CIE) and Uncorrectable Internal Errors (UIE). By default, internal errors are masked according to the PCI specification and are rarely used. However, these internal errors are now leveraged to notify the PCI and CXL subsystems of CXL protocol errors. The attached patches enable support for CE and UCE internal errors in `aer-inject`, allowing you to test CXL RAS functionality.
+
+**Important Caveats:**
+- `aer-inject` will only inject AER errors and does not inject CXL RAS-specific errors directly.
+- As a result, functions like `cxl_handle_ras()` and `cxl_handle_cor_ras()` will detect a status of 0 and exit early, which hampers testing.
+- To work around this, a debug patch must be added (example included) to hardcode the last RAS error status in `cxl_handle_ras()` and `cxl_handle_cor_ras()`. While not ideal, this workaround facilitates testing of the software paths involved. This is addressed below in 'Patch'.
+
+---
+
+### Prerequisites
+- `aer-inject` tool from: https://github.com/intel/aer-inject
+- Kernel configuration options:
+  ```
+  CONFIG_PCIEAER=y
+  CONFIG_PCIEAER_INJECT=y
+  CONFIG_PCIEPORTBUS=y
+  CONFIG_DEBUG_FS=y
+  CONFIG_CXL_PCI
+  CONFIG_CXL_RAS
+  CONFIG_CXL_PORT
+  CONFIG_CXL_BUS
+  ```
+  
+---
+
+### aer-inject Patch Details
+- The patch adds support for injecting both correctable (CE) and uncorrectable (UCE) internal errors.
+- The patch is located in `./patches` and should be applied to the `aer-inject` repository, based on the master branch (commit `81701cb`). The patch is '0001-aer-inject-Add-internal-error-injection-support.patch'.
+- Additionally, you'll need to apply a kernel-side workaround by hardcoding the RAS error status in the relevant handler, as described earlier.
+
+### Kernel patch Details
+Below is the patch that sets the RAS status for testing. Equivalent `sed` scripts are also included.
+
+#### Kernel Patch to set CXL RAS status for testing
+The patch that sets the CXL protocol RAS status is based on v7.0-rc6 (7aaa8047eafd). The patch is:
+0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch
+
+#### Script to set the Kernel's CXL RAS status
+##### 1: Correctable Errors (CE)
+```bash
+sed -i '
+/void cxl_handle_cor_ras/,/}/ {
+	/status = readl(addr);/ {
+		i #define CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC 0x1
+		a\    status |= CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC;
+	}
+}' drivers/cxl/core/ras.c
+```
+##### 2: Uncorrectable Errors (UCE)
+```bash
+sed -i '
+/bool cxl_handle_ras/,/}/ {
+	/status = readl(addr);/ {
+		i #define CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC 0x1
+		a\    status |= CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC;
+	}
+}' drivers/cxl/core/ras.c
+```
+---
+
+### Testing Procedure
+- The provided scripts illustrate how I ran the tests. You'll need to modify the scripts to use the correct BDFs for your system.
+- Alternatively, you can run the tests manually using commands like:
+
+```bash
+aer-inject -s ${bdf} examples/correctable.internal
+```
+
+and
+
+```bash
+aer-inject -s ${bdf} examples/fatal.internal
+```
+
+*Ensure you replace `${bdf}` with the appropriate PCI BDF for your device.*
+
+---
+
diff --git a/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
new file mode 100755
index 0000000..c0e3417
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0e:00.0"
+
+aer-inject -s ${bdf} examples/correctable.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
new file mode 100755
index 0000000..e238f63
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0e:00.0"
+
+aer-inject -s ${bdf} examples/fatal.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/enable-trace.sh b/test/contrib/cxl-aer-einj/scripts/enable-trace.sh
new file mode 100755
index 0000000..753419f
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/enable-trace.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+echo 1 >  /sys/kernel/debug/tracing/events/cxl/enable
+echo 1 > /sys/kernel/debug/tracing/events/cxl/cxl_aer_correctable_error/enable
+echo 1 > /sys/kernel/debug/tracing/events/cxl/cxl_aer_uncorrectable_error/enable
diff --git a/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
new file mode 100755
index 0000000..3077c3c
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0f:00.0"
+
+aer-inject -s ${bdf} examples/correctable.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh
new file mode 100755
index 0000000..9dad325
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0000:0f:00.0"
+
+aer-inject -s ${bdf} examples/fatal.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh
new file mode 100755
index 0000000..768522e
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0c:00.0"
+
+aer-inject -s ${bdf} examples/correctable.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh
new file mode 100755
index 0000000..7238983
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0c:00.0"
+
+aer-inject -s ${bdf} examples/fatal.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh
new file mode 100755
index 0000000..12ac104
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0d:00.0"
+
+aer-inject -s ${bdf} examples/correctable.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh
new file mode 100755
index 0000000..bcd130e
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0d:00.0"
+
+aer-inject -s ${bdf} examples/fatal.internal
-- 
2.34.1



* [ndctl PATCH 2/3] test/aer-inject: Add aer-inject correctable and uncorrectable internal error support
  2026-04-08 20:32 [ndctl PATCH 0/3] Enable CXL protocol testing Terry Bowman
  2026-04-08 20:32 ` [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject Terry Bowman
@ 2026-04-08 20:32 ` Terry Bowman
  2026-04-08 20:32 ` [ndctl PATCH 3/3] test/cxl: Force RAS status in cxl_handle_cor_ras() and cxl_handle_ras() Terry Bowman
  2026-04-08 21:39 ` [ndctl PATCH 0/3] Enable CXL protocol testing Cheatham, Benjamin
  3 siblings, 0 replies; 8+ messages in thread
From: Terry Bowman @ 2026-04-08 20:32 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	dan.j.williams, shiju.jose, ming.li,
	Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
	PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
	sathyanarayanan.kuppuswamy, nvdimm, alucerop, ira.weiny
  Cc: linux-cxl, terry.bowman

The `aer-inject` tool currently does not support injecting Correctable
Internal Errors (CIE) or Uncorrectable Internal Errors (UIE). By default,
internal errors are masked according to the PCI specification and are
generally not used. However, these internal errors are now leveraged to
notify the PCI and CXL subsystems of CXL protocol errors. The attached
patches enable support for CIE and UIE internal errors in `aer-inject`,
allowing for injected CXL protocol errors to be delivered to the CXL core.
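
The scripts in patch 1/3 reference `examples/correctable.internal` and
`examples/fatal.internal`, while the embedded patch only touches aer.h, aer.lex
and aer.y. A minimal sketch of how such input files might be generated follows.
The `INTERNAL` and `CINTERNAL` keywords come from the patch; the overall file
layout (an `AER` line, a status line, then `HEADER_LOG`) is an assumption based
on the existing examples in the aer-inject repository, and the function name
`gen_internal_example` is hypothetical.

```bash
#!/bin/bash
# Hypothetical generator for aer-inject input files that use the new keywords.
# Assumption: the file layout mirrors the existing aer-inject examples.

# Emit an aer-inject input description on stdout.
#   $1: "cor" for a correctable internal error, "uncor" for an uncorrectable one
gen_internal_example() {
	local status_line
	case "$1" in
	cor)   status_line="COR_STATUS CINTERNAL" ;;
	uncor) status_line="UNCOR_STATUS INTERNAL" ;;
	*)     echo "usage: gen_internal_example cor|uncor" >&2; return 1 ;;
	esac
	# HEADER_LOG words are placeholders, as in the upstream examples.
	printf 'AER\n%s\nHEADER_LOG 0 1 2 3\n' "$status_line"
}
```

Usage would be along the lines of
`gen_internal_example cor > examples/correctable.internal`; whether the
author's own example files look like this is not stated in the thread.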

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 ...Add-internal-error-injection-support.patch | 91 +++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 test/contrib/cxl-aer-einj/patches/0001-aer-inject-Add-internal-error-injection-support.patch

diff --git a/test/contrib/cxl-aer-einj/patches/0001-aer-inject-Add-internal-error-injection-support.patch b/test/contrib/cxl-aer-einj/patches/0001-aer-inject-Add-internal-error-injection-support.patch
new file mode 100644
index 0000000..e5675ee
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/patches/0001-aer-inject-Add-internal-error-injection-support.patch
@@ -0,0 +1,91 @@
+From 9d273a798950122059e9428a698d1d9d2520362b Mon Sep 17 00:00:00 2001
+From: Terry Bowman <terry.bowman@amd.com>
+Date: Thu, 17 Oct 2024 12:12:58 -0500
+Subject: [PATCH] aer-inject: Add internal error injection support
+
+The `aer-inject` tool currently does not support injecting internal errors
+such as Correctable Errors (CE) and Uncorrectable Errors (UCE). By default,
+internal errors are masked according to the PCI specification and are
+generally not used. However, these internal errors are now leveraged to
+notify the PCI and CXL subsystems of CXL protocol errors. The attached
+patches enable support for CE and UCE internal errors in `aer-inject`,
+allowing for injected CXL protocol errors to be delivered to the CXL core.
+
+Signed-off-by: Terry Bowman <terry.bowman@amd.com>
+---
+ aer.h   | 2 ++
+ aer.lex | 2 ++
+ aer.y   | 8 ++++----
+ 3 files changed, 8 insertions(+), 4 deletions(-)
+
+diff --git a/aer.h b/aer.h
+index a0ad152..e55a731 100644
+--- a/aer.h
++++ b/aer.h
+@@ -30,11 +30,13 @@ struct aer_error_inj
+ #define  PCI_ERR_UNC_MALF_TLP	0x00040000	/* Malformed TLP */
+ #define  PCI_ERR_UNC_ECRC	0x00080000	/* ECRC Error Status */
+ #define  PCI_ERR_UNC_UNSUP	0x00100000	/* Unsupported Request */
++#define  PCI_ERR_UNC_INTERNAL   0x00400000      /* Internal error */
+ #define  PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error Status */
+ #define  PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP Status */
+ #define  PCI_ERR_COR_BAD_DLLP	0x00000080	/* Bad DLLP Status */
+ #define  PCI_ERR_COR_REP_ROLL	0x00000100	/* REPLAY_NUM Rollover */
+ #define  PCI_ERR_COR_REP_TIMER	0x00001000	/* Replay Timer Timeout */
++#define  PCI_ERR_COR_CINTERNAL	0x00004000	/* Internal error */
+ 
+ extern void init_aer(struct aer_error_inj *err);
+ extern void submit_aer(struct aer_error_inj *err);
+diff --git a/aer.lex b/aer.lex
+index 6121e4e..4fadd0e 100644
+--- a/aer.lex
++++ b/aer.lex
+@@ -82,11 +82,13 @@ static struct key {
+ 	KEYVAL(MALF_TLP, PCI_ERR_UNC_MALF_TLP),
+ 	KEYVAL(ECRC, PCI_ERR_UNC_ECRC),
+ 	KEYVAL(UNSUP, PCI_ERR_UNC_UNSUP),
++	KEYVAL(INTERNAL, PCI_ERR_UNC_INTERNAL),
+ 	KEYVAL(RCVR, PCI_ERR_COR_RCVR),
+ 	KEYVAL(BAD_TLP, PCI_ERR_COR_BAD_TLP),
+ 	KEYVAL(BAD_DLLP, PCI_ERR_COR_BAD_DLLP),
+ 	KEYVAL(REP_ROLL, PCI_ERR_COR_REP_ROLL),
+ 	KEYVAL(REP_TIMER, PCI_ERR_COR_REP_TIMER),
++	KEYVAL(CINTERNAL, PCI_ERR_COR_CINTERNAL),
+ };
+ 
+ static int cmp_key(const void *av, const void *bv)
+diff --git a/aer.y b/aer.y
+index e5ecc7d..500dc97 100644
+--- a/aer.y
++++ b/aer.y
+@@ -34,8 +34,8 @@ static void init(void);
+ 
+ %token AER DOMAIN BUS DEV FN PCI_ID UNCOR_STATUS COR_STATUS HEADER_LOG
+ %token <num> TRAIN DLP POISON_TLP FCP COMP_TIME COMP_ABORT UNX_COMP RX_OVER
+-%token <num> MALF_TLP ECRC UNSUP
+-%token <num> RCVR BAD_TLP BAD_DLLP REP_ROLL REP_TIMER
++%token <num> MALF_TLP ECRC UNSUP INTERNAL
++%token <num> RCVR BAD_TLP BAD_DLLP REP_ROLL REP_TIMER CINTERNAL
+ %token <num> SYMBOL NUMBER
+ %token <str> PCI_ID_STR
+ 
+@@ -77,14 +77,14 @@ uncor_status_list: /* empty */			{ $$ = 0; }
+ 	;
+ 
+ uncor_status: TRAIN | DLP | POISON_TLP | FCP | COMP_TIME | COMP_ABORT
+-	| UNX_COMP | RX_OVER | MALF_TLP | ECRC | UNSUP | NUMBER
++	| UNX_COMP | RX_OVER | MALF_TLP | ECRC | UNSUP | INTERNAL | NUMBER
+ 	;
+ 
+ cor_status_list: /* empty */			{ $$ = 0; }
+ 	| cor_status_list cor_status		{ $$ = $1 | $2; }
+ 	;
+ 
+-cor_status: RCVR | BAD_TLP | BAD_DLLP | REP_ROLL | REP_TIMER | NUMBER
++cor_status: RCVR | BAD_TLP | BAD_DLLP | REP_ROLL | REP_TIMER | CINTERNAL | NUMBER
+ 	;
+ 
+ %% 
+-- 
+2.34.1
+
-- 
2.34.1



* [ndctl PATCH 3/3] test/cxl: Force RAS status in cxl_handle_cor_ras() and cxl_handle_ras()
  2026-04-08 20:32 [ndctl PATCH 0/3] Enable CXL protocol testing Terry Bowman
  2026-04-08 20:32 ` [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject Terry Bowman
  2026-04-08 20:32 ` [ndctl PATCH 2/3] test/aer-inject: Add aer-inject correctable and uncorrectable internal error support Terry Bowman
@ 2026-04-08 20:32 ` Terry Bowman
  2026-04-08 21:39   ` Cheatham, Benjamin
  2026-04-08 21:39 ` [ndctl PATCH 0/3] Enable CXL protocol testing Cheatham, Benjamin
  3 siblings, 1 reply; 8+ messages in thread
From: Terry Bowman @ 2026-04-08 20:32 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	dan.j.williams, shiju.jose, ming.li,
	Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
	PradeepVineshReddy.Kodamati, lukas, Benjamin.Cheatham,
	sathyanarayanan.kuppuswamy, nvdimm, alucerop, ira.weiny
  Cc: linux-cxl, terry.bowman

CXL RAS error injection (EINJ) is present for Root Ports but not for other CXL
devices. Provide the means to test protocol errors for all CXL devices
in a testing environment. Use the 'aer-inject' userspace tool to deliver AER
internal error notifications, which trigger the CXL driver's RAS
handling. Hardcode the status for CE and UCE errors in cxl_handle_ras()
and cxl_handle_cor_ras().
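
One way to confirm that an injection reached the handlers is to count the CXL
AER trace events after running the inject scripts. The sketch below is an
illustration only: the function name `count_cxl_aer_events` is hypothetical,
the event names are the ones enabled by enable-trace.sh, and the default trace
file path assumes debugfs is mounted at /sys/kernel/debug as in that script.

```bash
#!/bin/bash
# Hypothetical check (not part of the series): count CXL AER trace events.

# Print "ce=<n> uce=<n>" for the given trace file.
#   $1: trace file to scan (defaults to the debugfs trace buffer)
count_cxl_aer_events() {
	local trace_file="${1:-/sys/kernel/debug/tracing/trace}"
	local ce uce

	[ -r "$trace_file" ] || { echo "cannot read $trace_file" >&2; return 1; }

	# grep -c prints 0 (and exits non-zero) when nothing matches.
	ce=$(grep -c 'cxl_aer_correctable_error' "$trace_file")
	uce=$(grep -c 'cxl_aer_uncorrectable_error' "$trace_file")
	echo "ce=$ce uce=$uce"
}
```

With the forced status from this patch applied, each injection would be
expected to add an event for the targeted device; a count that stays at zero
suggests the handler still exited early.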

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 ...AS-status-in-cxl_handle_cor_ras-and-.patch | 51 +++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch

diff --git a/test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch b/test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch
new file mode 100644
index 0000000..d8562cc
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch
@@ -0,0 +1,51 @@
+From 1b4054d82a1834e211ef3f284b9f51926db8f060 Mon Sep 17 00:00:00 2001
+From: Terry Bowman <terry.bowman@amd.com>
+Date: Tue, 7 Apr 2026 16:47:45 -0500
+Subject: [PATCH] test/cxl: Force RAS status in cxl_handle_cor_ras() and
+ cxl_handle_ras()
+
+CXL RAS error injection is present for Root Ports but not for other CXL
+devices. Provide the means to test protocol errors for all CXL devices
+in a testing environment. Use 'aer-inject' userspace tool to deliver AER
+internal error notification which will trigger the CXL driver's RAS
+handling. Hardcode the status for CE and UCE errors in cxl_handle_ras()
+and cxl_handle_cor_ras().
+
+Signed-off-by: Terry Bowman <terry.bowman@amd.com>
+---
+ drivers/cxl/core/ras.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
+index 006c6ffc2f56..09ff82973c70 100644
+--- a/drivers/cxl/core/ras.c
++++ b/drivers/cxl/core/ras.c
+@@ -183,6 +183,9 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
+ }
+ EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
+ 
++#define CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC 0x1
++#define CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC 0x1
++
+ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
+ {
+ 	void __iomem *addr;
+@@ -193,6 +196,7 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
+ 
+ 	addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
+ 	status = readl(addr);
++	status |= CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC;
+ 	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
+ 		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
+ 		trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
+@@ -232,6 +236,7 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
+ 
+ 	addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
+ 	status = readl(addr);
++	status |= CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC;
+ 	if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
+ 		return false;
+ 
+-- 
+2.34.1
+
-- 
2.34.1



* Re: [ndctl PATCH 3/3] test/cxl: Force RAS status in cxl_handle_cor_ras() and cxl_handle_ras()
  2026-04-08 20:32 ` [ndctl PATCH 3/3] test/cxl: Force RAS status in cxl_handle_cor_ras() and cxl_handle_ras() Terry Bowman
@ 2026-04-08 21:39   ` Cheatham, Benjamin
  0 siblings, 0 replies; 8+ messages in thread
From: Cheatham, Benjamin @ 2026-04-08 21:39 UTC (permalink / raw)
  To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
	alison.schofield, dan.j.williams, shiju.jose, ming.li,
	Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
	PradeepVineshReddy.Kodamati, lukas, sathyanarayanan.kuppuswamy,
	nvdimm, alucerop, ira.weiny
  Cc: linux-cxl

On 4/8/2026 3:32 PM, Terry Bowman wrote:
> CXL RAS error injection (EINJ) is present for Root Ports but not for other CXL
> devices. Provide the means to test protocol errors for all CXL devices
> in a testing environment. Use 'aer-inject' userspace tool to deliver AER
> internal error notification which will trigger the CXL driver's RAS
> handling. Hardcode the status for CE and UCE errors in cxl_handle_ras()
> and cxl_handle_cor_ras().
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  ...AS-status-in-cxl_handle_cor_ras-and-.patch | 51 +++++++++++++++++++
>  1 file changed, 51 insertions(+)
>  create mode 100644 test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch
> 

The below patch should just be sent to the kernel list. I'm not sure you can create a cxl_test wrapper for cxl_handle_cor_ras()/cxl_handle_ras()
since they're in the core, but that would be the ideal situation. Otherwise, you could create a Kconfig symbol that's explicitly for testing
and create stub functions to force the values as needed. Basically something like:

cxl/core/core.h:
#ifdef CONFIG_CXL_RAS_TEST
#define CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC 0x1
#define CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC 0x1

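/* Test-only: read the uncorrectable status and force bit 0 on. */
static inline u32 cxl_ras_uncor_readl(void __iomem *addr)
{
	u32 status = readl(addr);

	return status | CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC;
}

/* Test-only: read the correctable status and force bit 0 on. */
static inline u32 cxl_ras_cor_readl(void __iomem *addr)
{
	u32 status = readl(addr);

	return status | CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC;
}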
#else
#define cxl_ras_uncor_readl readl
#define cxl_ras_cor_readl readl
#endif

You can obviously consolidate the uncor/cor versions, but I'll let you figure out the naming there ;).
Can also be a bit safer and create static inline functions in the #else, but I just went the easy route.

> diff --git a/test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch b/test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch
> new file mode 100644
> index 0000000..d8562cc
> --- /dev/null
> +++ b/test/contrib/cxl-aer-einj/patches/0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch
> @@ -0,0 +1,51 @@
> +From 1b4054d82a1834e211ef3f284b9f51926db8f060 Mon Sep 17 00:00:00 2001
> +From: Terry Bowman <terry.bowman@amd.com>
> +Date: Tue, 7 Apr 2026 16:47:45 -0500
> +Subject: [PATCH] test/cxl: Force RAS status in cxl_handle_cor_ras() and
> + cxl_handle_ras()
> +
> +CXL RAS error injection is present for Root Ports but not for other CXL
> +devices. Provide the means to test protocol errors for all CXL devices
> +in a testing environment. Use 'aer-inject' userspace tool to deliver AER
> +internal error notification which will trigger the CXL driver's RAS
> +handling. Hardcode the status for CE and UCE errors in cxl_handle_ras()
> +and cxl_handle_cor_ras().
> +
> +Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> +---
> + drivers/cxl/core/ras.c | 5 +++++
> + 1 file changed, 5 insertions(+)
> +
> +diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> +index 006c6ffc2f56..09ff82973c70 100644
> +--- a/drivers/cxl/core/ras.c
> ++++ b/drivers/cxl/core/ras.c
> +@@ -183,6 +183,9 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
> + }
> + EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
> + 
> ++#define CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC 0x1
> ++#define CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC 0x1
> ++
> + void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
> + {
> + 	void __iomem *addr;
> +@@ -193,6 +196,7 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
> + 
> + 	addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> + 	status = readl(addr);
> ++	status |= CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC;
> + 	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> + 		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> + 		trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
> +@@ -232,6 +236,7 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)
> + 
> + 	addr = ras_base + CXL_RAS_UNCORRECTABLE_STATUS_OFFSET;
> + 	status = readl(addr);
> ++	status |= CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC;
> + 	if (!(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK))
> + 		return false;
> + 
> +-- 
> +2.34.1
> +



* Re: [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject
  2026-04-08 20:32 ` [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject Terry Bowman
@ 2026-04-08 21:39   ` Cheatham, Benjamin
  0 siblings, 0 replies; 8+ messages in thread
From: Cheatham, Benjamin @ 2026-04-08 21:39 UTC (permalink / raw)
  To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
	alison.schofield, dan.j.williams, shiju.jose, ming.li,
	Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
	PradeepVineshReddy.Kodamati, lukas, sathyanarayanan.kuppuswamy,
	nvdimm, alucerop, ira.weiny
  Cc: linux-cxl

On 4/8/2026 3:32 PM, Terry Bowman wrote:
> CXL protocol errors are signaled to the kernel's CXL drivers via PCIe
> Advanced Error Reporting (AER) internal error types: Uncorrectable Internal
> Errors (UIE) and Correctable Internal Errors (CIE). These errors in-turn
> trigger RAS handling paths in the CXL core. The `aer-inject` tool has
> lacked the ability to generate AER UIE and CIE events, making it difficult
> to verify kernel handling without actual hardware protocol error conditions.
> To address this testing gap, this patch introduces tooling and scripts that
> allow for injected CXL protocol errors to be delivered to the CXL core.
> 
> This change adds a new `test/contrib/cxl-aer-einj` directory containing:
> 
>  - A README with instructions, prerequisites, and caveats for simulating CXL
>    protocol errors using UIE/CIE AER injection.
>  - Example sed commands for hardcoding ECC cache bits in CXL RAS handlers as
>    a debug workaround for testing with zero hardware status.
>  - Scripts to enable CXL tracing and to invoke CE/UCE injections for root
>    ports, upstream switches, downstream ports, and endpoints.
> 
> The below patches are required to complete support. These patches will follow:
> 
>  - Patch (`0001-aer-inject-Add-internal-error-injection-support.patch`) for
>    `aer-inject` to support UIE and CIE injection by defining new constants and
>    updating parser rules.

I have no clue what the aer-inject PR process is, but I'd expect that to be added in that project
instead of this one. If it isn't maintained or private then I guess this can work, but it's probably
out of scope for ndctl (if I had to guess).

>  - Kernel test patch ('0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch')
>    to force setting RAS status.

Same thing for this, but I'll save comments for patch 3/3.

> 
> With internal error injection support in `aer-inject`, developers can trigger
> RAS paths reliably in the CXL core and validate their protocol error handling
> logic without relying on physical fault conditions.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  test/contrib/cxl-aer-einj/README.md           | 80 +++++++++++++++++++
>  .../cxl-aer-einj/scripts/ds-ce-inject.sh      |  4 +
>  .../cxl-aer-einj/scripts/ds-uce-inject.sh     |  4 +
>  .../cxl-aer-einj/scripts/enable-trace.sh      |  5 ++
>  .../cxl-aer-einj/scripts/ep-ce-inject.sh      |  4 +
>  .../cxl-aer-einj/scripts/ep-uce-inject.sh     |  4 +
>  .../cxl-aer-einj/scripts/root-ce-inject.sh    |  4 +
>  .../cxl-aer-einj/scripts/root-uce-inject.sh   |  4 +
>  .../cxl-aer-einj/scripts/us-ce-inject.sh      |  4 +
>  .../cxl-aer-einj/scripts/us-uce-inject.sh     |  4 +

These paths are somewhat redundant. There's already a scripts directory, so I'd put all of
the .sh scripts in a cxl-aer-einj subdir in the scripts directory (so ndctl/scripts/cxl-aer-einj/*.sh).

As for the README, the contents would be merged into the inject-protocol-error documentation and the
file removed (see my comments on the cover letter).

>  10 files changed, 117 insertions(+)
>  create mode 100644 test/contrib/cxl-aer-einj/README.md
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/enable-trace.sh
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh
>  create mode 100755 test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh
> 
> diff --git a/test/contrib/cxl-aer-einj/README.md b/test/contrib/cxl-aer-einj/README.md
> new file mode 100644
> index 0000000..d31b572
> --- /dev/null
> +++ b/test/contrib/cxl-aer-einj/README.md
> @@ -0,0 +1,80 @@
> +**Testing CXL Protocol Errors Using AER Injection**
> +
> +The `aer-inject` tool currently does not support injecting internal errors such as Correctable Internal Errors (CIE) and Uncorrectable Internal Errors (UIE). By default, internal errors are masked according to the PCI specification and are rarely used. However, these internal errors are now leveraged to notify the PCI and CXL subsystems of CXL protocol errors. The attached patches enable support for CE and UCE internal errors in `aer-inject`, allowing you to test CXL RAS functionality.
> +
> +**Important Caveats:**
> +- `aer-inject` will only inject AER errors and does not inject CXL RAS-specific errors directly.
> +- As a result, functions like `cxl_handle_ras()` and `cxl_handle_cor_ras()` will detect a status of 0 and exit early, which hampers testing.
> +- To work around this, a debug patch must be added (example included) to hardcode the last RAS error status in `cxl_handle_ras()` and `cxl_handle_cor_ras()`. While not ideal, this workaround facilitates testing of the software paths involved. This is addressed below in 'Patch'.
> +
> +---
> +
> +### Prerequisites
> +- `aer-einj` tool from: https://github.com/intel/aer-inject
> +- Kernel configuration options:
> +  ```
> +  CONFIG_PCIEAER=y
> +  CONFIG_PCIEAER_INJECT=y
> +  CONFIG_PCIEPORTBUS=y
> +  CONFIG_DEBUG_FS=y
> +  CONFIG_CXL_PCI
> +  CONFIG_CXL_RAS
> +  CONFIG_CXL_PORT
> +  CONFIG_CXL_BUS
> +  ```
> +  
> +---
> +
> +### aer-inject Patch Details
> +- The patch adds support for injecting both correctable (CE) and uncorrectable (UCE) internal errors.
> +- The patch is located in `./patches` and should be applied to the `aer-inject` repository, based on the master branch (commit `81701cb`). The patch is '0001-aer-inject-Add-internal-error-injection-support.patch'.
> +- Additionally, you'll need to apply a kernel-side workaround by hardcoding the RAS error status in the relevant handler, as described earlier.
> +
> +### Kernel patch Details
> +Below is patch to set the RAS for testing. 'sed' scripts are also included 
> +
> +#### Kernel Patch to set CXL RAS status for testing
> +Setting CXL protocol RAS status, is based on v7.0-rc6 (7aaa8047eafd). Patch is:
> +0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch
> +
> +#### Script to set the Kernel's CXL RAS status
> +##### 1: Correctable Errors (CE)
> +```bash
> +sed -i '
> +/void cxl_handle_cor_ras/,/}/ {
> +	/status = readl(addr);/ {
> +		i #define CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC 0x1
> +		a\    status |= CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC;
> +	}
> +}' drivers/cxl/core/ras.c
> +```
> +##### 2: Uncorrectable Errors (UCE)
> +```bash
> +sed -i '
> +/bool cxl_handle_ras/,/}/ {
> +	/status = readl(addr);/ {
> +		i #define CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC 0x1
> +		a\    status |= CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC;
> +	}
> +}' drivers/cxl/core/ras.c

This section probably isn't needed, see comments on patch 3/3.

> +```
> +---
> +
> +### Testing Procedure
> +- The provided scripts illustrate how I ran the tests. You'll need to modify the scripts to use the correct BDFs for your system.
> +- Alternatively, you can run the tests manually using commands like:

I think you should drop the scripts. If the user has to manually change half of each script to make it work, it's probably better
to just give an example or two and let them write the script(s) themselves.

> +
> +```bash
> +aer-inject -s ${bdf} examples/correctable.internal
> +```
> +
> +and
> +
> +```bash
> +aer-inject -s ${bdf} examples/fatal.internal
> +```
> +
> +*Ensure you replace `${bdf}` with the appropriate PCI BDF for your device.*

This should probably be called SBDF instead of BDF. Most systems don't use the S (segment) part, but I'm sure some exist that do.
It may be helpful to also mention how to find the SBDF (i.e. an example 'lspci' command), but I think anyone who needs that
probably shouldn't be running these scripts...
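
For reference, a minimal sketch of what such a hint could look like. `lspci -D` always prints the segment (domain);
the grep pattern and both function names are only assumptions for illustration, and `to_sbdf` simply assumes
segment 0000 when none is given:

```shell
# List devices with the segment always shown, e.g. "0000:0e:00.0 ...".
# Assumption: the device description mentions CXL on this system.
list_cxl_sbdfs() {
	lspci -D | grep -i cxl
}

# Hypothetical helper: turn a bare BDF into an SBDF by assuming segment 0000.
to_sbdf() {
	case "$1" in
	*:*:*) printf '%s\n' "$1" ;;       # already has a segment
	*)     printf '0000:%s\n' "$1" ;;  # bare BDF, assume segment 0
	esac
}
```

With that, `to_sbdf 0e:00.0` would give the form `0000:0e:00.0` to pass to `-s`.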

> +
> +---
> +
> diff --git a/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
> new file mode 100755
> index 0000000..c0e3417
> --- /dev/null
> +++ b/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
> @@ -0,0 +1,4 @@
> +#!/bin/bash
> +bdf="0e:00.0"
> +
> +aer-inject -s ${bdf} examples/correctable.internal
> diff --git a/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
> new file mode 100755
> index 0000000..e238f63
> --- /dev/null
> +++ b/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
> @@ -0,0 +1,4 @@
> +#!/bin/bash
> +bdf="0e:00.0"
> +
> +aer-inject -s ${bdf} examples/fatal.internal
> diff --git a/test/contrib/cxl-aer-einj/scripts/enable-trace.sh b/test/contrib/cxl-aer-einj/scripts/enable-trace.sh
> new file mode 100755
> index 0000000..753419f
> --- /dev/null
> +++ b/test/contrib/cxl-aer-einj/scripts/enable-trace.sh
> @@ -0,0 +1,5 @@
> +#!/bin/bash
> +
> +echo 1 >  /sys/kernel/debug/tracing/events/cxl/enable
> +echo 1 > /sys/kernel/debug/tracing/events/cxl/cxl_aer_correctable_error/enable
> +echo 1 > /sys/kernel/debug/tracing/events/cxl/cxl_aer_uncorrectable_error/enable
> diff --git a/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
> new file mode 100755
> index 0000000..3077c3c
> --- /dev/null
> +++ b/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
> @@ -0,0 +1,4 @@
> +#!/bin/bash
> +bdf="0f:00.0"
> +
> +aer-inject -s ${bdf} examples/correctable.internal

If you don't want to remove the scripts as per my suggestion above, I would at least consolidate them based on the examples/* file
they use and make the user give the bdf as an argument. For example: ep-uce-inject.sh and root-uce-inject.sh would get turned into
something like "uce-inject.sh".
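
Something along these lines is what I have in mind. This is a rough sketch only: the `inject` function name and the
`AER_INJECT` override (handy for a dry run) are hypothetical, while the two examples/* files are the ones the
posted scripts already use:

```shell
# Sketch of a consolidated injector: error kind and SBDF are arguments.
# AER_INJECT is a hypothetical override for the binary, e.g. AER_INJECT=echo.
inject() {
	local kind="$1" sbdf="$2" file

	case "$kind" in
	ce)  file="examples/correctable.internal" ;;
	uce) file="examples/fatal.internal" ;;
	*)   echo "usage: inject {ce|uce} <sbdf>" >&2; return 1 ;;
	esac

	# Refuse to run without a target device.
	[ -n "$sbdf" ] || { echo "usage: inject {ce|uce} <sbdf>" >&2; return 1; }

	"${AER_INJECT:-aer-inject}" -s "$sbdf" "$file"
}
```

A script wrapping this would end with `inject "$@"`, so `uce-inject.sh`-style usage becomes e.g. `inject uce 0000:0f:00.0`.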


* Re: [ndctl PATCH 0/3] Enable CXL protocol testing
  2026-04-08 20:32 [ndctl PATCH 0/3] Enable CXL protocol testing Terry Bowman
                   ` (2 preceding siblings ...)
  2026-04-08 20:32 ` [ndctl PATCH 3/3] test/cxl: Force RAS status in cxl_handle_cor_ras() and cxl_handle_ras() Terry Bowman
@ 2026-04-08 21:39 ` Cheatham, Benjamin
  2026-04-09 17:05   ` Dave Jiang
  3 siblings, 1 reply; 8+ messages in thread
From: Cheatham, Benjamin @ 2026-04-08 21:39 UTC (permalink / raw)
  To: Terry Bowman, dave, jonathan.cameron, dave.jiang,
	alison.schofield, dan.j.williams, shiju.jose, ming.li,
	Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
	PradeepVineshReddy.Kodamati, lukas, sathyanarayanan.kuppuswamy,
	nvdimm, alucerop, ira.weiny
  Cc: linux-cxl

On 4/8/2026 3:32 PM, Terry Bowman wrote:
> Current CXL error injection (EINJ) only supports Root Port protocol error
> injection but a method to test all CXL devices is needed. This series
> outlines methods to update both the kernel and the 'aer-inject' tool-without
> relying on EINJ-to enable CXL RAS protocol error handling across all CXL
> devices.
> 

This functionality should probably be added to the inject-protocol-error subcommand
instead of spread out across the directory as a bunch of scripts + patches. The command
is only set up for protocol error injection, but I don't think it would be *too* hard
to extend.

I think the first thing you have to do is expand the accepted device types to include ports and
memdevs instead of just dports. That should be simple enough, there are already helpers to find
both based on sbdf, name, etc. Then, you'd need to change the interface used to inject the error
based on device type (is it a root port? then use EINJ, otherwise use aer-inject). All that's
left at that point is to actually call the aer-inject command with the correct options (and
update the documentation/help messages).
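
To make the dispatch concrete, here is a rough sketch in shell (the real change would live in the cxl CLI itself).
The `pick_backend` name and the device type strings are made up for illustration and are not existing interfaces:

```shell
# Hypothetical backend selection: root ports keep using EINJ,
# every other CXL device type goes through aer-inject.
pick_backend() {
	case "$1" in
	root-port)         echo einj ;;
	dport|port|memdev) echo aer-inject ;;
	*) echo "unknown device type: $1" >&2; return 1 ;;
	esac
}
```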

I would be happy to help with any of the above if you agree with the direction, just let me
know!

Thanks,
Ben


* Re: [ndctl PATCH 0/3] Enable CXL protocol testing
  2026-04-08 21:39 ` [ndctl PATCH 0/3] Enable CXL protocol testing Cheatham, Benjamin
@ 2026-04-09 17:05   ` Dave Jiang
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Jiang @ 2026-04-09 17:05 UTC (permalink / raw)
  To: Cheatham, Benjamin, Terry Bowman, dave, jonathan.cameron,
	alison.schofield, dan.j.williams, shiju.jose, ming.li,
	Smita.KoralahalliChannabasappa, rrichter, dan.carpenter,
	PradeepVineshReddy.Kodamati, lukas, sathyanarayanan.kuppuswamy,
	nvdimm, alucerop, ira.weiny
  Cc: linux-cxl



On 4/8/26 2:39 PM, Cheatham, Benjamin wrote:
> On 4/8/2026 3:32 PM, Terry Bowman wrote:
>> Current CXL error injection (EINJ) only supports Root Port protocol error
>> injection but a method to test all CXL devices is needed. This series
>> outlines methods to update both the kernel and the 'aer-inject' tool-without
>> relying on EINJ-to enable CXL RAS protocol error handling across all CXL
>> devices.
>>
> 
> This functionality should probably be added to the inject-protocol-error subcommand
> instead of spread out across the directory as a bunch of scripts + patches. The command
> is only set up for protocol error injection, but I don't think it would be *too* hard
> to extend.
> 
> I think the first thing you have to do is expand the accepted device types to include ports and
> memdevs instead of just dports. That should be simple enough, there are already helpers to find
> both based on sbdf, name, etc. Then, you'd need to change the interface used to inject the error
> based on device type (is it a root port? then use EINJ, otherwise use aer-inject). All that's
> left at that point is to actually call the aer-inject command with the correct options (and
> update the documentation/help messages).
> 
> I would be happy to help with any of the above if you agree with the direction, just let me
> know!

Thanks for the review, Ben! Initially Dan and I asked Terry to just provide his test scripts in the CXL CLI test/contrib directory as a placeholder until we can figure out how to integrate them. But if you have better ideas for improving the usability, help is much appreciated!


> 
> Thanks,
> Ben



end of thread, other threads:[~2026-04-09 17:05 UTC | newest]

Thread overview: 8+ messages
2026-04-08 20:32 [ndctl PATCH 0/3] Enable CXL protocol testing Terry Bowman
2026-04-08 20:32 ` [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject Terry Bowman
2026-04-08 21:39   ` Cheatham, Benjamin
2026-04-08 20:32 ` [ndctl PATCH 2/3] test/aer-inject: Add aer-inject correctable and uncorrectable interanl error support Terry Bowman
2026-04-08 20:32 ` [ndctl PATCH 3/3] test/cxl: Force RAS status in cxl_handle_cor_ras() and cxl_handle_ras() Terry Bowman
2026-04-08 21:39   ` Cheatham, Benjamin
2026-04-08 21:39 ` [ndctl PATCH 0/3] Enable CXL protocol testing Cheatham, Benjamin
2026-04-09 17:05   ` Dave Jiang
