From: Terry Bowman <terry.bowman@amd.com>
To: <dave@stgolabs.net>, <jonathan.cameron@huawei.com>,
<dave.jiang@intel.com>, <alison.schofield@intel.com>,
<dan.j.williams@intel.com>, <shiju.jose@huawei.com>,
<ming.li@zohomail.com>, <Smita.KoralahalliChannabasappa@amd.com>,
<rrichter@amd.com>, <dan.carpenter@linaro.org>,
<PradeepVineshReddy.Kodamati@amd.com>, <lukas@wunner.de>,
<Benjamin.Cheatham@amd.com>,
<sathyanarayanan.kuppuswamy@linux.intel.com>,
<nvdimm@lists.linux.dev>, <alucerop@amd.com>,
<ira.weiny@intel.com>
Cc: <linux-cxl@vger.kernel.org>, <terry.bowman@amd.com>
Subject: [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject
Date: Wed, 8 Apr 2026 15:32:29 -0500
Message-ID: <20260408203231.962206-2-terry.bowman@amd.com>
In-Reply-To: <20260408203231.962206-1-terry.bowman@amd.com>
CXL protocol errors are signaled to the kernel's CXL drivers via PCIe
Advanced Error Reporting (AER) internal error types: Uncorrectable Internal
Errors (UIE) and Correctable Internal Errors (CIE). These errors in turn
trigger RAS handling paths in the CXL core. The `aer-inject` tool cannot
currently generate AER UIE and CIE events, which makes it difficult to
verify kernel handling without real hardware protocol error conditions.
To close this testing gap, this patch introduces tooling and scripts that
allow injected CXL protocol errors to be delivered to the CXL core.
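Internal errors are masked by default, and the kernel's `aer_inject` module checks the target's AER mask registers. It may refuse to inject an error that the device has masked, so it can help to check the mask bits on a port first.

The sketch below assumes pciutils' `setpci` accepts the `ECAP_AER` capability name. It uses the PCIe AER register layout:

- Uncorrectable Error Mask at AER+0x08, where bit 22 is the uncorrectable internal error.
- Correctable Error Mask at AER+0x14, where bit 14 is the correctable internal error.

The function names are hypothetical.

```bash
#!/bin/bash
# Sketch: report whether AER internal errors are masked on a PCIe port.
# Assumed layout (PCIe AER capability):
#   ECAP_AER+0x08 = Uncorrectable Error Mask, bit 22 = internal error (UIE)
#   ECAP_AER+0x14 = Correctable Error Mask,   bit 14 = internal error (CIE)

UIE_BIT=$((1 << 22))
CIE_BIT=$((1 << 14))

# Pure helper: takes the two mask register values as hex strings (as printed
# by setpci, without a 0x prefix) and prints the UIE/CIE mask state.
decode_internal_masks() {
	local unc_mask=$((16#$1)) cor_mask=$((16#$2))
	if (( unc_mask & UIE_BIT )); then echo "UIE: masked"; else echo "UIE: unmasked"; fi
	if (( cor_mask & CIE_BIT )); then echo "CIE: masked"; else echo "CIE: unmasked"; fi
}

# Reads both mask registers from a live device (needs root) and decodes them.
check_internal_masks() {
	local bdf=$1
	decode_internal_masks "$(setpci -s "$bdf" ECAP_AER+0x08.L)" \
		"$(setpci -s "$bdf" ECAP_AER+0x14.L)"
}
```

For example, `check_internal_masks 0c:00.0` reports the state of the root port used in the example scripts. The decoding is kept in a separate pure function so it can be checked without hardware.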
This change adds a new `test/contrib/cxl-aer-einj` directory containing:
- A README with instructions, prerequisites, and caveats for simulating CXL
protocol errors using UIE/CIE AER injection.
- Example `sed` commands that hardcode the cache ECC status bit in the CXL
  RAS handlers, a debug workaround for the zero RAS status seen on injection.
- Scripts to enable CXL tracing and to invoke CE/UCE injections for root
ports, upstream switches, downstream ports, and endpoints.
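Before running the README's `sed` commands against a kernel tree, the correctable-error edit can be previewed on a scratch file.

The sketch below reuses the README's sed program unchanged, assuming GNU sed (one-line `i`/`a` commands and `-i`). The C function it edits is a hypothetical stand-in, not the real `drivers/cxl/core/ras.c` code.

```bash
#!/bin/bash
# Sketch: apply the README's CE sed program to a scratch copy of a
# hypothetical C function and print the result for inspection.
preview_cor_ras_sed() {
	local scratch
	scratch=$(mktemp) || return 1
	cat > "$scratch" <<'EOF'
void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)
{
	void __iomem *addr;
	u32 status;

	addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
	status = readl(addr);
	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
	}
}
EOF
	# Same sed program as the README, pointed at the scratch file.
	sed -i '
/void cxl_handle_cor_ras/,/}/ {
 /status = readl(addr);/ {
 i #define CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC 0x1
 a\ status |= CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC;
 }
}' "$scratch"
	cat "$scratch"
	rm -f "$scratch"
}
```

Running `preview_cor_ras_sed` should show the `#define` inserted just before the `readl()` line and the `status |=` line appended just after it.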
The following patches, which follow in this series, are required to complete support:
- Patch (`0001-aer-inject-Add-internal-error-injection-support.patch`) for
`aer-inject` to support UIE and CIE injection by defining new constants and
updating parser rules.
- Kernel test patch (`0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch`)
  to force a nonzero RAS status in the CXL RAS handlers.
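Preparing `aer-inject` might look roughly like the sketch below. It clones the tool, checks out the base commit named in the README (`81701cb`), applies the internal-error patch with `git am` and builds.

Assumptions:

- git, make, flex and bison are installed.
- The patch path is wherever the series places the file.

`build_aer_inject` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: fetch aer-inject at the README's base commit, apply the
# internal-error injection patch, and build the tool.
AER_INJECT_URL=https://github.com/intel/aer-inject
AER_INJECT_BASE=81701cb

# Usage: build_aer_inject <checkout-dir> <path-to-0001-aer-inject-...patch>
build_aer_inject() {
	local dir=$1 patch
	# Resolve the patch path first: git -C changes directory before reading it.
	patch=$(realpath -- "$2") || return 1
	[ -d "$dir/.git" ] || git clone "$AER_INJECT_URL" "$dir" || return 1
	git -C "$dir" checkout "$AER_INJECT_BASE" || return 1
	git -C "$dir" am "$patch" || return 1
	make -C "$dir"
}
```

The checkout leaves a detached HEAD at the base commit, which keeps the sketch simple. Create a branch instead if you plan to carry the patch forward.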
With internal error injection support in `aer-inject`, developers can trigger
RAS paths reliably in the CXL core and validate their protocol error handling
logic without relying on physical fault conditions.
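A single inject-and-verify step could be wrapped as in the sketch below. It enables the matching CXL AER trace event, clears the trace buffer, injects with the series' `examples/*.internal` files and checks that the event was recorded.

Assumptions:

- The script runs as root from the patched `aer-inject` checkout.
- `TRACE_DIR` defaults to the debugfs tracing path used by `enable-trace.sh`.
- `inject_and_check` is a hypothetical name.

For the UCE case, the kernel may take stronger recovery action after a fatal error, so the trace check might never run.

```bash
#!/bin/bash
# Sketch: inject one CXL protocol error and look for the matching trace event.
TRACE_DIR=${TRACE_DIR:-/sys/kernel/debug/tracing}

# Usage: inject_and_check <bdf> <ce|uce>
inject_and_check() {
	local bdf=$1 kind=$2 example event
	case "$kind" in
	ce)  example=examples/correctable.internal; event=cxl_aer_correctable_error ;;
	uce) example=examples/fatal.internal;       event=cxl_aer_uncorrectable_error ;;
	*)   echo "usage: inject_and_check <bdf> <ce|uce>" >&2; return 2 ;;
	esac

	# Enable the event (as enable-trace.sh does) and clear old trace entries.
	echo 1 > "$TRACE_DIR/events/cxl/$event/enable" || return 1
	: > "$TRACE_DIR/trace"

	aer-inject -s "$bdf" "$example" || return 1
	sleep 1   # give the AER and CXL handlers a moment to run

	if grep -q "$event" "$TRACE_DIR/trace"; then
		echo "$bdf: $event recorded"
	else
		echo "$bdf: no $event in trace" >&2
		return 1
	fi
}
```

For example, `inject_and_check 0f:00.0 ce` mirrors `ep-ce-inject.sh` and adds a pass/fail check on the trace buffer.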
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
test/contrib/cxl-aer-einj/README.md | 80 +++++++++++++++++++
.../cxl-aer-einj/scripts/ds-ce-inject.sh | 4 +
.../cxl-aer-einj/scripts/ds-uce-inject.sh | 4 +
.../cxl-aer-einj/scripts/enable-trace.sh | 5 ++
.../cxl-aer-einj/scripts/ep-ce-inject.sh | 4 +
.../cxl-aer-einj/scripts/ep-uce-inject.sh | 4 +
.../cxl-aer-einj/scripts/root-ce-inject.sh | 4 +
.../cxl-aer-einj/scripts/root-uce-inject.sh | 4 +
.../cxl-aer-einj/scripts/us-ce-inject.sh | 4 +
.../cxl-aer-einj/scripts/us-uce-inject.sh | 4 +
10 files changed, 117 insertions(+)
create mode 100644 test/contrib/cxl-aer-einj/README.md
create mode 100755 test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
create mode 100755 test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
create mode 100755 test/contrib/cxl-aer-einj/scripts/enable-trace.sh
create mode 100755 test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
create mode 100755 test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh
create mode 100755 test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh
create mode 100755 test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh
create mode 100755 test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh
create mode 100755 test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh
diff --git a/test/contrib/cxl-aer-einj/README.md b/test/contrib/cxl-aer-einj/README.md
new file mode 100644
index 0000000..d31b572
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/README.md
@@ -0,0 +1,80 @@
+**Testing CXL Protocol Errors Using AER Injection**
+
+The `aer-inject` tool currently does not support injecting internal errors such as Correctable Internal Errors (CIE) and Uncorrectable Internal Errors (UIE). By default, internal errors are masked according to the PCI specification and are rarely used. However, these internal errors are now used to notify the PCI and CXL subsystems of CXL protocol errors. The attached patches add support for injecting CE and UCE internal errors with `aer-inject`, allowing you to test CXL RAS functionality.
+
+**Important Caveats:**
+- `aer-inject` only injects AER errors; it does not inject CXL RAS-specific errors.
+- As a result, functions like `cxl_handle_ras()` and `cxl_handle_cor_ras()` will detect a status of 0 and exit early, which hampers testing.
+- To work around this, a debug patch (example included) must be applied to hardcode the RAS error status in `cxl_handle_ras()` and `cxl_handle_cor_ras()`. While not ideal, this workaround allows the software paths involved to be tested. See 'Kernel Patch Details' below.
+
+---
+
+### Prerequisites
+- `aer-inject` tool from: https://github.com/intel/aer-inject
+- Kernel configuration options:
+ ```
+ CONFIG_PCIEAER=y
+ CONFIG_PCIEAER_INJECT=y
+ CONFIG_PCIEPORTBUS=y
+ CONFIG_DEBUG_FS=y
+ CONFIG_CXL_PCI=y
+ CONFIG_CXL_RAS=y
+ CONFIG_CXL_PORT=y
+ CONFIG_CXL_BUS=y
+ ```
+
+---
+
+### aer-inject Patch Details
+- The patch adds support for injecting both correctable (CE) and uncorrectable (UCE) internal errors.
+- The patch, `0001-aer-inject-Add-internal-error-injection-support.patch`, is located in `./patches` and applies to the master branch of the `aer-inject` repository (commit `81701cb`).
+- Additionally, you'll need to apply a kernel-side workaround that hardcodes the RAS error status in the relevant handlers, as described in the caveats above.
+
+### Kernel Patch Details
+Below is a patch that forces the CXL RAS status for testing. `sed` scripts that make a similar change are also included.
+
+#### Kernel Patch to set CXL RAS status for testing
+The patch that sets the CXL protocol RAS status is based on v7.0-rc6 (7aaa8047eafd):
+`0001-test-cxl-Force-RAS-status-in-cxl_handle_cor_ras-and-.patch`
+
+#### Scripts to set the kernel's CXL RAS status
+##### 1: Correctable Errors (CE)
+```bash
+sed -i '
+/void cxl_handle_cor_ras/,/}/ {
+ /status = readl(addr);/ {
+ i #define CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC 0x1
+ a\ status |= CXL_RAS_CORRECTABLE_STATUS_CACHE_ECC;
+ }
+}' drivers/cxl/core/ras.c
+```
+##### 2: Uncorrectable Errors (UCE)
+```bash
+sed -i '
+/bool cxl_handle_ras/,/}/ {
+ /status = readl(addr);/ {
+ i #define CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC 0x1
+ a\ status |= CXL_RAS_UNCORRECTABLE_STATUS_CACHE_ECC;
+ }
+}' drivers/cxl/core/ras.c
+```
+---
+
+### Testing Procedure
+- The provided scripts in `scripts/` illustrate how I ran the tests. Update the `bdf` variable in each script to the correct BDF for your system.
+- Alternatively, you can run the tests manually using commands like:
+
+```bash
+aer-inject -s ${bdf} examples/correctable.internal
+```
+
+and
+
+```bash
+aer-inject -s ${bdf} examples/fatal.internal
+```
+
+*Ensure you replace `${bdf}` with the appropriate PCI BDF for your device.*
+
+---
+
diff --git a/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
new file mode 100755
index 0000000..c0e3417
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/ds-ce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0e:00.0"
+
+aer-inject -s ${bdf} examples/correctable.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
new file mode 100755
index 0000000..e238f63
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/ds-uce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0e:00.0"
+
+aer-inject -s ${bdf} examples/fatal.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/enable-trace.sh b/test/contrib/cxl-aer-einj/scripts/enable-trace.sh
new file mode 100755
index 0000000..753419f
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/enable-trace.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+echo 1 > /sys/kernel/debug/tracing/events/cxl/enable
+echo 1 > /sys/kernel/debug/tracing/events/cxl/cxl_aer_correctable_error/enable
+echo 1 > /sys/kernel/debug/tracing/events/cxl/cxl_aer_uncorrectable_error/enable
diff --git a/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
new file mode 100755
index 0000000..3077c3c
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/ep-ce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0f:00.0"
+
+aer-inject -s ${bdf} examples/correctable.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh
new file mode 100755
index 0000000..9dad325
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/ep-uce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0f:00.0"
+
+aer-inject -s ${bdf} examples/fatal.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh
new file mode 100755
index 0000000..768522e
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/root-ce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0c:00.0"
+
+aer-inject -s ${bdf} examples/correctable.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh
new file mode 100755
index 0000000..7238983
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/root-uce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0c:00.0"
+
+aer-inject -s ${bdf} examples/fatal.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh b/test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh
new file mode 100755
index 0000000..12ac104
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/us-ce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0d:00.0"
+
+aer-inject -s ${bdf} examples/correctable.internal
diff --git a/test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh b/test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh
new file mode 100755
index 0000000..bcd130e
--- /dev/null
+++ b/test/contrib/cxl-aer-einj/scripts/us-uce-inject.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+bdf="0d:00.0"
+
+aer-inject -s ${bdf} examples/fatal.internal
--
2.34.1
Thread overview: 8+ messages
2026-04-08 20:32 [ndctl PATCH 0/3] Enable CXL protocol testing Terry Bowman
2026-04-08 20:32 ` Terry Bowman [this message]
2026-04-08 21:39 ` [ndctl PATCH 1/3] test/cxl: Enable CXL protocol error testing using aer-inject Cheatham, Benjamin
2026-04-08 20:32 ` [ndctl PATCH 2/3] test/aer-inject: Add aer-inject correctable and uncorrectable interanl error support Terry Bowman
2026-04-08 20:32 ` [ndctl PATCH 3/3] test/cxl: Force RAS status in cxl_handle_cor_ras() and cxl_handle_ras() Terry Bowman
2026-04-08 21:39 ` Cheatham, Benjamin
2026-04-08 21:39 ` [ndctl PATCH 0/3] Enable CXL protocol testing Cheatham, Benjamin
2026-04-09 17:05 ` Dave Jiang