All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] Support zero-sized HDM decoders
@ 2026-06-23  9:10 Richard Cheng
  2026-06-23  9:10 ` [PATCH v5 1/2] cxl/hdm: Allow zero sized " Richard Cheng
  2026-06-23  9:10 ` [PATCH v5 2/2] tools/testing/cxl: Enable zero sized decoders under hb0 Richard Cheng
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Cheng @ 2026-06-23  9:10 UTC (permalink / raw)
  To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
	iweiny, danwilliams
  Cc: ming.li, terry.bowman, alucerop, linux-cxl, linux-kernel, newtonl,
	kristinc, kaihengf, kobak, mochs, Richard Cheng

CXL r4.0 8.2.4.20.12 ("Committing Decoder Programming") and 14.13.10
("CXL HDM Decoder Zero Size Commit") permit committing an HDM decoder with
size 0. BIOS may commit and lock such decoders so the OS cannot program
regions through them, this is a design choice rather than a spec
requirement.
The kernel rejected these with -ENXIO during port enumeration and aborted
the whole port, so affected systems showed nothing under 'cxl list'.

This series enumerates empty committed decoders into the topology and keeps
them out of region assembly.

Patch 1 stops rejecting empty committed decoders and enumerates them.
Empty decoders are now first-class: they take a real resource instead of
bypassing the reservation path, so port->hdm_end and the in-order DPA
checks stay consistent. discover_region() skips them so autodiscovery does
not build a phantom region for an empty decoder. Poison collection is
fixed for the case where commit_end references a decoder with no DPA
resource.

Patch 2 adds a mock_zero_size_decoders cxl_test module parameter (default
off) that commits empty (zero-size + locked) decoders under host-bridge0,
so the unit tests can exercise the new enumeration and poison paths.

Alison's suggestion about adding tests in ndctl will be sent in a separate
patch.

Tested with the ndctl cxl unit suite (cxl_test) and on real hardware.

Full cxl suite with mock_zero_size_decoders off -- no regressions:
"""
$ sudo env "PATH=$PATH" meson test -C build --suite cxl --num-processes 1 -t 6 --print-errorlogs
 1/14 ndctl:cxl / cxl-topology.sh       OK                2.89s
 2/14 ndctl:cxl / cxl-region-sysfs.sh   OK                2.40s
 3/14 ndctl:cxl / cxl-labels.sh         OK                2.38s
 4/14 ndctl:cxl / cxl-create-region.sh  OK                2.93s
 5/14 ndctl:cxl / cxl-xor-region.sh     OK                2.40s
 6/14 ndctl:cxl / cxl-events.sh         OK                2.20s
 7/14 ndctl:cxl / cxl-sanitize.sh       OK                5.19s
 8/14 ndctl:cxl / cxl-destroy-region.sh OK                2.23s
 9/14 ndctl:cxl / cxl-qos-class.sh      OK                2.28s
10/14 ndctl:cxl / cxl-translate.sh      OK                0.51s
11/14 ndctl:cxl / cxl-elc.sh            OK                2.31s
12/14 ndctl:cxl / cxl-security.sh       SKIP              0.01s   exit status 77
13/14 ndctl:cxl / cxl-features.sh       SKIP              1.04s   exit status 77
14/14 ndctl:cxl / cxl-poison.sh         OK                7.07s

Ok:                 12
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            2
Timeout:            0
"""

The subset Alison asked about:
"""
$ sudo env "PATH=$PATH" meson test -C build --num-processes 1 -t 6 --print-errorlogs cxl-region-sysfs.sh cxl-create-region.sh cxl-xor-region.sh cxl-destroy-region.sh cxl-qos-class.sh cxl-poison.sh
1/6 ndctl:cxl / cxl-region-sysfs.sh   OK                2.39s
2/6 ndctl:cxl / cxl-create-region.sh  OK                2.92s
3/6 ndctl:cxl / cxl-xor-region.sh     OK                2.39s
4/6 ndctl:cxl / cxl-destroy-region.sh OK                2.23s
5/6 ndctl:cxl / cxl-qos-class.sh      OK                2.26s
6/6 ndctl:cxl / cxl-poison.sh         OK                7.02s

Ok:                 6
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            0
Timeout:            0
"""

With mock_zero_size_decoders on, cxl-poison's by-memdev-by-dpa case returns
all 4 records (clean under KASAN).

On a Montage CXL Type 3 device, enumeration is now clean: no -ENXIO, the
existing 128 GiB RAM region stays intact, and the empty decoder appears
with size 0 and locked.

Changelog:

v4 -> v5:
  - Patch 1: rework per Dan Williams -- make the zero-size DPA reservation
    first-class instead of special-casing size 0 and bypassing the
    reservation path; keep port->hdm_end consistent.
  - Patch 1: skip zero-size committed decoders in discover_region() so
    autodiscovery does not fail building a region for an empty decoder (fixes
    a -ENXIO region-attach hit while enumerating a real Type 3 device).
  - Patch 1: poison fixes (sashiko) -- gate per-decoder queries on a valid
    partition and fall back to scanning all partitions when no sized
    decoder was walked; add a len == 0 guard to cxl_mem_get_poison(); add
    the missing return after kfree() in the zero-size release path; tag the
    zero-size marker resource IORESOURCE_MEM so partition resolution
    succeeds.
  - Patch 2: gate the mock behind mock_zero_size_decoders (default off) so
    the shared cxl_test topology stays undisturbed.

Richard Cheng (2):
  cxl/hdm: Allow zero sized HDM decoders
  tools/testing/cxl: Enable zero sized decoders under hb0

 drivers/cxl/core/hdm.c       |  52 ++++++++++++------
 drivers/cxl/core/mbox.c      |   3 ++
 drivers/cxl/core/region.c    |  49 +++++++++++------
 drivers/cxl/cxl.h            |   9 ++++
 drivers/cxl/port.c           |   3 ++
 tools/testing/cxl/test/cxl.c | 100 ++++++++++++++++++++++++++++++-----
 6 files changed, 168 insertions(+), 48 deletions(-)


base-commit: eb3f4b7426cfd2b79d65b7d37155480b32259a11
-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v5 1/2] cxl/hdm: Allow zero sized HDM decoders
  2026-06-23  9:10 [PATCH v5 0/2] Support zero-sized HDM decoders Richard Cheng
@ 2026-06-23  9:10 ` Richard Cheng
  2026-06-23  9:37   ` sashiko-bot
  2026-06-23 20:13   ` Dan Williams (nvidia)
  2026-06-23  9:10 ` [PATCH v5 2/2] tools/testing/cxl: Enable zero sized decoders under hb0 Richard Cheng
  1 sibling, 2 replies; 6+ messages in thread
From: Richard Cheng @ 2026-06-23  9:10 UTC (permalink / raw)
  To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
	iweiny, danwilliams
  Cc: ming.li, terry.bowman, alucerop, linux-cxl, linux-kernel, newtonl,
	kristinc, kaihengf, kobak, mochs, Richard Cheng, Vishal Aslot

CXL r4.0 §8.2.4.20.12 ("Committing Decoder Programming") and §14.13.10
("CXL HDM Decoder Zero Size Commit") permit committing an HDM decoder
with size 0. BIOS may commit and lock such decoders so the OS cannot
program regions through them, this is a design choice rather than a spec
requirement.
The kernel rejected these with -ENXIO during port enumeration and aborted
the whole port, so affected systems showed nothing under 'cxl list'.

Treat empty decoders as first class instead of special casing them, back
them with a kmalloc'd resource, since the resource tree can't represent
an empty range, and keep the skip and hdm_end accounting intact. Guard
the paths an empty decoder can't serve, e.g. region attach, DPA free, and
poison queries.

Suggested-by: Dan Williams <djbw@kernel.org>
Signed-off-by: Vishal Aslot <vaslot@nvidia.com>
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
Changelog
v4->v5:
     - Rework per Dan Williams' suggestion: make zero-size DPA
       reservations first class instead of bypassing reservation.
       cxl_dpa_request_region() returns a kmalloc'd resource for size 0;
       cxl_dpa_release_region() kfrees it on teardown. hdm_end now
       advances and releases for zero-size decoders under cxl_rwsem.dpa,
       and their skip registers are honored.
     - Corrections to the posted sketch: add the missing return after
       kfree() (use-after-free), use IORESOURCE_MEM flags so
       resource_contains() can resolve cxled->part.
     - cxl_mem_get_poison() returns success for zero-length queries
       before issuing a mailbox command.
     - Fully-burned device (all decoders zero-size at DPA 0): cxled->part
       cannot resolve; gate poison_by_decoder() queries on part >= 0
       (the part[-1] OOB read was already reachable for sized decoders
       that fail partition containment) and scan all partitions when the
       walk reaches commit_end with nothing mapped (sashiko report).
     - Drop v4's poison_by_decoder() restructure and the hdm_end
       non-advance approach; superseded by the above.

v3->v4:
     - Drop the port->hdm_end advance for zero-size decoders (sashiko AI
       review). The write was outside cxl_rwsem.dpa, and advancing the
       watermark without a balanced release strands hdm_end and breaks
       LIFO teardown. It is also unnecessary -- zero-size committed slots
       are trailing and locked, so the in-order check in
       __cxl_dpa_reserve() is never reached past them.

v2->v3:
     - Advance port->hdm_end for the committed zero-size decoder so a
       following committed decoder still passes the in-order check in
       __cxl_dpa_reserve() (it was left un-incremented in v2).
     - Fold the poison fix into this patch: commit_end may now reference a
       zero-size decoder with no DPA resource, so poison_by_decoder() falls
       through to run cxl_get_poison_unmapped() and scan the unmapped DPA
       tail.

v1->v2:
    - Add zero-size committed decoders to the topology instead of
      skipping them. Drop v1's -ENOSPC sentinel and the matching
"continue" in devm_cxl_enumerate_decoders(); fall through so
add_hdm_decoder() registers the decoder.
    - Set port->commit_end unconditionally for any committed decoder,
      not only non-zero-size ones, so subsequent decoders satisfy the
out-of-order check.
    - Add an explicit early-return before devm_cxl_dpa_reserve() in the
      endpoint-decoder path. __cxl_dpa reserve() rejects zero-size
decoders
    - Spell out TSP and TCB and cite spec sections in commit message
    - Reorder series, implementation first
---
 drivers/cxl/core/hdm.c    | 52 ++++++++++++++++++++++++++-------------
 drivers/cxl/core/mbox.c   |  3 +++
 drivers/cxl/core/region.c | 49 +++++++++++++++++++++++-------------
 drivers/cxl/cxl.h         |  9 +++++++
 drivers/cxl/port.c        |  3 +++
 5 files changed, 82 insertions(+), 34 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 0c80b76a5f9b..ccbab2e21f06 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -240,6 +240,18 @@ static resource_size_t __adjust_skip(struct cxl_dev_state *cxlds,
 }
 #define release_skip(c, b, l) __adjust_skip((c), (b), (l), NULL)
 
+static void cxl_dpa_release_region(struct resource *parent,
+				   struct resource *res)
+{
+	/* zero sized decoders are not tracked in the resource tree */
+	if (resource_size(res) == 0) {
+		kfree(res);
+		return;
+	}
+
+	__release_region(parent, res->start, resource_size(res));
+}
+
 /*
  * Must be called in a context that synchronizes against this decoder's
  * port ->remove() callback (like an endpoint decoder sysfs attribute)
@@ -256,7 +268,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
 
 	/* save @skip_start, before @res is released */
 	skip_start = res->start - cxled->skip;
-	__release_region(&cxlds->dpa_res, res->start, resource_size(res));
+	cxl_dpa_release_region(&cxlds->dpa_res, res);
 	if (cxled->skip)
 		release_skip(cxlds, skip_start, cxled->skip);
 	cxled->skip = 0;
@@ -336,6 +348,23 @@ static int request_skip(struct cxl_dev_state *cxlds,
 	return -EBUSY;
 }
 
+static struct resource *cxl_dpa_request_region(struct resource *parent,
+					       resource_size_t start,
+					       resource_size_t n,
+					       const char *name)
+{
+	if (!n) {
+		struct resource *res = kmalloc_obj(*res);
+
+		if (!res)
+			return NULL;
+		*res = DEFINE_RES_NAMED(start, 0, name, IORESOURCE_MEM);
+		return res;
+	}
+
+	return __request_region(parent, start, n, name, 0);
+}
+
 static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 			     resource_size_t base, resource_size_t len,
 			     resource_size_t skipped)
@@ -349,12 +378,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 
 	lockdep_assert_held_write(&cxl_rwsem.dpa);
 
-	if (!len) {
-		dev_warn(dev, "decoder%d.%d: empty reservation attempted\n",
-			 port->id, cxled->cxld.id);
-		return -EINVAL;
-	}
-
 	if (cxled->dpa_res) {
 		dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
 			port->id, cxled->cxld.id, cxled->dpa_res);
@@ -378,8 +401,8 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 		if (rc)
 			return rc;
 	}
-	res = __request_region(&cxlds->dpa_res, base, len,
-			       dev_name(&cxled->cxld.dev), 0);
+	res = cxl_dpa_request_region(&cxlds->dpa_res, base, len,
+				     dev_name(&cxled->cxld.dev));
 	if (!res) {
 		dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
 			port->id, cxled->cxld.id);
@@ -402,7 +425,8 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 				break;
 			}
 
-	if (cxled->part < 0)
+	/* Empty decoders may not be contained by a partition boundary */
+	if (cxled->part < 0 && resource_size(res))
 		dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
 			 port->id, cxled->cxld.id, res);
 
@@ -545,7 +569,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
 	struct device *dev = &cxled->cxld.dev;
 
 	guard(rwsem_write)(&cxl_rwsem.dpa);
-	if (!cxled->dpa_res)
+	if (cxled_empty(cxled))
 		return 0;
 	if (cxled->cxld.region) {
 		dev_dbg(dev, "decoder assigned to: %s\n",
@@ -1031,12 +1055,6 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 			return -ENXIO;
 		}
 
-		if (size == 0) {
-			dev_warn(&port->dev,
-				 "decoder%d.%d: Committed with zero size\n",
-				 port->id, cxld->id);
-			return -ENXIO;
-		}
 		port->commit_end = cxld->id;
 	} else {
 		if (cxled) {
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 7c6c5b7450a5..6f4d634b2cfb 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1433,6 +1433,9 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 	int nr_records = 0;
 	int rc;
 
+	if (!len)
+		return 0;
+
 	ACQUIRE(mutex_intr, lock)(&mds->poison.mutex);
 	if ((rc = ACQUIRE_ERR(mutex_intr, &lock)))
 		return rc;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..eeecb588821a 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2090,7 +2090,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		return -ENXIO;
 	}
 
-	if (!cxled->dpa_res) {
+	if (cxled_empty(cxled)) {
 		dev_dbg(&cxlr->dev, "%s:%s: missing DPA allocation.\n",
 			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
 		return -ENXIO;
@@ -2910,28 +2910,35 @@ static int poison_by_decoder(struct device *dev, void *arg)
 	if (!cxled->dpa_res)
 		return rc;
 
-	cxlmd = cxled_to_memdev(cxled);
-	cxlds = cxlmd->cxlds;
-	mode = cxlds->part[cxled->part].mode;
+	/*
+	 * Handle the degenerate case of a device with only empty decoders;
+	 * an empty decoder can still map a non-zero skip range, so advance
+	 * the walk to commit_end either way.
+	 */
+	if (cxled->part >= 0) {
+		cxlmd = cxled_to_memdev(cxled);
+		cxlds = cxlmd->cxlds;
+		mode = cxlds->part[cxled->part].mode;
+
+		if (cxled->skip) {
+			offset = cxled->dpa_res->start - cxled->skip;
+			length = cxled->skip;
+			rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
+			if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
+				rc = 0;
+			if (rc)
+				return rc;
+		}
 
-	if (cxled->skip) {
-		offset = cxled->dpa_res->start - cxled->skip;
-		length = cxled->skip;
-		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
+		offset = cxled->dpa_res->start;
+		length = cxled->dpa_res->end - offset + 1;
+		rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
 		if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
 			rc = 0;
 		if (rc)
 			return rc;
 	}
 
-	offset = cxled->dpa_res->start;
-	length = cxled->dpa_res->end - offset + 1;
-	rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
-	if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
-		rc = 0;
-	if (rc)
-		return rc;
-
 	/* Iterate until commit_end is reached */
 	if (cxled->cxld.id == ctx->port->commit_end) {
 		ctx->offset = cxled->dpa_res->end + 1;
@@ -2953,9 +2960,17 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
 	};
 
 	rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
-	if (rc == 1)
+	if (rc == 1) {
+		/*
+		 * No decoder with a sized DPA reservation was walked
+		 * (every committed decoder is zero-size): scan all
+		 * partitions in full.
+		 */
+		if (ctx.part < 0)
+			ctx.part = 0;
 		rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
 					     &ctx);
+	}
 
 	return rc;
 }
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..5231345ff78e 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -324,6 +324,15 @@ struct cxl_endpoint_decoder {
 	int pos;
 };
 
+/*
+ * Some BIOS use locked empty decoders to preclude HDM decode aliasing
+ * for TSP operation. Use cxled_empty() to handle that common case.
+ */
+static inline bool cxled_empty(struct cxl_endpoint_decoder *cxled)
+{
+	return !cxled->dpa_res || !resource_size(cxled->dpa_res);
+}
+
 /**
  * struct cxl_switch_decoder - Switch specific CXL HDM Decoder
  * @cxld: base cxl_decoder object
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index ada51948d52f..104c4c090167 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -46,6 +46,9 @@ static int discover_region(struct device *dev, void *unused)
 	if (cxled->state != CXL_DECODER_STATE_AUTO)
 		return 0;
 
+	if (cxled_empty(cxled))
+		return 0;
+
 	/*
 	 * Region enumeration is opportunistic, if this add-event fails,
 	 * continue to the next endpoint decoder.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v5 2/2] tools/testing/cxl: Enable zero sized decoders under hb0
  2026-06-23  9:10 [PATCH v5 0/2] Support zero-sized HDM decoders Richard Cheng
  2026-06-23  9:10 ` [PATCH v5 1/2] cxl/hdm: Allow zero sized " Richard Cheng
@ 2026-06-23  9:10 ` Richard Cheng
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Cheng @ 2026-06-23  9:10 UTC (permalink / raw)
  To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
	iweiny, danwilliams
  Cc: ming.li, terry.bowman, alucerop, linux-cxl, linux-kernel, newtonl,
	kristinc, kaihengf, kobak, mochs, Richard Cheng, Vishal Aslot

The kernel now allows committed zero-size HDM decoders so BIOS can lock
empty decoders; cxl_test needs to exercise that path.

Add a mock_zero_size_decoders module parameter (default off). When set,
the special endpoints under host-bridge0 (cxl_mem.0 and cxl_mem.4)
commit decoders 1 and 2 as zero-size + locked above the decoder[0]
auto-region, mirrored on the parent switch and host bridge. The mocks
take a real zero-size DPA reservation, like enumeration of real
hardware, so commit_end lands on a zero-size decoder and the
reservation, poison-by-endpoint, and teardown paths all run.

Signed-off-by: Vishal Aslot <vaslot@nvidia.com>
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
v4->v5:
      - Mirror the v5 core semantics: mock zero-size decoders take a real
        zero-size DPA reservation via devm_cxl_dpa_reserve() so
        dpa_res/hdm_end/cxled->part match real hardware enumeration and
        the poison-by-endpoint walk reaches commit_end.
      - Set cxled->state = CXL_DECODER_STATE_AUTO to match
        init_hdm_decoder()'s fall-through (undoes the v2 switch to
        MANUAL).
      - cxld_registry_restore(): reserve unconditionally for enabled
        endpoint decoders; enabled now implies reserved, sized or not.

v3->v4:
     - No change.

v2->v3:
      - Gate the zero-size + locked decoder injection behind a new
        mock_zero_size_decoders module parameter (default off). v2
        applied it unconditionally on the host-bridge0 auto-region
        endpoints, which the region test suite reuses, regressing 7 of
        17 cxl unit tests; defaulting off leaves the shared topology
        untouched.

v1->v2:
      - Replace second_decoder(), third_decoder() with a single
        match_decoder_by_index() helper, so all lookups share one
        matcher.
      - Use DEFINE_RANGE() for the empty range instead of an open-coded
        struct.
      - Set cxled->state = CXL_DECODER_STATE_MANUAL rather than
        STATE_AUTO.
      - Set CXL_DECODER_F_LOCK on the mock zero-size decoders to model
        the BIOS-burns-slots case.
---
 tools/testing/cxl/test/cxl.c | 100 ++++++++++++++++++++++++++++++-----
 1 file changed, 86 insertions(+), 14 deletions(-)

diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 418669927fb0..33cea0116a71 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -17,6 +17,7 @@
 static int interleave_arithmetic;
 static bool extended_linear_cache;
 static bool fail_autoassemble;
+static bool mock_zero_size_decoders;
 
 #define FAKE_QTG_ID	42
 
@@ -841,14 +842,13 @@ static int cxld_registry_restore(struct cxl_decoder *cxld,
 		cxld_copy(cxld, &td->cxled.cxld);
 		cxled->state = td->cxled.state;
 		cxled->skip = td->cxled.skip;
-		if (range_len(&td->dpa_range)) {
-			rc = devm_cxl_dpa_reserve(cxled, td->dpa_range.start,
-						  range_len(&td->dpa_range),
-						  td->cxled.skip);
-			if (rc) {
-				init_disabled_mock_decoder(cxld);
-				return rc;
-			}
+		/* enabled endpoint decoders hold a reservation, sized or not */
+		rc = devm_cxl_dpa_reserve(cxled, td->dpa_range.start,
+					  range_len(&td->dpa_range),
+					  td->cxled.skip);
+		if (rc) {
+			init_disabled_mock_decoder(cxld);
+			return rc;
 		}
 		port->commit_end = cxld->id;
 	}
@@ -1041,16 +1041,49 @@ static void default_mock_decoder(struct cxl_decoder *cxld)
 	WARN_ON_ONCE(!cxld_registry_new(cxld));
 }
 
-static int first_decoder(struct device *dev, const void *data)
+static int match_decoder_by_index(struct device *dev, const void *data)
 {
+	int target_id = *(const int *)data;
 	struct cxl_decoder *cxld;
 
 	if (!is_switch_decoder(dev))
 		return 0;
 	cxld = to_cxl_decoder(dev);
-	if (cxld->id == 0)
-		return 1;
-	return 0;
+	return cxld->id == target_id;
+}
+
+/*
+ * Mock a committed, locked, empty decoder
+ * (CXL r4.0 8.2.4.20.12). Gated by the mock_zero_size_decoders module
+ * param so the default cxl_test topology, shared by the region test
+ * suite, is left undisturbed.
+ */
+static void size_zero_mock_decoder_ep(struct cxl_decoder *cxld, u64 base)
+{
+	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
+
+	cxld->hpa_range = DEFINE_RANGE(base, base - 1);
+	cxld->interleave_ways = 2;
+	cxld->interleave_granularity = 4096;
+	cxld->target_type = CXL_DECODER_HOSTONLYMEM;
+	cxld->flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
+	cxled->state = CXL_DECODER_STATE_AUTO;
+	/* decoder[0] reserved [0, size/2); empty decoders sit at that watermark */
+	devm_cxl_dpa_reserve(cxled, mock_auto_region_size / 2, 0, 0);
+	cxld->commit = mock_decoder_commit;
+	cxld->reset = mock_decoder_reset;
+}
+
+static void size_zero_mock_decoder_sw(struct cxl_decoder *cxld, u64 base,
+				      int level)
+{
+	cxld->flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
+	cxld->target_type = CXL_DECODER_HOSTONLYMEM;
+	cxld->interleave_ways = level == 0 ? 2 : 1;
+	cxld->interleave_granularity = 4096;
+	cxld->hpa_range = DEFINE_RANGE(base, base - 1);
+	cxld->commit = mock_decoder_commit;
+	cxld->reset = mock_decoder_reset;
 }
 
 /*
@@ -1131,7 +1164,7 @@ static bool mock_init_hdm_decoder(struct cxl_decoder *cxld)
 	 * See 'cxl list -BMPu -m cxl_mem.0,cxl_mem.4'
 	 */
 	if (!is_endpoint_decoder(&cxld->dev) || !hb0 || pdev->id % 4 ||
-	    pdev->id > 4 || cxld->id > 0) {
+	    pdev->id > 4 || cxld->id > (mock_zero_size_decoders ? 2 : 0)) {
 		default_mock_decoder(cxld);
 		return false;
 	}
@@ -1145,6 +1178,20 @@ static bool mock_init_hdm_decoder(struct cxl_decoder *cxld)
 	base = window->base_hpa;
 	if (extended_linear_cache)
 		base += mock_auto_region_size;
+
+	/*
+	 * With mock_zero_size_decoders, decoders 1 and 2 of the special
+	 * endpoints mock committed, locked, empty decoders above the
+	 * decoder[0] auto-region (CXL r4.0 8.2.4.20.12). commit_end then
+	 * points at a zero-size decoder, exercising the zero-size
+	 * reservation and poison-by-endpoint code paths.
+	 */
+	if (cxld->id == 1 || cxld->id == 2) {
+		size_zero_mock_decoder_ep(cxld, base);
+		port->commit_end = cxld->id;
+		WARN_ON_ONCE(!cxld_registry_new(cxld));
+		return false;
+	}
 	cxld->hpa_range = (struct range) {
 		.start = base,
 		.end = base + mock_auto_region_size - 1,
@@ -1168,9 +1215,11 @@ static bool mock_init_hdm_decoder(struct cxl_decoder *cxld)
 	 */
 	iter = port;
 	for (i = 0; i < 2; i++) {
+		int id = 0;
+
 		dport = iter->parent_dport;
 		iter = dport->port;
-		dev = device_find_child(&iter->dev, NULL, first_decoder);
+		dev = device_find_child(&iter->dev, &id, match_decoder_by_index);
 		/*
 		 * Ancestor ports are guaranteed to be enumerated before
 		 * @port, and all ports have at least one decoder.
@@ -1214,6 +1263,26 @@ static bool mock_init_hdm_decoder(struct cxl_decoder *cxld)
 
 		cxld_registry_update(cxld);
 		put_device(dev);
+
+		if (!mock_zero_size_decoders)
+			continue;
+
+		/*
+		 * Mirror the endpoint: commit the next two switch decoders
+		 * as zero-size + locked so the empty-decoder layout extends
+		 * end-to-end through the switch and host bridge.
+		 */
+		for (id = 1; id <= 2; id++) {
+			dev = device_find_child(&iter->dev, &id,
+						match_decoder_by_index);
+			if (WARN_ON(!dev))
+				continue;
+			cxld = to_cxl_decoder(dev);
+			size_zero_mock_decoder_sw(cxld, base, i);
+			iter->commit_end = id;
+			cxld_registry_update(cxld);
+			put_device(dev);
+		}
 	}
 
 	return false;
@@ -2049,6 +2118,9 @@ module_param(extended_linear_cache, bool, 0444);
 MODULE_PARM_DESC(extended_linear_cache, "Enable extended linear cache support");
 module_param(fail_autoassemble, bool, 0444);
 MODULE_PARM_DESC(fail_autoassemble, "Simulate missing member of an auto-region");
+module_param(mock_zero_size_decoders, bool, 0444);
+MODULE_PARM_DESC(mock_zero_size_decoders,
+		 "Mock committed, locked, empty decoders under host-bridge0");
 module_init(cxl_test_init);
 module_exit(cxl_test_exit);
 MODULE_LICENSE("GPL v2");
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v5 1/2] cxl/hdm: Allow zero sized HDM decoders
  2026-06-23  9:10 ` [PATCH v5 1/2] cxl/hdm: Allow zero sized " Richard Cheng
@ 2026-06-23  9:37   ` sashiko-bot
  2026-06-23 19:55     ` Dan Williams (nvidia)
  2026-06-23 20:13   ` Dan Williams (nvidia)
  1 sibling, 1 reply; 6+ messages in thread
From: sashiko-bot @ 2026-06-23  9:37 UTC (permalink / raw)
  To: Richard Cheng; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] cxl_dpa_free() incorrectly bypasses teardown for zero-sized decoders, permanently locking hdm_end and breaking the LIFO DPA teardown sequence.

Pre-existing issues:
- [High] Pre-existing bug in cxl_get_poison_unmapped: prematurely breaking out of the partition loop skips poison scanning for unmapped partitions.
--

commit 66fbe5db5d8f3f27534b45af8c4ef71b88ab7f3e
Author: Richard Cheng <icheng@nvidia.com>

cxl/hdm: Allow zero sized HDM decoders

Treat empty decoders as first class instead of special casing them, back
them with a kmalloc'd resource, since the resource tree can't represent
an empty range, and keep the skip and hdm_end accounting intact. Guard
the paths an empty decoder can't serve, e.g. region attach, DPA free, and
poison queries.

> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 0c80b76a5f9b4..ccbab2e21f060 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
[ ... ]
> @@ -545,7 +569,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>  	struct device *dev = &cxled->cxld.dev;
>  
>  	guard(rwsem_write)(&cxl_rwsem.dpa);
> -	if (!cxled->dpa_res)
> +	if (cxled_empty(cxled))
>  		return 0;

[Severity: High]
Does this early return prevent decrementing port->hdm_end for zero-sized
decoders?

If an unlocked zero-sized decoder is freed by userspace (for example, by
writing 0 to dpa_size in sysfs), cxled_empty() returns true because
resource_size(dpa_res) is 0. The function then returns 0 early, skipping
devm_cxl_dpa_release().

Because hdm_end is decremented only inside __cxl_dpa_release(), bypassing it
leaves port->hdm_end permanently elevated. Would subsequent attempts to free
lower-indexed decoders fail the cxled->cxld.id != port->hdm_end check with
-EBUSY, preventing further dynamic DPA reconfiguration?

>  	if (cxled->cxld.region) {
>  		dev_dbg(dev, "decoder assigned to: %s\n",
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e82..eeecb588821a4 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
[ ... ]
> @@ -2953,9 +2960,17 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
>  	};
>  
>  	rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
> -	if (rc == 1)
> +	if (rc == 1) {
> +		/*
> +		 * No decoder with a sized DPA reservation was walked
> +		 * (every committed decoder is zero-size): scan all
> +		 * partitions in full.
> +		 */
> +		if (ctx.part < 0)
> +			ctx.part = 0;
>  		rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
>  					     &ctx);

[Severity: High]
This is a pre-existing issue, but does cxl_get_poison_unmapped() prematurely
break out of its partition loop when encountering a fully mapped partition?

Looking at cxl_get_poison_unmapped() in drivers/cxl/core/region.c:

for (int i = ctx->part; i < cxlds->nr_partitions; i++) {
        ...
        length = res->end - offset + 1;
        if (!length)
                break;
        rc = cxl_mem_get_poison(...);

If it processes a partition that is fully mapped, length becomes 0, and the
break statement completely terminates the outer loop over the remaining
partitions instead of moving to the next one. Should this be a continue
instead of a break so that poison in subsequent unmapped partitions isn't
silently ignored?

> +	}
>  
>  	return rc;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623091019.33417-1-icheng@nvidia.com?part=1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v5 1/2] cxl/hdm: Allow zero sized HDM decoders
  2026-06-23  9:37   ` sashiko-bot
@ 2026-06-23 19:55     ` Dan Williams (nvidia)
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Williams (nvidia) @ 2026-06-23 19:55 UTC (permalink / raw)
  To: sashiko-bot, Richard Cheng; +Cc: linux-cxl

sashiko-bot@ wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> 
> New issues:
> - [High] cxl_dpa_free() incorrectly bypasses teardown for zero-sized decoders, permanently locking hdm_end and breaking the LIFO DPA teardown sequence.
> 
> Pre-existing issues:
> - [High] Pre-existing bug in cxl_get_poison_unmapped: prematurely breaking out of the partition loop skips poison scanning for unmapped partitions.
> --
> 
> commit 66fbe5db5d8f3f27534b45af8c4ef71b88ab7f3e
> Author: Richard Cheng <icheng@nvidia.com>
> 
> cxl/hdm: Allow zero sized HDM decoders
> 
> Treat empty decoders as first class instead of special casing them, back
> them with a kmalloc'd resource, since the resource tree can't represent
> an empty range, and keep the skip and hdm_end accounting intact. Guard
> the paths an empty decoder can't serve, e.g. region attach, DPA free, and
> poison queries.
> 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 0c80b76a5f9b4..ccbab2e21f060 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> [ ... ]
> > @@ -545,7 +569,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> >  	struct device *dev = &cxled->cxld.dev;
> >  
> >  	guard(rwsem_write)(&cxl_rwsem.dpa);
> > -	if (!cxled->dpa_res)
> > +	if (cxled_empty(cxled))
> >  		return 0;
> 
> [Severity: High]
> Does this early return prevent decrementing port->hdm_end for zero-sized
> decoders?
> 
> If an unlocked zero-sized decoder is freed by userspace (for example, by
> writing 0 to dpa_size in sysfs), cxled_empty() returns true because
> resource_size(dpa_res) is 0. The function then returns 0 early, skipping
> devm_cxl_dpa_release().
> 
> Because hdm_end is decremented only inside __cxl_dpa_release(), bypassing it
> leaves port->hdm_end permanently elevated. Would subsequent attempts to free
> lower-indexed decoders fail the cxled->cxld.id != port->hdm_end check with
> -EBUSY, preventing further dynamic DPA reconfiguration?

This is blocked by the fact that there is no path to clear
CXL_DECODER_F_ENABLE due to empty decoders not participating in any
region. The permanent nature of empty decoders is ok for now.

Might be worth a comment to note that emtpy decoders are not manageable
by userspace.

> >  	if (cxled->cxld.region) {
> >  		dev_dbg(dev, "decoder assigned to: %s\n",
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index e50dc716d4e82..eeecb588821a4 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> [ ... ]
> > @@ -2953,9 +2960,17 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
> >  	};
> >  
> >  	rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
> > -	if (rc == 1)
> > +	if (rc == 1) {
> > +		/*
> > +		 * No decoder with a sized DPA reservation was walked
> > +		 * (every committed decoder is zero-size): scan all
> > +		 * partitions in full.
> > +		 */
> > +		if (ctx.part < 0)
> > +			ctx.part = 0;
> >  		rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
> >  					     &ctx);
> 
> [Severity: High]
> This is a pre-existing issue, but does cxl_get_poison_unmapped() prematurely
> break out of its partition loop when encountering a fully mapped partition?
> 
> Looking at cxl_get_poison_unmapped() in drivers/cxl/core/region.c:
> 
> for (int i = ctx->part; i < cxlds->nr_partitions; i++) {
>         ...
>         length = res->end - offset + 1;
>         if (!length)
>                 break;
>         rc = cxl_mem_get_poison(...);
> 
> If it processes a partition that is fully mapped, length becomes 0, and the
> break statement completely terminates the outer loop over the remaining
> partitions instead of moving to the next one. Should this be a continue
> instead of a break so that poison in subsequent unmapped partitions isn't
> silently ignored?

This looks accurate. On entry for a fully mapped partition ctx->part
will be initialized from the decoder for that partition. If there are
more unmapped partitions they will be skipped.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v5 1/2] cxl/hdm: Allow zero sized HDM decoders
  2026-06-23  9:10 ` [PATCH v5 1/2] cxl/hdm: Allow zero sized " Richard Cheng
  2026-06-23  9:37   ` sashiko-bot
@ 2026-06-23 20:13   ` Dan Williams (nvidia)
  1 sibling, 0 replies; 6+ messages in thread
From: Dan Williams (nvidia) @ 2026-06-23 20:13 UTC (permalink / raw)
  To: Richard Cheng, dave, jic23, dave.jiang, alison.schofield,
	vishal.l.verma, djbw, iweiny, danwilliams
  Cc: ming.li, terry.bowman, alucerop, linux-cxl, linux-kernel, newtonl,
	kristinc, kaihengf, kobak, mochs, Richard Cheng, Vishal Aslot

Richard Cheng wrote:
> CXL r4.0 §8.2.4.20.12 ("Committing Decoder Programming") and §14.13.10
> ("CXL HDM Decoder Zero Size Commit") permit committing an HDM decoder
> with size 0. BIOS may commit and lock such decoders so the OS cannot
> program regions through them, this is a design choice rather than a spec
> requirement.
> The kernel rejected these with -ENXIO during port enumeration and aborted
> the whole port, so affected systems showed nothing under 'cxl list'.
> 
> Treat empty decoders as first class instead of special casing them, back
> them with a kmalloc'd resource, since the resource tree can't represent
> an empty range, and keep the skip and hdm_end accounting intact. Guard
> the paths an empty decoder can't serve, e.g. region attach, DPA free, and
> poison queries.
> 
> Suggested-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Vishal Aslot <vaslot@nvidia.com>
> Signed-off-by: Richard Cheng <icheng@nvidia.com>

Looks good, you added the cxled_empty() helper and dropped the dev_dbg()
announcing the arrival of committed empty decoders like we chatted
about.

The comment on cxled_empty() could use come adjustment, but maybe Dave
can fix that up on applying.

Reviewed-by: Dan Williams <djbw@kernel.org>

[..]
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1297594beaec..5231345ff78e 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -324,6 +324,15 @@ struct cxl_endpoint_decoder {
>  	int pos;
>  };
>  
> +/*
> + * Some BIOS use locked empty decoders to preclude HDM decode aliasing
> + * for TSP operation. Use cxled_empty() to handle that common case.
> + */

Hmm, that is not the "common case".

/*
 * The common case is decoders with no reservation, but also handle
 * decoders with a zero-sized reservation that firmware may install for
 * security lockdown purposes.
 */

> +static inline bool cxled_empty(struct cxl_endpoint_decoder *cxled)
> +{
> +	return !cxled->dpa_res || !resource_size(cxled->dpa_res);
> +}
> +
>  /**
>   * struct cxl_switch_decoder - Switch specific CXL HDM Decoder
>   * @cxld: base cxl_decoder object

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-06-23 20:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-23  9:10 [PATCH v5 0/2] Support zero-sized HDM decoders Richard Cheng
2026-06-23  9:10 ` [PATCH v5 1/2] cxl/hdm: Allow zero sized " Richard Cheng
2026-06-23  9:37   ` sashiko-bot
2026-06-23 19:55     ` Dan Williams (nvidia)
2026-06-23 20:13   ` Dan Williams (nvidia)
2026-06-23  9:10 ` [PATCH v5 2/2] tools/testing/cxl: Enable zero sized decoders under hb0 Richard Cheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.