* [PATCH v5 1/2] cxl/hdm: Allow zero sized HDM decoders
2026-06-23 9:10 [PATCH v5 0/2] Support zero-sized HDM decoders Richard Cheng
@ 2026-06-23 9:10 ` Richard Cheng
2026-06-23 9:37 ` sashiko-bot
2026-06-23 20:13 ` Dan Williams (nvidia)
2026-06-23 9:10 ` [PATCH v5 2/2] tools/testing/cxl: Enable zero sized decoders under hb0 Richard Cheng
1 sibling, 2 replies; 6+ messages in thread
From: Richard Cheng @ 2026-06-23 9:10 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
iweiny, danwilliams
Cc: ming.li, terry.bowman, alucerop, linux-cxl, linux-kernel, newtonl,
kristinc, kaihengf, kobak, mochs, Richard Cheng, Vishal Aslot
CXL r4.0 §8.2.4.20.12 ("Committing Decoder Programming") and §14.13.10
("CXL HDM Decoder Zero Size Commit") permit committing an HDM decoder
with size 0. BIOS may commit and lock such decoders so the OS cannot
program regions through them, this is a design choice rather than a spec
requirement.
The kernel rejected these with -ENXIO during port enumeration and aborted
the whole port, so affected systems showed nothing under 'cxl list'.
Treat empty decoders as first class instead of special casing them, back
them with a kmalloc'd resource, since the resource tree can't represent
an empty range, and keep the skip and hdm_end accounting intact. Guard
the paths an empty decoder can't serve, e.g. region attach, DPA free, and
poison queries.
Suggested-by: Dan Williams <djbw@kernel.org>
Signed-off-by: Vishal Aslot <vaslot@nvidia.com>
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
Changelog
v4->v5:
- Rework per Dan Williams' suggestion: make zero-size DPA
reservations first class instead of bypassing reservation.
cxl_dpa_request_region() returns a kmalloc'd resource for size 0;
cxl_dpa_release_region() kfrees it on teardown. hdm_end now
advances and releases for zero-size decoders under cxl_rwsem.dpa,
and their skip registers are honored.
- Corrections to the posted sketch: add the missing return after
kfree() (use-after-free), use IORESOURCE_MEM flags so
resource_contains() can resolve cxled->part.
- cxl_mem_get_poison() returns success for zero-length queries
before issuing a mailbox command.
- Fully-burned device (all decoders zero-size at DPA 0): cxled->part
cannot resolve; gate poison_by_decoder() queries on part >= 0
(the part[-1] OOB read was already reachable for sized decoders
that fail partition containment) and scan all partitions when the
walk reaches commit_end with nothing mapped (sashiko report).
- Drop v4's poison_by_decoder() restructure and the hdm_end
non-advance approach; superseded by the above.
v3->v4:
- Drop the port->hdm_end advance for zero-size decoders (sashiko AI
review). The write was outside cxl_rwsem.dpa, and advancing the
watermark without a balanced release strands hdm_end and breaks
LIFO teardown. It is also unnecessary -- zero-size committed slots
are trailing and locked, so the in-order check in
__cxl_dpa_reserve() is never reached past them.
v2->v3:
- Advance port->hdm_end for the committed zero-size decoder so a
following committed decoder still passes the in-order check in
__cxl_dpa_reserve() (it was left un-incremented in v2).
- Fold the poison fix into this patch: commit_end may now reference a
zero-size decoder with no DPA resource, so poison_by_decoder() falls
through to run cxl_get_poison_unmapped() and scan the unmapped DPA
tail.
v1->v2:
- Add zero-size committed decoders to the topology instead of
skipping them. Drop v1's -ENOSPC sentinel and the matching
"continue" in devm_cxl_enumerate_decoders(); fall through so
add_hdm_decoder() registers the decoder.
- Set port->commit_end unconditionally for any committed decoder,
not only non-zero-size ones, so subsequent decoders satisfy the
out-of-order check.
- Add an explicit early-return before devm_cxl_dpa_reserve() in the
endpoint-decoder path. __cxl_dpa reserve() rejects zero-size
decoders
- Spell out TSP and TCB and cite spec sections in commit message
- Reorder series, implementation first
---
drivers/cxl/core/hdm.c | 52 ++++++++++++++++++++++++++-------------
drivers/cxl/core/mbox.c | 3 +++
drivers/cxl/core/region.c | 49 +++++++++++++++++++++++-------------
drivers/cxl/cxl.h | 9 +++++++
drivers/cxl/port.c | 3 +++
5 files changed, 82 insertions(+), 34 deletions(-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 0c80b76a5f9b..ccbab2e21f06 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -240,6 +240,18 @@ static resource_size_t __adjust_skip(struct cxl_dev_state *cxlds,
}
#define release_skip(c, b, l) __adjust_skip((c), (b), (l), NULL)
+static void cxl_dpa_release_region(struct resource *parent,
+ struct resource *res)
+{
+ /* zero sized decoders are not tracked in the resource tree */
+ if (resource_size(res) == 0) {
+ kfree(res);
+ return;
+ }
+
+ __release_region(parent, res->start, resource_size(res));
+}
+
/*
* Must be called in a context that synchronizes against this decoder's
* port ->remove() callback (like an endpoint decoder sysfs attribute)
@@ -256,7 +268,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
/* save @skip_start, before @res is released */
skip_start = res->start - cxled->skip;
- __release_region(&cxlds->dpa_res, res->start, resource_size(res));
+ cxl_dpa_release_region(&cxlds->dpa_res, res);
if (cxled->skip)
release_skip(cxlds, skip_start, cxled->skip);
cxled->skip = 0;
@@ -336,6 +348,23 @@ static int request_skip(struct cxl_dev_state *cxlds,
return -EBUSY;
}
+static struct resource *cxl_dpa_request_region(struct resource *parent,
+ resource_size_t start,
+ resource_size_t n,
+ const char *name)
+{
+ if (!n) {
+ struct resource *res = kmalloc_obj(*res);
+
+ if (!res)
+ return NULL;
+ *res = DEFINE_RES_NAMED(start, 0, name, IORESOURCE_MEM);
+ return res;
+ }
+
+ return __request_region(parent, start, n, name, 0);
+}
+
static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped)
@@ -349,12 +378,6 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
lockdep_assert_held_write(&cxl_rwsem.dpa);
- if (!len) {
- dev_warn(dev, "decoder%d.%d: empty reservation attempted\n",
- port->id, cxled->cxld.id);
- return -EINVAL;
- }
-
if (cxled->dpa_res) {
dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n",
port->id, cxled->cxld.id, cxled->dpa_res);
@@ -378,8 +401,8 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
if (rc)
return rc;
}
- res = __request_region(&cxlds->dpa_res, base, len,
- dev_name(&cxled->cxld.dev), 0);
+ res = cxl_dpa_request_region(&cxlds->dpa_res, base, len,
+ dev_name(&cxled->cxld.dev));
if (!res) {
dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n",
port->id, cxled->cxld.id);
@@ -402,7 +425,8 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
break;
}
- if (cxled->part < 0)
+ /* Empty decoders may not be contained by a partition boundary */
+ if (cxled->part < 0 && resource_size(res))
dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
port->id, cxled->cxld.id, res);
@@ -545,7 +569,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
struct device *dev = &cxled->cxld.dev;
guard(rwsem_write)(&cxl_rwsem.dpa);
- if (!cxled->dpa_res)
+ if (cxled_empty(cxled))
return 0;
if (cxled->cxld.region) {
dev_dbg(dev, "decoder assigned to: %s\n",
@@ -1031,12 +1055,6 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
return -ENXIO;
}
- if (size == 0) {
- dev_warn(&port->dev,
- "decoder%d.%d: Committed with zero size\n",
- port->id, cxld->id);
- return -ENXIO;
- }
port->commit_end = cxld->id;
} else {
if (cxled) {
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 7c6c5b7450a5..6f4d634b2cfb 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1433,6 +1433,9 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
int nr_records = 0;
int rc;
+ if (!len)
+ return 0;
+
ACQUIRE(mutex_intr, lock)(&mds->poison.mutex);
if ((rc = ACQUIRE_ERR(mutex_intr, &lock)))
return rc;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..eeecb588821a 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2090,7 +2090,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return -ENXIO;
}
- if (!cxled->dpa_res) {
+ if (cxled_empty(cxled)) {
dev_dbg(&cxlr->dev, "%s:%s: missing DPA allocation.\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
return -ENXIO;
@@ -2910,28 +2910,35 @@ static int poison_by_decoder(struct device *dev, void *arg)
if (!cxled->dpa_res)
return rc;
- cxlmd = cxled_to_memdev(cxled);
- cxlds = cxlmd->cxlds;
- mode = cxlds->part[cxled->part].mode;
+ /*
+ * Handle the degenerate case of a device with only empty decoders;
+ * an empty decoder can still map a non-zero skip range, so advance
+ * the walk to commit_end either way.
+ */
+ if (cxled->part >= 0) {
+ cxlmd = cxled_to_memdev(cxled);
+ cxlds = cxlmd->cxlds;
+ mode = cxlds->part[cxled->part].mode;
+
+ if (cxled->skip) {
+ offset = cxled->dpa_res->start - cxled->skip;
+ length = cxled->skip;
+ rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
+ if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
+ rc = 0;
+ if (rc)
+ return rc;
+ }
- if (cxled->skip) {
- offset = cxled->dpa_res->start - cxled->skip;
- length = cxled->skip;
- rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
+ offset = cxled->dpa_res->start;
+ length = cxled->dpa_res->end - offset + 1;
+ rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
rc = 0;
if (rc)
return rc;
}
- offset = cxled->dpa_res->start;
- length = cxled->dpa_res->end - offset + 1;
- rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
- if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
- rc = 0;
- if (rc)
- return rc;
-
/* Iterate until commit_end is reached */
if (cxled->cxld.id == ctx->port->commit_end) {
ctx->offset = cxled->dpa_res->end + 1;
@@ -2953,9 +2960,17 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
};
rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
- if (rc == 1)
+ if (rc == 1) {
+ /*
+ * No decoder with a sized DPA reservation was walked
+ * (every committed decoder is zero-size): scan all
+ * partitions in full.
+ */
+ if (ctx.part < 0)
+ ctx.part = 0;
rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
&ctx);
+ }
return rc;
}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..5231345ff78e 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -324,6 +324,15 @@ struct cxl_endpoint_decoder {
int pos;
};
+/*
+ * Some BIOS use locked empty decoders to preclude HDM decode aliasing
+ * for TSP operation. Use cxled_empty() to handle that common case.
+ */
+static inline bool cxled_empty(struct cxl_endpoint_decoder *cxled)
+{
+ return !cxled->dpa_res || !resource_size(cxled->dpa_res);
+}
+
/**
* struct cxl_switch_decoder - Switch specific CXL HDM Decoder
* @cxld: base cxl_decoder object
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index ada51948d52f..104c4c090167 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -46,6 +46,9 @@ static int discover_region(struct device *dev, void *unused)
if (cxled->state != CXL_DECODER_STATE_AUTO)
return 0;
+ if (cxled_empty(cxled))
+ return 0;
+
/*
* Region enumeration is opportunistic, if this add-event fails,
* continue to the next endpoint decoder.
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH v5 2/2] tools/testing/cxl: Enable zero sized decoders under hb0
2026-06-23 9:10 [PATCH v5 0/2] Support zero-sized HDM decoders Richard Cheng
2026-06-23 9:10 ` [PATCH v5 1/2] cxl/hdm: Allow zero sized " Richard Cheng
@ 2026-06-23 9:10 ` Richard Cheng
1 sibling, 0 replies; 6+ messages in thread
From: Richard Cheng @ 2026-06-23 9:10 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
iweiny, danwilliams
Cc: ming.li, terry.bowman, alucerop, linux-cxl, linux-kernel, newtonl,
kristinc, kaihengf, kobak, mochs, Richard Cheng, Vishal Aslot
The kernel now allows committed zero-size HDM decoders so BIOS can lock
empty decoders; cxl_test needs to exercise that path.
Add a mock_zero_size_decoders module parameter (default off). When set,
the special endpoints under host-bridge0 (cxl_mem.0 and cxl_mem.4)
commit decoders 1 and 2 as zero-size + locked above the decoder[0]
auto-region, mirrored on the parent switch and host bridge. The mocks
take a real zero-size DPA reservation, like enumeration of real
hardware, so commit_end lands on a zero-size decoder and the
reservation, poison-by-endpoint, and teardown paths all run.
Signed-off-by: Vishal Aslot <vaslot@nvidia.com>
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
v4->v5:
- Mirror the v5 core semantics: mock zero-size decoders take a real
zero-size DPA reservation via devm_cxl_dpa_reserve() so
dpa_res/hdm_end/cxled->part match real hardware enumeration and
the poison-by-endpoint walk reaches commit_end.
- Set cxled->state = CXL_DECODER_STATE_AUTO to match
init_hdm_decoder()'s fall-through (undoes the v2 switch to
MANUAL).
- cxld_registry_restore(): reserve unconditionally for enabled
endpoint decoders; enabled now implies reserved, sized or not.
v3->v4:
- No change.
v2->v3:
- Gate the zero-size + locked decoder injection behind a new
mock_zero_size_decoders module parameter (default off). v2
applied it unconditionally on the host-bridge0 auto-region
endpoints, which the region test suite reuses, regressing 7 of
17 cxl unit tests; defaulting off leaves the shared topology
untouched.
v1->v2:
- Replace second_decoder(), third_decoder() with a single
match_decoder_by_index() helper, so all lookups share one
matcher.
- Use DEFINE_RANGE() for the empty range instead of an open-coded
struct.
- Set cxled->state = CXL_DECODER_STATE_MANUAL rather than
STATE_AUTO.
- Set CXL_DECODER_F_LOCK on the mock zero-size decoders to model
the BIOS-burns-slots case.
---
tools/testing/cxl/test/cxl.c | 100 ++++++++++++++++++++++++++++++-----
1 file changed, 86 insertions(+), 14 deletions(-)
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 418669927fb0..33cea0116a71 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -17,6 +17,7 @@
static int interleave_arithmetic;
static bool extended_linear_cache;
static bool fail_autoassemble;
+static bool mock_zero_size_decoders;
#define FAKE_QTG_ID 42
@@ -841,14 +842,13 @@ static int cxld_registry_restore(struct cxl_decoder *cxld,
cxld_copy(cxld, &td->cxled.cxld);
cxled->state = td->cxled.state;
cxled->skip = td->cxled.skip;
- if (range_len(&td->dpa_range)) {
- rc = devm_cxl_dpa_reserve(cxled, td->dpa_range.start,
- range_len(&td->dpa_range),
- td->cxled.skip);
- if (rc) {
- init_disabled_mock_decoder(cxld);
- return rc;
- }
+ /* enabled endpoint decoders hold a reservation, sized or not */
+ rc = devm_cxl_dpa_reserve(cxled, td->dpa_range.start,
+ range_len(&td->dpa_range),
+ td->cxled.skip);
+ if (rc) {
+ init_disabled_mock_decoder(cxld);
+ return rc;
}
port->commit_end = cxld->id;
}
@@ -1041,16 +1041,49 @@ static void default_mock_decoder(struct cxl_decoder *cxld)
WARN_ON_ONCE(!cxld_registry_new(cxld));
}
-static int first_decoder(struct device *dev, const void *data)
+static int match_decoder_by_index(struct device *dev, const void *data)
{
+ int target_id = *(const int *)data;
struct cxl_decoder *cxld;
if (!is_switch_decoder(dev))
return 0;
cxld = to_cxl_decoder(dev);
- if (cxld->id == 0)
- return 1;
- return 0;
+ return cxld->id == target_id;
+}
+
+/*
+ * Mock a committed, locked, empty decoder
+ * (CXL r4.0 8.2.4.20.12). Gated by the mock_zero_size_decoders module
+ * param so the default cxl_test topology, shared by the region test
+ * suite, is left undisturbed.
+ */
+static void size_zero_mock_decoder_ep(struct cxl_decoder *cxld, u64 base)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
+
+ cxld->hpa_range = DEFINE_RANGE(base, base - 1);
+ cxld->interleave_ways = 2;
+ cxld->interleave_granularity = 4096;
+ cxld->target_type = CXL_DECODER_HOSTONLYMEM;
+ cxld->flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
+ cxled->state = CXL_DECODER_STATE_AUTO;
+ /* decoder[0] reserved [0, size/2); empty decoders sit at that watermark */
+ devm_cxl_dpa_reserve(cxled, mock_auto_region_size / 2, 0, 0);
+ cxld->commit = mock_decoder_commit;
+ cxld->reset = mock_decoder_reset;
+}
+
+static void size_zero_mock_decoder_sw(struct cxl_decoder *cxld, u64 base,
+ int level)
+{
+ cxld->flags = CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
+ cxld->target_type = CXL_DECODER_HOSTONLYMEM;
+ cxld->interleave_ways = level == 0 ? 2 : 1;
+ cxld->interleave_granularity = 4096;
+ cxld->hpa_range = DEFINE_RANGE(base, base - 1);
+ cxld->commit = mock_decoder_commit;
+ cxld->reset = mock_decoder_reset;
}
/*
@@ -1131,7 +1164,7 @@ static bool mock_init_hdm_decoder(struct cxl_decoder *cxld)
* See 'cxl list -BMPu -m cxl_mem.0,cxl_mem.4'
*/
if (!is_endpoint_decoder(&cxld->dev) || !hb0 || pdev->id % 4 ||
- pdev->id > 4 || cxld->id > 0) {
+ pdev->id > 4 || cxld->id > (mock_zero_size_decoders ? 2 : 0)) {
default_mock_decoder(cxld);
return false;
}
@@ -1145,6 +1178,20 @@ static bool mock_init_hdm_decoder(struct cxl_decoder *cxld)
base = window->base_hpa;
if (extended_linear_cache)
base += mock_auto_region_size;
+
+ /*
+ * With mock_zero_size_decoders, decoders 1 and 2 of the special
+ * endpoints mock committed, locked, empty decoders above the
+ * decoder[0] auto-region (CXL r4.0 8.2.4.20.12). commit_end then
+ * points at a zero-size decoder, exercising the zero-size
+ * reservation and poison-by-endpoint code paths.
+ */
+ if (cxld->id == 1 || cxld->id == 2) {
+ size_zero_mock_decoder_ep(cxld, base);
+ port->commit_end = cxld->id;
+ WARN_ON_ONCE(!cxld_registry_new(cxld));
+ return false;
+ }
cxld->hpa_range = (struct range) {
.start = base,
.end = base + mock_auto_region_size - 1,
@@ -1168,9 +1215,11 @@ static bool mock_init_hdm_decoder(struct cxl_decoder *cxld)
*/
iter = port;
for (i = 0; i < 2; i++) {
+ int id = 0;
+
dport = iter->parent_dport;
iter = dport->port;
- dev = device_find_child(&iter->dev, NULL, first_decoder);
+ dev = device_find_child(&iter->dev, &id, match_decoder_by_index);
/*
* Ancestor ports are guaranteed to be enumerated before
* @port, and all ports have at least one decoder.
@@ -1214,6 +1263,26 @@ static bool mock_init_hdm_decoder(struct cxl_decoder *cxld)
cxld_registry_update(cxld);
put_device(dev);
+
+ if (!mock_zero_size_decoders)
+ continue;
+
+ /*
+ * Mirror the endpoint: commit the next two switch decoders
+ * as zero-size + locked so the empty-decoder layout extends
+ * end-to-end through the switch and host bridge.
+ */
+ for (id = 1; id <= 2; id++) {
+ dev = device_find_child(&iter->dev, &id,
+ match_decoder_by_index);
+ if (WARN_ON(!dev))
+ continue;
+ cxld = to_cxl_decoder(dev);
+ size_zero_mock_decoder_sw(cxld, base, i);
+ iter->commit_end = id;
+ cxld_registry_update(cxld);
+ put_device(dev);
+ }
}
return false;
@@ -2049,6 +2118,9 @@ module_param(extended_linear_cache, bool, 0444);
MODULE_PARM_DESC(extended_linear_cache, "Enable extended linear cache support");
module_param(fail_autoassemble, bool, 0444);
MODULE_PARM_DESC(fail_autoassemble, "Simulate missing member of an auto-region");
+module_param(mock_zero_size_decoders, bool, 0444);
+MODULE_PARM_DESC(mock_zero_size_decoders,
+ "Mock committed, locked, empty decoders under host-bridge0");
module_init(cxl_test_init);
module_exit(cxl_test_exit);
MODULE_LICENSE("GPL v2");
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread