* [PATCH v2 1/5] cxl/features: Reject feature offset that overflows 16-bit field
2026-07-02 9:08 [PATCH v2 0/5] cxl: Sashiko bug fixes Richard Cheng
@ 2026-07-02 9:08 ` Richard Cheng
2026-07-02 9:08 ` [PATCH v2 2/5] cxl/region: Scan all partitions for unmapped poison Richard Cheng
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Richard Cheng @ 2026-07-02 9:08 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng
cxl_get_feature() and cxl_set_feature() build the mailbox command's
offset as cpu_to_le16(offset + data_rcvd_size/data_sent_size), but never
check the sum fits in the 16-bit field. Via fwctl, a user-supplied
offset plus count/op_size summing over 65535 silently wraps, steering
the device to the wrong feature offset.
Fixes: 5e5ac21f629d ("cxl/mbox: Add GET_FEATURE mailbox command")
Fixes: 14d502cc2718 ("cxl/mbox: Add SET_FEATURE mailbox command")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
Changelog:
v1->v2:
- Refactor the guard to "size > U16_MAX - offset". With the sum form, the
addition is performed in size_t, so on a 32-bit arch a large
user-supplied size wraps the sum and bypasses the check. The subtraction
form can't misbehave since offset is a u16, making U16_MAX - offset
always well-defined.
---
drivers/cxl/core/features.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
index 85185af46b72..c3d5f88a4e04 100644
--- a/drivers/cxl/core/features.c
+++ b/drivers/cxl/core/features.c
@@ -237,6 +237,9 @@ size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
 	if (!feat_out || !feat_out_size)
 		return 0;
 
+	if (feat_out_size > U16_MAX - offset)
+		return 0;
+
 	size_out = min(feat_out_size, cxl_mbox->payload_size);
 	uuid_copy(&pi.uuid, feat_uuid);
 	pi.selection = selection;
@@ -287,6 +290,9 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox,
 	if (return_code)
 		*return_code = CXL_MBOX_CMD_RC_INPUT;
 
+	if (feat_data_size > U16_MAX - offset)
+		return -EINVAL;
+
 	struct cxl_mbox_set_feat_in *pi __free(kfree) =
 		kzalloc(cxl_mbox->payload_size, GFP_KERNEL);
 	if (!pi)
--
2.43.0
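
For readers following the changelog, here is a minimal userspace sketch of why the subtraction form is the safer guard. It is not the kernel code: size32_t is a hypothetical stand-in for size_t on a 32-bit arch, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for size_t on a 32-bit architecture. */
typedef uint32_t size32_t;

/* What lands in the 16-bit field when the sum is truncated unchecked. */
static uint16_t truncated_offset(uint16_t offset, size32_t size)
{
	return (uint16_t)((size32_t)offset + size);
}

/* Sum form: the addition itself can wrap in 32 bits before the compare. */
static bool rejects_sum_form(uint16_t offset, size32_t size)
{
	size32_t sum = (size32_t)offset + size;

	return sum > UINT16_MAX;
}

/* Subtraction form: UINT16_MAX - offset is always in 0..65535. */
static bool rejects_sub_form(uint16_t offset, size32_t size)
{
	return size > (size32_t)(UINT16_MAX - offset);
}
```

With offset 0x10 and size 0xFFFFFFF0 the 32-bit sum wraps to 0, so the sum form accepts the request while the subtraction form rejects it.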
* [PATCH v2 2/5] cxl/region: Scan all partitions for unmapped poison
2026-07-02 9:08 [PATCH v2 0/5] cxl: Sashiko bug fixes Richard Cheng
2026-07-02 9:08 ` [PATCH v2 1/5] cxl/features: Reject feature offset that overflows 16-bit field Richard Cheng
@ 2026-07-02 9:08 ` Richard Cheng
2026-07-02 9:08 ` [PATCH v2 3/5] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan Richard Cheng
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Richard Cheng @ 2026-07-02 9:08 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng
cxl_get_poison_unmapped() sweeps the unmapped tail of each partition
from ctx->part onward. A fully-mapped partition has no unmapped tail,
which is a normal per-partition state, but the loop handled it with
break, aborting the whole sweep and silently skipping unmapped poison in
all later partitions. Use continue so a fully-mapped partition is
skipped and later partitions are still scanned.
Fixes: be5cbd0840275 ("cxl: Kill enum cxl_decoder_mode")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
Changelog:
v1->v2:
- Tweak commit message
---
drivers/cxl/core/region.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 1e211542b6b6..be246fb09c99 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2931,7 +2931,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
 			offset = res->start;
 		length = res->end - offset + 1;
 		if (!length)
-			break;
+			continue;
 		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
 		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
 			continue;
--
2.43.0
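
The effect of break versus continue can be sketched in userspace as below. This is an illustration only: the array of tail lengths and the helper name are hypothetical, and the counter stands in for the cxl_mem_get_poison() call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How the loop reacts to a partition with no unmapped tail. */
enum on_empty { ON_EMPTY_BREAK, ON_EMPTY_CONTINUE };

/* Count how many partitions would actually be queried for poison. */
static size_t count_scanned(const uint64_t *tail_len, size_t n,
			    enum on_empty policy)
{
	size_t scanned = 0;

	for (size_t i = 0; i < n; i++) {
		if (!tail_len[i]) {
			if (policy == ON_EMPTY_BREAK)
				break;	/* old behavior: abort the sweep */
			continue;	/* new behavior: skip this one only */
		}
		scanned++;	/* stands in for cxl_mem_get_poison() */
	}
	return scanned;
}

/* Hypothetical layout: the middle partition is fully mapped. */
static const uint64_t example_tails[] = { 0x100, 0, 0x200 };
```

With the example layout, the break policy queries one partition and never reaches the third, while the continue policy queries two.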
* [PATCH v2 3/5] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan
2026-07-02 9:08 [PATCH v2 0/5] cxl: Sashiko bug fixes Richard Cheng
2026-07-02 9:08 ` [PATCH v2 1/5] cxl/features: Reject feature offset that overflows 16-bit field Richard Cheng
2026-07-02 9:08 ` [PATCH v2 2/5] cxl/region: Scan all partitions for unmapped poison Richard Cheng
@ 2026-07-02 9:08 ` Richard Cheng
2026-07-02 9:08 ` [PATCH v2 4/5] cxl/region: Start unmapped poison scan at the committed decoder boundary Richard Cheng
2026-07-02 9:08 ` [PATCH v2 5/5] cxl/memdev: Don't overwrite the error from an earlier partition poison query Richard Cheng
4 siblings, 0 replies; 6+ messages in thread
From: Richard Cheng @ 2026-07-02 9:08 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng
cxl_get_poison_unmapped() tolerates the -EFAULT a RAM partition returns
for Get Poison List by skipping that partition, but left rc holding the
error. If the tolerated RAM fault was the last poison query before the
loop ended, the function returned a spurious -EFAULT and the poison-list
read failed even though enumeration succeeded. Reset rc to 0 when
tolerating the fault, matching poison_by_decoder().
Fixes: be5cbd0840275 ("cxl: Kill enum cxl_decoder_mode")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
drivers/cxl/core/region.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index be246fb09c99..52ba8e9e4288 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2933,8 +2933,10 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
 		if (!length)
 			continue;
 		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
-		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
+		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM) {
+			rc = 0;
 			continue;
+		}
 		if (rc)
 			break;
 	}
--
2.43.0
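
A small userspace model of the leak, assuming a made-up struct part_result that carries the per-partition query result and whether the partition is RAM. It only mirrors the control flow of the loop, not the real driver types.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: result of the poison query for one partition. */
struct part_result {
	int rc;
	bool is_ram;
};

/* Mirror of the loop; reset_tolerated selects the fixed behavior. */
static int scan_unmapped(const struct part_result *p, size_t n,
			 bool reset_tolerated)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		rc = p[i].rc;
		if (rc == -EFAULT && p[i].is_ram) {
			if (reset_tolerated)
				rc = 0;	/* do not carry the tolerated error */
			continue;
		}
		if (rc)
			break;
	}
	return rc;
}

/* The tolerated RAM fault is the last query before the loop ends. */
static const struct part_result pmem_then_ram[] = {
	{ .rc = 0, .is_ram = false },
	{ .rc = -EFAULT, .is_ram = true },
};
```

Without the reset the model returns -EFAULT for pmem_then_ram even though nothing actually failed; with the reset it returns 0.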
* [PATCH v2 4/5] cxl/region: Start unmapped poison scan at the committed decoder boundary
2026-07-02 9:08 [PATCH v2 0/5] cxl: Sashiko bug fixes Richard Cheng
` (2 preceding siblings ...)
2026-07-02 9:08 ` [PATCH v2 3/5] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan Richard Cheng
@ 2026-07-02 9:08 ` Richard Cheng
2026-07-02 9:08 ` [PATCH v2 5/5] cxl/memdev: Don't overwrite the error from an earlier partition poison query Richard Cheng
4 siblings, 0 replies; 6+ messages in thread
From: Richard Cheng @ 2026-07-02 9:08 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng
poison_by_decoder() stops at the last committed decoder and records the
handoff in ctx->offset, but cxl_get_poison_unmapped() ignores it and
starts after the highest DPA allocation instead. Allocations exist for
uncommitted decoders too, so their DPA is skipped by both phases and
poison there is never reported. Resume the scan at ctx->offset, and scan
later partitions in full, restoring the pre-rewrite behavior.
Fixes: be5cbd084027 ("cxl: Kill enum cxl_decoder_mode")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
Changelog:
v1->v2:
- Newly added patch (sashiko's report)
---
drivers/cxl/core/region.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 52ba8e9e4288..ba77416055f4 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2910,7 +2910,6 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
 {
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	const struct resource *res;
-	struct resource *p, *last;
 	u64 offset, length;
 	int rc = 0;
 
@@ -2923,10 +2922,8 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
 	 */
 	for (int i = ctx->part; i < cxlds->nr_partitions; i++) {
 		res = &cxlds->part[i].res;
-		for (p = res->child, last = NULL; p; p = p->sibling)
-			last = p;
-		if (last)
-			offset = last->end + 1;
+		if (i == ctx->part)
+			offset = ctx->offset;
 		else
 			offset = res->start;
 		length = res->end - offset + 1;
--
2.43.0
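
The new start-offset selection can be modeled in userspace as follows. The struct names and the example DPA values are hypothetical; only the "resume at ctx->offset in the first partition, scan later partitions in full" rule is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the partition resource and scan context. */
struct part_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

struct scan_ctx {
	int part;		/* partition where the decoder phase stopped */
	uint64_t offset;	/* DPA handed off by the decoder phase */
};

/* First partition resumes at ctx->offset, later ones start at the base. */
static uint64_t unmapped_start(const struct part_range *parts, int i,
			       const struct scan_ctx *ctx)
{
	return i == ctx->part ? ctx->offset : parts[i].start;
}

/* Same length computation as in the loop: end is inclusive. */
static uint64_t unmapped_length(const struct part_range *parts, int i,
				const struct scan_ctx *ctx)
{
	return parts[i].end - unmapped_start(parts, i, ctx) + 1;
}

static const struct part_range example_parts[] = {
	{ .start = 0x000, .end = 0x3ff },
	{ .start = 0x400, .end = 0x7ff },
};
static const struct scan_ctx example_ctx = { .part = 0, .offset = 0x200 };
```

In this example the first partition is scanned from 0x200 for 0x200 bytes and the second in full. A ctx->offset of end + 1 yields a length of 0, which is the fully-mapped case the loop skips.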
* [PATCH v2 5/5] cxl/memdev: Don't overwrite the error from an earlier partition poison query
2026-07-02 9:08 [PATCH v2 0/5] cxl: Sashiko bug fixes Richard Cheng
` (3 preceding siblings ...)
2026-07-02 9:08 ` [PATCH v2 4/5] cxl/region: Start unmapped poison scan at the committed decoder boundary Richard Cheng
@ 2026-07-02 9:08 ` Richard Cheng
4 siblings, 0 replies; 6+ messages in thread
From: Richard Cheng @ 2026-07-02 9:08 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng
cxl_get_poison_by_memdev() queries Get Poison List per partition but
never checks the result inside the loop, so a later partition's success
overwrites an earlier partition's failure and the whole scan reports
success while that partition's poison went unlisted. Before the loop
conversion the PMEM query returned early on error. Stop the loop on any
error not already tolerated as a RAM -EFAULT.
Fixes: be5cbd084027 ("cxl: Kill enum cxl_decoder_mode")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
Changelog:
v1->v2:
- Newly added patch (sashiko-bot's report)
---
drivers/cxl/core/memdev.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 33a3d2e7b13a..8718964b9c5e 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -231,6 +231,8 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
 		 */
 		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
 			rc = 0;
+		if (rc)
+			break;
 	}
 	return rc;
 }
--
2.43.0
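
As a rough userspace illustration of the overwrite, assume a made-up struct part_query holding each partition's query result. The loop below mirrors only the error handling, with stop_on_error selecting the patched behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: result of Get Poison List for one partition. */
struct part_query {
	int rc;
	bool is_ram;
};

static int query_all(const struct part_query *p, size_t n, bool stop_on_error)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		rc = p[i].rc;
		/* A RAM partition's -EFAULT is tolerated in both variants. */
		if (rc == -EFAULT && p[i].is_ram)
			rc = 0;
		if (stop_on_error && rc)
			break;	/* keep the first real error */
	}
	return rc;
}

/* First partition fails, second succeeds. */
static const struct part_query fail_then_ok[] = {
	{ .rc = -EIO, .is_ram = false },
	{ .rc = 0, .is_ram = false },
};

/* Tolerated RAM fault followed by a success. */
static const struct part_query ram_fault_then_ok[] = {
	{ .rc = -EFAULT, .is_ram = true },
	{ .rc = 0, .is_ram = false },
};
```

Without the check the model reports 0 for fail_then_ok because the second result overwrites the first; with it, -EIO is returned. The tolerated RAM fault case returns 0 either way.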