* [PATCH 1/3] cxl/features: Reject feature offset that overflows 16-bit field
2026-06-30 7:46 [PATCH 0/3] cxl: Sashiko bug fixes Richard Cheng
@ 2026-06-30 7:46 ` Richard Cheng
2026-06-30 15:54 ` Dave Jiang
2026-06-30 7:46 ` [PATCH 2/3] cxl/region: Scan all partitions for unmapped poison Richard Cheng
2026-06-30 7:46 ` [PATCH 3/3] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan Richard Cheng
2 siblings, 1 reply; 8+ messages in thread
From: Richard Cheng @ 2026-06-30 7:46 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng
cxl_get_feature() and cxl_set_feature() build the mailbox command's
offset as cpu_to_le16(offset + data_rcvd_size/data_sent_size), but never
check the sum fits in the 16-bit field. Via fwctl, a user-supplied
offset plus count/op_size summing over 65535 silently wraps, steering
the device to the wrong feature offset.
Fixes: 5e5ac21f629d ("cxl/mbox: Add GET_FEATURE mailbox command")
Fixes: 14d502cc2718 ("cxl/mbox: Add SET_FEATURE mailbox command")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
drivers/cxl/core/features.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
index 85185af46b72..db5964ea184f 100644
--- a/drivers/cxl/core/features.c
+++ b/drivers/cxl/core/features.c
@@ -237,6 +237,9 @@ size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
 	if (!feat_out || !feat_out_size)
 		return 0;

+	if (offset + feat_out_size > U16_MAX)
+		return 0;
+
 	size_out = min(feat_out_size, cxl_mbox->payload_size);
 	uuid_copy(&pi.uuid, feat_uuid);
 	pi.selection = selection;
@@ -287,6 +290,9 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox,
 	if (return_code)
 		*return_code = CXL_MBOX_CMD_RC_INPUT;

+	if (offset + feat_data_size > U16_MAX)
+		return -EINVAL;
+
 	struct cxl_mbox_set_feat_in *pi __free(kfree) =
 		kzalloc(cxl_mbox->payload_size, GFP_KERNEL);
 	if (!pi)
--
2.43.0
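
For readers following the thread, the wraparound described in the commit message can be reproduced in plain userspace C. This is only a sketch under assumptions: wire_offset() and range_fits_u16() are hypothetical helpers, not kernel functions, and the uint16_t/size_t parameter types are illustrative stand-ins for whatever the real callers pass. The truncating cast models what happens when a too-large sum is stored in a 16-bit field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the value that ends up in the 16-bit offset
 * field: the starting offset plus the bytes already transferred,
 * truncated to 16 bits.
 */
static uint16_t wire_offset(uint16_t offset, size_t done)
{
	return (uint16_t)(offset + done);
}

/*
 * Same shape as the check added by the patch: accept the request only
 * if offset + size still fits in 16 bits. The sum is computed in 64
 * bits so the check itself cannot wrap.
 */
static bool range_fits_u16(uint16_t offset, size_t size)
{
	return (uint64_t)offset + size <= UINT16_MAX;
}
```

With offset 65000 and 1000 bytes already transferred, the true position is 66000, but wire_offset() yields 464 (66000 - 65536), which is the silent misdirection the patch guards against; range_fits_u16(65000, 1000) rejects the request up front.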
* Re: [PATCH 1/3] cxl/features: Reject feature offset that overflows 16-bit field
2026-06-30 7:46 ` [PATCH 1/3] cxl/features: Reject feature offset that overflows 16-bit field Richard Cheng
@ 2026-06-30 15:54 ` Dave Jiang
0 siblings, 0 replies; 8+ messages in thread
From: Dave Jiang @ 2026-06-30 15:54 UTC (permalink / raw)
To: Richard Cheng, dave, jic23, alison.schofield, vishal.l.verma,
djbw, danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak
On 6/30/26 12:46 AM, Richard Cheng wrote:
> cxl_get_feature() and cxl_set_feature() build the mailbox command's
> offset as cpu_to_le16(offset + data_rcvd_size/data_sent_size), but never
> check the sum fits in the 16-bit field. Via fwctl, a user-supplied
> offset plus count/op_size summing over 65535 silently wraps, steering
> the device to the wrong feature offset.
>
> Fixes: 5e5ac21f629d ("cxl/mbox: Add GET_FEATURE mailbox command")
> Fixes: 14d502cc2718 ("cxl/mbox: Add SET_FEATURE mailbox command")
> Signed-off-by: Richard Cheng <icheng@nvidia.com>
> ---
> drivers/cxl/core/features.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c
> index 85185af46b72..db5964ea184f 100644
> --- a/drivers/cxl/core/features.c
> +++ b/drivers/cxl/core/features.c
> @@ -237,6 +237,9 @@ size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
>  	if (!feat_out || !feat_out_size)
>  		return 0;
>
> +	if (offset + feat_out_size > U16_MAX)
> +		return 0;

Should this return -EINVAL?

> +
>  	size_out = min(feat_out_size, cxl_mbox->payload_size);
>  	uuid_copy(&pi.uuid, feat_uuid);
>  	pi.selection = selection;
> @@ -287,6 +290,9 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox,
>  	if (return_code)
>  		*return_code = CXL_MBOX_CMD_RC_INPUT;
>
> +	if (offset + feat_data_size > U16_MAX)
> +		return -EINVAL;
> +
>  	struct cxl_mbox_set_feat_in *pi __free(kfree) =
>  		kzalloc(cxl_mbox->payload_size, GFP_KERNEL);
>  	if (!pi)
* [PATCH 2/3] cxl/region: Scan all partitions for unmapped poison
2026-06-30 7:46 [PATCH 0/3] cxl: Sashiko bug fixes Richard Cheng
2026-06-30 7:46 ` [PATCH 1/3] cxl/features: Reject feature offset that overflows 16-bit field Richard Cheng
@ 2026-06-30 7:46 ` Richard Cheng
2026-06-30 15:56 ` Dave Jiang
2026-07-01 4:48 ` Alison Schofield
2026-06-30 7:46 ` [PATCH 3/3] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan Richard Cheng
2 siblings, 2 replies; 8+ messages in thread
From: Richard Cheng @ 2026-06-30 7:46 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng
cxl_get_poison_unmapped() sweeps the unmapped tail of each partition
from ctx->part onward. A fully-mapped partition has no unmapped tail,
which is a normal per-partition state, but the loop handled it with break,
aborting the whole sweep and silently skipping unmapped poison in all
later partitions. Use continue so a fully-mapped partition is skipped and
later partitions are still scanned.
Fixes: be5cbd0840275 ("cxl: Kill enum cxl_decoder_mode")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
drivers/cxl/core/region.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 1e211542b6b6..be246fb09c99 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2931,7 +2931,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
 			offset = res->start;
 		length = res->end - offset + 1;
 		if (!length)
-			break;
+			continue;
 		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
 		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
 			continue;
--
2.43.0
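
The control-flow difference is easy to see in a reduced model. The sketch below is an assumption-laden illustration, not the driver code: each partition is represented only by the length of its unmapped tail, and scan_with_break() / scan_with_continue() are hypothetical names that count how many partitions would actually be queried for poison.

```c
#include <stddef.h>

/*
 * Hypothetical model: unmapped_len[i] is the length of the unmapped
 * tail of partition i; 0 means the partition is fully mapped.
 */

/* Old behavior: the first fully-mapped partition ends the whole sweep. */
static size_t scan_with_break(const unsigned long *unmapped_len, size_t nr)
{
	size_t scanned = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!unmapped_len[i])
			break;
		scanned++;	/* stands in for the poison query */
	}
	return scanned;
}

/* Patched behavior: a fully-mapped partition is skipped, later ones still run. */
static size_t scan_with_continue(const unsigned long *unmapped_len, size_t nr)
{
	size_t scanned = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!unmapped_len[i])
			continue;
		scanned++;	/* stands in for the poison query */
	}
	return scanned;
}
```

For tail lengths {0, 4096, 8192}, the break variant queries no partitions at all, while the continue variant queries the two partitions that do have an unmapped tail.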
* Re: [PATCH 2/3] cxl/region: Scan all partitions for unmapped poison
2026-06-30 7:46 ` [PATCH 2/3] cxl/region: Scan all partitions for unmapped poison Richard Cheng
@ 2026-06-30 15:56 ` Dave Jiang
2026-07-01 4:48 ` Alison Schofield
1 sibling, 0 replies; 8+ messages in thread
From: Dave Jiang @ 2026-06-30 15:56 UTC (permalink / raw)
To: Richard Cheng, dave, jic23, alison.schofield, vishal.l.verma,
djbw, danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak
On 6/30/26 12:46 AM, Richard Cheng wrote:
> cxl_get_poison_unmapped() sweeps the unmapped tail of each partition
> from ctx->part onward. A fully-mapped partition has no unmapped tail
> , it's a normal per-partition state, but the loop treated it with break,
The comma should be with 'tail'. weird line break?
> aborting the whole sweep and silently skipping unmapped poison in all
> later partition. Use continue so a fully-mapped partition is skipped and
> later partitions are still scanned.
>
> Fixes: be5cbd0840275 ("cxl: Kill enum cxl_decoder_mode")
> Signed-off-by: Richard Cheng <icheng@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 1e211542b6b6..be246fb09c99 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2931,7 +2931,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
>  			offset = res->start;
>  		length = res->end - offset + 1;
>  		if (!length)
> -			break;
> +			continue;
>  		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
>  		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
>  			continue;
* Re: [PATCH 2/3] cxl/region: Scan all partitions for unmapped poison
2026-06-30 7:46 ` [PATCH 2/3] cxl/region: Scan all partitions for unmapped poison Richard Cheng
2026-06-30 15:56 ` Dave Jiang
@ 2026-07-01 4:48 ` Alison Schofield
1 sibling, 0 replies; 8+ messages in thread
From: Alison Schofield @ 2026-07-01 4:48 UTC (permalink / raw)
To: Richard Cheng
Cc: dave, jic23, dave.jiang, vishal.l.verma, djbw, danwilliams,
iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak
On Tue, Jun 30, 2026 at 03:46:56PM +0800, Richard Cheng wrote:
> cxl_get_poison_unmapped() sweeps the unmapped tail of each partition
> from ctx->part onward. A fully-mapped partition has no unmapped tail
> , it's a normal per-partition state, but the loop treated it with break,
> aborting the whole sweep and silently skipping unmapped poison in all
> later partition. Use continue so a fully-mapped partition is skipped and
> later partitions are still scanned.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Posted a test case for review:
https://lore.kernel.org/all/20260701044205.1589967-1-alison.schofield@intel.com/
>
> Fixes: be5cbd0840275 ("cxl: Kill enum cxl_decoder_mode")
> Signed-off-by: Richard Cheng <icheng@nvidia.com>
> ---
> drivers/cxl/core/region.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 1e211542b6b6..be246fb09c99 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2931,7 +2931,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
>  			offset = res->start;
>  		length = res->end - offset + 1;
>  		if (!length)
> -			break;
> +			continue;
>  		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
>  		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
>  			continue;
> --
> 2.43.0
>
>
* [PATCH 3/3] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan
2026-06-30 7:46 [PATCH 0/3] cxl: Sashiko bug fixes Richard Cheng
2026-06-30 7:46 ` [PATCH 1/3] cxl/features: Reject feature offset that overflows 16-bit field Richard Cheng
2026-06-30 7:46 ` [PATCH 2/3] cxl/region: Scan all partitions for unmapped poison Richard Cheng
@ 2026-06-30 7:46 ` Richard Cheng
2026-06-30 16:04 ` Dave Jiang
2 siblings, 1 reply; 8+ messages in thread
From: Richard Cheng @ 2026-06-30 7:46 UTC (permalink / raw)
To: dave, jic23, dave.jiang, alison.schofield, vishal.l.verma, djbw,
danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak, Richard Cheng
cxl_get_poison_unmapped() tolerates the -EFAULT a RAM partition returns
for Get Poison List by skipping that partition, but left rc holding the
error. If the tolerated RAM fault was the last poison query before the
loop ended, the function returned a spurious -EFAULT and the poison-list
read failed even though enumeration succeeded. Reset rc to 0 when
tolerating the fault, matching poison_by_decoder().
Fixes: be5cbd0840275 ("cxl: Kill enum cxl_decoder_mode")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
drivers/cxl/core/region.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index be246fb09c99..52ba8e9e4288 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2933,8 +2933,10 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
 		if (!length)
 			continue;
 		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
-		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
+		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM) {
+			rc = 0;
 			continue;
+		}
 		if (rc)
 			break;
 	}
--
2.43.0
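
The stale return code can be shown with a small model of the loop. Everything here is a sketch under assumptions: struct part_model, scan_leaky() and scan_fixed() are hypothetical, and query_rc simply stands in for whatever the poison query would return for that partition.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-partition model: its mode and the result of its poison query. */
struct part_model {
	bool is_ram;
	int query_rc;
};

/* Pre-patch shape: the tolerated -EFAULT stays in rc after the loop ends. */
static int scan_leaky(const struct part_model *parts, size_t nr)
{
	int rc = 0;

	for (size_t i = 0; i < nr; i++) {
		rc = parts[i].query_rc;
		if (rc == -EFAULT && parts[i].is_ram)
			continue;
		if (rc)
			break;
	}
	return rc;
}

/* Patched shape: rc is cleared when the RAM fault is tolerated. */
static int scan_fixed(const struct part_model *parts, size_t nr)
{
	int rc = 0;

	for (size_t i = 0; i < nr; i++) {
		rc = parts[i].query_rc;
		if (rc == -EFAULT && parts[i].is_ram) {
			rc = 0;
			continue;
		}
		if (rc)
			break;
	}
	return rc;
}
```

When the last partition is a RAM partition whose query returns -EFAULT, scan_leaky() returns -EFAULT even though the fault was meant to be ignored, while scan_fixed() returns 0. A -EFAULT from a non-RAM partition is still propagated by both variants.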
* Re: [PATCH 3/3] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan
2026-06-30 7:46 ` [PATCH 3/3] cxl/region: Don't leak tolerated RAM -EFAULT from unmapped poison scan Richard Cheng
@ 2026-06-30 16:04 ` Dave Jiang
0 siblings, 0 replies; 8+ messages in thread
From: Dave Jiang @ 2026-06-30 16:04 UTC (permalink / raw)
To: Richard Cheng, dave, jic23, alison.schofield, vishal.l.verma,
djbw, danwilliams
Cc: iweiny, ming.li, gourry, rrichter, linux-cxl, linux-kernel, kees,
newtonl, kristinc, mochs, kaihengf, kobak
On 6/30/26 12:46 AM, Richard Cheng wrote:
> cxl_get_poison_unmapped() tolerates the -EFAULT a RAM partition returns
> for Get Poison List by skipping that partition, but left rc holding the
> error. If the tolerated RAM fault was the last poison query before the
> loop ended, the function returned a spurious -EFAULT and the poison-list
> read failed even though enumeration succeeded. Reset rc to 0 when
> tolerating the fault, matching poison_by_decoder().
>
> Fixes: be5cbd0840275 ("cxl: Kill enum cxl_decoder_mode")
> Signed-off-by: Richard Cheng <icheng@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index be246fb09c99..52ba8e9e4288 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2933,8 +2933,10 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
>  		if (!length)
>  			continue;
>  		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> -		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
> +		if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM) {
> +			rc = 0;
>  			continue;
> +		}
>  		if (rc)
>  			break;
>  	}