All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region
@ 2026-06-06  7:50 Li Ming
  2026-06-06  7:51 ` [PATCH 1/2] cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach() Li Ming
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Li Ming @ 2026-06-06  7:50 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, linux-kernel, Li Ming

This patchset includes two fixes for endpoint decoder attach/detach for
auto-assembly region.

Patch #1 fixes OOB access in cxl_cancel_auto_attach().

Patch #2 fixes NULL endpoint pointers hole in p->targets[]. CXL driver
does not allow any NULL pointer hole in p->targets[], it will cause
NULL pointer dereference issue. However, if an assigned endpoint decoder
is removed from an auto-assembly region, it could make it happen.

The following operations can always trigger NULL pointer hole issue.
Precondition:
an auto-assembly region with LOCK flags or its assigned endpoint
decoders with LOCK flags. This means these assigned endpoint decoders
could be re-attached to the region after being detached.

echo {one of cxl pci BDF} > /sys/bus/pci/drivers/cxl_pci/unbind
echo {one of cxl pci BDF} > /sys/bus/pci/drivers/cxl_pci/bind

it will trigger the NUll pointer dereference issuse fixed by patch #2.

Note: Patch #2 only fixes NULL pointer dereference issue, re-attaching
a removal endpoint decoder to the auto-assembly region still fails
with the patch, because there are other issues blocking re-attachment
flow. One of them is that re-attachment will trigger calling
cxl_region_attach_position() for each targets in p->targets[] again, but
the function fails on the targets which have been attached. I am not
sure whether re-attachment is a valid user usage, if yes, I will work on
that later.

Signed-off-by: Li Ming <ming.li@zohomail.com>
---
Li Ming (2):
      cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()
      cxl/region: Fill first free targets[] slot during auto-discovery

 drivers/cxl/core/region.c | 55 ++++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 22 deletions(-)
---
base-commit: a1516711b95490ad6c9f05b61500e73d4f603d28
change-id: 20260606-fix_two_issues_introduced_by_cxl_cancel_auto_attach-6267f4fdce27

Best regards,
-- 
Li Ming <ming.li@zohomail.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()
  2026-06-06  7:50 [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region Li Ming
@ 2026-06-06  7:51 ` Li Ming
  2026-06-12  1:19   ` Alison Schofield
  2026-06-06  7:51 ` [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery Li Ming
  2026-06-12 16:40 ` [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region Dave Jiang
  2 siblings, 1 reply; 9+ messages in thread
From: Li Ming @ 2026-06-06  7:51 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, linux-kernel, Li Ming

In cxl_cancel_auto_attach(), it assumes cxled->pos is a valid index for
accessing p->targets[]. However, cxled->pos can be set to negative errno
in cxl_region_sort_targets() if cxl_calc_interleave_pos() fails. This
causes the driver to use a negative index to access p->targets[],
resulting in out-of-bounds access.

Fix it by walking p->targets[] instead of using cxled->pos directly.

Fixes: 87805c32e6ad ("cxl/region: Fix use-after-free from auto assembly failure")
Signed-off-by: Li Ming <ming.li@zohomail.com>
---
 drivers/cxl/core/region.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index cc41c08c0c0c..c4335ebf19f7 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2011,8 +2011,9 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
 		cxled->pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range);
 		/*
 		 * Record that sorting failed, but still continue to calc
-		 * cxled->pos so that follow-on code paths can reliably
-		 * do p->targets[cxled->pos] to self-reference their entry.
+		 * cxled->pos so that cxl_calc_interleave_pos() emits its
+		 * dev_dbg() for every member. which is useful for auto
+		 * discovery debug.
 		 */
 		if (cxled->pos < 0)
 			rc = -ENXIO;
@@ -2202,18 +2203,30 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	return 0;
 }
 
-static int cxl_region_by_target(struct device *dev, const void *data)
+static int cxl_region_remove_target(struct device *dev, void *data)
 {
-	const struct cxl_endpoint_decoder *cxled = data;
+	struct cxl_endpoint_decoder *cxled = data;
 	struct cxl_region_params *p;
 	struct cxl_region *cxlr;
+	int i;
 
 	if (!is_cxl_region(dev))
 		return 0;
 
 	cxlr = to_cxl_region(dev);
 	p = &cxlr->params;
-	return p->targets[cxled->pos] == cxled;
+	for (i = 0; i < p->interleave_ways; i++) {
+		if (p->targets[i] == cxled) {
+			p->nr_targets--;
+			cxled->state = CXL_DECODER_STATE_AUTO;
+			cxled->pos = -1;
+			p->targets[i] = NULL;
+
+			return 1;
+		}
+	}
+
+	return 0;
 }
 
 /*
@@ -2222,25 +2235,10 @@ static int cxl_region_by_target(struct device *dev, const void *data)
  */
 static void cxl_cancel_auto_attach(struct cxl_endpoint_decoder *cxled)
 {
-	struct cxl_region_params *p;
-	struct cxl_region *cxlr;
-	int pos = cxled->pos;
-
 	if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
 		return;
 
-	struct device *dev __free(put_device) =
-		bus_find_device(&cxl_bus_type, NULL, cxled, cxl_region_by_target);
-	if (!dev)
-		return;
-
-	cxlr = to_cxl_region(dev);
-	p = &cxlr->params;
-
-	p->nr_targets--;
-	cxled->state = CXL_DECODER_STATE_AUTO;
-	cxled->pos = -1;
-	p->targets[pos] = NULL;
+	bus_for_each_dev(&cxl_bus_type, NULL, cxled, cxl_region_remove_target);
 }
 
 static struct cxl_region *

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery
  2026-06-06  7:50 [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region Li Ming
  2026-06-06  7:51 ` [PATCH 1/2] cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach() Li Ming
@ 2026-06-06  7:51 ` Li Ming
  2026-06-06  8:11   ` sashiko-bot
  2026-06-12  1:20   ` Alison Schofield
  2026-06-12 16:40 ` [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region Dave Jiang
  2 siblings, 2 replies; 9+ messages in thread
From: Li Ming @ 2026-06-06  7:51 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, linux-kernel, Li Ming

Any invalid endpoint decoder pointer in the target array of an active
region is not allowed by cxl driver. This means cxl driver always
assumes the first p->nr_targets entries of the target array in an
auto-assembly region are valid. However, there are scenarios that could
leave NULL endpoint decoder pointer holes in the target array.

1. When cxl_cancel_auto_attach() removes an endpoint decoder from a
   target array, the target slot is set to NULL. If the removed endpoint
   decoder is not the last element in the target array, the target array
   will contain a NULL hole.

2. When a auto-assembly region removes an assigned endpoint decoder, if
   the removed endpoint decoder is not the last element in the target
   array, always remains a NULL hole in the target array.

When a NULL pointer hole exists in a region's target array, it
introduces two potential problems:
1. Access an endpoint decoder via a NULL pointer. it always trigger
   calltrace like that.
    Oops: general protection fault, probably for non-canonical address 0xdffffc0000000008: 0000 [#1] SMP KASAN PTI
    RIP: 0010:cxl_calc_interleave_pos+0x26/0x810 [cxl_core]
    Call Trace:
      <TASK>
      cxl_region_attach+0xc50/0x2140 [cxl_core]
      cxl_add_to_region+0x321/0x2330 [cxl_core]
      discover_region+0x92/0x150 [cxl_port]
      device_for_each_child+0xf3/0x170
      cxl_port_probe+0x150/0x200 [cxl_port]
      cxl_bus_probe+0x4f/0xa0 [cxl_core]
      really_probe+0x1c8/0x960
      __driver_probe_device+0x323/0x450
      driver_probe_device+0x45/0x120
      __device_attach_driver+0x15d/0x280
      bus_for_each_drv+0x10f/0x190

2. Not having enough valid endpoint decoders attached to an
   auto-assembly region. if an auto-assembly region is created with lock
   flag or assigned endpoint decoder with lock flag, which means
   assigned endpoint decoder will not be reset during detaching, they
   could re-attach to the auto-assembly region again. But cxl region
   driver relies on p->nr_targets to verify whether the required number
   of endpoint decoders has been attached, and NULL endpoint decoder
   pointers are still counted in that case.

To fix above issues, adjust cxl_region_attach_auto() logic to find the
first free target slot for endpoint decoder attachment, this ensures
NULL holes in the target array are filled, rather than adding new
endpoint decoders at the tail of the target array.

Fixes: 87805c32e6ad ("cxl/region: Fix use-after-free from auto assembly failure")
Fixes: 2230c4bdc412 ("cxl: Add handling of locked CXL decoder")
Suggested-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Li Ming <ming.li@zohomail.com>
---
 drivers/cxl/core/region.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c4335ebf19f7..532dac77bd00 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1848,8 +1848,21 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
 	 * this means that userspace can view devices in the wrong position
 	 * before the region activates, and must be careful to understand when
 	 * it might be racing region autodiscovery.
+	 *
+	 * The endpoint decoder will be recorded into the first free slot of
+	 * the target array.
 	 */
-	pos = p->nr_targets;
+	for (pos = 0; pos < p->interleave_ways; pos++) {
+		if (!p->targets[pos])
+			break;
+	}
+
+	if (pos == p->interleave_ways) {
+		dev_err(&cxlr->dev, "%s: unable to find a free target slot\n",
+			dev_name(&cxled->cxld.dev));
+		return -ENXIO;
+	}
+
 	p->targets[pos] = cxled;
 	cxled->pos = pos;
 	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery
  2026-06-06  7:51 ` [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery Li Ming
@ 2026-06-06  8:11   ` sashiko-bot
  2026-06-08  4:38     ` Li Ming
  2026-06-12  1:20   ` Alison Schofield
  1 sibling, 1 reply; 9+ messages in thread
From: sashiko-bot @ 2026-06-06  8:11 UTC (permalink / raw)
  To: Li Ming; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Loops iterating over the `p->targets` array up to `p->nr_targets` unconditionally dereference elements, leading to NULL pointer dereferences because the array can be sparse.
--

commit bf78125f1b0b7a811b9b200d37ed57cc96619a00
Author: Li Ming <ming.li@zohomail.com>

cxl/region: Fill first free targets[] slot during auto-discovery

This commit adjusts the cxl_region_attach_auto logic to find the first free
target slot for endpoint decoder attachment, rather than appending to the tail.
This ensures NULL holes in the target array are filled to avoid accessing an
endpoint decoder via a NULL pointer and miscounting valid endpoint decoders.

> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c4335ebf19f7f..532dac77bd003 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1848,8 +1848,21 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
>  	 * this means that userspace can view devices in the wrong position
>  	 * before the region activates, and must be careful to understand when
>  	 * it might be racing region autodiscovery.
> +	 *
> +	 * The endpoint decoder will be recorded into the first free slot of
> +	 * the target array.
>  	 */
> -	pos = p->nr_targets;
> +	for (pos = 0; pos < p->interleave_ways; pos++) {
> +		if (!p->targets[pos])
> +			break;
> +	}
> +
> +	if (pos == p->interleave_ways) {
> +		dev_err(&cxlr->dev, "%s: unable to find a free target slot\n",
> +			dev_name(&cxled->cxld.dev));
> +		return -ENXIO;
> +	}
> +
>  	p->targets[pos] = cxled;

[Severity: High]
This is a pre-existing issue, but since __cxl_decoder_detach() can leave
sparse holes in p->targets without compacting the array, are the loops
that iterate up to p->nr_targets still susceptible to NULL pointer
dereferences?

For instance, in drivers/cxl/core/region.c:cxl_dpa_to_hpa():

    for (int i = 0; i < p->nr_targets; i++) {
        if (cxlmd == cxled_to_memdev(p->targets[i])) {

If there is a hole at an index less than p->nr_targets, wouldn't
cxled_to_memdev() dereference a NULL pointer?

Similar unprotected iterations seem to exist in
cxl_scrub_get_attrbs_region(), unaligned_region_offset_to_dpa_result(),
and region_offset_to_dpa_result().

Does the array need to be compacted upon detach, or should these loops be
updated to check for NULL pointers before dereferencing p->targets[i]?

>  	cxled->pos = pos;
>  	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260606-fix_two_issues_introduced_by_cxl_cancel_auto_attach-v1-0-5d94ca06c4e4@zohomail.com?part=2

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery
  2026-06-06  8:11   ` sashiko-bot
@ 2026-06-08  4:38     ` Li Ming
  2026-06-12  1:19       ` Alison Schofield
  0 siblings, 1 reply; 9+ messages in thread
From: Li Ming @ 2026-06-08  4:38 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-cxl

On 6/6/2026 4:11 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] Loops iterating over the `p->targets` array up to `p->nr_targets` unconditionally dereference elements, leading to NULL pointer dereferences because the array can be sparse.
> --
>
> commit bf78125f1b0b7a811b9b200d37ed57cc96619a00
> Author: Li Ming <ming.li@zohomail.com>
>
> cxl/region: Fill first free targets[] slot during auto-discovery
>
> This commit adjusts the cxl_region_attach_auto logic to find the first free
> target slot for endpoint decoder attachment, rather than appending to the tail.
> This ensures NULL holes in the target array are filled to avoid accessing an
> endpoint decoder via a NULL pointer and miscounting valid endpoint decoders.
>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index c4335ebf19f7f..532dac77bd003 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -1848,8 +1848,21 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
>>  	 * this means that userspace can view devices in the wrong position
>>  	 * before the region activates, and must be careful to understand when
>>  	 * it might be racing region autodiscovery.
>> +	 *
>> +	 * The endpoint decoder will be recorded into the first free slot of
>> +	 * the target array.
>>  	 */
>> -	pos = p->nr_targets;
>> +	for (pos = 0; pos < p->interleave_ways; pos++) {
>> +		if (!p->targets[pos])
>> +			break;
>> +	}
>> +
>> +	if (pos == p->interleave_ways) {
>> +		dev_err(&cxlr->dev, "%s: unable to find a free target slot\n",
>> +			dev_name(&cxled->cxld.dev));
>> +		return -ENXIO;
>> +	}
>> +
>>  	p->targets[pos] = cxled;
> [Severity: High]
> This is a pre-existing issue, but since __cxl_decoder_detach() can leave
> sparse holes in p->targets without compacting the array, are the loops
> that iterate up to p->nr_targets still susceptible to NULL pointer
> dereferences?
>
> For instance, in drivers/cxl/core/region.c:cxl_dpa_to_hpa():
>
>     for (int i = 0; i < p->nr_targets; i++) {
>         if (cxlmd == cxled_to_memdev(p->targets[i])) {
>
> If there is a hole at an index less than p->nr_targets, wouldn't
> cxled_to_memdev() dereference a NULL pointer?
>
> Similar unprotected iterations seem to exist in
> cxl_scrub_get_attrbs_region(), unaligned_region_offset_to_dpa_result(),
> and region_offset_to_dpa_result().
>
> Does the array need to be compacted upon detach, or should these loops be
> updated to check for NULL pointers before dereferencing p->targets[i]?

It is not a bug here.

cxl driver always calls cxl_dpa_to_region() for a cxl region checking before involving cxl_dpa_to_hpa(), but if a cxl region is not bound to cxl region driver, driver will not involve cxl_dpa_to_hpa().

But maybe adding cxlr->driver checking in cxl_dpa_to_hpa() is a choice.

>
>>  	cxled->pos = pos;
>>  	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery
  2026-06-08  4:38     ` Li Ming
@ 2026-06-12  1:19       ` Alison Schofield
  0 siblings, 0 replies; 9+ messages in thread
From: Alison Schofield @ 2026-06-12  1:19 UTC (permalink / raw)
  To: Li Ming; +Cc: sashiko-reviews, linux-cxl

On Mon, Jun 08, 2026 at 12:38:42PM +0800, Li Ming wrote:
> On 6/6/2026 4:11 PM, sashiko-bot@kernel.org wrote:
> > Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> >
> > Pre-existing issues:
> > - [High] Loops iterating over the `p->targets` array up to `p->nr_targets` unconditionally dereference elements, leading to NULL pointer dereferences because the array can be sparse.
> > --
> >
> > commit bf78125f1b0b7a811b9b200d37ed57cc96619a00
> > Author: Li Ming <ming.li@zohomail.com>
> >
> > cxl/region: Fill first free targets[] slot during auto-discovery
> >
> > This commit adjusts the cxl_region_attach_auto logic to find the first free
> > target slot for endpoint decoder attachment, rather than appending to the tail.
> > This ensures NULL holes in the target array are filled to avoid accessing an
> > endpoint decoder via a NULL pointer and miscounting valid endpoint decoders.
> >
> >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> >> index c4335ebf19f7f..532dac77bd003 100644
> >> --- a/drivers/cxl/core/region.c
> >> +++ b/drivers/cxl/core/region.c
> >> @@ -1848,8 +1848,21 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
> >>  	 * this means that userspace can view devices in the wrong position
> >>  	 * before the region activates, and must be careful to understand when
> >>  	 * it might be racing region autodiscovery.
> >> +	 *
> >> +	 * The endpoint decoder will be recorded into the first free slot of
> >> +	 * the target array.
> >>  	 */
> >> -	pos = p->nr_targets;
> >> +	for (pos = 0; pos < p->interleave_ways; pos++) {
> >> +		if (!p->targets[pos])
> >> +			break;
> >> +	}
> >> +
> >> +	if (pos == p->interleave_ways) {
> >> +		dev_err(&cxlr->dev, "%s: unable to find a free target slot\n",
> >> +			dev_name(&cxled->cxld.dev));
> >> +		return -ENXIO;
> >> +	}
> >> +
> >>  	p->targets[pos] = cxled;
> > [Severity: High]
> > This is a pre-existing issue, but since __cxl_decoder_detach() can leave
> > sparse holes in p->targets without compacting the array, are the loops
> > that iterate up to p->nr_targets still susceptible to NULL pointer
> > dereferences?
> >
> > For instance, in drivers/cxl/core/region.c:cxl_dpa_to_hpa():
> >
> >     for (int i = 0; i < p->nr_targets; i++) {
> >         if (cxlmd == cxled_to_memdev(p->targets[i])) {
> >
> > If there is a hole at an index less than p->nr_targets, wouldn't
> > cxled_to_memdev() dereference a NULL pointer?
> >
> > Similar unprotected iterations seem to exist in
> > cxl_scrub_get_attrbs_region(), unaligned_region_offset_to_dpa_result(),
> > and region_offset_to_dpa_result().
> >
> > Does the array need to be compacted upon detach, or should these loops be
> > updated to check for NULL pointers before dereferencing p->targets[i]?
> 
> It is not a bug here.
> 
> cxl driver always calls cxl_dpa_to_region() for a cxl region checking before involving cxl_dpa_to_hpa(), but if a cxl region is not bound to cxl region driver, driver will not involve cxl_dpa_to_hpa().
> 
> But maybe adding cxlr->driver checking in cxl_dpa_to_hpa() is a choice.
> 

Hi Ming, 
I'm in agreemnet that there is not a reachable NULL deref path here.
With this patch, the hole is gone by the time those loops run.

A hole only exists transiently between a detach and the next stage,
while the region is not comitted. The windows suggested aren't real.

I'm kind of against adding an explicit check in cxl_dpa_to_hpa(), as
'hardening' because that undercuts this whole point here.

-- Alison



> >
> >>  	cxled->pos = pos;
> >>  	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;
> 
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()
  2026-06-06  7:51 ` [PATCH 1/2] cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach() Li Ming
@ 2026-06-12  1:19   ` Alison Schofield
  0 siblings, 0 replies; 9+ messages in thread
From: Alison Schofield @ 2026-06-12  1:19 UTC (permalink / raw)
  To: Li Ming
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, linux-cxl, linux-kernel

On Sat, Jun 06, 2026 at 03:51:00PM +0800, Li Ming wrote:
> In cxl_cancel_auto_attach(), it assumes cxled->pos is a valid index for
> accessing p->targets[]. However, cxled->pos can be set to negative errno
> in cxl_region_sort_targets() if cxl_calc_interleave_pos() fails. This
> causes the driver to use a negative index to access p->targets[],
> resulting in out-of-bounds access.
> 
> Fix it by walking p->targets[] instead of using cxled->pos directly.

Reviewed-by: Alison Schofield <alison.schofield@intel.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery
  2026-06-06  7:51 ` [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery Li Ming
  2026-06-06  8:11   ` sashiko-bot
@ 2026-06-12  1:20   ` Alison Schofield
  1 sibling, 0 replies; 9+ messages in thread
From: Alison Schofield @ 2026-06-12  1:20 UTC (permalink / raw)
  To: Li Ming
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, linux-cxl, linux-kernel

On Sat, Jun 06, 2026 at 03:51:01PM +0800, Li Ming wrote:
> Any invalid endpoint decoder pointer in the target array of an active
> region is not allowed by cxl driver. This means cxl driver always
> assumes the first p->nr_targets entries of the target array in an
> auto-assembly region are valid. However, there are scenarios that could
> leave NULL endpoint decoder pointer holes in the target array.
> 
> 1. When cxl_cancel_auto_attach() removes an endpoint decoder from a
>    target array, the target slot is set to NULL. If the removed endpoint
>    decoder is not the last element in the target array, the target array
>    will contain a NULL hole.
> 
> 2. When a auto-assembly region removes an assigned endpoint decoder, if
>    the removed endpoint decoder is not the last element in the target
>    array, always remains a NULL hole in the target array.
> 
> When a NULL pointer hole exists in a region's target array, it
> introduces two potential problems:
> 1. Access an endpoint decoder via a NULL pointer. it always trigger
>    calltrace like that.
>     Oops: general protection fault, probably for non-canonical address 0xdffffc0000000008: 0000 [#1] SMP KASAN PTI
>     RIP: 0010:cxl_calc_interleave_pos+0x26/0x810 [cxl_core]
>     Call Trace:
>       <TASK>
>       cxl_region_attach+0xc50/0x2140 [cxl_core]
>       cxl_add_to_region+0x321/0x2330 [cxl_core]
>       discover_region+0x92/0x150 [cxl_port]
>       device_for_each_child+0xf3/0x170
>       cxl_port_probe+0x150/0x200 [cxl_port]
>       cxl_bus_probe+0x4f/0xa0 [cxl_core]
>       really_probe+0x1c8/0x960
>       __driver_probe_device+0x323/0x450
>       driver_probe_device+0x45/0x120
>       __device_attach_driver+0x15d/0x280
>       bus_for_each_drv+0x10f/0x190
> 
> 2. Not having enough valid endpoint decoders attached to an
>    auto-assembly region. if an auto-assembly region is created with lock
>    flag or assigned endpoint decoder with lock flag, which means
>    assigned endpoint decoder will not be reset during detaching, they
>    could re-attach to the auto-assembly region again. But cxl region
>    driver relies on p->nr_targets to verify whether the required number
>    of endpoint decoders has been attached, and NULL endpoint decoder
>    pointers are still counted in that case.
> 
> To fix above issues, adjust cxl_region_attach_auto() logic to find the
> first free target slot for endpoint decoder attachment, this ensures
> NULL holes in the target array are filled, rather than adding new
> endpoint decoders at the tail of the target array.

Reviewed-by: Alison Schofield <alison.schofield@intel.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region
  2026-06-06  7:50 [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region Li Ming
  2026-06-06  7:51 ` [PATCH 1/2] cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach() Li Ming
  2026-06-06  7:51 ` [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery Li Ming
@ 2026-06-12 16:40 ` Dave Jiang
  2 siblings, 0 replies; 9+ messages in thread
From: Dave Jiang @ 2026-06-12 16:40 UTC (permalink / raw)
  To: Li Ming, Davidlohr Bueso, Jonathan Cameron, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, linux-kernel



On 6/6/26 12:50 AM, Li Ming wrote:
> This patchset includes two fixes for endpoint decoder attach/detach for
> auto-assembly region.
> 
> Patch #1 fixes OOB access in cxl_cancel_auto_attach().
> 
> Patch #2 fixes NULL endpoint pointers hole in p->targets[]. CXL driver
> does not allow any NULL pointer hole in p->targets[], it will cause
> NULL pointer dereference issue. However, if an assigned endpoint decoder
> is removed from an auto-assembly region, it could make it happen.
> 
> The following operations can always trigger NULL pointer hole issue.
> Precondition:
> an auto-assembly region with LOCK flags or its assigned endpoint
> decoders with LOCK flags. This means these assigned endpoint decoders
> could be re-attached to the region after being detached.
> 
> echo {one of cxl pci BDF} > /sys/bus/pci/drivers/cxl_pci/unbind
> echo {one of cxl pci BDF} > /sys/bus/pci/drivers/cxl_pci/bind
> 
> it will trigger the NUll pointer dereference issuse fixed by patch #2.
> 
> Note: Patch #2 only fixes NULL pointer dereference issue, re-attaching
> a removal endpoint decoder to the auto-assembly region still fails
> with the patch, because there are other issues blocking re-attachment
> flow. One of them is that re-attachment will trigger calling
> cxl_region_attach_position() for each targets in p->targets[] again, but
> the function fails on the targets which have been attached. I am not
> sure whether re-attachment is a valid user usage, if yes, I will work on
> that later.
> 
> Signed-off-by: Li Ming <ming.li@zohomail.com>
> ---
> Li Ming (2):
>       cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()
>       cxl/region: Fill first free targets[] slot during auto-discovery

applied to cxl/next
cbda6a2c2bec
aa8a76711c15

> 
>  drivers/cxl/core/region.c | 55 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 22 deletions(-)
> ---
> base-commit: a1516711b95490ad6c9f05b61500e73d4f603d28
> change-id: 20260606-fix_two_issues_introduced_by_cxl_cancel_auto_attach-6267f4fdce27
> 
> Best regards,


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-06-12 16:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-06  7:50 [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region Li Ming
2026-06-06  7:51 ` [PATCH 1/2] cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach() Li Ming
2026-06-12  1:19   ` Alison Schofield
2026-06-06  7:51 ` [PATCH 2/2] cxl/region: Fill first free targets[] slot during auto-discovery Li Ming
2026-06-06  8:11   ` sashiko-bot
2026-06-08  4:38     ` Li Ming
2026-06-12  1:19       ` Alison Schofield
2026-06-12  1:20   ` Alison Schofield
2026-06-12 16:40 ` [PATCH 0/2] cxl/region: Fix two decoder attach/detach issues for auto-assembly region Dave Jiang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.