All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] cxl/region: Fix NULL pointer within p->targets[]
@ 2026-05-30  4:24 Li Ming
  2026-05-30  5:06 ` sashiko-bot
  2026-06-03 22:40 ` Alison Schofield
  0 siblings, 2 replies; 6+ messages in thread
From: Li Ming @ 2026-05-30  4:24 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, linux-kernel, Li Ming

cxl_region_remove_target() leaves a NULL pointer in the slot of the
removable endpoint decoder in p->targets array. However, p->targets
array replies on p->nr_targets to determine validity, which means when
p->nr_targets == p->interleave_ways, driver assumes all elements from
index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
violates this assumption and causes the driver to treat a NULL pointer
as a valid endpoint decoder.

To fix this issue, when a endpoint decoder is removed by
cxl_region_remove_target(), always swap the last valid endpoint decoder
pointer into the slot of removal endpoint decoder to ensure all pointers
before p->targets[p->nr_targets] are valid.

Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
Suggested-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Li Ming <ming.li@zohomail.com>
---
 drivers/cxl/core/region.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e90c024c8036..54018db87a4c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
 			p->nr_targets--;
 			cxled->state = CXL_DECODER_STATE_AUTO;
 			cxled->pos = -1;
-			p->targets[i] = NULL;
+
+			/*
+			 * Swap the last valid target into the slot to
+			 * ensure no invalid target in p->nr_targets range.
+			 * The targets array will be re-sorted during the
+			 * last endpoint decoder attaching again.
+			 */
+			p->targets[i] = p->targets[p->nr_targets];
+			p->targets[p->nr_targets] = NULL;
 
 			return 1;
 		}

---
base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
change-id: 20260530-fix_null_in_targets_array-124303a8ba0f

Best regards,
-- 
Li Ming <ming.li@zohomail.com>


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] cxl/region: Fix NULL pointer within p->targets[]
  2026-05-30  4:24 [PATCH] cxl/region: Fix NULL pointer within p->targets[] Li Ming
@ 2026-05-30  5:06 ` sashiko-bot
  2026-05-30 14:54   ` Li Ming
  2026-06-03 22:40 ` Alison Schofield
  1 sibling, 1 reply; 6+ messages in thread
From: sashiko-bot @ 2026-05-30  5:06 UTC (permalink / raw)
  To: Li Ming; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] __cxl_decoder_detach() suffers from the exact same p->targets hole vulnerability for fully assembled auto-discovered regions, which will cause a NULL pointer dereference upon re-attachment.
- [High] Failed auto-discovery decoders are left in the region targets array if failure occurs after endpoint port attachment, leading to a Use-After-Free (UAF).
--

commit 8beb15c55def2c0ba7cda46a708f6b3d2ab5e643
Author: Li Ming <ming.li@zohomail.com>

cxl/region: Fix NULL pointer within p->targets[]

This patch fixes a NULL pointer issue in p->targets[] during endpoint
decoder removal. It ensures the last valid endpoint decoder pointer is
swapped into the removed slot, avoiding gaps in the valid targets array.

> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e90c024c80365..54018db87a4c9 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
>  			p->nr_targets--;
>  			cxled->state = CXL_DECODER_STATE_AUTO;
>  			cxled->pos = -1;
> -			p->targets[i] = NULL;
> +
> +			/*
> +			 * Swap the last valid target into the slot to
> +			 * ensure no invalid target in p->nr_targets range.
> +			 * The targets array will be re-sorted during the
> +			 * last endpoint decoder attaching again.
> +			 */
> +			p->targets[i] = p->targets[p->nr_targets];
> +			p->targets[p->nr_targets] = NULL;

[Severity: High]
This isn't a bug introduced by this patch, but does __cxl_decoder_detach()
suffer from the exact same p->targets hole vulnerability for fully assembled
auto-discovered regions?

When a target is detached from an auto-discovered region:

drivers/cxl/core/region.c:__cxl_decoder_detach() {
    ...
	p->targets[cxled->pos] = NULL;
	p->nr_targets--;
    ...
}

For auto-discovered regions, new targets are appended at
p->targets[p->nr_targets] by cxl_region_attach_auto(). This leaves the NULL
hole intact while overwriting a valid endpoint pointer that was previously
positioned at the end of the array. Once p->nr_targets reaches
p->interleave_ways again, cxl_region_sort_targets() iterates through the
array and attempts to dereference the NULL pointer, which could cause a
kernel panic.


[Severity: High]
This isn't a bug introduced by this patch, but are failed auto-discovery
decoders left in the region targets array if failure occurs after endpoint
port attachment, leading to a Use-After-Free?

cxl_rr_ep_add() correctly upgrades an endpoint's staging state to
CXL_DECODER_STATE_AUTO:

drivers/cxl/core/region.c:cxl_rr_ep_add() {
    ...
		if (cxld == &cxled->cxld &&
		    cxled->state == CXL_DECODER_STATE_AUTO_STAGED)
			cxled->state = CXL_DECODER_STATE_AUTO;
    ...
}

However, if subsequent topology validation fails, the error path calls
cxl_port_detach_region(), which clears the region pointer but neglects to
restore the endpoint's state back to CXL_DECODER_STATE_AUTO_STAGED. Because
of this, when __cxl_decoder_detach() invokes cxl_cancel_auto_attach(), it
silently skips removal due to the state mismatch:

drivers/cxl/core/region.c:cxl_cancel_auto_attach() {
    ...
	if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
		return;
    ...
}

The endpoint decoder remains in the p->targets array and is subsequently
freed upon removal, resulting in a persistent Use-After-Free vulnerability
when the sysfs region targets are read or the array is otherwise accessed.

>  
>  			return 1;
>  		}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260530-fix_null_in_targets_array-v1-1-312c3bf1fe0f@zohomail.com?part=1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] cxl/region: Fix NULL pointer within p->targets[]
  2026-05-30  5:06 ` sashiko-bot
@ 2026-05-30 14:54   ` Li Ming
  0 siblings, 0 replies; 6+ messages in thread
From: Li Ming @ 2026-05-30 14:54 UTC (permalink / raw)
  To: alison.schofield, dave.jiang, Dan Williams, Jonathan Cameron,
	dave, vishal.l.verma, ira.weiny
  Cc: linux-cxl, sashiko-reviews


在 2026/5/30 13:06, sashiko-bot@kernel.org 写道:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] __cxl_decoder_detach() suffers from the exact same p->targets hole vulnerability for fully assembled auto-discovered regions, which will cause a NULL pointer dereference upon re-attachment.

I can reproduce it by the following sequence.

Environment: CFMWS lock flag is set, interleave_ways of an 
auto-discovery region is 4.

1. echo -n {one of cxl device} > /sys/bus/pci/drivers/cxl_pci/unbind

2. echo -n {same device} > /sys/bus/pci/drivers/cxl_pci/bind

Will see something like

[  113.444191] Oops: general protection fault, probably for 
non-canonical address 0xdffffc0000000008: 0000 [#1] SMP KASAN PTI
[  113.444506] KASAN: null-ptr-deref in range 
[0x0000000000000040-0x0000000000000047]
[  113.444506] RIP: 0010:cxl_calc_interleave_pos+0x26/0x810 [cxl_core]
[  113.444506] Call Trace:
[  113.444506]  <TASK>
[  113.444506]  cxl_region_attach+0xc50/0x2140 [cxl_core]
[  113.444506]  cxl_add_to_region+0x321/0x2310 [cxl_core]
[  113.444506]  discover_region+0x92/0x150 [cxl_port]


Seems like we also need to compact p->targets array in that case, but I 
believe it will trigger below warning aftaer compacting.

__cxl_decoder_detach():

   ...

   if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
       p->targets[cxled->pos] != cxled) {
     struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);

     dev_WARN_ONCE(&cxlr->dev, 1, "expected %s:%s at position %d\n",
                                dev_name(&cxlmd->dev), 
dev_name(&cxled->cxld.dev),
                                cxled->pos);
     return NULL;
   }

   ...

I think maybe we can also walk p->targets array to confirm if the given 
cxled belongs to the region?


> - [High] Failed auto-discovery decoders are left in the region targets array if failure occurs after endpoint port attachment, leading to a Use-After-Free (UAF).

For this issue, my understanding is that we need a separate patch to 
reset cxled->state to CXL_DECODER_STATE_AUTO in cxl_rr_free_decoder(). 
like that

@@ -1068,6 +1068,12 @@ static void cxl_rr_free_decoder(struct 
cxl_region_ref *cxl_rr)

         dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region 
mismatch\n");
         if (cxld->region == cxlr) {
+               if (is_endpoint_decoder(&cxld->dev)) {
+                       struct cxl_endpoint_decoder *cxled = 
to_cxl_endpoint_decoder(&cxld->dev);
+
+                       if (cxled->state == CXL_DECODER_STATE_AUTO_STAGED)
+                               cxled->state = CXL_DECODER_STATE_AUTO;
+               }
                 cxld->region = NULL;
                 put_device(&cxlr->dev);
         }


Ming

> --
>
> commit 8beb15c55def2c0ba7cda46a708f6b3d2ab5e643
> Author: Li Ming <ming.li@zohomail.com>
>
> cxl/region: Fix NULL pointer within p->targets[]
>
> This patch fixes a NULL pointer issue in p->targets[] during endpoint
> decoder removal. It ensures the last valid endpoint decoder pointer is
> swapped into the removed slot, avoiding gaps in the valid targets array.
>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index e90c024c80365..54018db87a4c9 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
>>   			p->nr_targets--;
>>   			cxled->state = CXL_DECODER_STATE_AUTO;
>>   			cxled->pos = -1;
>> -			p->targets[i] = NULL;
>> +
>> +			/*
>> +			 * Swap the last valid target into the slot to
>> +			 * ensure no invalid target in p->nr_targets range.
>> +			 * The targets array will be re-sorted during the
>> +			 * last endpoint decoder attaching again.
>> +			 */
>> +			p->targets[i] = p->targets[p->nr_targets];
>> +			p->targets[p->nr_targets] = NULL;
> [Severity: High]
> This isn't a bug introduced by this patch, but does __cxl_decoder_detach()
> suffer from the exact same p->targets hole vulnerability for fully assembled
> auto-discovered regions?
>
> When a target is detached from an auto-discovered region:
>
> drivers/cxl/core/region.c:__cxl_decoder_detach() {
>      ...
> 	p->targets[cxled->pos] = NULL;
> 	p->nr_targets--;
>      ...
> }
>
> For auto-discovered regions, new targets are appended at
> p->targets[p->nr_targets] by cxl_region_attach_auto(). This leaves the NULL
> hole intact while overwriting a valid endpoint pointer that was previously
> positioned at the end of the array. Once p->nr_targets reaches
> p->interleave_ways again, cxl_region_sort_targets() iterates through the
> array and attempts to dereference the NULL pointer, which could cause a
> kernel panic.
>
>
> [Severity: High]
> This isn't a bug introduced by this patch, but are failed auto-discovery
> decoders left in the region targets array if failure occurs after endpoint
> port attachment, leading to a Use-After-Free?
>
> cxl_rr_ep_add() correctly upgrades an endpoint's staging state to
> CXL_DECODER_STATE_AUTO:
>
> drivers/cxl/core/region.c:cxl_rr_ep_add() {
>      ...
> 		if (cxld == &cxled->cxld &&
> 		    cxled->state == CXL_DECODER_STATE_AUTO_STAGED)
> 			cxled->state = CXL_DECODER_STATE_AUTO;
>      ...
> }
>
> However, if subsequent topology validation fails, the error path calls
> cxl_port_detach_region(), which clears the region pointer but neglects to
> restore the endpoint's state back to CXL_DECODER_STATE_AUTO_STAGED. Because
> of this, when __cxl_decoder_detach() invokes cxl_cancel_auto_attach(), it
> silently skips removal due to the state mismatch:
>
> drivers/cxl/core/region.c:cxl_cancel_auto_attach() {
>      ...
> 	if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
> 		return;
>      ...
> }
>
> The endpoint decoder remains in the p->targets array and is subsequently
> freed upon removal, resulting in a persistent Use-After-Free vulnerability
> when the sysfs region targets are read or the array is otherwise accessed.
>
>>   
>>   			return 1;
>>   		}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] cxl/region: Fix NULL pointer within p->targets[]
  2026-05-30  4:24 [PATCH] cxl/region: Fix NULL pointer within p->targets[] Li Ming
  2026-05-30  5:06 ` sashiko-bot
@ 2026-06-03 22:40 ` Alison Schofield
  2026-06-04 13:28   ` Li Ming
  1 sibling, 1 reply; 6+ messages in thread
From: Alison Schofield @ 2026-06-03 22:40 UTC (permalink / raw)
  To: Li Ming
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, linux-cxl, linux-kernel

On Sat, May 30, 2026 at 12:24:40PM +0800, Li Ming wrote:
> cxl_region_remove_target() leaves a NULL pointer in the slot of the
> removable endpoint decoder in p->targets array. However, p->targets
> array replies on p->nr_targets to determine validity, which means when
> p->nr_targets == p->interleave_ways, driver assumes all elements from
> index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
> violates this assumption and causes the driver to treat a NULL pointer
> as a valid endpoint decoder.
> 
> To fix this issue, when a endpoint decoder is removed by
> cxl_region_remove_target(), always swap the last valid endpoint decoder
> pointer into the slot of removal endpoint decoder to ensure all pointers
> before p->targets[p->nr_targets] are valid.
> 
> Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
> Suggested-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Li Ming <ming.li@zohomail.com>
> ---
>  drivers/cxl/core/region.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e90c024c8036..54018db87a4c 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
>  			p->nr_targets--;
>  			cxled->state = CXL_DECODER_STATE_AUTO;
>  			cxled->pos = -1;
> -			p->targets[i] = NULL;
> +
> +			/*
> +			 * Swap the last valid target into the slot to
> +			 * ensure no invalid target in p->nr_targets range.
> +			 * The targets array will be re-sorted during the
> +			 * last endpoint decoder attaching again.
> +			 */
> +			p->targets[i] = p->targets[p->nr_targets];
> +			p->targets[p->nr_targets] = NULL;
>  
>  			return 1;
>  		}

Hi Ming,

I'm replying to top post here, but I have read the Sashiko response
and your response to that.

I'm offering review on the target list holes, but deferring on the
issue with cxl_rr_free_decoder because I think it's a narrow window
and it would not belong in *this* patch. (and maybe I'm running
out of steam too ;))

For the target list holes. I think there may be a single change
that can fix both the site you've fixed in this patch, and the
decoder detach site that Sashiko calls out.

Rather than add compaction at each removal site, make the AUTO
'appender' insert the decoder in the first free slot instead of
blindly at p->targets[p->nr_targets].

    /* Use first free slot. Do not assume nr_targets is dense */
    for (pos = 0; pos < p->interleave_ways; pos++)
            if (!p->targets[pos])
                    break;
    ...
    p->targets[pos] = cxled;
    cxled->pos = pos;


My reasoning, that you'll need to prove -

- Removal only leaves a hole. The damage happens later in the appender
Fix the appender and the hole becomes harmless no matter who created it.

- It covers the cancel-auto site because a hole left by the staging
cancel is skipped by the next append, so the swap-compaction in this
patch is no longer needed.

- It covers the decoder detach site similarly (per Sashiko). The NULL
left by __cxl_decoder_detach() is filled on re-attach instead of being
appended past. Your reproduce seems like it would verify that.

- It needs no manual-vs-auto special case. An AUTO region has exactly
interleave_ways slots and interleave_ways members, so first-free-slot
keeps the array dense whenever it is full. The manual path is
untouched.

I'd probably rename to something like:
cxl/region: Fill first free targets[] slot during auto-discovery

BTW the fixes tag is not the OOB fix but is the original commit,
the same one you used in the OOB fix. 87805c32e6ad

-- Alison


> 
> ---
> base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
> change-id: 20260530-fix_null_in_targets_array-124303a8ba0f
> 
> Best regards,
> -- 
> Li Ming <ming.li@zohomail.com>
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] cxl/region: Fix NULL pointer within p->targets[]
  2026-06-03 22:40 ` Alison Schofield
@ 2026-06-04 13:28   ` Li Ming
  2026-06-04 15:45     ` Dave Jiang
  0 siblings, 1 reply; 6+ messages in thread
From: Li Ming @ 2026-06-04 13:28 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, linux-cxl, linux-kernel


在 2026/6/4 06:40, Alison Schofield 写道:
> On Sat, May 30, 2026 at 12:24:40PM +0800, Li Ming wrote:
>> cxl_region_remove_target() leaves a NULL pointer in the slot of the
>> removable endpoint decoder in p->targets array. However, p->targets
>> array replies on p->nr_targets to determine validity, which means when
>> p->nr_targets == p->interleave_ways, driver assumes all elements from
>> index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
>> violates this assumption and causes the driver to treat a NULL pointer
>> as a valid endpoint decoder.
>>
>> To fix this issue, when a endpoint decoder is removed by
>> cxl_region_remove_target(), always swap the last valid endpoint decoder
>> pointer into the slot of removal endpoint decoder to ensure all pointers
>> before p->targets[p->nr_targets] are valid.
>>
>> Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
>> Suggested-by: Alison Schofield <alison.schofield@intel.com>
>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>> ---
>>   drivers/cxl/core/region.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index e90c024c8036..54018db87a4c 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
>>   			p->nr_targets--;
>>   			cxled->state = CXL_DECODER_STATE_AUTO;
>>   			cxled->pos = -1;
>> -			p->targets[i] = NULL;
>> +
>> +			/*
>> +			 * Swap the last valid target into the slot to
>> +			 * ensure no invalid target in p->nr_targets range.
>> +			 * The targets array will be re-sorted during the
>> +			 * last endpoint decoder attaching again.
>> +			 */
>> +			p->targets[i] = p->targets[p->nr_targets];
>> +			p->targets[p->nr_targets] = NULL;
>>   
>>   			return 1;
>>   		}
> Hi Ming,
>
> I'm replying to top post here, but I have read the Sashiko response
> and your response to that.
>
> I'm offering review on the target list holes, but deferring on the
> issue with cxl_rr_free_decoder because I think it's a narrow window
> and it would not belong in *this* patch. (and maybe I'm running
> out of steam too ;))
>
> For the target list holes. I think there may be a single change
> that can fix both the site you've fixed in this patch, and the
> decoder detach site that Sashiko calls out.
>
> Rather than add compaction at each removal site, make the AUTO
> 'appender' insert the decoder in the first free slot instead of
> blindly at p->targets[p->nr_targets].
>
>      /* Use first free slot. Do not assume nr_targets is dense */
>      for (pos = 0; pos < p->interleave_ways; pos++)
>              if (!p->targets[pos])
>                      break;
>      ...
>      p->targets[pos] = cxled;
>      cxled->pos = pos;
Yes, I think it can solve the problem, I will take a try.
>
>
> My reasoning, that you'll need to prove -
>
> - Removal only leaves a hole. The damage happens later in the appender
> Fix the appender and the hole becomes harmless no matter who created it.
>
> - It covers the cancel-auto site because a hole left by the staging
> cancel is skipped by the next append, so the swap-compaction in this
> patch is no longer needed.

Yes, if we use finding the first free slot for a decoder attachment, we 
will not need swap-compaction. But we should reuse p->interleave_ways 
instead of p->nr_targets for walking p->targets array in 
cxl_region_remove_target(), consecutive detach will have problem without 
that. Like this:

p->nr_targets = 4

[-1, 1, 2, 3]    # __cxl_decoder_detach() called for pos 1, remove it.


p->nr_targets = 3

[-1, NULL, 2, 3] # __cxl_decoder_detach() called for pos 3, but it have 
no chance to be accessed by cxl_region_remove_target().


Maybe Dave could drop the OOB fix, I will send out a patchset that 
includes both the "insert decoder in the first free slot" and the OOB fix.


>
> - It covers the decoder detach site similarly (per Sashiko). The NULL
> left by __cxl_decoder_detach() is filled on re-attach instead of being
> appended past. Your reproduce seems like it would verify that.
>
> - It needs no manual-vs-auto special case. An AUTO region has exactly
> interleave_ways slots and interleave_ways members, so first-free-slot
> keeps the array dense whenever it is full. The manual path is
> untouched.
>
> I'd probably rename to something like:
> cxl/region: Fill first free targets[] slot during auto-discovery
Will do
>
> BTW the fixes tag is not the OOB fix but is the original commit,
> the same one you used in the OOB fix. 87805c32e6ad

Sure, Will fix it, thanks.


Ming

>
> -- Alison
>
>
>> ---
>> base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
>> change-id: 20260530-fix_null_in_targets_array-124303a8ba0f
>>
>> Best regards,
>> -- 
>> Li Ming <ming.li@zohomail.com>
>>
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] cxl/region: Fix NULL pointer within p->targets[]
  2026-06-04 13:28   ` Li Ming
@ 2026-06-04 15:45     ` Dave Jiang
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Jiang @ 2026-06-04 15:45 UTC (permalink / raw)
  To: Li Ming, Alison Schofield
  Cc: Davidlohr Bueso, Jonathan Cameron, Vishal Verma, Ira Weiny,
	Dan Williams, linux-cxl, linux-kernel



On 6/4/26 6:28 AM, Li Ming wrote:
> 
> 在 2026/6/4 06:40, Alison Schofield 写道:
>> On Sat, May 30, 2026 at 12:24:40PM +0800, Li Ming wrote:
>>> cxl_region_remove_target() leaves a NULL pointer in the slot of the
>>> removable endpoint decoder in p->targets array. However, p->targets
>>> array replies on p->nr_targets to determine validity, which means when
>>> p->nr_targets == p->interleave_ways, driver assumes all elements from
>>> index 0 to (p->nr_targets - 1) are valid. The stale NULL pointer
>>> violates this assumption and causes the driver to treat a NULL pointer
>>> as a valid endpoint decoder.
>>>
>>> To fix this issue, when a endpoint decoder is removed by
>>> cxl_region_remove_target(), always swap the last valid endpoint decoder
>>> pointer into the slot of removal endpoint decoder to ensure all pointers
>>> before p->targets[p->nr_targets] are valid.
>>>
>>> Fixes: 809ccef5385f ("cxl/region: Fix out-of-bounds access in cxl_cancel_auto_attach()")
>>> Suggested-by: Alison Schofield <alison.schofield@intel.com>
>>> Signed-off-by: Li Ming <ming.li@zohomail.com>
>>> ---
>>>   drivers/cxl/core/region.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index e90c024c8036..54018db87a4c 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -2220,7 +2220,15 @@ static int cxl_region_remove_target(struct device *dev, void *data)
>>>               p->nr_targets--;
>>>               cxled->state = CXL_DECODER_STATE_AUTO;
>>>               cxled->pos = -1;
>>> -            p->targets[i] = NULL;
>>> +
>>> +            /*
>>> +             * Swap the last valid target into the slot to
>>> +             * ensure no invalid target in p->nr_targets range.
>>> +             * The targets array will be re-sorted during the
>>> +             * last endpoint decoder attaching again.
>>> +             */
>>> +            p->targets[i] = p->targets[p->nr_targets];
>>> +            p->targets[p->nr_targets] = NULL;
>>>                 return 1;
>>>           }
>> Hi Ming,
>>
>> I'm replying to top post here, but I have read the Sashiko response
>> and your response to that.
>>
>> I'm offering review on the target list holes, but deferring on the
>> issue with cxl_rr_free_decoder because I think it's a narrow window
>> and it would not belong in *this* patch. (and maybe I'm running
>> out of steam too ;))
>>
>> For the target list holes. I think there may be a single change
>> that can fix both the site you've fixed in this patch, and the
>> decoder detach site that Sashiko calls out.
>>
>> Rather than add compaction at each removal site, make the AUTO
>> 'appender' insert the decoder in the first free slot instead of
>> blindly at p->targets[p->nr_targets].
>>
>>      /* Use first free slot. Do not assume nr_targets is dense */
>>      for (pos = 0; pos < p->interleave_ways; pos++)
>>              if (!p->targets[pos])
>>                      break;
>>      ...
>>      p->targets[pos] = cxled;
>>      cxled->pos = pos;
> Yes, I think it can solve the problem, I will take a try.
>>
>>
>> My reasoning, that you'll need to prove -
>>
>> - Removal only leaves a hole. The damage happens later in the appender
>> Fix the appender and the hole becomes harmless no matter who created it.
>>
>> - It covers the cancel-auto site because a hole left by the staging
>> cancel is skipped by the next append, so the swap-compaction in this
>> patch is no longer needed.
> 
> Yes, if we use finding the first free slot for a decoder attachment, we will not need swap-compaction. But we should reuse p->interleave_ways instead of p->nr_targets for walking p->targets array in cxl_region_remove_target(), consecutive detach will have problem without that. Like this:
> 
> p->nr_targets = 4
> 
> [-1, 1, 2, 3]    # __cxl_decoder_detach() called for pos 1, remove it.
> 
> 
> p->nr_targets = 3
> 
> [-1, NULL, 2, 3] # __cxl_decoder_detach() called for pos 3, but it have no chance to be accessed by cxl_region_remove_target().
> 
> 
> Maybe Dave could drop the OOB fix, I will send out a patchset that includes both the "insert decoder in the first free slot" and the OOB fix.
> 

OOB fix dropped from cxl/next

DJ

> 
>>
>> - It covers the decoder detach site similarly (per Sashiko). The NULL
>> left by __cxl_decoder_detach() is filled on re-attach instead of being
>> appended past. Your reproduce seems like it would verify that.
>>
>> - It needs no manual-vs-auto special case. An AUTO region has exactly
>> interleave_ways slots and interleave_ways members, so first-free-slot
>> keeps the array dense whenever it is full. The manual path is
>> untouched.
>>
>> I'd probably rename to something like:
>> cxl/region: Fill first free targets[] slot during auto-discovery
> Will do
>>
>> BTW the fixes tag is not the OOB fix but is the original commit,
>> the same one you used in the OOB fix. 87805c32e6ad
> 
> Sure, Will fix it, thanks.
> 
> 
> Ming
> 
>>
>> -- Alison
>>
>>
>>> ---
>>> base-commit: 809ccef5385fa1779c7db3de43272f3fc6a87a45
>>> change-id: 20260530-fix_null_in_targets_array-124303a8ba0f
>>>
>>> Best regards,
>>> -- 
>>> Li Ming <ming.li@zohomail.com>
>>>
>>>


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-06-04 15:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-30  4:24 [PATCH] cxl/region: Fix NULL pointer within p->targets[] Li Ming
2026-05-30  5:06 ` sashiko-bot
2026-05-30 14:54   ` Li Ming
2026-06-03 22:40 ` Alison Schofield
2026-06-04 13:28   ` Li Ming
2026-06-04 15:45     ` Dave Jiang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.