* [PATCH] cxl: Handle ERR_PTR endpoint in sysfs-accessible paths
@ 2026-02-24 2:36 Davidlohr Bueso
2026-02-24 3:00 ` dan.j.williams
0 siblings, 1 reply; 3+ messages in thread
From: Davidlohr Bueso @ 2026-02-24 2:36 UTC (permalink / raw)
To: dave.jiang
Cc: dan.j.williams, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dave, linux-cxl
Fuzzying CXL triggered:
BUG: KASAN: null-ptr-deref in cxl_num_decoders_committed+0x3e/0x80 drivers/cxl/core/port.c:49
Read of size 4 at addr 0000000000000642 by task syz.0.97/2282
CPU: 2 UID: 0 PID: 2282 Comm: syz.0.97 Not tainted 7.0.0-rc1-gebd11be59f74-dirty #494 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
kasan_report+0xe0/0x110 mm/kasan/report.c:595
cxl_num_decoders_committed+0x3e/0x80 drivers/cxl/core/port.c:49
cxl_mem_sanitize+0x141/0x170 drivers/cxl/core/mbox.c:1304
security_sanitize_store+0xb0/0x120 drivers/cxl/core/memdev.c:173
dev_attr_store+0x46/0x70 drivers/base/core.c:2437
sysfs_kf_write+0x95/0xb0 fs/sysfs/file.c:142
kernfs_fop_write_iter+0x276/0x330 fs/kernfs/file.c:352
new_sync_write fs/read_write.c:595 [inline]
vfs_write+0x5df/0xaa0 fs/read_write.c:688
ksys_write+0x103/0x1f0 fs/read_write.c:740
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x111/0x680 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f60a584ba79
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f60a42a7038 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f60a5ab5fa0 RCX: 00007f60a584ba79
RDX: 0000000000000002 RSI: 00002000000001c0 RDI: 0000000000000003
RBP: 00007f60a58a49df R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f60a5ab6038 R14: 00007f60a5ab5fa0 R15: 00007ffe58fad8b8
</TASK>
cxl_memdev_alloc() initializes cxlmd->endpoint to ERR_PTR(-ENXIO) as a
sentinel until the endpoint driver has probed. During that window the
memdev sysfs attributes are already visible, as soon as device_add()
completes.
Use IS_ERR_OR_NULL() instead of just NULL checks in all sysfs paths that
use cxlmd->endpoint to avoid the user prematurely accessing it.
Fixes: 29317f8dc6ed ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation")
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/core/edac.c | 2 +-
drivers/cxl/core/mbox.c | 2 +-
drivers/cxl/core/memdev.c | 2 +-
drivers/cxl/core/region.c | 3 ++-
4 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index b321971fef58..65b37460d1a8 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -1149,7 +1149,7 @@ static bool cxl_is_memdev_memory_online(const struct cxl_memdev *cxlmd)
{
struct cxl_port *port = cxlmd->endpoint;
- if (port && cxl_num_decoders_committed(port))
+ if (!IS_ERR_OR_NULL(port) && cxl_num_decoders_committed(port))
return true;
return false;
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index e7a6452bf544..f2308db13c5d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1301,7 +1301,7 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
* Require an endpoint to be safe otherwise the driver can not
* be sure that the device is unmapped.
*/
- if (endpoint && cxl_num_decoders_committed(endpoint) == 0)
+ if (!IS_ERR_OR_NULL(endpoint) && cxl_num_decoders_committed(endpoint) == 0)
return __cxl_mem_sanitize(mds, cmd);
return -EBUSY;
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 273c22118d3d..f0a97aaaaf35 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -236,7 +236,7 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
int rc;
port = cxlmd->endpoint;
- if (!port || !is_cxl_endpoint(port))
+ if (IS_ERR_OR_NULL(port) || !is_cxl_endpoint(port))
return -EINVAL;
ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 42874948b589..5ff5a086ca0d 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2956,7 +2956,8 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
.dpa = dpa,
};
port = cxlmd->endpoint;
- if (port && is_cxl_endpoint(port) && cxl_num_decoders_committed(port))
+ if (!IS_ERR_OR_NULL(port) && is_cxl_endpoint(port) &&
+ cxl_num_decoders_committed(port))
device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);
return ctx.cxlr;
--
2.39.5
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] cxl: Handle ERR_PTR endpoint in sysfs-accessible paths
2026-02-24 2:36 [PATCH] cxl: Handle ERR_PTR endpoint in sysfs-accessible paths Davidlohr Bueso
@ 2026-02-24 3:00 ` dan.j.williams
2026-02-24 18:05 ` Davidlohr Bueso
0 siblings, 1 reply; 3+ messages in thread
From: dan.j.williams @ 2026-02-24 3:00 UTC (permalink / raw)
To: Davidlohr Bueso, dave.jiang
Cc: dan.j.williams, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dave, linux-cxl
Davidlohr Bueso wrote:
> Fuzzying CXL triggered:
>
> BUG: KASAN: null-ptr-deref in cxl_num_decoders_committed+0x3e/0x80 drivers/cxl/core/port.c:49
> Read of size 4 at addr 0000000000000642 by task syz.0.97/2282
>
> CPU: 2 UID: 0 PID: 2282 Comm: syz.0.97 Not tainted 7.0.0-rc1-gebd11be59f74-dirty #494 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
> kasan_report+0xe0/0x110 mm/kasan/report.c:595
> cxl_num_decoders_committed+0x3e/0x80 drivers/cxl/core/port.c:49
> cxl_mem_sanitize+0x141/0x170 drivers/cxl/core/mbox.c:1304
> security_sanitize_store+0xb0/0x120 drivers/cxl/core/memdev.c:173
> dev_attr_store+0x46/0x70 drivers/base/core.c:2437
> sysfs_kf_write+0x95/0xb0 fs/sysfs/file.c:142
> kernfs_fop_write_iter+0x276/0x330 fs/kernfs/file.c:352
> new_sync_write fs/read_write.c:595 [inline]
> vfs_write+0x5df/0xaa0 fs/read_write.c:688
> ksys_write+0x103/0x1f0 fs/read_write.c:740
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x111/0x680 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f60a584ba79
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f60a42a7038 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 00007f60a5ab5fa0 RCX: 00007f60a584ba79
> RDX: 0000000000000002 RSI: 00002000000001c0 RDI: 0000000000000003
> RBP: 00007f60a58a49df R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f60a5ab6038 R14: 00007f60a5ab5fa0 R15: 00007ffe58fad8b8
> </TASK>
>
> cxl_memdev_alloc() initializes cxlmd->endpoint to ERR_PTR(-ENXIO) as a
> sentinel until the endpoint driver has probed. During that window the
> memdev sysfs attributes are already visible, as soon as device_add()
> completes.
>
> Use IS_ERR_OR_NULL() instead of just NULL checks in all sysfs paths that
> use cxlmd->endpoint to avoid the user prematurely accessing it.
Ugh, no, like I said here [1] I think endpoint non-NULL @endpoint
confusion is an indicator of other issues in the code.
[1]: http://lore.kernel.org/69813ac070f79_55fa1005c@dwillia2-mobl4.notmuch
>
> Fixes: 29317f8dc6ed ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation")
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/core/edac.c | 2 +-
> drivers/cxl/core/mbox.c | 2 +-
> drivers/cxl/core/memdev.c | 2 +-
> drivers/cxl/core/region.c | 3 ++-
> 4 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> index b321971fef58..65b37460d1a8 100644
> --- a/drivers/cxl/core/edac.c
> +++ b/drivers/cxl/core/edac.c
> @@ -1149,7 +1149,7 @@ static bool cxl_is_memdev_memory_online(const struct cxl_memdev *cxlmd)
> {
> struct cxl_port *port = cxlmd->endpoint;
>
> - if (port && cxl_num_decoders_committed(port))
> + if (!IS_ERR_OR_NULL(port) && cxl_num_decoders_committed(port))
> return true;
This NULL check was never valid. How can a edac code ever encounter the
port detached case?
> return false;
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index e7a6452bf544..f2308db13c5d 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1301,7 +1301,7 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
> * Require an endpoint to be safe otherwise the driver can not
> * be sure that the device is unmapped.
> */
> - if (endpoint && cxl_num_decoders_committed(endpoint) == 0)
> + if (!IS_ERR_OR_NULL(endpoint) && cxl_num_decoders_committed(endpoint) == 0)
> return __cxl_mem_sanitize(mds, cmd);
This looks to be abusing ->endpoint when it should be checking
cxlmd->dev.driver. The rule is @endpoint is always valid if
cxlmd->dev.driver is non-NULL.
> return -EBUSY;
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 273c22118d3d..f0a97aaaaf35 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -236,7 +236,7 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
> int rc;
>
> port = cxlmd->endpoint;
> - if (!port || !is_cxl_endpoint(port))
> + if (IS_ERR_OR_NULL(port) || !is_cxl_endpoint(port))
> return -EINVAL;
If you can trigger this, which I do not think you can after Li Ming's
recent fix, then the proper fix for this is in [1].
> ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 42874948b589..5ff5a086ca0d 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2956,7 +2956,8 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
> .dpa = dpa,
> };
> port = cxlmd->endpoint;
> - if (port && is_cxl_endpoint(port) && cxl_num_decoders_committed(port))
> + if (!IS_ERR_OR_NULL(port) && is_cxl_endpoint(port) &&
> + cxl_num_decoders_committed(port))
> device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);
I believe this was the exact issue fixed by Li Ming's fix.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] cxl: Handle ERR_PTR endpoint in sysfs-accessible paths
2026-02-24 3:00 ` dan.j.williams
@ 2026-02-24 18:05 ` Davidlohr Bueso
0 siblings, 0 replies; 3+ messages in thread
From: Davidlohr Bueso @ 2026-02-24 18:05 UTC (permalink / raw)
To: dan.j.williams
Cc: dave.jiang, jonathan.cameron, alison.schofield, vishal.l.verma,
ira.weiny, linux-cxl, ming.li
+Cc Li.
.
On Mon, 23 Feb 2026, dan.j.williams@intel.com wrote:
>Davidlohr Bueso wrote:
>> Fuzzying CXL triggered:
>>
>> BUG: KASAN: null-ptr-deref in cxl_num_decoders_committed+0x3e/0x80 drivers/cxl/core/port.c:49
>> Read of size 4 at addr 0000000000000642 by task syz.0.97/2282
>>
>> CPU: 2 UID: 0 PID: 2282 Comm: syz.0.97 Not tainted 7.0.0-rc1-gebd11be59f74-dirty #494 PREEMPT(full)
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
>> Call Trace:
>> <TASK>
>> __dump_stack lib/dump_stack.c:94 [inline]
>> dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
>> kasan_report+0xe0/0x110 mm/kasan/report.c:595
>> cxl_num_decoders_committed+0x3e/0x80 drivers/cxl/core/port.c:49
>> cxl_mem_sanitize+0x141/0x170 drivers/cxl/core/mbox.c:1304
>> security_sanitize_store+0xb0/0x120 drivers/cxl/core/memdev.c:173
>> dev_attr_store+0x46/0x70 drivers/base/core.c:2437
>> sysfs_kf_write+0x95/0xb0 fs/sysfs/file.c:142
>> kernfs_fop_write_iter+0x276/0x330 fs/kernfs/file.c:352
>> new_sync_write fs/read_write.c:595 [inline]
>> vfs_write+0x5df/0xaa0 fs/read_write.c:688
>> ksys_write+0x103/0x1f0 fs/read_write.c:740
>> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>> do_syscall_64+0x111/0x680 arch/x86/entry/syscall_64.c:94
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7f60a584ba79
>> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
>> RSP: 002b:00007f60a42a7038 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>> RAX: ffffffffffffffda RBX: 00007f60a5ab5fa0 RCX: 00007f60a584ba79
>> RDX: 0000000000000002 RSI: 00002000000001c0 RDI: 0000000000000003
>> RBP: 00007f60a58a49df R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> R13: 00007f60a5ab6038 R14: 00007f60a5ab5fa0 R15: 00007ffe58fad8b8
>> </TASK>
>>
>> cxl_memdev_alloc() initializes cxlmd->endpoint to ERR_PTR(-ENXIO) as a
>> sentinel until the endpoint driver has probed. During that window the
>> memdev sysfs attributes are already visible, as soon as device_add()
>> completes.
>>
>> Use IS_ERR_OR_NULL() instead of just NULL checks in all sysfs paths that
>> use cxlmd->endpoint to avoid the user prematurely accessing it.
>
>Ugh, no, like I said here [1] I think endpoint non-NULL @endpoint
>confusion is an indicator of other issues in the code.
>
>[1]: http://lore.kernel.org/69813ac070f79_55fa1005c@dwillia2-mobl4.notmuch
Ah, I missed this.
>>
>> Fixes: 29317f8dc6ed ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation")
>> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
>> ---
>> drivers/cxl/core/edac.c | 2 +-
>> drivers/cxl/core/mbox.c | 2 +-
>> drivers/cxl/core/memdev.c | 2 +-
>> drivers/cxl/core/region.c | 3 ++-
>> 4 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>> index b321971fef58..65b37460d1a8 100644
>> --- a/drivers/cxl/core/edac.c
>> +++ b/drivers/cxl/core/edac.c
>> @@ -1149,7 +1149,7 @@ static bool cxl_is_memdev_memory_online(const struct cxl_memdev *cxlmd)
>> {
>> struct cxl_port *port = cxlmd->endpoint;
>>
>> - if (port && cxl_num_decoders_committed(port))
>> + if (!IS_ERR_OR_NULL(port) && cxl_num_decoders_committed(port))
>> return true;
>
>This NULL check was never valid. How can a edac code ever encounter the
>port detached case?
>
>> return false;
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index e7a6452bf544..f2308db13c5d 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1301,7 +1301,7 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
>> * Require an endpoint to be safe otherwise the driver can not
>> * be sure that the device is unmapped.
>> */
>> - if (endpoint && cxl_num_decoders_committed(endpoint) == 0)
>> + if (!IS_ERR_OR_NULL(endpoint) && cxl_num_decoders_committed(endpoint) == 0)
>> return __cxl_mem_sanitize(mds, cmd);
>
>This looks to be abusing ->endpoint when it should be checking
>cxlmd->dev.driver. The rule is @endpoint is always valid if
>cxlmd->dev.driver is non-NULL.
Ok, I will send a patch for checking ->dev.driver instead.
>
>> return -EBUSY;
>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>> index 273c22118d3d..f0a97aaaaf35 100644
>> --- a/drivers/cxl/core/memdev.c
>> +++ b/drivers/cxl/core/memdev.c
>> @@ -236,7 +236,7 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>> int rc;
>>
>> port = cxlmd->endpoint;
>> - if (!port || !is_cxl_endpoint(port))
>> + if (IS_ERR_OR_NULL(port) || !is_cxl_endpoint(port))
>> return -EINVAL;
>
>If you can trigger this, which I do not think you can after Li Ming's
>recent fix, then the proper fix for this is in [1].
The only path I've been able to reproduce is Sanitize. This was testing
the 'fixes' branch with all the patches Dave picked up yesterday including
Li's. So it's possible the bug could have triggered without it, but cannot
say for sure.
>
>> ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 42874948b589..5ff5a086ca0d 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2956,7 +2956,8 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
>> .dpa = dpa,
>> };
>> port = cxlmd->endpoint;
>> - if (port && is_cxl_endpoint(port) && cxl_num_decoders_committed(port))
>> + if (!IS_ERR_OR_NULL(port) && is_cxl_endpoint(port) &&
>> + cxl_num_decoders_committed(port))
>> device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);
>
>I believe this was the exact issue fixed by Li Ming's fix.
Looking at the splat, yeah I agree. And of course that fix holding the host
lock is much nicer than papering over it with nil sentinel, or this patch.
Thanks,
Davidlohr
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-02-24 18:05 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-24 2:36 [PATCH] cxl: Handle ERR_PTR endpoint in sysfs-accessible paths Davidlohr Bueso
2026-02-24 3:00 ` dan.j.williams
2026-02-24 18:05 ` Davidlohr Bueso
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox