Re: [PATCH] cxl: Handle ERR_PTR endpoint in sysfs-accessible paths

Linux CXL
 help / color / mirror / Atom feed

From: <dan.j.williams@intel.com>
To: Davidlohr Bueso <dave@stgolabs.net>, <dave.jiang@intel.com>
Cc: <dan.j.williams@intel.com>, <jonathan.cameron@huawei.com>,
	<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
	<ira.weiny@intel.com>, <dave@stgolabs.net>,
	<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH] cxl: Handle ERR_PTR endpoint in sysfs-accessible paths
Date: Mon, 23 Feb 2026 19:00:44 -0800	[thread overview]
Message-ID: <699d145c1da21_1cc5100e4@dwillia2-mobl4.notmuch> (raw)
In-Reply-To: <20260224023609.1059138-1-dave@stgolabs.net>

Davidlohr Bueso wrote:
> Fuzzying CXL triggered:
> 
> BUG: KASAN: null-ptr-deref in cxl_num_decoders_committed+0x3e/0x80 drivers/cxl/core/port.c:49
> Read of size 4 at addr 0000000000000642 by task syz.0.97/2282
> 
> CPU: 2 UID: 0 PID: 2282 Comm: syz.0.97 Not tainted 7.0.0-rc1-gebd11be59f74-dirty #494 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:94 [inline]
>  dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
>  kasan_report+0xe0/0x110 mm/kasan/report.c:595
>  cxl_num_decoders_committed+0x3e/0x80 drivers/cxl/core/port.c:49
>  cxl_mem_sanitize+0x141/0x170 drivers/cxl/core/mbox.c:1304
>  security_sanitize_store+0xb0/0x120 drivers/cxl/core/memdev.c:173
>  dev_attr_store+0x46/0x70 drivers/base/core.c:2437
>  sysfs_kf_write+0x95/0xb0 fs/sysfs/file.c:142
>  kernfs_fop_write_iter+0x276/0x330 fs/kernfs/file.c:352
>  new_sync_write fs/read_write.c:595 [inline]
>  vfs_write+0x5df/0xaa0 fs/read_write.c:688
>  ksys_write+0x103/0x1f0 fs/read_write.c:740
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0x111/0x680 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f60a584ba79
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f60a42a7038 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 00007f60a5ab5fa0 RCX: 00007f60a584ba79
> RDX: 0000000000000002 RSI: 00002000000001c0 RDI: 0000000000000003
> RBP: 00007f60a58a49df R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f60a5ab6038 R14: 00007f60a5ab5fa0 R15: 00007ffe58fad8b8
>  </TASK>
> 
> cxl_memdev_alloc() initializes cxlmd->endpoint to ERR_PTR(-ENXIO) as a
> sentinel until the endpoint driver has probed. During that window the
> memdev sysfs attributes are already visible, as soon as device_add()
> completes.
> 
> Use IS_ERR_OR_NULL() instead of just NULL checks in all sysfs paths that
> use cxlmd->endpoint to avoid the user prematurely accessing it.

Ugh, no, like I said here [1] I think endpoint non-NULL @endpoint
confusion is an indicator of other issues in the code.

[1]: http://lore.kernel.org/69813ac070f79_55fa1005c@dwillia2-mobl4.notmuch

> 
> Fixes: 29317f8dc6ed ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation")
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  drivers/cxl/core/edac.c   | 2 +-
>  drivers/cxl/core/mbox.c   | 2 +-
>  drivers/cxl/core/memdev.c | 2 +-
>  drivers/cxl/core/region.c | 3 ++-
>  4 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> index b321971fef58..65b37460d1a8 100644
> --- a/drivers/cxl/core/edac.c
> +++ b/drivers/cxl/core/edac.c
> @@ -1149,7 +1149,7 @@ static bool cxl_is_memdev_memory_online(const struct cxl_memdev *cxlmd)
>  {
>  	struct cxl_port *port = cxlmd->endpoint;
>  
> -	if (port && cxl_num_decoders_committed(port))
> +	if (!IS_ERR_OR_NULL(port) && cxl_num_decoders_committed(port))
>  		return true;

This NULL check was never valid. How can a edac code ever encounter the
port detached case?

>  	return false;
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index e7a6452bf544..f2308db13c5d 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1301,7 +1301,7 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
>  	 * Require an endpoint to be safe otherwise the driver can not
>  	 * be sure that the device is unmapped.
>  	 */
> -	if (endpoint && cxl_num_decoders_committed(endpoint) == 0)
> +	if (!IS_ERR_OR_NULL(endpoint) && cxl_num_decoders_committed(endpoint) == 0)
>  		return __cxl_mem_sanitize(mds, cmd);

This looks to be abusing ->endpoint when it should be checking
cxlmd->dev.driver. The rule is @endpoint is always valid if
cxlmd->dev.driver is non-NULL.

>  	return -EBUSY;
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 273c22118d3d..f0a97aaaaf35 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -236,7 +236,7 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>  	int rc;
>  
>  	port = cxlmd->endpoint;
> -	if (!port || !is_cxl_endpoint(port))
> +	if (IS_ERR_OR_NULL(port) || !is_cxl_endpoint(port))
>  		return -EINVAL;

If you can trigger this, which I do not think you can after Li Ming's
recent fix, then the proper fix for this is in [1].
 
>  	ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 42874948b589..5ff5a086ca0d 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2956,7 +2956,8 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
>  		.dpa = dpa,
>  	};
>  	port = cxlmd->endpoint;
> -	if (port && is_cxl_endpoint(port) && cxl_num_decoders_committed(port))
> +	if (!IS_ERR_OR_NULL(port) && is_cxl_endpoint(port) &&
> +	    cxl_num_decoders_committed(port))
>  		device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);

I believe this was the exact issue fixed by Li Ming's fix.

next prev parent reply	other threads:[~2026-02-24  3:00 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24  2:36 [PATCH] cxl: Handle ERR_PTR endpoint in sysfs-accessible paths Davidlohr Bueso
2026-02-24  3:00 ` dan.j.williams [this message]
2026-02-24 18:05   ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=699d145c1da21_1cc5100e4@dwillia2-mobl4.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=alison.schofield@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox