[PATCH] cxl/region: Fix a race bug in delete_region

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] cxl/region: Fix a race bug in delete_region_store
@ 2026-03-08 18:59 Sungwoo Kim
  2026-03-09 12:00 ` Jonathan Cameron
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Sungwoo Kim @ 2026-03-08 18:59 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams, Ben Widawsky
  Cc: daveti, Sungwoo Kim, Jonathan Cameron, linux-cxl, linux-kernel

A race exists when two concurrent sysfs writes to delete_region specify
the same region name. Both calls succeed in cxl_find_region_by_name()
(which only does device_find_child_by_name and takes a reference), and
both then proceed to call devm_release_action(). The first call atomically
removes and releases the devres entry successfully. The second call finds
no matching entry, causing devres_release() to return -ENOENT, which trips
the WARN_ON.

Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
followed by a manual call to unregister_region(). devm_remove_action_nowarn()
removes the devres tracking entry and returns an error code.

------------[ cut here ]------------
WARNING: drivers/base/devres.c:824 at devm_release_action drivers/base/devres.c:824 [inline], CPU#0: syz.1.12224/47589
WARNING: drivers/base/devres.c:824 at devm_release_action+0x2b2/0x360 drivers/base/devres.c:817, CPU#0: syz.1.12224/47589
Modules linked in:
CPU: 0 UID: 0 PID: 47589 Comm: syz.1.12224 Not tainted 6.19.0-g15a37b05f387 #10 PREEMPT(lazy) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
RIP: 0010:devm_release_action drivers/base/devres.c:824 [inline]
RIP: 0010:devm_release_action+0x2b2/0x360 drivers/base/devres.c:817
Code: 41 5c 41 5d 41 5e 41 5f e9 86 1a b3 fd e8 86 71 33 fe e8 81 71 33 fe 48 8b 74 24 08 4c 89 f7 e8 04 34 8a 01 e8 6f 71 33 fe 90 <0f> 0b 90 eb c6 e8 64 71 33 fe 90 0f 0b e8 2c f1 71 fe e9 f4 fd ff
RSP: 0018:ffff88813e5f7bf8 EFLAGS: 00010282
RAX: 0000000000000125 RBX: dffffc0000000000 RCX: ffffffff835d26a1
RDX: 0000000000080000 RSI: ffffc9000685b000 RDI: ffffffff8e875adf
RBP: ffff888101059010 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888101059428
R13: ffff888100e05c30 R14: ffff8881010593e8 R15: ffff888101059428
FS:  00007fae6afb46c0(0000) GS:ffff8882a526d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fae6a12b8c0 CR3: 0000000127f1b005 CR4: 0000000000770ef0
PKRU: 80000000
Call Trace:
 <TASK>
 delete_region_store+0x106/0x1d0 drivers/cxl/core/region.c:2753
 dev_attr_store+0x58/0x80 drivers/base/core.c:2437
 sysfs_kf_write+0xf2/0x150 fs/sysfs/file.c:142
 kernfs_fop_write_iter+0x3d6/0x5d0 fs/kernfs/file.c:352
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x68e/0x1050 fs/read_write.c:688
 ksys_write+0x12a/0x250 fs/read_write.c:740
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfc/0x670 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fae6a19c669
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fae6afb4028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fae6a415fa0 RCX: 00007fae6a19c669
RDX: 0000000000000008 RSI: 0000200000000580 RDI: 0000000000000005
RBP: 00007fae6a232c71 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fae6a416038 R14: 00007fae6a415fa0 R15: 00007ffc10522c58
 </TASK>
irq event stamp: 885
hardirqs last  enabled at (893): [<ffffffff8b104c4b>] __up_console_sem+0x7b/0x90 kernel/printk/printk.c:347
hardirqs last disabled at (902): [<ffffffff8b104c30>] __up_console_sem+0x60/0x90 kernel/printk/printk.c:345
softirqs last  enabled at (822): [<ffffffff8af73b93>] __do_softirq kernel/softirq.c:656 [inline]
softirqs last  enabled at (822): [<ffffffff8af73b93>] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last  enabled at (822): [<ffffffff8af73b93>] __irq_exit_rcu+0xb3/0xe0 kernel/softirq.c:723
softirqs last disabled at (811): [<ffffffff8af73b93>] __do_softirq kernel/softirq.c:656 [inline]
softirqs last disabled at (811): [<ffffffff8af73b93>] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last disabled at (811): [<ffffffff8af73b93>] __irq_exit_rcu+0xb3/0xe0 kernel/softirq.c:723
---[ end trace 0000000000000000 ]---

Fixes: 779dd20cfb56 ("cxl/region: Add region creation support")
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
---
 drivers/cxl/core/region.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 08fa3deef70ab..7ade9aa2aeecc 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
 	struct cxl_port *port = to_cxl_port(dev->parent);
 	struct cxl_region *cxlr;
+	int err;
 
 	cxlr = cxl_find_region_by_name(cxlrd, buf);
 	if (IS_ERR(cxlr))
 		return PTR_ERR(cxlr);
 
-	devm_release_action(port->uport_dev, unregister_region, cxlr);
+	err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
+				       cxlr);
+	if (err) {
+		put_device(&cxlr->dev);
+		return err;
+	}
+	unregister_region(cxlr);
 	put_device(&cxlr->dev);
 
 	return len;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] cxl/region: Fix a race bug in delete_region_store
  2026-03-08 18:59 [PATCH] cxl/region: Fix a race bug in delete_region_store Sungwoo Kim
@ 2026-03-09 12:00 ` Jonathan Cameron
  2026-03-09 17:56   ` Sungwoo Kim
  2026-03-09 20:32 ` Ira Weiny
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron @ 2026-03-09 12:00 UTC (permalink / raw)
  To: Sungwoo Kim
  Cc: Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Ben Widawsky, daveti, linux-cxl,
	linux-kernel

On Sun, 8 Mar 2026 14:59:58 -0400
Sungwoo Kim <iam@sung-woo.kim> wrote:

> A race exists when two concurrent sysfs writes to delete_region specify
> the same region name. Both calls succeed in cxl_find_region_by_name()
> (which only does device_find_child_by_name and takes a reference), and
> both then proceed to call devm_release_action(). The first call atomically
> removes and releases the devres entry successfully. The second call finds
> no matching entry, causing devres_release() to return -ENOENT, which trips
> the WARN_ON.
> 
> Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> removes the devres tracking entry and returns an error code.

Naive question (or just me being lazy).  Why can't we take the
write lock on cxl_rwsem.region?

Jonathan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cxl/region: Fix a race bug in delete_region_store
  2026-03-09 12:00 ` Jonathan Cameron
@ 2026-03-09 17:56   ` Sungwoo Kim
  2026-03-09 18:10     ` Jonathan Cameron
  0 siblings, 1 reply; 8+ messages in thread
From: Sungwoo Kim @ 2026-03-09 17:56 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Ben Widawsky, daveti, linux-cxl,
	linux-kernel

On Mon, Mar 9, 2026 at 8:00 AM Jonathan Cameron
<jonathan.cameron@huawei.com> wrote:
>
> On Sun, 8 Mar 2026 14:59:58 -0400
> Sungwoo Kim <iam@sung-woo.kim> wrote:
>
> > A race exists when two concurrent sysfs writes to delete_region specify
> > the same region name. Both calls succeed in cxl_find_region_by_name()
> > (which only does device_find_child_by_name and takes a reference), and
> > both then proceed to call devm_release_action(). The first call atomically
> > removes and releases the devres entry successfully. The second call finds
> > no matching entry, causing devres_release() to return -ENOENT, which trips
> > the WARN_ON.
> >
> > Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> > followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> > removes the devres tracking entry and returns an error code.
>
> Naive question (or just me being lazy).  Why can't we take the
> write lock on cxl_rwsem.region?

Thanks for your review. I've just tested your suggestion, but it
caused an ABBA deadlock:

task 1:
create_pmem_region_store
  __device_attach()         ...dev_lock()
    cxl_region_can_probe()  ...lock(cxl_rwsem.region)

task 2:
delete_region_store()       ...lock(cxl_rwsem.region)
  unregister_region()
    device_del()            ...dev_lock()

One way to avoid a deadlock might be to not add an additional lock.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cxl/region: Fix a race bug in delete_region_store
  2026-03-09 17:56   ` Sungwoo Kim
@ 2026-03-09 18:10     ` Jonathan Cameron
  0 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2026-03-09 18:10 UTC (permalink / raw)
  To: Sungwoo Kim
  Cc: Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Ben Widawsky, daveti, linux-cxl,
	linux-kernel

On Mon, 9 Mar 2026 13:56:33 -0400
Sungwoo Kim <iam@sung-woo.kim> wrote:

> On Mon, Mar 9, 2026 at 8:00 AM Jonathan Cameron
> <jonathan.cameron@huawei.com> wrote:
> >
> > On Sun, 8 Mar 2026 14:59:58 -0400
> > Sungwoo Kim <iam@sung-woo.kim> wrote:
> >  
> > > A race exists when two concurrent sysfs writes to delete_region specify
> > > the same region name. Both calls succeed in cxl_find_region_by_name()
> > > (which only does device_find_child_by_name and takes a reference), and
> > > both then proceed to call devm_release_action(). The first call atomically
> > > removes and releases the devres entry successfully. The second call finds
> > > no matching entry, causing devres_release() to return -ENOENT, which trips
> > > the WARN_ON.
> > >
> > > Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> > > followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> > > removes the devres tracking entry and returns an error code.  
> >
> > Naive question (or just me being lazy).  Why can't we take the
> > write lock on cxl_rwsem.region?  
> 
> Thanks for your review. I've just tested your suggestion, but it
> caused an ABBA deadlock:
> 
> task 1:
> create_pmem_region_store
>   __device_attach()         ...dev_lock()
>     cxl_region_can_probe()  ...lock(cxl_rwsem.region)
> 
> task 2:
> delete_region_store()       ...lock(cxl_rwsem.region)
>   unregister_region()
>     device_del()            ...dev_lock()
> 

Thanks for chasing that down. (I was indeed just being lazy!)

Let's wait a few days to get inputs from others on this.

One horrible option would just be to have a single purpose lock to
serialize handling writes to the sysfs file. I don't much like
that solution however!

Thanks,

Jonathan

> One way to avoid a deadlock might be to not add an additional lock.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cxl/region: Fix a race bug in delete_region_store
  2026-03-08 18:59 [PATCH] cxl/region: Fix a race bug in delete_region_store Sungwoo Kim
  2026-03-09 12:00 ` Jonathan Cameron
@ 2026-03-09 20:32 ` Ira Weiny
  2026-03-10 18:36 ` Davidlohr Bueso
  2026-03-10 22:53 ` Dan Williams
  3 siblings, 0 replies; 8+ messages in thread
From: Ira Weiny @ 2026-03-09 20:32 UTC (permalink / raw)
  To: Sungwoo Kim, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Ben Widawsky
  Cc: daveti, Sungwoo Kim, Jonathan Cameron, linux-cxl, linux-kernel

Sungwoo Kim wrote:
> A race exists when two concurrent sysfs writes to delete_region specify
> the same region name. Both calls succeed in cxl_find_region_by_name()
> (which only does device_find_child_by_name and takes a reference), and
> both then proceed to call devm_release_action(). The first call atomically
> removes and releases the devres entry successfully. The second call finds
> no matching entry, causing devres_release() to return -ENOENT, which trips
> the WARN_ON.
> 
> Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> removes the devres tracking entry and returns an error code.
> 

[snip]

> 
> Fixes: 779dd20cfb56 ("cxl/region: Add region creation support")
> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cxl/region: Fix a race bug in delete_region_store
  2026-03-08 18:59 [PATCH] cxl/region: Fix a race bug in delete_region_store Sungwoo Kim
  2026-03-09 12:00 ` Jonathan Cameron
  2026-03-09 20:32 ` Ira Weiny
@ 2026-03-10 18:36 ` Davidlohr Bueso
  2026-03-10 22:53 ` Dan Williams
  3 siblings, 0 replies; 8+ messages in thread
From: Davidlohr Bueso @ 2026-03-10 18:36 UTC (permalink / raw)
  To: Sungwoo Kim
  Cc: Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Ben Widawsky, daveti, linux-cxl,
	linux-kernel

On Sun, 08 Mar 2026, Sungwoo Kim wrote:

>A race exists when two concurrent sysfs writes to delete_region specify
>the same region name. Both calls succeed in cxl_find_region_by_name()
>(which only does device_find_child_by_name and takes a reference), and
>both then proceed to call devm_release_action(). The first call atomically
>removes and releases the devres entry successfully. The second call finds
>no matching entry, causing devres_release() to return -ENOENT, which trips
>the WARN_ON.

afaict the splat is also triggable via devres_release_all(), ie: unbinding
the host bridge. Basically cxl_find_region_by_name() succeeds because the
region hasn't been device_del()'d yet:

CPU0					CPU1
devres_release_all()
   // take devres_lock
   remove_nodes(devres_head) // mv to local todo
   // drop devres_lock                   delete_region_store()
                                           cxlr = cxl_find_region_by_name()  // success
                                           devm_release_action(unregister_region)
                                             devres_release()
                                               devres_remove()
                                                 // hold devres_lock
                                                 find_dr(devres_head) // does not find it
                                             WARN_ON(-ENOENT)
   release_nodes() // drain todo
     unregister_region(cxlr) // release() cb
       device_del()
       
>Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
>followed by a manual call to unregister_region(). devm_remove_action_nowarn()
>removes the devres tracking entry and returns an error code.

While devm_remove_action_nowarn() has only a single driver user (gpio), using it
here would seem to fit the requirement of independent lifetime management; and
ultimately these races seem benign as unregister_region() is only being called
once.

>------------[ cut here ]------------
>WARNING: drivers/base/devres.c:824 at devm_release_action drivers/base/devres.c:824 [inline], CPU#0: syz.1.12224/47589
>WARNING: drivers/base/devres.c:824 at devm_release_action+0x2b2/0x360 drivers/base/devres.c:817, CPU#0: syz.1.12224/47589

I see you are using syzkaller; I added cxl support as well a while back based
on the usb fuzzying approach, and also triggered this issue (which was in my
to-investigate backlog, so glad you ran into this).

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cxl/region: Fix a race bug in delete_region_store
  2026-03-08 18:59 [PATCH] cxl/region: Fix a race bug in delete_region_store Sungwoo Kim
                   ` (2 preceding siblings ...)
  2026-03-10 18:36 ` Davidlohr Bueso
@ 2026-03-10 22:53 ` Dan Williams
  2026-03-11  6:55   ` Sungwoo Kim
  3 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2026-03-10 22:53 UTC (permalink / raw)
  To: Sungwoo Kim, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Ben Widawsky
  Cc: daveti, Sungwoo Kim, Jonathan Cameron, linux-cxl, linux-kernel

Sungwoo Kim wrote:
> A race exists when two concurrent sysfs writes to delete_region specify
> the same region name. Both calls succeed in cxl_find_region_by_name()
> (which only does device_find_child_by_name and takes a reference), and
> both then proceed to call devm_release_action(). The first call atomically
> removes and releases the devres entry successfully. The second call finds
> no matching entry, causing devres_release() to return -ENOENT, which trips
> the WARN_ON.

Ugh, yes, good find.

I think I understand the pathology of how this happened. Touching devres
actions needs to happen under device lock or with the knowledge that the
device hosting the action is currently bound to a driver. The
port->uport_dev is known to be bound in this path because decoder
devices are only registered in the bound lifetime of ->uport_dev.

The device_lock() is not needed to make sure the release action can be
called, however, *some* synchronization is needed for racing requesters
as you found.

> Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> removes the devres tracking entry and returns an error code.

No, that is not an acceptable or comprehensive fix. A subsystem should
never try to double unregister a device. Keep in mind the decoder sysfs
stays alive while the devres_release_all() actions are running for
port->uport_dev.  As a result, yes, Davidlohr is right. You can
theoretically observe the unregister_region() release action firing
while delete_region_store() is running.

The only comprehensive fix I currently see to that problem is to indeed
get it back synchronized under the device lock. This prevents multiple
requesters from colliding, and from devres_release_all() deleting an
action that delete_region_store() already committed to handling.

This looks something like schedule the devres action to be released in
workqueue under guard(device)(&port->udev).

It is ugly, but it may be the case that this wants to synchronously
delete the region, and asynchronously cleanup the release action.

I.e. unregister_region() grows a new:

	if (!test_and_set_bit(CXL_REGION_F_DELETE, ...)

...flag to enforce only one path successfully deletes.
delete_region_store() calls unregister_region() directly. Then the
workqueue's job is to trigger the release action under the lock only if
port->uport_dev is still driver-attached by the time the workqueue runs.

In the meantime the workqueue needs to hold a reference on the region
device until it can observe the state of CXL_REGION_F_DELETE under the
device lock.

Otherwise I suspect if you put a device_lock() in delete_region_store()
it will deadlock.

> ------------[ cut here ]------------
> WARNING: drivers/base/devres.c:824 at devm_release_action drivers/base/devres.c:824 [inline], CPU#0: syz.1.12224/47589
> WARNING: drivers/base/devres.c:824 at devm_release_action+0x2b2/0x360 drivers/base/devres.c:817, CPU#0: syz.1.12224/47589

For the next posting, these 2 lines are sufficient, no need for all the
extra detail of a full backtrace.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cxl/region: Fix a race bug in delete_region_store
  2026-03-10 22:53 ` Dan Williams
@ 2026-03-11  6:55   ` Sungwoo Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Sungwoo Kim @ 2026-03-11  6:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Ben Widawsky, daveti, linux-cxl,
	linux-kernel

On Tue, Mar 10, 2026 at 6:54 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Sungwoo Kim wrote:
> > Fix this by replacing devm_release_action() with devm_remove_action_nowarn()
> > followed by a manual call to unregister_region(). devm_remove_action_nowarn()
> > removes the devres tracking entry and returns an error code.
>
> No, that is not an acceptable or comprehensive fix. A subsystem should
> never try to double unregister a device. Keep in mind the decoder sysfs
> stays alive while the devres_release_all() actions are running for
> port->uport_dev.  As a result, yes, Davidlohr is right. You can
> theoretically observe the unregister_region() release action firing
> while delete_region_store() is running.

I appreciate your review and suggestion of a new patch. It seems more
complicated than I expected!

I'm new to CXL and device drivers, so I would like to confirm that my
understanding is correct.
- CXL tolpology could be: [upstream] host -> switch(s) -> endpoints(s)
[downstream].
- The host can configure a unique address range (= region) to access
each device.
- The bug happened because releasing the same regions can race.
- Although the prior patch handled this, it's insufficient because it
can also race with releasing the device (what Davidlohr had reported).

>
> The only comprehensive fix I currently see to that problem is to indeed
> get it back synchronized under the device lock. This prevents multiple
> requesters from colliding, and from devres_release_all() deleting an
> action that delete_region_store() already committed to handling.
>
> This looks something like schedule the devres action to be released in
> workqueue under guard(device)(&port->udev).

I assume you meant guard(device)(port->uport_dev).

>
> It is ugly, but it may be the case that this wants to synchronously
> delete the region, and asynchronously cleanup the release action.
>
> I.e. unregister_region() grows a new:
>
>         if (!test_and_set_bit(CXL_REGION_F_DELETE, ...)
>
> ...flag to enforce only one path successfully deletes.
> delete_region_store() calls unregister_region() directly. Then the

To fix this, you suggested:
- Add a flag to make unregister_region() idempotent.
- Make delete_region_store() call unregister_region() directly.

Now unregister_region() is race-free. But we need to release an action.

> workqueue's job is to trigger the release action under the lock only if
> port->uport_dev is still driver-attached by the time the workqueue runs.
>

What I'm missing is this part.
How could asynchronous cleanup solve the race? In the worst case, the
workqueue could start immediately after we schedule it. In this case,
it's the same as the synchronous execution.
Or do we have other reasons to make it asynchronous?

> In the meantime the workqueue needs to hold a reference on the region
> device until it can observe the state of CXL_REGION_F_DELETE under the
> device lock.

If it's the same for synchronous execution, we might execute this
directly in unregister_region(), holding
guard(device)(port->uport_dev).

>
> Otherwise I suspect if you put a device_lock() in delete_region_store()
> it will deadlock.
>
> > ------------[ cut here ]------------
> > WARNING: drivers/base/devres.c:824 at devm_release_action drivers/base/devres.c:824 [inline], CPU#0: syz.1.12224/47589
> > WARNING: drivers/base/devres.c:824 at devm_release_action+0x2b2/0x360 drivers/base/devres.c:817, CPU#0: syz.1.12224/47589
>
> For the next posting, these 2 lines are sufficient, no need for all the
> extra detail of a full backtrace.
>

Copy that.

Thanks again for your detailed reviews.
Sungwoo.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-03-11  6:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-08 18:59 [PATCH] cxl/region: Fix a race bug in delete_region_store Sungwoo Kim
2026-03-09 12:00 ` Jonathan Cameron
2026-03-09 17:56   ` Sungwoo Kim
2026-03-09 18:10     ` Jonathan Cameron
2026-03-09 20:32 ` Ira Weiny
2026-03-10 18:36 ` Davidlohr Bueso
2026-03-10 22:53 ` Dan Williams
2026-03-11  6:55   ` Sungwoo Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox