* [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
@ 2026-04-27 3:20 Sungwoo Kim
2026-04-27 3:20 ` [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work Sungwoo Kim
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Sungwoo Kim @ 2026-04-27 3:20 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams, Robert Richter, Li Ming,
Gregory Price, Ben Widawsky
Cc: Dave Tian, Sungwoo Kim, linux-cxl, linux-kernel
This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
Overview
========
This patch series fixes race conditions in cxl region unregistration.
devm_release_action() should be called once, otherwise, it warns about
the second call. However, the current implementation has a race condition
that allows multiple calls to devm_release_action(). The details are in
each patch.
To fix these, the first patch adds a new function that guarantees that
devm_release_action() is called only once.
Using this function, the second patch subsitutes the current use of
devm_release_action() in cxl region with the new function.
Change in v3
============
Addressed Dave's comments:
- Split and made this in a patch series.
- Enhanced a commit log explaining why a workqueue is used in the first patch.
- Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
Sashiko AI review fixes:
- Fixed construct_region() as it also can race.
- Used a driver's wq instead of system wq so unbinding can drain a work.
[1] https://sashiko.dev/#/patchset/20260422045637.3048249-2-iam%40sung-woo.kim
Earlier approach:
v2: https://lore.kernel.org/linux-cxl/20260422045637.3048249-2-iam@sung-woo.kim/
v1: https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/
Sungwoo Kim (2):
cxl/region: serialize devm action removal via scheduled work
cxl/region: Fix a race bug in delete_region_store()
drivers/cxl/core/port.c | 6 +++++
drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
drivers/cxl/cxl.h | 9 ++++++++
3 files changed, 57 insertions(+), 5 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work
2026-04-27 3:20 [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Sungwoo Kim
@ 2026-04-27 3:20 ` Sungwoo Kim
2026-04-27 17:27 ` Dave Jiang
2026-04-27 3:20 ` [PATCH v3 2/2] cxl/region: Fix a race bug in delete_region_store() Sungwoo Kim
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Sungwoo Kim @ 2026-04-27 3:20 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: Dave Tian, Sungwoo Kim, linux-cxl, linux-kernel
devm_remove_action() must be called (1) only once and (2) only if the
device is still bound to a driver. However, several race conditions
allow multiple calls to devm_remove_action().
For example, delete_region_store() and devres_release_all() can race [1]:
CPU0 CPU1
devres_release_all()
// take devres_lock
remove_nodes(devres_head) // mv to local todo
// drop devres_lock delete_region_store()
cxlr = cxl_find_region_by_name() // success
devm_release_action(unregister_region)
devres_release()
devres_remove()
// hold devres_lock
find_dr(devres_head) // does not find it
WARN_ON(-ENOENT)
release_nodes() // drain todo
unregister_region(cxlr) // release() cb
device_del()
To prevent this, this patch introduces a new function,
remove_devm_actions(), that safely performs devm_release_action().
remove_devm_actions() guarantees that devm_release_action() is called
only once by guarding with a flag. Also, it checks if the device is
still bound to a driver before calling devm_remove_action().
In order to check the binding, a device lock must be held. To do this,
Dan suggested [2] using a workqueue, since a new work has no prior lock
and is clean to acquire a device lock.
[1] https://lore.kernel.org/linux-cxl/20260310183644.4rwc7ilmzy4t5xp6@offworld/
[2] https://lore.kernel.org/linux-cxl/69b0a0f8bfb0b_213210026@dwillia2-mobl4.notmuch/
Suggested-by: Dan Williams <djbw@kernel.org>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
---
drivers/cxl/core/port.c | 6 ++++++
drivers/cxl/core/region.c | 27 +++++++++++++++++++++++++++
drivers/cxl/cxl.h | 9 +++++++++
3 files changed, 42 insertions(+)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index c5aacd7054f1..2f142cea7f26 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2305,6 +2305,12 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
}
EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, "CXL");
+bool schedule_cxl_region_remove_devm_actions(struct cxl_region *cxlr)
+{
+ return queue_work(cxl_bus_wq, &cxlr->remove_work);
+}
+EXPORT_SYMBOL_NS_GPL(schedule_cxl_region_remove_devm_actions, "CXL");
+
static void add_latency(struct access_coordinate *c, long latency)
{
for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..b086ae88b5bb 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -39,6 +39,7 @@
static nodemask_t nodemask_region_seen = NODE_MASK_NONE;
static struct cxl_region *to_cxl_region(struct device *dev);
+static void remove_devm_actions_work(struct work_struct *work);
#define __ACCESS_ATTR_RO(_level, _name) { \
.attr = { .name = __stringify(_name), .mode = 0444 }, \
@@ -2589,6 +2590,8 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
dev->type = &cxl_region_type;
cxl_region_setup_flags(cxlr, &cxlrd->cxlsd.cxld);
+ INIT_WORK(&cxlr->remove_work, remove_devm_actions_work);
+
return cxlr;
}
@@ -2831,6 +2834,30 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
return to_cxl_region(region_dev);
}
+static bool remove_devm_actions(struct cxl_region *cxlr)
+{
+ return schedule_cxl_region_remove_devm_actions(cxlr);
+}
+
+static void remove_devm_actions_work(struct work_struct *work)
+{
+ struct cxl_region *cxlr = container_of(work, typeof(*cxlr), remove_work);
+ struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
+ struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
+
+ if (test_and_set_bit(CXL_REGION_F_DEVM_REMOVE, &cxlr->flags)) {
+ put_device(&cxlr->dev);
+ return;
+ }
+
+ scoped_guard(device, port->uport_dev) {
+ if (port->uport_dev->driver)
+ devm_remove_action(port->uport_dev, unregister_region, cxlr);
+ }
+
+ put_device(&cxlr->dev);
+}
+
static ssize_t delete_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..31ca4e6676ed 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -447,6 +447,12 @@ struct cxl_region_params {
*/
#define CXL_REGION_F_NORMALIZED_ADDRESSING 3
+/* Indicate that this region is being unregistered to prevent a race. */
+#define CXL_REGION_F_UNREGISTER 4
+
+/* Indicate that this region called devm_remove_action. */
+#define CXL_REGION_F_DEVM_REMOVE 5
+
/**
* struct cxl_region - CXL region
* @dev: This region's device
@@ -462,6 +468,7 @@ struct cxl_region_params {
* @coord: QoS access coordinates for the region
* @node_notifier: notifier for setting the access coordinates to node
* @adist_notifier: notifier for calculating the abstract distance of node
+ * @remove_work: trigger the remove action in a safe context to acquire locks
*/
struct cxl_region {
struct device dev;
@@ -477,6 +484,7 @@ struct cxl_region {
struct access_coordinate coord[ACCESS_COORDINATE_MAX];
struct notifier_block node_notifier;
struct notifier_block adist_notifier;
+ struct work_struct remove_work;
};
struct cxl_nvdimm_bridge {
@@ -733,6 +741,7 @@ struct cxl_port *cxl_pci_find_port(struct pci_dev *pdev,
struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
struct cxl_dport **dport);
bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
+bool schedule_cxl_region_remove_devm_actions(struct cxl_region *cxlr);
struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
struct device *dport, int port_id,
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 2/2] cxl/region: Fix a race bug in delete_region_store()
2026-04-27 3:20 [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Sungwoo Kim
2026-04-27 3:20 ` [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work Sungwoo Kim
@ 2026-04-27 3:20 ` Sungwoo Kim
2026-04-27 12:51 ` [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Jonathan Cameron
2026-04-27 17:42 ` Dave Jiang
3 siblings, 0 replies; 12+ messages in thread
From: Sungwoo Kim @ 2026-04-27 3:20 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams, Ben Widawsky
Cc: Dave Tian, Sungwoo Kim, Jonathan Cameron, linux-cxl, linux-kernel
This patch fixes race conditions in cxl region unregistration by (1) using
remove_devm_actions() instead of directly calling devm_release_action()
and (2) making unregister_region() idempotent.
Race:
CPU 0 CPU 1
====== ======
delete_region_store()
cxlr = cxl_find_region_by_name()
delete_region_store()
cxlr = cxl_find_region_by_name()
devm_release_action()
devm_release_action()
// cannot find the action, WARN_ON()
Splat:
WARNING: drivers/base/devres.c:824 at devm_release_action drivers/base/devres.c:824 [inline], CPU#0: syz.1.12224/47589
WARNING: drivers/base/devres.c:824 at devm_release_action+0x2b2/0x360 drivers/base/devres.c:817, CPU#0: syz.1.12224/47589
It's found by a fuzzer based on Syzkaller that rapidly executes CXL's
sysfs interface with QEMU CXL devices cxl-type3 and pxb-cxl.
Syzkaller's CXL grammar is available at:
https://github.com/swkim101/syz-cxl/blob/main/cxl.txt
The bug's impact seems low, since the region has already been released.
So, it does not affect users, except for a negligible warning message.
Fixes: 779dd20cfb56 ("cxl/region: Add region creation support")
Suggested-by: Dan Williams <djbw@kernel.org>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
---
drivers/cxl/core/region.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b086ae88b5bb..08b93b14a2c7 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2544,6 +2544,9 @@ static void unregister_region(void *_cxlr)
struct cxl_region_params *p = &cxlr->params;
int i;
+ if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
+ return;
+
device_del(&cxlr->dev);
/*
@@ -2863,15 +2866,18 @@ static ssize_t delete_region_store(struct device *dev,
const char *buf, size_t len)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
- struct cxl_port *port = to_cxl_port(dev->parent);
struct cxl_region *cxlr;
+ /* remove_devm_actions_work() will put cxlr->dev. */
cxlr = cxl_find_region_by_name(cxlrd, buf);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
- devm_release_action(port->uport_dev, unregister_region, cxlr);
- put_device(&cxlr->dev);
+ unregister_region(cxlr);
+ if (!remove_devm_actions(cxlr)) {
+ put_device(&cxlr->dev);
+ return -EBUSY;
+ }
return len;
}
@@ -3736,7 +3742,6 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
{
struct cxl_endpoint_decoder *cxled = ctx->cxled;
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct cxl_port *port = cxlrd_to_port(cxlrd);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
int rc, part = READ_ONCE(cxled->part);
struct cxl_region *cxlr;
@@ -3757,7 +3762,12 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
rc = __construct_region(cxlr, ctx);
if (rc) {
- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ /* remove_devm_actions_work() will put the device */
+ get_device(&cxlr->dev);
+ unregister_region(cxlr);
+ if (!remove_devm_actions(cxlr))
+ put_device(&cxlr->dev);
+
return ERR_PTR(rc);
}
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
2026-04-27 3:20 [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Sungwoo Kim
2026-04-27 3:20 ` [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work Sungwoo Kim
2026-04-27 3:20 ` [PATCH v3 2/2] cxl/region: Fix a race bug in delete_region_store() Sungwoo Kim
@ 2026-04-27 12:51 ` Jonathan Cameron
2026-04-27 17:42 ` Dave Jiang
3 siblings, 0 replies; 12+ messages in thread
From: Jonathan Cameron @ 2026-04-27 12:51 UTC (permalink / raw)
To: Sungwoo Kim
Cc: Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Robert Richter, Li Ming, Gregory Price,
Ben Widawsky, Dave Tian, linux-cxl, linux-kernel
On Sun, 26 Apr 2026 23:20:07 -0400
Sungwoo Kim <iam@sung-woo.kim> wrote:
> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
>
> Overview
> ========
> This patch series fixes race conditions in cxl region unregistration.
>
> devm_release_action() should be called once, otherwise, it warns about
> the second call. However, the current implementation has a race condition
> that allows multiple calls to devm_release_action(). The details are in
> each patch.
>
> To fix these, the first patch adds a new function that guarantees that
> devm_release_action() is called only once.
> Using this function, the second patch subsitutes the current use of
> devm_release_action() in cxl region with the new function.
>
> Change in v3
> ============
> Addressed Dave's comments:
> - Split and made this in a patch series.
> - Enhanced a commit log explaining why a workqueue is used in the first patch.
> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
> Sashiko AI review fixes:
> - Fixed construct_region() as it also can race.
> - Used a driver's wq instead of system wq so unbinding can drain a work.
>
> [1] https://sashiko.dev/#/patchset/20260422045637.3048249-2-iam%40sung-woo.kim
>
> Earlier approach:
> v2: https://lore.kernel.org/linux-cxl/20260422045637.3048249-2-iam@sung-woo.kim/
> v1: https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/
Looks fine to me - I'm not particular keen on delayed releasing as it
may end up with unordered tear down but I don't have a better idea.
Reviewed-by: Jonathan Cameron <jic23@kernel.org>
>
> Sungwoo Kim (2):
> cxl/region: serialize devm action removal via scheduled work
> cxl/region: Fix a race bug in delete_region_store()
>
> drivers/cxl/core/port.c | 6 +++++
> drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
> drivers/cxl/cxl.h | 9 ++++++++
> 3 files changed, 57 insertions(+), 5 deletions(-)
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work
2026-04-27 3:20 ` [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work Sungwoo Kim
@ 2026-04-27 17:27 ` Dave Jiang
0 siblings, 0 replies; 12+ messages in thread
From: Dave Jiang @ 2026-04-27 17:27 UTC (permalink / raw)
To: Sungwoo Kim, Davidlohr Bueso, Jonathan Cameron, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: Dave Tian, linux-cxl, linux-kernel
On 4/26/26 8:20 PM, Sungwoo Kim wrote:
> devm_remove_action() must be called (1) only once and (2) only if the
> device is still bound to a driver. However, several race conditions
> allow multiple calls to devm_remove_action().
>
> For example, delete_region_store() and devres_release_all() can race [1]:
>
> CPU0 CPU1
> devres_release_all()
> // take devres_lock
> remove_nodes(devres_head) // mv to local todo
> // drop devres_lock delete_region_store()
> cxlr = cxl_find_region_by_name() // success
> devm_release_action(unregister_region)
> devres_release()
> devres_remove()
> // hold devres_lock
> find_dr(devres_head) // does not find it
> WARN_ON(-ENOENT)
> release_nodes() // drain todo
> unregister_region(cxlr) // release() cb
> device_del()
>
> To prevent this, this patch introduces a new function,
> remove_devm_actions(), that safely performs devm_release_action().
>
> remove_devm_actions() guarantees that devm_release_action() is called
> only once by guarding with a flag. Also, it checks if the device is
> still bound to a driver before calling devm_remove_action().
>
> In order to check the binding, a device lock must be held. To do this,
> Dan suggested [2] using a workqueue, since a new work has no prior lock
> and is clean to acquire a device lock.
>
> [1] https://lore.kernel.org/linux-cxl/20260310183644.4rwc7ilmzy4t5xp6@offworld/
> [2] https://lore.kernel.org/linux-cxl/69b0a0f8bfb0b_213210026@dwillia2-mobl4.notmuch/
>
> Suggested-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
> ---
> drivers/cxl/core/port.c | 6 ++++++
> drivers/cxl/core/region.c | 27 +++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 9 +++++++++
> 3 files changed, 42 insertions(+)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index c5aacd7054f1..2f142cea7f26 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2305,6 +2305,12 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
> }
> EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, "CXL");
>
> +bool schedule_cxl_region_remove_devm_actions(struct cxl_region *cxlr)
> +{
> + return queue_work(cxl_bus_wq, &cxlr->remove_work);
> +}
> +EXPORT_SYMBOL_NS_GPL(schedule_cxl_region_remove_devm_actions, "CXL");
> +
> static void add_latency(struct access_coordinate *c, long latency)
> {
> for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e8..b086ae88b5bb 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -39,6 +39,7 @@
> static nodemask_t nodemask_region_seen = NODE_MASK_NONE;
>
> static struct cxl_region *to_cxl_region(struct device *dev);
> +static void remove_devm_actions_work(struct work_struct *work);
>
> #define __ACCESS_ATTR_RO(_level, _name) { \
> .attr = { .name = __stringify(_name), .mode = 0444 }, \
> @@ -2589,6 +2590,8 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
> dev->type = &cxl_region_type;
> cxl_region_setup_flags(cxlr, &cxlrd->cxlsd.cxld);
>
> + INIT_WORK(&cxlr->remove_work, remove_devm_actions_work);
> +
> return cxlr;
> }
>
> @@ -2831,6 +2834,30 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
> return to_cxl_region(region_dev);
> }
>
> +static bool remove_devm_actions(struct cxl_region *cxlr)
> +{
> + return schedule_cxl_region_remove_devm_actions(cxlr);
> +}
> +
> +static void remove_devm_actions_work(struct work_struct *work)
> +{
> + struct cxl_region *cxlr = container_of(work, typeof(*cxlr), remove_work);
> + struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> + struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
> +
> + if (test_and_set_bit(CXL_REGION_F_DEVM_REMOVE, &cxlr->flags)) {
> + put_device(&cxlr->dev);
> + return;
> + }
> +
> + scoped_guard(device, port->uport_dev) {
> + if (port->uport_dev->driver)
> + devm_remove_action(port->uport_dev, unregister_region, cxlr);
> + }
> +
> + put_device(&cxlr->dev);
> +}
> +
> static ssize_t delete_region_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1297594beaec..31ca4e6676ed 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -447,6 +447,12 @@ struct cxl_region_params {
> */
> #define CXL_REGION_F_NORMALIZED_ADDRESSING 3
>
> +/* Indicate that this region is being unregistered to prevent a race. */
> +#define CXL_REGION_F_UNREGISTER 4
I would flip the define of this with the one below and move this define to the second patch, since it is not being used here. Otherwise everything else LGTM.
DJ
> +
> +/* Indicate that this region called devm_remove_action. */
> +#define CXL_REGION_F_DEVM_REMOVE 5
> +
> /**
> * struct cxl_region - CXL region
> * @dev: This region's device
> @@ -462,6 +468,7 @@ struct cxl_region_params {
> * @coord: QoS access coordinates for the region
> * @node_notifier: notifier for setting the access coordinates to node
> * @adist_notifier: notifier for calculating the abstract distance of node
> + * @remove_work: trigger the remove action in a safe context to acquire locks
> */
> struct cxl_region {
> struct device dev;
> @@ -477,6 +484,7 @@ struct cxl_region {
> struct access_coordinate coord[ACCESS_COORDINATE_MAX];
> struct notifier_block node_notifier;
> struct notifier_block adist_notifier;
> + struct work_struct remove_work;
> };
>
> struct cxl_nvdimm_bridge {
> @@ -733,6 +741,7 @@ struct cxl_port *cxl_pci_find_port(struct pci_dev *pdev,
> struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
> struct cxl_dport **dport);
> bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
> +bool schedule_cxl_region_remove_devm_actions(struct cxl_region *cxlr);
>
> struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
> struct device *dport, int port_id,
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
2026-04-27 3:20 [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Sungwoo Kim
` (2 preceding siblings ...)
2026-04-27 12:51 ` [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Jonathan Cameron
@ 2026-04-27 17:42 ` Dave Jiang
2026-04-28 5:42 ` Sungwoo Kim
3 siblings, 1 reply; 12+ messages in thread
From: Dave Jiang @ 2026-04-27 17:42 UTC (permalink / raw)
To: Sungwoo Kim, Davidlohr Bueso, Jonathan Cameron, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams, Robert Richter, Li Ming,
Gregory Price, Ben Widawsky
Cc: Dave Tian, linux-cxl, linux-kernel
On 4/26/26 8:20 PM, Sungwoo Kim wrote:
> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
>
> Overview
> ========
> This patch series fixes race conditions in cxl region unregistration.
>
> devm_release_action() should be called once, otherwise, it warns about
> the second call. However, the current implementation has a race condition
> that allows multiple calls to devm_release_action(). The details are in
> each patch.
>
> To fix these, the first patch adds a new function that guarantees that
> devm_release_action() is called only once.
> Using this function, the second patch subsitutes the current use of
> devm_release_action() in cxl region with the new function.
>
> Change in v3
> ============
> Addressed Dave's comments:
> - Split and made this in a patch series.
> - Enhanced a commit log explaining why a workqueue is used in the first patch.
> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
> Sashiko AI review fixes:
> - Fixed construct_region() as it also can race.
> - Used a driver's wq instead of system wq so unbinding can drain a work.
A few issues Sashiko raised with the v3.
https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim
DJ
>
> [1] https://sashiko.dev/#/patchset/20260422045637.3048249-2-iam%40sung-woo.kim
>
> Earlier approach:
> v2: https://lore.kernel.org/linux-cxl/20260422045637.3048249-2-iam@sung-woo.kim/
> v1: https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/
>
> Sungwoo Kim (2):
> cxl/region: serialize devm action removal via scheduled work
> cxl/region: Fix a race bug in delete_region_store()
>
> drivers/cxl/core/port.c | 6 +++++
> drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
> drivers/cxl/cxl.h | 9 ++++++++
> 3 files changed, 57 insertions(+), 5 deletions(-)
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
2026-04-27 17:42 ` Dave Jiang
@ 2026-04-28 5:42 ` Sungwoo Kim
2026-04-28 19:04 ` Dave Jiang
0 siblings, 1 reply; 12+ messages in thread
From: Sungwoo Kim @ 2026-04-28 5:42 UTC (permalink / raw)
To: Dave Jiang
Cc: Davidlohr Bueso, Jonathan Cameron, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Robert Richter, Li Ming, Gregory Price,
Ben Widawsky, Dave Tian, linux-cxl, linux-kernel
On Mon, Apr 27, 2026 at 1:42 PM Dave Jiang <dave.jiang@intel.com> wrote:
>
>
>
> On 4/26/26 8:20 PM, Sungwoo Kim wrote:
> > This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
> >
> > Overview
> > ========
> > This patch series fixes race conditions in cxl region unregistration.
> >
> > devm_release_action() should be called once, otherwise, it warns about
> > the second call. However, the current implementation has a race condition
> > that allows multiple calls to devm_release_action(). The details are in
> > each patch.
> >
> > To fix these, the first patch adds a new function that guarantees that
> > devm_release_action() is called only once.
> > Using this function, the second patch subsitutes the current use of
> > devm_release_action() in cxl region with the new function.
> >
> > Change in v3
> > ============
> > Addressed Dave's comments:
> > - Split and made this in a patch series.
> > - Enhanced a commit log explaining why a workqueue is used in the first patch.
> > - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
> > Sashiko AI review fixes:
> > - Fixed construct_region() as it also can race.
> > - Used a driver's wq instead of system wq so unbinding can drain a work.
>
> A few issues Sashiko raised with the v3.
> https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim
A Sashiko's reasoning makes sense to me. In the middle of
construct_region(), users can perform a sysfs write, which then calls
unregister_region().
unregister_region() must not be called before construct_region() has
completed, since it calls device_del() and put_device().
This isn't introduced by this patch, but I can fix it in v4.
An enable/disable flag to allow sysfs write might fix this, like:
static struct cxl_region *construct_region(...)
{
... // when a construction is done
set_bit(CXL_REGION_F_ENABLE_SYSFS);
return cxlr;
}
This prevents a sysfs write before it's done:
static ssize_t delete_region_store(...)
{
if (!test_bit(CXL_REGION_F_ENABLE_SYSFS)) {
return -EBUSY;
}
...
}
Would there be a better solution? I'd like to ask for comments on this.
This patch series already introduces two new flags, and I'm concerned
about adding another.
Thank you!
Sungwoo.
>
> DJ
>
>
> >
> > [1] https://sashiko.dev/#/patchset/20260422045637.3048249-2-iam%40sung-woo.kim
> >
> > Earlier approach:
> > v2: https://lore.kernel.org/linux-cxl/20260422045637.3048249-2-iam@sung-woo.kim/
> > v1: https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/
> >
> > Sungwoo Kim (2):
> > cxl/region: serialize devm action removal via scheduled work
> > cxl/region: Fix a race bug in delete_region_store()
> >
> > drivers/cxl/core/port.c | 6 +++++
> > drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
> > drivers/cxl/cxl.h | 9 ++++++++
> > 3 files changed, 57 insertions(+), 5 deletions(-)
> >
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
2026-04-28 5:42 ` Sungwoo Kim
@ 2026-04-28 19:04 ` Dave Jiang
2026-04-28 20:28 ` Sungwoo Kim
0 siblings, 1 reply; 12+ messages in thread
From: Dave Jiang @ 2026-04-28 19:04 UTC (permalink / raw)
To: Sungwoo Kim
Cc: Davidlohr Bueso, Jonathan Cameron, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Robert Richter, Li Ming, Gregory Price,
Ben Widawsky, Dave Tian, linux-cxl, linux-kernel
On 4/27/26 10:42 PM, Sungwoo Kim wrote:
> On Mon, Apr 27, 2026 at 1:42 PM Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>
>>
>> On 4/26/26 8:20 PM, Sungwoo Kim wrote:
>>> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
>>>
>>> Overview
>>> ========
>>> This patch series fixes race conditions in cxl region unregistration.
>>>
>>> devm_release_action() should be called once, otherwise, it warns about
>>> the second call. However, the current implementation has a race condition
>>> that allows multiple calls to devm_release_action(). The details are in
>>> each patch.
>>>
>>> To fix these, the first patch adds a new function that guarantees that
>>> devm_release_action() is called only once.
>>> Using this function, the second patch subsitutes the current use of
>>> devm_release_action() in cxl region with the new function.
>>>
>>> Change in v3
>>> ============
>>> Addressed Dave's comments:
>>> - Split and made this in a patch series.
>>> - Enhanced a commit log explaining why a workqueue is used in the first patch.
>>> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
>>> Sashiko AI review fixes:
>>> - Fixed construct_region() as it also can race.
>>> - Used a driver's wq instead of system wq so unbinding can drain a work.
>>
>> A few issues Sashiko raised with the v3.
>> https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim
>
> A Sashiko's reasoning makes sense to me. In the middle of
> construct_region(), users can perform a sysfs write, which then calls
> unregister_region().
> unregister_region() must not be called before construct_region() has
> completed, since it calls device_del() and put_device().
>
> This isn't introduced by this patch, but I can fix it in v4.
>
> An enable/disable flag to allow sysfs write might fix this, like:
>
> static struct cxl_region *construct_region(...)
> {
> ... // when a construction is done
> set_bit(CXL_REGION_F_ENABLE_SYSFS);
> return cxlr;
> }
>
> This prevents a sysfs write before it's done:
>
> static ssize_t delete_region_store(...)
> {
> if (!test_bit(CXL_REGION_F_ENABLE_SYSFS)) {
> return -EBUSY;
> }
> ...
> }
>
> Would there be a better solution? I'd like to ask for comments on this.
> This patch series already introduces two new flags, and I'm concerned
> about adding another.
Sungwoo,
I looked at the sachiko raised issues and looked at the current patches. I threw the sachiko objections and the current patches at Claude and it says the workqueue change made this more complicated and proposed this solution. Seems reasonable. What do you think?
---
Race 1: Concurrent delete_region_store() calls
===============================================
CPU 0 CPU 1
delete_region_store() delete_region_store()
cxl_find_region_by_name() cxl_find_region_by_name()
devm_release_action() devm_release_action()
unregister_region() WARN_ON(-ENOENT)
The second call fails because the devm action was already removed.
Race 2: Concurrent delete and driver unbind
============================================
CPU 0 CPU 1
delete_region_store() driver unbind
cxl_find_region_by_name() devres_release_all()
devm_release_action() unregister_region()
unregister_region() // cleanup done
// cleanup done again!
WARN_ON(-ENOENT)
Race 3: Use-after-free during construction
===========================================
CPU 0: __construct_region() CPU 1: delete_region_store()
device_add(&cxlr->dev)
// Region visible in sysfs!
cxl_find_region_by_name()
// SUCCESS!
__construct_region(cxlr)
p = &cxlr->params
// p->interleave_ways still 0
unregister_region(cxlr)
device_del()
for (i = 0; i < 0; i++)
// LOOP SKIPPED!
put_device(&cxlr->dev)
// refcount -> 0
// cxlr FREED!
p->interleave_ways = ctx->... // USE-AFTER-FREE!
p->state = INTERLEAVE_ACTIVE // USE-AFTER-FREE!
The Solution
============
1. Make unregister_region() idempotent using CXL_REGION_F_UNREGISTER flag
Multiple concurrent calls will have one do the cleanup and others
return early safely.
2. Use synchronous devm_remove_action_nowarn() instead of devm_release_action()
- Call unregister_region() first (idempotent cleanup)
- Then remove devm action with _nowarn variant
- -ENOENT is benign in all race scenarios:
* devres_release_all() already handled it
* Driver unbind/rebind race (old action gone, new devres list)
3. Protect params read in unregister_region() with lock
Use scoped_guard(rwsem_read) to safely read interleave_ways even
during concurrent construction. If state < INTERLEAVE_ACTIVE, the
region is still initializing and ways=0 is correct.
4. Add memory barriers for state transitions
Use smp_store_release() in __construct_region() and smp_load_acquire()
in unregister_region() to ensure proper ordering of initialization.
5. Hold extra reference during construct_region()
Prevents premature freeing if unregister races during initialization.
6. Check unregister flag in __construct_region()
Abort construction early if delete_region_store() raced after
device_add() but before acquiring the write lock.
This approach is simpler and more correct than async workers because:
- Relies on fundamental kernel primitives (idempotency + locks)
- No TOCTOU races with driver binding state
- Fewer lines of code
- Handles all race scenarios correctly
Race Scenario Analysis:
| Scenario | Outcome |
|---------------------------|--------------------------------------|
| Normal delete | Cleanup done, action removed |
| Concurrent deletes | One cleans up, others return early |
| Delete + driver unbind | One cleans up, other gets -ENOENT |
| Delete + unbind/rebind | Old action handled, -ENOENT benign |
| Delete during construct | Extra ref prevents UAF, lock protects|
---
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..8474756913e6 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2541,16 +2541,45 @@ static void unregister_region(void *_cxlr)
{
struct cxl_region *cxlr = _cxlr;
struct cxl_region_params *p = &cxlr->params;
- int i;
+ int i, ways;
+
+ /*
+ * Idempotency: prevent multiple concurrent calls to unregister_region().
+ * This can happen if user calls delete_region_store() while driver
+ * unbind triggers devres_release_all() concurrently.
+ */
+ if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
+ return;
device_del(&cxlr->dev);
/*
- * Now that region sysfs is shutdown, the parameter block is now
- * read-only, so no need to hold the region rwsem to access the
- * region parameters.
+ * Even though sysfs is shutdown, the region may still be initializing
+ * if construct_region() is running concurrently. We must safely read
+ * interleave_ways under lock to avoid reading uninitialized values.
+ *
+ * Use a read lock (not write) to allow concurrent operations if needed.
+ * Read the ways count under lock, then release before the loop to
+ * minimize lock hold time.
+ *
+ * Memory ordering: use smp_load_acquire() to ensure we see the fully
+ * initialized value of interleave_ways if state >= INTERLEAVE_ACTIVE.
*/
- for (i = 0; i < p->interleave_ways; i++)
+ scoped_guard(rwsem_read, &cxl_rwsem.region) {
+ /*
+ * Pairs with smp_store_release() in __construct_region().
+ * Ensures if we see state >= INTERLEAVE_ACTIVE, we also see
+ * the initialized interleave_ways value.
+ */
+ enum cxl_config_state state = smp_load_acquire(&p->state);
+
+ if (state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+ ways = READ_ONCE(p->interleave_ways);
+ else
+ ways = 0; /* Still initializing, nothing to detach */
+ }
+
+ for (i = 0; i < ways; i++)
detach_target(cxlr, i);
cxlr->hpa_range = DEFINE_RANGE(0, -1);
@@ -2838,14 +2867,48 @@ static ssize_t delete_region_store(struct device *dev,
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
struct cxl_port *port = to_cxl_port(dev->parent);
struct cxl_region *cxlr;
+ int rc;
cxlr = cxl_find_region_by_name(cxlrd, buf);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ /*
+ * Call unregister_region() first to do the cleanup. This is
+ * idempotent (CXL_REGION_F_UNREGISTER flag), so if
+ * devres_release_all() calls it concurrently, one will do the
+ * work and the other will return early.
+ */
+ unregister_region(cxlr);
+
+ /*
+ * Now remove the devm action to prevent devres_release_all()
+ * from calling unregister_region() again during driver unbind.
+ *
+ * Three possible outcomes:
+ *
+ * 1. Success: We removed the action before driver unbind.
+ * Driver unbind won't call unregister_region().
+ *
+ * 2. -ENOENT: devres_release_all() already removed the action
+ * and called unregister_region() (which returned early due
+ * to F_UNREGISTER flag). This is benign.
+ *
+ * 3. -ENOENT: Driver was unbound and rebound between
+ * unregister_region() above and here. The old action was
+ * already handled by the old unbind. The new binding has
+ * no action for this region. This is benign.
+ *
+ * Use _nowarn variant since -ENOENT is expected and benign.
+ * The devres_lock serializes access, so we can't corrupt the
+ * devres list.
+ */
+ rc = devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
+ if (rc && rc != -ENOENT)
+ dev_warn(&cxlr->dev,
+ "Unexpected error removing devm action: %d\n", rc);
+
put_device(&cxlr->dev);
-
return len;
}
DEVICE_ATTR_WO(delete_region);
@@ -3644,6 +3707,16 @@ static int __construct_region(struct cxl_region *cxlr,
return -EBUSY;
}
+ /*
+ * Check if region is being unregistered concurrently. This can happen
+ * if user triggers delete_region_store() right after device_add() but
+ * before we acquire the write lock above.
+ */
+ if (test_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags)) {
+ dev_dbg(&cxlr->dev, "Region unregister in progress, aborting construction\n");
+ return -EBUSY;
+ }
+
set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
cxlr->hpa_range = *hpa_range;
@@ -3686,7 +3759,16 @@ static int __construct_region(struct cxl_region *cxlr,
p->res = res;
p->interleave_ways = ctx->interleave_ways;
p->interleave_granularity = ctx->interleave_granularity;
- p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
+
+ /*
+ * Use smp_store_release() to ensure all initialization above is
+ * visible before state becomes INTERLEAVE_ACTIVE. This pairs with
+ * smp_load_acquire() in unregister_region().
+ *
+ * This ensures that if unregister_region() sees state >=
+ * INTERLEAVE_ACTIVE, it will also see the initialized interleave_ways.
+ */
+ smp_store_release(&p->state, CXL_CONFIG_INTERLEAVE_ACTIVE);
rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
if (rc)
@@ -3728,12 +3810,31 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
return cxlr;
}
+ /*
+ * Hold an extra reference during initialization to prevent
+ * unregister_region() from freeing the object if delete_region_store()
+ * races with __construct_region(). The device_initialize() in
+ * cxl_region_alloc() set refcount=1, and device_add() above made it
+ * visible, so a concurrent unregister could drop that reference while
+ * we're still initializing.
+ */
+ get_device(&cxlr->dev);
+
rc = __construct_region(cxlr, ctx);
if (rc) {
- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ /*
+ * Construction failed. Unregister the partially initialized
+ * region and remove the devm action. Both are safe to call
+ * due to idempotency and proper locking.
+ */
+ unregister_region(cxlr);
+ devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
+
+ put_device(&cxlr->dev); /* Drop construction reference */
return ERR_PTR(rc);
}
+ put_device(&cxlr->dev); /* Drop construction reference */
return cxlr;
}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..81952f0763e1 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -447,6 +447,13 @@ struct cxl_region_params {
*/
#define CXL_REGION_F_NORMALIZED_ADDRESSING 3
+/*
+ * Indicate that this region is being unregistered. Used to make
+ * unregister_region() idempotent and prevent race conditions between
+ * delete_region_store() and devres_release_all().
+ */
+#define CXL_REGION_F_UNREGISTER 4
+
/**
* struct cxl_region - CXL region
* @dev: This region's device
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
2026-04-28 19:04 ` Dave Jiang
@ 2026-04-28 20:28 ` Sungwoo Kim
2026-04-28 22:33 ` Dave Jiang
0 siblings, 1 reply; 12+ messages in thread
From: Sungwoo Kim @ 2026-04-28 20:28 UTC (permalink / raw)
To: Dave Jiang
Cc: Davidlohr Bueso, Jonathan Cameron, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Robert Richter, Li Ming, Gregory Price,
Ben Widawsky, Dave Tian, linux-cxl, linux-kernel
Dear Dave, thank you for sharing the patch that doesn't use the workqueue.
Claude suggests not using wq, since it's simpler. I agree that it's
simple, but it's overly tailored to fix a specific bug.
Actually, v1[1] proposed a similar patch. So let me bring a patch and
discussion from v1:
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 08fa3deef70ab..7ade9aa2aeecc 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
struct cxl_port *port = to_cxl_port(dev->parent);
struct cxl_region *cxlr;
+ int err;
cxlr = cxl_find_region_by_name(cxlrd, buf);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
+ cxlr);
+ if (err) {
+ put_device(&cxlr->dev);
+ return err;
+ }
+ unregister_region(cxlr);
put_device(&cxlr->dev);
return len;
However, v1 has not been merged. Dan[2] commented that "No, that is
not an acceptable or comprehensive fix. A subsystem should never try
to double unregister a device." Also in the following thread[3], "The
patch was technically correct but it relies on a design that requires
depending on a double free semantic."
I respect this design decision. Then, I need to execute
devm_release_[action|all]() only once, which requires a device lock,
guard(device)(port->uport_dev). Under a lock, I can check a flag to
execute devm_release_[action|all]() only once.
To use the lock, a clean work without a prior lock is required. That's
a reason this patch ended up in wq.
I hope I've explained the rationale for using wq. What do you think?
Thank you!
Sungwoo.
[1] https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/
[2] https://lore.kernel.org/linux-cxl/69b0a0f8bfb0b_213210026@dwillia2-mobl4.notmuch/
[3] https://lore.kernel.org/linux-cxl/69cc961ef12e8_17890410036@dwillia2-mobl4.notmuch/
On Tue, Apr 28, 2026 at 3:05 PM Dave Jiang <dave.jiang@intel.com> wrote:
>
>
>
> On 4/27/26 10:42 PM, Sungwoo Kim wrote:
> > On Mon, Apr 27, 2026 at 1:42 PM Dave Jiang <dave.jiang@intel.com> wrote:
> >>
> >>
> >>
> >> On 4/26/26 8:20 PM, Sungwoo Kim wrote:
> >>> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
> >>>
> >>> Overview
> >>> ========
> >>> This patch series fixes race conditions in cxl region unregistration.
> >>>
> >>> devm_release_action() should be called once, otherwise, it warns about
> >>> the second call. However, the current implementation has a race condition
> >>> that allows multiple calls to devm_release_action(). The details are in
> >>> each patch.
> >>>
> >>> To fix these, the first patch adds a new function that guarantees that
> >>> devm_release_action() is called only once.
> >>> Using this function, the second patch subsitutes the current use of
> >>> devm_release_action() in cxl region with the new function.
> >>>
> >>> Change in v3
> >>> ============
> >>> Addressed Dave's comments:
> >>> - Split and made this in a patch series.
> >>> - Enhanced a commit log explaining why a workqueue is used in the first patch.
> >>> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
> >>> Sashiko AI review fixes:
> >>> - Fixed construct_region() as it also can race.
> >>> - Used a driver's wq instead of system wq so unbinding can drain a work.
> >>
> >> A few issues Sashiko raised with the v3.
> >> https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim
> >
> > A Sashiko's reasoning makes sense to me. In the middle of
> > construct_region(), users can perform a sysfs write, which then calls
> > unregister_region().
> > unregister_region() must not be called before construct_region() has
> > completed, since it calls device_del() and put_device().
> >
> > This isn't introduced by this patch, but I can fix it in v4.
> >
> > An enable/disable flag to allow sysfs write might fix this, like:
> >
> > static struct cxl_region *construct_region(...)
> > {
> > ... // when a construction is done
> > set_bit(CXL_REGION_F_ENABLE_SYSFS);
> > return cxlr;
> > }
> >
> > This prevents a sysfs write before it's done:
> >
> > static ssize_t delete_region_store(...)
> > {
> > if (!test_bit(CXL_REGION_F_ENABLE_SYSFS)) {
> > return -EBUSY;
> > }
> > ...
> > }
> >
> > Would there be a better solution? I'd like to ask for comments on this.
> > This patch series already introduces two new flags, and I'm concerned
> > about adding another.
>
> Sungwoo,
> I looked at the sachiko raised issues and looked at the current patches. I threw the sachiko objections and the current patches at Claude and it says the workqueue change made this more complicated and proposed this solution. Seems reasonable. What do you think?
>
> ---
>
> Race 1: Concurrent delete_region_store() calls
> ===============================================
> CPU 0 CPU 1
> delete_region_store() delete_region_store()
> cxl_find_region_by_name() cxl_find_region_by_name()
> devm_release_action() devm_release_action()
> unregister_region() WARN_ON(-ENOENT)
>
> The second call fails because the devm action was already removed.
>
> Race 2: Concurrent delete and driver unbind
> ============================================
> CPU 0 CPU 1
> delete_region_store() driver unbind
> cxl_find_region_by_name() devres_release_all()
> devm_release_action() unregister_region()
> unregister_region() // cleanup done
> // cleanup done again!
> WARN_ON(-ENOENT)
>
> Race 3: Use-after-free during construction
> ===========================================
> CPU 0: __construct_region() CPU 1: delete_region_store()
> device_add(&cxlr->dev)
> // Region visible in sysfs!
> cxl_find_region_by_name()
> // SUCCESS!
> __construct_region(cxlr)
> p = &cxlr->params
> // p->interleave_ways still 0
> unregister_region(cxlr)
> device_del()
> for (i = 0; i < 0; i++)
> // LOOP SKIPPED!
> put_device(&cxlr->dev)
> // refcount -> 0
> // cxlr FREED!
>
> p->interleave_ways = ctx->... // USE-AFTER-FREE!
> p->state = INTERLEAVE_ACTIVE // USE-AFTER-FREE!
>
> The Solution
> ============
>
> 1. Make unregister_region() idempotent using CXL_REGION_F_UNREGISTER flag
> Multiple concurrent calls will have one do the cleanup and others
> return early safely.
>
> 2. Use synchronous devm_remove_action_nowarn() instead of devm_release_action()
> - Call unregister_region() first (idempotent cleanup)
> - Then remove devm action with _nowarn variant
> - -ENOENT is benign in all race scenarios:
> * devres_release_all() already handled it
> * Driver unbind/rebind race (old action gone, new devres list)
>
> 3. Protect params read in unregister_region() with lock
> Use scoped_guard(rwsem_read) to safely read interleave_ways even
> during concurrent construction. If state < INTERLEAVE_ACTIVE, the
> region is still initializing and ways=0 is correct.
>
> 4. Add memory barriers for state transitions
> Use smp_store_release() in __construct_region() and smp_load_acquire()
> in unregister_region() to ensure proper ordering of initialization.
>
> 5. Hold extra reference during construct_region()
> Prevents premature freeing if unregister races during initialization.
>
> 6. Check unregister flag in __construct_region()
> Abort construction early if delete_region_store() raced after
> device_add() but before acquiring the write lock.
>
> This approach is simpler and more correct than async workers because:
> - Relies on fundamental kernel primitives (idempotency + locks)
> - No TOCTOU races with driver binding state
> - Fewer lines of code
> - Handles all race scenarios correctly
>
> Race Scenario Analysis:
>
> | Scenario | Outcome |
> |---------------------------|--------------------------------------|
> | Normal delete | Cleanup done, action removed |
> | Concurrent deletes | One cleans up, others return early |
> | Delete + driver unbind | One cleans up, other gets -ENOENT |
> | Delete + unbind/rebind | Old action handled, -ENOENT benign |
> | Delete during construct | Extra ref prevents UAF, lock protects|
>
> ---
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e8..8474756913e6 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2541,16 +2541,45 @@ static void unregister_region(void *_cxlr)
> {
> struct cxl_region *cxlr = _cxlr;
> struct cxl_region_params *p = &cxlr->params;
> - int i;
> + int i, ways;
> +
> + /*
> + * Idempotency: prevent multiple concurrent calls to unregister_region().
> + * This can happen if user calls delete_region_store() while driver
> + * unbind triggers devres_release_all() concurrently.
> + */
> + if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
> + return;
>
> device_del(&cxlr->dev);
>
> /*
> - * Now that region sysfs is shutdown, the parameter block is now
> - * read-only, so no need to hold the region rwsem to access the
> - * region parameters.
> + * Even though sysfs is shutdown, the region may still be initializing
> + * if construct_region() is running concurrently. We must safely read
> + * interleave_ways under lock to avoid reading uninitialized values.
> + *
> + * Use a read lock (not write) to allow concurrent operations if needed.
> + * Read the ways count under lock, then release before the loop to
> + * minimize lock hold time.
> + *
> + * Memory ordering: use smp_load_acquire() to ensure we see the fully
> + * initialized value of interleave_ways if state >= INTERLEAVE_ACTIVE.
> */
> - for (i = 0; i < p->interleave_ways; i++)
> + scoped_guard(rwsem_read, &cxl_rwsem.region) {
> + /*
> + * Pairs with smp_store_release() in __construct_region().
> + * Ensures if we see state >= INTERLEAVE_ACTIVE, we also see
> + * the initialized interleave_ways value.
> + */
> + enum cxl_config_state state = smp_load_acquire(&p->state);
> +
> + if (state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
> + ways = READ_ONCE(p->interleave_ways);
> + else
> + ways = 0; /* Still initializing, nothing to detach */
> + }
> +
> + for (i = 0; i < ways; i++)
> detach_target(cxlr, i);
>
> cxlr->hpa_range = DEFINE_RANGE(0, -1);
> @@ -2838,14 +2867,48 @@ static ssize_t delete_region_store(struct device *dev,
> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> struct cxl_port *port = to_cxl_port(dev->parent);
> struct cxl_region *cxlr;
> + int rc;
>
> cxlr = cxl_find_region_by_name(cxlrd, buf);
> if (IS_ERR(cxlr))
> return PTR_ERR(cxlr);
>
> - devm_release_action(port->uport_dev, unregister_region, cxlr);
> + /*
> + * Call unregister_region() first to do the cleanup. This is
> + * idempotent (CXL_REGION_F_UNREGISTER flag), so if
> + * devres_release_all() calls it concurrently, one will do the
> + * work and the other will return early.
> + */
> + unregister_region(cxlr);
> +
> + /*
> + * Now remove the devm action to prevent devres_release_all()
> + * from calling unregister_region() again during driver unbind.
> + *
> + * Three possible outcomes:
> + *
> + * 1. Success: We removed the action before driver unbind.
> + * Driver unbind won't call unregister_region().
> + *
> + * 2. -ENOENT: devres_release_all() already removed the action
> + * and called unregister_region() (which returned early due
> + * to F_UNREGISTER flag). This is benign.
> + *
> + * 3. -ENOENT: Driver was unbound and rebound between
> + * unregister_region() above and here. The old action was
> + * already handled by the old unbind. The new binding has
> + * no action for this region. This is benign.
> + *
> + * Use _nowarn variant since -ENOENT is expected and benign.
> + * The devres_lock serializes access, so we can't corrupt the
> + * devres list.
> + */
> + rc = devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
> + if (rc && rc != -ENOENT)
> + dev_warn(&cxlr->dev,
> + "Unexpected error removing devm action: %d\n", rc);
> +
> put_device(&cxlr->dev);
> -
> return len;
> }
> DEVICE_ATTR_WO(delete_region);
> @@ -3644,6 +3707,16 @@ static int __construct_region(struct cxl_region *cxlr,
> return -EBUSY;
> }
>
> + /*
> + * Check if region is being unregistered concurrently. This can happen
> + * if user triggers delete_region_store() right after device_add() but
> + * before we acquire the write lock above.
> + */
> + if (test_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags)) {
> + dev_dbg(&cxlr->dev, "Region unregister in progress, aborting construction\n");
> + return -EBUSY;
> + }
> +
> set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
> cxlr->hpa_range = *hpa_range;
>
> @@ -3686,7 +3759,16 @@ static int __construct_region(struct cxl_region *cxlr,
> p->res = res;
> p->interleave_ways = ctx->interleave_ways;
> p->interleave_granularity = ctx->interleave_granularity;
> - p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
> +
> + /*
> + * Use smp_store_release() to ensure all initialization above is
> + * visible before state becomes INTERLEAVE_ACTIVE. This pairs with
> + * smp_load_acquire() in unregister_region().
> + *
> + * This ensures that if unregister_region() sees state >=
> + * INTERLEAVE_ACTIVE, it will also see the initialized interleave_ways.
> + */
> + smp_store_release(&p->state, CXL_CONFIG_INTERLEAVE_ACTIVE);
>
> rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
> if (rc)
> @@ -3728,12 +3810,31 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> return cxlr;
> }
>
> + /*
> + * Hold an extra reference during initialization to prevent
> + * unregister_region() from freeing the object if delete_region_store()
> + * races with __construct_region(). The device_initialize() in
> + * cxl_region_alloc() set refcount=1, and device_add() above made it
> + * visible, so a concurrent unregister could drop that reference while
> + * we're still initializing.
> + */
> + get_device(&cxlr->dev);
> +
> rc = __construct_region(cxlr, ctx);
> if (rc) {
> - devm_release_action(port->uport_dev, unregister_region, cxlr);
> + /*
> + * Construction failed. Unregister the partially initialized
> + * region and remove the devm action. Both are safe to call
> + * due to idempotency and proper locking.
> + */
> + unregister_region(cxlr);
> + devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
> +
> + put_device(&cxlr->dev); /* Drop construction reference */
> return ERR_PTR(rc);
> }
>
> + put_device(&cxlr->dev); /* Drop construction reference */
> return cxlr;
> }
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1297594beaec..81952f0763e1 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -447,6 +447,13 @@ struct cxl_region_params {
> */
> #define CXL_REGION_F_NORMALIZED_ADDRESSING 3
>
> +/*
> + * Indicate that this region is being unregistered. Used to make
> + * unregister_region() idempotent and prevent race conditions between
> + * delete_region_store() and devres_release_all().
> + */
> +#define CXL_REGION_F_UNREGISTER 4
> +
> /**
> * struct cxl_region - CXL region
> * @dev: This region's device
>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
2026-04-28 20:28 ` Sungwoo Kim
@ 2026-04-28 22:33 ` Dave Jiang
2026-04-30 4:39 ` Sungwoo Kim
0 siblings, 1 reply; 12+ messages in thread
From: Dave Jiang @ 2026-04-28 22:33 UTC (permalink / raw)
To: Sungwoo Kim
Cc: Davidlohr Bueso, Jonathan Cameron, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Robert Richter, Li Ming, Gregory Price,
Ben Widawsky, Dave Tian, linux-cxl, linux-kernel
On 4/28/26 1:28 PM, Sungwoo Kim wrote:
> Dear Dave, thank you for sharing the patch that doesn't use the workqueue.
>
> Claude suggests not using wq, since it's simpler. I agree that it's
> simple, but it's overly tailored to fix a specific bug.
> Actually, v1[1] proposed a similar patch. So let me bring a patch and
> discussion from v1:
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 08fa3deef70ab..7ade9aa2aeecc 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> struct cxl_port *port = to_cxl_port(dev->parent);
> struct cxl_region *cxlr;
> + int err;
>
> cxlr = cxl_find_region_by_name(cxlrd, buf);
> if (IS_ERR(cxlr))
> return PTR_ERR(cxlr);
>
> - devm_release_action(port->uport_dev, unregister_region, cxlr);
> + err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
> + cxlr);
> + if (err) {
> + put_device(&cxlr->dev);
> + return err;
> + }
> + unregister_region(cxlr);
> put_device(&cxlr->dev);
>
> return len;
>
> However, v1 has not been merged. Dan[2] commented that "No, that is
> not an acceptable or comprehensive fix. A subsystem should never try
> to double unregister a device." Also in the following thread[3], "The
> patch was technically correct but it relies on a design that requires
> depending on a double free semantic."
>
> I respect this design decision. Then, I need to execute
> devm_release_[action|all]() only once, which requires a device lock,
> guard(device)(port->uport_dev). Under a lock, I can check a flag to
> execute devm_release_[action|all]() only once.
> To use the lock, a clean work without a prior lock is required. That's
> a reason this patch ended up in wq.
>
> I hope I've explained the rationale for using wq. What do you think?
Right I went back and also read what Dan proposed. I just wonder if we are over complicating things now and introducing more issues on top by doing that. Obviously we have to address the issues sachiko brought up in v3. Below is what claude suggested to fix the Sashiko issues in v3 patches. Some of the comments may be excessive but help reading through the changes.
Issue 1: Lockless read of p->interleave_ways
=============================================
Problem: unregister_region() reads p->interleave_ways without holding
the region rwsem while __construct_region() may still be initializing it
under write lock. This can result in reading 0 and skipping target
detachment, potentially causing use-after-free.
Solution: Acquire rwsem read lock before reading p->interleave_ways to
serialize with __construct_region(). Add explicit handling for
interleave_ways == 0 (uninitialized region) and use smp_rmb() to ensure
proper memory ordering.
Issue 2: Driver unbind/rebind race with workqueue
==================================================
Problem: The async workqueue could execute after driver unbind clears
the devres list, then operate on a new/empty list after rebind.
Solution: The workqueue already checks if driver is bound before calling
devm_remove_action(). Added cancel_work_sync() in cxl_region_release()
to ensure no dangling work when region is freed. Enhanced comments to
clarify the synchronization.
Issue 3: Use-after-free in construct_region() error path
=========================================================
Problem: In the original problematic code, calling unregister_region()
directly then get_device() created a window where concurrent
delete_region_store() could free the region.
Solution: Never call unregister_region() directly. The workqueue calls
devm_remove_action() which invokes unregister_region() in a safe context.
Reference transfer is now clean - caller's reference is transferred to
the workqueue, which drops it after work completes.
Key design:
- unregister_region() is ONLY called as a devm callback (driver unbind
or workqueue via devm_remove_action())
- CXL_REGION_F_UNREGISTER flag is ONLY set by unregister_region() itself,
preventing concurrent execution
- CXL_REGION_F_DEVM_REMOVE flag is set by workqueue to prevent duplicate
devm_remove_action() calls
- Reference counting is clear: lookup refs are transferred to workqueue
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 08b93b14a2c7..4c5771bc4717 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2500,6 +2500,14 @@ static void cxl_region_release(struct device *dev)
struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
int id = atomic_read(&cxlrd->region_id);
+ /*
+ * Ensure pending removal work completes before freeing.
+ * This is safe even if called from within the work function
+ * because cancel_work_sync() returns immediately if the work
+ * is currently executing.
+ */
+ cancel_work_sync(&cxlr->remove_work);
+
/*
* Try to reuse the recently idled id rather than the cached
* next id to prevent the region id space from increasing
@@ -2544,22 +2552,46 @@ static void unregister_region(void *_cxlr)
struct cxl_region_params *p = &cxlr->params;
int i;
+ /*
+ * This is ONLY called as a devm callback:
+ * 1. During driver unbind (devres_release_all)
+ * 2. Via devm_remove_action() from workqueue
+ *
+ * The UNREGISTER flag prevents concurrent execution if both
+ * paths race (e.g., user delete concurrent with driver unbind).
+ */
if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
return;
device_del(&cxlr->dev);
/*
- * Now that region sysfs is shutdown, the parameter block is now
- * read-only, so no need to hold the region rwsem to access the
- * region parameters.
+ * Sysfs is shutdown, blocking new writers. However, __construct_region()
+ * may still be initializing params under region rwsem write lock.
+ * Acquire read lock to serialize and get consistent view of interleave_ways.
+ *
+ * If interleave_ways is 0, region was never fully initialized.
*/
+ guard(rwsem_read)(&cxl_rwsem.region);
+
+ /*
+ * Ensure UNREGISTER flag check happens-before interleave_ways read.
+ */
+ smp_rmb();
+
+ if (p->interleave_ways == 0) {
+ /* Not initialized - skip target detachment */
+ goto out;
+ }
+
for (i = 0; i < p->interleave_ways; i++)
detach_target(cxlr, i);
cxlr->hpa_range = DEFINE_RANGE(0, -1);
+out:
cxl_region_iomem_release(cxlr);
+ /* Drop the devm-owned reference */
put_device(&cxlr->dev);
}
@@ -2842,22 +2874,59 @@ static bool remove_devm_actions(struct cxl_region *cxlr)
return schedule_cxl_region_remove_devm_actions(cxlr);
}
+/**
+ * remove_devm_actions_work - Workqueue handler for region removal
+ * @work: work_struct embedded in cxl_region
+ *
+ * Runs in safe context where it can acquire locks and call
+ * devm_remove_action() which will invoke unregister_region() without deadlock.
+ *
+ * This ensures unregister_region() is called through the devm system.
+ */
static void remove_devm_actions_work(struct work_struct *work)
{
struct cxl_region *cxlr = container_of(work, typeof(*cxlr), remove_work);
struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
+ /*
+ * Prevent multiple attempts to remove devm actions.
+ * If already set, another thread beat us to it.
+ */
if (test_and_set_bit(CXL_REGION_F_DEVM_REMOVE, &cxlr->flags)) {
put_device(&cxlr->dev);
return;
}
+ /*
+ * Lock port device and verify driver is still bound.
+ * If driver was unbound, devres_release_all() already called
+ * unregister_region() and the devres entry is gone.
+ */
scoped_guard(device, port->uport_dev) {
- if (port->uport_dev->driver)
+ if (port->uport_dev->driver) {
+ /*
+ * Call devm_remove_action() which will:
+ * 1. Find and remove the devres entry
+ * 2. Call unregister_region(cxlr)
+ *
+ * The UNREGISTER flag in unregister_region() prevents
+ * re-entry if this races with driver unbind.
+ */
devm_remove_action(port->uport_dev, unregister_region, cxlr);
+ }
+ /*
+ * If driver not bound, devres already released everything.
+ * The UNREGISTER flag would have prevented duplicate execution.
+ */
}
+ /*
+ * Drop the extra reference taken when work was queued.
+ * After this put_device(), if ref reaches 0, cxl_region_release()
+ * will be called, which does cancel_work_sync() on this work item
+ * (safe because we're about to return from the work function).
+ */
put_device(&cxlr->dev);
}
@@ -2868,17 +2937,40 @@ static ssize_t delete_region_store(struct device *dev,
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
struct cxl_region *cxlr;
- /* remove_devm_actions_work() will put cxlr->dev. */
+ /*
+ * Look up region by name. Takes a reference that will be transferred
+ * to the workqueue.
+ */
cxlr = cxl_find_region_by_name(cxlrd, buf);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
- unregister_region(cxlr);
+ /*
+ * Queue work to remove devm action. The workqueue will call
+ * devm_remove_action() -> unregister_region() in safe context.
+ *
+ * unregister_region() will set CXL_REGION_F_UNREGISTER to prevent
+ * concurrent execution. We don't set it here because that would
+ * cause unregister_region() to return early without cleanup.
+ *
+ * The work function expects to hold a reference, and will drop it
+ * when done. We transfer our lookup reference to the workqueue.
+ */
if (!remove_devm_actions(cxlr)) {
+ /*
+ * Failed to queue work. This can happen if:
+ * - Work already queued (another thread racing)
+ * - Workqueue stopped
+ * Drop our reference and return busy.
+ */
put_device(&cxlr->dev);
return -EBUSY;
}
+ /*
+ * Work queued successfully. The workqueue now owns our reference
+ * and will drop it after calling devm_remove_action().
+ */
return len;
}
DEVICE_ATTR_WO(delete_region);
@@ -3762,11 +3854,32 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
rc = __construct_region(cxlr, ctx);
if (rc) {
- /* remove_devm_actions_work() will put the device */
- get_device(&cxlr->dev);
- unregister_region(cxlr);
- if (!remove_devm_actions(cxlr))
+ /*
+ * Construction failed. We need to unregister the region.
+ * We hold a reference from __create_region() above.
+ *
+ * Queue work to remove devm action. The work will call
+ * devm_remove_action() -> unregister_region() which will
+ * set CXL_REGION_F_UNREGISTER to prevent concurrent execution.
+ *
+ * The work function expects a reference. We transfer our
+ * reference from __create_region() to the workqueue.
+ */
+ if (!remove_devm_actions(cxlr)) {
+ /*
+ * Failed to queue work. This can happen if:
+ * - Another thread already queued removal
+ * - Workqueue is stopped
+ *
+ * In either case, drop our reference and let the other
+ * path handle cleanup.
+ */
put_device(&cxlr->dev);
+ }
+ /*
+ * If work queued successfully, it now owns our reference
+ * and will drop it after calling devm_remove_action().
+ */
return ERR_PTR(rc);
}
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
2026-04-28 22:33 ` Dave Jiang
@ 2026-04-30 4:39 ` Sungwoo Kim
2026-04-30 16:00 ` Dave Jiang
0 siblings, 1 reply; 12+ messages in thread
From: Sungwoo Kim @ 2026-04-30 4:39 UTC (permalink / raw)
To: Dave Jiang
Cc: Davidlohr Bueso, Jonathan Cameron, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Robert Richter, Li Ming, Gregory Price,
Ben Widawsky, Dave Tian, linux-cxl, linux-kernel
On Tue, Apr 28, 2026 at 6:33 PM Dave Jiang <dave.jiang@intel.com> wrote:
>
>
>
> On 4/28/26 1:28 PM, Sungwoo Kim wrote:
> > Dear Dave, thank you for sharing the patch that doesn't use the workqueue.
> >
> > Claude suggests not using wq, since it's simpler. I agree that it's
> > simple, but it's overly tailored to fix a specific bug.
> > Actually, v1[1] proposed a similar patch. So let me bring a patch and
> > discussion from v1:
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 08fa3deef70ab..7ade9aa2aeecc 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
> > struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> > struct cxl_port *port = to_cxl_port(dev->parent);
> > struct cxl_region *cxlr;
> > + int err;
> >
> > cxlr = cxl_find_region_by_name(cxlrd, buf);
> > if (IS_ERR(cxlr))
> > return PTR_ERR(cxlr);
> >
> > - devm_release_action(port->uport_dev, unregister_region, cxlr);
> > + err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
> > + cxlr);
> > + if (err) {
> > + put_device(&cxlr->dev);
> > + return err;
> > + }
> > + unregister_region(cxlr);
> > put_device(&cxlr->dev);
> >
> > return len;
> >
> > However, v1 has not been merged. Dan[2] commented that "No, that is
> > not an acceptable or comprehensive fix. A subsystem should never try
> > to double unregister a device." Also in the following thread[3], "The
> > patch was technically correct but it relies on a design that requires
> > depending on a double free semantic."
> >
> > I respect this design decision. Then, I need to execute
> > devm_release_[action|all]() only once, which requires a device lock,
> > guard(device)(port->uport_dev). Under a lock, I can check a flag to
> > execute devm_release_[action|all]() only once.
> > To use the lock, a clean work without a prior lock is required. That's
> > a reason this patch ended up in wq.
> >
> > I hope I've explained the rationale for using wq. What do you think?
>
> Right I went back and also read what Dan proposed. I just wonder if we are over complicating things now and introducing more issues on top by doing that. Obviously we have to address the issues sachiko brought up in v3. Below is what claude suggested to fix the Sashiko issues in v3 patches. Some of the comments may be excessive but help reading through the changes.
I (and Claude) don't have a better solution than using wq, although I
agree that not using wq is simpler.
Also, I'm not yet experienced enough to decide which is better for
CXL, so I'm happy with both directions. Would you prefer the version
without wq?
>
> Issue 1: Lockless read of p->interleave_ways
> =============================================
> Problem: unregister_region() reads p->interleave_ways without holding
> the region rwsem while __construct_region() may still be initializing it
> under write lock. This can result in reading 0 and skipping target
> detachment, potentially causing use-after-free.
>
> Solution: Acquire rwsem read lock before reading p->interleave_ways to
> serialize with __construct_region(). Add explicit handling for
> interleave_ways == 0 (uninitialized region) and use smp_rmb() to ensure
> proper memory ordering.
Memory barrier is a good catch.
>
> Issue 2: Driver unbind/rebind race with workqueue
> ==================================================
> Problem: The async workqueue could execute after driver unbind clears
> the devres list, then operate on a new/empty list after rebind.
>
> Solution: The workqueue already checks if driver is bound before calling
> devm_remove_action(). Added cancel_work_sync() in cxl_region_release()
> to ensure no dangling work when region is freed. Enhanced comments to
> clarify the synchronization.
Sounds good to me.
>
> Issue 3: Use-after-free in construct_region() error path
> =========================================================
> Problem: In the original problematic code, calling unregister_region()
> directly then get_device() created a window where concurrent
> delete_region_store() could free the region.
>
> Solution: Never call unregister_region() directly. The workqueue calls
> devm_remove_action() which invokes unregister_region() in a safe context.
> Reference transfer is now clean - caller's reference is transferred to
> the workqueue, which drops it after work completes.
>
> Key design:
> - unregister_region() is ONLY called as a devm callback (driver unbind
> or workqueue via devm_remove_action())
> - CXL_REGION_F_UNREGISTER flag is ONLY set by unregister_region() itself,
> preventing concurrent execution
> - CXL_REGION_F_DEVM_REMOVE flag is set by workqueue to prevent duplicate
> devm_remove_action() calls
> - Reference counting is clear: lookup refs are transferred to workqueue
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 08b93b14a2c7..4c5771bc4717 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2500,6 +2500,14 @@ static void cxl_region_release(struct device *dev)
> struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> int id = atomic_read(&cxlrd->region_id);
>
> + /*
> + * Ensure pending removal work completes before freeing.
> + * This is safe even if called from within the work function
> + * because cancel_work_sync() returns immediately if the work
> + * is currently executing.
> + */
> + cancel_work_sync(&cxlr->remove_work);
> +
> /*
> * Try to reuse the recently idled id rather than the cached
> * next id to prevent the region id space from increasing
> @@ -2544,22 +2552,46 @@ static void unregister_region(void *_cxlr)
> struct cxl_region_params *p = &cxlr->params;
> int i;
>
> + /*
> + * This is ONLY called as a devm callback:
> + * 1. During driver unbind (devres_release_all)
> + * 2. Via devm_remove_action() from workqueue
> + *
> + * The UNREGISTER flag prevents concurrent execution if both
> + * paths race (e.g., user delete concurrent with driver unbind).
> + */
> if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
> return;
>
> device_del(&cxlr->dev);
>
> /*
> - * Now that region sysfs is shutdown, the parameter block is now
> - * read-only, so no need to hold the region rwsem to access the
> - * region parameters.
> + * Sysfs is shutdown, blocking new writers. However, __construct_region()
> + * may still be initializing params under region rwsem write lock.
> + * Acquire read lock to serialize and get consistent view of interleave_ways.
> + *
> + * If interleave_ways is 0, region was never fully initialized.
> */
> + guard(rwsem_read)(&cxl_rwsem.region);
> +
> + /*
> + * Ensure UNREGISTER flag check happens-before interleave_ways read.
> + */
> + smp_rmb();
> +
> + if (p->interleave_ways == 0) {
> + /* Not initialized - skip target detachment */
> + goto out;
> + }
> +
> for (i = 0; i < p->interleave_ways; i++)
> detach_target(cxlr, i);
>
> cxlr->hpa_range = DEFINE_RANGE(0, -1);
>
> +out:
> cxl_region_iomem_release(cxlr);
> + /* Drop the devm-owned reference */
> put_device(&cxlr->dev);
Does this lead to dropping the refcount to below 0 in __construct_region()?
If unregister_region() is called before __construct_region() by
devm_remove_all(), cxlr->dev's refcount can be 0 at this point, and
could lead to use-after-free or refcount underflow in
__construct_region().
Such as:
__create_region()
devm_cxl_add_region()
cxl_region_alloc(); // refcount = 1 ... (1)
__construct_region() ... (3)
get_device(&cxlr->dev); // BUG: refcount is already 0 at (2)
devm_remove_all()
...
unregister_region() ... (2)
put_device(); // refcount = 0, release.
To fix this, __construct_region() should also check that
unregister_region() has been called. If so, __construct_region()
should return failure.
Thank you!
Sungwoo.
> }
>
> @@ -2842,22 +2874,59 @@ static bool remove_devm_actions(struct cxl_region *cxlr)
> return schedule_cxl_region_remove_devm_actions(cxlr);
> }
>
> +/**
> + * remove_devm_actions_work - Workqueue handler for region removal
> + * @work: work_struct embedded in cxl_region
> + *
> + * Runs in safe context where it can acquire locks and call
> + * devm_remove_action() which will invoke unregister_region() without deadlock.
> + *
> + * This ensures unregister_region() is called through the devm system.
> + */
> static void remove_devm_actions_work(struct work_struct *work)
> {
> struct cxl_region *cxlr = container_of(work, typeof(*cxlr), remove_work);
> struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
>
> + /*
> + * Prevent multiple attempts to remove devm actions.
> + * If already set, another thread beat us to it.
> + */
> if (test_and_set_bit(CXL_REGION_F_DEVM_REMOVE, &cxlr->flags)) {
> put_device(&cxlr->dev);
> return;
> }
>
> + /*
> + * Lock port device and verify driver is still bound.
> + * If driver was unbound, devres_release_all() already called
> + * unregister_region() and the devres entry is gone.
> + */
> scoped_guard(device, port->uport_dev) {
> - if (port->uport_dev->driver)
> + if (port->uport_dev->driver) {
> + /*
> + * Call devm_remove_action() which will:
> + * 1. Find and remove the devres entry
> + * 2. Call unregister_region(cxlr)
> + *
> + * The UNREGISTER flag in unregister_region() prevents
> + * re-entry if this races with driver unbind.
> + */
> devm_remove_action(port->uport_dev, unregister_region, cxlr);
> + }
> + /*
> + * If driver not bound, devres already released everything.
> + * The UNREGISTER flag would have prevented duplicate execution.
> + */
> }
>
> + /*
> + * Drop the extra reference taken when work was queued.
> + * After this put_device(), if ref reaches 0, cxl_region_release()
> + * will be called, which does cancel_work_sync() on this work item
> + * (safe because we're about to return from the work function).
> + */
> put_device(&cxlr->dev);
> }
>
> @@ -2868,17 +2937,40 @@ static ssize_t delete_region_store(struct device *dev,
> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> struct cxl_region *cxlr;
>
> - /* remove_devm_actions_work() will put cxlr->dev. */
> + /*
> + * Look up region by name. Takes a reference that will be transferred
> + * to the workqueue.
> + */
> cxlr = cxl_find_region_by_name(cxlrd, buf);
> if (IS_ERR(cxlr))
> return PTR_ERR(cxlr);
>
> - unregister_region(cxlr);
> + /*
> + * Queue work to remove devm action. The workqueue will call
> + * devm_remove_action() -> unregister_region() in safe context.
> + *
> + * unregister_region() will set CXL_REGION_F_UNREGISTER to prevent
> + * concurrent execution. We don't set it here because that would
> + * cause unregister_region() to return early without cleanup.
> + *
> + * The work function expects to hold a reference, and will drop it
> + * when done. We transfer our lookup reference to the workqueue.
> + */
> if (!remove_devm_actions(cxlr)) {
> + /*
> + * Failed to queue work. This can happen if:
> + * - Work already queued (another thread racing)
> + * - Workqueue stopped
> + * Drop our reference and return busy.
> + */
> put_device(&cxlr->dev);
> return -EBUSY;
> }
>
> + /*
> + * Work queued successfully. The workqueue now owns our reference
> + * and will drop it after calling devm_remove_action().
> + */
> return len;
> }
> DEVICE_ATTR_WO(delete_region);
> @@ -3762,11 +3854,32 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>
> rc = __construct_region(cxlr, ctx);
> if (rc) {
> - /* remove_devm_actions_work() will put the device */
> - get_device(&cxlr->dev);
> - unregister_region(cxlr);
> - if (!remove_devm_actions(cxlr))
> + /*
> + * Construction failed. We need to unregister the region.
> + * We hold a reference from __create_region() above.
> + *
> + * Queue work to remove devm action. The work will call
> + * devm_remove_action() -> unregister_region() which will
> + * set CXL_REGION_F_UNREGISTER to prevent concurrent execution.
> + *
> + * The work function expects a reference. We transfer our
> + * reference from __create_region() to the workqueue.
> + */
> + if (!remove_devm_actions(cxlr)) {
> + /*
> + * Failed to queue work. This can happen if:
> + * - Another thread already queued removal
> + * - Workqueue is stopped
> + *
> + * In either case, drop our reference and let the other
> + * path handle cleanup.
> + */
> put_device(&cxlr->dev);
> + }
> + /*
> + * If work queued successfully, it now owns our reference
> + * and will drop it after calling devm_remove_action().
> + */
>
> return ERR_PTR(rc);
> }
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
2026-04-30 4:39 ` Sungwoo Kim
@ 2026-04-30 16:00 ` Dave Jiang
0 siblings, 0 replies; 12+ messages in thread
From: Dave Jiang @ 2026-04-30 16:00 UTC (permalink / raw)
To: Sungwoo Kim
Cc: Davidlohr Bueso, Jonathan Cameron, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Robert Richter, Li Ming, Gregory Price,
Ben Widawsky, Dave Tian, linux-cxl, linux-kernel
On 4/29/26 9:39 PM, Sungwoo Kim wrote:
> On Tue, Apr 28, 2026 at 6:33 PM Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>
>>
>> On 4/28/26 1:28 PM, Sungwoo Kim wrote:
>>> Dear Dave, thank you for sharing the patch that doesn't use the workqueue.
>>>
>>> Claude suggests not using wq, since it's simpler. I agree that it's
>>> simple, but it's overly tailored to fix a specific bug.
>>> Actually, v1[1] proposed a similar patch. So let me bring a patch and
>>> discussion from v1:
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index 08fa3deef70ab..7ade9aa2aeecc 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
>>> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>>> struct cxl_port *port = to_cxl_port(dev->parent);
>>> struct cxl_region *cxlr;
>>> + int err;
>>>
>>> cxlr = cxl_find_region_by_name(cxlrd, buf);
>>> if (IS_ERR(cxlr))
>>> return PTR_ERR(cxlr);
>>>
>>> - devm_release_action(port->uport_dev, unregister_region, cxlr);
>>> + err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
>>> + cxlr);
>>> + if (err) {
>>> + put_device(&cxlr->dev);
>>> + return err;
>>> + }
>>> + unregister_region(cxlr);
>>> put_device(&cxlr->dev);
>>>
>>> return len;
>>>
>>> However, v1 has not been merged. Dan[2] commented that "No, that is
>>> not an acceptable or comprehensive fix. A subsystem should never try
>>> to double unregister a device." Also in the following thread[3], "The
>>> patch was technically correct but it relies on a design that requires
>>> depending on a double free semantic."
>>>
>>> I respect this design decision. Then, I need to execute
>>> devm_release_[action|all]() only once, which requires a device lock,
>>> guard(device)(port->uport_dev). Under a lock, I can check a flag to
>>> execute devm_release_[action|all]() only once.
>>> To use the lock, a clean work without a prior lock is required. That's
>>> a reason this patch ended up in wq.
>>>
>>> I hope I've explained the rationale for using wq. What do you think?
>>
>> Right I went back and also read what Dan proposed. I just wonder if we are over complicating things now and introducing more issues on top by doing that. Obviously we have to address the issues sachiko brought up in v3. Below is what claude suggested to fix the Sashiko issues in v3 patches. Some of the comments may be excessive but help reading through the changes.
>
> I (and Claude) don't have a better solution than using wq, although I
> agree that not using wq is simpler.
> Also, I'm not yet experienced enough to decide which is better for
> CXL, so I'm happy with both directions. Would you prefer the version
> without wq?
Looks like Dan is working on something [1]. So maybe we wait and see what he comes up with.
[1]: https://lore.kernel.org/linux-cxl/c65851c1-4fca-46ba-8dde-fa10b7cb9cd3@amd.com/T/#mdd1ab49c012321fcd3dc34bd0cb7c0846cf1d1f9
DJ
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-04-30 16:00 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-27 3:20 [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Sungwoo Kim
2026-04-27 3:20 ` [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work Sungwoo Kim
2026-04-27 17:27 ` Dave Jiang
2026-04-27 3:20 ` [PATCH v3 2/2] cxl/region: Fix a race bug in delete_region_store() Sungwoo Kim
2026-04-27 12:51 ` [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Jonathan Cameron
2026-04-27 17:42 ` Dave Jiang
2026-04-28 5:42 ` Sungwoo Kim
2026-04-28 19:04 ` Dave Jiang
2026-04-28 20:28 ` Sungwoo Kim
2026-04-28 22:33 ` Dave Jiang
2026-04-30 4:39 ` Sungwoo Kim
2026-04-30 16:00 ` Dave Jiang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox