From: Dave Jiang <dave.jiang@intel.com>
To: Sungwoo Kim <iam@sung-woo.kim>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
Jonathan Cameron <jic23@kernel.org>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>, Dan Williams <djbw@kernel.org>,
Robert Richter <rrichter@amd.com>, Li Ming <ming.li@zohomail.com>,
Gregory Price <gourry@gourry.net>,
Ben Widawsky <bwidawsk@kernel.org>, Dave Tian <daveti@purdue.edu>,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Date: Tue, 28 Apr 2026 12:04:56 -0700 [thread overview]
Message-ID: <4a20f390-a49d-4a9f-911f-21c36449b990@intel.com> (raw)
In-Reply-To: <CAJNyHpKMMwGnf7+2yzNnSbzozm2ROzfWmWg4xj8gROha+ncQeA@mail.gmail.com>
On 4/27/26 10:42 PM, Sungwoo Kim wrote:
> On Mon, Apr 27, 2026 at 1:42 PM Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>
>>
>> On 4/26/26 8:20 PM, Sungwoo Kim wrote:
>>> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
>>>
>>> Overview
>>> ========
>>> This patch series fixes race conditions in cxl region unregistration.
>>>
>>> devm_release_action() should be called once, otherwise, it warns about
>>> the second call. However, the current implementation has a race condition
>>> that allows multiple calls to devm_release_action(). The details are in
>>> each patch.
>>>
>>> To fix these, the first patch adds a new function that guarantees that
>>> devm_release_action() is called only once.
>>> Using this function, the second patch subsitutes the current use of
>>> devm_release_action() in cxl region with the new function.
>>>
>>> Change in v3
>>> ============
>>> Addressed Dave's comments:
>>> - Split and made this in a patch series.
>>> - Enhanced a commit log explaining why a workqueue is used in the first patch.
>>> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
>>> Sashiko AI review fixes:
>>> - Fixed construct_region() as it also can race.
>>> - Used a driver's wq instead of system wq so unbinding can drain a work.
>>
>> A few issues Sashiko raised with the v3.
>> https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim
>
> A Sashiko's reasoning makes sense to me. In the middle of
> construct_region(), users can perform a sysfs write, which then calls
> unregister_region().
> unregister_region() must not be called before construct_region() has
> completed, since it calls device_del() and put_device().
>
> This isn't introduced by this patch, but I can fix it in v4.
>
> An enable/disable flag to allow sysfs write might fix this, like:
>
> static struct cxl_region *construct_region(...)
> {
> ... // when a construction is done
> set_bit(CXL_REGION_F_ENABLE_SYSFS);
> return cxlr;
> }
>
> This prevents a sysfs write before it's done:
>
> static ssize_t delete_region_store(...)
> {
> if (!test_bit(CXL_REGION_F_ENABLE_SYSFS)) {
> return -EBUSY;
> }
> ...
> }
>
> Would there be a better solution? I'd like to ask for comments on this.
> This patch series already introduces two new flags, and I'm concerned
> about adding another.
Sungwoo,
I looked at the sachiko raised issues and looked at the current patches. I threw the sachiko objections and the current patches at Claude and it says the workqueue change made this more complicated and proposed this solution. Seems reasonable. What do you think?
---
Race 1: Concurrent delete_region_store() calls
===============================================
CPU 0 CPU 1
delete_region_store() delete_region_store()
cxl_find_region_by_name() cxl_find_region_by_name()
devm_release_action() devm_release_action()
unregister_region() WARN_ON(-ENOENT)
The second call fails because the devm action was already removed.
Race 2: Concurrent delete and driver unbind
============================================
CPU 0 CPU 1
delete_region_store() driver unbind
cxl_find_region_by_name() devres_release_all()
devm_release_action() unregister_region()
unregister_region() // cleanup done
// cleanup done again!
WARN_ON(-ENOENT)
Race 3: Use-after-free during construction
===========================================
CPU 0: __construct_region() CPU 1: delete_region_store()
device_add(&cxlr->dev)
// Region visible in sysfs!
cxl_find_region_by_name()
// SUCCESS!
__construct_region(cxlr)
p = &cxlr->params
// p->interleave_ways still 0
unregister_region(cxlr)
device_del()
for (i = 0; i < 0; i++)
// LOOP SKIPPED!
put_device(&cxlr->dev)
// refcount -> 0
// cxlr FREED!
p->interleave_ways = ctx->... // USE-AFTER-FREE!
p->state = INTERLEAVE_ACTIVE // USE-AFTER-FREE!
The Solution
============
1. Make unregister_region() idempotent using CXL_REGION_F_UNREGISTER flag
Multiple concurrent calls will have one do the cleanup and others
return early safely.
2. Use synchronous devm_remove_action_nowarn() instead of devm_release_action()
- Call unregister_region() first (idempotent cleanup)
- Then remove devm action with _nowarn variant
- -ENOENT is benign in all race scenarios:
* devres_release_all() already handled it
* Driver unbind/rebind race (old action gone, new devres list)
3. Protect params read in unregister_region() with lock
Use scoped_guard(rwsem_read) to safely read interleave_ways even
during concurrent construction. If state < INTERLEAVE_ACTIVE, the
region is still initializing and ways=0 is correct.
4. Add memory barriers for state transitions
Use smp_store_release() in __construct_region() and smp_load_acquire()
in unregister_region() to ensure proper ordering of initialization.
5. Hold extra reference during construct_region()
Prevents premature freeing if unregister races during initialization.
6. Check unregister flag in __construct_region()
Abort construction early if delete_region_store() raced after
device_add() but before acquiring the write lock.
This approach is simpler and more correct than async workers because:
- Relies on fundamental kernel primitives (idempotency + locks)
- No TOCTOU races with driver binding state
- Fewer lines of code
- Handles all race scenarios correctly
Race Scenario Analysis:
| Scenario | Outcome |
|---------------------------|--------------------------------------|
| Normal delete | Cleanup done, action removed |
| Concurrent deletes | One cleans up, others return early |
| Delete + driver unbind | One cleans up, other gets -ENOENT |
| Delete + unbind/rebind | Old action handled, -ENOENT benign |
| Delete during construct | Extra ref prevents UAF, lock protects|
---
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..8474756913e6 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2541,16 +2541,45 @@ static void unregister_region(void *_cxlr)
{
struct cxl_region *cxlr = _cxlr;
struct cxl_region_params *p = &cxlr->params;
- int i;
+ int i, ways;
+
+ /*
+ * Idempotency: prevent multiple concurrent calls to unregister_region().
+ * This can happen if user calls delete_region_store() while driver
+ * unbind triggers devres_release_all() concurrently.
+ */
+ if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
+ return;
device_del(&cxlr->dev);
/*
- * Now that region sysfs is shutdown, the parameter block is now
- * read-only, so no need to hold the region rwsem to access the
- * region parameters.
+ * Even though sysfs is shutdown, the region may still be initializing
+ * if construct_region() is running concurrently. We must safely read
+ * interleave_ways under lock to avoid reading uninitialized values.
+ *
+ * Use a read lock (not write) to allow concurrent operations if needed.
+ * Read the ways count under lock, then release before the loop to
+ * minimize lock hold time.
+ *
+ * Memory ordering: use smp_load_acquire() to ensure we see the fully
+ * initialized value of interleave_ways if state >= INTERLEAVE_ACTIVE.
*/
- for (i = 0; i < p->interleave_ways; i++)
+ scoped_guard(rwsem_read, &cxl_rwsem.region) {
+ /*
+ * Pairs with smp_store_release() in __construct_region().
+ * Ensures if we see state >= INTERLEAVE_ACTIVE, we also see
+ * the initialized interleave_ways value.
+ */
+ enum cxl_config_state state = smp_load_acquire(&p->state);
+
+ if (state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+ ways = READ_ONCE(p->interleave_ways);
+ else
+ ways = 0; /* Still initializing, nothing to detach */
+ }
+
+ for (i = 0; i < ways; i++)
detach_target(cxlr, i);
cxlr->hpa_range = DEFINE_RANGE(0, -1);
@@ -2838,14 +2867,48 @@ static ssize_t delete_region_store(struct device *dev,
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
struct cxl_port *port = to_cxl_port(dev->parent);
struct cxl_region *cxlr;
+ int rc;
cxlr = cxl_find_region_by_name(cxlrd, buf);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ /*
+ * Call unregister_region() first to do the cleanup. This is
+ * idempotent (CXL_REGION_F_UNREGISTER flag), so if
+ * devres_release_all() calls it concurrently, one will do the
+ * work and the other will return early.
+ */
+ unregister_region(cxlr);
+
+ /*
+ * Now remove the devm action to prevent devres_release_all()
+ * from calling unregister_region() again during driver unbind.
+ *
+ * Three possible outcomes:
+ *
+ * 1. Success: We removed the action before driver unbind.
+ * Driver unbind won't call unregister_region().
+ *
+ * 2. -ENOENT: devres_release_all() already removed the action
+ * and called unregister_region() (which returned early due
+ * to F_UNREGISTER flag). This is benign.
+ *
+ * 3. -ENOENT: Driver was unbound and rebound between
+ * unregister_region() above and here. The old action was
+ * already handled by the old unbind. The new binding has
+ * no action for this region. This is benign.
+ *
+ * Use _nowarn variant since -ENOENT is expected and benign.
+ * The devres_lock serializes access, so we can't corrupt the
+ * devres list.
+ */
+ rc = devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
+ if (rc && rc != -ENOENT)
+ dev_warn(&cxlr->dev,
+ "Unexpected error removing devm action: %d\n", rc);
+
put_device(&cxlr->dev);
-
return len;
}
DEVICE_ATTR_WO(delete_region);
@@ -3644,6 +3707,16 @@ static int __construct_region(struct cxl_region *cxlr,
return -EBUSY;
}
+ /*
+ * Check if region is being unregistered concurrently. This can happen
+ * if user triggers delete_region_store() right after device_add() but
+ * before we acquire the write lock above.
+ */
+ if (test_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags)) {
+ dev_dbg(&cxlr->dev, "Region unregister in progress, aborting construction\n");
+ return -EBUSY;
+ }
+
set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
cxlr->hpa_range = *hpa_range;
@@ -3686,7 +3759,16 @@ static int __construct_region(struct cxl_region *cxlr,
p->res = res;
p->interleave_ways = ctx->interleave_ways;
p->interleave_granularity = ctx->interleave_granularity;
- p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
+
+ /*
+ * Use smp_store_release() to ensure all initialization above is
+ * visible before state becomes INTERLEAVE_ACTIVE. This pairs with
+ * smp_load_acquire() in unregister_region().
+ *
+ * This ensures that if unregister_region() sees state >=
+ * INTERLEAVE_ACTIVE, it will also see the initialized interleave_ways.
+ */
+ smp_store_release(&p->state, CXL_CONFIG_INTERLEAVE_ACTIVE);
rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
if (rc)
@@ -3728,12 +3810,31 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
return cxlr;
}
+ /*
+ * Hold an extra reference during initialization to prevent
+ * unregister_region() from freeing the object if delete_region_store()
+ * races with __construct_region(). The device_initialize() in
+ * cxl_region_alloc() set refcount=1, and device_add() above made it
+ * visible, so a concurrent unregister could drop that reference while
+ * we're still initializing.
+ */
+ get_device(&cxlr->dev);
+
rc = __construct_region(cxlr, ctx);
if (rc) {
- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ /*
+ * Construction failed. Unregister the partially initialized
+ * region and remove the devm action. Both are safe to call
+ * due to idempotency and proper locking.
+ */
+ unregister_region(cxlr);
+ devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
+
+ put_device(&cxlr->dev); /* Drop construction reference */
return ERR_PTR(rc);
}
+ put_device(&cxlr->dev); /* Drop construction reference */
return cxlr;
}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..81952f0763e1 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -447,6 +447,13 @@ struct cxl_region_params {
*/
#define CXL_REGION_F_NORMALIZED_ADDRESSING 3
+/*
+ * Indicate that this region is being unregistered. Used to make
+ * unregister_region() idempotent and prevent race conditions between
+ * delete_region_store() and devres_release_all().
+ */
+#define CXL_REGION_F_UNREGISTER 4
+
/**
* struct cxl_region - CXL region
* @dev: This region's device
next prev parent reply other threads:[~2026-04-28 19:04 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-27 3:20 [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Sungwoo Kim
2026-04-27 3:20 ` [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work Sungwoo Kim
2026-04-27 17:27 ` Dave Jiang
2026-04-27 3:20 ` [PATCH v3 2/2] cxl/region: Fix a race bug in delete_region_store() Sungwoo Kim
2026-04-27 12:51 ` [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Jonathan Cameron
2026-04-27 17:42 ` Dave Jiang
2026-04-28 5:42 ` Sungwoo Kim
2026-04-28 19:04 ` Dave Jiang [this message]
2026-04-28 20:28 ` Sungwoo Kim
2026-04-28 22:33 ` Dave Jiang
2026-04-30 4:39 ` Sungwoo Kim
2026-04-30 16:00 ` Dave Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4a20f390-a49d-4a9f-911f-21c36449b990@intel.com \
--to=dave.jiang@intel.com \
--cc=alison.schofield@intel.com \
--cc=bwidawsk@kernel.org \
--cc=dave@stgolabs.net \
--cc=daveti@purdue.edu \
--cc=djbw@kernel.org \
--cc=gourry@gourry.net \
--cc=iam@sung-woo.kim \
--cc=ira.weiny@intel.com \
--cc=jic23@kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ming.li@zohomail.com \
--cc=rrichter@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox