Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.

Linux CXL
 help / color / mirror / Atom feed

From: Dave Jiang <dave.jiang@intel.com>
To: Sungwoo Kim <iam@sung-woo.kim>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	Jonathan Cameron <jic23@kernel.org>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>, Dan Williams <djbw@kernel.org>,
	Robert Richter <rrichter@amd.com>, Li Ming <ming.li@zohomail.com>,
	Gregory Price <gourry@gourry.net>,
	Ben Widawsky <bwidawsk@kernel.org>, Dave Tian <daveti@purdue.edu>,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Date: Tue, 28 Apr 2026 12:04:56 -0700	[thread overview]
Message-ID: <4a20f390-a49d-4a9f-911f-21c36449b990@intel.com> (raw)
In-Reply-To: <CAJNyHpKMMwGnf7+2yzNnSbzozm2ROzfWmWg4xj8gROha+ncQeA@mail.gmail.com>



On 4/27/26 10:42 PM, Sungwoo Kim wrote:
> On Mon, Apr 27, 2026 at 1:42 PM Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>
>>
>> On 4/26/26 8:20 PM, Sungwoo Kim wrote:
>>> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
>>>
>>> Overview
>>> ========
>>> This patch series fixes race conditions in cxl region unregistration.
>>>
>>> devm_release_action() should be called once, otherwise, it warns about
>>> the second call. However, the current implementation has a race condition
>>> that allows multiple calls to devm_release_action(). The details are in
>>> each patch.
>>>
>>> To fix these, the first patch adds a new function that guarantees that
>>> devm_release_action() is called only once.
>>> Using this function, the second patch subsitutes the current use of
>>> devm_release_action() in cxl region with the new function.
>>>
>>> Change in v3
>>> ============
>>> Addressed Dave's comments:
>>> - Split and made this in a patch series.
>>> - Enhanced a commit log explaining why a workqueue is used in the first patch.
>>> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
>>> Sashiko AI review fixes:
>>> - Fixed construct_region() as it also can race.
>>> - Used a driver's wq instead of system wq so unbinding can drain a work.
>>
>> A few issues Sashiko raised with the v3.
>> https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim
> 
> A Sashiko's reasoning makes sense to me. In the middle of
> construct_region(), users can perform a sysfs write, which then calls
> unregister_region().
> unregister_region() must not be called before construct_region() has
> completed, since it calls device_del() and put_device().
> 
> This isn't introduced by this patch, but I can fix it in v4.
> 
> An enable/disable flag to allow sysfs write might fix this, like:
> 
> static struct cxl_region *construct_region(...)
> {
> ... // when a construction is done
>   set_bit(CXL_REGION_F_ENABLE_SYSFS);
>   return cxlr;
> }
> 
> This prevents a sysfs write before it's done:
> 
> static ssize_t delete_region_store(...)
> {
>   if (!test_bit(CXL_REGION_F_ENABLE_SYSFS)) {
>     return -EBUSY;
>   }
> ...
> }
> 
> Would there be a better solution? I'd like to ask for comments on this.
> This patch series already introduces two new flags, and I'm concerned
> about adding another.

Sungwoo,
I looked at the sachiko raised issues and looked at the current patches. I threw the sachiko objections and the current patches at Claude and it says the workqueue change made this more complicated and proposed this solution. Seems reasonable. What do you think?

---

    Race 1: Concurrent delete_region_store() calls
    ===============================================
    CPU 0                                  CPU 1
    delete_region_store()                  delete_region_store()
      cxl_find_region_by_name()              cxl_find_region_by_name()
      devm_release_action()                  devm_release_action()
        unregister_region()                    WARN_ON(-ENOENT)
    
    The second call fails because the devm action was already removed.
    
    Race 2: Concurrent delete and driver unbind
    ============================================
    CPU 0                                  CPU 1
    delete_region_store()                  driver unbind
      cxl_find_region_by_name()              devres_release_all()
      devm_release_action()                    unregister_region()
        unregister_region()                    // cleanup done
        // cleanup done again!
                                             WARN_ON(-ENOENT)
    
    Race 3: Use-after-free during construction
    ===========================================
    CPU 0: __construct_region()            CPU 1: delete_region_store()
    device_add(&cxlr->dev)
      // Region visible in sysfs!
                                           cxl_find_region_by_name()
                                             // SUCCESS!
    __construct_region(cxlr)
      p = &cxlr->params
      // p->interleave_ways still 0
                                           unregister_region(cxlr)
                                             device_del()
                                             for (i = 0; i < 0; i++)
                                               // LOOP SKIPPED!
                                             put_device(&cxlr->dev)
                                             // refcount -> 0
                                           // cxlr FREED!
    
      p->interleave_ways = ctx->...        // USE-AFTER-FREE!
      p->state = INTERLEAVE_ACTIVE         // USE-AFTER-FREE!
    
    The Solution
    ============
    
    1. Make unregister_region() idempotent using CXL_REGION_F_UNREGISTER flag
       Multiple concurrent calls will have one do the cleanup and others
       return early safely.
    
    2. Use synchronous devm_remove_action_nowarn() instead of devm_release_action()
       - Call unregister_region() first (idempotent cleanup)
       - Then remove devm action with _nowarn variant
       - -ENOENT is benign in all race scenarios:
         * devres_release_all() already handled it
         * Driver unbind/rebind race (old action gone, new devres list)
    
    3. Protect params read in unregister_region() with lock
       Use scoped_guard(rwsem_read) to safely read interleave_ways even
       during concurrent construction. If state < INTERLEAVE_ACTIVE, the
       region is still initializing and ways=0 is correct.
    
    4. Add memory barriers for state transitions
       Use smp_store_release() in __construct_region() and smp_load_acquire()
       in unregister_region() to ensure proper ordering of initialization.
    
    5. Hold extra reference during construct_region()
       Prevents premature freeing if unregister races during initialization.
    
    6. Check unregister flag in __construct_region()
       Abort construction early if delete_region_store() raced after
       device_add() but before acquiring the write lock.
    
    This approach is simpler and more correct than async workers because:
    - Relies on fundamental kernel primitives (idempotency + locks)
    - No TOCTOU races with driver binding state
    - Fewer lines of code
    - Handles all race scenarios correctly
    
    Race Scenario Analysis:
    
    | Scenario                  | Outcome                              |
    |---------------------------|--------------------------------------|
    | Normal delete             | Cleanup done, action removed         |
    | Concurrent deletes        | One cleans up, others return early   |
    | Delete + driver unbind    | One cleans up, other gets -ENOENT    |
    | Delete + unbind/rebind    | Old action handled, -ENOENT benign   |
    | Delete during construct   | Extra ref prevents UAF, lock protects|
    
---

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..8474756913e6 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2541,16 +2541,45 @@ static void unregister_region(void *_cxlr)
 {
 	struct cxl_region *cxlr = _cxlr;
 	struct cxl_region_params *p = &cxlr->params;
-	int i;
+	int i, ways;
+
+	/*
+	 * Idempotency: prevent multiple concurrent calls to unregister_region().
+	 * This can happen if user calls delete_region_store() while driver
+	 * unbind triggers devres_release_all() concurrently.
+	 */
+	if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
+		return;
 
 	device_del(&cxlr->dev);
 
 	/*
-	 * Now that region sysfs is shutdown, the parameter block is now
-	 * read-only, so no need to hold the region rwsem to access the
-	 * region parameters.
+	 * Even though sysfs is shutdown, the region may still be initializing
+	 * if construct_region() is running concurrently. We must safely read
+	 * interleave_ways under lock to avoid reading uninitialized values.
+	 *
+	 * Use a read lock (not write) to allow concurrent operations if needed.
+	 * Read the ways count under lock, then release before the loop to
+	 * minimize lock hold time.
+	 *
+	 * Memory ordering: use smp_load_acquire() to ensure we see the fully
+	 * initialized value of interleave_ways if state >= INTERLEAVE_ACTIVE.
 	 */
-	for (i = 0; i < p->interleave_ways; i++)
+	scoped_guard(rwsem_read, &cxl_rwsem.region) {
+		/*
+		 * Pairs with smp_store_release() in __construct_region().
+		 * Ensures if we see state >= INTERLEAVE_ACTIVE, we also see
+		 * the initialized interleave_ways value.
+		 */
+		enum cxl_config_state state = smp_load_acquire(&p->state);
+
+		if (state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+			ways = READ_ONCE(p->interleave_ways);
+		else
+			ways = 0; /* Still initializing, nothing to detach */
+	}
+
+	for (i = 0; i < ways; i++)
 		detach_target(cxlr, i);
 
 	cxlr->hpa_range = DEFINE_RANGE(0, -1);
@@ -2838,14 +2867,48 @@ static ssize_t delete_region_store(struct device *dev,
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
 	struct cxl_port *port = to_cxl_port(dev->parent);
 	struct cxl_region *cxlr;
+	int rc;
 
 	cxlr = cxl_find_region_by_name(cxlrd, buf);
 	if (IS_ERR(cxlr))
 		return PTR_ERR(cxlr);
 
-	devm_release_action(port->uport_dev, unregister_region, cxlr);
+	/*
+	 * Call unregister_region() first to do the cleanup. This is
+	 * idempotent (CXL_REGION_F_UNREGISTER flag), so if
+	 * devres_release_all() calls it concurrently, one will do the
+	 * work and the other will return early.
+	 */
+	unregister_region(cxlr);
+
+	/*
+	 * Now remove the devm action to prevent devres_release_all()
+	 * from calling unregister_region() again during driver unbind.
+	 *
+	 * Three possible outcomes:
+	 *
+	 * 1. Success: We removed the action before driver unbind.
+	 *    Driver unbind won't call unregister_region().
+	 *
+	 * 2. -ENOENT: devres_release_all() already removed the action
+	 *    and called unregister_region() (which returned early due
+	 *    to F_UNREGISTER flag). This is benign.
+	 *
+	 * 3. -ENOENT: Driver was unbound and rebound between
+	 *    unregister_region() above and here. The old action was
+	 *    already handled by the old unbind. The new binding has
+	 *    no action for this region. This is benign.
+	 *
+	 * Use _nowarn variant since -ENOENT is expected and benign.
+	 * The devres_lock serializes access, so we can't corrupt the
+	 * devres list.
+	 */
+	rc = devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
+	if (rc && rc != -ENOENT)
+		dev_warn(&cxlr->dev,
+			 "Unexpected error removing devm action: %d\n", rc);
+
 	put_device(&cxlr->dev);
-
 	return len;
 }
 DEVICE_ATTR_WO(delete_region);
@@ -3644,6 +3707,16 @@ static int __construct_region(struct cxl_region *cxlr,
 		return -EBUSY;
 	}
 
+	/*
+	 * Check if region is being unregistered concurrently. This can happen
+	 * if user triggers delete_region_store() right after device_add() but
+	 * before we acquire the write lock above.
+	 */
+	if (test_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags)) {
+		dev_dbg(&cxlr->dev, "Region unregister in progress, aborting construction\n");
+		return -EBUSY;
+	}
+
 	set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
 	cxlr->hpa_range = *hpa_range;
 
@@ -3686,7 +3759,16 @@ static int __construct_region(struct cxl_region *cxlr,
 	p->res = res;
 	p->interleave_ways = ctx->interleave_ways;
 	p->interleave_granularity = ctx->interleave_granularity;
-	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
+
+	/*
+	 * Use smp_store_release() to ensure all initialization above is
+	 * visible before state becomes INTERLEAVE_ACTIVE. This pairs with
+	 * smp_load_acquire() in unregister_region().
+	 *
+	 * This ensures that if unregister_region() sees state >=
+	 * INTERLEAVE_ACTIVE, it will also see the initialized interleave_ways.
+	 */
+	smp_store_release(&p->state, CXL_CONFIG_INTERLEAVE_ACTIVE);
 
 	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
 	if (rc)
@@ -3728,12 +3810,31 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 		return cxlr;
 	}
 
+	/*
+	 * Hold an extra reference during initialization to prevent
+	 * unregister_region() from freeing the object if delete_region_store()
+	 * races with __construct_region(). The device_initialize() in
+	 * cxl_region_alloc() set refcount=1, and device_add() above made it
+	 * visible, so a concurrent unregister could drop that reference while
+	 * we're still initializing.
+	 */
+	get_device(&cxlr->dev);
+
 	rc = __construct_region(cxlr, ctx);
 	if (rc) {
-		devm_release_action(port->uport_dev, unregister_region, cxlr);
+		/*
+		 * Construction failed. Unregister the partially initialized
+		 * region and remove the devm action. Both are safe to call
+		 * due to idempotency and proper locking.
+		 */
+		unregister_region(cxlr);
+		devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
+
+		put_device(&cxlr->dev); /* Drop construction reference */
 		return ERR_PTR(rc);
 	}
 
+	put_device(&cxlr->dev); /* Drop construction reference */
 	return cxlr;
 }
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..81952f0763e1 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -447,6 +447,13 @@ struct cxl_region_params {
  */
 #define CXL_REGION_F_NORMALIZED_ADDRESSING 3
 
+/*
+ * Indicate that this region is being unregistered. Used to make
+ * unregister_region() idempotent and prevent race conditions between
+ * delete_region_store() and devres_release_all().
+ */
+#define CXL_REGION_F_UNREGISTER 4
+
 /**
  * struct cxl_region - CXL region
  * @dev: This region's device

next prev parent reply	other threads:[~2026-04-28 19:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-27  3:20 [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Sungwoo Kim
2026-04-27  3:20 ` [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work Sungwoo Kim
2026-04-27 17:27   ` Dave Jiang
2026-04-27  3:20 ` [PATCH v3 2/2] cxl/region: Fix a race bug in delete_region_store() Sungwoo Kim
2026-04-27 12:51 ` [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Jonathan Cameron
2026-04-27 17:42 ` Dave Jiang
2026-04-28  5:42   ` Sungwoo Kim
2026-04-28 19:04     ` Dave Jiang [this message]
2026-04-28 20:28       ` Sungwoo Kim
2026-04-28 22:33         ` Dave Jiang
2026-04-30  4:39           ` Sungwoo Kim
2026-04-30 16:00             ` Dave Jiang

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:e50dc716d4e dfblob:8474756913e dfblob:1297594beae
dfblob:81952f0763e )
 OR (
bs:"Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration." )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a20f390-a49d-4a9f-911f-21c36449b990@intel.com \
    --to=dave.jiang@intel.com \
    --cc=alison.schofield@intel.com \
    --cc=bwidawsk@kernel.org \
    --cc=dave@stgolabs.net \
    --cc=daveti@purdue.edu \
    --cc=djbw@kernel.org \
    --cc=gourry@gourry.net \
    --cc=iam@sung-woo.kim \
    --cc=ira.weiny@intel.com \
    --cc=jic23@kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.li@zohomail.com \
    --cc=rrichter@amd.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox