public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Jiang <dave.jiang@intel.com>
To: Sungwoo Kim <iam@sung-woo.kim>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Jonathan Cameron <jic23@kernel.org>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>, Dan Williams <djbw@kernel.org>
Cc: Dave Tian <daveti@purdue.edu>,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work
Date: Mon, 27 Apr 2026 10:27:17 -0700	[thread overview]
Message-ID: <5d122716-440c-465f-a999-a9e1fa8e308d@intel.com> (raw)
In-Reply-To: <20260427032010.916681-4-iam@sung-woo.kim>



On 4/26/26 8:20 PM, Sungwoo Kim wrote:
> devm_remove_action() must be called (1) only once and (2) only if the
> device is still bound to a driver. However, several race conditions
> allow multiple calls to devm_remove_action().
> 
> For example, delete_region_store() and devres_release_all() can race [1]:
> 
> CPU0                                    CPU1
> devres_release_all()
>    // take devres_lock
>    remove_nodes(devres_head) // mv to local todo
>    // drop devres_lock                   delete_region_store()
>                                            cxlr = cxl_find_region_by_name()  // success
>                                            devm_release_action(unregister_region)
>                                              devres_release()
>                                                devres_remove()
>                                                  // hold devres_lock
>                                                  find_dr(devres_head) // does not find it
>                                              WARN_ON(-ENOENT)
>    release_nodes() // drain todo
>      unregister_region(cxlr) // release() cb
>        device_del()
> 
> To prevent this, this patch introduces a new function,
> remove_devm_actions(), that safely performs devm_release_action().
> 
> remove_devm_actions() guarantees that devm_release_action() is called
> only once by guarding with a flag. Also, it checks if the device is
> still bound to a driver before calling devm_remove_action().
> 
> In order to check the binding, a device lock must be held. To do this,
> Dan suggested [2] using a workqueue, since a new work has no prior lock
> and is clean to acquire a device lock.
> 
> [1] https://lore.kernel.org/linux-cxl/20260310183644.4rwc7ilmzy4t5xp6@offworld/
> [2] https://lore.kernel.org/linux-cxl/69b0a0f8bfb0b_213210026@dwillia2-mobl4.notmuch/
> 
> Suggested-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
> ---
>  drivers/cxl/core/port.c   |  6 ++++++
>  drivers/cxl/core/region.c | 27 +++++++++++++++++++++++++++
>  drivers/cxl/cxl.h         |  9 +++++++++
>  3 files changed, 42 insertions(+)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index c5aacd7054f1..2f142cea7f26 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2305,6 +2305,12 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
>  }
>  EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, "CXL");
>  
> +bool schedule_cxl_region_remove_devm_actions(struct cxl_region *cxlr)
> +{
> +	return queue_work(cxl_bus_wq, &cxlr->remove_work);
> +}
> +EXPORT_SYMBOL_NS_GPL(schedule_cxl_region_remove_devm_actions, "CXL");
> +
>  static void add_latency(struct access_coordinate *c, long latency)
>  {
>  	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e8..b086ae88b5bb 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -39,6 +39,7 @@
>  static nodemask_t nodemask_region_seen = NODE_MASK_NONE;
>  
>  static struct cxl_region *to_cxl_region(struct device *dev);
> +static void remove_devm_actions_work(struct work_struct *work);
>  
>  #define __ACCESS_ATTR_RO(_level, _name) {				\
>  	.attr	= { .name = __stringify(_name), .mode = 0444 },		\
> @@ -2589,6 +2590,8 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
>  	dev->type = &cxl_region_type;
>  	cxl_region_setup_flags(cxlr, &cxlrd->cxlsd.cxld);
>  
> +	INIT_WORK(&cxlr->remove_work, remove_devm_actions_work);
> +
>  	return cxlr;
>  }
>  
> @@ -2831,6 +2834,30 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
>  	return to_cxl_region(region_dev);
>  }
>  
> +static bool remove_devm_actions(struct cxl_region *cxlr)
> +{
> +	return schedule_cxl_region_remove_devm_actions(cxlr);
> +}
> +
> +static void remove_devm_actions_work(struct work_struct *work)
> +{
> +	struct cxl_region *cxlr = container_of(work, typeof(*cxlr), remove_work);
> +	struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
> +	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
> +
> +	if (test_and_set_bit(CXL_REGION_F_DEVM_REMOVE, &cxlr->flags)) {
> +		put_device(&cxlr->dev);
> +		return;
> +	}
> +
> +	scoped_guard(device, port->uport_dev) {
> +		if (port->uport_dev->driver)
> +			devm_remove_action(port->uport_dev, unregister_region, cxlr);
> +	}
> +
> +	put_device(&cxlr->dev);
> +}
> +
>  static ssize_t delete_region_store(struct device *dev,
>  				   struct device_attribute *attr,
>  				   const char *buf, size_t len)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1297594beaec..31ca4e6676ed 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -447,6 +447,12 @@ struct cxl_region_params {
>   */
>  #define CXL_REGION_F_NORMALIZED_ADDRESSING 3
>  
> +/* Indicate that this region is being unregistered to prevent a race. */
> +#define CXL_REGION_F_UNREGISTER 4

I would flip the define of this with the one below and move this define to the second patch, since it is not being used here. Otherwise everything else LGTM.

DJ

> +
> +/* Indicate that this region called devm_remove_action. */
> +#define CXL_REGION_F_DEVM_REMOVE 5
> +
>  /**
>   * struct cxl_region - CXL region
>   * @dev: This region's device
> @@ -462,6 +468,7 @@ struct cxl_region_params {
>   * @coord: QoS access coordinates for the region
>   * @node_notifier: notifier for setting the access coordinates to node
>   * @adist_notifier: notifier for calculating the abstract distance of node
> + * @remove_work: trigger the remove action in a safe context to acquire locks
>   */
>  struct cxl_region {
>  	struct device dev;
> @@ -477,6 +484,7 @@ struct cxl_region {
>  	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
>  	struct notifier_block node_notifier;
>  	struct notifier_block adist_notifier;
> +	struct work_struct remove_work;
>  };
>  
>  struct cxl_nvdimm_bridge {
> @@ -733,6 +741,7 @@ struct cxl_port *cxl_pci_find_port(struct pci_dev *pdev,
>  struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
>  				   struct cxl_dport **dport);
>  bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
> +bool schedule_cxl_region_remove_devm_actions(struct cxl_region *cxlr);
>  
>  struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
>  				     struct device *dport, int port_id,


  reply	other threads:[~2026-04-27 17:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-27  3:20 [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Sungwoo Kim
2026-04-27  3:20 ` [PATCH v3 1/2] cxl/region: serialize devm action removal via scheduled work Sungwoo Kim
2026-04-27 17:27   ` Dave Jiang [this message]
2026-04-27  3:20 ` [PATCH v3 2/2] cxl/region: Fix a race bug in delete_region_store() Sungwoo Kim
2026-04-27 12:51 ` [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration Jonathan Cameron
2026-04-27 17:42 ` Dave Jiang
2026-04-28  5:42   ` Sungwoo Kim
2026-04-28 19:04     ` Dave Jiang
2026-04-28 20:28       ` Sungwoo Kim
2026-04-28 22:33         ` Dave Jiang
2026-04-30  4:39           ` Sungwoo Kim
2026-04-30 16:00             ` Dave Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d122716-440c-465f-a999-a9e1fa8e308d@intel.com \
    --to=dave.jiang@intel.com \
    --cc=alison.schofield@intel.com \
    --cc=dave@stgolabs.net \
    --cc=daveti@purdue.edu \
    --cc=djbw@kernel.org \
    --cc=iam@sung-woo.kim \
    --cc=ira.weiny@intel.com \
    --cc=jic23@kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox