public inbox for linux-cxl@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory
@ 2025-12-16  0:56 Dan Williams
  2025-12-16  0:56 ` [PATCH v2 1/6] cxl/mem: Fix devm_cxl_memdev_edac_release() confusion Dan Williams
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Dan Williams @ 2025-12-16  0:56 UTC (permalink / raw)
  To: dave.jiang
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron

Changes since v1 [1]:
- Introduce cxl_memdev_autoremove() one patch earlier in the series to
  fix a no_free_ptr()-induced NULL-pointer-dereference bisect bug. (Ben)
- Drop returning a custom error code via @cxlmd->endpoint that can be
  added back by the accelerator enabling series if needed. (Ben)
- Document how providing @attach changes the semantics of
  devm_cxl_add_memdev(). (Ben)
- Rename @ops to @attach
- Add support to release an accelerator driver when the logical CXL link
  drops.
- Pick up test and review tags from Alison, Ben, Jonathan, Dave, and
  Alejandro.

[1]: http://lore.kernel.org/20251204022136.2573521-1-dan.j.williams@intel.com

Original Cover:
===============

The CXL subsystem is modular. That modularity is a benefit for
separation of concerns and testing. It is generally appropriate for this
class of devices that support hotplug and can dynamically add a CXL
personality alongside their PCI personality. However, a cost of modules
is ambiguity about when devices (cxl_memdevs, cxl_ports, cxl_regions)
have had a chance to attach to their corresponding drivers on
@cxl_bus_type.

Not being able to reliably determine whether a device has had a chance to
attach to its driver, or is still waiting for its module to load, is a
common problem for the "Soft Reserve Recovery" [2] and "Accelerator
Memory" [3] enabling efforts.

"Soft Reserve Recovery" wants to use wait_for_device_probe() as a sync
point for when CXL devices present at boot have had a chance to attach to
the cxl_pci driver (the generic CXL memory expansion class driver). That
breaks down if wait_for_device_probe() only flushes PCI device probe, but
not the cxl_mem_probe() of the cxl_memdev that cxl_pci_probe() creates.

For "Accelerator Memory", the driver is not cxl_pci, but any potential
PCI driver that wants to use the devm_cxl_add_memdev() ABI to attach to
the CXL memory domain. Those drivers want to know if the CXL link is
live end-to-end (from endpoint, through switches, to the host bridge)
and CXL memory operations are enabled. If not, a CXL accelerator may be
able to fall back to PCI-only operation. Similar to "Soft Reserve
Recovery", such a driver needs to know that the CXL subsystem has had a
chance to probe the ancestor topology of the device so that it can make a
synchronous decision about CXL operation.

In support of those efforts:

* Clean up some resource lifetime issues in the current code
* Move some object creation symbols (devm_cxl_add_memdev() and
  devm_cxl_add_endpoint()) into the cxl_mem.ko and cxl_port.ko objects.
  Implicitly guarantee that cxl_mem_driver and cxl_port_driver have been
  registered prior to any device objects being registered. This is
  preferred over explicit open-coded request_module().
* Use scoped-based-cleanup before adding more resource management in
  devm_cxl_add_memdev()
* Give an accelerator the opportunity to run setup operations in
  cxl_mem_probe() so it can further probe if the CXL configuration matches
  its needs.

Some of these changes previously appeared on a branch as an RFC [4],
which left "Soft Reserve Recovery" and "Accelerator Memory" to jockey for
ordering. Instead, create a shared topic branch for both of those efforts
to import. The main changes since that RFC are fixing a bug and reducing
the amount of refactoring (which contributed to hiding the bug).

[2]: http://lore.kernel.org/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com
[3]: http://lore.kernel.org/20251119192236.2527305-1-alejandro.lucero-palau@amd.com
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order

Dan Williams (6):
  cxl/mem: Fix devm_cxl_memdev_edac_release() confusion
  cxl/mem: Arrange for always-synchronous memdev attach
  cxl/port: Arrange for always synchronous endpoint attach
  cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
  cxl/mem: Drop @host argument to devm_cxl_add_memdev()
  cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation

 drivers/cxl/Kconfig          |   2 +-
 drivers/cxl/cxl.h            |   2 +
 drivers/cxl/cxlmem.h         |  17 ++++--
 drivers/cxl/core/edac.c      |  64 +++++++++++---------
 drivers/cxl/core/memdev.c    | 109 +++++++++++++++++++++++++----------
 drivers/cxl/mem.c            |  73 ++++++++++-------------
 drivers/cxl/pci.c            |   2 +-
 drivers/cxl/port.c           |  40 +++++++++++++
 tools/testing/cxl/test/mem.c |   2 +-
 9 files changed, 200 insertions(+), 111 deletions(-)


base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
-- 
2.51.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/6] cxl/mem: Fix devm_cxl_memdev_edac_release() confusion
  2025-12-16  0:56 [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory Dan Williams
@ 2025-12-16  0:56 ` Dan Williams
  2025-12-16  0:56 ` [PATCH v2 2/6] cxl/mem: Arrange for always-synchronous memdev attach Dan Williams
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2025-12-16  0:56 UTC (permalink / raw)
  To: dave.jiang
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron, Shiju Jose, Ben Cheatham, Jonathan Cameron,
	Alejandro Lucero

A device release method is only for undoing allocations made on the path
to preparing the device for device_add(). In contrast, devm allocations
are post device_add(): they are acquired during / after ->probe() and are
released synchronously with ->remove().

So, a "devm" helper in a "release" method is a clear anti-pattern.

Move this devm release action to where it belongs: an action registered at
edac object creation time. Otherwise, these resources leak until
cxl_memdev_release() time, which may be long after these xarray and error
record caches have gone idle.
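
The distinction can be illustrated with a userspace toy model (all names
below are made up for this sketch, not the kernel API): devm actions run
at "remove" time, in reverse registration order, long before the final
release() of the device object.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of devm_add_action_or_reset(); illustrative only */
struct devm_action {
	void (*fn)(void *data);
	void *data;
	struct devm_action *next;
};

struct toy_device {
	struct devm_action *actions;
	int released;
};

static int err_rec_freed;	/* stand-in for freeing error record caches */
static void err_rec_free(void *data)
{
	(void)data;
	err_rec_freed = 1;
}

static int devm_add_action_or_reset(struct toy_device *dev,
				    void (*fn)(void *data), void *data)
{
	struct devm_action *a = malloc(sizeof(*a));

	if (!a) {
		fn(data);	/* "or_reset": run the undo on failure */
		return -1;
	}
	a->fn = fn;
	a->data = data;
	a->next = dev->actions;	/* LIFO: actions run in reverse order */
	dev->actions = a;
	return 0;
}

/* Runs synchronously with driver ->remove(), not at final put_device() */
static void toy_device_remove(struct toy_device *dev)
{
	while (dev->actions) {
		struct devm_action *a = dev->actions;

		dev->actions = a->next;
		a->fn(a->data);
		free(a);
	}
}

/* A release() method only undoes pre-device_add() allocations */
static void toy_device_release(struct toy_device *dev)
{
	dev->released = 1;
}
```

In this model the devm-managed resource is gone as soon as the device is
"removed", while a resource freed only in release() would linger for as
long as any reference to the device object is held.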

Note, this also fixes up the type of @cxlmd->err_rec_array which needlessly
dropped type-safety.

Fixes: 0b5ccb0de1e2 ("cxl/edac: Support for finding memory operation attributes from the current boot")
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Shiju Jose <shiju.jose@huawei.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Shiju Jose <shiju.jose@huawei.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxlmem.h      |  5 +--
 drivers/cxl/core/edac.c   | 64 ++++++++++++++++++++++-----------------
 drivers/cxl/core/memdev.c |  1 -
 3 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 434031a0c1f7..c12ab4fc9512 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -63,7 +63,7 @@ struct cxl_memdev {
 	int depth;
 	u8 scrub_cycle;
 	int scrub_region_id;
-	void *err_rec_array;
+	struct cxl_mem_err_rec *err_rec_array;
 };
 
 static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
@@ -877,7 +877,6 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd);
 int devm_cxl_region_edac_register(struct cxl_region *cxlr);
 int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt);
 int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt);
-void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd);
 #else
 static inline int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
 { return 0; }
@@ -889,8 +888,6 @@ static inline int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd,
 static inline int cxl_store_rec_dram(struct cxl_memdev *cxlmd,
 				     union cxl_event *evt)
 { return 0; }
-static inline void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd)
-{ return; }
 #endif
 
 #ifdef CONFIG_CXL_SUSPEND
diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
index 79994ca9bc9f..81160260e26b 100644
--- a/drivers/cxl/core/edac.c
+++ b/drivers/cxl/core/edac.c
@@ -1988,6 +1988,40 @@ static int cxl_memdev_soft_ppr_init(struct cxl_memdev *cxlmd,
 	return 0;
 }
 
+static void err_rec_free(void *_cxlmd)
+{
+	struct cxl_memdev *cxlmd = _cxlmd;
+	struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+	struct cxl_event_gen_media *rec_gen_media;
+	struct cxl_event_dram *rec_dram;
+	unsigned long index;
+
+	cxlmd->err_rec_array = NULL;
+	xa_for_each(&array_rec->rec_dram, index, rec_dram)
+		kfree(rec_dram);
+	xa_destroy(&array_rec->rec_dram);
+
+	xa_for_each(&array_rec->rec_gen_media, index, rec_gen_media)
+		kfree(rec_gen_media);
+	xa_destroy(&array_rec->rec_gen_media);
+	kfree(array_rec);
+}
+
+static int devm_cxl_memdev_setup_err_rec(struct cxl_memdev *cxlmd)
+{
+	struct cxl_mem_err_rec *array_rec =
+		kzalloc(sizeof(*array_rec), GFP_KERNEL);
+
+	if (!array_rec)
+		return -ENOMEM;
+
+	xa_init(&array_rec->rec_gen_media);
+	xa_init(&array_rec->rec_dram);
+	cxlmd->err_rec_array = array_rec;
+
+	return devm_add_action_or_reset(&cxlmd->dev, err_rec_free, cxlmd);
+}
+
 int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
 {
 	struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
@@ -2038,15 +2072,9 @@ int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
 		}
 
 		if (repair_inst) {
-			struct cxl_mem_err_rec *array_rec =
-				devm_kzalloc(&cxlmd->dev, sizeof(*array_rec),
-					     GFP_KERNEL);
-			if (!array_rec)
-				return -ENOMEM;
-
-			xa_init(&array_rec->rec_gen_media);
-			xa_init(&array_rec->rec_dram);
-			cxlmd->err_rec_array = array_rec;
+			rc = devm_cxl_memdev_setup_err_rec(cxlmd);
+			if (rc)
+				return rc;
 		}
 	}
 
@@ -2088,22 +2116,4 @@ int devm_cxl_region_edac_register(struct cxl_region *cxlr)
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
 
-void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd)
-{
-	struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
-	struct cxl_event_gen_media *rec_gen_media;
-	struct cxl_event_dram *rec_dram;
-	unsigned long index;
-
-	if (!IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR) || !array_rec)
-		return;
-
-	xa_for_each(&array_rec->rec_dram, index, rec_dram)
-		kfree(rec_dram);
-	xa_destroy(&array_rec->rec_dram);
 
-	xa_for_each(&array_rec->rec_gen_media, index, rec_gen_media)
-		kfree(rec_gen_media);
-	xa_destroy(&array_rec->rec_gen_media);
-}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_release, "CXL");
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index e370d733e440..4dff7f44d908 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -27,7 +27,6 @@ static void cxl_memdev_release(struct device *dev)
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 
 	ida_free(&cxl_memdev_ida, cxlmd->id);
-	devm_cxl_memdev_edac_release(cxlmd);
 	kfree(cxlmd);
 }
 
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/6] cxl/mem: Arrange for always-synchronous memdev attach
  2025-12-16  0:56 [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory Dan Williams
  2025-12-16  0:56 ` [PATCH v2 1/6] cxl/mem: Fix devm_cxl_memdev_edac_release() confusion Dan Williams
@ 2025-12-16  0:56 ` Dan Williams
  2025-12-16  0:56 ` [PATCH v2 3/6] cxl/port: Arrange for always synchronous endpoint attach Dan Williams
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2025-12-16  0:56 UTC (permalink / raw)
  To: dave.jiang
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron, Alejandro Lucero, Jonathan Cameron,
	Ben Cheatham

In preparation for CXL accelerator drivers that have a hard dependency on
CXL capability initialization, arrange for cxl_mem_probe() to always run
synchronously with the device_add() of cxl_memdev instances. I.e.
cxl_mem_driver registration is always complete before the first memdev
creation event.

At present, cxl_pci does not care about the attach state of the cxl_memdev
because all generic memory expansion functionality can be handled by the
cxl_core. An accelerator driver, however, needs to perform driver-specific
initialization if CXL is available, or fall back to PCIe-only operation.

This synchronous attach guarantee is also needed for Soft Reserve
Recovery, an effort that needs to assert that devices have had a chance
to attach before making a go / no-go decision on proceeding with CXL
subsystem initialization.

Moving devm_cxl_add_memdev() to cxl_mem.ko removes async module loading
as one reason that a memdev may not be attached upon return from
devm_cxl_add_memdev().
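
The ordering guarantee can be sketched as a userspace toy model (names
are illustrative, not the driver-core API): once the driver is guaranteed
to be registered before any device is added, the attach result is known
synchronously when device_add() returns, with no module-load race to
flush.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-driver bus; illustrative only */
struct toy_driver {
	int (*probe)(void *dev);
};

struct toy_bus {
	struct toy_driver *drv;	/* at most one driver in this model */
};

struct toy_dev {
	int bound;
};

static void toy_driver_register(struct toy_bus *bus, struct toy_driver *drv)
{
	bus->drv = drv;
}

/*
 * With the driver registered first, the probe outcome is decided before
 * device_add() returns.  Without that ordering, the device sits unbound
 * until some later, unobservable module-load event.
 */
static int toy_device_add(struct toy_bus *bus, struct toy_dev *dev)
{
	if (bus->drv && bus->drv->probe(dev) == 0)
		dev->bound = 1;
	return 0;
}

static int toy_probe(void *dev)
{
	(void)dev;
	return 0;
}
```

This models why a hard link-time dependency (driver registered at module
init, before any memdev is created) is preferable to request_module() or
MODULE_SOFTDEP() for callers that must observe the attach state.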

Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/Kconfig       |  2 +-
 drivers/cxl/cxlmem.h      |  2 ++
 drivers/cxl/core/memdev.c | 10 +++++++---
 drivers/cxl/mem.c         | 17 +++++++++++++++++
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 48b7314afdb8..f1361ed6a0d4 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -22,6 +22,7 @@ if CXL_BUS
 config CXL_PCI
 	tristate "PCI manageability"
 	default CXL_BUS
+	select CXL_MEM
 	help
 	  The CXL specification defines a "CXL memory device" sub-class in the
 	  PCI "memory controller" base class of devices. Device's identified by
@@ -89,7 +90,6 @@ config CXL_PMEM
 
 config CXL_MEM
 	tristate "CXL: Memory Expansion"
-	depends on CXL_PCI
 	default CXL_BUS
 	help
 	  The CXL.mem protocol allows a device to act as a provider of "System
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index c12ab4fc9512..012e68acad34 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -95,6 +95,8 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
 	return is_cxl_memdev(port->uport_dev);
 }
 
+struct cxl_memdev *__devm_cxl_add_memdev(struct device *host,
+					 struct cxl_dev_state *cxlds);
 struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
 				       struct cxl_dev_state *cxlds);
 int devm_cxl_sanitize_setup_notifier(struct device *host,
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 4dff7f44d908..7a4153e1c6a7 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -1050,8 +1050,12 @@ static const struct file_operations cxl_memdev_fops = {
 	.llseek = noop_llseek,
 };
 
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
-				       struct cxl_dev_state *cxlds)
+/*
+ * Core helper for devm_cxl_add_memdev() that wants to both create a device and
+ * assert to the caller that upon return cxl_mem::probe() has been invoked.
+ */
+struct cxl_memdev *__devm_cxl_add_memdev(struct device *host,
+					 struct cxl_dev_state *cxlds)
 {
 	struct cxl_memdev *cxlmd;
 	struct device *dev;
@@ -1093,7 +1097,7 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
 	put_device(dev);
 	return ERR_PTR(rc);
 }
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
+EXPORT_SYMBOL_FOR_MODULES(__devm_cxl_add_memdev, "cxl_mem");
 
 static void sanitize_teardown_notifier(void *data)
 {
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 6e6777b7bafb..55883797ab2d 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -201,6 +201,22 @@ static int cxl_mem_probe(struct device *dev)
 	return devm_add_action_or_reset(dev, enable_suspend, NULL);
 }
 
+/**
+ * devm_cxl_add_memdev - Add a CXL memory device
+ * @host: devres alloc/release context and parent for the memdev
+ * @cxlds: CXL device state to associate with the memdev
+ *
+ * Upon return the device will have had a chance to attach to the
+ * cxl_mem driver, but may fail if the CXL topology is not ready
+ * (hardware CXL link down, or software platform CXL root not attached)
+ */
+struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
+				       struct cxl_dev_state *cxlds)
+{
+	return __devm_cxl_add_memdev(host, cxlds);
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
+
 static ssize_t trigger_poison_list_store(struct device *dev,
 					 struct device_attribute *attr,
 					 const char *buf, size_t len)
@@ -248,6 +264,7 @@ static struct cxl_driver cxl_mem_driver = {
 	.probe = cxl_mem_probe,
 	.id = CXL_DEVICE_MEMORY_EXPANDER,
 	.drv = {
+		.probe_type = PROBE_FORCE_SYNCHRONOUS,
 		.dev_groups = cxl_mem_groups,
 	},
 };
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/6] cxl/port: Arrange for always synchronous endpoint attach
  2025-12-16  0:56 [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory Dan Williams
  2025-12-16  0:56 ` [PATCH v2 1/6] cxl/mem: Fix devm_cxl_memdev_edac_release() confusion Dan Williams
  2025-12-16  0:56 ` [PATCH v2 2/6] cxl/mem: Arrange for always-synchronous memdev attach Dan Williams
@ 2025-12-16  0:56 ` Dan Williams
  2025-12-16  0:56 ` [PATCH v2 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup Dan Williams
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2025-12-16  0:56 UTC (permalink / raw)
  To: dave.jiang
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron, Alejandro Lucero, Jonathan Cameron,
	Ben Cheatham

Make it so that, upon return from devm_cxl_add_endpoint(),
cxl_mem_probe() can assume the endpoint has had a chance to complete
cxl_port_probe().  I.e. cxl_port module loading has completed prior to
device registration.

Delete the MODULE_SOFTDEP() as it is not sufficient for this purpose,
while a hard link-time dependency is reliable. Specifically,
MODULE_SOFTDEP() does not guarantee that the soft-dependency module has
finished loading before the current module's init completes.

Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxl.h  |  2 ++
 drivers/cxl/mem.c  | 43 -------------------------------------------
 drivers/cxl/port.c | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 43 deletions(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ba17fa86d249..c796c3db36e0 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -780,6 +780,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
 				   struct cxl_dport *parent_dport);
 struct cxl_root *devm_cxl_add_root(struct device *host,
 				   const struct cxl_root_ops *ops);
+int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
+			  struct cxl_dport *parent_dport);
 struct cxl_root *find_cxl_root(struct cxl_port *port);
 
 DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev))
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 55883797ab2d..d62931526fd4 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -45,44 +45,6 @@ static int cxl_mem_dpa_show(struct seq_file *file, void *data)
 	return 0;
 }
 
-static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
-				 struct cxl_dport *parent_dport)
-{
-	struct cxl_port *parent_port = parent_dport->port;
-	struct cxl_port *endpoint, *iter, *down;
-	int rc;
-
-	/*
-	 * Now that the path to the root is established record all the
-	 * intervening ports in the chain.
-	 */
-	for (iter = parent_port, down = NULL; !is_cxl_root(iter);
-	     down = iter, iter = to_cxl_port(iter->dev.parent)) {
-		struct cxl_ep *ep;
-
-		ep = cxl_ep_load(iter, cxlmd);
-		ep->next = down;
-	}
-
-	/* Note: endpoint port component registers are derived from @cxlds */
-	endpoint = devm_cxl_add_port(host, &cxlmd->dev, CXL_RESOURCE_NONE,
-				     parent_dport);
-	if (IS_ERR(endpoint))
-		return PTR_ERR(endpoint);
-
-	rc = cxl_endpoint_autoremove(cxlmd, endpoint);
-	if (rc)
-		return rc;
-
-	if (!endpoint->dev.driver) {
-		dev_err(&cxlmd->dev, "%s failed probe\n",
-			dev_name(&endpoint->dev));
-		return -ENXIO;
-	}
-
-	return 0;
-}
-
 static int cxl_debugfs_poison_inject(void *data, u64 dpa)
 {
 	struct cxl_memdev *cxlmd = data;
@@ -275,8 +237,3 @@ MODULE_DESCRIPTION("CXL: Memory Expansion");
 MODULE_LICENSE("GPL v2");
 MODULE_IMPORT_NS("CXL");
 MODULE_ALIAS_CXL(CXL_DEVICE_MEMORY_EXPANDER);
-/*
- * create_endpoint() wants to validate port driver attach immediately after
- * endpoint registration.
- */
-MODULE_SOFTDEP("pre: cxl_port");
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 51c8f2f84717..7937e7e53797 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -156,10 +156,50 @@ static struct cxl_driver cxl_port_driver = {
 	.probe = cxl_port_probe,
 	.id = CXL_DEVICE_PORT,
 	.drv = {
+		.probe_type = PROBE_FORCE_SYNCHRONOUS,
 		.dev_groups = cxl_port_attribute_groups,
 	},
 };
 
+int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
+			  struct cxl_dport *parent_dport)
+{
+	struct cxl_port *parent_port = parent_dport->port;
+	struct cxl_port *endpoint, *iter, *down;
+	int rc;
+
+	/*
+	 * Now that the path to the root is established record all the
+	 * intervening ports in the chain.
+	 */
+	for (iter = parent_port, down = NULL; !is_cxl_root(iter);
+	     down = iter, iter = to_cxl_port(iter->dev.parent)) {
+		struct cxl_ep *ep;
+
+		ep = cxl_ep_load(iter, cxlmd);
+		ep->next = down;
+	}
+
+	/* Note: endpoint port component registers are derived from @cxlds */
+	endpoint = devm_cxl_add_port(host, &cxlmd->dev, CXL_RESOURCE_NONE,
+				     parent_dport);
+	if (IS_ERR(endpoint))
+		return PTR_ERR(endpoint);
+
+	rc = cxl_endpoint_autoremove(cxlmd, endpoint);
+	if (rc)
+		return rc;
+
+	if (!endpoint->dev.driver) {
+		dev_err(&cxlmd->dev, "%s failed probe\n",
+			dev_name(&endpoint->dev));
+		return -ENXIO;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_FOR_MODULES(devm_cxl_add_endpoint, "cxl_mem");
+
 static int __init cxl_port_init(void)
 {
 	return cxl_driver_register(&cxl_port_driver);
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
  2025-12-16  0:56 [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory Dan Williams
                   ` (2 preceding siblings ...)
  2025-12-16  0:56 ` [PATCH v2 3/6] cxl/port: Arrange for always synchronous endpoint attach Dan Williams
@ 2025-12-16  0:56 ` Dan Williams
  2025-12-17 15:48   ` Jonathan Cameron
  2025-12-16  0:56 ` [PATCH v2 5/6] cxl/mem: Drop @host argument to devm_cxl_add_memdev() Dan Williams
  2025-12-16  0:56 ` [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation Dan Williams
  5 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2025-12-16  0:56 UTC (permalink / raw)
  To: dave.jiang
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron, Ben Cheatham, Alejandro Lucero

In preparation for adding more setup steps, convert the current
implementation to scope-based cleanup.

cxl_memdev_shutdown() is only required after cdev_device_add(). Moving
that cleanup into a helper function precludes the need to add a
scope-based handler for it if devm_add_action_or_reset() fails.
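
The scope-based pattern can be sketched in userspace with the same
compiler feature the kernel's cleanup helpers build on (this is a
simplified model in the style of __free()/no_free_ptr(), not the kernel
macros themselves): an error return inside the scope frees the resource
automatically, and the success path transfers ownership out by NULLing
the local.

```c
#include <stdlib.h>

static int auto_frees;	/* counts automatic frees, for demonstration */

/* Cleanup handler invoked when a __free_charp local goes out of scope */
static void free_charp(char **p)
{
	if (*p) {
		free(*p);
		auto_frees++;
	}
}

#define __free_charp __attribute__((cleanup(free_charp)))

/* Ownership transfer: NULL the local so the cleanup becomes a no-op */
#define no_free_ptr(p) ({ char *__v = (p); (p) = NULL; __v; })

static char *try_setup(int fail)
{
	char __free_charp *buf = malloc(32);

	if (!buf)
		return NULL;
	if (fail)
		return NULL;	/* early exit: buf auto-freed here */
	return no_free_ptr(buf);	/* success: caller now owns buf */
}
```

Every early return after the allocation is covered without a goto-based
unwind ladder, which is what makes adding more setup steps cheap.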

Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/memdev.c | 70 ++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 26 deletions(-)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 7a4153e1c6a7..18efbf294db5 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -1050,6 +1050,45 @@ static const struct file_operations cxl_memdev_fops = {
 	.llseek = noop_llseek,
 };
 
+/*
+ * Activate ioctl operations, no cxl_memdev_rwsem manipulation needed as this is
+ * ordered with cdev_add() publishing the device.
+ */
+static int cxlmd_add(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds)
+{
+	int rc;
+
+	cxlmd->cxlds = cxlds;
+	cxlds->cxlmd = cxlmd;
+
+	rc = cdev_device_add(&cxlmd->cdev, &cxlmd->dev);
+	if (rc) {
+		/*
+		 * The cdev was briefly live, shutdown any ioctl operations that
+		 * saw that state.
+		 */
+		cxl_memdev_shutdown(&cxlmd->dev);
+		return rc;
+	}
+
+	return 0;
+}
+
+DEFINE_FREE(put_cxlmd, struct cxl_memdev *,
+	    if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev);)
+
+static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
+{
+	int rc;
+
+	rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
+				      cxlmd);
+	if (rc)
+		return ERR_PTR(rc);
+
+	return cxlmd;
+}
+
 /*
  * Core helper for devm_cxl_add_memdev() that wants to both create a device and
  * assert to the caller that upon return cxl_mem::probe() has been invoked.
@@ -1057,45 +1096,24 @@ static const struct file_operations cxl_memdev_fops = {
 struct cxl_memdev *__devm_cxl_add_memdev(struct device *host,
 					 struct cxl_dev_state *cxlds)
 {
-	struct cxl_memdev *cxlmd;
 	struct device *dev;
-	struct cdev *cdev;
 	int rc;
 
-	cxlmd = cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
+	struct cxl_memdev *cxlmd __free(put_cxlmd) =
+		cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
 	if (IS_ERR(cxlmd))
 		return cxlmd;
 
 	dev = &cxlmd->dev;
 	rc = dev_set_name(dev, "mem%d", cxlmd->id);
 	if (rc)
-		goto err;
-
-	/*
-	 * Activate ioctl operations, no cxl_memdev_rwsem manipulation
-	 * needed as this is ordered with cdev_add() publishing the device.
-	 */
-	cxlmd->cxlds = cxlds;
-	cxlds->cxlmd = cxlmd;
-
-	cdev = &cxlmd->cdev;
-	rc = cdev_device_add(cdev, dev);
-	if (rc)
-		goto err;
+		return ERR_PTR(rc);
 
-	rc = devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
+	rc = cxlmd_add(cxlmd, cxlds);
 	if (rc)
 		return ERR_PTR(rc);
-	return cxlmd;
 
-err:
-	/*
-	 * The cdev was briefly live, shutdown any ioctl operations that
-	 * saw that state.
-	 */
-	cxl_memdev_shutdown(dev);
-	put_device(dev);
-	return ERR_PTR(rc);
+	return cxl_memdev_autoremove(no_free_ptr(cxlmd));
 }
 EXPORT_SYMBOL_FOR_MODULES(__devm_cxl_add_memdev, "cxl_mem");
 
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 5/6] cxl/mem: Drop @host argument to devm_cxl_add_memdev()
  2025-12-16  0:56 [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory Dan Williams
                   ` (3 preceding siblings ...)
  2025-12-16  0:56 ` [PATCH v2 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup Dan Williams
@ 2025-12-16  0:56 ` Dan Williams
  2025-12-16  0:56 ` [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation Dan Williams
  5 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2025-12-16  0:56 UTC (permalink / raw)
  To: dave.jiang
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron, Jonathan Cameron, Ben Cheatham,
	Alejandro Lucero

In all cases the device that created the 'struct cxl_dev_state' instance
is also the device that hosts the devm cleanup of devm_cxl_add_memdev().
This simplifies the function prototype, and removes a degree of freedom
from the API.

Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxlmem.h         | 6 ++----
 drivers/cxl/core/memdev.c    | 3 +--
 drivers/cxl/mem.c            | 9 +++++----
 drivers/cxl/pci.c            | 2 +-
 tools/testing/cxl/test/mem.c | 2 +-
 5 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 012e68acad34..9db31c7993c4 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -95,10 +95,8 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
 	return is_cxl_memdev(port->uport_dev);
 }
 
-struct cxl_memdev *__devm_cxl_add_memdev(struct device *host,
-					 struct cxl_dev_state *cxlds);
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
-				       struct cxl_dev_state *cxlds);
+struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
+struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
 int devm_cxl_sanitize_setup_notifier(struct device *host,
 				     struct cxl_memdev *cxlmd);
 struct cxl_memdev_state;
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 18efbf294db5..63da2bd4436e 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -1093,8 +1093,7 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
  * Core helper for devm_cxl_add_memdev() that wants to both create a device and
  * assert to the caller that upon return cxl_mem::probe() has been invoked.
  */
-struct cxl_memdev *__devm_cxl_add_memdev(struct device *host,
-					 struct cxl_dev_state *cxlds)
+struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 {
 	struct device *dev;
 	int rc;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index d62931526fd4..677996c65272 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -165,17 +165,18 @@ static int cxl_mem_probe(struct device *dev)
 
 /**
  * devm_cxl_add_memdev - Add a CXL memory device
- * @host: devres alloc/release context and parent for the memdev
  * @cxlds: CXL device state to associate with the memdev
  *
  * Upon return the device will have had a chance to attach to the
  * cxl_mem driver, but may fail if the CXL topology is not ready
  * (hardware CXL link down, or software platform CXL root not attached)
+ *
+ * The parent of the resulting device and the devm context for allocations is
+ * @cxlds->dev.
  */
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
-				       struct cxl_dev_state *cxlds)
+struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
 {
-	return __devm_cxl_add_memdev(host, cxlds);
+	return __devm_cxl_add_memdev(cxlds);
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
 
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0be4e508affe..1c6fc5334806 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1006,7 +1006,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		dev_dbg(&pdev->dev, "No CXL Features discovered\n");
 
-	cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds);
+	cxlmd = devm_cxl_add_memdev(cxlds);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 176dcde570cd..8a22b7601627 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1767,7 +1767,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 
 	cxl_mock_add_event_logs(&mdata->mes);
 
-	cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds);
+	cxlmd = devm_cxl_add_memdev(cxlds);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2025-12-16  0:56 [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory Dan Williams
                   ` (4 preceding siblings ...)
  2025-12-16  0:56 ` [PATCH v2 5/6] cxl/mem: Drop @host argument to devm_cxl_add_memdev() Dan Williams
@ 2025-12-16  0:56 ` Dan Williams
  2025-12-17 15:57   ` Jonathan Cameron
                     ` (2 more replies)
  5 siblings, 3 replies; 16+ messages in thread
From: Dan Williams @ 2025-12-16  0:56 UTC (permalink / raw)
  To: dave.jiang
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron, Ben Cheatham, Alejandro Lucero

Unlike the cxl_pci class driver that opportunistically enables memory
expansion with no other dependent functionality, CXL accelerator drivers
have distinct PCIe-only and CXL-enhanced operation states. If CXL is
available some additional coherent memory/cache operations can be enabled,
otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.

Allow for a driver to pass a routine to be called in cxl_mem_probe()
context. This ability is inspired by and mirrors the semantics of
faux_device_create(). It allows for the caller to run CXL-topology
attach-dependent logic on behalf of the caller.

The probe callback runs after the port topology is successfully attached
for the given memdev.

Additionally the presence of @cxlmd->attach indicates that the accelerator
driver be detached when CXL operation ends. This conceptually makes a CXL
link loss event mirror a PCIe link loss event which results in triggering
the ->remove() callback of affected devices+drivers. A driver can re-attach
to recover back to PCIe-only operation. Live recovery, i.e. without a
->remove()/->probe() cycle, is left as a future consideration.
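
The contract above can be sketched in plain C. This is a hedged userspace
model only; every mock_* name below is invented for illustration and is not
the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memdev; not the kernel struct cxl_memdev. */
struct mock_memdev {
	int cxl_enabled;
};

/* Mirrors the shape of 'struct cxl_memdev_attach' introduced here. */
struct mock_memdev_attach {
	int (*probe)(struct mock_memdev *cxlmd);
};

/* Accelerator probe: enable coherent memory/cache operation. */
int mock_accel_probe(struct mock_memdev *cxlmd)
{
	cxlmd->cxl_enabled = 1;
	return 0;
}

/*
 * When @attach is NULL (the cxl_pci class-driver case) registration
 * succeeds even if the CXL link is down; when @attach is provided, a
 * failure to reach CXL operation fails the whole registration so the
 * accelerator driver can fall back to PCIe-only mode.
 */
int mock_add_memdev(struct mock_memdev *cxlmd,
		    const struct mock_memdev_attach *attach, int link_up)
{
	if (!link_up)
		return attach ? -1 : 0;
	return attach ? attach->probe(cxlmd) : 0;
}
```

With mock_accel_probe plugged in, a successful mock_add_memdev() implies the
CXL-enhanced operation state has been reached, mirroring the go/no-go
semantics described above.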

Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com> (✓ DKIM/intel.com)
Tested-by: Alejandro Lucero <alucerop@amd.com> (✓ DKIM/amd.com)
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxlmem.h         | 12 ++++++++++--
 drivers/cxl/core/memdev.c    | 33 +++++++++++++++++++++++++++++----
 drivers/cxl/mem.c            | 20 ++++++++++++++++----
 drivers/cxl/pci.c            |  2 +-
 tools/testing/cxl/test/mem.c |  2 +-
 5 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 9db31c7993c4..ef202b34e5ea 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -34,6 +34,10 @@
 	(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) !=                       \
 	 CXLMDEV_RESET_NEEDED_NOT)
 
+struct cxl_memdev_attach {
+	int (*probe)(struct cxl_memdev *cxlmd);
+};
+
 /**
  * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
  * @dev: driver core device object
@@ -43,6 +47,7 @@
  * @cxl_nvb: coordinate removal of @cxl_nvd if present
  * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem
  * @endpoint: connection to the CXL port topology for this memory device
+ * @attach: creator of this memdev depends on CXL link attach to operate
  * @id: id number of this memdev instance.
  * @depth: endpoint port depth
  * @scrub_cycle: current scrub cycle set for this device
@@ -59,6 +64,7 @@ struct cxl_memdev {
 	struct cxl_nvdimm_bridge *cxl_nvb;
 	struct cxl_nvdimm *cxl_nvd;
 	struct cxl_port *endpoint;
+	const struct cxl_memdev_attach *attach;
 	int id;
 	int depth;
 	u8 scrub_cycle;
@@ -95,8 +101,10 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
 	return is_cxl_memdev(port->uport_dev);
 }
 
-struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
-struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
+struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
+					 const struct cxl_memdev_attach *attach);
+struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
+				       const struct cxl_memdev_attach *attach);
 int devm_cxl_sanitize_setup_notifier(struct device *host,
 				     struct cxl_memdev *cxlmd);
 struct cxl_memdev_state;
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 63da2bd4436e..3ab4cd8f19ed 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -641,14 +641,24 @@ static void detach_memdev(struct work_struct *work)
 	struct cxl_memdev *cxlmd;
 
 	cxlmd = container_of(work, typeof(*cxlmd), detach_work);
-	device_release_driver(&cxlmd->dev);
+
+	/*
+	 * When the creator of @cxlmd sets ->attach it indicates CXL operation
+	 * is required. In that case, @cxlmd detach escalates to parent device
+	 * detach.
+	 */
+	if (cxlmd->attach)
+		device_release_driver(cxlmd->dev.parent);
+	else
+		device_release_driver(&cxlmd->dev);
 	put_device(&cxlmd->dev);
 }
 
 static struct lock_class_key cxl_memdev_key;
 
 static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
-					   const struct file_operations *fops)
+					   const struct file_operations *fops,
+					   const struct cxl_memdev_attach *attach)
 {
 	struct cxl_memdev *cxlmd;
 	struct device *dev;
@@ -664,6 +674,8 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
 		goto err;
 	cxlmd->id = rc;
 	cxlmd->depth = -1;
+	cxlmd->attach = attach;
+	cxlmd->endpoint = ERR_PTR(-ENXIO);
 
 	dev = &cxlmd->dev;
 	device_initialize(dev);
@@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
 {
 	int rc;
 
+	/*
+	 * If @attach is provided fail if the driver is not attached upon
+	 * return. Note that failure here could be the result of a race to
+	 * teardown the CXL port topology. I.e. cxl_mem_probe() could have
+	 * succeeded and then cxl_mem unbound before the lock is acquired.
+	 */
+	guard(device)(&cxlmd->dev);
+	if (cxlmd->attach && !cxlmd->dev.driver) {
+		cxl_memdev_unregister(cxlmd);
+		return ERR_PTR(-ENXIO);
+	}
+
 	rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
 				      cxlmd);
 	if (rc)
@@ -1093,13 +1117,14 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
  * Core helper for devm_cxl_add_memdev() that wants to both create a device and
  * assert to the caller that upon return cxl_mem::probe() has been invoked.
  */
-struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
+struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
+					 const struct cxl_memdev_attach *attach)
 {
 	struct device *dev;
 	int rc;
 
 	struct cxl_memdev *cxlmd __free(put_cxlmd) =
-		cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
+		cxl_memdev_alloc(cxlds, &cxl_memdev_fops, attach);
 	if (IS_ERR(cxlmd))
 		return cxlmd;
 
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 677996c65272..333c366b69e7 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -142,6 +142,12 @@ static int cxl_mem_probe(struct device *dev)
 			return rc;
 	}
 
+	if (cxlmd->attach) {
+		rc = cxlmd->attach->probe(cxlmd);
+		if (rc)
+			return rc;
+	}
+
 	rc = devm_cxl_memdev_edac_register(cxlmd);
 	if (rc)
 		dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
@@ -166,17 +172,23 @@ static int cxl_mem_probe(struct device *dev)
 /**
  * devm_cxl_add_memdev - Add a CXL memory device
  * @cxlds: CXL device state to associate with the memdev
+ * @attach: Caller depends on CXL topology attachment
  *
  * Upon return the device will have had a chance to attach to the
- * cxl_mem driver, but may fail if the CXL topology is not ready
- * (hardware CXL link down, or software platform CXL root not attached)
+ * cxl_mem driver, but may fail to attach if the CXL topology is not ready
+ * (hardware CXL link down, or software platform CXL root not attached).
+ *
+ * When @attach is NULL it indicates the caller wants the memdev to remain
+ * registered even if it does not immediately attach to the CXL hierarchy. When
+ * @attach is provided a cxl_mem_probe() failure leads to failure of this routine.
  *
  * The parent of the resulting device and the devm context for allocations is
  * @cxlds->dev.
  */
-struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
+struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
+				       const struct cxl_memdev_attach *attach)
 {
-	return __devm_cxl_add_memdev(cxlds);
+	return __devm_cxl_add_memdev(cxlds, attach);
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
 
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 1c6fc5334806..549368a9c868 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1006,7 +1006,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		dev_dbg(&pdev->dev, "No CXL Features discovered\n");
 
-	cxlmd = devm_cxl_add_memdev(cxlds);
+	cxlmd = devm_cxl_add_memdev(cxlds, NULL);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 8a22b7601627..cb87e8c0e63c 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1767,7 +1767,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 
 	cxl_mock_add_event_logs(&mdata->mes);
 
-	cxlmd = devm_cxl_add_memdev(cxlds);
+	cxlmd = devm_cxl_add_memdev(cxlds, NULL);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
  2025-12-16  0:56 ` [PATCH v2 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup Dan Williams
@ 2025-12-17 15:48   ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2025-12-17 15:48 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, linux-cxl, linux-kernel,
	Smita.KoralahalliChannabasappa, alison.schofield, terry.bowman,
	alejandro.lucero-palau, linux-pci, Ben Cheatham, Alejandro Lucero

On Mon, 15 Dec 2025 16:56:14 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> In preparation for adding more setup steps, convert the current
> implementation to scope-based cleanup.
> 
> The cxl_memdev_shutdown() is only required after cdev_device_add(). With
> that moved to a helper function it precludes the need to add
> scope-based-handler for that cleanup if devm_add_action_or_reset() fails.
> 
> Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Tested-by: Alejandro Lucero <alucerop@amd.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
One trivial thing below that is probably fine to clean up whilst picking this
up.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  drivers/cxl/core/memdev.c | 70 ++++++++++++++++++++++++---------------
>  1 file changed, 44 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 7a4153e1c6a7..18efbf294db5 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c

> +
> +DEFINE_FREE(put_cxlmd, struct cxl_memdev *,
> +	    if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev);)

Bonus ; ?




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2025-12-16  0:56 ` [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation Dan Williams
@ 2025-12-17 15:57   ` Jonathan Cameron
  2025-12-17 16:27     ` dan.j.williams
  2025-12-20  7:31   ` Alejandro Lucero Palau
  2026-02-10  5:19   ` Gregory Price
  2 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2025-12-17 15:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, linux-cxl, linux-kernel,
	Smita.KoralahalliChannabasappa, alison.schofield, terry.bowman,
	alejandro.lucero-palau, linux-pci, Ben Cheatham, Alejandro Lucero

On Mon, 15 Dec 2025 16:56:16 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Unlike the cxl_pci class driver that opportunistically enables memory
> expansion with no other dependent functionality, CXL accelerator drivers
> have distinct PCIe-only and CXL-enhanced operation states. If CXL is
> available some additional coherent memory/cache operations can be enabled,
> otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.
> 
> Allow for a driver to pass a routine to be called in cxl_mem_probe()
> context. This ability is inspired by and mirrors the semantics of
> faux_device_create(). It allows for the caller to run CXL-topology
> attach-dependent logic on behalf of the caller.

This seems confusing.  The caller is running logic for the caller?
It can do that whenever it likes!  One of those is presumably the callee.



> 
> The probe callback runs after the port topology is successfully attached
> for the given memdev.
> 
> Additionally the presence of @cxlmd->attach indicates that the accelerator
> driver be detached when CXL operation ends. This conceptually makes a CXL
> link loss event mirror a PCIe link loss event which results in triggering
> the ->remove() callback of affected devices+drivers. A driver can re-attach
> to recover back to PCIe-only operation. Live recovery, i.e. without a
> ->remove()/->probe() cycle, is left as a future consideration.  
> 
> Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com> (✓ DKIM/intel.com)

Have we started adding DKIM stuff to tags?

> Tested-by: Alejandro Lucero <alucerop@amd.com> (✓ DKIM/amd.com)
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
One trivial thing on function naming inline.  Either way.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

To me this looks good to start building the other stuff on top of.
Thanks for unblocking this stuff (hopefully)

Jonathan

> ---
>  drivers/cxl/cxlmem.h         | 12 ++++++++++--
>  drivers/cxl/core/memdev.c    | 33 +++++++++++++++++++++++++++++----
>  drivers/cxl/mem.c            | 20 ++++++++++++++++----
>  drivers/cxl/pci.c            |  2 +-
>  tools/testing/cxl/test/mem.c |  2 +-
>  5 files changed, 57 insertions(+), 12 deletions(-)

> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 63da2bd4436e..3ab4cd8f19ed 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c


> @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
>  {
>  	int rc;
>  
> +	/*

The general approach is fine but is the function name appropriate for this
new stuff?  Naturally I'm not suggesting the bikeshed should be any particular
alternative color just maybe not the current blue.


> +	 * If @attach is provided fail if the driver is not attached upon
> +	 * return. Note that failure here could be the result of a race to
> +	 * teardown the CXL port topology. I.e. cxl_mem_probe() could have
> +	 * succeeded and then cxl_mem unbound before the lock is acquired.
> +	 */
> +	guard(device)(&cxlmd->dev);
> +	if (cxlmd->attach && !cxlmd->dev.driver) {
> +		cxl_memdev_unregister(cxlmd);
> +		return ERR_PTR(-ENXIO);
> +	}
> +
>  	rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
>  				      cxlmd);
>  	if (rc)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2025-12-17 15:57   ` Jonathan Cameron
@ 2025-12-17 16:27     ` dan.j.williams
  2025-12-18 14:46       ` Jonathan Cameron
  0 siblings, 1 reply; 16+ messages in thread
From: dan.j.williams @ 2025-12-17 16:27 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: dave.jiang, linux-cxl, linux-kernel,
	Smita.KoralahalliChannabasappa, alison.schofield, terry.bowman,
	alejandro.lucero-palau, linux-pci, Ben Cheatham, Alejandro Lucero

Jonathan Cameron wrote:
> On Mon, 15 Dec 2025 16:56:16 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Unlike the cxl_pci class driver that opportunistically enables memory
> > expansion with no other dependent functionality, CXL accelerator drivers
> > have distinct PCIe-only and CXL-enhanced operation states. If CXL is
> > available some additional coherent memory/cache operations can be enabled,
> > otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.
> > 
> > Allow for a driver to pass a routine to be called in cxl_mem_probe()
> > context. This ability is inspired by and mirrors the semantics of
> > faux_device_create(). It allows for the caller to run CXL-topology
> > attach-dependent logic on behalf of the caller.
> 
> This seems confusing. 

Is faux_device_create() confusing?

> The caller is running logic for the caller?  It can do that whenever
> it likes!  One of those is presumably the callee.

No, it cannot execute CXL topology attach-dependent functionality in
the device's initial probe context synchronous with the device-attach
event "whenever it likes".

> > The probe callback runs after the port topology is successfully attached
> > for the given memdev.
> > 
> > Additionally the presence of @cxlmd->attach indicates that the accelerator
> > driver be detached when CXL operation ends. This conceptually makes a CXL
> > link loss event mirror a PCIe link loss event which results in triggering
> > the ->remove() callback of affected devices+drivers. A driver can re-attach
> > to recover back to PCIe-only operation. Live recovery, i.e. without a
> > ->remove()/->probe() cycle, is left as a future consideration.  
> > 
> > Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> > Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com> (✓ DKIM/intel.com)
> 
> Have we started adding DKIM stuff to tags?

No, just a copy/paste typo that I did not catch.

> > Tested-by: Alejandro Lucero <alucerop@amd.com> (✓ DKIM/amd.com)
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> One trivial thing on function naming inline.  Either way.
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> 
> To me this looks good to start building the other stuff on top of.
> Thanks for unblocking this stuff (hopefully)
> 
> Jonathan
> 
> > ---
> >  drivers/cxl/cxlmem.h         | 12 ++++++++++--
> >  drivers/cxl/core/memdev.c    | 33 +++++++++++++++++++++++++++++----
> >  drivers/cxl/mem.c            | 20 ++++++++++++++++----
> >  drivers/cxl/pci.c            |  2 +-
> >  tools/testing/cxl/test/mem.c |  2 +-
> >  5 files changed, 57 insertions(+), 12 deletions(-)
> 
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index 63da2bd4436e..3ab4cd8f19ed 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> 
> 
> > @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> >  {
> >  	int rc;
> >  
> > +	/*
> 
> The general approach is fine but is the function name appropriate for this
> new stuff?  Naturally I'm not suggesting the bikeshed should be any particular
> alternative color just maybe not the current blue.

The _autoremove() verb appears multiple times in the subsystem, not sure
why it is raising bikeshed concerns now. Please send a new proposal if
"autoremove" is not jibing.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2025-12-17 16:27     ` dan.j.williams
@ 2025-12-18 14:46       ` Jonathan Cameron
  2025-12-18 19:50         ` dan.j.williams
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2025-12-18 14:46 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, linux-cxl, linux-kernel,
	Smita.KoralahalliChannabasappa, alison.schofield, terry.bowman,
	alejandro.lucero-palau, linux-pci, Ben Cheatham, Alejandro Lucero

On Wed, 17 Dec 2025 08:27:00 -0800
dan.j.williams@intel.com wrote:

> Jonathan Cameron wrote:
> > On Mon, 15 Dec 2025 16:56:16 -0800
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > Unlike the cxl_pci class driver that opportunistically enables memory
> > > expansion with no other dependent functionality, CXL accelerator drivers
> > > have distinct PCIe-only and CXL-enhanced operation states. If CXL is
> > > available some additional coherent memory/cache operations can be enabled,
> > > otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.
> > > 
> > > Allow for a driver to pass a routine to be called in cxl_mem_probe()
> > > context. This ability is inspired by and mirrors the semantics of
> > > faux_device_create(). It allows for the caller to run CXL-topology
> > > attach-dependent logic on behalf of the caller.  
> > 
> > This seems confusing.   
> 
> Is faux_device_create() confusing?

Just to be clear this question is all about the word 'caller' being repeated
in that sentence. Not about the code itself or anything else in the explanation
or flow. 

This comment reads very oddly to me and I think it means something very
different from what is going on here.

> 
> > The caller is running logic for the caller?  It can do that whenever
> > it likes!  One of those is presumably the callee.
> 
> No, it cannot execute CXL topology attach-dependent functionality in
> the device's initial probe context synchronous with the device-attach
> event "whenever it likes".

I'm still lost. Why 'caller to run' ... 'on behalf of the caller.'  In this case
caller is in both cases the function calling cxl_memdev_alloc()?

Maybe something like

"This arranges for CXL-topology attach-dependent logic to be run later, on behalf of
the caller."

Though that kind of repeats what follows, so maybe just drop the sentence.

> 
> > > The probe callback runs after the port topology is successfully attached
> > > for the given memdev.
> > > 
> > > Additionally the presence of @cxlmd->attach indicates that the accelerator
> > > driver be detached when CXL operation ends. This conceptually makes a CXL
> > > link loss event mirror a PCIe link loss event which results in triggering
> > > the ->remove() callback of affected devices+drivers. A driver can re-attach
> > > to recover back to PCIe-only operation. Live recovery, i.e. without a  
> > > ->remove()/->probe() cycle, is left as a future consideration.    

> > > @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> > >  {
> > >  	int rc;
> > >  
> > > +	/*  
> > 
> > The general approach is fine but is the function name appropriate for this
> > new stuff?  Naturally I'm not suggesting the bikeshed should be any particular
> > alternative color just maybe not the current blue.  
> 
> The _autoremove() verb appears multiple times in the subsystem, not sure
> why it is raising bikeshed concerns now. Please send a new proposal if
> "autoremove" is not jibing.

It felt like a stretch given the additional code that is not about registering
autoremove for later, but doing it now under some circumstances.  Ah well I don't
care that much what it's called.

Jonathan



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2025-12-18 14:46       ` Jonathan Cameron
@ 2025-12-18 19:50         ` dan.j.williams
  2025-12-19 10:26           ` Jonathan Cameron
  0 siblings, 1 reply; 16+ messages in thread
From: dan.j.williams @ 2025-12-18 19:50 UTC (permalink / raw)
  To: Jonathan Cameron, dan.j.williams
  Cc: dave.jiang, linux-cxl, linux-kernel,
	Smita.KoralahalliChannabasappa, alison.schofield, terry.bowman,
	alejandro.lucero-palau, linux-pci, Ben Cheatham, Alejandro Lucero

Jonathan Cameron wrote:
> On Wed, 17 Dec 2025 08:27:00 -0800
> dan.j.williams@intel.com wrote:
> 
> > Jonathan Cameron wrote:
> > > On Mon, 15 Dec 2025 16:56:16 -0800
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >   
> > > > Unlike the cxl_pci class driver that opportunistically enables memory
> > > > expansion with no other dependent functionality, CXL accelerator drivers
> > > > have distinct PCIe-only and CXL-enhanced operation states. If CXL is
> > > > available some additional coherent memory/cache operations can be enabled,
> > > > otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.
> > > > 
> > > > Allow for a driver to pass a routine to be called in cxl_mem_probe()
> > > > context. This ability is inspired by and mirrors the semantics of
> > > > faux_device_create(). It allows for the caller to run CXL-topology
> > > > attach-dependent logic on behalf of the caller.  
> > > 
> > > This seems confusing.   
> > 
> > Is faux_device_create() confusing?
> 
> Just to be clear this question is all about the word 'caller' being repeated
> in that sentence. Not about the code itself or anything else in the explanation
> or flow. 

Oh, sorry, I took it as the design was confusing.

> This comment reads very oddly to me and I think it means something very
> different from what is going on here.
> 
> > 
> > > The caller is running logic for the caller?  It can do that whenever
> > > it likes!  One of those is presumably the callee.
> > 
> > No, it cannot execute CXL topology attach-dependent functionality in
> > the device's initial probe context synchronous with the device-attach
> > event "whenever it likes".
> 
> I'm still lost. Why 'caller to run' ... 'on behalf of the caller.'  In this case
> caller is in both cases the function calling cxl_memdev_alloc()?
> 
> Maybe something like
> 
> "This arranges for CXL-topology attach-dependent logic to be run later, on behalf of
> the caller."
> 
> Though that kind of repeats what follows, so maybe just drop the sentence.

How about this reworked changelog?

---

Unlike the cxl_pci class driver that opportunistically enables memory
expansion with no other dependent functionality, CXL accelerator drivers
have distinct PCIe-only and CXL-enhanced operation states. If CXL is
available some additional coherent memory/cache operations can be enabled,
otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.

This constitutes a new mode of operation where the caller of
devm_cxl_add_memdev() wants to make a "go/no-go" decision about running
in CXL accelerated mode or falling back to PCIe-only operation. Part of
that decision making process likely also includes additional
CXL-acceleration-specific resource setup. Encapsulate both of those
requirements into 'struct cxl_memdev_attach' that provides a ->probe()
callback. The probe callback runs in cxl_mem_probe() context, after the
port topology is successfully attached for the given memdev. It supports
a contract where, upon successful return from devm_cxl_add_memdev(),
everything needed for CXL accelerated operation has been enabled.

Additionally the presence of @cxlmd->attach indicates that the accelerator
driver be detached when CXL operation ends. This conceptually makes a CXL
link loss event mirror a PCIe link loss event which results in triggering
the ->remove() callback of affected devices+drivers. A driver can re-attach
to recover back to PCIe-only operation. Live recovery, i.e. without a
->remove()/->probe() cycle, is left as a future consideration.

---

> > > > @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> > > >  {
> > > >  	int rc;
> > > >  
> > > > +	/*  
> > > 
> > > The general approach is fine but is the function name appropriate for this
> > > new stuff?  Naturally I'm not suggesting the bikeshed should be any particular
> > > alternative color just maybe not the current blue.  
> > 
> > The _autoremove() verb appears multiple times in the subsystem, not sure
> > why it is raising bikeshed concerns now. Please send a new proposal if
> > "autoremove" is not jibing.
> 
> It felt like a stretch given the additional code that is not about registering
> autoremove for later, but doing it now under some circumstances.  Ah well I don't
> care that much what it's called.

It has the same semantics as devm_add_action_or_reset() in that if the
"add_action" fails then the "reset" is triggered.

Yes, there is additional code that validates that the device to be
registered for removal is attached to its driver. That organization
supports having a single handoff from scope-based cleanup to devm-based
cleanup.

If you can think of a better organization and name I am open to hearing
options, but nothing better is immediately jumping out at me.
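
For reference, the "add action or reset" semantic being discussed can be
sketched in a few lines of plain C. This is a hedged userspace model only;
the mock_* names are invented for illustration and are not the kernel
implementation:

```c
#include <assert.h>

typedef void (*mock_cleanup_fn)(void *data);

struct mock_action {
	mock_cleanup_fn fn;
	void *data;
	int registered;
};

/*
 * Register @fn to run later; if registration cannot happen, run @fn
 * immediately ("reset") so the cleanup occurs either way and the caller
 * never leaks the resource.
 */
int mock_add_action_or_reset(struct mock_action *slot, mock_cleanup_fn fn,
			     void *data, int can_register)
{
	if (!can_register) {
		fn(data);		/* "reset": undo immediately */
		return -1;
	}
	slot->fn = fn;			/* "add action": undo later */
	slot->data = data;
	slot->registered = 1;
	return 0;
}

/* Example cleanup: count invocations so the behavior is observable. */
void mock_unregister(void *data)
{
	(*(int *)data)++;
}
```

Either path, success or failure, leaves exactly one cleanup obligation in
place, which is the single-handoff property described above.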

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2025-12-18 19:50         ` dan.j.williams
@ 2025-12-19 10:26           ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2025-12-19 10:26 UTC (permalink / raw)
  To: dan.j.williams
  Cc: dave.jiang, linux-cxl, linux-kernel,
	Smita.KoralahalliChannabasappa, alison.schofield, terry.bowman,
	alejandro.lucero-palau, linux-pci, Ben Cheatham, Alejandro Lucero

On Thu, 18 Dec 2025 11:50:10 -0800
<dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Wed, 17 Dec 2025 08:27:00 -0800
> > dan.j.williams@intel.com wrote:
> >   
> > > Jonathan Cameron wrote:  
> > > > On Mon, 15 Dec 2025 16:56:16 -0800
> > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > >     
> > > > > Unlike the cxl_pci class driver that opportunistically enables memory
> > > > > expansion with no other dependent functionality, CXL accelerator drivers
> > > > > have distinct PCIe-only and CXL-enhanced operation states. If CXL is
> > > > > available some additional coherent memory/cache operations can be enabled,
> > > > > otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.
> > > > > 
> > > > > Allow for a driver to pass a routine to be called in cxl_mem_probe()
> > > > > context. This ability is inspired by and mirrors the semantics of
> > > > > faux_device_create(). It allows for the caller to run CXL-topology
> > > > > attach-dependent logic on behalf of the caller.    
> > > > 
> > > > This seems confusing.     
> > > 
> > > Is faux_device_create() confusing?  
> > 
> > Just to be clear this question is all about the word 'caller' being repeated
> > in that sentence. Not about the code itself or anything else in the explanation
> > or flow.   
> 
> Oh, sorry, I took it as the design was confusing.

I should have been clearer!

> 
> > This comment reads very oddly to me and I think it means something very
> > different from what is going on here.
> >   
> > >   
> > > > The caller is running logic for the caller?  It can do that whenever
> > > > it likes!  One of those is presumably the callee.
> > > 
> > > No, it cannot execute CXL topology attach dependent functionality in
> > > the device's initial probe context synchronous with the device-attach
> > > event "whenever it likes".  
> > 
> > I'm still lost. Why 'caller to run' ... 'on behalf of the caller.'  In this case
> > caller is in both cases the function calling cxl_memdev_alloc()?
> > 
> > Maybe something like
> > 
> > "This arranges for CXL-topology attach-dependent logic to be run later, on behalf of
> > the caller."
> > 
> > Though that kind of repeats what follows, so maybe just drop the sentence.  
> 
> How about this reworked changelog?
> 
> ---
> 
> Unlike the cxl_pci class driver that opportunistically enables memory
> expansion with no other dependent functionality, CXL accelerator drivers
> have distinct PCIe-only and CXL-enhanced operation states. If CXL is
> available some additional coherent memory/cache operations can be enabled,
> otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.
> 
> This constitutes a new mode of operation where the caller of
> devm_cxl_add_memdev() wants to make a "go/no-go" decision about running
> in CXL accelerated mode or falling back to PCIe-only operation. Part of
> that decision making process likely also includes additional
> CXL-acceleration-specific resource setup. Encapsulate both of those
> requirements into 'struct cxl_memdev_attach' that provides a ->probe()
> callback. The probe callback runs in cxl_mem_probe() context, after the
> port topology is successfully attached for the given memdev. It supports
> a contract where, upon successful return from devm_cxl_add_memdev(),
> everything needed for CXL accelerated operation has been enabled.
> 
> Additionally the presence of @cxlmd->attach indicates that the accelerator
> driver be detached when CXL operation ends. This conceptually makes a CXL
> link loss event mirror a PCIe link loss event which results in triggering
> the ->remove() callback of affected devices+drivers. A driver can re-attach
> to recover back to PCIe-only operation. Live recovery, i.e. without a
> ->remove()/->probe() cycle, is left as a future consideration.  
Nice.  Thanks for the rewrite.

> 
> ---
> 
> > > > > @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> > > > >  {
> > > > >  	int rc;
> > > > >  
> > > > > +	/*    
> > > > 
> > > > The general approach is fine but is the function name appropriate for this
> > > > new stuff?  Naturally I'm not suggesting the bikeshed should be any particular
> > > > alternative color just maybe not the current blue.    
> > > 
> > > The _autoremove() verb appears multiple times in the subsystem, not sure
> > > why it is raising bikeshed concerns now. Please send a new proposal if
> > > "autoremove" is not jibing.  
> > 
> > It felt like a stretch given the additional code that is not about registering
> > autoremove for later, but doing it now under some circumstances.  Ah well I don't
> > care that much what it's called.  
> 
> It is the same semantic as devm_add_action_or_reset() in that if the
> "add_action" fails then the "reset" is triggered.
> 
> Yes, there is additional code that validates that the device to be
> registered for removal is attached to its driver. That organization
> supports having a single handoff from scoped-based cleanup to devm based
> cleanup.
> 
> If you can think of a better organization and name I am open to hearing
> options, but nothing better is immediately jumping out at me.
> 
Let's go with current naming.  I don't like the only options I can come up
with either.  If inspiration strikes we can always change it later.

Jonathan





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2025-12-16  0:56 ` [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation Dan Williams
  2025-12-17 15:57   ` Jonathan Cameron
@ 2025-12-20  7:31   ` Alejandro Lucero Palau
  2026-02-10  5:19   ` Gregory Price
  2 siblings, 0 replies; 16+ messages in thread
From: Alejandro Lucero Palau @ 2025-12-20  7:31 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron, Ben Cheatham


On 12/16/25 00:56, Dan Williams wrote:
> Unlike the cxl_pci class driver that opportunistically enables memory
> expansion with no other dependent functionality, CXL accelerator drivers
> have distinct PCIe-only and CXL-enhanced operation states. If CXL is
> available some additional coherent memory/cache operations can be enabled,
> otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.


This can be misleading. I bet most Type2 drivers will indeed use CXL.io 
and traditional DMA along with some special functionality with CXL.mem 
and CXL.cache.


>
> Allow for a driver to pass a routine to be called in cxl_mem_probe()
> context. This ability is inspired by and mirrors the semantics of
> faux_device_create(). It allows for the caller to run CXL-topology
> attach-dependent logic on behalf of the caller.
>
> The probe callback runs after the port topology is successfully attached
> for the given memdev.


As discussed at LPC, I would like the text here to reflect that this
potential problem, with the port not ready, is mainly due to CXL
switches being in the path to the device; an accelerator directly
connected to a CXL Host Bridge will have all of this sorted out by the
time of driver probing. If other cases exist, they should also be
documented, either here or in the future as they are discovered.


>
> Additionally the presence of @cxlmd->attach indicates that the accelerator
> driver be detached when CXL operation ends. This conceptually makes a CXL
> link loss event mirror a PCIe link loss event which results in triggering
> the ->remove() callback of affected devices+drivers.


Also as discussed at LPC, I do not think this should happen. The
accelerator driver should of course be notified of this, but not
removed. Moreover, should a link error not be reported somehow through
the RAS infrastructure, with the driver reacting accordingly?


FWIW, there is pending work by Vishal Aslot, based on Type2 support,
for this RAS support; I have discussed it with him as a follow-up to
the Type2 work.


>   A driver can re-attach
> to recover back to PCIe-only operation. Live recovery, i.e. without a
> ->remove()/->probe() cycle, is left as a future consideration.
>
> Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Tested-by: Alejandro Lucero <alucerop@amd.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>   drivers/cxl/cxlmem.h         | 12 ++++++++++--
>   drivers/cxl/core/memdev.c    | 33 +++++++++++++++++++++++++++++----
>   drivers/cxl/mem.c            | 20 ++++++++++++++++----
>   drivers/cxl/pci.c            |  2 +-
>   tools/testing/cxl/test/mem.c |  2 +-
>   5 files changed, 57 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 9db31c7993c4..ef202b34e5ea 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -34,6 +34,10 @@
>   	(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) !=                       \
>   	 CXLMDEV_RESET_NEEDED_NOT)
>   
> +struct cxl_memdev_attach {
> +	int (*probe)(struct cxl_memdev *cxlmd);
> +};
> +
>   /**
>    * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
>    * @dev: driver core device object
> @@ -43,6 +47,7 @@
>    * @cxl_nvb: coordinate removal of @cxl_nvd if present
>    * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem
>    * @endpoint: connection to the CXL port topology for this memory device
> + * @attach: creator of this memdev depends on CXL link attach to operate
>    * @id: id number of this memdev instance.
>    * @depth: endpoint port depth
>    * @scrub_cycle: current scrub cycle set for this device
> @@ -59,6 +64,7 @@ struct cxl_memdev {
>   	struct cxl_nvdimm_bridge *cxl_nvb;
>   	struct cxl_nvdimm *cxl_nvd;
>   	struct cxl_port *endpoint;
> +	const struct cxl_memdev_attach *attach;
>   	int id;
>   	int depth;
>   	u8 scrub_cycle;
> @@ -95,8 +101,10 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
>   	return is_cxl_memdev(port->uport_dev);
>   }
>   
> -struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
> -struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
> +struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> +					 const struct cxl_memdev_attach *attach);
> +struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> +				       const struct cxl_memdev_attach *attach);
>   int devm_cxl_sanitize_setup_notifier(struct device *host,
>   				     struct cxl_memdev *cxlmd);
>   struct cxl_memdev_state;
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 63da2bd4436e..3ab4cd8f19ed 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -641,14 +641,24 @@ static void detach_memdev(struct work_struct *work)
>   	struct cxl_memdev *cxlmd;
>   
>   	cxlmd = container_of(work, typeof(*cxlmd), detach_work);
> -	device_release_driver(&cxlmd->dev);
> +
> +	/*
> +	 * When the creator of @cxlmd sets ->attach it indicates CXL operation
> +	 * is required. In that case, @cxlmd detach escalates to parent device
> +	 * detach.
> +	 */
> +	if (cxlmd->attach)
> +		device_release_driver(cxlmd->dev.parent);
> +	else
> +		device_release_driver(&cxlmd->dev);
>   	put_device(&cxlmd->dev);
>   }
>   
>   static struct lock_class_key cxl_memdev_key;
>   
>   static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> -					   const struct file_operations *fops)
> +					   const struct file_operations *fops,
> +					   const struct cxl_memdev_attach *attach)
>   {
>   	struct cxl_memdev *cxlmd;
>   	struct device *dev;
> @@ -664,6 +674,8 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>   		goto err;
>   	cxlmd->id = rc;
>   	cxlmd->depth = -1;
> +	cxlmd->attach = attach;
> +	cxlmd->endpoint = ERR_PTR(-ENXIO);
>   
>   	dev = &cxlmd->dev;
>   	device_initialize(dev);
> @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
>   {
>   	int rc;
>   
> +	/*
> +	 * If @attach is provided fail if the driver is not attached upon
> +	 * return. Note that failure here could be the result of a race to
> +	 * teardown the CXL port topology. I.e. cxl_mem_probe() could have
> +	 * succeeded and then cxl_mem unbound before the lock is acquired.
> +	 */
> +	guard(device)(&cxlmd->dev);
> +	if (cxlmd->attach && !cxlmd->dev.driver) {
> +		cxl_memdev_unregister(cxlmd);
> +		return ERR_PTR(-ENXIO);
> +	}
> +
>   	rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
>   				      cxlmd);
>   	if (rc)
> @@ -1093,13 +1117,14 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
>    * Core helper for devm_cxl_add_memdev() that wants to both create a device and
>    * assert to the caller that upon return cxl_mem::probe() has been invoked.
>    */
> -struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> +struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> +					 const struct cxl_memdev_attach *attach)
>   {
>   	struct device *dev;
>   	int rc;
>   
>   	struct cxl_memdev *cxlmd __free(put_cxlmd) =
> -		cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
> +		cxl_memdev_alloc(cxlds, &cxl_memdev_fops, attach);
>   	if (IS_ERR(cxlmd))
>   		return cxlmd;
>   
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 677996c65272..333c366b69e7 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -142,6 +142,12 @@ static int cxl_mem_probe(struct device *dev)
>   			return rc;
>   	}
>   
> +	if (cxlmd->attach) {
> +		rc = cxlmd->attach->probe(cxlmd);
> +		if (rc)
> +			return rc;
> +	}
> +
>   	rc = devm_cxl_memdev_edac_register(cxlmd);
>   	if (rc)
>   		dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
> @@ -166,17 +172,23 @@ static int cxl_mem_probe(struct device *dev)
>   /**
>    * devm_cxl_add_memdev - Add a CXL memory device
>    * @cxlds: CXL device state to associate with the memdev
> + * @attach: Caller depends on CXL topology attachment
>    *
>    * Upon return the device will have had a chance to attach to the
> - * cxl_mem driver, but may fail if the CXL topology is not ready
> - * (hardware CXL link down, or software platform CXL root not attached)
> + * cxl_mem driver, but may fail to attach if the CXL topology is not ready
> + * (hardware CXL link down, or software platform CXL root not attached).
> + *
> + * When @attach is NULL it indicates the caller wants the memdev to remain
> + * registered even if it does not immediately attach to the CXL hierarchy. When
> + * @attach is provided a cxl_mem_probe() failure leads to failure of this routine.
>    *
>    * The parent of the resulting device and the devm context for allocations is
>    * @cxlds->dev.
>    */
> -struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> +struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> +				       const struct cxl_memdev_attach *attach)
>   {
> -	return __devm_cxl_add_memdev(cxlds);
> +	return __devm_cxl_add_memdev(cxlds, attach);
>   }
>   EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
>   
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 1c6fc5334806..549368a9c868 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1006,7 +1006,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	if (rc)
>   		dev_dbg(&pdev->dev, "No CXL Features discovered\n");
>   
> -	cxlmd = devm_cxl_add_memdev(cxlds);
> +	cxlmd = devm_cxl_add_memdev(cxlds, NULL);
>   	if (IS_ERR(cxlmd))
>   		return PTR_ERR(cxlmd);
>   
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 8a22b7601627..cb87e8c0e63c 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -1767,7 +1767,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>   
>   	cxl_mock_add_event_logs(&mdata->mes);
>   
> -	cxlmd = devm_cxl_add_memdev(cxlds);
> +	cxlmd = devm_cxl_add_memdev(cxlds, NULL);
>   	if (IS_ERR(cxlmd))
>   		return PTR_ERR(cxlmd);
>   

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2025-12-16  0:56 ` [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation Dan Williams
  2025-12-17 15:57   ` Jonathan Cameron
  2025-12-20  7:31   ` Alejandro Lucero Palau
@ 2026-02-10  5:19   ` Gregory Price
  2026-02-10 15:05     ` Dave Jiang
  2 siblings, 1 reply; 16+ messages in thread
From: Gregory Price @ 2026-02-10  5:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, linux-cxl, linux-kernel,
	Smita.KoralahalliChannabasappa, alison.schofield, terry.bowman,
	alejandro.lucero-palau, linux-pci, Jonathan.Cameron, Ben Cheatham,
	Alejandro Lucero

On Mon, Dec 15, 2025 at 04:56:16PM -0800, Dan Williams wrote:
> @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
>  {
>  	int rc;
>  
> +	/*
> +	 * If @attach is provided fail if the driver is not attached upon
> +	 * return. Note that failure here could be the result of a race to
> +	 * teardown the CXL port topology. I.e. cxl_mem_probe() could have
> +	 * succeeded and then cxl_mem unbound before the lock is acquired.
> +	 */
> +	guard(device)(&cxlmd->dev);
> +	if (cxlmd->attach && !cxlmd->dev.driver) {
> +		cxl_memdev_unregister(cxlmd);
> +		return ERR_PTR(-ENXIO);
> +	}
> +
>  	rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
>  				      cxlmd);
>  	if (rc)


This caused a deadlock when trying to unbind cxl_pci and bind a custom
driver. 

The following analysis was produced by Chris Mason's kreview scripts,
but its suggested fix did resolve my deadlock, and the analysis seems
pretty straightforward.

Although it was likely caused by a qemu quirk I was dealing with at the
same time (see the note about topology failing to enumerate).

~Gregory

---

    cxl_memdev_autoremove() takes device_lock(&cxlmd->dev) via guard(device)
    and then calls cxl_memdev_unregister() when the attach callback was
    provided but cxl_mem_probe() failed to bind.

    cxl_memdev_unregister() calls cdev_device_del() → device_del() →
    bus_remove_device() → device_release_driver() which also takes
    device_lock(), deadlocking the calling thread.

    This path is reached when a driver uses the @attach parameter to
    devm_cxl_add_memdev() and the CXL topology fails to enumerate (e.g.
    DVSEC range registers decode outside platform-defined CXL ranges,
    causing the endpoint port probe to fail).

    Fix by using scoped_guard() and breaking out of the guard scope before
    calling cxl_memdev_unregister(), so device_lock() is released first.

It suggested:
---

static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
{
        int rc;

        /*
         * If @attach is provided fail if the driver is not attached upon
         * return. Note that failure here could be the result of a race to
         * teardown the CXL port topology. I.e. cxl_mem_probe() could have
         * succeeded and then cxl_mem unbound before the lock is acquired.
         *
         * Check under device_lock but unregister outside of it:
         * cxl_memdev_unregister() calls cdev_device_del() → device_del()
         * → device_release_driver() which also takes device_lock().
         */
        scoped_guard(device, &cxlmd->dev) {
                if (cxlmd->attach && !cxlmd->dev.driver) {
                        /* Drop lock before unregister to avoid deadlock */
                        break;
                }

                rc = devm_add_action_or_reset(cxlmd->cxlds->dev,
                                              cxl_memdev_unregister, cxlmd);
                if (rc)
                        return ERR_PTR(rc);

                return cxlmd;
        }

        /* Reached only when attach failed — lock is released */
        cxl_memdev_unregister(cxlmd);
        return ERR_PTR(-ENXIO);
}


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
  2026-02-10  5:19   ` Gregory Price
@ 2026-02-10 15:05     ` Dave Jiang
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Jiang @ 2026-02-10 15:05 UTC (permalink / raw)
  To: Gregory Price, Dan Williams
  Cc: linux-cxl, linux-kernel, Smita.KoralahalliChannabasappa,
	alison.schofield, terry.bowman, alejandro.lucero-palau, linux-pci,
	Jonathan.Cameron, Ben Cheatham, Alejandro Lucero



On 2/9/26 10:19 PM, Gregory Price wrote:
> On Mon, Dec 15, 2025 at 04:56:16PM -0800, Dan Williams wrote:
>> @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
>>  {
>>  	int rc;
>>  
>> +	/*
>> +	 * If @attach is provided fail if the driver is not attached upon
>> +	 * return. Note that failure here could be the result of a race to
>> +	 * teardown the CXL port topology. I.e. cxl_mem_probe() could have
>> +	 * succeeded and then cxl_mem unbound before the lock is acquired.
>> +	 */
>> +	guard(device)(&cxlmd->dev);
>> +	if (cxlmd->attach && !cxlmd->dev.driver) {
>> +		cxl_memdev_unregister(cxlmd);
>> +		return ERR_PTR(-ENXIO);
>> +	}
>> +
>>  	rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
>>  				      cxlmd);
>>  	if (rc)
> 
> 
> This caused a deadlock when trying to unbind cxl_pci and bind a custom
> driver. 
> 
> The following analysis was produced by Chris Mason's kreview scripts,
> but its suggested fix did resolve my deadlock, and the analysis seems
> pretty straightforward.

The suggested resolution looks reasonable. Can you post a fix patch? Will need to apply it against 7.0-rc. It's too late to get it into the 7.0 merge PR.

DJ

> 
> Although it was likely caused by a qemu quirk I was dealing with at the
> same time (see the note about topology failing to enumerate).
> 
> ~Gregory
> 
> ---
> 
>     cxl_memdev_autoremove() takes device_lock(&cxlmd->dev) via guard(device)
>     and then calls cxl_memdev_unregister() when the attach callback was
>     provided but cxl_mem_probe() failed to bind.
> 
>     cxl_memdev_unregister() calls cdev_device_del() → device_del() →
>     bus_remove_device() → device_release_driver() which also takes
>     device_lock(), deadlocking the calling thread.
> 
>     This path is reached when a driver uses the @attach parameter to
>     devm_cxl_add_memdev() and the CXL topology fails to enumerate (e.g.
>     DVSEC range registers decode outside platform-defined CXL ranges,
>     causing the endpoint port probe to fail).
> 
>     Fix by using scoped_guard() and breaking out of the guard scope before
>     calling cxl_memdev_unregister(), so device_lock() is released first.
> 
> It suggested:
> ---
> 
> static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> {
>         int rc;
> 
>         /*
>          * If @attach is provided fail if the driver is not attached upon
>          * return. Note that failure here could be the result of a race to
>          * teardown the CXL port topology. I.e. cxl_mem_probe() could have
>          * succeeded and then cxl_mem unbound before the lock is acquired.
>          *
>          * Check under device_lock but unregister outside of it:
>          * cxl_memdev_unregister() calls cdev_device_del() → device_del()
>          * → device_release_driver() which also takes device_lock().
>          */
>         scoped_guard(device, &cxlmd->dev) {
>                 if (cxlmd->attach && !cxlmd->dev.driver) {
>                         /* Drop lock before unregister to avoid deadlock */
>                         break;
>                 }
> 
>                 rc = devm_add_action_or_reset(cxlmd->cxlds->dev,
>                                               cxl_memdev_unregister, cxlmd);
>                 if (rc)
>                         return ERR_PTR(rc);
> 
>                 return cxlmd;
>         }
> 
>         /* Reached only when attach failed — lock is released */
>         cxl_memdev_unregister(cxlmd);
>         return ERR_PTR(-ENXIO);
> }
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2026-02-10 15:05 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-16  0:56 [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory Dan Williams
2025-12-16  0:56 ` [PATCH v2 1/6] cxl/mem: Fix devm_cxl_memdev_edac_release() confusion Dan Williams
2025-12-16  0:56 ` [PATCH v2 2/6] cxl/mem: Arrange for always-synchronous memdev attach Dan Williams
2025-12-16  0:56 ` [PATCH v2 3/6] cxl/port: Arrange for always synchronous endpoint attach Dan Williams
2025-12-16  0:56 ` [PATCH v2 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup Dan Williams
2025-12-17 15:48   ` Jonathan Cameron
2025-12-16  0:56 ` [PATCH v2 5/6] cxl/mem: Drop @host argument to devm_cxl_add_memdev() Dan Williams
2025-12-16  0:56 ` [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation Dan Williams
2025-12-17 15:57   ` Jonathan Cameron
2025-12-17 16:27     ` dan.j.williams
2025-12-18 14:46       ` Jonathan Cameron
2025-12-18 19:50         ` dan.j.williams
2025-12-19 10:26           ` Jonathan Cameron
2025-12-20  7:31   ` Alejandro Lucero Palau
2026-02-10  5:19   ` Gregory Price
2026-02-10 15:05     ` Dave Jiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox