public inbox for linux-cxl@vger.kernel.org
* [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability
@ 2026-03-27  5:28 Dan Williams
  2026-03-27  5:28 ` [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure Dan Williams
                   ` (11 more replies)
  0 siblings, 12 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa, Jonathan Cameron, stable

Given all the cross-subsystem dependencies needed to make this solution
work, it needs a unit test to keep it functional.

On the path to writing that, several fixes fell out, not to Smita's code
but to mine. One use-after-free has been there since the original
automatic region assembly code.

Here is a preview of the core of the test I will submit to the cxl-cli project:

---
modprobe cxl_mock_mem && modprobe cxl_test hmem_test=1

dax=$(find_dax_cxl)
[[ "$dax" == "" ]] && err $LINENO
dax=$(find_dax_hmem)
[[ "$dax" != "" ]] && err $LINENO

unload

modprobe cxl_mock_mem && modprobe cxl_test fail_autoassemble hmem_test=1

dax=$(find_dax_cxl)
[[ "$dax" != "" ]] && err $LINENO
dax=$(find_dax_hmem)
[[ "$dax" == "" ]] && err $LINENO

unload
---

This builds on Smita's series [1] pushed out to for-7.1/dax-hmem in
cxl.git [2].

[1]: http://lore.kernel.org/20260322195343.206900-1-Smita.KoralahalliChannabasappa@amd.com
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-7.1/dax-hmem

Dan Williams (9):
  cxl/region: Fix use-after-free from auto assembly failure
  dax/cxl: Fix HMEM dependencies
  cxl/region: Limit visibility of cxl_region_contains_resource()
  cxl/region: Constify cxl_region_resource_contains()
  dax/hmem: Reduce visibility of dax_cxl coordination symbols
  dax/hmem: Fix singleton confusion between dax_hmem_work and hmem
    devices
  dax/hmem: Parent dax_hmem devices
  tools/testing/cxl: Simulate auto-assembly failure
  tools/testing/cxl: Test dax_hmem takeover of CXL regions

 drivers/dax/Kconfig                |   6 +-
 drivers/cxl/cxl.h                  |  11 ++-
 drivers/dax/bus.h                  |  15 +++-
 include/cxl/cxl.h                  |  15 ----
 tools/testing/cxl/test/mock.h      |   8 ++
 drivers/cxl/core/region.c          |  68 +++++++++++++++--
 drivers/dax/hmem/device.c          |  28 ++++---
 drivers/dax/hmem/hmem.c            | 115 +++++++++++++++--------------
 tools/testing/cxl/test/cxl.c       |  66 +++++++++++++++++
 tools/testing/cxl/test/hmem_test.c |  47 ++++++++++++
 tools/testing/cxl/test/mem.c       |   3 +
 tools/testing/cxl/test/mock.c      |  50 +++++++++++++
 tools/testing/cxl/Kbuild           |   7 ++
 tools/testing/cxl/test/Kbuild      |   1 +
 14 files changed, 344 insertions(+), 96 deletions(-)
 delete mode 100644 include/cxl/cxl.h
 create mode 100644 tools/testing/cxl/test/hmem_test.c


base-commit: 51d2fa02c0e4b3b23c4484f2af9b6d65c35471e8
-- 
2.53.0



* [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 16:28   ` Dave Jiang
                     ` (3 more replies)
  2026-03-27  5:28 ` [PATCH 2/9] dax/cxl: Fix HMEM dependencies Dan Williams
                   ` (10 subsequent siblings)
  11 siblings, 4 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa, stable, Jonathan Cameron

The following crash signature results from region destruction while an
endpoint decoder is staged, but not fully attached.

---
 BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830 [cxl_core]
 Read of size 8 at addr ffff888265638840 by task modprobe/1287

 Call Trace:
  <TASK>
  dump_stack_lvl+0x68/0x90
  print_report+0x170/0x4e2
  kasan_report+0xc2/0x1a0
  __cxl_decoder_detach+0x724/0x830 [cxl_core]
  cxl_decoder_detach+0x6c/0x100 [cxl_core]
  unregister_region+0x88/0x140 [cxl_core]
  devres_release_all+0x172/0x230
---

The "staged" state is established by cxl_region_attach_auto() and finalized
by cxl_region_attach_position(). Once it is finalized, a memdev removal
event will destroy regions before endpoint decoders. However, in the
interim, memdev removal falsely assumes that the endpoint decoder is
unattached. Later, the eventual region removal finds the stale pointer to
the now-freed endpoint decoder.

Introduce CXL_DECODER_STATE_AUTO_STAGED and cxl_cancel_auto_attach() to
clean up this interim state.

Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
Cc: <stable@vger.kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxl.h         |  6 +++--
 drivers/cxl/core/region.c | 54 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9b947286eb9b..30a31968f266 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -378,12 +378,14 @@ struct cxl_decoder {
 };
 
 /*
- * Track whether this decoder is reserved for region autodiscovery, or
- * free for userspace provisioning.
+ * Track whether this decoder is free for userspace provisioning, reserved for
+ * region autodiscovery, whether it is started connecting (awaiting other
+ * peers), or has completed auto assembly.
  */
 enum cxl_decoder_state {
 	CXL_DECODER_STATE_MANUAL,
 	CXL_DECODER_STATE_AUTO,
+	CXL_DECODER_STATE_AUTO_STAGED,
 };
 
 /**
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f7b20f60ac5c..b72556c1458b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1064,6 +1064,14 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
 
 	if (!cxld->region) {
 		cxld->region = cxlr;
+
+		/*
+		 * Now that cxld->region is set the intermediate staging state
+		 * can be cleared.
+		 */
+		if (cxld == &cxled->cxld &&
+		    cxled->state == CXL_DECODER_STATE_AUTO_STAGED)
+			cxled->state = CXL_DECODER_STATE_AUTO;
 		get_device(&cxlr->dev);
 	}
 
@@ -1805,6 +1813,7 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
 	pos = p->nr_targets;
 	p->targets[pos] = cxled;
 	cxled->pos = pos;
+	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;
 	p->nr_targets++;
 
 	return 0;
@@ -2154,6 +2163,47 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	return 0;
 }
 
+static int cxl_region_by_target(struct device *dev, const void *data)
+{
+	const struct cxl_endpoint_decoder *cxled = data;
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+
+	if (!is_cxl_region(dev))
+		return 0;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+	return p->targets[cxled->pos] == cxled;
+}
+
+/*
+ * When an auto-region fails to assemble the decoder may be listed as a target,
+ * but not fully attached.
+ */
+static void cxl_cancel_auto_attach(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+	int pos = cxled->pos;
+
+	if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
+		return;
+
+	struct device *dev __free(put_device) = bus_find_device(
+		&cxl_bus_type, NULL, cxled, cxl_region_by_target);
+	if (!dev)
+		return;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+
+	p->nr_targets--;
+	cxled->state = CXL_DECODER_STATE_AUTO;
+	cxled->pos = -1;
+	p->targets[pos] = NULL;
+}
+
 static struct cxl_region *
 __cxl_decoder_detach(struct cxl_region *cxlr,
 		     struct cxl_endpoint_decoder *cxled, int pos,
@@ -2177,8 +2227,10 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
 		cxled = p->targets[pos];
 	} else {
 		cxlr = cxled->cxld.region;
-		if (!cxlr)
+		if (!cxlr) {
+			cxl_cancel_auto_attach(cxled);
 			return NULL;
+		}
 		p = &cxlr->params;
 	}
 
-- 
2.53.0



* [PATCH 2/9] dax/cxl: Fix HMEM dependencies
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
  2026-03-27  5:28 ` [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 16:29   ` Dave Jiang
                     ` (2 more replies)
  2026-03-27  5:28 ` [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource() Dan Williams
                   ` (9 subsequent siblings)
  11 siblings, 3 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

The expectation is that DEV_DAX_HMEM=y should be disallowed if either
CXL_ACPI or CXL_PCI is set to =m. Also, DEV_DAX_CXL=y should be disallowed
if DEV_DAX_HMEM=m. Use "$config || !$config" syntax for each dependency.
Otherwise, the invalid DEV_DAX_HMEM=m && DEV_DAX_CXL=y configuration is
allowed.

Lastly, dax_hmem depends on the availability of the
cxl_region_contains_resource() symbol published by the cxl_core.ko module.
So, also prevent DEV_DAX_HMEM from being built-in when the cxl_core module
is not built-in.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/Kconfig | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 3683bb3f2311..504f7f735ef5 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -32,6 +32,9 @@ config DEV_DAX_HMEM
 	depends on EFI_SOFT_RESERVE
 	select NUMA_KEEP_MEMINFO if NUMA_MEMBLKS
 	default DEV_DAX
+	depends on CXL_ACPI || !CXL_ACPI
+	depends on CXL_PCI || !CXL_PCI
+	depends on CXL_BUS || !CXL_BUS
 	help
 	  EFI 2.8 platforms, and others, may advertise 'specific purpose'
 	  memory. For example, a high bandwidth memory pool. The
@@ -48,8 +51,7 @@ config DEV_DAX_CXL
 	tristate "CXL DAX: direct access to CXL RAM regions"
 	depends on CXL_BUS && CXL_REGION && DEV_DAX
 	default CXL_REGION && DEV_DAX
-	depends on CXL_ACPI >= DEV_DAX_HMEM
-	depends on CXL_PCI >= DEV_DAX_HMEM
+	depends on DEV_DAX_HMEM || !DEV_DAX_HMEM
 	help
 	  CXL RAM regions are either mapped by platform-firmware
 	  and published in the initial system-memory map as "System RAM", mapped
-- 
2.53.0



* [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource()
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
  2026-03-27  5:28 ` [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure Dan Williams
  2026-03-27  5:28 ` [PATCH 2/9] dax/cxl: Fix HMEM dependencies Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 16:39   ` Dave Jiang
                     ` (2 more replies)
  2026-03-27  5:28 ` [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains() Dan Williams
                   ` (8 subsequent siblings)
  11 siblings, 3 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

The dax_hmem dependency on cxl_region_contains_resource() is a one-off
special case. It is not suitable for other use cases.

Move the definition to the other CONFIG_CXL_REGION guarded definitions in
drivers/cxl/cxl.h and include it via a relative path. This matches
what drivers/dax/cxl.c does for its limited private usage of CXL core
symbols.

Reduce the symbol export visibility from global to just dax_hmem, to
further clarify its applicability.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxl.h         |  5 +++++
 include/cxl/cxl.h         | 15 ---------------
 drivers/cxl/core/region.c |  3 +--
 drivers/dax/hmem/hmem.c   |  2 +-
 4 files changed, 7 insertions(+), 18 deletions(-)
 delete mode 100644 include/cxl/cxl.h

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 30a31968f266..84ad04a02bde 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -941,6 +941,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
 int cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
 struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
 u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
+bool cxl_region_contains_resource(struct resource *res);
 #else
 static inline bool is_cxl_pmem_region(struct device *dev)
 {
@@ -963,6 +964,10 @@ static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
 {
 	return 0;
 }
+static inline bool cxl_region_contains_resource(struct resource *res)
+{
+	return false;
+}
 #endif
 
 void cxl_endpoint_parse_cdat(struct cxl_port *port);
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
deleted file mode 100644
index b12d3d0f6658..000000000000
--- a/include/cxl/cxl.h
+++ /dev/null
@@ -1,15 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/* Copyright (c) 2026 Advanced Micro Devices, Inc. */
-#ifndef _CXL_H_
-#define _CXL_H_
-
-#ifdef CONFIG_CXL_REGION
-bool cxl_region_contains_resource(struct resource *res);
-#else
-static inline bool cxl_region_contains_resource(struct resource *res)
-{
-	return false;
-}
-#endif
-
-#endif /* _CXL_H_ */
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b72556c1458b..12a9572f34d5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -12,7 +12,6 @@
 #include <linux/idr.h>
 #include <linux/memory-tiers.h>
 #include <linux/string_choices.h>
-#include <cxl/cxl.h>
 #include <cxlmem.h>
 #include <cxl.h>
 #include "core.h"
@@ -4253,7 +4252,7 @@ bool cxl_region_contains_resource(struct resource *res)
 	return bus_for_each_dev(&cxl_bus_type, NULL, res,
 				region_contains_resource) != 0;
 }
-EXPORT_SYMBOL_GPL(cxl_region_contains_resource);
+EXPORT_SYMBOL_FOR_MODULES(cxl_region_contains_resource, "dax_hmem");
 
 static int cxl_region_can_probe(struct cxl_region *cxlr)
 {
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 9ceda6b5cadf..0051e553c33f 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -3,7 +3,7 @@
 #include <linux/memregion.h>
 #include <linux/module.h>
 #include <linux/dax.h>
-#include <cxl/cxl.h>
+#include "../../cxl/cxl.h"
 #include "../bus.h"
 
 static bool region_idle;
-- 
2.53.0



* [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains()
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (2 preceding siblings ...)
  2026-03-27  5:28 ` [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource() Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 16:40   ` Dave Jiang
                     ` (2 more replies)
  2026-03-27  5:28 ` [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols Dan Williams
                   ` (7 subsequent siblings)
  11 siblings, 3 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

The call to cxl_region_contains_resource() in hmem_register_cxl_device()
need not cast away 'const'. The problem is the usage of the
bus_for_each_dev() API, which does not mark its @data parameter as 'const'.
Switch to bus_find_device(), which does take 'const' @data, and fix up
cxl_region_contains_resource() and its caller.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/cxl.h         |  4 ++--
 drivers/cxl/core/region.c | 11 ++++++-----
 drivers/dax/hmem/hmem.c   |  2 +-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 84ad04a02bde..340bdc9fcacc 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -941,7 +941,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
 int cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
 struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
 u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
-bool cxl_region_contains_resource(struct resource *res);
+bool cxl_region_contains_resource(const struct resource *res);
 #else
 static inline bool is_cxl_pmem_region(struct device *dev)
 {
@@ -964,7 +964,7 @@ static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
 {
 	return 0;
 }
-static inline bool cxl_region_contains_resource(struct resource *res)
+static inline bool cxl_region_contains_resource(const struct resource *res)
 {
 	return false;
 }
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 12a9572f34d5..a8b183f2d9c5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -4225,9 +4225,9 @@ static int cxl_region_setup_poison(struct cxl_region *cxlr)
 	return devm_add_action_or_reset(dev, remove_debugfs, dentry);
 }
 
-static int region_contains_resource(struct device *dev, void *data)
+static int region_contains_resource(struct device *dev, const void *data)
 {
-	struct resource *res = data;
+	const struct resource *res = data;
 	struct cxl_region *cxlr;
 	struct cxl_region_params *p;
 
@@ -4246,11 +4246,12 @@ static int region_contains_resource(struct device *dev, void *data)
 	return resource_contains(p->res, res) ? 1 : 0;
 }
 
-bool cxl_region_contains_resource(struct resource *res)
+bool cxl_region_contains_resource(const struct resource *res)
 {
 	guard(rwsem_read)(&cxl_rwsem.region);
-	return bus_for_each_dev(&cxl_bus_type, NULL, res,
-				region_contains_resource) != 0;
+	struct device *dev __free(put_device) = bus_find_device(
+		&cxl_bus_type, NULL, res, region_contains_resource);
+	return !!dev;
 }
 EXPORT_SYMBOL_FOR_MODULES(cxl_region_contains_resource, "dax_hmem");
 
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 0051e553c33f..b2ab1292fa81 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -159,7 +159,7 @@ static int hmem_register_cxl_device(struct device *host, int target_nid,
 			      IORES_DESC_CXL) == REGION_DISJOINT)
 		return 0;
 
-	if (cxl_region_contains_resource((struct resource *)res)) {
+	if (cxl_region_contains_resource(res)) {
 		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
 		return 0;
 	}
-- 
2.53.0



* [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (3 preceding siblings ...)
  2026-03-27  5:28 ` [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains() Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 16:46   ` Dave Jiang
                     ` (2 more replies)
  2026-03-27  5:28 ` [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices Dan Williams
                   ` (6 subsequent siblings)
  11 siblings, 3 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

No other module or use case should be using dax_hmem_initial_probe or
dax_hmem_flush_work(). Limit their use to dax_hmem and dax_cxl,
respectively.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/hmem/device.c | 2 +-
 drivers/dax/hmem/hmem.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 991a4bf7d969..675d56276d78 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -9,7 +9,7 @@ static bool nohmem;
 module_param_named(disable, nohmem, bool, 0444);
 
 bool dax_hmem_initial_probe;
-EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
+EXPORT_SYMBOL_FOR_MODULES(dax_hmem_initial_probe, "dax_hmem");
 
 static bool platform_initialized;
 static DEFINE_MUTEX(hmem_resource_lock);
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index b2ab1292fa81..dd3d7f93baee 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -74,7 +74,7 @@ void dax_hmem_flush_work(void)
 {
 	flush_work(&dax_hmem_work.work);
 }
-EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
+EXPORT_SYMBOL_FOR_MODULES(dax_hmem_flush_work, "dax_cxl");
 
 static int __hmem_register_device(struct device *host, int target_nid,
 				  const struct resource *res)
-- 
2.53.0



* [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (4 preceding siblings ...)
  2026-03-27  5:28 ` [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 17:06   ` Dave Jiang
                     ` (2 more replies)
  2026-03-27  5:28 ` [PATCH 7/9] dax/hmem: Parent dax_hmem devices Dan Williams
                   ` (5 subsequent siblings)
  11 siblings, 3 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

dax_hmem (ab)uses a platform device to allow for a module to autoload in
the presence of "Soft Reserved" resources. The dax_hmem driver had no
dependencies on the "hmem_platform" device being a singleton until the
recent "dax_hmem vs dax_cxl" takeover solution.

dax_hmem_work commits a layering violation by assuming that there will
never be more than one "hmem_platform" device associated with a global work
item. Replace it with a dax_hmem local workqueue that can theoretically
support any number of hmem_platform devices.

Fix up the reference counting to only pin the device while it is live in the
queue.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.h         |  15 +++++-
 drivers/dax/hmem/device.c |  28 ++++++----
 drivers/dax/hmem/hmem.c   | 108 +++++++++++++++++++-------------------
 3 files changed, 85 insertions(+), 66 deletions(-)

diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index ebbfe2d6da14..7b1a83f1ce1f 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -3,7 +3,9 @@
 #ifndef __DAX_BUS_H__
 #define __DAX_BUS_H__
 #include <linux/device.h>
+#include <linux/platform_device.h>
 #include <linux/range.h>
+#include <linux/workqueue.h>
 
 struct dev_dax;
 struct resource;
@@ -49,8 +51,19 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
 void kill_dev_dax(struct dev_dax *dev_dax);
 bool static_dev_dax(struct dev_dax *dev_dax);
 
+struct hmem_platform_device {
+	struct platform_device pdev;
+	struct work_struct work;
+	bool did_probe;
+};
+
+static inline struct hmem_platform_device *
+to_hmem_platform_device(struct platform_device *pdev)
+{
+	return container_of(pdev, struct hmem_platform_device, pdev);
+}
+
 #if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
-extern bool dax_hmem_initial_probe;
 void dax_hmem_flush_work(void);
 #else
 static inline void dax_hmem_flush_work(void) { }
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 675d56276d78..d70359b4307b 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -4,13 +4,11 @@
 #include <linux/module.h>
 #include <linux/dax.h>
 #include <linux/mm.h>
+#include "../bus.h"
 
 static bool nohmem;
 module_param_named(disable, nohmem, bool, 0444);
 
-bool dax_hmem_initial_probe;
-EXPORT_SYMBOL_FOR_MODULES(dax_hmem_initial_probe, "dax_hmem");
-
 static bool platform_initialized;
 static DEFINE_MUTEX(hmem_resource_lock);
 static struct resource hmem_active = {
@@ -36,9 +34,21 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
 }
 EXPORT_SYMBOL_GPL(walk_hmem_resources);
 
+static void hmem_work(struct work_struct *work)
+{
+	/* place holder until dax_hmem driver attaches */
+}
+
+static struct hmem_platform_device hmem_platform = {
+	.pdev = {
+		.name = "hmem_platform",
+		.id = 0,
+	},
+	.work = __WORK_INITIALIZER(hmem_platform.work, hmem_work),
+};
+
 static void __hmem_register_resource(int target_nid, struct resource *res)
 {
-	struct platform_device *pdev;
 	struct resource *new;
 	int rc;
 
@@ -54,17 +64,13 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
 	if (platform_initialized)
 		return;
 
-	pdev = platform_device_alloc("hmem_platform", 0);
-	if (!pdev) {
+	rc = platform_device_register(&hmem_platform.pdev);
+	if (rc) {
 		pr_err_once("failed to register device-dax hmem_platform device\n");
 		return;
 	}
 
-	rc = platform_device_add(pdev);
-	if (rc)
-		platform_device_put(pdev);
-	else
-		platform_initialized = true;
+	platform_initialized = true;
 }
 
 void hmem_register_resource(int target_nid, struct resource *res)
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index dd3d7f93baee..e1dae83dae8d 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -59,20 +59,11 @@ static void release_hmem(void *pdev)
 	platform_device_unregister(pdev);
 }
 
-struct dax_defer_work {
-	struct platform_device *pdev;
-	struct work_struct work;
-};
-
-static void process_defer_work(struct work_struct *w);
-
-static struct dax_defer_work dax_hmem_work = {
-	.work = __WORK_INITIALIZER(dax_hmem_work.work, process_defer_work),
-};
+static struct workqueue_struct *dax_hmem_wq;
 
 void dax_hmem_flush_work(void)
 {
-	flush_work(&dax_hmem_work.work);
+	flush_workqueue(dax_hmem_wq);
 }
 EXPORT_SYMBOL_FOR_MODULES(dax_hmem_flush_work, "dax_cxl");
 
@@ -134,24 +125,6 @@ static int __hmem_register_device(struct device *host, int target_nid,
 	return rc;
 }
 
-static int hmem_register_device(struct device *host, int target_nid,
-				const struct resource *res)
-{
-	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
-	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
-			      IORES_DESC_CXL) != REGION_DISJOINT) {
-		if (!dax_hmem_initial_probe) {
-			dev_dbg(host, "await CXL initial probe: %pr\n", res);
-			queue_work(system_long_wq, &dax_hmem_work.work);
-			return 0;
-		}
-		dev_dbg(host, "deferring range to CXL: %pr\n", res);
-		return 0;
-	}
-
-	return __hmem_register_device(host, target_nid, res);
-}
-
 static int hmem_register_cxl_device(struct device *host, int target_nid,
 				    const struct resource *res)
 {
@@ -170,35 +143,55 @@ static int hmem_register_cxl_device(struct device *host, int target_nid,
 
 static void process_defer_work(struct work_struct *w)
 {
-	struct dax_defer_work *work = container_of(w, typeof(*work), work);
-	struct platform_device *pdev;
-
-	if (!work->pdev)
-		return;
-
-	pdev = work->pdev;
+	struct hmem_platform_device *hpdev = container_of(w, typeof(*hpdev), work);
+	struct device *dev = &hpdev->pdev.dev;
 
 	/* Relies on cxl_acpi and cxl_pci having had a chance to load */
 	wait_for_device_probe();
 
-	guard(device)(&pdev->dev);
-	if (!pdev->dev.driver)
-		return;
+	guard(device)(dev);
+	if (!dev->driver)
+		goto out;
 
-	if (!dax_hmem_initial_probe) {
-		dax_hmem_initial_probe = true;
-		walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
+	if (!hpdev->did_probe) {
+		hpdev->did_probe = true;
+		walk_hmem_resources(dev, hmem_register_cxl_device);
 	}
+out:
+	put_device(dev);
+}
+
+static int hmem_register_device(struct device *host, int target_nid,
+				const struct resource *res)
+{
+	struct platform_device *pdev = to_platform_device(host);
+	struct hmem_platform_device *hpdev = to_hmem_platform_device(pdev);
+
+	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
+	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			      IORES_DESC_CXL) != REGION_DISJOINT) {
+		if (!hpdev->did_probe) {
+			dev_dbg(host, "await CXL initial probe: %pr\n", res);
+			hpdev->work.func = process_defer_work;
+			get_device(host);
+			if (!queue_work(dax_hmem_wq, &hpdev->work))
+				put_device(host);
+			return 0;
+		}
+		dev_dbg(host, "deferring range to CXL: %pr\n", res);
+		return 0;
+	}
+
+	return __hmem_register_device(host, target_nid, res);
 }
 
 static int dax_hmem_platform_probe(struct platform_device *pdev)
 {
-	if (work_pending(&dax_hmem_work.work))
-		return -EBUSY;
+	struct hmem_platform_device *hpdev = to_hmem_platform_device(pdev);
 
-	if (!dax_hmem_work.pdev)
-		dax_hmem_work.pdev =
-			to_platform_device(get_device(&pdev->dev));
+	/* queue is only flushed on module unload, fail rebind with pending work */
+	if (work_pending(&hpdev->work))
+		return -EBUSY;
 
 	return walk_hmem_resources(&pdev->dev, hmem_register_device);
 }
@@ -224,26 +217,33 @@ static __init int dax_hmem_init(void)
 		request_module("cxl_pci");
 	}
 
+	dax_hmem_wq = alloc_ordered_workqueue("dax_hmem_wq", 0);
+	if (!dax_hmem_wq)
+		return -ENOMEM;
+
 	rc = platform_driver_register(&dax_hmem_platform_driver);
 	if (rc)
-		return rc;
+		goto err_platform_driver;
 
 	rc = platform_driver_register(&dax_hmem_driver);
 	if (rc)
-		platform_driver_unregister(&dax_hmem_platform_driver);
+		goto err_driver;
+
+	return 0;
+
+err_driver:
+	platform_driver_unregister(&dax_hmem_platform_driver);
+err_platform_driver:
+	destroy_workqueue(dax_hmem_wq);
 
 	return rc;
 }
 
 static __exit void dax_hmem_exit(void)
 {
-	if (dax_hmem_work.pdev) {
-		flush_work(&dax_hmem_work.work);
-		put_device(&dax_hmem_work.pdev->dev);
-	}
-
 	platform_driver_unregister(&dax_hmem_driver);
 	platform_driver_unregister(&dax_hmem_platform_driver);
+	destroy_workqueue(dax_hmem_wq);
 }
 
 module_init(dax_hmem_init);
-- 
2.53.0



* [PATCH 7/9] dax/hmem: Parent dax_hmem devices
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (5 preceding siblings ...)
  2026-03-27  5:28 ` [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 17:07   ` Dave Jiang
                     ` (2 more replies)
  2026-03-27  5:28 ` [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure Dan Williams
                   ` (4 subsequent siblings)
  11 siblings, 3 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

For test purposes it is useful to be able to determine which
"hmem_platform" device is hosting a given sub-device.

Register hmem devices underneath "hmem_platform".

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/hmem/hmem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index e1dae83dae8d..af21f66bf872 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -96,6 +96,7 @@ static int __hmem_register_device(struct device *host, int target_nid,
 		return -ENOMEM;
 	}
 
+	pdev->dev.parent = host;
 	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
 	info = (struct memregion_info) {
 		.target_node = target_nid,
-- 
2.53.0



* [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (6 preceding siblings ...)
  2026-03-27  5:28 ` [PATCH 7/9] dax/hmem: Parent dax_hmem devices Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 17:08   ` Dave Jiang
                     ` (2 more replies)
  2026-03-27  5:28 ` [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions Dan Williams
                   ` (3 subsequent siblings)
  11 siblings, 3 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Add a cxl_test module option to skip setting up one of the members of the
default auto-assembled region.

This simulates a device failing between firmware setup and OS boot, or
region configuration interrupted by an event like kexec.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/cxl/test/cxl.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 81e2aef3627a..7deeb7ff7bdf 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -16,6 +16,7 @@
 
 static int interleave_arithmetic;
 static bool extended_linear_cache;
+static bool fail_autoassemble;
 
 #define FAKE_QTG_ID	42
 
@@ -819,6 +820,12 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
 		return;
 	}
 
+	/* Simulate missing cxl_mem.4 configuration */
+	if (hb0 && pdev->id == 4 && cxld->id == 0 && fail_autoassemble) {
+		default_mock_decoder(cxld);
+		return;
+	}
+
 	base = window->base_hpa;
 	if (extended_linear_cache)
 		base += mock_auto_region_size;
@@ -1620,6 +1627,8 @@ module_param(interleave_arithmetic, int, 0444);
 MODULE_PARM_DESC(interleave_arithmetic, "Modulo:0, XOR:1");
 module_param(extended_linear_cache, bool, 0444);
 MODULE_PARM_DESC(extended_linear_cache, "Enable extended linear cache support");
+module_param(fail_autoassemble, bool, 0444);
+MODULE_PARM_DESC(fail_autoassemble, "Simulate missing member of an auto-region");
 module_init(cxl_test_init);
 module_exit(cxl_test_exit);
 MODULE_LICENSE("GPL v2");
-- 
2.53.0



* [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (7 preceding siblings ...)
  2026-03-27  5:28 ` [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure Dan Williams
@ 2026-03-27  5:28 ` Dan Williams
  2026-03-27 17:10   ` Dave Jiang
                     ` (3 more replies)
  2026-03-27 23:42 ` [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Alison Schofield
                   ` (2 subsequent siblings)
  11 siblings, 4 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-27  5:28 UTC (permalink / raw)
  To: dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

When platform firmware is committed to publishing EFI_CONVENTIONAL_MEMORY
in the memory map, but CXL fails to assemble the region, dax_hmem can
attempt to attach a dax device to the memory range.

Take advantage of the new ability to support multiple "hmem_platform"
devices to enable regression testing of several scenarios:

* CXL correctly assembles a region, check dax_hmem fails to attach dax
* CXL fails to assemble a region, check dax_hmem successfully attaches dax
* Check that loading the dax_cxl driver loads the dax_hmem driver
* Attempt to race cxl_mock_mem async probe vs dax_hmem probe flushing.
  Check both positive and negative cases.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/cxl/test/mock.h      |  8 +++++
 tools/testing/cxl/test/cxl.c       | 57 ++++++++++++++++++++++++++++++
 tools/testing/cxl/test/hmem_test.c | 47 ++++++++++++++++++++++++
 tools/testing/cxl/test/mem.c       |  3 ++
 tools/testing/cxl/test/mock.c      | 50 ++++++++++++++++++++++++++
 tools/testing/cxl/Kbuild           |  7 ++++
 tools/testing/cxl/test/Kbuild      |  1 +
 7 files changed, 173 insertions(+)
 create mode 100644 tools/testing/cxl/test/hmem_test.c
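
Not part of the patch: a minimal userspace sketch of the
override-then-fallback pattern that the __wrap_region_intersects()
helpers in this patch follow. The mock answers only for the first half
of a test window and returns a negative value otherwise, which the
wrapper treats as "ask the real implementation". All sketch_* names and
the stand-in "real" lookup are hypothetical; none of the kernel's
resource helpers are used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins, not the kernel's REGION_* values */
enum { SKETCH_DISJOINT = 0, SKETCH_INTERSECTS = 1 };

struct sketch_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

static bool sketch_overlaps(const struct sketch_range *a,
			    const struct sketch_range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Mock: claim only the first half of the window, else "no opinion" (-1) */
static int sketch_mock_intersects(uint64_t start, uint64_t size,
				  uint64_t win_base, uint64_t win_size)
{
	assert(size > 0 && win_size >= 2);

	struct sketch_range res = { start, start + size - 1 };
	struct sketch_range half = { win_base, win_base + win_size / 2 - 1 };

	return sketch_overlaps(&res, &half) ? SKETCH_INTERSECTS : -1;
}

/* Hypothetical "real" lookup: knows nothing about the mock window */
static int sketch_real_intersects(uint64_t start, uint64_t size)
{
	(void)start;
	(void)size;
	return SKETCH_DISJOINT;
}

/* Wrapper: prefer the mock answer, fall back when it is negative */
static int sketch_wrap_intersects(uint64_t start, uint64_t size,
				  uint64_t win_base, uint64_t win_size,
				  bool have_mock)
{
	int rc = -1;

	if (have_mock)
		rc = sketch_mock_intersects(start, size, win_base, win_size);
	if (rc < 0)
		rc = sketch_real_intersects(start, size);
	return rc;
}
```

With a window at 0x1000 of size 0x2000, a query at 0x1000 is answered
by the mock, while a query at 0x2000 (second half) falls through to the
stand-in real lookup.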

diff --git a/tools/testing/cxl/test/mock.h b/tools/testing/cxl/test/mock.h
index 2684b89c8aa2..4f57dc80ae7d 100644
--- a/tools/testing/cxl/test/mock.h
+++ b/tools/testing/cxl/test/mock.h
@@ -2,6 +2,7 @@
 
 #include <linux/list.h>
 #include <linux/acpi.h>
+#include <linux/dax.h>
 #include <cxl.h>
 
 struct cxl_mock_ops {
@@ -27,8 +28,15 @@ struct cxl_mock_ops {
 	int (*hmat_get_extended_linear_cache_size)(struct resource *backing_res,
 						   int nid,
 						   resource_size_t *cache_size);
+	int (*walk_hmem_resources)(struct device *host, walk_hmem_fn fn);
+	int (*region_intersects)(resource_size_t start, size_t size,
+				 unsigned long flags, unsigned long desc);
+	int (*region_intersects_soft_reserve)(resource_size_t start,
+					      size_t size);
 };
 
+int hmem_test_init(void);
+void hmem_test_exit(void);
 void register_cxl_mock_ops(struct cxl_mock_ops *ops);
 void unregister_cxl_mock_ops(struct cxl_mock_ops *ops);
 struct cxl_mock_ops *get_cxl_mock_ops(int *index);
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 7deeb7ff7bdf..9a9f52090c1d 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1121,6 +1121,53 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
 	cxl_endpoint_get_perf_coordinates(port, ep_c);
 }
 
+/*
+ * Simulate that the first half of mock CXL Window 0 is "Soft Reserve" capacity
+ */
+static int mock_walk_hmem_resources(struct device *host, walk_hmem_fn fn)
+{
+	struct acpi_cedt_cfmws *cfmws = mock_cfmws[0];
+	struct resource window =
+		DEFINE_RES_MEM(cfmws->base_hpa, cfmws->window_size / 2);
+
+	dev_dbg(host, "walk cxl_test resource: %pr\n", &window);
+	return fn(host, 0, &window);
+}
+
+/*
+ * This should only be called by the dax_hmem case, treat mismatches (negative
+ * result) as "fallback to base region_intersects()". Simulate that the first
+ * half of mock CXL Window 0 is IORES_DESC_CXL capacity.
+ */
+static int mock_region_intersects(resource_size_t start, size_t size,
+				  unsigned long flags, unsigned long desc)
+{
+	struct resource res = DEFINE_RES_MEM(start, size);
+	struct acpi_cedt_cfmws *cfmws = mock_cfmws[0];
+	struct resource window =
+		DEFINE_RES_MEM(cfmws->base_hpa, cfmws->window_size / 2);
+
+	if (resource_overlaps(&res, &window))
+		return REGION_INTERSECTS;
+	pr_debug("warning: no cxl_test CXL intersection for %pr\n", &res);
+	return -1;
+}
+
+
+static int
+mock_region_intersects_soft_reserve(resource_size_t start, size_t size)
+{
+	struct resource res = DEFINE_RES_MEM(start, size);
+	struct acpi_cedt_cfmws *cfmws = mock_cfmws[0];
+	struct resource window =
+		DEFINE_RES_MEM(cfmws->base_hpa, cfmws->window_size / 2);
+
+	if (resource_overlaps(&res, &window))
+		return REGION_INTERSECTS;
+	pr_debug("warning: no cxl_test soft reserve intersection for %pr\n", &res);
+	return -1;
+}
+
 static struct cxl_mock_ops cxl_mock_ops = {
 	.is_mock_adev = is_mock_adev,
 	.is_mock_bridge = is_mock_bridge,
@@ -1136,6 +1183,9 @@ static struct cxl_mock_ops cxl_mock_ops = {
 	.devm_cxl_add_dport_by_dev = mock_cxl_add_dport_by_dev,
 	.hmat_get_extended_linear_cache_size =
 		mock_hmat_get_extended_linear_cache_size,
+	.walk_hmem_resources = mock_walk_hmem_resources,
+	.region_intersects = mock_region_intersects,
+	.region_intersects_soft_reserve = mock_region_intersects_soft_reserve,
 	.list = LIST_HEAD_INIT(cxl_mock_ops.list),
 };
 
@@ -1561,8 +1611,14 @@ static __init int cxl_test_init(void)
 	if (rc)
 		goto err_root;
 
+	rc = hmem_test_init();
+	if (rc)
+		goto err_mem;
+
 	return 0;
 
+err_mem:
+	cxl_mem_exit();
 err_root:
 	platform_device_put(cxl_acpi);
 err_rch:
@@ -1600,6 +1656,7 @@ static __exit void cxl_test_exit(void)
 {
 	int i;
 
+	hmem_test_exit();
 	cxl_mem_exit();
 	platform_device_unregister(cxl_acpi);
 	cxl_rch_topo_exit();
diff --git a/tools/testing/cxl/test/hmem_test.c b/tools/testing/cxl/test/hmem_test.c
new file mode 100644
index 000000000000..3a1a089e1721
--- /dev/null
+++ b/tools/testing/cxl/test/hmem_test.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2026 Intel Corporation */
+#include <linux/moduleparam.h>
+#include <linux/workqueue.h>
+#include "../../../drivers/dax/bus.h"
+
+static bool hmem_test;
+
+static void hmem_test_work(struct work_struct *work)
+{
+}
+
+static void hmem_test_release(struct device *dev)
+{
+	struct hmem_platform_device *hpdev =
+		container_of(dev, typeof(*hpdev), pdev.dev);
+
+	memset(hpdev, 0, sizeof(*hpdev));
+}
+
+static struct hmem_platform_device hmem_test_device = {
+	.pdev = {
+		.name = "hmem_platform",
+		.id = 1,
+		.dev = {
+			.release = hmem_test_release,
+		},
+	},
+	.work = __WORK_INITIALIZER(hmem_test_device.work, hmem_test_work),
+};
+
+int hmem_test_init(void)
+{
+	if (!hmem_test)
+		return 0;
+
+	return platform_device_register(&hmem_test_device.pdev);
+}
+
+void hmem_test_exit(void)
+{
+	if (hmem_test)
+		platform_device_unregister(&hmem_test_device.pdev);
+}
+
+module_param(hmem_test, bool, 0444);
+MODULE_PARM_DESC(hmem_test, "Enable/disable the dax_hmem test platform device");
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index cb87e8c0e63c..cc847e9aeceb 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1695,6 +1695,9 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	struct cxl_dpa_info range_info = { 0 };
 	int rc;
 
+	/* Increase async probe race window */
+	usleep_range(500*1000, 1000*1000);
+
 	mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
 	if (!mdata)
 		return -ENOMEM;
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index b8fcb50c1027..6454b868b122 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -251,6 +251,56 @@ struct cxl_dport *__wrap_devm_cxl_add_dport_by_dev(struct cxl_port *port,
 }
 EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_dport_by_dev, "CXL");
 
+int __wrap_region_intersects(resource_size_t start, size_t size,
+			     unsigned long flags, unsigned long desc)
+{
+	int rc = -1;
+	int index;
+	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+
+	if (ops)
+		rc = ops->region_intersects(start, size, flags, desc);
+	if (rc < 0)
+		rc = region_intersects(start, size, flags, desc);
+	put_cxl_mock_ops(index);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(__wrap_region_intersects);
+
+int __wrap_region_intersects_soft_reserve(resource_size_t start, size_t size)
+{
+	int rc = -1;
+	int index;
+	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+
+	if (ops)
+		rc = ops->region_intersects_soft_reserve(start, size);
+	if (rc < 0)
+		rc = region_intersects_soft_reserve(start, size);
+	put_cxl_mock_ops(index);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(__wrap_region_intersects_soft_reserve);
+
+int __wrap_walk_hmem_resources(struct device *host, walk_hmem_fn fn)
+{
+	int index, rc = 0;
+	bool is_mock = strcmp(dev_name(host), "hmem_platform.1") == 0;
+	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+
+	if (is_mock) {
+		if (ops)
+			rc = ops->walk_hmem_resources(host, fn);
+	} else {
+		rc = walk_hmem_resources(host, fn);
+	}
+	put_cxl_mock_ops(index);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(__wrap_walk_hmem_resources);
+
 MODULE_LICENSE("GPL v2");
 MODULE_DESCRIPTION("cxl_test: emulation module");
 MODULE_IMPORT_NS("ACPI");
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 53d84a6874b7..540425c7cd41 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -11,8 +11,12 @@ ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
 ldflags-y += --wrap=hmat_get_extended_linear_cache_size
 ldflags-y += --wrap=devm_cxl_add_dport_by_dev
 ldflags-y += --wrap=devm_cxl_switch_port_decoders_setup
+ldflags-y += --wrap=walk_hmem_resources
+ldflags-y += --wrap=region_intersects
+ldflags-y += --wrap=region_intersects_soft_reserve
 
 DRIVERS := ../../../drivers
+DAX_HMEM_SRC := $(DRIVERS)/dax/hmem
 CXL_SRC := $(DRIVERS)/cxl
 CXL_CORE_SRC := $(DRIVERS)/cxl/core
 ccflags-y := -I$(srctree)/drivers/cxl/
@@ -70,6 +74,9 @@ cxl_core-y += config_check.o
 cxl_core-y += cxl_core_test.o
 cxl_core-y += cxl_core_exports.o
 
+obj-m += dax_hmem.o
+dax_hmem-y := $(DAX_HMEM_SRC)/hmem.o
+
 KBUILD_CFLAGS := $(filter-out -Wmissing-prototypes -Wmissing-declarations, $(KBUILD_CFLAGS))
 
 obj-m += test/
diff --git a/tools/testing/cxl/test/Kbuild b/tools/testing/cxl/test/Kbuild
index af50972c8b6d..c168e3c998a7 100644
--- a/tools/testing/cxl/test/Kbuild
+++ b/tools/testing/cxl/test/Kbuild
@@ -7,6 +7,7 @@ obj-m += cxl_mock_mem.o
 obj-m += cxl_translate.o
 
 cxl_test-y := cxl.o
+cxl_test-y += hmem_test.o
 cxl_mock-y := mock.o
 cxl_mock_mem-y := mem.o
 
-- 
2.53.0



* Re: [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure
  2026-03-27  5:28 ` [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure Dan Williams
@ 2026-03-27 16:28   ` Dave Jiang
  2026-03-27 19:20   ` Alison Schofield
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 16:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa, stable, Jonathan Cameron



On 3/26/26 10:28 PM, Dan Williams wrote:
> The following crash signature results from region destruction while an
> endpoint decoder is staged, but not fully attached.
> 
> ---
>  BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830 [cxl_core]
>  Read of size 8 at addr ffff888265638840 by task modprobe/1287
> 
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x68/0x90
>   print_report+0x170/0x4e2
>   kasan_report+0xc2/0x1a0
>   __cxl_decoder_detach+0x724/0x830 [cxl_core]
>   cxl_decoder_detach+0x6c/0x100 [cxl_core]
>   unregister_region+0x88/0x140 [cxl_core]
>   devres_release_all+0x172/0x230
> ---
> 
> The "staged" state is established by cxl_region_attach_auto() and finalized
> by cxl_region_attach_position(). When that is finalized a memdev removal
> event will destroy regions before endpoint decoders. However, in the
> interim the memdev removal will falsely assume that the endpoint decoder is
> unattached. Later, the eventual region removal finds the stale pointer to
> the now freed endpoint decoder.
> 
> Introduce CXL_DECODER_STATE_AUTO_STAGED and cxl_cancel_auto_attach() to
> cleanup this interim state.
> 
> Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
> Cc: <stable@vger.kernel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> ---
>  drivers/cxl/cxl.h         |  6 +++--
>  drivers/cxl/core/region.c | 54 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 57 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 9b947286eb9b..30a31968f266 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -378,12 +378,14 @@ struct cxl_decoder {
>  };
>  
>  /*
> - * Track whether this decoder is reserved for region autodiscovery, or
> - * free for userspace provisioning.
> + * Track whether this decoder is free for userspace provisioning, reserved for
> + * region autodiscovery, whether it is started connecting (awaiting other
> + * peers), or has completed auto assembly.
>   */
>  enum cxl_decoder_state {
>  	CXL_DECODER_STATE_MANUAL,
>  	CXL_DECODER_STATE_AUTO,
> +	CXL_DECODER_STATE_AUTO_STAGED,
>  };
>  
>  /**
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f7b20f60ac5c..b72556c1458b 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1064,6 +1064,14 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
>  
>  	if (!cxld->region) {
>  		cxld->region = cxlr;
> +
> +		/*
> +		 * Now that cxld->region is set the intermediate staging state
> +		 * can be cleared.
> +		 */
> +		if (cxld == &cxled->cxld &&
> +		    cxled->state == CXL_DECODER_STATE_AUTO_STAGED)
> +			cxled->state = CXL_DECODER_STATE_AUTO;
>  		get_device(&cxlr->dev);
>  	}
>  
> @@ -1805,6 +1813,7 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
>  	pos = p->nr_targets;
>  	p->targets[pos] = cxled;
>  	cxled->pos = pos;
> +	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;
>  	p->nr_targets++;
>  
>  	return 0;
> @@ -2154,6 +2163,47 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  	return 0;
>  }
>  
> +static int cxl_region_by_target(struct device *dev, const void *data)
> +{
> +	const struct cxl_endpoint_decoder *cxled = data;
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +
> +	if (!is_cxl_region(dev))
> +		return 0;
> +
> +	cxlr = to_cxl_region(dev);
> +	p = &cxlr->params;
> +	return p->targets[cxled->pos] == cxled;
> +}
> +
> +/*
> + * When an auto-region fails to assemble the decoder may be listed as a target,
> + * but not fully attached.
> + */
> +static void cxl_cancel_auto_attach(struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +	int pos = cxled->pos;
> +
> +	if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
> +		return;
> +
> +	struct device *dev __free(put_device) = bus_find_device(
> +		&cxl_bus_type, NULL, cxled, cxl_region_by_target);
> +	if (!dev)
> +		return;
> +
> +	cxlr = to_cxl_region(dev);
> +	p = &cxlr->params;
> +
> +	p->nr_targets--;
> +	cxled->state = CXL_DECODER_STATE_AUTO;
> +	cxled->pos = -1;
> +	p->targets[pos] = NULL;
> +}
> +
>  static struct cxl_region *
>  __cxl_decoder_detach(struct cxl_region *cxlr,
>  		     struct cxl_endpoint_decoder *cxled, int pos,
> @@ -2177,8 +2227,10 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
>  		cxled = p->targets[pos];
>  	} else {
>  		cxlr = cxled->cxld.region;
> -		if (!cxlr)
> +		if (!cxlr) {
> +			cxl_cancel_auto_attach(cxled);
>  			return NULL;
> +		}
>  		p = &cxlr->params;
>  	}
>  
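
For readers following along, the staged/cancel bookkeeping described in
the changelog can be modeled roughly as below. This is only a userspace
sketch under simplifying assumptions (no locking, no device lookup, a
fixed-size target array); every sketch_* name is hypothetical and it is
not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state { SKETCH_MANUAL, SKETCH_AUTO, SKETCH_AUTO_STAGED };

struct sketch_decoder {
	enum sketch_state state;
	int pos;
	void *region; /* only set once fully attached */
};

#define SKETCH_MAX_TARGETS 4

struct sketch_region {
	struct sketch_decoder *targets[SKETCH_MAX_TARGETS];
	int nr_targets;
};

/* List the decoder as a target, but do not set d->region yet */
static int sketch_stage(struct sketch_region *r, struct sketch_decoder *d)
{
	assert(r && d);
	if (r->nr_targets >= SKETCH_MAX_TARGETS)
		return -1;

	d->pos = r->nr_targets;
	r->targets[d->pos] = d;
	d->state = SKETCH_AUTO_STAGED;
	r->nr_targets++;
	return 0;
}

/* Undo a staged-but-unattached decoder so no stale pointer remains */
static void sketch_cancel(struct sketch_region *r, struct sketch_decoder *d)
{
	assert(r && d);
	if (d->state != SKETCH_AUTO_STAGED)
		return;
	if (d->pos < 0 || r->targets[d->pos] != d)
		return;

	r->targets[d->pos] = NULL;
	r->nr_targets--;
	d->state = SKETCH_AUTO;
	d->pos = -1;
}
```

The point of the extra state is that a decoder with no region pointer
can still be told apart from one that was never staged, so teardown
knows there is a target slot to clear.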



* Re: [PATCH 2/9] dax/cxl: Fix HMEM dependencies
  2026-03-27  5:28 ` [PATCH 2/9] dax/cxl: Fix HMEM dependencies Dan Williams
@ 2026-03-27 16:29   ` Dave Jiang
  2026-03-27 23:44   ` Alison Schofield
  2026-03-30 21:10   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 16:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa



On 3/26/26 10:28 PM, Dan Williams wrote:
> The expectation is that DEV_DAX_HMEM=y should be disallowed if any of
> CXL_ACPI, or CXL_PCI are set =m. Also DEV_DAX_CXL=y should be disallowed if
> DEV_DAX_HMEM=m. Use "$config || !$config" syntax for each dependency.
> Otherwise, the invalid DEV_DAX_HMEM=m && DEV_DAX_CXL=y configuration is
> allowed.
> 
> Lastly, dax_hmem depends on the availability of the
> cxl_region_contains_resource() symbol published by the cxl_core.ko module.
> So, also prevent DEV_DAX_HMEM from being built-in when the cxl_core module
> is not built-in.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/dax/Kconfig | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
> index 3683bb3f2311..504f7f735ef5 100644
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -32,6 +32,9 @@ config DEV_DAX_HMEM
>  	depends on EFI_SOFT_RESERVE
>  	select NUMA_KEEP_MEMINFO if NUMA_MEMBLKS
>  	default DEV_DAX
> +	depends on CXL_ACPI || !CXL_ACPI
> +	depends on CXL_PCI || !CXL_PCI
> +	depends on CXL_BUS || !CXL_BUS
>  	help
>  	  EFI 2.8 platforms, and others, may advertise 'specific purpose'
>  	  memory. For example, a high bandwidth memory pool. The
> @@ -48,8 +51,7 @@ config DEV_DAX_CXL
>  	tristate "CXL DAX: direct access to CXL RAM regions"
>  	depends on CXL_BUS && CXL_REGION && DEV_DAX
>  	default CXL_REGION && DEV_DAX
> -	depends on CXL_ACPI >= DEV_DAX_HMEM
> -	depends on CXL_PCI >= DEV_DAX_HMEM
> +	depends on DEV_DAX_HMEM || !DEV_DAX_HMEM
>  	help
>  	  CXL RAM regions are either mapped by platform-firmware
>  	  and published in the initial system-memory map as "System RAM", mapped
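
For reference, the "$config || !$config" idiom can be modeled with
Kconfig's tristate arithmetic (n=0, m=1, y=2, "!" is 2 minus the value,
"||" is max). A minimal sketch, with hypothetical tri_* names, of the
ceiling such a dependency places on the dependent symbol:

```c
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "!": 2 - value */
static enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(TRI_Y - a);
}

/* Kconfig "||": max of the operands */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Upper bound imposed by "depends on DEP || !DEP" */
static enum tristate tri_dep_cap(enum tristate dep)
{
	return tri_or(dep, tri_not(dep));
}
```

So CXL_ACPI=m caps DEV_DAX_HMEM at m, while CXL_ACPI=n or CXL_ACPI=y
leaves it free to be y.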



* Re: [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource()
  2026-03-27  5:28 ` [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource() Dan Williams
@ 2026-03-27 16:39   ` Dave Jiang
  2026-03-27 23:45   ` Alison Schofield
  2026-03-30 22:19   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 16:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa



On 3/26/26 10:28 PM, Dan Williams wrote:
> The dax_hmem dependency on cxl_region_contains_resource() is a one-off
> special case. It is not suitable for other use cases.
> 
> Move the definition to the other CONFIG_CXL_REGION guarded definitions in
> drivers/cxl/cxl.h and include that by a relative path include. This matches
> what drivers/dax/cxl.c does for its limited private usage of CXL core
> symbols.
> 
> Reduce the symbol export visibility from global to just dax_hmem, to
> further clarify its applicability.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/cxl/cxl.h         |  5 +++++
>  include/cxl/cxl.h         | 15 ---------------
>  drivers/cxl/core/region.c |  3 +--
>  drivers/dax/hmem/hmem.c   |  2 +-
>  4 files changed, 7 insertions(+), 18 deletions(-)
>  delete mode 100644 include/cxl/cxl.h
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 30a31968f266..84ad04a02bde 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -941,6 +941,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
>  int cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
>  struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
>  u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
> +bool cxl_region_contains_resource(struct resource *res);
>  #else
>  static inline bool is_cxl_pmem_region(struct device *dev)
>  {
> @@ -963,6 +964,10 @@ static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
>  {
>  	return 0;
>  }
> +static inline bool cxl_region_contains_resource(struct resource *res)
> +{
> +	return false;
> +}
>  #endif
>  
>  void cxl_endpoint_parse_cdat(struct cxl_port *port);
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> deleted file mode 100644
> index b12d3d0f6658..000000000000
> --- a/include/cxl/cxl.h
> +++ /dev/null
> @@ -1,15 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/* Copyright (c) 2026 Advanced Micro Devices, Inc. */
> -#ifndef _CXL_H_
> -#define _CXL_H_
> -
> -#ifdef CONFIG_CXL_REGION
> -bool cxl_region_contains_resource(struct resource *res);
> -#else
> -static inline bool cxl_region_contains_resource(struct resource *res)
> -{
> -	return false;
> -}
> -#endif
> -
> -#endif /* _CXL_H_ */
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index b72556c1458b..12a9572f34d5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -12,7 +12,6 @@
>  #include <linux/idr.h>
>  #include <linux/memory-tiers.h>
>  #include <linux/string_choices.h>
> -#include <cxl/cxl.h>
>  #include <cxlmem.h>
>  #include <cxl.h>
>  #include "core.h"
> @@ -4253,7 +4252,7 @@ bool cxl_region_contains_resource(struct resource *res)
>  	return bus_for_each_dev(&cxl_bus_type, NULL, res,
>  				region_contains_resource) != 0;
>  }
> -EXPORT_SYMBOL_GPL(cxl_region_contains_resource);
> +EXPORT_SYMBOL_FOR_MODULES(cxl_region_contains_resource, "dax_hmem");
>  
>  static int cxl_region_can_probe(struct cxl_region *cxlr)
>  {
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 9ceda6b5cadf..0051e553c33f 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,7 +3,7 @@
>  #include <linux/memregion.h>
>  #include <linux/module.h>
>  #include <linux/dax.h>
> -#include <cxl/cxl.h>
> +#include "../../cxl/cxl.h"
>  #include "../bus.h"
>  
>  static bool region_idle;



* Re: [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains()
  2026-03-27  5:28 ` [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains() Dan Williams
@ 2026-03-27 16:40   ` Dave Jiang
  2026-03-27 23:45   ` Alison Schofield
  2026-03-30 22:22   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 16:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa



On 3/26/26 10:28 PM, Dan Williams wrote:
> The call to cxl_region_resource_contains() in hmem_register_cxl_device()
> need not cast away 'const'. The problem is the usage of the
> bus_for_each_dev() API which does not mark its @data parameter as 'const'.
> Switch to bus_find_device() which does take 'const' @data, fixup
> cxl_region_resource_contains() and its caller.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/cxl/cxl.h         |  4 ++--
>  drivers/cxl/core/region.c | 11 ++++++-----
>  drivers/dax/hmem/hmem.c   |  2 +-
>  3 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 84ad04a02bde..340bdc9fcacc 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -941,7 +941,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
>  int cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
>  struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
>  u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
> -bool cxl_region_contains_resource(struct resource *res);
> +bool cxl_region_contains_resource(const struct resource *res);
>  #else
>  static inline bool is_cxl_pmem_region(struct device *dev)
>  {
> @@ -964,7 +964,7 @@ static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
>  {
>  	return 0;
>  }
> -static inline bool cxl_region_contains_resource(struct resource *res)
> +static inline bool cxl_region_contains_resource(const struct resource *res)
>  {
>  	return false;
>  }
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 12a9572f34d5..a8b183f2d9c5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -4225,9 +4225,9 @@ static int cxl_region_setup_poison(struct cxl_region *cxlr)
>  	return devm_add_action_or_reset(dev, remove_debugfs, dentry);
>  }
>  
> -static int region_contains_resource(struct device *dev, void *data)
> +static int region_contains_resource(struct device *dev, const void *data)
>  {
> -	struct resource *res = data;
> +	const struct resource *res = data;
>  	struct cxl_region *cxlr;
>  	struct cxl_region_params *p;
>  
> @@ -4246,11 +4246,12 @@ static int region_contains_resource(struct device *dev, void *data)
>  	return resource_contains(p->res, res) ? 1 : 0;
>  }
>  
> -bool cxl_region_contains_resource(struct resource *res)
> +bool cxl_region_contains_resource(const struct resource *res)
>  {
>  	guard(rwsem_read)(&cxl_rwsem.region);
> -	return bus_for_each_dev(&cxl_bus_type, NULL, res,
> -				region_contains_resource) != 0;
> +	struct device *dev __free(put_device) = bus_find_device(
> +		&cxl_bus_type, NULL, res, region_contains_resource);
> +	return !!dev;
>  }
>  EXPORT_SYMBOL_FOR_MODULES(cxl_region_contains_resource, "dax_hmem");
>  
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 0051e553c33f..b2ab1292fa81 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -159,7 +159,7 @@ static int hmem_register_cxl_device(struct device *host, int target_nid,
>  			      IORES_DESC_CXL) == REGION_DISJOINT)
>  		return 0;
>  
> -	if (cxl_region_contains_resource((struct resource *)res)) {
> +	if (cxl_region_contains_resource(res)) {
>  		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
>  		return 0;
>  	}



* Re: [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols
  2026-03-27  5:28 ` [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols Dan Williams
@ 2026-03-27 16:46   ` Dave Jiang
  2026-03-27 23:46   ` Alison Schofield
  2026-03-30 22:26   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 16:46 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa



On 3/26/26 10:28 PM, Dan Williams wrote:
> No other module or use case should be using dax_hmem_initial_probe or
> dax_hmem_flush_work(). Limit their use to dax_hmem, and dax_cxl
> respectively.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> ---
>  drivers/dax/hmem/device.c | 2 +-
>  drivers/dax/hmem/hmem.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index 991a4bf7d969..675d56276d78 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -9,7 +9,7 @@ static bool nohmem;
>  module_param_named(disable, nohmem, bool, 0444);
>  
>  bool dax_hmem_initial_probe;
> -EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
> +EXPORT_SYMBOL_FOR_MODULES(dax_hmem_initial_probe, "dax_hmem");
>  
>  static bool platform_initialized;
>  static DEFINE_MUTEX(hmem_resource_lock);
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index b2ab1292fa81..dd3d7f93baee 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -74,7 +74,7 @@ void dax_hmem_flush_work(void)
>  {
>  	flush_work(&dax_hmem_work.work);
>  }
> -EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
> +EXPORT_SYMBOL_FOR_MODULES(dax_hmem_flush_work, "dax_cxl");
>  
>  static int __hmem_register_device(struct device *host, int target_nid,
>  				  const struct resource *res)



* Re: [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices
  2026-03-27  5:28 ` [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices Dan Williams
@ 2026-03-27 17:06   ` Dave Jiang
  2026-03-27 23:46   ` Alison Schofield
  2026-03-31 17:32   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 17:06 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa



On 3/26/26 10:28 PM, Dan Williams wrote:
> dax_hmem (ab)uses a platform device to allow for a module to autoload in
> the presence of "Soft Reserved" resources. The dax_hmem driver had no
> dependencies on the "hmem_platform" device being a singleton until the
> recent "dax_hmem vs dax_cxl" takeover solution.
> 
> Replace the layering violation of dax_hmem_work assuming that there will
> never be more than one "hmem_platform" device associated with a global work
> item with a dax_hmem local workqueue that can theoretically support any
> number of hmem_platform devices.
> 
> Fixup the reference counting to only pin the device while it is live in the
> queue.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> ---
>  drivers/dax/bus.h         |  15 +++++-
>  drivers/dax/hmem/device.c |  28 ++++++----
>  drivers/dax/hmem/hmem.c   | 108 +++++++++++++++++++-------------------
>  3 files changed, 85 insertions(+), 66 deletions(-)
> 
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index ebbfe2d6da14..7b1a83f1ce1f 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -3,7 +3,9 @@
>  #ifndef __DAX_BUS_H__
>  #define __DAX_BUS_H__
>  #include <linux/device.h>
> +#include <linux/platform_device.h>
>  #include <linux/range.h>
> +#include <linux/workqueue.h>
>  
>  struct dev_dax;
>  struct resource;
> @@ -49,8 +51,19 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
>  void kill_dev_dax(struct dev_dax *dev_dax);
>  bool static_dev_dax(struct dev_dax *dev_dax);
>  
> +struct hmem_platform_device {
> +	struct platform_device pdev;
> +	struct work_struct work;
> +	bool did_probe;
> +};
> +
> +static inline struct hmem_platform_device *
> +to_hmem_platform_device(struct platform_device *pdev)
> +{
> +	return container_of(pdev, struct hmem_platform_device, pdev);
> +}
> +
>  #if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
> -extern bool dax_hmem_initial_probe;
>  void dax_hmem_flush_work(void);
>  #else
>  static inline void dax_hmem_flush_work(void) { }
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index 675d56276d78..d70359b4307b 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -4,13 +4,11 @@
>  #include <linux/module.h>
>  #include <linux/dax.h>
>  #include <linux/mm.h>
> +#include "../bus.h"
>  
>  static bool nohmem;
>  module_param_named(disable, nohmem, bool, 0444);
>  
> -bool dax_hmem_initial_probe;
> -EXPORT_SYMBOL_FOR_MODULES(dax_hmem_initial_probe, "dax_hmem");
> -
>  static bool platform_initialized;
>  static DEFINE_MUTEX(hmem_resource_lock);
>  static struct resource hmem_active = {
> @@ -36,9 +34,21 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
>  }
>  EXPORT_SYMBOL_GPL(walk_hmem_resources);
>  
> +static void hmem_work(struct work_struct *work)
> +{
> +	/* place holder until dax_hmem driver attaches */
> +}
> +
> +static struct hmem_platform_device hmem_platform = {
> +	.pdev = {
> +		.name = "hmem_platform",
> +		.id = 0,
> +	},
> +	.work = __WORK_INITIALIZER(hmem_platform.work, hmem_work),
> +};
> +
>  static void __hmem_register_resource(int target_nid, struct resource *res)
>  {
> -	struct platform_device *pdev;
>  	struct resource *new;
>  	int rc;
>  
> @@ -54,17 +64,13 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
>  	if (platform_initialized)
>  		return;
>  
> -	pdev = platform_device_alloc("hmem_platform", 0);
> -	if (!pdev) {
> +	rc = platform_device_register(&hmem_platform.pdev);
> +	if (rc) {
>  		pr_err_once("failed to register device-dax hmem_platform device\n");
>  		return;
>  	}
>  
> -	rc = platform_device_add(pdev);
> -	if (rc)
> -		platform_device_put(pdev);
> -	else
> -		platform_initialized = true;
> +	platform_initialized = true;
>  }
>  
>  void hmem_register_resource(int target_nid, struct resource *res)
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index dd3d7f93baee..e1dae83dae8d 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -59,20 +59,11 @@ static void release_hmem(void *pdev)
>  	platform_device_unregister(pdev);
>  }
>  
> -struct dax_defer_work {
> -	struct platform_device *pdev;
> -	struct work_struct work;
> -};
> -
> -static void process_defer_work(struct work_struct *w);
> -
> -static struct dax_defer_work dax_hmem_work = {
> -	.work = __WORK_INITIALIZER(dax_hmem_work.work, process_defer_work),
> -};
> +static struct workqueue_struct *dax_hmem_wq;
>  
>  void dax_hmem_flush_work(void)
>  {
> -	flush_work(&dax_hmem_work.work);
> +	flush_workqueue(dax_hmem_wq);
>  }
>  EXPORT_SYMBOL_FOR_MODULES(dax_hmem_flush_work, "dax_cxl");
>  
> @@ -134,24 +125,6 @@ static int __hmem_register_device(struct device *host, int target_nid,
>  	return rc;
>  }
>  
> -static int hmem_register_device(struct device *host, int target_nid,
> -				const struct resource *res)
> -{
> -	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
> -	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> -			      IORES_DESC_CXL) != REGION_DISJOINT) {
> -		if (!dax_hmem_initial_probe) {
> -			dev_dbg(host, "await CXL initial probe: %pr\n", res);
> -			queue_work(system_long_wq, &dax_hmem_work.work);
> -			return 0;
> -		}
> -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> -		return 0;
> -	}
> -
> -	return __hmem_register_device(host, target_nid, res);
> -}
> -
>  static int hmem_register_cxl_device(struct device *host, int target_nid,
>  				    const struct resource *res)
>  {
> @@ -170,35 +143,55 @@ static int hmem_register_cxl_device(struct device *host, int target_nid,
>  
>  static void process_defer_work(struct work_struct *w)
>  {
> -	struct dax_defer_work *work = container_of(w, typeof(*work), work);
> -	struct platform_device *pdev;
> -
> -	if (!work->pdev)
> -		return;
> -
> -	pdev = work->pdev;
> +	struct hmem_platform_device *hpdev = container_of(w, typeof(*hpdev), work);
> +	struct device *dev = &hpdev->pdev.dev;
>  
>  	/* Relies on cxl_acpi and cxl_pci having had a chance to load */
>  	wait_for_device_probe();
>  
> -	guard(device)(&pdev->dev);
> -	if (!pdev->dev.driver)
> -		return;
> +	guard(device)(dev);
> +	if (!dev->driver)
> +		goto out;
>  
> -	if (!dax_hmem_initial_probe) {
> -		dax_hmem_initial_probe = true;
> -		walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
> +	if (!hpdev->did_probe) {
> +		hpdev->did_probe = true;
> +		walk_hmem_resources(dev, hmem_register_cxl_device);
>  	}
> +out:
> +	put_device(dev);
> +}
> +
> +static int hmem_register_device(struct device *host, int target_nid,
> +				const struct resource *res)
> +{
> +	struct platform_device *pdev = to_platform_device(host);
> +	struct hmem_platform_device *hpdev = to_hmem_platform_device(pdev);
> +
> +	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
> +	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +		if (!hpdev->did_probe) {
> +			dev_dbg(host, "await CXL initial probe: %pr\n", res);
> +			hpdev->work.func = process_defer_work;
> +			get_device(host);
> +			if (!queue_work(dax_hmem_wq, &hpdev->work))
> +				put_device(host);
> +			return 0;
> +		}
> +		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +		return 0;
> +	}
> +
> +	return __hmem_register_device(host, target_nid, res);
>  }
>  
>  static int dax_hmem_platform_probe(struct platform_device *pdev)
>  {
> -	if (work_pending(&dax_hmem_work.work))
> -		return -EBUSY;
> +	struct hmem_platform_device *hpdev = to_hmem_platform_device(pdev);
>  
> -	if (!dax_hmem_work.pdev)
> -		dax_hmem_work.pdev =
> -			to_platform_device(get_device(&pdev->dev));
> +	/* queue is only flushed on module unload, fail rebind with pending work */
> +	if (work_pending(&hpdev->work))
> +		return -EBUSY;
>  
>  	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>  }
> @@ -224,26 +217,33 @@ static __init int dax_hmem_init(void)
>  		request_module("cxl_pci");
>  	}
>  
> +	dax_hmem_wq = alloc_ordered_workqueue("dax_hmem_wq", 0);
> +	if (!dax_hmem_wq)
> +		return -ENOMEM;
> +
>  	rc = platform_driver_register(&dax_hmem_platform_driver);
>  	if (rc)
> -		return rc;
> +		goto err_platform_driver;
>  
>  	rc = platform_driver_register(&dax_hmem_driver);
>  	if (rc)
> -		platform_driver_unregister(&dax_hmem_platform_driver);
> +		goto err_driver;
> +
> +	return 0;
> +
> +err_driver:
> +	platform_driver_unregister(&dax_hmem_platform_driver);
> +err_platform_driver:
> +	destroy_workqueue(dax_hmem_wq);
>  
>  	return rc;
>  }
>  
>  static __exit void dax_hmem_exit(void)
>  {
> -	if (dax_hmem_work.pdev) {
> -		flush_work(&dax_hmem_work.work);
> -		put_device(&dax_hmem_work.pdev->dev);
> -	}
> -
>  	platform_driver_unregister(&dax_hmem_driver);
>  	platform_driver_unregister(&dax_hmem_platform_driver);
> +	destroy_workqueue(dax_hmem_wq);
>  }
>  
>  module_init(dax_hmem_init);
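
For readers following along, the "only pin the device while it is live in
the queue" rule from the changelog can be modeled in plain userspace C. This
is a minimal sketch, not kernel code: toy_device, toy_work, toy_queue_work(),
toy_schedule(), toy_run() and toy_handler() are hypothetical stand-ins that
only mirror the get_device() / queue_work() / put_device() pairing in
hmem_register_device() and process_defer_work().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device: only the reference count */
struct toy_device {
	int refcount;
};

/* Hypothetical stand-in for a work item embedded next to its device */
struct toy_work {
	bool pending;
	struct toy_device *dev;
	void (*func)(struct toy_work *work);
};

/* Models queue_work(): returns false if the work is already pending */
static bool toy_queue_work(struct toy_work *work)
{
	if (work->pending)
		return false;
	work->pending = true;
	return true;
}

/* Models the handler dropping the reference taken at queue time */
static void toy_handler(struct toy_work *work)
{
	assert(work->dev->refcount > 0);
	work->dev->refcount--;
}

/* Take a reference, and give it back if the work was already queued */
static void toy_schedule(struct toy_work *work)
{
	work->dev->refcount++;
	if (!toy_queue_work(work))
		work->dev->refcount--;
}

/* Models the workqueue running one pending item */
static void toy_run(struct toy_work *work)
{
	if (!work->pending)
		return;
	work->pending = false;
	work->func(work);
}
```

In this model, calling toy_schedule() twice before toy_run() leaves exactly
one extra reference outstanding, and that reference is released when the
handler runs.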



* Re: [PATCH 7/9] dax/hmem: Parent dax_hmem devices
  2026-03-27  5:28 ` [PATCH 7/9] dax/hmem: Parent dax_hmem devices Dan Williams
@ 2026-03-27 17:07   ` Dave Jiang
  2026-03-27 23:47   ` Alison Schofield
  2026-03-31 17:42   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 17:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa



On 3/26/26 10:28 PM, Dan Williams wrote:
> For test purposes it is useful to be able to determine which
> "hmem_platform" device is hosting a given sub-device.
> 
> Register hmem devices underneath "hmem_platform".
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> ---
>  drivers/dax/hmem/hmem.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index e1dae83dae8d..af21f66bf872 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -96,6 +96,7 @@ static int __hmem_register_device(struct device *host, int target_nid,
>  		return -ENOMEM;
>  	}
>  
> +	pdev->dev.parent = host;
>  	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
>  	info = (struct memregion_info) {
>  		.target_node = target_nid,



* Re: [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure
  2026-03-27  5:28 ` [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure Dan Williams
@ 2026-03-27 17:08   ` Dave Jiang
  2026-03-27 23:48   ` Alison Schofield
  2026-03-31 17:43   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 17:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa



On 3/26/26 10:28 PM, Dan Williams wrote:
> Add a cxl_test module option to skip setting up one of the members of the
> default auto-assembled region.
> 
> This simulates a device failing between firmware setup and OS boot, or
> region configuration interrupted by an event like kexec.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> ---
>  tools/testing/cxl/test/cxl.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 81e2aef3627a..7deeb7ff7bdf 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -16,6 +16,7 @@
>  
>  static int interleave_arithmetic;
>  static bool extended_linear_cache;
> +static bool fail_autoassemble;
>  
>  #define FAKE_QTG_ID	42
>  
> @@ -819,6 +820,12 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
>  		return;
>  	}
>  
> +	/* Simulate missing cxl_mem.4 configuration */
> +	if (hb0 && pdev->id == 4 && cxld->id == 0 && fail_autoassemble) {
> +		default_mock_decoder(cxld);
> +		return;
> +	}
> +
>  	base = window->base_hpa;
>  	if (extended_linear_cache)
>  		base += mock_auto_region_size;
> @@ -1620,6 +1627,8 @@ module_param(interleave_arithmetic, int, 0444);
>  MODULE_PARM_DESC(interleave_arithmetic, "Modulo:0, XOR:1");
>  module_param(extended_linear_cache, bool, 0444);
>  MODULE_PARM_DESC(extended_linear_cache, "Enable extended linear cache support");
> +module_param(fail_autoassemble, bool, 0444);
> +MODULE_PARM_DESC(fail_autoassemble, "Simulate missing member of an auto-region");
>  module_init(cxl_test_init);
>  module_exit(cxl_test_exit);
>  MODULE_LICENSE("GPL v2");



* Re: [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions
  2026-03-27  5:28 ` [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions Dan Williams
@ 2026-03-27 17:10   ` Dave Jiang
  2026-03-27 23:58   ` Alison Schofield
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-27 17:10 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa



On 3/26/26 10:28 PM, Dan Williams wrote:
> When platform firmware is committed to publishing EFI_CONVENTIONAL_MEMORY
> in the memory map, but CXL fails to assemble the region, dax_hmem can
> attempt to attach a dax device to the memory range.
> 
> Take advantage of the new ability to support multiple "hmem_platform"
> devices to enable regression testing of several scenarios:
> 
> * CXL correctly assembles a region, check dax_hmem fails to attach dax
> * CXL fails to assemble a region, check dax_hmem successfully attaches dax
> * Check that loading the dax_cxl driver loads the dax_hmem driver
> * Attempt to race cxl_mock_mem async probe vs dax_hmem probe flushing.
>   Check both positive and negative cases.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> ---
>  tools/testing/cxl/test/mock.h      |  8 +++++
>  tools/testing/cxl/test/cxl.c       | 57 ++++++++++++++++++++++++++++++
>  tools/testing/cxl/test/hmem_test.c | 47 ++++++++++++++++++++++++
>  tools/testing/cxl/test/mem.c       |  3 ++
>  tools/testing/cxl/test/mock.c      | 50 ++++++++++++++++++++++++++
>  tools/testing/cxl/Kbuild           |  7 ++++
>  tools/testing/cxl/test/Kbuild      |  1 +
>  7 files changed, 173 insertions(+)
>  create mode 100644 tools/testing/cxl/test/hmem_test.c
> 
> diff --git a/tools/testing/cxl/test/mock.h b/tools/testing/cxl/test/mock.h
> index 2684b89c8aa2..4f57dc80ae7d 100644
> --- a/tools/testing/cxl/test/mock.h
> +++ b/tools/testing/cxl/test/mock.h
> @@ -2,6 +2,7 @@
>  
>  #include <linux/list.h>
>  #include <linux/acpi.h>
> +#include <linux/dax.h>
>  #include <cxl.h>
>  
>  struct cxl_mock_ops {
> @@ -27,8 +28,15 @@ struct cxl_mock_ops {
>  	int (*hmat_get_extended_linear_cache_size)(struct resource *backing_res,
>  						   int nid,
>  						   resource_size_t *cache_size);
> +	int (*walk_hmem_resources)(struct device *host, walk_hmem_fn fn);
> +	int (*region_intersects)(resource_size_t start, size_t size,
> +				 unsigned long flags, unsigned long desc);
> +	int (*region_intersects_soft_reserve)(resource_size_t start,
> +					      size_t size);
>  };
>  
> +int hmem_test_init(void);
> +void hmem_test_exit(void);
>  void register_cxl_mock_ops(struct cxl_mock_ops *ops);
>  void unregister_cxl_mock_ops(struct cxl_mock_ops *ops);
>  struct cxl_mock_ops *get_cxl_mock_ops(int *index);
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 7deeb7ff7bdf..9a9f52090c1d 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -1121,6 +1121,53 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
>  	cxl_endpoint_get_perf_coordinates(port, ep_c);
>  }
>  
> +/*
> + * Simulate that the first half of mock CXL Window 0 is "Soft Reserve" capacity
> + */
> +static int mock_walk_hmem_resources(struct device *host, walk_hmem_fn fn)
> +{
> +	struct acpi_cedt_cfmws *cfmws = mock_cfmws[0];
> +	struct resource window =
> +		DEFINE_RES_MEM(cfmws->base_hpa, cfmws->window_size / 2);
> +
> +	dev_dbg(host, "walk cxl_test resource: %pr\n", &window);
> +	return fn(host, 0, &window);
> +}
> +
> +/*
> + * This should only be called by the dax_hmem case, treat mismatches (negative
> + * result) as "fallback to base region_intersects()". Simulate that the first
> + * half of mock CXL Window 0 is IORES_DESC_CXL capacity.
> + */
> +static int mock_region_intersects(resource_size_t start, size_t size,
> +				  unsigned long flags, unsigned long desc)
> +{
> +	struct resource res = DEFINE_RES_MEM(start, size);
> +	struct acpi_cedt_cfmws *cfmws = mock_cfmws[0];
> +	struct resource window =
> +		DEFINE_RES_MEM(cfmws->base_hpa, cfmws->window_size / 2);
> +
> +	if (resource_overlaps(&res, &window))
> +		return REGION_INTERSECTS;
> +	pr_debug("warning: no cxl_test CXL intersection for %pr\n", &res);
> +	return -1;
> +}
> +
> +
> +static int
> +mock_region_intersects_soft_reserve(resource_size_t start, size_t size)
> +{
> +	struct resource res = DEFINE_RES_MEM(start, size);
> +	struct acpi_cedt_cfmws *cfmws = mock_cfmws[0];
> +	struct resource window =
> +		DEFINE_RES_MEM(cfmws->base_hpa, cfmws->window_size / 2);
> +
> +	if (resource_overlaps(&res, &window))
> +		return REGION_INTERSECTS;
> +	pr_debug("warning: no cxl_test soft reserve intersection for %pr\n", &res);
> +	return -1;
> +}
> +
>  static struct cxl_mock_ops cxl_mock_ops = {
>  	.is_mock_adev = is_mock_adev,
>  	.is_mock_bridge = is_mock_bridge,
> @@ -1136,6 +1183,9 @@ static struct cxl_mock_ops cxl_mock_ops = {
>  	.devm_cxl_add_dport_by_dev = mock_cxl_add_dport_by_dev,
>  	.hmat_get_extended_linear_cache_size =
>  		mock_hmat_get_extended_linear_cache_size,
> +	.walk_hmem_resources = mock_walk_hmem_resources,
> +	.region_intersects = mock_region_intersects,
> +	.region_intersects_soft_reserve = mock_region_intersects_soft_reserve,
>  	.list = LIST_HEAD_INIT(cxl_mock_ops.list),
>  };
>  
> @@ -1561,8 +1611,14 @@ static __init int cxl_test_init(void)
>  	if (rc)
>  		goto err_root;
>  
> +	rc = hmem_test_init();
> +	if (rc)
> +		goto err_mem;
> +
>  	return 0;
>  
> +err_mem:
> +	cxl_mem_exit();
>  err_root:
>  	platform_device_put(cxl_acpi);
>  err_rch:
> @@ -1600,6 +1656,7 @@ static __exit void cxl_test_exit(void)
>  {
>  	int i;
>  
> +	hmem_test_exit();
>  	cxl_mem_exit();
>  	platform_device_unregister(cxl_acpi);
>  	cxl_rch_topo_exit();
> diff --git a/tools/testing/cxl/test/hmem_test.c b/tools/testing/cxl/test/hmem_test.c
> new file mode 100644
> index 000000000000..3a1a089e1721
> --- /dev/null
> +++ b/tools/testing/cxl/test/hmem_test.c
> @@ -0,0 +1,47 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2026 Intel Corporation */
> +#include <linux/moduleparam.h>
> +#include <linux/workqueue.h>
> +#include "../../../drivers/dax/bus.h"
> +
> +static bool hmem_test;
> +
> +static void hmem_test_work(struct work_struct *work)
> +{
> +}
> +
> +static void hmem_test_release(struct device *dev)
> +{
> +	struct hmem_platform_device *hpdev =
> +		container_of(dev, typeof(*hpdev), pdev.dev);
> +
> +	memset(hpdev, 0, sizeof(*hpdev));
> +}
> +
> +static struct hmem_platform_device hmem_test_device = {
> +	.pdev = {
> +		.name = "hmem_platform",
> +		.id = 1,
> +		.dev = {
> +			.release = hmem_test_release,
> +		},
> +	},
> +	.work = __WORK_INITIALIZER(hmem_test_device.work, hmem_test_work),
> +};
> +
> +int hmem_test_init(void)
> +{
> +	if (!hmem_test)
> +		return 0;
> +
> +	return platform_device_register(&hmem_test_device.pdev);
> +}
> +
> +void hmem_test_exit(void)
> +{
> +	if (hmem_test)
> +		platform_device_unregister(&hmem_test_device.pdev);
> +}
> +
> +module_param(hmem_test, bool, 0444);
> +MODULE_PARM_DESC(hmem_test, "Enable/disable the dax_hmem test platform device");
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index cb87e8c0e63c..cc847e9aeceb 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -1695,6 +1695,9 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>  	struct cxl_dpa_info range_info = { 0 };
>  	int rc;
>  
> +	/* Increase async probe race window */
> +	usleep_range(500*1000, 1000*1000);
> +
>  	mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
>  	if (!mdata)
>  		return -ENOMEM;
> diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
> index b8fcb50c1027..6454b868b122 100644
> --- a/tools/testing/cxl/test/mock.c
> +++ b/tools/testing/cxl/test/mock.c
> @@ -251,6 +251,56 @@ struct cxl_dport *__wrap_devm_cxl_add_dport_by_dev(struct cxl_port *port,
>  }
>  EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_dport_by_dev, "CXL");
>  
> +int __wrap_region_intersects(resource_size_t start, size_t size,
> +			     unsigned long flags, unsigned long desc)
> +{
> +	int rc = -1;
> +	int index;
> +	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> +
> +	if (ops)
> +		rc = ops->region_intersects(start, size, flags, desc);
> +	if (rc < 0)
> +		rc = region_intersects(start, size, flags, desc);
> +	put_cxl_mock_ops(index);
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(__wrap_region_intersects);
> +
> +int __wrap_region_intersects_soft_reserve(resource_size_t start, size_t size)
> +{
> +	int rc = -1;
> +	int index;
> +	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> +
> +	if (ops)
> +		rc = ops->region_intersects_soft_reserve(start, size);
> +	if (rc < 0)
> +		rc = region_intersects_soft_reserve(start, size);
> +	put_cxl_mock_ops(index);
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(__wrap_region_intersects_soft_reserve);
> +
> +int __wrap_walk_hmem_resources(struct device *host, walk_hmem_fn fn)
> +{
> +	int index, rc = 0;
> +	bool is_mock = strcmp(dev_name(host), "hmem_platform.1") == 0;
> +	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
> +
> +	if (is_mock) {
> +		if (ops)
> +			rc = ops->walk_hmem_resources(host, fn);
> +	} else {
> +		rc = walk_hmem_resources(host, fn);
> +	}
> +	put_cxl_mock_ops(index);
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(__wrap_walk_hmem_resources);
> +
>  MODULE_LICENSE("GPL v2");
>  MODULE_DESCRIPTION("cxl_test: emulation module");
>  MODULE_IMPORT_NS("ACPI");
> diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
> index 53d84a6874b7..540425c7cd41 100644
> --- a/tools/testing/cxl/Kbuild
> +++ b/tools/testing/cxl/Kbuild
> @@ -11,8 +11,12 @@ ldflags-y += --wrap=devm_cxl_endpoint_decoders_setup
>  ldflags-y += --wrap=hmat_get_extended_linear_cache_size
>  ldflags-y += --wrap=devm_cxl_add_dport_by_dev
>  ldflags-y += --wrap=devm_cxl_switch_port_decoders_setup
> +ldflags-y += --wrap=walk_hmem_resources
> +ldflags-y += --wrap=region_intersects
> +ldflags-y += --wrap=region_intersects_soft_reserve
>  
>  DRIVERS := ../../../drivers
> +DAX_HMEM_SRC := $(DRIVERS)/dax/hmem
>  CXL_SRC := $(DRIVERS)/cxl
>  CXL_CORE_SRC := $(DRIVERS)/cxl/core
>  ccflags-y := -I$(srctree)/drivers/cxl/
> @@ -70,6 +74,9 @@ cxl_core-y += config_check.o
>  cxl_core-y += cxl_core_test.o
>  cxl_core-y += cxl_core_exports.o
>  
> +obj-m += dax_hmem.o
> +dax_hmem-y := $(DAX_HMEM_SRC)/hmem.o
> +
>  KBUILD_CFLAGS := $(filter-out -Wmissing-prototypes -Wmissing-declarations, $(KBUILD_CFLAGS))
>  
>  obj-m += test/
> diff --git a/tools/testing/cxl/test/Kbuild b/tools/testing/cxl/test/Kbuild
> index af50972c8b6d..c168e3c998a7 100644
> --- a/tools/testing/cxl/test/Kbuild
> +++ b/tools/testing/cxl/test/Kbuild
> @@ -7,6 +7,7 @@ obj-m += cxl_mock_mem.o
>  obj-m += cxl_translate.o
>  
>  cxl_test-y := cxl.o
> +cxl_test-y += hmem_test.o
>  cxl_mock-y := mock.o
>  cxl_mock_mem-y := mem.o
>  
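
The "negative result means fall back to the base implementation" convention
used by __wrap_region_intersects() above can be sketched in plain userspace
C. This is only an illustration under stated assumptions: every toy_* name
and the window addresses are hypothetical, and the "real" function is a stub
that always reports a disjoint range. Only the control flow mirrors the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes, loosely modeled on REGION_DISJOINT/INTERSECTS */
enum { TOY_DISJOINT = 0, TOY_INTERSECTS = 1 };

/* Inclusive address range */
struct toy_range {
	unsigned long long start;
	unsigned long long end;
};

struct toy_mock_ops {
	int (*region_intersects)(const struct toy_range *res);
};

/* Hypothetical mock window, standing in for half of mock CXL Window 0 */
static const struct toy_range toy_mock_window = { 0x1000, 0x1fff };

static bool toy_overlaps(const struct toy_range *a, const struct toy_range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Mock: claim ranges touching the mock window, otherwise return -1 */
static int toy_mock_region_intersects(const struct toy_range *res)
{
	if (toy_overlaps(res, &toy_mock_window))
		return TOY_INTERSECTS;
	return -1;
}

/* Stub for the base implementation: always reports a disjoint range */
static int toy_real_region_intersects(const struct toy_range *res)
{
	(void)res;
	return TOY_DISJOINT;
}

/* Wrapper: ask the mock first, fall back to the base on a negative result */
static int toy_wrap_region_intersects(const struct toy_mock_ops *ops,
				      const struct toy_range *res)
{
	int rc = -1;

	if (ops)
		rc = ops->region_intersects(res);
	if (rc < 0)
		rc = toy_real_region_intersects(res);
	return rc;
}
```

With no mock ops registered (a NULL ops pointer in this sketch), every query
goes straight to the base implementation, which matches the behavior of the
wrapper when cxl_test is not loaded.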



* Re: [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure
  2026-03-27  5:28 ` [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure Dan Williams
  2026-03-27 16:28   ` Dave Jiang
@ 2026-03-27 19:20   ` Alison Schofield
  2026-03-27 21:54     ` Dan Williams
  2026-03-27 23:43   ` Alison Schofield
  2026-03-30 20:24   ` Ira Weiny
  3 siblings, 1 reply; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 19:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa,
	stable, Jonathan Cameron

On Thu, Mar 26, 2026 at 10:28:13PM -0700, Dan Williams wrote:
> The following crash signature results from region destruction while an
> endpoint decoder is staged, but not fully attached.
> 
> ---
>  BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830 [cxl_core]
>  Read of size 8 at addr ffff888265638840 by task modprobe/1287
> 
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x68/0x90
>   print_report+0x170/0x4e2
>   kasan_report+0xc2/0x1a0
>   __cxl_decoder_detach+0x724/0x830 [cxl_core]
>   cxl_decoder_detach+0x6c/0x100 [cxl_core]
>   unregister_region+0x88/0x140 [cxl_core]
>   devres_release_all+0x172/0x230
> ---
> 
> The "staged" state is established by cxl_region_attach_auto() and finalized
> by cxl_region_attach_position(). When that is finalized a memdev removal
> event will destroy regions before endpoint decoders. However, in the
> interim the memdev removal will falsely assume that the endpoint decoder is
> unattached. Later, the eventual region removal finds the stale pointer to
> the now freed endpoint decoder.

I'm wondering how this is exposed. What is 'eventual region removal'? 

The region driver does not clean up after failed auto assembly.
The cxl-cli cannot because topology is broken.

How did you get here?

> 
> Introduce CXL_DECODER_STATE_AUTO_STAGED and cxl_cancel_auto_attach() to
> cleanup this interim state.
> 
> Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
> Cc: <stable@vger.kernel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/cxl.h         |  6 +++--
>  drivers/cxl/core/region.c | 54 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 57 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 9b947286eb9b..30a31968f266 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -378,12 +378,14 @@ struct cxl_decoder {
>  };
>  
>  /*
> - * Track whether this decoder is reserved for region autodiscovery, or
> - * free for userspace provisioning.
> + * Track whether this decoder is free for userspace provisioning, reserved for
> + * region autodiscovery, whether it is started connecting (awaiting other
> + * peers), or has completed auto assembly.
>   */
>  enum cxl_decoder_state {
>  	CXL_DECODER_STATE_MANUAL,
>  	CXL_DECODER_STATE_AUTO,
> +	CXL_DECODER_STATE_AUTO_STAGED,
>  };
>  
>  /**
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f7b20f60ac5c..b72556c1458b 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1064,6 +1064,14 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
>  
>  	if (!cxld->region) {
>  		cxld->region = cxlr;
> +
> +		/*
> +		 * Now that cxld->region is set the intermediate staging state
> +		 * can be cleared.
> +		 */
> +		if (cxld == &cxled->cxld &&
> +		    cxled->state == CXL_DECODER_STATE_AUTO_STAGED)
> +			cxled->state = CXL_DECODER_STATE_AUTO;
>  		get_device(&cxlr->dev);
>  	}
>  
> @@ -1805,6 +1813,7 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
>  	pos = p->nr_targets;
>  	p->targets[pos] = cxled;
>  	cxled->pos = pos;
> +	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;
>  	p->nr_targets++;
>  
>  	return 0;
> @@ -2154,6 +2163,47 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>  	return 0;
>  }
>  
> +static int cxl_region_by_target(struct device *dev, const void *data)
> +{
> +	const struct cxl_endpoint_decoder *cxled = data;
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +
> +	if (!is_cxl_region(dev))
> +		return 0;
> +
> +	cxlr = to_cxl_region(dev);
> +	p = &cxlr->params;
> +	return p->targets[cxled->pos] == cxled;
> +}
> +
> +/*
> + * When an auto-region fails to assemble the decoder may be listed as a target,
> + * but not fully attached.
> + */
> +static void cxl_cancel_auto_attach(struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_region_params *p;
> +	struct cxl_region *cxlr;
> +	int pos = cxled->pos;
> +
> +	if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
> +		return;
> +
> +	struct device *dev __free(put_device) = bus_find_device(
> +		&cxl_bus_type, NULL, cxled, cxl_region_by_target);
> +	if (!dev)
> +		return;
> +
> +	cxlr = to_cxl_region(dev);
> +	p = &cxlr->params;
> +
> +	p->nr_targets--;
> +	cxled->state = CXL_DECODER_STATE_AUTO;
> +	cxled->pos = -1;
> +	p->targets[pos] = NULL;
> +}
> +
>  static struct cxl_region *
>  __cxl_decoder_detach(struct cxl_region *cxlr,
>  		     struct cxl_endpoint_decoder *cxled, int pos,
> @@ -2177,8 +2227,10 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
>  		cxled = p->targets[pos];
>  	} else {
>  		cxlr = cxled->cxld.region;
> -		if (!cxlr)
> +		if (!cxlr) {
> +			cxl_cancel_auto_attach(cxled);
>  			return NULL;
> +		}
>  		p = &cxlr->params;
>  	}
>  
> -- 
> 2.53.0
> 


* Re: [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure
  2026-03-27 19:20   ` Alison Schofield
@ 2026-03-27 21:54     ` Dan Williams
  2026-03-27 22:37       ` Alison Schofield
  0 siblings, 1 reply; 46+ messages in thread
From: Dan Williams @ 2026-03-27 21:54 UTC (permalink / raw)
  To: Alison Schofield, Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa,
	stable, Jonathan Cameron

Alison Schofield wrote:
> On Thu, Mar 26, 2026 at 10:28:13PM -0700, Dan Williams wrote:
> > The following crash signature results from region destruction while an
> > endpoint decoder is staged, but not fully attached.
> > 
> > ---
> >  BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830 [cxl_core]
> >  Read of size 8 at addr ffff888265638840 by task modprobe/1287
> > 
> >  Call Trace:
> >   <TASK>
> >   dump_stack_lvl+0x68/0x90
> >   print_report+0x170/0x4e2
> >   kasan_report+0xc2/0x1a0
> >   __cxl_decoder_detach+0x724/0x830 [cxl_core]
> >   cxl_decoder_detach+0x6c/0x100 [cxl_core]
> >   unregister_region+0x88/0x140 [cxl_core]
> >   devres_release_all+0x172/0x230
> > ---
> > 
> > The "staged" state is established by cxl_region_attach_auto() and finalized
> > by cxl_region_attach_position(). When that is finalized a memdev removal
> > event will destroy regions before endpoint decoders. However, in the
> > interim the memdev removal will falsely assume that the endpoint decoder is
> > unattached. Later, the eventual region removal finds the stale pointer to
> > the now freed endpoint decoder.
> 
> I'm wondering how this is exposed. What is 'eventual region removal'? 
> 
> The region driver does not clean up after failed auto assembly.
> The cxl-cli cannot because topology is broken.
> 
> How did you get here?

tl;dr: "modprobe -r cxl_test"

When the cxl_acpi driver is removed the CXL Window root decoders are
destroyed along with any regions that were in the process of being
created.

If one of the regions to be cleaned up has a p->targets[] entry set up
by cxl_region_attach_auto(), but not finalized by
cxl_region_attach_position() then there is nothing to stop that @cxled
object from being freed.

The "modprobe -r cxl_test" event destroys all the memdevs. When the
memdev goes to free its decoders it sees that @cxled->cxld.region is not
yet set, assumes it is idle and frees it. Later, unregister_region()
sees the now freed @cxled in its p->targets[] list, tries to
de-reference it and boom.
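
To make the sequence concrete, here is a minimal userspace model of the
staged state and its cancellation. It is a sketch under stated assumptions,
not the kernel code: the toy_* types and helpers are hypothetical, and the
region lookup is a simple array scan where the patch uses bus_find_device().
The point it illustrates is that a staged decoder is listed in the region's
targets[] while its own region pointer is still unset, so teardown has to
clear the targets[] slot before the decoder can be freed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TARGETS 4

enum toy_decoder_state {
	TOY_STATE_MANUAL,
	TOY_STATE_AUTO,
	TOY_STATE_AUTO_STAGED,
};

struct toy_region;

struct toy_decoder {
	enum toy_decoder_state state;
	int pos;                   /* index in targets[], -1 if unlisted */
	struct toy_region *region; /* set only once attach is finalized */
};

struct toy_region {
	struct toy_decoder *targets[TOY_MAX_TARGETS];
	int nr_targets;
};

/* Stage: list the decoder as a target, but leave ->region unset */
static void toy_attach_auto(struct toy_region *r, struct toy_decoder *d)
{
	assert(r->nr_targets < TOY_MAX_TARGETS);
	d->pos = r->nr_targets;
	r->targets[d->pos] = d;
	d->state = TOY_STATE_AUTO_STAGED;
	r->nr_targets++;
}

/* Finalize: once ->region is set the staging state can be cleared */
static void toy_attach_position(struct toy_region *r, struct toy_decoder *d)
{
	d->region = r;
	if (d->state == TOY_STATE_AUTO_STAGED)
		d->state = TOY_STATE_AUTO;
}

/* Teardown of a decoder with no ->region: drop any stale targets[] entry */
static void toy_cancel_auto_attach(struct toy_region *regions, size_t count,
				   struct toy_decoder *d)
{
	if (d->state != TOY_STATE_AUTO_STAGED || d->pos < 0)
		return;

	for (size_t i = 0; i < count; i++) {
		struct toy_region *r = &regions[i];

		if (r->targets[d->pos] != d)
			continue;
		r->targets[d->pos] = NULL;
		r->nr_targets--;
		d->state = TOY_STATE_AUTO;
		d->pos = -1;
		return;
	}
}
```

Without the cancel step, a later walk of targets[] in this model would still
find a pointer to the torn down decoder, which is the stale pointer that
unregister_region() trips over.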


* Re: [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure
  2026-03-27 21:54     ` Dan Williams
@ 2026-03-27 22:37       ` Alison Schofield
  0 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 22:37 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa,
	stable, Jonathan Cameron

On Fri, Mar 27, 2026 at 02:54:24PM -0700, Dan Williams wrote:
> Alison Schofield wrote:
> > On Thu, Mar 26, 2026 at 10:28:13PM -0700, Dan Williams wrote:
> > > The following crash signature results from region destruction while an
> > > endpoint decoder is staged, but not fully attached.
> > > 
> > > ---
> > >  BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830 [cxl_core]
> > >  Read of size 8 at addr ffff888265638840 by task modprobe/1287
> > > 
> > >  Call Trace:
> > >   <TASK>
> > >   dump_stack_lvl+0x68/0x90
> > >   print_report+0x170/0x4e2
> > >   kasan_report+0xc2/0x1a0
> > >   __cxl_decoder_detach+0x724/0x830 [cxl_core]
> > >   cxl_decoder_detach+0x6c/0x100 [cxl_core]
> > >   unregister_region+0x88/0x140 [cxl_core]
> > >   devres_release_all+0x172/0x230
> > > ---
> > > 
> > > The "staged" state is established by cxl_region_attach_auto() and finalized
> > > by cxl_region_attach_position(). When that is finalized a memdev removal
> > > event will destroy regions before endpoint decoders. However, in the
> > > interim the memdev removal will falsely assume that the endpoint decoder is
> > > unattached. Later, the eventual region removal finds the stale pointer to
> > > the now freed endpoint decoder.
> > 
> > I'm wondering how this is exposed. What is 'eventual region removal'? 
> > 
> > The region driver does not clean up after failed auto assembly.
> > The cxl-cli cannot because topology is broken.
> > 
> > How did you get here?
> 
> tl;dr: "modprobe -r cxl_test"

That explains it. We did our failure testing outside cxl/test, without any
module removal. I'm curious to see how this fix may help with cleaning up
stranded, broken regions from userspace.

Thanks for the detail below too.
> 
> When the cxl_acpi driver is removed the CXL Window root decoders are
> destroyed along with any regions that were in the process of being
> created.
> 
> If one of the regions to be cleaned up has a p->targets[] entry set up
> by cxl_region_attach_auto(), but not finalized by
> cxl_region_attach_position() then there is nothing to stop that @cxled
> object from being freed.
> 
> The "modprobe -r cxl_test" event destroys all the memdevs. When the
> memdev goes to free its decoders it sees that @cxled->cxld.region is not
> yet set, assumes it is idle and frees it. Later, unregister_region()
> sees the now freed @cxled in its p->targets[] list, tries to
> de-reference it and boom.


* Re: [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (8 preceding siblings ...)
  2026-03-27  5:28 ` [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions Dan Williams
@ 2026-03-27 23:42 ` Alison Schofield
  2026-03-30 21:12 ` Koralahalli Channabasappa, Smita
  2026-03-31 21:57 ` Dave Jiang
  11 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:42 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa,
	Jonathan Cameron, stable

On Thu, Mar 26, 2026 at 10:28:12PM -0700, Dan Williams wrote:
> Given all the cross subsystem dependencies needed to make this solution
> work, it needs to have a unit test to keep it functional.
> 
> On the path to writing that, several fixes fell out, but not to Smita's
> code, to mine. One use-after-free has been there since the original
> automatic region assembly code.
> 
> Here is a preview of the core of the test I will submit to the cxl-cli project:
> 
> ---
> modprobe cxl_mock_mem && modprobe cxl_test hmem_test=1
> 
> dax=$(find_dax_cxl)
> [[ "$dax" == "" ]] && err $LINENO
> dax=$(find_dax_hmem)
> [[ "$dax" != "" ]] && err $LINENO
> 
> unload
> 
> modprobe cxl_mock_mem && modprobe cxl_test fail_autoassemble hmem_test=1
> 
> dax=$(find_dax_cxl)
> [[ "$dax" != "" ]] && err $LINENO
> dax=$(find_dax_hmem)
> [[ "$dax" == "" ]] && err $LINENO
> 
> unload
> ---

I tested the new cxl-test patches following your commands above, and all
is good. I'll ask about the race condition in that specific patch.

I also tested this set on my real hardware and confirmed that, together
with Smita's series, it still passes my hotplug test cases.

snip



* Re: [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure
  2026-03-27  5:28 ` [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure Dan Williams
  2026-03-27 16:28   ` Dave Jiang
  2026-03-27 19:20   ` Alison Schofield
@ 2026-03-27 23:43   ` Alison Schofield
  2026-03-30 20:24   ` Ira Weiny
  3 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa,
	stable, Jonathan Cameron

On Thu, Mar 26, 2026 at 10:28:13PM -0700, Dan Williams wrote:
> The following crash signature results from region destruction while an
> endpoint decoder is staged, but not fully attached.
> 
> ---
>  BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830 [cxl_core]
>  Read of size 8 at addr ffff888265638840 by task modprobe/1287
> 
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x68/0x90
>   print_report+0x170/0x4e2
>   kasan_report+0xc2/0x1a0
>   __cxl_decoder_detach+0x724/0x830 [cxl_core]
>   cxl_decoder_detach+0x6c/0x100 [cxl_core]
>   unregister_region+0x88/0x140 [cxl_core]
>   devres_release_all+0x172/0x230
> ---
> 
> The "staged" state is established by cxl_region_attach_auto() and finalized
> by cxl_region_attach_position(). When that is finalized a memdev removal
> event will destroy regions before endpoint decoders. However, in the
> interim the memdev removal will falsely assume that the endpoint decoder is
> unattached. Later, the eventual region removal finds the stale pointer to
> the now freed endpoint decoder.
> 
> Introduce CXL_DECODER_STATE_AUTO_STAGED and cxl_cancel_auto_attach() to
> cleanup this interim state.
> 
> Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
> Cc: <stable@vger.kernel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>




* Re: [PATCH 2/9] dax/cxl: Fix HMEM dependencies
  2026-03-27  5:28 ` [PATCH 2/9] dax/cxl: Fix HMEM dependencies Dan Williams
  2026-03-27 16:29   ` Dave Jiang
@ 2026-03-27 23:44   ` Alison Schofield
  2026-03-30 21:10   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:44 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:14PM -0700, Dan Williams wrote:
> The expectation is that DEV_DAX_HMEM=y should be disallowed if any of
> CXL_ACPI, or CXL_PCI are set =m. Also DEV_DAX_CXL=y should be disallowed if
> DEV_DAX_HMEM=m. Use "$config || !$config" syntax for each dependency.
> Otherwise, the invalid DEV_DAX_HMEM=m && DEV_DAX_CXL=y configuration is
> allowed.
> 
> Lastly, dax_hmem depends on the availability of the
> cxl_region_contains_resource() symbol published by the cxl_core.ko module.
> So, also prevent DEV_DAX_HMEM from being built-in when the cxl_core module
> is not built-in.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>



* Re: [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource()
  2026-03-27  5:28 ` [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource() Dan Williams
  2026-03-27 16:39   ` Dave Jiang
@ 2026-03-27 23:45   ` Alison Schofield
  2026-03-30 22:19   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:15PM -0700, Dan Williams wrote:
> The dax_hmem dependency on cxl_region_contains_resource() is a one-off
> special case. It is not suitable for other use cases.
> 
> Move the definition to the other CONFIG_CXL_REGION guarded definitions in
> drivers/cxl/cxl.h and include that by a relative path include. This matches
> what drivers/dax/cxl.c does for its limited private usage of CXL core
> symbols.
> 
> Reduce the symbol export visibility from global to just dax_hmem, to
> further clarify its applicability.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>



* Re: [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains()
  2026-03-27  5:28 ` [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains() Dan Williams
  2026-03-27 16:40   ` Dave Jiang
@ 2026-03-27 23:45   ` Alison Schofield
  2026-03-30 22:22   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:16PM -0700, Dan Williams wrote:
> The call to cxl_region_resource_contains() in hmem_register_cxl_device()
> need not cast away 'const'. The problem is the usage of the
> bus_for_each_dev() API which does not mark its @data parameter as 'const'.
> Switch to bus_find_device() which does take 'const' @data, fixup
> cxl_region_resource_contains() and its caller.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>



* Re: [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols
  2026-03-27  5:28 ` [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols Dan Williams
  2026-03-27 16:46   ` Dave Jiang
@ 2026-03-27 23:46   ` Alison Schofield
  2026-03-30 22:26   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:46 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:17PM -0700, Dan Williams wrote:
> No other module or use case should be using dax_hmem_initial_probe or
> dax_hmem_flush_work(). Limit their use to dax_hmem, and dax_cxl
> respectively.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>



* Re: [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices
  2026-03-27  5:28 ` [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices Dan Williams
  2026-03-27 17:06   ` Dave Jiang
@ 2026-03-27 23:46   ` Alison Schofield
  2026-03-31 17:32   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:46 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:18PM -0700, Dan Williams wrote:
> dax_hmem (ab)uses a platform device to allow for a module to autoload in
> the presence of "Soft Reserved" resources. The dax_hmem driver had no
> dependencies on the "hmem_platform" device being a singleton until the
> recent "dax_hmem vs dax_cxl" takeover solution.
> 
> Replace the layering violation of dax_hmem_work assuming that there will
> never be more than one "hmem_platform" device associated with a global work
> item with a dax_hmem local workqueue that can theoretically support any
> number of hmem_platform devices.
> 
> Fixup the reference counting to only pin the device while it is live in the
> queue.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>



* Re: [PATCH 7/9] dax/hmem: Parent dax_hmem devices
  2026-03-27  5:28 ` [PATCH 7/9] dax/hmem: Parent dax_hmem devices Dan Williams
  2026-03-27 17:07   ` Dave Jiang
@ 2026-03-27 23:47   ` Alison Schofield
  2026-03-31 17:42   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:19PM -0700, Dan Williams wrote:
> For test purposes it is useful to be able to determine which
> "hmem_platform" device is hosting a given sub-device.
> 
> Register hmem devices underneath "hmem_platform".
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Alison Schofield <alison.schofield@intel.com>



* Re: [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure
  2026-03-27  5:28 ` [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure Dan Williams
  2026-03-27 17:08   ` Dave Jiang
@ 2026-03-27 23:48   ` Alison Schofield
  2026-03-31 17:43   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:48 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:20PM -0700, Dan Williams wrote:
> Add a cxl_test module option to skip setting up one of the members of the
> default auto-assembled region.
> 
> This simulates a device failing between firmware setup and OS boot, or
> region configuration interrupted by an event like kexec.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Reviewed-by: Alison Schofield <alison.schofield@intel.com>



* Re: [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions
  2026-03-27  5:28 ` [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions Dan Williams
  2026-03-27 17:10   ` Dave Jiang
@ 2026-03-27 23:58   ` Alison Schofield
  2026-03-28  3:20     ` Dan Williams
  2026-03-31 17:57   ` Ira Weiny
  2026-03-31 18:13   ` Alison Schofield
  3 siblings, 1 reply; 46+ messages in thread
From: Alison Schofield @ 2026-03-27 23:58 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:21PM -0700, Dan Williams wrote:
> When platform firmware is committed to publishing EFI_CONVENTIONAL_MEMORY
> in the memory map, but CXL fails to assemble the region, dax_hmem can
> attempt to attach a dax device to the memory range.
> 
> Take advantage of the new ability to support multiple "hmem_platform"
> devices, and to enable regression testing of several scenarios:
> 
> * CXL correctly assembles a region, check dax_hmem fails to attach dax
> * CXL fails to assemble a region, check dax_hmem successfully attaches dax
> * Check that loading the dax_cxl driver loads the dax_hmem driver
> * Attempt to race cxl_mock_mem async probe vs dax_hmem probe flushing.
>   Check that both positive and negative cases.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

snip

> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index cb87e8c0e63c..cc847e9aeceb 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -1695,6 +1695,9 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>  	struct cxl_dpa_info range_info = { 0 };
>  	int rc;
>  
> +	/* Increase async probe race window */
> +	usleep_range(500*1000, 1000*1000);
> +

I see your words in the commit log, "Attempt to race...", and this sleep
looks like it only widens the timing window rather than making the order
deterministic. Given these tests are typically single-pass tests, what
ensures we actually hit the intended ordering?



* Re: [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions
  2026-03-27 23:58   ` Alison Schofield
@ 2026-03-28  3:20     ` Dan Williams
  0 siblings, 0 replies; 46+ messages in thread
From: Dan Williams @ 2026-03-28  3:20 UTC (permalink / raw)
  To: Alison Schofield, Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

Alison Schofield wrote:
[..] 
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index cb87e8c0e63c..cc847e9aeceb 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> > @@ -1695,6 +1695,9 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> >  	struct cxl_dpa_info range_info = { 0 };
> >  	int rc;
> >  
> > +	/* Increase async probe race window */
> > +	usleep_range(500*1000, 1000*1000);
> > +
> 
> I see your words in the commit log "Attempt to race..."
> and this sleep looks like it is only widening the timing window,
> not making the order deterministic. Given these tests are typically
> single-pass tests, what ensure we actually hit the intended ordering?

The determinism needed, "async probing will have started before
dax_hmem starts to probe", comes from the test loading cxl_mock_mem
before cxl_test, and making sure that dax_hmem is unloaded.
cxl_test_init() then registers all the mock_mem instances before
registering the mock hmem_platform device.

Now, that seems to be enough for dax_hmem to hit the window, but to be
sure the test would need to do dmesg analysis to confirm that
messages like this:

[   69.111420] mock_walk_hmem_resources: hmem_platform hmem_platform.1: walk cxl_test resource: [mem 0xf010000000-0xf02fffffff flags 0x200]
[   69.113990] hmem_register_device: hmem_platform hmem_platform.1: await CXL initial probe: [mem 0xf010000000-0xf02fffffff flags 0x200]

...are emitted while the typical CXL startup messages are still firing.

On my runs I indeed typically see a couple seconds of CXL init messages
before these fire:

[   71.514222] mock_walk_hmem_resources: hmem_platform hmem_platform.1: walk cxl_test resource: [mem 0xf010000000-0xf02fffffff flags 0x200]
[   71.516478] hmem_register_cxl_device: hmem_platform hmem_platform.1: CXL claims resource, dropping: [mem 0xf010000000-0xf02fffffff flags 0x200]
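
One possible shape for that dmesg analysis, as a minimal sketch only. The
helper name race_window_hit is hypothetical and not part of the posted
test. It assumes the two message strings shown above are stable, and it
only checks their relative order in the log, not the timestamps:

```bash
#!/bin/bash
# Sketch only: race_window_hit is a hypothetical helper, not part of the
# posted test. It reads kernel log text on stdin and succeeds when an
# "await CXL initial probe" line shows up before the first
# "CXL claims resource, dropping" line (or when no such line follows).
race_window_hit() {
	local log await_line claim_line

	log=$(cat)
	# grep -n prints "<line number>:<text>"; keep the first match's number
	await_line=$(grep -n -m1 'await CXL initial probe' <<< "$log" | cut -d: -f1) || true
	claim_line=$(grep -n -m1 'CXL claims resource, dropping' <<< "$log" | cut -d: -f1) || true

	# No await message at all: dax_hmem never had to wait, window missed
	[[ -n "$await_line" ]] || return 1
	# Await seen with no claim yet, or await logged ahead of the claim
	[[ -z "$claim_line" ]] || (( await_line < claim_line ))
}
```

In a test it could be used as "dmesg | race_window_hit || err $LINENO",
reusing the err helper from the script preview in the cover letter.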


* Re: [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure
  2026-03-27  5:28 ` [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure Dan Williams
                     ` (2 preceding siblings ...)
  2026-03-27 23:43   ` Alison Schofield
@ 2026-03-30 20:24   ` Ira Weiny
  3 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-30 20:24 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa, stable, Jonathan Cameron

Dan Williams wrote:
> The following crash signature results from region destruction while an
> endpoint decoder is staged, but not fully attached.
> 
> ---

NIT: When I applied this series to check it out this '---' incorrectly
trimmed the commit message.  Dave should be able to fix that.

So with that fixed:

Reviewed-by: Ira Weiny <ira.weiny@intel.com>


>  BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830 [cxl_core]
>  Read of size 8 at addr ffff888265638840 by task modprobe/1287
> 
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x68/0x90
>   print_report+0x170/0x4e2
>   kasan_report+0xc2/0x1a0
>   __cxl_decoder_detach+0x724/0x830 [cxl_core]
>   cxl_decoder_detach+0x6c/0x100 [cxl_core]
>   unregister_region+0x88/0x140 [cxl_core]
>   devres_release_all+0x172/0x230
> ---
> 
> The "staged" state is established by cxl_region_attach_auto() and finalized
> by cxl_region_attach_position(). When that is finalized a memdev removal
> event will destroy regions before endpoint decoders. However, in the
> interim the memdev removal will falsely assume that the endpoint decoder is
> unattached. Later, the eventual region removal finds the stale pointer to
> the now freed endpoint decoder.
> 
> Introduce CXL_DECODER_STATE_AUTO_STAGED and cxl_cancel_auto_attach() to
> cleanup this interim state.
> 
> Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
> Cc: <stable@vger.kernel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

[snip]


* Re: [PATCH 2/9] dax/cxl: Fix HMEM dependencies
  2026-03-27  5:28 ` [PATCH 2/9] dax/cxl: Fix HMEM dependencies Dan Williams
  2026-03-27 16:29   ` Dave Jiang
  2026-03-27 23:44   ` Alison Schofield
@ 2026-03-30 21:10   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-30 21:10 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Dan Williams wrote:
> The expectation is that DEV_DAX_HMEM=y should be disallowed if any of
> CXL_ACPI, or CXL_PCI are set =m. Also DEV_DAX_CXL=y should be disallowed if
> DEV_DAX_HMEM=m. Use "$config || !$config" syntax for each dependency.
> Otherwise, the invalid DEV_DAX_HMEM=m && DEV_DAX_CXL=y configuration is
> allowed.
> 
> Lastly, dax_hmem depends on the availability of the
> cxl_region_contains_resource() symbol published by the cxl_core.ko module.
> So, also prevent DEV_DAX_HMEM from being built-in when the cxl_core module
> is not built-in.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]


* Re: [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (9 preceding siblings ...)
  2026-03-27 23:42 ` [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Alison Schofield
@ 2026-03-30 21:12 ` Koralahalli Channabasappa, Smita
  2026-03-30 21:17   ` Dave Jiang
  2026-03-31 21:57 ` Dave Jiang
  11 siblings, 1 reply; 46+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-30 21:12 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa, Jonathan Cameron, stable

On 3/26/2026 10:28 PM, Dan Williams wrote:
> Given all the cross subsystem dependencies needed to make this solution
> work, it needs to have a unit test to keep it functional.
> 
> On the path to writing that, several fixes fell out, but not to Smita's
> code, to mine. One use-after-free has been there since the original
> automatic region assembly code.
> 
> Here is a preview of the core of the test I will submit to the cxl-cli project:
> 
> ---
> modprobe cxl_mock_mem && modprobe cxl_test hmem_test=1
> 
> dax=$(find_dax_cxl)
> [[ "$dax" == "" ]] && err $LINENO
> dax=$(find_dax_hmem)
> [[ "$dax" != "" ]] && err $LINENO
> 
> unload
> 
> modprobe cxl_mock_mem && modprobe cxl_test fail_autoassemble hmem_test=1
> 
> dax=$(find_dax_cxl)
> [[ "$dax" != "" ]] && err $LINENO
> dax=$(find_dax_hmem)
> [[ "$dax" == "" ]] && err $LINENO
> 
> unload
> ---
> 
> This builds on Smita's series [1] pushed out to for-7.1/dax-hmem in
> cxl.git [2].
> 
> [1]: http://lore.kernel.org/20260322195343.206900-1-Smita.KoralahalliChannabasappa@amd.com
> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-7.1/dax-hmem
> 
> Dan Williams (9):
>    cxl/region: Fix use-after-free from auto assembly failure
>    dax/cxl: Fix HMEM dependencies
>    cxl/region: Limit visibility of cxl_region_contains_resource()
>    cxl/region: Constify cxl_region_resource_contains()
>    dax/hmem: Reduce visibility of dax_cxl coordination symbols
>    dax/hmem: Fix singleton confusion between dax_hmem_work and hmem
>      devices
>    dax/hmem: Parent dax_hmem devices
>    tools/testing/cxl: Simulate auto-assembly failure
>    tools/testing/cxl: Test dax_hmem takeover of CXL regions
> 
>   drivers/dax/Kconfig                |   6 +-
>   drivers/cxl/cxl.h                  |  11 ++-
>   drivers/dax/bus.h                  |  15 +++-
>   include/cxl/cxl.h                  |  15 ----
>   tools/testing/cxl/test/mock.h      |   8 ++
>   drivers/cxl/core/region.c          |  68 +++++++++++++++--
>   drivers/dax/hmem/device.c          |  28 ++++---
>   drivers/dax/hmem/hmem.c            | 115 +++++++++++++++--------------
>   tools/testing/cxl/test/cxl.c       |  66 +++++++++++++++++
>   tools/testing/cxl/test/hmem_test.c |  47 ++++++++++++
>   tools/testing/cxl/test/mem.c       |   3 +
>   tools/testing/cxl/test/mock.c      |  50 +++++++++++++
>   tools/testing/cxl/Kbuild           |   7 ++
>   tools/testing/cxl/test/Kbuild      |   1 +
>   14 files changed, 344 insertions(+), 96 deletions(-)
>   delete mode 100644 include/cxl/cxl.h
>   create mode 100644 tools/testing/cxl/test/hmem_test.c
> 
> 
> base-commit: 51d2fa02c0e4b3b23c4484f2af9b6d65c35471e8

I tested this series. It's working as expected for me. Thanks for the
incremental.

Thanks
Smita



* Re: [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability
  2026-03-30 21:12 ` Koralahalli Channabasappa, Smita
@ 2026-03-30 21:17   ` Dave Jiang
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-30 21:17 UTC (permalink / raw)
  To: Koralahalli Channabasappa, Smita, Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa, Jonathan Cameron, stable



On 3/30/26 2:12 PM, Koralahalli Channabasappa, Smita wrote:
> On 3/26/2026 10:28 PM, Dan Williams wrote:
>> Given all the cross subsystem dependencies needed to make this solution
>> work, it needs to have a unit test to keep it functional.
>>
>> On the path to writing that, several fixes fell out, but not to Smita's
>> code, to mine. One use-after-free has been there since the original
>> automatic region assembly code.
>>
>> Here is a preview of the core of the test I will submit to the cxl-cli project:
>>
>> ---
>> modprobe cxl_mock_mem && modprobe cxl_test hmem_test=1
>>
>> dax=$(find_dax_cxl)
>> [[ "$dax" == "" ]] && err $LINENO
>> dax=$(find_dax_hmem)
>> [[ "$dax" != "" ]] && err $LINENO
>>
>> unload
>>
>> modprobe cxl_mock_mem && modprobe cxl_test fail_autoassemble hmem_test=1
>>
>> dax=$(find_dax_cxl)
>> [[ "$dax" != "" ]] && err $LINENO
>> dax=$(find_dax_hmem)
>> [[ "$dax" == "" ]] && err $LINENO
>>
>> unload
>> ---
>>
>> This builds on Smita's series [1] pushed out to for-7.1/dax-hmem in
>> cxl.git [2].
>>
>> [1]: http://lore.kernel.org/20260322195343.206900-1-Smita.KoralahalliChannabasappa@amd.com
>> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-7.1/dax-hmem
>>
>> Dan Williams (9):
>>    cxl/region: Fix use-after-free from auto assembly failure
>>    dax/cxl: Fix HMEM dependencies
>>    cxl/region: Limit visibility of cxl_region_contains_resource()
>>    cxl/region: Constify cxl_region_resource_contains()
>>    dax/hmem: Reduce visibility of dax_cxl coordination symbols
>>    dax/hmem: Fix singleton confusion between dax_hmem_work and hmem
>>      devices
>>    dax/hmem: Parent dax_hmem devices
>>    tools/testing/cxl: Simulate auto-assembly failure
>>    tools/testing/cxl: Test dax_hmem takeover of CXL regions
>>
>>   drivers/dax/Kconfig                |   6 +-
>>   drivers/cxl/cxl.h                  |  11 ++-
>>   drivers/dax/bus.h                  |  15 +++-
>>   include/cxl/cxl.h                  |  15 ----
>>   tools/testing/cxl/test/mock.h      |   8 ++
>>   drivers/cxl/core/region.c          |  68 +++++++++++++++--
>>   drivers/dax/hmem/device.c          |  28 ++++---
>>   drivers/dax/hmem/hmem.c            | 115 +++++++++++++++--------------
>>   tools/testing/cxl/test/cxl.c       |  66 +++++++++++++++++
>>   tools/testing/cxl/test/hmem_test.c |  47 ++++++++++++
>>   tools/testing/cxl/test/mem.c       |   3 +
>>   tools/testing/cxl/test/mock.c      |  50 +++++++++++++
>>   tools/testing/cxl/Kbuild           |   7 ++
>>   tools/testing/cxl/test/Kbuild      |   1 +
>>   14 files changed, 344 insertions(+), 96 deletions(-)
>>   delete mode 100644 include/cxl/cxl.h
>>   create mode 100644 tools/testing/cxl/test/hmem_test.c
>>
>>
>> base-commit: 51d2fa02c0e4b3b23c4484f2af9b6d65c35471e8
> 
> I tested this series. Its working as expected for me. Thanks for the incremental.

Hi Smita. Can you provide a tested-by tag pls?

> 
> Thanks
> Smita
> 



* Re: [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource()
  2026-03-27  5:28 ` [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource() Dan Williams
  2026-03-27 16:39   ` Dave Jiang
  2026-03-27 23:45   ` Alison Schofield
@ 2026-03-30 22:19   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-30 22:19 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Dan Williams wrote:
> The dax_hmem dependency on cxl_region_contains_resource() is a one-off
> special case. It is not suitable for other use cases.
> 
> Move the definition to the other CONFIG_CXL_REGION guarded definitions in
> drivers/cxl/cxl.h and include that by a relative path include. This matches
> what drivers/dax/cxl.c does for its limited private usage of CXL core
> symbols.
> 
> Reduce the symbol export visibility from global to just dax_hmem, to
> further clarify its applicability.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]


* Re: [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains()
  2026-03-27  5:28 ` [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains() Dan Williams
  2026-03-27 16:40   ` Dave Jiang
  2026-03-27 23:45   ` Alison Schofield
@ 2026-03-30 22:22   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-30 22:22 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Dan Williams wrote:
> The call to cxl_region_resource_contains() in hmem_register_cxl_device()
> need not cast away 'const'. The problem is the usage of the
> bus_for_each_dev() API which does not mark its @data parameter as 'const'.
> Switch to bus_find_device() which does take 'const' @data, fixup
> cxl_region_resource_contains() and its caller.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]


* Re: [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols
  2026-03-27  5:28 ` [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols Dan Williams
  2026-03-27 16:46   ` Dave Jiang
  2026-03-27 23:46   ` Alison Schofield
@ 2026-03-30 22:26   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-30 22:26 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Dan Williams wrote:
> No other module or use case should be using dax_hmem_initial_probe or
> dax_hmem_flush_work(). Limit their use to dax_hmem, and dax_cxl
> respectively.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]


* Re: [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices
  2026-03-27  5:28 ` [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices Dan Williams
  2026-03-27 17:06   ` Dave Jiang
  2026-03-27 23:46   ` Alison Schofield
@ 2026-03-31 17:32   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-31 17:32 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Dan Williams wrote:
> dax_hmem (ab)uses a platform device to allow for a module to autoload in
> the presence of "Soft Reserved" resources. The dax_hmem driver had no
> dependencies on the "hmem_platform" device being a singleton until the
> recent "dax_hmem vs dax_cxl" takeover solution.
> 
> Replace the layering violation of dax_hmem_work assuming that there will
> never be more than one "hmem_platform" device associated with a global work
> item with a dax_hmem local workqueue that can theoretically support any
> number of hmem_platform devices.
> 
> Fixup the reference counting to only pin the device while it is live in the
> queue.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]


* Re: [PATCH 7/9] dax/hmem: Parent dax_hmem devices
  2026-03-27  5:28 ` [PATCH 7/9] dax/hmem: Parent dax_hmem devices Dan Williams
  2026-03-27 17:07   ` Dave Jiang
  2026-03-27 23:47   ` Alison Schofield
@ 2026-03-31 17:42   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-31 17:42 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Dan Williams wrote:
> For test purposes it is useful to be able to determine which
> "hmem_platform" device is hosting a given sub-device.
> 
> Register hmem devices underneath "hmem_platform".
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]

* Re: [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure
  2026-03-27  5:28 ` [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure Dan Williams
  2026-03-27 17:08   ` Dave Jiang
  2026-03-27 23:48   ` Alison Schofield
@ 2026-03-31 17:43   ` Ira Weiny
  2 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-31 17:43 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Dan Williams wrote:
> Add a cxl_test module option to skip setting up one of the members of the
> default auto-assembled region.
> 
> This simulates a device failing between firmware setup and OS boot, or
> region configuration interrupted by an event like kexec.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]

* Re: [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions
  2026-03-27  5:28 ` [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions Dan Williams
  2026-03-27 17:10   ` Dave Jiang
  2026-03-27 23:58   ` Alison Schofield
@ 2026-03-31 17:57   ` Ira Weiny
  2026-03-31 18:13   ` Alison Schofield
  3 siblings, 0 replies; 46+ messages in thread
From: Ira Weiny @ 2026-03-31 17:57 UTC (permalink / raw)
  To: Dan Williams, dave.jiang
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa

Dan Williams wrote:
> When platform firmware is committed to publishing EFI_CONVENTIONAL_MEMORY
> in the memory map, but CXL fails to assemble the region, dax_hmem can
> attempt to attach a dax device to the memory range.
> 
> Take advantage of the new ability to support multiple "hmem_platform"
> devices to enable regression testing of several scenarios:
> 
> * CXL correctly assembles a region, check dax_hmem fails to attach dax
> * CXL fails to assemble a region, check dax_hmem successfully attaches dax
> * Check that loading the dax_cxl driver loads the dax_hmem driver
> * Attempt to race cxl_mock_mem async probe vs dax_hmem probe flushing.
>   Check both the positive and negative cases.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]

* Re: [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions
  2026-03-27  5:28 ` [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions Dan Williams
                     ` (2 preceding siblings ...)
  2026-03-31 17:57   ` Ira Weiny
@ 2026-03-31 18:13   ` Alison Schofield
  3 siblings, 0 replies; 46+ messages in thread
From: Alison Schofield @ 2026-03-31 18:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: dave.jiang, patches, linux-cxl, Smita.KoralahalliChannabasappa

On Thu, Mar 26, 2026 at 10:28:21PM -0700, Dan Williams wrote:
> When platform firmware is committed to publishing EFI_CONVENTIONAL_MEMORY
> in the memory map, but CXL fails to assemble the region, dax_hmem can
> attempt to attach a dax device to the memory range.
> 
> Take advantage of the new ability to support multiple "hmem_platform"
> devices to enable regression testing of several scenarios:
> 
> * CXL correctly assembles a region, check dax_hmem fails to attach dax
> * CXL fails to assemble a region, check dax_hmem successfully attaches dax
> * Check that loading the dax_cxl driver loads the dax_hmem driver
> * Attempt to race cxl_mock_mem async probe vs dax_hmem probe flushing.
>   Check both the positive and negative cases.

Tested-by: Alison Schofield <alison.schofield@intel.com>


* Re: [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability
  2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
                   ` (10 preceding siblings ...)
  2026-03-30 21:12 ` Koralahalli Channabasappa, Smita
@ 2026-03-31 21:57 ` Dave Jiang
  11 siblings, 0 replies; 46+ messages in thread
From: Dave Jiang @ 2026-03-31 21:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: patches, linux-cxl, alison.schofield,
	Smita.KoralahalliChannabasappa, Jonathan Cameron, stable



On 3/26/26 10:28 PM, Dan Williams wrote:
> Given all the cross subsystem dependencies needed to make this solution
> work, it needs to have a unit test to keep it functional.
> 
> On the path to writing that, several fixes fell out, but not to Smita's
> code, to mine. One use-after-free has been there since the original
> automatic region assembly code.
> 
> Here is a preview of the core of the test I will submit to the cxl-cli project:
> 
> ---
> modprobe cxl_mock_mem && modprobe cxl_test hmem_test=1
> 
> dax=$(find_dax_cxl)
> [[ "$dax" == "" ]] && err $LINENO
> dax=$(find_dax_hmem)
> [[ "$dax" != "" ]] && err $LINENO
> 
> unload
> 
> modprobe cxl_mock_mem && modprobe cxl_test fail_autoassemble hmem_test=1
> 
> dax=$(find_dax_cxl)
> [[ "$dax" != "" ]] && err $LINENO
> dax=$(find_dax_hmem)
> [[ "$dax" == "" ]] && err $LINENO
> 
> unload
> ---
> 
> This builds on Smita's series [1] pushed out to for-7.1/dax-hmem in
> cxl.git [2].
> 
> [1]: http://lore.kernel.org/20260322195343.206900-1-Smita.KoralahalliChannabasappa@amd.com
> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-7.1/dax-hmem
> 
> Dan Williams (9):
>   cxl/region: Fix use-after-free from auto assembly failure
>   dax/cxl: Fix HMEM dependencies
>   cxl/region: Limit visibility of cxl_region_contains_resource()
>   cxl/region: Constify cxl_region_resource_contains()
>   dax/hmem: Reduce visibility of dax_cxl coordination symbols
>   dax/hmem: Fix singleton confusion between dax_hmem_work and hmem
>     devices
>   dax/hmem: Parent dax_hmem devices
>   tools/testing/cxl: Simulate auto-assembly failure
>   tools/testing/cxl: Test dax_hmem takeover of CXL regions
> 
>  drivers/dax/Kconfig                |   6 +-
>  drivers/cxl/cxl.h                  |  11 ++-
>  drivers/dax/bus.h                  |  15 +++-
>  include/cxl/cxl.h                  |  15 ----
>  tools/testing/cxl/test/mock.h      |   8 ++
>  drivers/cxl/core/region.c          |  68 +++++++++++++++--
>  drivers/dax/hmem/device.c          |  28 ++++---
>  drivers/dax/hmem/hmem.c            | 115 +++++++++++++++--------------
>  tools/testing/cxl/test/cxl.c       |  66 +++++++++++++++++
>  tools/testing/cxl/test/hmem_test.c |  47 ++++++++++++
>  tools/testing/cxl/test/mem.c       |   3 +
>  tools/testing/cxl/test/mock.c      |  50 +++++++++++++
>  tools/testing/cxl/Kbuild           |   7 ++
>  tools/testing/cxl/test/Kbuild      |   1 +
>  14 files changed, 344 insertions(+), 96 deletions(-)
>  delete mode 100644 include/cxl/cxl.h
>  create mode 100644 tools/testing/cxl/test/hmem_test.c
> 
> 
> base-commit: 51d2fa02c0e4b3b23c4484f2af9b6d65c35471e8

Applied to cxl/next
fe5dfc24e003 tools/testing/cxl: Test dax_hmem takeover of CXL regions
de121c377f88 tools/testing/cxl: Simulate auto-assembly failure
a515eb335f51 dax/hmem: Parent dax_hmem devices
841e96c053f1 dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices
6c96c76597cc dax/hmem: Reduce visibility of dax_cxl coordination symbols
069a54fd21e8 cxl/region: Constify cxl_region_resource_contains()
b95c14f0dc79 cxl/region: Limit visibility of cxl_region_contains_resource()
6c7077d5ca81 dax/cxl: Fix HMEM dependencies
16413cc33cfd cxl/region: Fix use-after-free from auto assembly failure


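The two scenarios in the test preview quoted in the cover letter reduce to
one ownership check per module load. A minimal sketch follows, assuming a
hypothetical helper named check_owner (it is not part of the series or of
the cxl-cli test). find_dax_cxl, find_dax_hmem, and err are the helpers
named in the preview; their definitions are not shown in this thread, so
the sketch only consumes their output as strings.

```shell
# Hypothetical helper, for illustration only.
# $1: expected owner of the range, "cxl" or "hmem"
# $2: result of the dax_cxl lookup (empty when no device was found)
# $3: result of the dax_hmem lookup (empty when no device was found)
# Returns 0 when the lookups match the expectation, 1 when they do not,
# and 2 when the expected owner is not recognized.
check_owner() {
	case "$1" in
	cxl)
		# CXL assembled the region: dax_cxl owns it, dax_hmem must not attach
		[ -n "$2" ] && [ -z "$3" ]
		;;
	hmem)
		# Auto-assembly failed: dax_hmem takes over, dax_cxl finds nothing
		[ -z "$2" ] && [ -n "$3" ]
		;;
	*)
		return 2
		;;
	esac
}
```

With such a helper, the first scenario in the preview could be written as
check_owner cxl "$(find_dax_cxl)" "$(find_dax_hmem)" || err $LINENO
and the fail_autoassemble scenario as the same line with "hmem" as the
expected owner.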

Thread overview: 46+ messages
2026-03-27  5:28 [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Dan Williams
2026-03-27  5:28 ` [PATCH 1/9] cxl/region: Fix use-after-free from auto assembly failure Dan Williams
2026-03-27 16:28   ` Dave Jiang
2026-03-27 19:20   ` Alison Schofield
2026-03-27 21:54     ` Dan Williams
2026-03-27 22:37       ` Alison Schofield
2026-03-27 23:43   ` Alison Schofield
2026-03-30 20:24   ` Ira Weiny
2026-03-27  5:28 ` [PATCH 2/9] dax/cxl: Fix HMEM dependencies Dan Williams
2026-03-27 16:29   ` Dave Jiang
2026-03-27 23:44   ` Alison Schofield
2026-03-30 21:10   ` Ira Weiny
2026-03-27  5:28 ` [PATCH 3/9] cxl/region: Limit visibility of cxl_region_contains_resource() Dan Williams
2026-03-27 16:39   ` Dave Jiang
2026-03-27 23:45   ` Alison Schofield
2026-03-30 22:19   ` Ira Weiny
2026-03-27  5:28 ` [PATCH 4/9] cxl/region: Constify cxl_region_resource_contains() Dan Williams
2026-03-27 16:40   ` Dave Jiang
2026-03-27 23:45   ` Alison Schofield
2026-03-30 22:22   ` Ira Weiny
2026-03-27  5:28 ` [PATCH 5/9] dax/hmem: Reduce visibility of dax_cxl coordination symbols Dan Williams
2026-03-27 16:46   ` Dave Jiang
2026-03-27 23:46   ` Alison Schofield
2026-03-30 22:26   ` Ira Weiny
2026-03-27  5:28 ` [PATCH 6/9] dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices Dan Williams
2026-03-27 17:06   ` Dave Jiang
2026-03-27 23:46   ` Alison Schofield
2026-03-31 17:32   ` Ira Weiny
2026-03-27  5:28 ` [PATCH 7/9] dax/hmem: Parent dax_hmem devices Dan Williams
2026-03-27 17:07   ` Dave Jiang
2026-03-27 23:47   ` Alison Schofield
2026-03-31 17:42   ` Ira Weiny
2026-03-27  5:28 ` [PATCH 8/9] tools/testing/cxl: Simulate auto-assembly failure Dan Williams
2026-03-27 17:08   ` Dave Jiang
2026-03-27 23:48   ` Alison Schofield
2026-03-31 17:43   ` Ira Weiny
2026-03-27  5:28 ` [PATCH 9/9] tools/testing/cxl: Test dax_hmem takeover of CXL regions Dan Williams
2026-03-27 17:10   ` Dave Jiang
2026-03-27 23:58   ` Alison Schofield
2026-03-28  3:20     ` Dan Williams
2026-03-31 17:57   ` Ira Weiny
2026-03-31 18:13   ` Alison Schofield
2026-03-27 23:42 ` [PATCH 0/9] dax/hmem: Add tests for the dax_hmem takeover capability Alison Schofield
2026-03-30 21:12 ` Koralahalli Channabasappa, Smita
2026-03-30 21:17   ` Dave Jiang
2026-03-31 21:57 ` Dave Jiang