public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
@ 2026-03-19  1:14 Smita Koralahalli
  2026-03-19  1:14 ` [PATCH v7 1/7] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Smita Koralahalli @ 2026-03-19  1:14 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

This series aims to address long-standing conflicts between HMEM and
CXL when handling Soft Reserved memory ranges.

Reworked from Dan's patch:
https://lore.kernel.org/all/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/

Previous work:
https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/

Link to v6:
https://lore.kernel.org/all/20260210064501.157591-1-Smita.KoralahalliChannabasappa@amd.com/

The series is based on Linux 7.0-rc4.
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c

[1] After offlining the memory, I can tear down the regions and recreate
them. dax_cxl creates dax devices and onlines the memory.
850000000-284fffffff : CXL Window 0
  850000000-284fffffff : region0
    850000000-284fffffff : dax0.0
      850000000-284fffffff : System RAM (kmem)

[2] With CONFIG_CXL_REGION disabled, all the resources are handled by
HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
and dax devices are created from HMEM.
850000000-284fffffff : CXL Window 0
  850000000-284fffffff : Soft Reserved
    850000000-284fffffff : dax0.0
      850000000-284fffffff : System RAM (kmem)

[3] Region assembly failure: Soft Reserved range shows up in /proc/iomem
and dax devices are handled by HMEM.
850000000-284fffffff : Soft Reserved
  850000000-284fffffff : CXL Window 0
    850000000-284fffffff : region0
      850000000-284fffffff : dax6.0
        850000000-284fffffff : System RAM (kmem)

[4] REGISTER path:
The results are as expected with both CXL_BUS = y and CXL_BUS = m.
To validate the REGISTER path, I forced REGISTER even in cases where the
SR range completely overlaps the CXL region, as I did not have access to
a system where the CXL region range is smaller than the SR range.

850000000-284fffffff : Soft Reserved
  850000000-284fffffff : CXL Window 0
    850000000-280fffffff : region0
      850000000-284fffffff : dax6.0
        850000000-284fffffff : System RAM (kmem)

kreview flagged a potential deadlock from taking pdev->dev.mutex before
calling wait_for_device_probe(). Hence, I moved wait_for_device_probe()
before guard(device).

From kreview:
The guard(device) takes pdev->dev.mutex and holds it across
wait_for_device_probe(). If any probe function in the system tries to
access this device (directly or indirectly), it would need the same
mutex:

process_defer_work()
  guard(device)(&pdev->dev)     <- Takes pdev->dev.mutex
  wait_for_device_probe()       <- Waits for all probes globally
    wait_event(probe_count == 0)

Meanwhile, if another driver's probe:

  some_driver_probe()
    device_lock(&pdev->dev)     <- Blocks waiting for mutex

The probe cannot complete while it waits for the mutex, and
wait_for_device_probe() will not return while the probe is pending.

v7 updates:
- Added Reviewed-by tags.
- co-developed-by -> Suggested-by for Patch 4.
- Dropped "cxl/region: Skip decoder reset for auto-discovered regions"
  patch.
- cxl_region_contains_soft_reserve() -> cxl_region_contains_resource()
- Dropped scoped_guard around request_resource() and release_resource().
- Dropped patch 7. All deferred work infrastructure moved from bus.c into
  hmem.c
- Dropped enum dax_cxl_mode (DEFER/REGISTER/DROP) and replaced with bool
  dax_hmem_initial_probe in device.c (built-in, survives module reload).
- Changed from all-or-nothing to per-range ownership decisions: each
  range is decided individually; CXL keeps what it covers, HMEM gets the
  rest.
- Replaced the two-pass walk with a single pass to exercise per-range
  ownership.
- Moved wait_for_device_probe() before guard(device) to avoid lockdep
  warning (kreview, Gregory).
- Added guard(device) + driver bound check.
- Added get_device()/put_device() for pdev refcount.
- Added flush_work() in dax_hmem_exit() to ensure work completes before
  module unload.
- dax_hmem_flush_work() exported from dax_hmem.ko — symbol dependency
  forces dax_hmem to load before dax_cxl (Dan requirement 2).
- Added static inline no-op stub in bus.h for CONFIG_DEV_DAX_HMEM = n.
- Added work_pending() check (Dan requirement 3).
- pdev and work_struct are initialized together on first probe, making
  the singleton nature explicit (static struct, INIT_WORK called once).
- Reverted back to container_of() in work function instead of global
  variables.
- No kill_defer_work() with the struct being static.

v6 updates:
- Patch 1-3 no changes.
- New Patches 4-5.
- (void *)res -> res.
- cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
- New file include/cxl/cxl.h
- Introduced singleton workqueue.
- hmem to queue the work and cxl to flush.
- cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
- Included descriptions for dax_cxl_mode.
- kzalloc -> kmalloc in add_soft_reserve_into_iomem()
- dax_cxl_mode is exported to CXL.
- Introduced hmem_register_cxl_device() for walking only CXL
  intersected SR ranges the second time.

v5 updates:
- Patch 1 dropped as it has been merged into for-7.0/cxl-init.
- Added Reviewed-by tags.
- Shared dax_cxl_mode between dax/cxl.c and dax/hmem.c and used
  -EPROBE_DEFER to defer dax_cxl.
- CXL_REGION_F_AUTO check for resetting decoders.
- Teardown all CXL regions if any one CXL region doesn't fully contain
  the Soft Reserved range.
- Added helper cxl_region_contains_sr() to determine Soft Reserved
  ownership.
- bus_rescan_devices() to retry dax_cxl.
- Added guard(rwsem_read)(&cxl_rwsem.region).

v4 updates:
- No changes patches 1-3.
- New patches 4-7.
- handle_deferred_cxl() has been enhanced to handle the case where CXL
  regions do not contiguously and fully cover Soft Reserved ranges.
- Support added to defer cxl_dax registration.
- Support added to teardown cxl regions.

v3 updates:
- Fixed two "From".

v2 updates:
- Removed conditional check on CONFIG_EFI_SOFT_RESERVE as dax_hmem
  depends on CONFIG_EFI_SOFT_RESERVE. (Zhijian)
- Added TODO note. (Zhijian)
- Included region_intersects_soft_reserve() inside CONFIG_EFI_SOFT_RESERVE
  conditional check. (Zhijian)
- insert_resource_late() -> insert_resource_expand_to_fit() and
  __insert_resource_expand_to_fit() replacement. (Boris)
- Fixed Co-developed and Signed-off by. (Dan)
- Combined 2/6 and 3/6 into a single patch. (Zhijian).
- Skip local variable in remove_soft_reserved. (Jonathan)
- Drop kfree with __free(). (Jonathan)
- return 0 -> return dev_add_action_or_reset(host...) (Jonathan)
- Dropped 6/6.
- Reviewed-by tags (Dave, Jonathan)

Dan Williams (3):
  dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved
    ranges
  dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
  dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding

Smita Koralahalli (4):
  dax: Track all dax_region allocations under a global resource tree
  cxl/region: Add helper to check Soft Reserved containment by CXL
    regions
  dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree

 drivers/cxl/core/region.c |  30 ++++++++++
 drivers/dax/Kconfig       |   2 +
 drivers/dax/Makefile      |   3 +-
 drivers/dax/bus.c         |  20 ++++++-
 drivers/dax/bus.h         |   7 +++
 drivers/dax/cxl.c         |  28 ++++++++-
 drivers/dax/hmem/device.c |   3 +
 drivers/dax/hmem/hmem.c   | 117 ++++++++++++++++++++++++++++++++++----
 include/cxl/cxl.h         |  15 +++++
 9 files changed, 208 insertions(+), 17 deletions(-)
 create mode 100644 include/cxl/cxl.h

base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c
-- 
2.17.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 1/7] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
  2026-03-19  1:14 [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
@ 2026-03-19  1:14 ` Smita Koralahalli
  2026-03-19  1:14 ` [PATCH v7 2/7] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Smita Koralahalli @ 2026-03-19  1:14 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

From: Dan Williams <dan.j.williams@intel.com>

Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
Reserved ranges.

Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
loading; it does not enforce that the dependency has finished init
before the current module runs. This can cause HMEM to start before
cxl_acpi has populated the resource tree, breaking detection of overlaps
between Soft Reserved and CXL Windows.

Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
that trigger further module loads. Asynchronous probe flushing
(wait_for_device_probe()) is added later in the series in a deferred
context before HMEM makes ownership decisions for Soft Reserved ranges.

Additionally, add explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
Soft Reserved ranges before CXL drivers have had a chance to claim them.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/dax/Kconfig     |  2 ++
 drivers/dax/hmem/hmem.c | 17 ++++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d656e4c0eb84..3683bb3f2311 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -48,6 +48,8 @@ config DEV_DAX_CXL
 	tristate "CXL DAX: direct access to CXL RAM regions"
 	depends on CXL_BUS && CXL_REGION && DEV_DAX
 	default CXL_REGION && DEV_DAX
+	depends on CXL_ACPI >= DEV_DAX_HMEM
+	depends on CXL_PCI >= DEV_DAX_HMEM
 	help
 	  CXL RAM regions are either mapped by platform-firmware
 	  and published in the initial system-memory map as "System RAM", mapped
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 1cf7c2a0ee1c..008172fc3607 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -139,6 +139,16 @@ static __init int dax_hmem_init(void)
 {
 	int rc;
 
+	/*
+	 * Ensure that cxl_acpi and cxl_pci have a chance to kick off
+	 * CXL topology discovery at least once before scanning the
+	 * iomem resource tree for IORES_DESC_CXL resources.
+	 */
+	if (IS_ENABLED(CONFIG_DEV_DAX_CXL)) {
+		request_module("cxl_acpi");
+		request_module("cxl_pci");
+	}
+
 	rc = platform_driver_register(&dax_hmem_platform_driver);
 	if (rc)
 		return rc;
@@ -159,13 +169,6 @@ static __exit void dax_hmem_exit(void)
 module_init(dax_hmem_init);
 module_exit(dax_hmem_exit);
 
-/* Allow for CXL to define its own dax regions */
-#if IS_ENABLED(CONFIG_CXL_REGION)
-#if IS_MODULE(CONFIG_CXL_ACPI)
-MODULE_SOFTDEP("pre: cxl_acpi");
-#endif
-#endif
-
 MODULE_ALIAS("platform:hmem*");
 MODULE_ALIAS("platform:hmem_platform*");
 MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
-- 
2.17.1



* [PATCH v7 2/7] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
  2026-03-19  1:14 [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
  2026-03-19  1:14 ` [PATCH v7 1/7] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
@ 2026-03-19  1:14 ` Smita Koralahalli
  2026-03-19  1:14 ` [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Smita Koralahalli @ 2026-03-19  1:14 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

From: Dan Williams <dan.j.williams@intel.com>

Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL)
so that HMEM only defers Soft Reserved ranges when CXL DAX support is
enabled. This makes the coordination between HMEM and the CXL stack more
precise and prevents deferral in unrelated CXL configurations.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/dax/hmem/hmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 008172fc3607..1e3424358490 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -66,7 +66,7 @@ static int hmem_register_device(struct device *host, int target_nid,
 	long id;
 	int rc;
 
-	if (IS_ENABLED(CONFIG_CXL_REGION) &&
+	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
 	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
 			      IORES_DESC_CXL) != REGION_DISJOINT) {
 		dev_dbg(host, "deferring range to CXL: %pr\n", res);
-- 
2.17.1



* [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-19  1:14 [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
  2026-03-19  1:14 ` [PATCH v7 1/7] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
  2026-03-19  1:14 ` [PATCH v7 2/7] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
@ 2026-03-19  1:14 ` Smita Koralahalli
  2026-03-19  5:48   ` Alison Schofield
  2026-03-19  1:14 ` [PATCH v7 4/7] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Smita Koralahalli @ 2026-03-19  1:14 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

From: Dan Williams <dan.j.williams@intel.com>

Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
dax_cxl.

In addition, defer registration of the dax_cxl driver to a workqueue
instead of using module_cxl_driver(). This ensures that dax_hmem has
an opportunity to initialize and register its deferred callback and make
ownership decisions before dax_cxl begins probing and claiming Soft
Reserved ranges.

Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
out of line from other synchronous probing, avoiding ordering
dependencies while coordinating ownership decisions with dax_hmem.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/dax/Makefile |  3 +--
 drivers/dax/cxl.c    | 27 ++++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 5ed5c39857c8..70e996bf1526 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+obj-y += hmem/
 obj-$(CONFIG_DAX) += dax.o
 obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
@@ -10,5 +11,3 @@ dax-y += bus.o
 device_dax-y := device.o
 dax_pmem-y := pmem.o
 dax_cxl-y := cxl.o
-
-obj-y += hmem/
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 13cd94d32ff7..a2136adfa186 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -38,10 +38,35 @@ static struct cxl_driver cxl_dax_region_driver = {
 	.id = CXL_DEVICE_DAX_REGION,
 	.drv = {
 		.suppress_bind_attrs = true,
+		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
 	},
 };
 
-module_cxl_driver(cxl_dax_region_driver);
+static void cxl_dax_region_driver_register(struct work_struct *work)
+{
+	cxl_driver_register(&cxl_dax_region_driver);
+}
+
+static DECLARE_WORK(cxl_dax_region_driver_work, cxl_dax_region_driver_register);
+
+static int __init cxl_dax_region_init(void)
+{
+	/*
+	 * Need to resolve a race with dax_hmem wanting to drive regions
+	 * instead of CXL
+	 */
+	queue_work(system_long_wq, &cxl_dax_region_driver_work);
+	return 0;
+}
+module_init(cxl_dax_region_init);
+
+static void __exit cxl_dax_region_exit(void)
+{
+	flush_work(&cxl_dax_region_driver_work);
+	cxl_driver_unregister(&cxl_dax_region_driver);
+}
+module_exit(cxl_dax_region_exit);
+
 MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
 MODULE_DESCRIPTION("CXL DAX: direct access to CXL regions");
 MODULE_LICENSE("GPL");
-- 
2.17.1



* [PATCH v7 4/7] dax: Track all dax_region allocations under a global resource tree
  2026-03-19  1:14 [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (2 preceding siblings ...)
  2026-03-19  1:14 ` [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
@ 2026-03-19  1:14 ` Smita Koralahalli
  2026-03-19 13:59   ` Jonathan Cameron
  2026-03-19  1:14 ` [PATCH v7 5/7] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Smita Koralahalli @ 2026-03-19  1:14 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Introduce a global "DAX Regions" resource root and register each
dax_region->res under it via request_resource(). Release the resource on
dax_region teardown.

By enforcing a single global namespace for dax_region allocations, this
ensures only one of dax_hmem or dax_cxl can successfully register a
dax_region for a given range.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/dax/bus.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index c94c09622516..448e2bc285c3 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -10,6 +10,7 @@
 #include "dax-private.h"
 #include "bus.h"
 
+static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
 static DEFINE_MUTEX(dax_bus_lock);
 
 /*
@@ -625,6 +626,7 @@ static void dax_region_unregister(void *region)
 {
 	struct dax_region *dax_region = region;
 
+	release_resource(&dax_region->res);
 	sysfs_remove_groups(&dax_region->dev->kobj,
 			dax_region_attribute_groups);
 	dax_region_put(dax_region);
@@ -635,6 +637,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		unsigned long flags)
 {
 	struct dax_region *dax_region;
+	int rc;
 
 	/*
 	 * The DAX core assumes that it can store its private data in
@@ -667,14 +670,25 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		.flags = IORESOURCE_MEM | flags,
 	};
 
-	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
-		kfree(dax_region);
-		return NULL;
+	rc = request_resource(&dax_regions, &dax_region->res);
+	if (rc) {
+		dev_dbg(parent, "dax_region resource conflict for %pR\n",
+			&dax_region->res);
+		goto err_res;
 	}
 
+	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
+		goto err_sysfs;
+
 	if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
 		return NULL;
 	return dax_region;
+
+err_sysfs:
+	release_resource(&dax_region->res);
+err_res:
+	kfree(dax_region);
+	return NULL;
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);
 
-- 
2.17.1



* [PATCH v7 5/7] cxl/region: Add helper to check Soft Reserved containment by CXL regions
  2026-03-19  1:14 [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (3 preceding siblings ...)
  2026-03-19  1:14 ` [PATCH v7 4/7] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
@ 2026-03-19  1:14 ` Smita Koralahalli
  2026-03-19  1:14 ` [PATCH v7 6/7] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
  2026-03-19  1:15 ` [PATCH v7 7/7] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
  6 siblings, 0 replies; 22+ messages in thread
From: Smita Koralahalli @ 2026-03-19  1:14 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Add a helper to determine whether a given Soft Reserved memory range is
fully contained within a committed CXL region.

This helper provides a primitive for policy decisions in subsequent
patches, such as coordination with dax_hmem to determine whether CXL has
fully claimed ownership of Soft Reserved memory ranges.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++++++++
 include/cxl/cxl.h         | 15 +++++++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 include/cxl/cxl.h

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 42874948b589..f7b20f60ac5c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -12,6 +12,7 @@
 #include <linux/idr.h>
 #include <linux/memory-tiers.h>
 #include <linux/string_choices.h>
+#include <cxl/cxl.h>
 #include <cxlmem.h>
 #include <cxl.h>
 #include "core.h"
@@ -4173,6 +4174,35 @@ static int cxl_region_setup_poison(struct cxl_region *cxlr)
 	return devm_add_action_or_reset(dev, remove_debugfs, dentry);
 }
 
+static int region_contains_resource(struct device *dev, void *data)
+{
+	struct resource *res = data;
+	struct cxl_region *cxlr;
+	struct cxl_region_params *p;
+
+	if (!is_cxl_region(dev))
+		return 0;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+
+	if (p->state != CXL_CONFIG_COMMIT)
+		return 0;
+
+	if (!p->res)
+		return 0;
+
+	return resource_contains(p->res, res) ? 1 : 0;
+}
+
+bool cxl_region_contains_resource(struct resource *res)
+{
+	guard(rwsem_read)(&cxl_rwsem.region);
+	return bus_for_each_dev(&cxl_bus_type, NULL, res,
+				region_contains_resource) != 0;
+}
+EXPORT_SYMBOL_GPL(cxl_region_contains_resource);
+
 static int cxl_region_can_probe(struct cxl_region *cxlr)
 {
 	struct cxl_region_params *p = &cxlr->params;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
new file mode 100644
index 000000000000..b12d3d0f6658
--- /dev/null
+++ b/include/cxl/cxl.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2026 Advanced Micro Devices, Inc. */
+#ifndef _CXL_H_
+#define _CXL_H_
+
+#ifdef CONFIG_CXL_REGION
+bool cxl_region_contains_resource(struct resource *res);
+#else
+static inline bool cxl_region_contains_resource(struct resource *res)
+{
+	return false;
+}
+#endif
+
+#endif /* _CXL_H_ */
-- 
2.17.1



* [PATCH v7 6/7] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-19  1:14 [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (4 preceding siblings ...)
  2026-03-19  1:14 ` [PATCH v7 5/7] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
@ 2026-03-19  1:14 ` Smita Koralahalli
  2026-03-19 14:29   ` Jonathan Cameron
  2026-03-19  1:15 ` [PATCH v7 7/7] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
  6 siblings, 1 reply; 22+ messages in thread
From: Smita Koralahalli @ 2026-03-19  1:14 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

The current probe-time ownership check for Soft Reserved memory, based
solely on CXL window intersection, is insufficient. dax_hmem probing is
not always guaranteed to run after CXL enumeration and region assembly,
which can lead to incorrect ownership decisions before the CXL stack has
finished publishing windows and assembling committed regions.

Introduce deferred ownership handling for Soft Reserved ranges that
intersect CXL windows. When such a range is encountered during the
initial dax_hmem probe, schedule deferred work to wait for the CXL stack
to complete enumeration and region assembly before deciding ownership.

Once the deferred work runs, evaluate each Soft Reserved range
individually: if a CXL region fully contains the range, skip it and let
dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
ownership model avoids the need for CXL region teardown, and the
alloc_dax_region() resource exclusion prevents double claiming.

Introduce a boolean flag, dax_hmem_initial_probe, in device.c so that it
survives module reload. Ensure dax_cxl defers driver registration
until dax_hmem has completed ownership resolution. dax_cxl calls
dax_hmem_flush_work() before cxl_driver_register(), which both waits for
the deferred work to complete and creates a module symbol dependency that
forces dax_hmem.ko to load before dax_cxl.

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/dax/bus.h         |  7 +++++
 drivers/dax/cxl.c         |  1 +
 drivers/dax/hmem/device.c |  3 ++
 drivers/dax/hmem/hmem.c   | 66 +++++++++++++++++++++++++++++++++++++--
 4 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index cbbf64443098..ebbfe2d6da14 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -49,6 +49,13 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
 void kill_dev_dax(struct dev_dax *dev_dax);
 bool static_dev_dax(struct dev_dax *dev_dax);
 
+#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
+extern bool dax_hmem_initial_probe;
+void dax_hmem_flush_work(void);
+#else
+static inline void dax_hmem_flush_work(void) { }
+#endif
+
 #define MODULE_ALIAS_DAX_DEVICE(type) \
 	MODULE_ALIAS("dax:t" __stringify(type) "*")
 #define DAX_DEVICE_MODALIAS_FMT "dax:t%d"
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index a2136adfa186..3ab39b77843d 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
 
 static void cxl_dax_region_driver_register(struct work_struct *work)
 {
+	dax_hmem_flush_work();
 	cxl_driver_register(&cxl_dax_region_driver);
 }
 
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 56e3cbd181b5..991a4bf7d969 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -8,6 +8,9 @@
 static bool nohmem;
 module_param_named(disable, nohmem, bool, 0444);
 
+bool dax_hmem_initial_probe;
+EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
+
 static bool platform_initialized;
 static DEFINE_MUTEX(hmem_resource_lock);
 static struct resource hmem_active = {
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 1e3424358490..8c574123bd3b 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -3,6 +3,7 @@
 #include <linux/memregion.h>
 #include <linux/module.h>
 #include <linux/dax.h>
+#include <cxl/cxl.h>
 #include "../bus.h"
 
 static bool region_idle;
@@ -58,6 +59,19 @@ static void release_hmem(void *pdev)
 	platform_device_unregister(pdev);
 }
 
+struct dax_defer_work {
+	struct platform_device *pdev;
+	struct work_struct work;
+};
+
+static struct dax_defer_work dax_hmem_work;
+
+void dax_hmem_flush_work(void)
+{
+	flush_work(&dax_hmem_work.work);
+}
+EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
+
 static int hmem_register_device(struct device *host, int target_nid,
 				const struct resource *res)
 {
@@ -69,8 +83,11 @@ static int hmem_register_device(struct device *host, int target_nid,
 	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
 	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
 			      IORES_DESC_CXL) != REGION_DISJOINT) {
-		dev_dbg(host, "deferring range to CXL: %pr\n", res);
-		return 0;
+		if (!dax_hmem_initial_probe) {
+			dev_dbg(host, "deferring range to CXL: %pr\n", res);
+			queue_work(system_long_wq, &dax_hmem_work.work);
+			return 0;
+		}
 	}
 
 	rc = region_intersects_soft_reserve(res->start, resource_size(res));
@@ -123,8 +140,48 @@ static int hmem_register_device(struct device *host, int target_nid,
 	return rc;
 }
 
+static int hmem_register_cxl_device(struct device *host, int target_nid,
+				    const struct resource *res)
+{
+	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			      IORES_DESC_CXL) == REGION_DISJOINT)
+		return 0;
+
+	if (cxl_region_contains_resource((struct resource *)res)) {
+		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
+		return 0;
+	}
+
+	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
+	return hmem_register_device(host, target_nid, res);
+}
+
+static void process_defer_work(struct work_struct *w)
+{
+	struct dax_defer_work *work = container_of(w, typeof(*work), work);
+	struct platform_device *pdev = work->pdev;
+
+	wait_for_device_probe();
+
+	guard(device)(&pdev->dev);
+	if (!pdev->dev.driver)
+		return;
+
+	dax_hmem_initial_probe = true;
+	walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
+}
+
 static int dax_hmem_platform_probe(struct platform_device *pdev)
 {
+	if (work_pending(&dax_hmem_work.work))
+		return -EBUSY;
+
+	if (!dax_hmem_work.pdev) {
+		get_device(&pdev->dev);
+		dax_hmem_work.pdev = pdev;
+		INIT_WORK(&dax_hmem_work.work, process_defer_work);
+	}
+
 	return walk_hmem_resources(&pdev->dev, hmem_register_device);
 }
 
@@ -162,6 +219,11 @@ static __init int dax_hmem_init(void)
 
 static __exit void dax_hmem_exit(void)
 {
+	flush_work(&dax_hmem_work.work);
+
+	if (dax_hmem_work.pdev)
+		put_device(&dax_hmem_work.pdev->dev);
+
 	platform_driver_unregister(&dax_hmem_driver);
 	platform_driver_unregister(&dax_hmem_platform_driver);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 7/7] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
  2026-03-19  1:14 [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (5 preceding siblings ...)
  2026-03-19  1:14 ` [PATCH v7 6/7] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
@ 2026-03-19  1:15 ` Smita Koralahalli
  2026-03-19 14:35   ` Jonathan Cameron
  6 siblings, 1 reply; 22+ messages in thread
From: Smita Koralahalli @ 2026-03-19  1:15 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Reworked from a patch by Alison Schofield <alison.schofield@intel.com>

Reintroduce Soft Reserved ranges into the iomem_resource tree for HMEM
to consume.

This restores visibility in /proc/iomem for ranges actively in use, while
avoiding the early-boot conflicts that occurred when Soft Reserved was
published into iomem before CXL window and region discovery.

Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/hmem/hmem.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 8c574123bd3b..15e462589b92 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -72,6 +72,34 @@ void dax_hmem_flush_work(void)
 }
 EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
 
+static void remove_soft_reserved(void *r)
+{
+	remove_resource(r);
+	kfree(r);
+}
+
+static int add_soft_reserve_into_iomem(struct device *host,
+				       const struct resource *res)
+{
+	int rc;
+
+	struct resource *soft __free(kfree) =
+		kmalloc(sizeof(*res), GFP_KERNEL);
+	if (!soft)
+		return -ENOMEM;
+
+	*soft = DEFINE_RES_NAMED_DESC(res->start, (res->end - res->start + 1),
+				      "Soft Reserved", IORESOURCE_MEM,
+				      IORES_DESC_SOFT_RESERVED);
+
+	rc = insert_resource(&iomem_resource, soft);
+	if (rc)
+		return rc;
+
+	return devm_add_action_or_reset(host, remove_soft_reserved,
+					no_free_ptr(soft));
+}
+
 static int hmem_register_device(struct device *host, int target_nid,
 				const struct resource *res)
 {
@@ -94,7 +122,9 @@ static int hmem_register_device(struct device *host, int target_nid,
 	if (rc != REGION_INTERSECTS)
 		return 0;
 
-	/* TODO: Add Soft-Reserved memory back to iomem */
+	rc = add_soft_reserve_into_iomem(host, res);
+	if (rc)
+		return rc;
 
 	id = memregion_alloc(GFP_KERNEL);
 	if (id < 0) {
-- 
2.17.1



* Re: [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-19  1:14 ` [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
@ 2026-03-19  5:48   ` Alison Schofield
  2026-03-19 14:11     ` Jonathan Cameron
  2026-03-19 15:46     ` Koralahalli Channabasappa, Smita
  0 siblings, 2 replies; 22+ messages in thread
From: Alison Schofield @ 2026-03-19  5:48 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On Thu, Mar 19, 2026 at 01:14:56AM +0000, Smita Koralahalli wrote:
> From: Dan Williams <dan.j.williams@intel.com>
> 
> Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
> dax_cxl.
> 
> In addition, defer registration of the dax_cxl driver to a workqueue
> instead of using module_cxl_driver(). This ensures that dax_hmem has
> an opportunity to initialize and register its deferred callback and make
> ownership decisions before dax_cxl begins probing and claiming Soft
> Reserved ranges.
> 
> Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
> out of line from other synchronous probing avoiding ordering
> dependencies while coordinating ownership decisions with dax_hmem.

Hi Smita,

Replying to this patch, as it's my best guess as to why I may be
seeing this WARN when I modprobe cxl-test.

We are able to pass all the CXL unit tests because it is only that
first load that causes the WARN. All subsequent reloads of cxl-test
do not unload dax_cxl and dax_hmem so they chug happily along.

I can reproduce by unloading each piece before reloading cxl-test
# modprobe -r cxl-test
# modprobe -r dax_cxl
# modprobe -r dax_hmem
# modprobe cxl-test
and the WARN repeats.

Guessing you may recognize what is going on. Let me know if I can
try anything else out.


# dmesg (trimmed to just the init calls)
[   34.229033] calling  fwctl_init+0x0/0xff0 [fwctl] @ 1057
[   34.230616] initcall fwctl_init+0x0/0xff0 [fwctl] returned 0 after 186 usecs
[   34.257096] calling  cxl_core_init+0x0/0x100 [cxl_core] @ 1057
[   34.258395] initcall cxl_core_init+0x0/0x100 [cxl_core] returned 0 after 538 usecs
[   34.264170] calling  cxl_port_init+0x0/0xff0 [cxl_port] @ 1057
[   34.264982] initcall cxl_port_init+0x0/0xff0 [cxl_port] returned 0 after 110 usecs
[   34.268058] calling  cxl_mem_driver_init+0x0/0xff0 [cxl_mem] @ 1057
[   34.268743] initcall cxl_mem_driver_init+0x0/0xff0 [cxl_mem] returned 0 after 110 usecs
[   34.274670] calling  cxl_pmem_init+0x0/0xff0 [cxl_pmem] @ 1057
[   34.277835] initcall cxl_pmem_init+0x0/0xff0 [cxl_pmem] returned 0 after 1671 usecs
[   34.285807] calling  cxl_acpi_init+0x0/0xff0 [cxl_acpi] @ 1057
[   34.287105] initcall cxl_acpi_init+0x0/0xff0 [cxl_acpi] returned 0 after 262 usecs
[   34.292967] calling  cxl_test_init+0x0/0xff0 [cxl_test] @ 1057
[   34.339841] initcall cxl_test_init+0x0/0xff0 [cxl_test] returned 0 after 45832 usecs
[   34.342259] calling  cxl_mock_mem_driver_init+0x0/0xff0 [cxl_mock_mem] @ 1063
[   34.343459] initcall cxl_mock_mem_driver_init+0x0/0xff0 [cxl_mock_mem] returned 0 after 356 usecs
[   34.658602] calling  dax_hmem_init+0x0/0xff0 [dax_hmem] @ 1059
[   34.670106] calling  cxl_pci_driver_init+0x0/0xff0 [cxl_pci] @ 1100
[   34.671023] initcall cxl_pci_driver_init+0x0/0xff0 [cxl_pci] returned 0 after 197 usecs
[   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0 after 2225 usecs
[   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059
[   34.676856] ------------[ cut here ]------------
[   34.677533] WARNING: kernel/workqueue.c:4289 at __flush_work+0x4f9/0x550, CPU#3: kworker/3:2/136
[   34.678596] Modules linked in: dax_cxl(+) cxl_pci dax_hmem cxl_mock_mem(O) cxl_test(O) cxl_acpi(O) cxl_pmem(O) cxl_mem(O) cxl_port(O) cxl_mock(O) cxl_core(O) fwctl nd_pmem nd_btt dax_pmem nfit nd_e820 libnvdimm
[   34.680632] initcall cxl_dax_region_init+0x0/0xff0 [dax_cxl] returned 0 after 3842 usecs
[   34.680918] CPU: 3 UID: 0 PID: 136 Comm: kworker/3:2 Tainted: G           O        7.0.0-rc4+ #156 PREEMPT(full) 
[   34.684368] Tainted: [O]=OOT_MODULE
[   34.684993] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[   34.686098] Workqueue: events_long cxl_dax_region_driver_register [dax_cxl]
[   34.687108] RIP: 0010:__flush_work+0x4f9/0x550

That addr is this line in flush_work()
        if (WARN_ON(!work->func))
                return false;


[   34.687811] Code: ff 49 8b 45 00 49 8b 55 08 89 c7 48 c1 e8 04 83 e7 08 83 e0 0f 83 cf 02 49 0f ba 6d 00 03 e9 a1 fc ff ff 0f 0b e9 e6 fe ff ff <0f> 0b e9 df fe ff ff e8 9b 48 15 01 85 c0 0f 84 26 ff ff ff 80 3d
[   34.690107] RSP: 0018:ffffc900020b7cf8 EFLAGS: 00010246
[   34.690673] RAX: 0000000000000000 RBX: ffffffffa0ea2088 RCX: ffff8880088b2b78
[   34.691388] RDX: 00000000834fb194 RSI: 0000000000000000 RDI: ffffffffa0ea2088
[   34.692135] RBP: ffffc900020b7de0 R08: 0000000031ab93b0 R09: 00000000effb42e8
[   34.692876] R10: 000000008effb42e R11: 0000000000000000 R12: ffff88807d9bb340
[   34.693588] R13: ffffffffa0ea2088 R14: ffffffffa0ed2020 R15: 0000000000000001
[   34.694358] FS:  0000000000000000(0000) GS:ffff8880fa45f000(0000) knlGS:0000000000000000
[   34.695179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.695775] CR2: 00007fe888b4e34c CR3: 00000000090ed004 CR4: 0000000000370ef0
[   34.696494] Call Trace:
[   34.696889]  <TASK>
[   34.697238]  ? __lock_acquire+0xb08/0x2930
[   34.697730]  ? __this_cpu_preempt_check+0x13/0x20
[   34.698277]  flush_work+0x17/0x30
[   34.698705]  dax_hmem_flush_work+0x10/0x20 [dax_hmem]
[   34.699270]  cxl_dax_region_driver_register+0x9/0x30 [dax_cxl]
[   34.699943]  process_one_work+0x203/0x6c0
[   34.700452]  worker_thread+0x197/0x350
[   34.700942]  ? __pfx_worker_thread+0x10/0x10
[   34.701455]  kthread+0x108/0x140
[   34.701915]  ? __pfx_kthread+0x10/0x10
[   34.702396]  ret_from_fork+0x28a/0x310
[   34.702880]  ? __pfx_kthread+0x10/0x10
[   34.703363]  ret_from_fork_asm+0x1a/0x30
[   34.703872]  </TASK>
[   34.704227] irq event stamp: 11015
[   34.704656] hardirqs last  enabled at (11025): [<ffffffff813486de>] __up_console_sem+0x5e/0x80
[   34.705493] hardirqs last disabled at (11036): [<ffffffff813486c3>] __up_console_sem+0x43/0x80
[   34.706354] softirqs last  enabled at (10500): [<ffffffff812ab9f3>] __irq_exit_rcu+0xc3/0x120
[   34.707197] softirqs last disabled at (10495): [<ffffffff812ab9f3>] __irq_exit_rcu+0xc3/0x120
[   34.708015] ---[ end trace 0000000000000000 ]---
[   34.752127] calling  dax_init+0x0/0xff0 [device_dax] @ 1089
[   34.754006] initcall dax_init+0x0/0xff0 [device_dax] returned 0 after 422 usecs
[   34.759609] calling  dax_kmem_init+0x0/0xff0 [kmem] @ 1089
[   37.338377] initcall dax_kmem_init+0x0/0xff0 [kmem] returned 0 after 2577658 usecs


> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
>  drivers/dax/Makefile |  3 +--
>  drivers/dax/cxl.c    | 27 ++++++++++++++++++++++++++-
>  2 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
> index 5ed5c39857c8..70e996bf1526 100644
> --- a/drivers/dax/Makefile
> +++ b/drivers/dax/Makefile
> @@ -1,4 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> +obj-y += hmem/
>  obj-$(CONFIG_DAX) += dax.o
>  obj-$(CONFIG_DEV_DAX) += device_dax.o
>  obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
> @@ -10,5 +11,3 @@ dax-y += bus.o
>  device_dax-y := device.o
>  dax_pmem-y := pmem.o
>  dax_cxl-y := cxl.o
> -
> -obj-y += hmem/
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 13cd94d32ff7..a2136adfa186 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -38,10 +38,35 @@ static struct cxl_driver cxl_dax_region_driver = {
>  	.id = CXL_DEVICE_DAX_REGION,
>  	.drv = {
>  		.suppress_bind_attrs = true,
> +		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
>  	},
>  };
>  
> -module_cxl_driver(cxl_dax_region_driver);
> +static void cxl_dax_region_driver_register(struct work_struct *work)
> +{
> +	cxl_driver_register(&cxl_dax_region_driver);
> +}
> +
> +static DECLARE_WORK(cxl_dax_region_driver_work, cxl_dax_region_driver_register);
> +
> +static int __init cxl_dax_region_init(void)
> +{
> +	/*
> +	 * Need to resolve a race with dax_hmem wanting to drive regions
> +	 * instead of CXL
> +	 */
> +	queue_work(system_long_wq, &cxl_dax_region_driver_work);
> +	return 0;
> +}
> +module_init(cxl_dax_region_init);
> +
> +static void __exit cxl_dax_region_exit(void)
> +{
> +	flush_work(&cxl_dax_region_driver_work);
> +	cxl_driver_unregister(&cxl_dax_region_driver);
> +}
> +module_exit(cxl_dax_region_exit);
> +
>  MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
>  MODULE_DESCRIPTION("CXL DAX: direct access to CXL regions");
>  MODULE_LICENSE("GPL");
> -- 
> 2.17.1
> 
> 


* Re: [PATCH v7 4/7] dax: Track all dax_region allocations under a global resource tree
  2026-03-19  1:14 ` [PATCH v7 4/7] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
@ 2026-03-19 13:59   ` Jonathan Cameron
  2026-03-20 16:58     ` Koralahalli Channabasappa, Smita
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2026-03-19 13:59 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On Thu, 19 Mar 2026 01:14:57 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> Introduce a global "DAX Regions" resource root and register each
> dax_region->res under it via request_resource(). Release the resource on
> dax_region teardown.
> 
> By enforcing a single global namespace for dax_region allocations, this
> ensures only one of dax_hmem or dax_cxl can successfully register a
> dax_region for a given range.
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

The comment below is about the existing code.  If we decide not to tidy that
up for now, and you swap the ordering of release_resource() and
sysfs_remove_groups() in unregister, then:

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  drivers/dax/bus.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index c94c09622516..448e2bc285c3 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -10,6 +10,7 @@
>  #include "dax-private.h"
>  #include "bus.h"
>  
> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
>  static DEFINE_MUTEX(dax_bus_lock);
>  
>  /*
> @@ -625,6 +626,7 @@ static void dax_region_unregister(void *region)
>  {
>  	struct dax_region *dax_region = region;
>  
> +	release_resource(&dax_region->res);

Should reverse the line above and the line below so we unwind in reverse of
setup.  I doubt it matters in practice today but keeping ordering like that
makes it much easier to see if a future patch messes things up.

>  	sysfs_remove_groups(&dax_region->dev->kobj,
>  			dax_region_attribute_groups);
>  	dax_region_put(dax_region);
> @@ -635,6 +637,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  		unsigned long flags)
>  {
>  	struct dax_region *dax_region;
> +	int rc;
>  
>  	/*
>  	 * The DAX core assumes that it can store its private data in
> @@ -667,14 +670,25 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  		.flags = IORESOURCE_MEM | flags,
>  	};
>  
> -	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
> -		kfree(dax_region);
> -		return NULL;
> +	rc = request_resource(&dax_regions, &dax_region->res);
> +	if (rc) {
> +		dev_dbg(parent, "dax_region resource conflict for %pR\n",
> +			&dax_region->res);
> +		goto err_res;
>  	}
>  
> +	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
> +		goto err_sysfs;
> +
>  	if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))

This is curious. The code flips over to a kref_put()-based release, but we didn't
do anything with the kref in the previous call. So whilst not 'buggy' as such,
it's definitely inconsistent and we should clean it up.

This should really have been doing the release via dax_region_put() from the
kref_init().  In practice that means never calling kfree(dax_region) in error
paths, because the kref_init() is just after the allocation. Instead call
dax_region_put() in all those error paths.

 

>  		return NULL;
>  	return dax_region;
> +
> +err_sysfs:
> +	release_resource(&dax_region->res);
> +err_res:
> +	kfree(dax_region);

From above I think this should be
	dax_region_put(dax_region);

> +	return NULL;
>  }
>  EXPORT_SYMBOL_GPL(alloc_dax_region);
>  



* Re: [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-19  5:48   ` Alison Schofield
@ 2026-03-19 14:11     ` Jonathan Cameron
  2026-03-19 15:46     ` Koralahalli Channabasappa, Smita
  1 sibling, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2026-03-19 14:11 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
	Yazen Ghannam, Dave Jiang, Davidlohr Bueso, Matthew Wilcox,
	Jan Kara, Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra,
	Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Tomasz Wolski

On Wed, 18 Mar 2026 22:48:30 -0700
Alison Schofield <alison.schofield@intel.com> wrote:

> On Thu, Mar 19, 2026 at 01:14:56AM +0000, Smita Koralahalli wrote:
> > From: Dan Williams <dan.j.williams@intel.com>
> > 
> > Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
> > dax_cxl.
> > 
> > In addition, defer registration of the dax_cxl driver to a workqueue
> > instead of using module_cxl_driver(). This ensures that dax_hmem has
> > an opportunity to initialize and register its deferred callback and make
> > ownership decisions before dax_cxl begins probing and claiming Soft
> > Reserved ranges.
> > 
> > Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
> > out of line from other synchronous probing, avoiding ordering
> > dependencies while coordinating ownership decisions with dax_hmem.  
> 
> Hi Smita,
> 
> Replying to this patch, as it's my best guess as to why I may be
> seeing this WARN when I modprobe cxl-test.

Not patch 6?  dax_hmem_flush_work() is in there, and it doesn't
use a static declaration of the work item.

I've not figured out the path yet, but it looks more suspicious to me
than this path.

Jonathan

> 
> We are able to pass all the CXL unit tests because it is only that
> first load that causes the WARN. All subsequent reloads of cxl-test
> do not unload dax_cxl and dax_hmem so they chug happily along.
> 
> I can reproduce by unloading each piece before reloading cxl-test
> # modprobe -r cxl-test
> # modprobe -r dax_cxl
> # modprobe -r dax_hmem
> # modprobe cxl-test
> and the WARN repeats.
> 
> Guessing you may recognize what is going on. Let me know if I can
> try anything else out.
> 
> 
> # dmesg (trimmed to just the init calls)
> [   34.229033] calling  fwctl_init+0x0/0xff0 [fwctl] @ 1057
> [   34.230616] initcall fwctl_init+0x0/0xff0 [fwctl] returned 0 after 186 usecs
> [   34.257096] calling  cxl_core_init+0x0/0x100 [cxl_core] @ 1057
> [   34.258395] initcall cxl_core_init+0x0/0x100 [cxl_core] returned 0 after 538 usecs
> [   34.264170] calling  cxl_port_init+0x0/0xff0 [cxl_port] @ 1057
> [   34.264982] initcall cxl_port_init+0x0/0xff0 [cxl_port] returned 0 after 110 usecs
> [   34.268058] calling  cxl_mem_driver_init+0x0/0xff0 [cxl_mem] @ 1057
> [   34.268743] initcall cxl_mem_driver_init+0x0/0xff0 [cxl_mem] returned 0 after 110 usecs
> [   34.274670] calling  cxl_pmem_init+0x0/0xff0 [cxl_pmem] @ 1057
> [   34.277835] initcall cxl_pmem_init+0x0/0xff0 [cxl_pmem] returned 0 after 1671 usecs
> [   34.285807] calling  cxl_acpi_init+0x0/0xff0 [cxl_acpi] @ 1057
> [   34.287105] initcall cxl_acpi_init+0x0/0xff0 [cxl_acpi] returned 0 after 262 usecs
> [   34.292967] calling  cxl_test_init+0x0/0xff0 [cxl_test] @ 1057
> [   34.339841] initcall cxl_test_init+0x0/0xff0 [cxl_test] returned 0 after 45832 usecs
> [   34.342259] calling  cxl_mock_mem_driver_init+0x0/0xff0 [cxl_mock_mem] @ 1063
> [   34.343459] initcall cxl_mock_mem_driver_init+0x0/0xff0 [cxl_mock_mem] returned 0 after 356 usecs
> [   34.658602] calling  dax_hmem_init+0x0/0xff0 [dax_hmem] @ 1059
> [   34.670106] calling  cxl_pci_driver_init+0x0/0xff0 [cxl_pci] @ 1100
> [   34.671023] initcall cxl_pci_driver_init+0x0/0xff0 [cxl_pci] returned 0 after 197 usecs
> [   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0 after 2225 usecs
> [   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059
> [   34.676856] ------------[ cut here ]------------
> [   34.677533] WARNING: kernel/workqueue.c:4289 at __flush_work+0x4f9/0x550, CPU#3: kworker/3:2/136
> [   34.678596] Modules linked in: dax_cxl(+) cxl_pci dax_hmem cxl_mock_mem(O) cxl_test(O) cxl_acpi(O) cxl_pmem(O) cxl_mem(O) cxl_port(O) cxl_mock(O) cxl_core(O) fwctl nd_pmem nd_btt dax_pmem nfit nd_e820 libnvdimm
> [   34.680632] initcall cxl_dax_region_init+0x0/0xff0 [dax_cxl] returned 0 after 3842 usecs
> [   34.680918] CPU: 3 UID: 0 PID: 136 Comm: kworker/3:2 Tainted: G           O        7.0.0-rc4+ #156 PREEMPT(full) 
> [   34.684368] Tainted: [O]=OOT_MODULE
> [   34.684993] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [   34.686098] Workqueue: events_long cxl_dax_region_driver_register [dax_cxl]
> [   34.687108] RIP: 0010:__flush_work+0x4f9/0x550
> 
> That addr is this line in flush_work()
>         if (WARN_ON(!work->func))
>                 return false;
> 
> 
> [   34.687811] Code: ff 49 8b 45 00 49 8b 55 08 89 c7 48 c1 e8 04 83 e7 08 83 e0 0f 83 cf 02 49 0f ba 6d 00 03 e9 a1 fc ff ff 0f 0b e9 e6 fe ff ff <0f> 0b e9 df fe ff ff e8 9b 48 15 01 85 c0 0f 84 26 ff ff ff 80 3d
> [   34.690107] RSP: 0018:ffffc900020b7cf8 EFLAGS: 00010246
> [   34.690673] RAX: 0000000000000000 RBX: ffffffffa0ea2088 RCX: ffff8880088b2b78
> [   34.691388] RDX: 00000000834fb194 RSI: 0000000000000000 RDI: ffffffffa0ea2088
> [   34.692135] RBP: ffffc900020b7de0 R08: 0000000031ab93b0 R09: 00000000effb42e8
> [   34.692876] R10: 000000008effb42e R11: 0000000000000000 R12: ffff88807d9bb340
> [   34.693588] R13: ffffffffa0ea2088 R14: ffffffffa0ed2020 R15: 0000000000000001
> [   34.694358] FS:  0000000000000000(0000) GS:ffff8880fa45f000(0000) knlGS:0000000000000000
> [   34.695179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   34.695775] CR2: 00007fe888b4e34c CR3: 00000000090ed004 CR4: 0000000000370ef0
> [   34.696494] Call Trace:
> [   34.696889]  <TASK>
> [   34.697238]  ? __lock_acquire+0xb08/0x2930
> [   34.697730]  ? __this_cpu_preempt_check+0x13/0x20
> [   34.698277]  flush_work+0x17/0x30
> [   34.698705]  dax_hmem_flush_work+0x10/0x20 [dax_hmem]
> [   34.699270]  cxl_dax_region_driver_register+0x9/0x30 [dax_cxl]
> [   34.699943]  process_one_work+0x203/0x6c0
> [   34.700452]  worker_thread+0x197/0x350
> [   34.700942]  ? __pfx_worker_thread+0x10/0x10
> [   34.701455]  kthread+0x108/0x140
> [   34.701915]  ? __pfx_kthread+0x10/0x10
> [   34.702396]  ret_from_fork+0x28a/0x310
> [   34.702880]  ? __pfx_kthread+0x10/0x10
> [   34.703363]  ret_from_fork_asm+0x1a/0x30
> [   34.703872]  </TASK>
> [   34.704227] irq event stamp: 11015
> [   34.704656] hardirqs last  enabled at (11025): [<ffffffff813486de>] __up_console_sem+0x5e/0x80
> [   34.705493] hardirqs last disabled at (11036): [<ffffffff813486c3>] __up_console_sem+0x43/0x80
> [   34.706354] softirqs last  enabled at (10500): [<ffffffff812ab9f3>] __irq_exit_rcu+0xc3/0x120
> [   34.707197] softirqs last disabled at (10495): [<ffffffff812ab9f3>] __irq_exit_rcu+0xc3/0x120
> [   34.708015] ---[ end trace 0000000000000000 ]---
> [   34.752127] calling  dax_init+0x0/0xff0 [device_dax] @ 1089
> [   34.754006] initcall dax_init+0x0/0xff0 [device_dax] returned 0 after 422 usecs
> [   34.759609] calling  dax_kmem_init+0x0/0xff0 [kmem] @ 1089
> [   37.338377] initcall dax_kmem_init+0x0/0xff0 [kmem] returned 0 after 2577658 usecs
> 
> 
> > 
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > ---
> >  drivers/dax/Makefile |  3 +--
> >  drivers/dax/cxl.c    | 27 ++++++++++++++++++++++++++-
> >  2 files changed, 27 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
> > index 5ed5c39857c8..70e996bf1526 100644
> > --- a/drivers/dax/Makefile
> > +++ b/drivers/dax/Makefile
> > @@ -1,4 +1,5 @@
> >  # SPDX-License-Identifier: GPL-2.0
> > +obj-y += hmem/
> >  obj-$(CONFIG_DAX) += dax.o
> >  obj-$(CONFIG_DEV_DAX) += device_dax.o
> >  obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
> > @@ -10,5 +11,3 @@ dax-y += bus.o
> >  device_dax-y := device.o
> >  dax_pmem-y := pmem.o
> >  dax_cxl-y := cxl.o
> > -
> > -obj-y += hmem/
> > diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> > index 13cd94d32ff7..a2136adfa186 100644
> > --- a/drivers/dax/cxl.c
> > +++ b/drivers/dax/cxl.c
> > @@ -38,10 +38,35 @@ static struct cxl_driver cxl_dax_region_driver = {
> >  	.id = CXL_DEVICE_DAX_REGION,
> >  	.drv = {
> >  		.suppress_bind_attrs = true,
> > +		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
> >  	},
> >  };
> >  
> > -module_cxl_driver(cxl_dax_region_driver);
> > +static void cxl_dax_region_driver_register(struct work_struct *work)
> > +{
> > +	cxl_driver_register(&cxl_dax_region_driver);
> > +}
> > +
> > +static DECLARE_WORK(cxl_dax_region_driver_work, cxl_dax_region_driver_register);
> > +
> > +static int __init cxl_dax_region_init(void)
> > +{
> > +	/*
> > +	 * Need to resolve a race with dax_hmem wanting to drive regions
> > +	 * instead of CXL
> > +	 */
> > +	queue_work(system_long_wq, &cxl_dax_region_driver_work);
> > +	return 0;
> > +}
> > +module_init(cxl_dax_region_init);
> > +
> > +static void __exit cxl_dax_region_exit(void)
> > +{
> > +	flush_work(&cxl_dax_region_driver_work);
> > +	cxl_driver_unregister(&cxl_dax_region_driver);
> > +}
> > +module_exit(cxl_dax_region_exit);
> > +
> >  MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
> >  MODULE_DESCRIPTION("CXL DAX: direct access to CXL regions");
> >  MODULE_LICENSE("GPL");
> > -- 
> > 2.17.1
> > 
> >   
> 



* Re: [PATCH v7 6/7] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-19  1:14 ` [PATCH v7 6/7] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
@ 2026-03-19 14:29   ` Jonathan Cameron
  2026-03-19 20:03     ` Alison Schofield
  2026-03-20 17:17     ` Koralahalli Channabasappa, Smita
  0 siblings, 2 replies; 22+ messages in thread
From: Jonathan Cameron @ 2026-03-19 14:29 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On Thu, 19 Mar 2026 01:14:59 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> The current probe-time ownership check for Soft Reserved memory, based
> solely on CXL window intersection, is insufficient. dax_hmem probing is not
> always guaranteed to run after CXL enumeration and region assembly, which
> can lead to incorrect ownership decisions before the CXL stack has
> finished publishing windows and assembling committed regions.
> 
> Introduce deferred ownership handling for Soft Reserved ranges that
> intersect CXL windows. When such a range is encountered during the
> initial dax_hmem probe, schedule deferred work to wait for the CXL stack
> to complete enumeration and region assembly before deciding ownership.
> 
> Once the deferred work runs, evaluate each Soft Reserved range
> individually: if a CXL region fully contains the range, skip it and let
> dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
> ownership model avoids the need for CXL region teardown, and
> alloc_dax_region() resource exclusion prevents double claiming.
> 
> Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
> so it survives module reload. Ensure dax_cxl defers driver registration
> until dax_hmem has completed ownership resolution. dax_cxl calls
> dax_hmem_flush_work() before cxl_driver_register(), which both waits for
> the deferred work to complete and creates a module symbol dependency that
> forces dax_hmem.ko to load before dax_cxl.
> 
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Hi Smita,

I think this is very likely to be what is causing the bug Alison
saw in cxl_test.

It looks to be possible to flush work before the work structure has
been configured.  Even though it's not on a work queue and there is
nothing to do, there are early sanity checks that fail giving the warning
Alison reported.

A couple of ways to fix that are suggested inline.  I'd be tempted to both
initialize the work function statically and gate against flushing if the
whole thing isn't set up yet.

Jonathan

> ---
>  drivers/dax/bus.h         |  7 +++++
>  drivers/dax/cxl.c         |  1 +
>  drivers/dax/hmem/device.c |  3 ++
>  drivers/dax/hmem/hmem.c   | 66 +++++++++++++++++++++++++++++++++++++--
>  4 files changed, 75 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index cbbf64443098..ebbfe2d6da14 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -49,6 +49,13 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
>  void kill_dev_dax(struct dev_dax *dev_dax);
>  bool static_dev_dax(struct dev_dax *dev_dax);
>  
> +#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
> +extern bool dax_hmem_initial_probe;
> +void dax_hmem_flush_work(void);
> +#else
> +static inline void dax_hmem_flush_work(void) { }
> +#endif
> +
>  #define MODULE_ALIAS_DAX_DEVICE(type) \
>  	MODULE_ALIAS("dax:t" __stringify(type) "*")
>  #define DAX_DEVICE_MODALIAS_FMT "dax:t%d"
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index a2136adfa186..3ab39b77843d 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>  
>  static void cxl_dax_region_driver_register(struct work_struct *work)
>  {
> +	dax_hmem_flush_work();
>  	cxl_driver_register(&cxl_dax_region_driver);
>  }
>  
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index 56e3cbd181b5..991a4bf7d969 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -8,6 +8,9 @@
>  static bool nohmem;
>  module_param_named(disable, nohmem, bool, 0444);
>  
> +bool dax_hmem_initial_probe;
> +EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
> +
>  static bool platform_initialized;
>  static DEFINE_MUTEX(hmem_resource_lock);
>  static struct resource hmem_active = {
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1e3424358490..8c574123bd3b 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,6 +3,7 @@
>  #include <linux/memregion.h>
>  #include <linux/module.h>
>  #include <linux/dax.h>
> +#include <cxl/cxl.h>
>  #include "../bus.h"
>  
>  static bool region_idle;
> @@ -58,6 +59,19 @@ static void release_hmem(void *pdev)
>  	platform_device_unregister(pdev);
>  }
>  
> +struct dax_defer_work {
> +	struct platform_device *pdev;
> +	struct work_struct work;
> +};
> +
> +static struct dax_defer_work dax_hmem_work;

static struct dax_defer_work dax_hmem_work = {
	.work = __WORK_INITIALIZER(dax_hmem_work.work,
				   process_defer_work),
};
or something similar.


> +
> +void dax_hmem_flush_work(void)
> +{
> +	flush_work(&dax_hmem_work.work);
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
> +
>  static int hmem_register_device(struct device *host, int target_nid,
>  				const struct resource *res)
>  {
> @@ -69,8 +83,11 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>  	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>  			      IORES_DESC_CXL) != REGION_DISJOINT) {
> -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> -		return 0;
> +		if (!dax_hmem_initial_probe) {
> +			dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +			queue_work(system_long_wq, &dax_hmem_work.work);
> +			return 0;
> +		}
>  	}
>  
>  	rc = region_intersects_soft_reserve(res->start, resource_size(res));
> @@ -123,8 +140,48 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	return rc;
>  }
>  
> +static int hmem_register_cxl_device(struct device *host, int target_nid,
> +				    const struct resource *res)
> +{
> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) == REGION_DISJOINT)
> +		return 0;
> +
> +	if (cxl_region_contains_resource((struct resource *)res)) {
> +		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
> +		return 0;
> +	}
> +
> +	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
> +	return hmem_register_device(host, target_nid, res);
> +}
> +
> +static void process_defer_work(struct work_struct *w)
> +{
> +	struct dax_defer_work *work = container_of(w, typeof(*work), work);
> +	struct platform_device *pdev = work->pdev;
If you do the suggested __WORK_INITIALIZER() then I'd add
a paranoid

	if (!work->pdev)
		return;

We don't actually queue the work before pdev is set, but that may not
be obvious once we split up assigning the function and the data
it uses.

> +
> +	wait_for_device_probe();
> +
> +	guard(device)(&pdev->dev);
> +	if (!pdev->dev.driver)
> +		return;
> +
> +	dax_hmem_initial_probe = true;
> +	walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
> +}
> +
>  static int dax_hmem_platform_probe(struct platform_device *pdev)
>  {
> +	if (work_pending(&dax_hmem_work.work))
> +		return -EBUSY;
> +
> +	if (!dax_hmem_work.pdev) {
> +		get_device(&pdev->dev);
> +		dax_hmem_work.pdev = pdev;

Using the pdev rather than the dev breaks the pattern of doing a get_device()
and assigning in one line.  This is a bit ugly:

		dax_hmem_work.pdev = to_platform_device(get_device(&pdev->dev));

but perhaps it makes the association tighter than the current code.

> +		INIT_WORK(&dax_hmem_work.work, process_defer_work);

See above. I think the work function should be assigned statically,
which should resolve the issue Alison was seeing, as it should then
be fine to call flush_work() on an item that isn't on a workqueue
yet but is initialized.

> +	}
> +
>  	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>  }
>  
> @@ -162,6 +219,11 @@ static __init int dax_hmem_init(void)
>  
>  static __exit void dax_hmem_exit(void)
>  {
> +	flush_work(&dax_hmem_work.work);

I think this needs to be under the if (dax_hmem_work.pdev) check.
Not sure there is any guarantee dax_hmem_platform_probe() has run
before we get here otherwise.  The alternative is to assign
the work function statically.



> +
> +	if (dax_hmem_work.pdev)
> +		put_device(&dax_hmem_work.pdev->dev);
> +
>  	platform_driver_unregister(&dax_hmem_driver);
>  	platform_driver_unregister(&dax_hmem_platform_driver);
>  }


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 7/7] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
  2026-03-19  1:15 ` [PATCH v7 7/7] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
@ 2026-03-19 14:35   ` Jonathan Cameron
  2026-03-20 17:00     ` Koralahalli Channabasappa, Smita
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2026-03-19 14:35 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On Thu, 19 Mar 2026 01:15:00 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> Reworked from a patch by Alison Schofield <alison.schofield@intel.com>
> 
> Reintroduce Soft Reserved range into the iomem_resource tree for HMEM
> to consume.
> 
> This restores visibility in /proc/iomem for ranges actively in use, while
> avoiding the early-boot conflicts that occurred when Soft Reserved was
> published into iomem before CXL window and region discovery.
> 
> Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
> Co-developed-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
> Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
One minor update needed, as kmalloc_obj() has shown up in the meantime.

Thanks

Jonathan
> ---
>  drivers/dax/hmem/hmem.c | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 8c574123bd3b..15e462589b92 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -72,6 +72,34 @@ void dax_hmem_flush_work(void)
>  }
>  EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
>  
> +static void remove_soft_reserved(void *r)
> +{
> +	remove_resource(r);
> +	kfree(r);
> +}
> +
> +static int add_soft_reserve_into_iomem(struct device *host,
> +				       const struct resource *res)
> +{
> +	int rc;
> +
> +	struct resource *soft __free(kfree) =
> +		kmalloc(sizeof(*res), GFP_KERNEL);

Update to

	struct resource *soft __free(kfree) = kmalloc_obj(*soft);

Got added in 7.0 with lots of call sites updated via scripting.

Not sure why this had sizeof(*res) rather than sizeof(*soft).
Same type, but it should have been soft!  If nothing else, that would
probably have broken the scripts looking for where we should
be using kmalloc_obj().


	
> +	if (!soft)
> +		return -ENOMEM;
> +
> +	*soft = DEFINE_RES_NAMED_DESC(res->start, (res->end - res->start + 1),
> +				      "Soft Reserved", IORESOURCE_MEM,
> +				      IORES_DESC_SOFT_RESERVED);
> +
> +	rc = insert_resource(&iomem_resource, soft);
> +	if (rc)
> +		return rc;
> +
> +	return devm_add_action_or_reset(host, remove_soft_reserved,
> +					no_free_ptr(soft));
> +}
> +
>  static int hmem_register_device(struct device *host, int target_nid,
>  				const struct resource *res)
>  {
> @@ -94,7 +122,9 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	if (rc != REGION_INTERSECTS)
>  		return 0;
>  
> -	/* TODO: Add Soft-Reserved memory back to iomem */
> +	rc = add_soft_reserve_into_iomem(host, res);
> +	if (rc)
> +		return rc;
>  
>  	id = memregion_alloc(GFP_KERNEL);
>  	if (id < 0) {


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-19  5:48   ` Alison Schofield
  2026-03-19 14:11     ` Jonathan Cameron
@ 2026-03-19 15:46     ` Koralahalli Channabasappa, Smita
  2026-03-19 16:45       ` Koralahalli Channabasappa, Smita
  1 sibling, 1 reply; 22+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-19 15:46 UTC (permalink / raw)
  To: Alison Schofield, Smita Koralahalli, Jonathan Cameron
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
	Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

Hi Jonathan and Alison,

Thanks for the report and suggestions. I took a look at Jonathan's
comments in Patch 6 and am tying things together here.

On 3/18/2026 10:48 PM, Alison Schofield wrote:
> On Thu, Mar 19, 2026 at 01:14:56AM +0000, Smita Koralahalli wrote:
>> From: Dan Williams <dan.j.williams@intel.com>
>>
>> Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
>> dax_cxl.
>>
>> In addition, defer registration of the dax_cxl driver to a workqueue
>> instead of using module_cxl_driver(). This ensures that dax_hmem has
>> an opportunity to initialize and register its deferred callback and make
>> ownership decisions before dax_cxl begins probing and claiming Soft
>> Reserved ranges.
>>
>> Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
>> out of line from other synchronous probing avoiding ordering
>> dependencies while coordinating ownership decisions with dax_hmem.
> 
> Hi Smita,
> 
> Replying to this patch, as it's my best guess as to why I may be
> seeing this WARN when I modprobe cxl-test.
> 
> We are able to pass all the CXL unit tests because it is only that
> first load that causes the WARN. All subsequent reloads of cxl-test
> do not unload dax_cxl and dax_hmem so they chug happily along.
> 
> I can reproduce by unloading each piece before reloading cxl-test
> # modprobe -r cxl-test
> # modprobe -r dax_cxl
> # modprobe -r dax_hmem
> # modprobe cxl-test
> and the WARN repeats.
> 
> Guessing you may recognize what is going on. Let me know if I can
> try anything else out.
> 
> 
> # dmesg (trimmed to just the init calls)
> [   34.229033] calling  fwctl_init+0x0/0xff0 [fwctl] @ 1057
> [   34.230616] initcall fwctl_init+0x0/0xff0 [fwctl] returned 0 after 186 usecs
> [   34.257096] calling  cxl_core_init+0x0/0x100 [cxl_core] @ 1057
> [   34.258395] initcall cxl_core_init+0x0/0x100 [cxl_core] returned 0 after 538 usecs
> [   34.264170] calling  cxl_port_init+0x0/0xff0 [cxl_port] @ 1057
> [   34.264982] initcall cxl_port_init+0x0/0xff0 [cxl_port] returned 0 after 110 usecs
> [   34.268058] calling  cxl_mem_driver_init+0x0/0xff0 [cxl_mem] @ 1057
> [   34.268743] initcall cxl_mem_driver_init+0x0/0xff0 [cxl_mem] returned 0 after 110 usecs
> [   34.274670] calling  cxl_pmem_init+0x0/0xff0 [cxl_pmem] @ 1057
> [   34.277835] initcall cxl_pmem_init+0x0/0xff0 [cxl_pmem] returned 0 after 1671 usecs
> [   34.285807] calling  cxl_acpi_init+0x0/0xff0 [cxl_acpi] @ 1057
> [   34.287105] initcall cxl_acpi_init+0x0/0xff0 [cxl_acpi] returned 0 after 262 usecs
> [   34.292967] calling  cxl_test_init+0x0/0xff0 [cxl_test] @ 1057
> [   34.339841] initcall cxl_test_init+0x0/0xff0 [cxl_test] returned 0 after 45832 usecs
> [   34.342259] calling  cxl_mock_mem_driver_init+0x0/0xff0 [cxl_mock_mem] @ 1063
> [   34.343459] initcall cxl_mock_mem_driver_init+0x0/0xff0 [cxl_mock_mem] returned 0 after 356 usecs
> [   34.658602] calling  dax_hmem_init+0x0/0xff0 [dax_hmem] @ 1059
> [   34.670106] calling  cxl_pci_driver_init+0x0/0xff0 [cxl_pci] @ 1100
> [   34.671023] initcall cxl_pci_driver_init+0x0/0xff0 [cxl_pci] returned 0 after 197 usecs
> [   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0 after 2225 usecs

I agree with Jonathan's comments in Patch 6: using __WORK_INITIALIZER, or
initializing the work in dax_hmem_init(), and gating the flush on pdev will
fix the WARN. I will add both for v8. But I think the WARN is likely
indicating an ordering issue here.

On initial boot, the Makefile ordering ensures dax_hmem_init() runs
before cxl_dax_region_init(), so both work items land on system_long_wq
in the right order and dax_hmem's deferred work is queued before 
dax_cxl's driver registration work.

On module reload, which Alison is trying here, I don't think modules are
loaded in Makefile order. I think dax_cxl's workqueue is calling
dax_hmem_flush_work() before the dax_hmem probe has had a chance to queue
its work, so flush_work() flushes nothing and dax_cxl registers its
driver without waiting.

__WORK_INITIALIZER fixes the WARN, but I guess it doesn't fix the race,
if that is what we are hitting here.

[   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0 after 2225 usecs
[   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059

These two lines indicate cxl_dax_region_init started after dax_hmem_init()
returned, but I don't think that guarantees dax_hmem_platform_probe() has
actually run.

I don't know whether wait_for_device_probe() in
cxl_dax_region_driver_register() might help.

Thanks
Smita

> [   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059
> [   34.676856] ------------[ cut here ]------------
> [   34.677533] WARNING: kernel/workqueue.c:4289 at __flush_work+0x4f9/0x550, CPU#3: kworker/3:2/136
> [   34.678596] Modules linked in: dax_cxl(+) cxl_pci dax_hmem cxl_mock_mem(O) cxl_test(O) cxl_acpi(O) cxl_pmem(O) cxl_mem(O) cxl_port(O) cxl_mock(O) cxl_core(O) fwctl nd_pmem nd_btt dax_pmem nfit nd_e820 libnvdimm
> [   34.680632] initcall cxl_dax_region_init+0x0/0xff0 [dax_cxl] returned 0 after 3842 usecs
> [   34.680918] CPU: 3 UID: 0 PID: 136 Comm: kworker/3:2 Tainted: G           O        7.0.0-rc4+ #156 PREEMPT(full)
> [   34.684368] Tainted: [O]=OOT_MODULE
> [   34.684993] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [   34.686098] Workqueue: events_long cxl_dax_region_driver_register [dax_cxl]
> [   34.687108] RIP: 0010:__flush_work+0x4f9/0x550
> 
> That addr is this line in flush_work()
>          if (WARN_ON(!work->func))
>                  return false;
> 
> 
> [   34.687811] Code: ff 49 8b 45 00 49 8b 55 08 89 c7 48 c1 e8 04 83 e7 08 83 e0 0f 83 cf 02 49 0f ba 6d 00 03 e9 a1 fc ff ff 0f 0b e9 e6 fe ff ff <0f> 0b e9 df fe ff ff e8 9b 48 15 01 85 c0 0f 84 26 ff ff ff 80 3d
> [   34.690107] RSP: 0018:ffffc900020b7cf8 EFLAGS: 00010246
> [   34.690673] RAX: 0000000000000000 RBX: ffffffffa0ea2088 RCX: ffff8880088b2b78
> [   34.691388] RDX: 00000000834fb194 RSI: 0000000000000000 RDI: ffffffffa0ea2088
> [   34.692135] RBP: ffffc900020b7de0 R08: 0000000031ab93b0 R09: 00000000effb42e8
> [   34.692876] R10: 000000008effb42e R11: 0000000000000000 R12: ffff88807d9bb340
> [   34.693588] R13: ffffffffa0ea2088 R14: ffffffffa0ed2020 R15: 0000000000000001
> [   34.694358] FS:  0000000000000000(0000) GS:ffff8880fa45f000(0000) knlGS:0000000000000000
> [   34.695179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   34.695775] CR2: 00007fe888b4e34c CR3: 00000000090ed004 CR4: 0000000000370ef0
> [   34.696494] Call Trace:
> [   34.696889]  <TASK>
> [   34.697238]  ? __lock_acquire+0xb08/0x2930
> [   34.697730]  ? __this_cpu_preempt_check+0x13/0x20
> [   34.698277]  flush_work+0x17/0x30
> [   34.698705]  dax_hmem_flush_work+0x10/0x20 [dax_hmem]
> [   34.699270]  cxl_dax_region_driver_register+0x9/0x30 [dax_cxl]
> [   34.699943]  process_one_work+0x203/0x6c0
> [   34.700452]  worker_thread+0x197/0x350
> [   34.700942]  ? __pfx_worker_thread+0x10/0x10
> [   34.701455]  kthread+0x108/0x140
> [   34.701915]  ? __pfx_kthread+0x10/0x10
> [   34.702396]  ret_from_fork+0x28a/0x310
> [   34.702880]  ? __pfx_kthread+0x10/0x10
> [   34.703363]  ret_from_fork_asm+0x1a/0x30
> [   34.703872]  </TASK>
> [   34.704227] irq event stamp: 11015
> [   34.704656] hardirqs last  enabled at (11025): [<ffffffff813486de>] __up_console_sem+0x5e/0x80
> [   34.705493] hardirqs last disabled at (11036): [<ffffffff813486c3>] __up_console_sem+0x43/0x80
> [   34.706354] softirqs last  enabled at (10500): [<ffffffff812ab9f3>] __irq_exit_rcu+0xc3/0x120
> [   34.707197] softirqs last disabled at (10495): [<ffffffff812ab9f3>] __irq_exit_rcu+0xc3/0x120
> [   34.708015] ---[ end trace 0000000000000000 ]---
> [   34.752127] calling  dax_init+0x0/0xff0 [device_dax] @ 1089
> [   34.754006] initcall dax_init+0x0/0xff0 [device_dax] returned 0 after 422 usecs
> [   34.759609] calling  dax_kmem_init+0x0/0xff0 [kmem] @ 1089
> [   37.338377] initcall dax_kmem_init+0x0/0xff0 [kmem] returned 0 after 2577658 usecs
> 
> 
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> ---
>>   drivers/dax/Makefile |  3 +--
>>   drivers/dax/cxl.c    | 27 ++++++++++++++++++++++++++-
>>   2 files changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
>> index 5ed5c39857c8..70e996bf1526 100644
>> --- a/drivers/dax/Makefile
>> +++ b/drivers/dax/Makefile
>> @@ -1,4 +1,5 @@
>>   # SPDX-License-Identifier: GPL-2.0
>> +obj-y += hmem/
>>   obj-$(CONFIG_DAX) += dax.o
>>   obj-$(CONFIG_DEV_DAX) += device_dax.o
>>   obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
>> @@ -10,5 +11,3 @@ dax-y += bus.o
>>   device_dax-y := device.o
>>   dax_pmem-y := pmem.o
>>   dax_cxl-y := cxl.o
>> -
>> -obj-y += hmem/
>> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
>> index 13cd94d32ff7..a2136adfa186 100644
>> --- a/drivers/dax/cxl.c
>> +++ b/drivers/dax/cxl.c
>> @@ -38,10 +38,35 @@ static struct cxl_driver cxl_dax_region_driver = {
>>   	.id = CXL_DEVICE_DAX_REGION,
>>   	.drv = {
>>   		.suppress_bind_attrs = true,
>> +		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
>>   	},
>>   };
>>   
>> -module_cxl_driver(cxl_dax_region_driver);
>> +static void cxl_dax_region_driver_register(struct work_struct *work)
>> +{
>> +	cxl_driver_register(&cxl_dax_region_driver);
>> +}
>> +
>> +static DECLARE_WORK(cxl_dax_region_driver_work, cxl_dax_region_driver_register);
>> +
>> +static int __init cxl_dax_region_init(void)
>> +{
>> +	/*
>> +	 * Need to resolve a race with dax_hmem wanting to drive regions
>> +	 * instead of CXL
>> +	 */
>> +	queue_work(system_long_wq, &cxl_dax_region_driver_work);
>> +	return 0;
>> +}
>> +module_init(cxl_dax_region_init);
>> +
>> +static void __exit cxl_dax_region_exit(void)
>> +{
>> +	flush_work(&cxl_dax_region_driver_work);
>> +	cxl_driver_unregister(&cxl_dax_region_driver);
>> +}
>> +module_exit(cxl_dax_region_exit);
>> +
>>   MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
>>   MODULE_DESCRIPTION("CXL DAX: direct access to CXL regions");
>>   MODULE_LICENSE("GPL");
>> -- 
>> 2.17.1
>>
>>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-19 15:46     ` Koralahalli Channabasappa, Smita
@ 2026-03-19 16:45       ` Koralahalli Channabasappa, Smita
  2026-03-19 23:07         ` Dan Williams
  0 siblings, 1 reply; 22+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-19 16:45 UTC (permalink / raw)
  To: Alison Schofield, Smita Koralahalli, Jonathan Cameron
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
	Yazen Ghannam, Dave Jiang, Davidlohr Bueso, Matthew Wilcox,
	Jan Kara, Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra,
	Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Tomasz Wolski

On 3/19/2026 8:46 AM, Koralahalli Channabasappa, Smita wrote:
> Hi Jonathan and Alison,
> 
> Thanks for the report and suggestions. I took a look at Jonathan's 
> comments in Patch 6 and tying it together here.
> 
> On 3/18/2026 10:48 PM, Alison Schofield wrote:
>> On Thu, Mar 19, 2026 at 01:14:56AM +0000, Smita Koralahalli wrote:
>>> From: Dan Williams <dan.j.williams@intel.com>
>>>
>>> Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
>>> dax_cxl.
>>>
>>> In addition, defer registration of the dax_cxl driver to a workqueue
>>> instead of using module_cxl_driver(). This ensures that dax_hmem has
>>> an opportunity to initialize and register its deferred callback and make
>>> ownership decisions before dax_cxl begins probing and claiming Soft
>>> Reserved ranges.
>>>
>>> Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
>>> out of line from other synchronous probing avoiding ordering
>>> dependencies while coordinating ownership decisions with dax_hmem.
>>
>> Hi Smita,
>>
>> Replying to this patch, as it's my best guess as to why I may be
>> seeing this WARN when I modprobe cxl-test.
>>
>> We are able to pass all the CXL unit tests because it is only that
>> first load that causes the WARN. All subsequent reloads of cxl-test
>> do not unload dax_cxl and dax_hmem so they chug happily along.
>>
>> I can reproduce by unloading each piece before reloading cxl-test
>> # modprobe -r cxl-test
>> # modprobe -r dax_cxl
>> # modprobe -r dax_hmem
>> # modprobe cxl-test
>> and the WARN repeats.
>>
>> Guessing you may recognize what is going on. Let me know if I can
>> try anything else out.
>>
>>
>> # dmesg (trimmed to just the init calls)
>> [   34.229033] calling  fwctl_init+0x0/0xff0 [fwctl] @ 1057
>> [   34.230616] initcall fwctl_init+0x0/0xff0 [fwctl] returned 0 after 
>> 186 usecs
>> [   34.257096] calling  cxl_core_init+0x0/0x100 [cxl_core] @ 1057
>> [   34.258395] initcall cxl_core_init+0x0/0x100 [cxl_core] returned 0 
>> after 538 usecs
>> [   34.264170] calling  cxl_port_init+0x0/0xff0 [cxl_port] @ 1057
>> [   34.264982] initcall cxl_port_init+0x0/0xff0 [cxl_port] returned 0 
>> after 110 usecs
>> [   34.268058] calling  cxl_mem_driver_init+0x0/0xff0 [cxl_mem] @ 1057
>> [   34.268743] initcall cxl_mem_driver_init+0x0/0xff0 [cxl_mem] 
>> returned 0 after 110 usecs
>> [   34.274670] calling  cxl_pmem_init+0x0/0xff0 [cxl_pmem] @ 1057
>> [   34.277835] initcall cxl_pmem_init+0x0/0xff0 [cxl_pmem] returned 0 
>> after 1671 usecs
>> [   34.285807] calling  cxl_acpi_init+0x0/0xff0 [cxl_acpi] @ 1057
>> [   34.287105] initcall cxl_acpi_init+0x0/0xff0 [cxl_acpi] returned 0 
>> after 262 usecs
>> [   34.292967] calling  cxl_test_init+0x0/0xff0 [cxl_test] @ 1057
>> [   34.339841] initcall cxl_test_init+0x0/0xff0 [cxl_test] returned 0 
>> after 45832 usecs
>> [   34.342259] calling  cxl_mock_mem_driver_init+0x0/0xff0 
>> [cxl_mock_mem] @ 1063
>> [   34.343459] initcall cxl_mock_mem_driver_init+0x0/0xff0 
>> [cxl_mock_mem] returned 0 after 356 usecs
>> [   34.658602] calling  dax_hmem_init+0x0/0xff0 [dax_hmem] @ 1059
>> [   34.670106] calling  cxl_pci_driver_init+0x0/0xff0 [cxl_pci] @ 1100
>> [   34.671023] initcall cxl_pci_driver_init+0x0/0xff0 [cxl_pci] 
>> returned 0 after 197 usecs
>> [   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0 
>> after 2225 usecs
> 
> I agree with Jonathan's comments in Patch 6, using __WORK_INITIALIZER or 
> initializing work in dax_hmem_init() and gating flush on pdev will fix 
> the WARN — I will add both for v8. But I think the WARN is likely 
> indicating an ordering issue here..
> 
> On initial boot, the Makefile ordering ensures dax_hmem_init() runs
> before cxl_dax_region_init(), so both work items land on system_long_wq
> in the right order and dax_hmem's deferred work is queued before 
> dax_cxl's driver registration work.
> 
> On module reload which Alison is trying here I dont think, modules are 
> loaded by Makefile order. I think dax_cxl's workqueue is calling 
> dax_hmem_flush_work() before dax_hmem probe has had a chance to queue 
> its work, so flush_work() flushes nothing and dax_cxl registers its 
> driver without waiting.
> 
> __WORK_INITIALIZER fixes the WARN, but doesn't fix the race I guess if 
> we are hitting that here..
> 
> [   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0 
> after 2225 usecs
> [   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059
> 
> These two lines indicate cxl_dax started after dax_hmem_init() returns 
> but I dont think that guarantees dax_hmem_platform_probe() has actually 
> run..
> 
> I dont know if wait_for_device_probe() in cxl_dax_region_driver_register
> might help..
> 
> Thanks
> Smita

Actually, thinking about this more:

dax_hmem_initial_probe lives in device.c (built-in) so it survives
module reload. On reload it's still true from the first boot, which means
hmem_register_device() skips the deferral path entirely.

The problem is that this bypasses the cxl_region_contains_resource() check
that the deferred work normally does. On first boot,
process_defer_work() walks each range and decides per range: if CXL
covers it, skip it; if not, register it with HMEM. On reload, that check
never happens: whoever registers first via alloc_dax_region() wins,
regardless of whether CXL actually covers the range.

So if dax_cxl registers first on reload, it could claim a range that CXL
doesn't actually cover, and dax_hmem would lose a range it should own.

I'm not sure I'm thinking through this right.

Thanks
Smita

> 
>> [   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059
>> [   34.676856] ------------[ cut here ]------------
>> [   34.677533] WARNING: kernel/workqueue.c:4289 at 
>> __flush_work+0x4f9/0x550, CPU#3: kworker/3:2/136
>> [   34.678596] Modules linked in: dax_cxl(+) cxl_pci dax_hmem 
>> cxl_mock_mem(O) cxl_test(O) cxl_acpi(O) cxl_pmem(O) cxl_mem(O) 
>> cxl_port(O) cxl_mock(O) cxl_core(O) fwctl nd_pmem nd_btt dax_pmem nfit 
>> nd_e820 libnvdimm
>> [   34.680632] initcall cxl_dax_region_init+0x0/0xff0 [dax_cxl] 
>> returned 0 after 3842 usecs
>> [   34.680918] CPU: 3 UID: 0 PID: 136 Comm: kworker/3:2 Tainted: 
>> G           O        7.0.0-rc4+ #156 PREEMPT(full)
>> [   34.684368] Tainted: [O]=OOT_MODULE
>> [   34.684993] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), 
>> BIOS 0.0.0 02/06/2015
>> [   34.686098] Workqueue: events_long cxl_dax_region_driver_register 
>> [dax_cxl]
>> [   34.687108] RIP: 0010:__flush_work+0x4f9/0x550
>>
>> That addr is this line in flush_work()
>>          if (WARN_ON(!work->func))
>>                  return false;
>>
>>
>> [   34.687811] Code: ff 49 8b 45 00 49 8b 55 08 89 c7 48 c1 e8 04 83 
>> e7 08 83 e0 0f 83 cf 02 49 0f ba 6d 00 03 e9 a1 fc ff ff 0f 0b e9 e6 
>> fe ff ff <0f> 0b e9 df fe ff ff e8 9b 48 15 01 85 c0 0f 84 26 ff ff ff 
>> 80 3d
>> [   34.690107] RSP: 0018:ffffc900020b7cf8 EFLAGS: 00010246
>> [   34.690673] RAX: 0000000000000000 RBX: ffffffffa0ea2088 RCX: 
>> ffff8880088b2b78
>> [   34.691388] RDX: 00000000834fb194 RSI: 0000000000000000 RDI: 
>> ffffffffa0ea2088
>> [   34.692135] RBP: ffffc900020b7de0 R08: 0000000031ab93b0 R09: 
>> 00000000effb42e8
>> [   34.692876] R10: 000000008effb42e R11: 0000000000000000 R12: 
>> ffff88807d9bb340
>> [   34.693588] R13: ffffffffa0ea2088 R14: ffffffffa0ed2020 R15: 
>> 0000000000000001
>> [   34.694358] FS:  0000000000000000(0000) GS:ffff8880fa45f000(0000) 
>> knlGS:0000000000000000
>> [   34.695179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   34.695775] CR2: 00007fe888b4e34c CR3: 00000000090ed004 CR4: 
>> 0000000000370ef0
>> [   34.696494] Call Trace:
>> [   34.696889]  <TASK>
>> [   34.697238]  ? __lock_acquire+0xb08/0x2930
>> [   34.697730]  ? __this_cpu_preempt_check+0x13/0x20
>> [   34.698277]  flush_work+0x17/0x30
>> [   34.698705]  dax_hmem_flush_work+0x10/0x20 [dax_hmem]
>> [   34.699270]  cxl_dax_region_driver_register+0x9/0x30 [dax_cxl]
>> [   34.699943]  process_one_work+0x203/0x6c0
>> [   34.700452]  worker_thread+0x197/0x350
>> [   34.700942]  ? __pfx_worker_thread+0x10/0x10
>> [   34.701455]  kthread+0x108/0x140
>> [   34.701915]  ? __pfx_kthread+0x10/0x10
>> [   34.702396]  ret_from_fork+0x28a/0x310
>> [   34.702880]  ? __pfx_kthread+0x10/0x10
>> [   34.703363]  ret_from_fork_asm+0x1a/0x30
>> [   34.703872]  </TASK>
>> [   34.704227] irq event stamp: 11015
>> [   34.704656] hardirqs last  enabled at (11025): [<ffffffff813486de>] 
>> __up_console_sem+0x5e/0x80
>> [   34.705493] hardirqs last disabled at (11036): [<ffffffff813486c3>] 
>> __up_console_sem+0x43/0x80
>> [   34.706354] softirqs last  enabled at (10500): [<ffffffff812ab9f3>] 
>> __irq_exit_rcu+0xc3/0x120
>> [   34.707197] softirqs last disabled at (10495): [<ffffffff812ab9f3>] 
>> __irq_exit_rcu+0xc3/0x120
>> [   34.708015] ---[ end trace 0000000000000000 ]---
>> [   34.752127] calling  dax_init+0x0/0xff0 [device_dax] @ 1089
>> [   34.754006] initcall dax_init+0x0/0xff0 [device_dax] returned 0 
>> after 422 usecs
>> [   34.759609] calling  dax_kmem_init+0x0/0xff0 [kmem] @ 1089
>> [   37.338377] initcall dax_kmem_init+0x0/0xff0 [kmem] returned 0 
>> after 2577658 usecs
>>
>>
>>>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>> Signed-off-by: Smita Koralahalli 
>>> <Smita.KoralahalliChannabasappa@amd.com>
>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> ---
>>>   drivers/dax/Makefile |  3 +--
>>>   drivers/dax/cxl.c    | 27 ++++++++++++++++++++++++++-
>>>   2 files changed, 27 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
>>> index 5ed5c39857c8..70e996bf1526 100644
>>> --- a/drivers/dax/Makefile
>>> +++ b/drivers/dax/Makefile
>>> @@ -1,4 +1,5 @@
>>>   # SPDX-License-Identifier: GPL-2.0
>>> +obj-y += hmem/
>>>   obj-$(CONFIG_DAX) += dax.o
>>>   obj-$(CONFIG_DEV_DAX) += device_dax.o
>>>   obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
>>> @@ -10,5 +11,3 @@ dax-y += bus.o
>>>   device_dax-y := device.o
>>>   dax_pmem-y := pmem.o
>>>   dax_cxl-y := cxl.o
>>> -
>>> -obj-y += hmem/
>>> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
>>> index 13cd94d32ff7..a2136adfa186 100644
>>> --- a/drivers/dax/cxl.c
>>> +++ b/drivers/dax/cxl.c
>>> @@ -38,10 +38,35 @@ static struct cxl_driver cxl_dax_region_driver = {
>>>       .id = CXL_DEVICE_DAX_REGION,
>>>       .drv = {
>>>           .suppress_bind_attrs = true,
>>> +        .probe_type = PROBE_PREFER_ASYNCHRONOUS,
>>>       },
>>>   };
>>> -module_cxl_driver(cxl_dax_region_driver);
>>> +static void cxl_dax_region_driver_register(struct work_struct *work)
>>> +{
>>> +    cxl_driver_register(&cxl_dax_region_driver);
>>> +}
>>> +
>>> +static DECLARE_WORK(cxl_dax_region_driver_work, 
>>> cxl_dax_region_driver_register);
>>> +
>>> +static int __init cxl_dax_region_init(void)
>>> +{
>>> +    /*
>>> +     * Need to resolve a race with dax_hmem wanting to drive regions
>>> +     * instead of CXL
>>> +     */
>>> +    queue_work(system_long_wq, &cxl_dax_region_driver_work);
>>> +    return 0;
>>> +}
>>> +module_init(cxl_dax_region_init);
>>> +
>>> +static void __exit cxl_dax_region_exit(void)
>>> +{
>>> +    flush_work(&cxl_dax_region_driver_work);
>>> +    cxl_driver_unregister(&cxl_dax_region_driver);
>>> +}
>>> +module_exit(cxl_dax_region_exit);
>>> +
>>>   MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
>>>   MODULE_DESCRIPTION("CXL DAX: direct access to CXL regions");
>>>   MODULE_LICENSE("GPL");
>>> -- 
>>> 2.17.1
>>>
>>>
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 6/7] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-19 14:29   ` Jonathan Cameron
@ 2026-03-19 20:03     ` Alison Schofield
  2026-03-20 17:17     ` Koralahalli Channabasappa, Smita
  1 sibling, 0 replies; 22+ messages in thread
From: Alison Schofield @ 2026-03-19 20:03 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
	Yazen Ghannam, Dave Jiang, Davidlohr Bueso, Matthew Wilcox,
	Jan Kara, Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra,
	Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Tomasz Wolski

On Thu, Mar 19, 2026 at 02:29:10PM +0000, Jonathan Cameron wrote:
> On Thu, 19 Mar 2026 01:14:59 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> 
> > The current probe time ownership check for Soft Reserved memory based
> > solely on CXL window intersection is insufficient. dax_hmem probing is not
> > always guaranteed to run after CXL enumeration and region assembly, which
> > can lead to incorrect ownership decisions before the CXL stack has
> > finished publishing windows and assembling committed regions.
> > 
> > Introduce deferred ownership handling for Soft Reserved ranges that
> > intersect CXL windows. When such a range is encountered during the
> > initial dax_hmem probe, schedule deferred work to wait for the CXL stack
> > to complete enumeration and region assembly before deciding ownership.
> > 
> > Once the deferred work runs, evaluate each Soft Reserved range
> > individually: if a CXL region fully contains the range, skip it and let
> > dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
> > ownership model avoids the need for CXL region teardown and
> > alloc_dax_region() resource exclusion prevents double claiming.
> > 
> > Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
> > so it survives module reload. Ensure dax_cxl defers driver registration
> > until dax_hmem has completed ownership resolution. dax_cxl calls
> > dax_hmem_flush_work() before cxl_driver_register(), which both waits for
> > the deferred work to complete and creates a module symbol dependency that
> > forces dax_hmem.ko to load before dax_cxl.
> > 
> > Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Hi Smita,
> 
> I think this is very likely to be what is causing the bug Alison
> saw in cxl_test.
> 
> It looks to be possible to flush work before the work structure has
> been configured.  Even though it's not on a work queue and there is
> nothing to do, there are early sanity checks that fail giving the warning
> Alison reported.
> 
> A couple of ways to fix that inline.  I'd be tempted to both initialize
> the function statically and gate against flushing if the whole thing isn't
> set up yet.
> 
> Jonathan

snip

> 
> > diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> > index 1e3424358490..8c574123bd3b 100644
> > --- a/drivers/dax/hmem/hmem.c
> > +++ b/drivers/dax/hmem/hmem.c
> > @@ -3,6 +3,7 @@
> >  #include <linux/memregion.h>
> >  #include <linux/module.h>
> >  #include <linux/dax.h>
> > +#include <cxl/cxl.h>
> >  #include "../bus.h"
> >  
> >  static bool region_idle;
> > @@ -58,6 +59,19 @@ static void release_hmem(void *pdev)
> >  	platform_device_unregister(pdev);
> >  }
> >  
> > +struct dax_defer_work {
> > +	struct platform_device *pdev;
> > +	struct work_struct work;
> > +};
> > +
> > +static struct dax_defer_work dax_hmem_work;
> 
> static struct dax_defer_work dax_hmem_work = {
> 	.work = __WORK_INITIALIZER(dax_hmem_work.work,
> 				   process_defer_work),
> };
> or something similar.
> 

Just confirming this stopped the WARN:

-static struct dax_defer_work dax_hmem_work;
+static void process_defer_work(struct work_struct *work);
+
+static struct dax_defer_work dax_hmem_work = {
+        .work = __WORK_INITIALIZER(dax_hmem_work.work, process_defer_work),
+};


> 
> > +
> > +void dax_hmem_flush_work(void)
> > +{
> > +	flush_work(&dax_hmem_work.work);
> > +}
> > +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
> > +
> >  static int hmem_register_device(struct device *host, int target_nid,
> >  				const struct resource *res)
> >  {
> > @@ -69,8 +83,11 @@ static int hmem_register_device(struct device *host, int target_nid,
> >  	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
> >  	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> >  			      IORES_DESC_CXL) != REGION_DISJOINT) {
> > -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> > -		return 0;
> > +		if (!dax_hmem_initial_probe) {
> > +			dev_dbg(host, "deferring range to CXL: %pr\n", res);
> > +			queue_work(system_long_wq, &dax_hmem_work.work);
> > +			return 0;
> > +		}
> >  	}
> >  
> >  	rc = region_intersects_soft_reserve(res->start, resource_size(res));
> > @@ -123,8 +140,48 @@ static int hmem_register_device(struct device *host, int target_nid,
> >  	return rc;
> >  }
> >  
> > +static int hmem_register_cxl_device(struct device *host, int target_nid,
> > +				    const struct resource *res)
> > +{
> > +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> > +			      IORES_DESC_CXL) == REGION_DISJOINT)
> > +		return 0;
> > +
> > +	if (cxl_region_contains_resource((struct resource *)res)) {
> > +		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
> > +		return 0;
> > +	}
> > +
> > +	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
> > +	return hmem_register_device(host, target_nid, res);
> > +}
> > +
> > +static void process_defer_work(struct work_struct *w)
> > +{
> > +	struct dax_defer_work *work = container_of(w, typeof(*work), work);
> > +	struct platform_device *pdev = work->pdev;
> If you do the suggested __WORK_INITIALIZER() then I'd add
> a paranoid
> 
> 	if (!work->pdev)
> 		return;
> We don't actually queue the work before pdev is set, but that might
> be obvious once we split up assigning the function and the data
> it uses.
> 
> > +
> > +	wait_for_device_probe();
> > +
> > +	guard(device)(&pdev->dev);
> > +	if (!pdev->dev.driver)
> > +		return;
> > +
> > +	dax_hmem_initial_probe = true;
> > +	walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
> > +}
> > +
> >  static int dax_hmem_platform_probe(struct platform_device *pdev)
> >  {
> > +	if (work_pending(&dax_hmem_work.work))
> > +		return -EBUSY;
> > +
> > +	if (!dax_hmem_work.pdev) {
> > +		get_device(&pdev->dev);
> > +		dax_hmem_work.pdev = pdev;
> 
> Using the pdev rather than dev breaks the pattern of doing a get_device()
> and assigning in one line. This is a bit ugly.
> 
> 		dax_hmem_work.pdev = to_platform_device(get_device(&pdev->dev));
> 
> but perhaps makes the association tighter than current code.
> 
> > +		INIT_WORK(&dax_hmem_work.work, process_defer_work);
> 
> See above. I think assigning the work function should be static
> which should resolve the issue Alison was seeing as then it should
> be fine to call flush_work() on the item that isn't on a work queue
> yet but is initialized.
> 
> > +	}
> > +
> >  	return walk_hmem_resources(&pdev->dev, hmem_register_device);
> >  }
> >  
> > @@ -162,6 +219,11 @@ static __init int dax_hmem_init(void)
> >  
> >  static __exit void dax_hmem_exit(void)
> >  {
> > +	flush_work(&dax_hmem_work.work);
> 
> I think this needs to be under the if (dax_hmem_work.pdev) 
> Not sure there is any guarantee dax_hmem_platform_probe() has run
> before we get here otherwise.  Alternative is to assign
> the work function statically.
> 
> 
> 
> > +
> > +	if (dax_hmem_work.pdev)
> > +		put_device(&dax_hmem_work.pdev->dev);
> > +
> >  	platform_driver_unregister(&dax_hmem_driver);
> >  	platform_driver_unregister(&dax_hmem_platform_driver);
> >  }
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-19 16:45       ` Koralahalli Channabasappa, Smita
@ 2026-03-19 23:07         ` Dan Williams
  2026-03-20 17:29           ` Koralahalli Channabasappa, Smita
  2026-03-20 20:42           ` Koralahalli Channabasappa, Smita
  0 siblings, 2 replies; 22+ messages in thread
From: Dan Williams @ 2026-03-19 23:07 UTC (permalink / raw)
  To: Koralahalli Channabasappa, Smita, Alison Schofield,
	Smita Koralahalli, Jonathan Cameron
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
	Yazen Ghannam, Dave Jiang, Davidlohr Bueso, Matthew Wilcox,
	Jan Kara, Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra,
	Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Tomasz Wolski

Koralahalli Channabasappa, Smita wrote:
[..]
> > I agree with Jonathan's comments in Patch 6, using __WORK_INITIALIZER or 
> > initializing work in dax_hmem_init() and gating flush on pdev will fix 
> > the WARN — I will add both for v8. But I think the WARN is likely 
> > indicating an ordering issue here..

Yes, Jonathan is right, static initialization is also my expectation.

> > On initial boot, the Makefile ordering ensures dax_hmem_init() runs
> > before cxl_dax_region_init(), so both work items land on system_long_wq
> > in the right order and dax_hmem's deferred work is queued before 
> > dax_cxl's driver registration work.

There is nothing that guarantees that 2 work items in system_long_wq run
in submission order. Unlikely that matters given the explicit flushing.

> > On module reload, which Alison is trying here, I don't think modules are
> > loaded in Makefile order. I think dax_cxl's workqueue is calling
> > dax_hmem_flush_work() before dax_hmem probe has had a chance to queue 
> > its work, so flush_work() flushes nothing and dax_cxl registers its 
> > driver without waiting.

Module load order does not matter after initial probe completion.

...and dax_hmem is guaranteed to always load before dax_cxl due to the
symbol dependency of dax_hmem_flush_work().

> > __WORK_INITIALIZER fixes the WARN, but doesn't fix the race I guess if 
> > we are hitting that here..
> > 
> > [   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0 
> > after 2225 usecs
> > [   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059
> > 
> > These two lines indicate cxl_dax started after dax_hmem_init() returns 
> > but I don't think that guarantees dax_hmem_platform_probe() has actually 
> > run..
> > 
> > I don't know if wait_for_device_probe() in cxl_dax_region_driver_register
> > might help..
> > 
> > Thanks
> > Smita
> 
> Actually, thinking about this more..
> 
> dax_hmem_initial_probe lives in device.c (built-in) so it survives 
> module reload. On reload it's still true from the first boot. This means 
> hmem_register_device() skips the deferral path entirely..

Yes, that is the expectation.

> The problem is this bypasses the cxl_region_contains_resource() check 
> that the deferred work normally does. On first boot, 
> process_defer_work() walks each range and decides per-range: if CXL 
> covers it, skip. If not, register with HMEM. On reload, that check never 
> happens — whoever registers first via alloc_dax_region() wins, 
> regardless of whether CXL actually covers the range.

Yes, I think you have hit on a real issue. There is no point in having
dax_hmem auto-attach on driver reload. If userspace unloads the driver
it gets to keep the pieces. So that means something like this:

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 15e462589b92..7478bc78a698 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -112,10 +112,12 @@ static int hmem_register_device(struct device *host, int target_nid,
 	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
 			      IORES_DESC_CXL) != REGION_DISJOINT) {
 		if (!dax_hmem_initial_probe) {
-			dev_dbg(host, "deferring range to CXL: %pr\n", res);
+			dev_dbg(host, "await CXL initial probe: %pr\n", res);
 			queue_work(system_long_wq, &dax_hmem_work.work);
 			return 0;
 		}
+		dev_dbg(host, "deferring range to CXL: %pr\n", res);
+		return 0;
 	}
 
 	rc = region_intersects_soft_reserve(res->start, resource_size(res));

---

...because if userspace wants to reload the dax_hmem driver, then it
needs to pick what happens with the CXL intersection. Userspace can
always unload cxl_acpi to force everything back to dax_hmem.

Now, you might say, "but this means that if the initial probe results in
a partial result of some regions in dax_hmem and others in dax_cxl, that
state can not be recovered outside of a reboot". I think that is ok.
This mechanism is automatic best-effort workaround for bugs / missing
capabilities in the CXL driver. Module reload fidelity is out of scope.

> So if dax_cxl registers first on reload, it could claim a range that CXL 
> doesn't actually cover, and dax_hmem would lose a range it should own..

With the above change, dax_cxl always wins in the "reload" scenario iff
cxl_acpi is loaded. Otherwise dax_hmem owns all the Soft Reserved.

> I don't know if I'm thinking through this right..

You definitely identified the need for that fixup above.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 4/7] dax: Track all dax_region allocations under a global resource tree
  2026-03-19 13:59   ` Jonathan Cameron
@ 2026-03-20 16:58     ` Koralahalli Channabasappa, Smita
  0 siblings, 0 replies; 22+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-20 16:58 UTC (permalink / raw)
  To: Jonathan Cameron, Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On 3/19/2026 6:59 AM, Jonathan Cameron wrote:
> On Thu, 19 Mar 2026 01:14:57 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> 
>> Introduce a global "DAX Regions" resource root and register each
>> dax_region->res under it via request_resource(). Release the resource on
>> dax_region teardown.
>>
>> By enforcing a single global namespace for dax_region allocations, this
>> ensures only one of dax_hmem or dax_cxl can successfully register a
>> dax_region for a given range.
>>
>> Suggested-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> 
> The comment below is about the existing code.  If we decide not to tidy that
> up for now and you swap the ordering of release_resource() and sysfs_remove_groups()
> in unregister.

Okay I think I can do both.

> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> 
>> ---
>>   drivers/dax/bus.c | 20 +++++++++++++++++---
>>   1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>> index c94c09622516..448e2bc285c3 100644
>> --- a/drivers/dax/bus.c
>> +++ b/drivers/dax/bus.c
>> @@ -10,6 +10,7 @@
>>   #include "dax-private.h"
>>   #include "bus.h"
>>   
>> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
>>   static DEFINE_MUTEX(dax_bus_lock);
>>   
>>   /*
>> @@ -625,6 +626,7 @@ static void dax_region_unregister(void *region)
>>   {
>>   	struct dax_region *dax_region = region;
>>   
>> +	release_resource(&dax_region->res);
> 
> Should reverse the line above and the line below so we unwind in reverse of
> setup.  I doubt it matters in practice today but keeping ordering like that
> makes it much easier to see if a future patch messes things up.

Okay.

> 
>>   	sysfs_remove_groups(&dax_region->dev->kobj,
>>   			dax_region_attribute_groups);
>>   	dax_region_put(dax_region);
>> @@ -635,6 +637,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>>   		unsigned long flags)
>>   {
>>   	struct dax_region *dax_region;
>> +	int rc;
>>   
>>   	/*
>>   	 * The DAX core assumes that it can store its private data in
>> @@ -667,14 +670,25 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>>   		.flags = IORESOURCE_MEM | flags,
>>   	};
>>   
>> -	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
>> -		kfree(dax_region);
>> -		return NULL;
>> +	rc = request_resource(&dax_regions, &dax_region->res);
>> +	if (rc) {
>> +		dev_dbg(parent, "dax_region resource conflict for %pR\n",
>> +			&dax_region->res);
>> +		goto err_res;
>>   	}
>>   
>> +	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
>> +		goto err_sysfs;
>> +
>>   	if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
> 
> This is curious. The code flips over to a kref_put() based release but we didn't
> do anything with the kref in the previous call. So whilst not 'buggy' as such
> it's definitely inconsistent and we should clean it up.
> 
> This should really have been doing the release via dax_region_put() from the
> kref_init().  In practice that means never calling kfree(dax_region) in error paths
> because the kref_init() is just after the allocation. Instead call dax_region_put()
> in all those error paths.
> 
>   
> 
>>   		return NULL;
>>   	return dax_region;
>> +
>> +err_sysfs:
>> +	release_resource(&dax_region->res);
>> +err_res:
>> +	kfree(dax_region);
> 
>  From above I think this should be
> 	dax_region_put(dax_region);

Thank you for pointing this out. I will have a separate patch for this 
change first in the series.

Thanks
Smita

> 
>> +	return NULL;
>>   }
>>   EXPORT_SYMBOL_GPL(alloc_dax_region);
>>   
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 7/7] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
  2026-03-19 14:35   ` Jonathan Cameron
@ 2026-03-20 17:00     ` Koralahalli Channabasappa, Smita
  0 siblings, 0 replies; 22+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-20 17:00 UTC (permalink / raw)
  To: Jonathan Cameron, Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On 3/19/2026 7:35 AM, Jonathan Cameron wrote:
> On Thu, 19 Mar 2026 01:15:00 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> 
>> Reworked from a patch by Alison Schofield <alison.schofield@intel.com>
>>
>> Reintroduce Soft Reserved range into the iomem_resource tree for HMEM
>> to consume.
>>
>> This restores visibility in /proc/iomem for ranges actively in use, while
>> avoiding the early-boot conflicts that occurred when Soft Reserved was
>> published into iomem before CXL window and region discovery.
>>
>> Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
>> Co-developed-by: Alison Schofield <alison.schofield@intel.com>
>> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
>> Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
>> Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> One minor update needed as kmalloc_obj() has shown up in the meantime.
> 
> Thanks
> 
> Jonathan
>> ---
>>   drivers/dax/hmem/hmem.c | 32 +++++++++++++++++++++++++++++++-
>>   1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>> index 8c574123bd3b..15e462589b92 100644
>> --- a/drivers/dax/hmem/hmem.c
>> +++ b/drivers/dax/hmem/hmem.c
>> @@ -72,6 +72,34 @@ void dax_hmem_flush_work(void)
>>   }
>>   EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
>>   
>> +static void remove_soft_reserved(void *r)
>> +{
>> +	remove_resource(r);
>> +	kfree(r);
>> +}
>> +
>> +static int add_soft_reserve_into_iomem(struct device *host,
>> +				       const struct resource *res)
>> +{
>> +	int rc;
>> +
>> +	struct resource *soft __free(kfree) =
>> +		kmalloc(sizeof(*res), GFP_KERNEL);
> 
> Update to
> 
> 	struct resource *soft __free(kfree) = kmalloc_obj(*soft);
> 
> Got added in 7.0 with lots of call sites updated via scripting.
> 
> Not sure why this had sizeof(*res) rather than sizeof(*soft).
> Same type but should have been soft!  If nothing else that would
> probably have broken the scripts looking for where we should
> be using kmalloc_obj().

Okay I will update it. sizeof(*res) was a typo from my end. Sorry.
Will change to kmalloc_obj().

Thanks
Smita
> 
> 
> 	
>> +	if (!soft)
>> +		return -ENOMEM;
>> +
>> +	*soft = DEFINE_RES_NAMED_DESC(res->start, (res->end - res->start + 1),
>> +				      "Soft Reserved", IORESOURCE_MEM,
>> +				      IORES_DESC_SOFT_RESERVED);
>> +
>> +	rc = insert_resource(&iomem_resource, soft);
>> +	if (rc)
>> +		return rc;
>> +
>> +	return devm_add_action_or_reset(host, remove_soft_reserved,
>> +					no_free_ptr(soft));
>> +}
>> +
>>   static int hmem_register_device(struct device *host, int target_nid,
>>   				const struct resource *res)
>>   {
>> @@ -94,7 +122,9 @@ static int hmem_register_device(struct device *host, int target_nid,
>>   	if (rc != REGION_INTERSECTS)
>>   		return 0;
>>   
>> -	/* TODO: Add Soft-Reserved memory back to iomem */
>> +	rc = add_soft_reserve_into_iomem(host, res);
>> +	if (rc)
>> +		return rc;
>>   
>>   	id = memregion_alloc(GFP_KERNEL);
>>   	if (id < 0) {
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 6/7] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-19 14:29   ` Jonathan Cameron
  2026-03-19 20:03     ` Alison Schofield
@ 2026-03-20 17:17     ` Koralahalli Channabasappa, Smita
  1 sibling, 0 replies; 22+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-20 17:17 UTC (permalink / raw)
  To: Jonathan Cameron, Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

Hi Jonathan,

Thanks for all the comments. I will fix all of them in v8.

Thanks
Smita

On 3/19/2026 7:29 AM, Jonathan Cameron wrote:
> On Thu, 19 Mar 2026 01:14:59 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> 
>> The current probe time ownership check for Soft Reserved memory based
>> solely on CXL window intersection is insufficient. dax_hmem probing is not
>> always guaranteed to run after CXL enumeration and region assembly, which
>> can lead to incorrect ownership decisions before the CXL stack has
>> finished publishing windows and assembling committed regions.
>>
>> Introduce deferred ownership handling for Soft Reserved ranges that
>> intersect CXL windows. When such a range is encountered during the
>> initial dax_hmem probe, schedule deferred work to wait for the CXL stack
>> to complete enumeration and region assembly before deciding ownership.
>>
>> Once the deferred work runs, evaluate each Soft Reserved range
>> individually: if a CXL region fully contains the range, skip it and let
>> dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
>> ownership model avoids the need for CXL region teardown and
>> alloc_dax_region() resource exclusion prevents double claiming.
>>
>> Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
>> so it survives module reload. Ensure dax_cxl defers driver registration
>> until dax_hmem has completed ownership resolution. dax_cxl calls
>> dax_hmem_flush_work() before cxl_driver_register(), which both waits for
>> the deferred work to complete and creates a module symbol dependency that
>> forces dax_hmem.ko to load before dax_cxl.
>>
>> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Hi Smita,
> 
> I think this is very likely to be what is causing the bug Alison
> saw in cxl_test.
> 
> It looks to be possible to flush work before the work structure has
> been configured.  Even though it's not on a work queue and there is
> nothing to do, there are early sanity checks that fail giving the warning
> Alison reported.
> 
> A couple of ways to fix that inline.  I'd be tempted to both initialize
> the function statically and gate against flushing if the whole thing isn't
> set up yet.
> 
> Jonathan
> 
>> ---
>>   drivers/dax/bus.h         |  7 +++++
>>   drivers/dax/cxl.c         |  1 +
>>   drivers/dax/hmem/device.c |  3 ++
>>   drivers/dax/hmem/hmem.c   | 66 +++++++++++++++++++++++++++++++++++++--
>>   4 files changed, 75 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
>> index cbbf64443098..ebbfe2d6da14 100644
>> --- a/drivers/dax/bus.h
>> +++ b/drivers/dax/bus.h
>> @@ -49,6 +49,13 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
>>   void kill_dev_dax(struct dev_dax *dev_dax);
>>   bool static_dev_dax(struct dev_dax *dev_dax);
>>   
>> +#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
>> +extern bool dax_hmem_initial_probe;
>> +void dax_hmem_flush_work(void);
>> +#else
>> +static inline void dax_hmem_flush_work(void) { }
>> +#endif
>> +
>>   #define MODULE_ALIAS_DAX_DEVICE(type) \
>>   	MODULE_ALIAS("dax:t" __stringify(type) "*")
>>   #define DAX_DEVICE_MODALIAS_FMT "dax:t%d"
>> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
>> index a2136adfa186..3ab39b77843d 100644
>> --- a/drivers/dax/cxl.c
>> +++ b/drivers/dax/cxl.c
>> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>>   
>>   static void cxl_dax_region_driver_register(struct work_struct *work)
>>   {
>> +	dax_hmem_flush_work();
>>   	cxl_driver_register(&cxl_dax_region_driver);
>>   }
>>   
>> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
>> index 56e3cbd181b5..991a4bf7d969 100644
>> --- a/drivers/dax/hmem/device.c
>> +++ b/drivers/dax/hmem/device.c
>> @@ -8,6 +8,9 @@
>>   static bool nohmem;
>>   module_param_named(disable, nohmem, bool, 0444);
>>   
>> +bool dax_hmem_initial_probe;
>> +EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
>> +
>>   static bool platform_initialized;
>>   static DEFINE_MUTEX(hmem_resource_lock);
>>   static struct resource hmem_active = {
>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>> index 1e3424358490..8c574123bd3b 100644
>> --- a/drivers/dax/hmem/hmem.c
>> +++ b/drivers/dax/hmem/hmem.c
>> @@ -3,6 +3,7 @@
>>   #include <linux/memregion.h>
>>   #include <linux/module.h>
>>   #include <linux/dax.h>
>> +#include <cxl/cxl.h>
>>   #include "../bus.h"
>>   
>>   static bool region_idle;
>> @@ -58,6 +59,19 @@ static void release_hmem(void *pdev)
>>   	platform_device_unregister(pdev);
>>   }
>>   
>> +struct dax_defer_work {
>> +	struct platform_device *pdev;
>> +	struct work_struct work;
>> +};
>> +
>> +static struct dax_defer_work dax_hmem_work;
> 
> static struct dax_defer_work dax_hmem_work = {
> 	.work = __WORK_INITIALIZER(dax_hmem_work.work,
> 				   process_defer_work),
> };
> or something similar.
> 
> 
>> +
>> +void dax_hmem_flush_work(void)
>> +{
>> +	flush_work(&dax_hmem_work.work);
>> +}
>> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
>> +
>>   static int hmem_register_device(struct device *host, int target_nid,
>>   				const struct resource *res)
>>   {
>> @@ -69,8 +83,11 @@ static int hmem_register_device(struct device *host, int target_nid,
>>   	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>>   	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>>   			      IORES_DESC_CXL) != REGION_DISJOINT) {
>> -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
>> -		return 0;
>> +		if (!dax_hmem_initial_probe) {
>> +			dev_dbg(host, "deferring range to CXL: %pr\n", res);
>> +			queue_work(system_long_wq, &dax_hmem_work.work);
>> +			return 0;
>> +		}
>>   	}
>>   
>>   	rc = region_intersects_soft_reserve(res->start, resource_size(res));
>> @@ -123,8 +140,48 @@ static int hmem_register_device(struct device *host, int target_nid,
>>   	return rc;
>>   }
>>   
>> +static int hmem_register_cxl_device(struct device *host, int target_nid,
>> +				    const struct resource *res)
>> +{
>> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> +			      IORES_DESC_CXL) == REGION_DISJOINT)
>> +		return 0;
>> +
>> +	if (cxl_region_contains_resource((struct resource *)res)) {
>> +		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
>> +		return 0;
>> +	}
>> +
>> +	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
>> +	return hmem_register_device(host, target_nid, res);
>> +}
>> +
>> +static void process_defer_work(struct work_struct *w)
>> +{
>> +	struct dax_defer_work *work = container_of(w, typeof(*work), work);
>> +	struct platform_device *pdev = work->pdev;
> If you do the suggested __INITIALIZE_WORK() then I'd add
> a paranoid
> 
> 	if (!work->pdev)
> 		return;
> We don't actually queue the work before pdev is set, but that might
> be obvious once we split up assigning the function and the data
> it uses.
> 
>> +
>> +	wait_for_device_probe();
>> +
>> +	guard(device)(&pdev->dev);
>> +	if (!pdev->dev.driver)
>> +		return;
>> +
>> +	dax_hmem_initial_probe = true;
>> +	walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
>> +}
>> +
>>   static int dax_hmem_platform_probe(struct platform_device *pdev)
>>   {
>> +	if (work_pending(&dax_hmem_work.work))
>> +		return -EBUSY;
>> +
>> +	if (!dax_hmem_work.pdev) {
>> +		get_device(&pdev->dev);
>> +		dax_hmem_work.pdev = pdev;
> 
> Using the pdev rather than dev breaks the pattern of doing a get_device()
> and assigning in one line. This is a bit ugly.
> 
> 		dax_hmem_work.pdev = to_platform_device(get_device(&pdev->dev));
> 
> but perhaps makes the association tighter than current code.
> 
>> +		INIT_WORK(&dax_hmem_work.work, process_defer_work);
> 
> See above. I think assigning the work function should be static
> which should resolve the issue Alison was seeing as then it should
> be fine to call flush_work() on the item that isn't on a work queue
> yet but is initialized.
> 
>> +	}
>> +
>>   	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>>   }
>>   
>> @@ -162,6 +219,11 @@ static __init int dax_hmem_init(void)
>>   
>>   static __exit void dax_hmem_exit(void)
>>   {
>> +	flush_work(&dax_hmem_work.work);
> 
> I think this needs to be under the if (dax_hmem_work.pdev)
> Not sure there is any guarantee dax_hmem_platform_probe() has run
> before we get here otherwise.  Alternative is to assign
> the work function statically.
> 
> 
> 
>> +
>> +	if (dax_hmem_work.pdev)
>> +		put_device(&dax_hmem_work.pdev->dev);
>> +
>>   	platform_driver_unregister(&dax_hmem_driver);
>>   	platform_driver_unregister(&dax_hmem_platform_driver);
>>   }
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-19 23:07         ` Dan Williams
@ 2026-03-20 17:29           ` Koralahalli Channabasappa, Smita
  2026-03-20 20:42           ` Koralahalli Channabasappa, Smita
  1 sibling, 0 replies; 22+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-20 17:29 UTC (permalink / raw)
  To: Dan Williams, Alison Schofield, Smita Koralahalli,
	Jonathan Cameron
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Vishal Verma, Ira Weiny, Yazen Ghannam,
	Dave Jiang, Davidlohr Bueso, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra,
	Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Tomasz Wolski

On 3/19/2026 4:07 PM, Dan Williams wrote:
> Koralahalli Channabasappa, Smita wrote:
> [..]
>>> I agree with Jonathan's comments in Patch 6, using __WORK_INITIALIZER or
>>> initializing work in dax_hmem_init() and gating flush on pdev will fix
>>> the WARN — I will add both for v8. But I think the WARN is likely
>>> indicating an ordering issue here..
> 
> Yes, Jonathan is right, static initialization is also my expectation.
> 
>>> On initial boot, the Makefile ordering ensures dax_hmem_init() runs
>>> before cxl_dax_region_init(), so both work items land on system_long_wq
>>> in the right order and dax_hmem's deferred work is queued before
>>> dax_cxl's driver registration work.
> 
> There is nothing that guarantees that 2 work items in system_long_wq run
> in submission order. Unlikely that matters given the explicit flushing.
> 
>>> On module reload, which Alison is trying here, I don't think modules
>>> are loaded in Makefile order. I think dax_cxl's workqueue is calling
>>> dax_hmem_flush_work() before the dax_hmem probe has had a chance to
>>> queue its work, so flush_work() flushes nothing and dax_cxl registers
>>> its driver without waiting.
> 
> Module load order does not matter after initial probe completion.

Thanks for the clarification on system_long_wq ordering.

> 
> ...and dax_hmem is guaranteed to always load before dax_cxl due to the
> symbol dependency of dax_hmem_flush_work().
> 
>>> __WORK_INITIALIZER fixes the WARN, but I don't think it fixes the
>>> race, if that is what we are hitting here..
>>>
>>> [   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0
>>> after 2225 usecs
>>> [   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059
>>>
>>> These two lines indicate cxl_dax started after dax_hmem_init()
>>> returned, but I don't think that guarantees dax_hmem_platform_probe()
>>> has actually run..
>>>
>>> I don't know if wait_for_device_probe() in
>>> cxl_dax_region_driver_register might help..
>>>
>>> Thanks
>>> Smita
>>
>> Actually, thinking about this more..
>>
>> dax_hmem_initial_probe lives in device.c (built-in) so it survives
>> module reload. On reload it's still true from the first boot. This means
>> hmem_register_device() skips the deferral path entirely..
> 
> Yes, that is the expectation.
> 
>> The problem is this bypasses the cxl_region_contains_resource() check
>> that the deferred work normally does. On first boot,
>> process_defer_work() walks each range and decides per-range: if CXL
>> covers it, skip. If not, register with HMEM. On reload, that check never
>> happens — whoever registers first via alloc_dax_region() wins,
>> regardless of whether CXL actually covers the range.
> 
> Yes, I think you have hit on a real issue. There is no point in having
> dax_hmem auto-attach on driver reload. If userspace unloads the driver
> it gets to keep the pieces. So that means something like this:
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 15e462589b92..7478bc78a698 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -112,10 +112,12 @@ static int hmem_register_device(struct device *host, int target_nid,
>   	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>   			      IORES_DESC_CXL) != REGION_DISJOINT) {
>   		if (!dax_hmem_initial_probe) {
> -			dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +			dev_dbg(host, "await CXL initial probe: %pr\n", res);
>   			queue_work(system_long_wq, &dax_hmem_work.work);
>   			return 0;
>   		}
> +		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +		return 0;
>   	}
>   
>   	rc = region_intersects_soft_reserve(res->start, resource_size(res));
> 
> ---
> 
> ...because if userspace wants to reload the dax_hmem driver, then it
> needs to pick what happens with the CXL intersection. Userspace can
> always unload cxl_acpi to force everything back to dax_hmem.
> 
> Now, you might say, "but this means that if the initial probe results in
> a partial result of some regions in dax_hmem and others in dax_cxl, that
> state can not be recovered outside of a reboot". I think that is ok.
> This mechanism is an automatic best-effort workaround for bugs / missing
> capabilities in the CXL driver. Module reload fidelity is out of scope.

The fixup for the reload case makes sense.
I will incorporate this into v8 along with Jonathan's __WORK_INITIALIZER 
and the pdev gating.

Thanks
Smita

> 
>> So if dax_cxl registers first on reload, it could claim a range that CXL
>> doesn't actually cover, and dax_hmem would lose a range it should own..
> 
> With the above change, dax_cxl always wins in the "reload" scenario iff
> cxl_acpi is loaded. Otherwise dax_hmem owns all the Soft Reserved.
> 
>> I don't know if I'm thinking through this right..
> 
> You definitely identified the need for that fixup above.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-19 23:07         ` Dan Williams
  2026-03-20 17:29           ` Koralahalli Channabasappa, Smita
@ 2026-03-20 20:42           ` Koralahalli Channabasappa, Smita
  1 sibling, 0 replies; 22+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-20 20:42 UTC (permalink / raw)
  To: Dan Williams, Alison Schofield, Smita Koralahalli,
	Jonathan Cameron
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Vishal Verma, Ira Weiny, Yazen Ghannam,
	Dave Jiang, Davidlohr Bueso, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra,
	Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Tomasz Wolski

Hi Dan,

On 3/19/2026 4:07 PM, Dan Williams wrote:
> Koralahalli Channabasappa, Smita wrote:
> [..]
>>> I agree with Jonathan's comments in Patch 6, using __WORK_INITIALIZER or
>>> initializing work in dax_hmem_init() and gating flush on pdev will fix
>>> the WARN — I will add both for v8. But I think the WARN is likely
>>> indicating an ordering issue here..
> 
> Yes, Jonathan is right, static initialization is also my expectation.
> 
>>> On initial boot, the Makefile ordering ensures dax_hmem_init() runs
>>> before cxl_dax_region_init(), so both work items land on system_long_wq
>>> in the right order and dax_hmem's deferred work is queued before
>>> dax_cxl's driver registration work.
> 
> There is nothing that guarantees that 2 work items in system_long_wq run
> in submission order. Unlikely that matters given the explicit flushing.
> 
>>> On module reload, which Alison is trying here, I don't think modules
>>> are loaded in Makefile order. I think dax_cxl's workqueue is calling
>>> dax_hmem_flush_work() before the dax_hmem probe has had a chance to
>>> queue its work, so flush_work() flushes nothing and dax_cxl registers
>>> its driver without waiting.
> 
> Module load order does not matter after initial probe completion.
> 
> ...and dax_hmem is guaranteed to always load before dax_cxl due to the
> symbol dependency of dax_hmem_flush_work().
> 
>>> __WORK_INITIALIZER fixes the WARN, but I don't think it fixes the
>>> race, if that is what we are hitting here..
>>>
>>> [   34.673051] initcall dax_hmem_init+0x0/0xff0 [dax_hmem] returned 0
>>> after 2225 usecs
>>> [   34.676011] calling  cxl_dax_region_init+0x0/0xff0 [dax_cxl] @ 1059
>>>
>>> These two lines indicate cxl_dax started after dax_hmem_init()
>>> returned, but I don't think that guarantees dax_hmem_platform_probe()
>>> has actually run..
>>>
>>> I don't know if wait_for_device_probe() in
>>> cxl_dax_region_driver_register might help..
>>>
>>> Thanks
>>> Smita
>>
>> Actually, thinking about this more..
>>
>> dax_hmem_initial_probe lives in device.c (built-in) so it survives
>> module reload. On reload it's still true from the first boot. This means
>> hmem_register_device() skips the deferral path entirely..
> 
> Yes, that is the expectation.
> 
>> The problem is this bypasses the cxl_region_contains_resource() check
>> that the deferred work normally does. On first boot,
>> process_defer_work() walks each range and decides per-range: if CXL
>> covers it, skip. If not, register with HMEM. On reload, that check never
>> happens — whoever registers first via alloc_dax_region() wins,
>> regardless of whether CXL actually covers the range.
> 
> Yes, I think you have hit on a real issue. There is no point in having
> dax_hmem auto-attach on driver reload. If userspace unloads the driver
> it gets to keep the pieces. So that means something like this:
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 15e462589b92..7478bc78a698 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -112,10 +112,12 @@ static int hmem_register_device(struct device *host, int target_nid,
>   	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>   			      IORES_DESC_CXL) != REGION_DISJOINT) {
>   		if (!dax_hmem_initial_probe) {
> -			dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +			dev_dbg(host, "await CXL initial probe: %pr\n", res);
>   			queue_work(system_long_wq, &dax_hmem_work.work);
>   			return 0;
>   		}
> +		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +		return 0;
>   	}

One issue with the reload fix: at boot, hmem_register_cxl_device() 
calls hmem_register_device() to register ranges that aren't claimed by 
CXL, but with the fixup, hmem_register_device() now always returns 0 
for those CXL-intersecting ranges.

I was thinking of factoring out the registration logic into 
__hmem_register_device() and having hmem_register_cxl_device() call it 
directly, bypassing the CXL gating. Something like:


+static int __hmem_register_device(...)
+{
+	/* Remaining in hmem_register_device after the CXL check */
+}

static int hmem_register_device(..)
{
	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) && .. {
+		if (!dax_hmem_initial_probe) {
+			queue_work(system_long_wq, &dax_hmem_work.work);
+			return 0;
+		}
+		return 0;
+	}

+	return __hmem_register_device(host, target_nid, res);
}

+static int hmem_register_cxl_device(...)
+{
	...
+	return __hmem_register_device(host, target_nid, res);
+}

+static void process_defer_work(...)
+{
+	...
-	dax_hmem_initial_probe = true;
-	walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
+	if (!dax_hmem_initial_probe) {
+		dax_hmem_initial_probe = true;
+		walk_hmem_resources(.., hmem_register_cxl_device);
+	}
..
+}

Tracing it:

At boot:

probe -> walk(hmem_register_device)
    CXL range, !dax_hmem_initial_probe -> queue_work, return 0
    non-CXL ranges -> __hmem_register_device -> registers

process_defer_work:
    !dax_hmem_initial_probe
       dax_hmem_initial_probe = true
       walk(hmem_register_cxl_device)
       CXL covers -> return 0
       CXL doesn't cover -> __hmem_register_device()
          no CXL check again, straight to registration..

On reload:

probe -> walk(hmem_register_device)
    CXL range, dax_hmem_initial_probe == true, your "return 0" -> skips
    non-CXL ranges -> __hmem_register_device -> registers

process_defer_work:
    dax_hmem_initial_probe = true -> skip the walk entirely..

Or do you think this can be simplified further, or does the above 
approach have some caveats?

Thanks
Smita
>   
>   	rc = region_intersects_soft_reserve(res->start, resource_size(res));
> 
> ---
> 
> ...because if userspace wants to reload the dax_hmem driver, then it
> needs to pick what happens with the CXL intersection. Userspace can
> always unload cxl_acpi to force everything back to dax_hmem.
> 
> Now, you might say, "but this means that if the initial probe results in
> a partial result of some regions in dax_hmem and others in dax_cxl, that
> state can not be recovered outside of a reboot". I think that is ok.
> This mechanism is automatic best-effort workaround for bugs / missing
> capabilities in the CXL driver. Module reload fidelity is out of scope.
> 
>> So if dax_cxl registers first on reload, it could claim a range that CXL
>> doesn't actually cover, and dax_hmem would lose a range it should own..
> 
> With the above change, dax_cxl always wins in the "reload" scenario iff
> cxl_acpi is loaded. Otherwise dax_hmem owns all the Soft Reserved.
> 
>> I don't know if I'm thinking through this right..
> 
> You definitely identified the need for that fixup above.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-03-20 20:42 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-19  1:14 [PATCH v7 0/7] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
2026-03-19  1:14 ` [PATCH v7 1/7] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
2026-03-19  1:14 ` [PATCH v7 2/7] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
2026-03-19  1:14 ` [PATCH v7 3/7] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
2026-03-19  5:48   ` Alison Schofield
2026-03-19 14:11     ` Jonathan Cameron
2026-03-19 15:46     ` Koralahalli Channabasappa, Smita
2026-03-19 16:45       ` Koralahalli Channabasappa, Smita
2026-03-19 23:07         ` Dan Williams
2026-03-20 17:29           ` Koralahalli Channabasappa, Smita
2026-03-20 20:42           ` Koralahalli Channabasappa, Smita
2026-03-19  1:14 ` [PATCH v7 4/7] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
2026-03-19 13:59   ` Jonathan Cameron
2026-03-20 16:58     ` Koralahalli Channabasappa, Smita
2026-03-19  1:14 ` [PATCH v7 5/7] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
2026-03-19  1:14 ` [PATCH v7 6/7] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
2026-03-19 14:29   ` Jonathan Cameron
2026-03-19 20:03     ` Alison Schofield
2026-03-20 17:17     ` Koralahalli Channabasappa, Smita
2026-03-19  1:15 ` [PATCH v7 7/7] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
2026-03-19 14:35   ` Jonathan Cameron
2026-03-20 17:00     ` Koralahalli Channabasappa, Smita
