public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
@ 2026-03-22 19:53 Smita Koralahalli
  2026-03-22 19:53 ` [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path Smita Koralahalli
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

This series aims to address long-standing conflicts between HMEM and
CXL when handling Soft Reserved memory ranges.

Reworked from Dan's patch:
https://lore.kernel.org/all/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/

Previous work:
https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/

Link to v7:
https://lore.kernel.org/all/20260319011500.241426-1-Smita.KoralahalliChannabasappa@amd.com

The series is based on Linux 7.0-rc4:
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c

[1] After offlining the memory I can tear down the regions and recreate
them. dax_cxl creates dax devices and onlines memory.
850000000-284fffffff : CXL Window 0
  850000000-284fffffff : region0
    850000000-284fffffff : dax0.0
      850000000-284fffffff : System RAM (kmem)

[2] With CONFIG_CXL_REGION disabled, all the resources are handled by
HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
and dax devices are created from HMEM.
850000000-284fffffff : CXL Window 0
  850000000-284fffffff : Soft Reserved
    850000000-284fffffff : dax0.0
      850000000-284fffffff : System RAM (kmem)

[3] Region assembly failure: Soft Reserved range shows up in /proc/iomem
and dax devices are handled by HMEM.
850000000-284fffffff : Soft Reserved
  850000000-284fffffff : CXL Window 0
    850000000-284fffffff : region0
      850000000-284fffffff : dax6.0
        850000000-284fffffff : System RAM (kmem)

[4] REGISTER path:
The results are as expected with both CXL_BUS = y and CXL_BUS = m.
To validate the REGISTER path, I forced REGISTER even in cases where SR
completely overlaps the CXL region, as I did not have access to a system
where the CXL region range is smaller than the SR range.

850000000-284fffffff : Soft Reserved
  850000000-284fffffff : CXL Window 0
    850000000-280fffffff : region0
      850000000-284fffffff : dax6.0
        850000000-284fffffff : System RAM (kmem)

kreview flagged a potential deadlock from taking pdev->dev.mutex before
calling wait_for_device_probe(). Hence, I reordered the two calls.

From kreview:
The guard(device) takes pdev->dev.mutex and holds it across
wait_for_device_probe(). If any probe function in the system tries to
access this device (directly or indirectly), it would need the same
mutex:

process_defer_work()
  guard(device)(&pdev->dev)     <- Takes pdev->dev.mutex
  wait_for_device_probe()       <- Waits for all probes globally
    wait_event(probe_count == 0)

Meanwhile, if another driver's probe:

  some_driver_probe()
    device_lock(&pdev->dev)     <- Blocks waiting for mutex

The probe cannot complete while it waits for the mutex, and
wait_for_device_probe() will not return while the probe is pending.

v8 updates:
- New patch to handle kref lifecycle correctly.
- New patch to factor hmem registration.
- Reversed teardown order in dax_region_unregister().
- Replaced INIT_WORK() with __WORK_INITIALIZER.
- Added forward declaration for process_defer_work().
- Added an early return in process_defer_work() when !work->pdev as a
  defensive check.
- One liner pdev assignment using to_platform_device(get_device()).
- Module reload handling: Reload fix to return 0 if dax_hmem_initial_probe
  is set.
- Enforced CXL to always win irrespective of whether SR covers cxl
  regions. If userspace wants HMEM to own, unload cxl_acpi.
- hmem_register_cxl_device() calls __hmem_register_device() instead of
  hmem_register_device() to properly register resources through HMEM
  during the deferred walk, bypassing the CXL check done at boot.
- Gated flush_work() and put_device() under if dax_hmem_work.pdev in
  dax_hmem_exit().
- kmalloc -> kmalloc_obj.
- Added an if (!dax_hmem_initial_probe) guard in process_defer_work() to
  skip the walk entirely. Without this guard I could see the below on
  region assembly failure testing at boot:

  hmem_register_device: hmem_platform hmem_platform.0: await CXL initial probe: ..
  hmem_register_cxl_device: hmem_platform hmem_platform.0: CXL did not claim resource ..
  alloc_dev_dax_range:  dax6.0: alloc range[0]: ..
  hmem_register_cxl_device: hmem_platform hmem_platform.0: CXL did not claim resource ..
  alloc_dax_region: hmem hmem.9: dax_region resource conflict for ..
  hmem hmem.9: probe with driver hmem failed with error -12 .. 

v7 updates:
- Added Reviewed-by tags.
- co-developed-by -> Suggested-by for Patch 4.
- Dropped "cxl/region: Skip decoder reset for auto-discovered regions"
  patch.
- cxl_region_contains_soft_reserve() -> cxl_region_contains_resource()
- Dropped scoped_guard around request_resource() and release_resource().
- Dropped patch 7. All deferred work infrastructure moved from bus.c into
  hmem.c
- Dropped enum dax_cxl_mode (DEFER/REGISTER/DROP) and replaced with bool
  dax_hmem_initial_probe in device.c (built-in, survives module reload).
- Changed from all-or-nothing to per-range ownership decisions. Each range
  decided individually — CXL keeps what it covers, HMEM gets the rest.
- Replaced the 2-pass walk with a single pass to exercise per-range
  ownership.
- Moved wait_for_device_probe() before guard(device) to avoid lockdep
  warning (kreview, Gregory).
- Added guard(device) + driver bound check.
- Added get_device()/put_device() for pdev refcount.
- Added flush_work() in dax_hmem_exit() to ensure work completes before
  module unload.
- dax_hmem_flush_work() exported from dax_hmem.ko — symbol dependency
  forces dax_hmem to load before dax_cxl (Dan requirement 2).
- Added static inline no-op stub in bus.h for CONFIG_DEV_DAX_HMEM = n.
- Added work_pending() check (Dan requirement 3).
- pdev and work_struct initialized together on first probe, making
  singleton nature explicit. static struct and INIT_WORK once.
- Reverted back to container_of() in work function instead of global
  variables.
- No kill_defer_work() with the struct being static.

v6 updates:
- Patch 1-3 no changes.
- New Patches 4-5.
- (void *)res -> res.
- cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
- New file include/cxl/cxl.h
- Introduced singleton workqueue.
- hmem to queue the work and cxl to flush.
- cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
- Included descriptions for dax_cxl_mode.
- kzalloc -> kmalloc in add_soft_reserve_into_iomem()
- dax_cxl_mode is exported to CXL.
- Introduced hmem_register_cxl_device() for walking only CXL
  intersected SR ranges the second time.

v5 updates:
- Patch 1 dropped as it has been merged into for-7.0/cxl-init.
- Added Reviewed-by tags.
- Shared dax_cxl_mode between dax/cxl.c and dax/hmem.c and used
  -EPROBE_DEFER to defer dax_cxl.
- CXL_REGION_F_AUTO check for resetting decoders.
- Teardown all CXL regions if any one CXL region doesn't fully contain
  the Soft Reserved range.
- Added helper cxl_region_contains_sr() to determine Soft Reserved
  ownership.
- bus_rescan_devices() to retry dax_cxl.
- Added guard(rwsem_read)(&cxl_rwsem.region).

v4 updates:
- No changes patches 1-3.
- New patches 4-7.
- handle_deferred_cxl() has been enhanced to handle the case where CXL
  regions do not contiguously and fully cover Soft Reserved ranges.
- Support added to defer cxl_dax registration.
- Support added to teardown cxl regions.

v3 updates:
- Fixed two "From".

v2 updates:
- Removed conditional check on CONFIG_EFI_SOFT_RESERVE as dax_hmem
  depends on CONFIG_EFI_SOFT_RESERVE. (Zhijian)
- Added TODO note. (Zhijian)
- Included region_intersects_soft_reserve() inside CONFIG_EFI_SOFT_RESERVE
  conditional check. (Zhijian)
- insert_resource_late() -> insert_resource_expand_to_fit() and
  __insert_resource_expand_to_fit() replacement. (Boris)
- Fixed Co-developed and Signed-off by. (Dan)
- Combined 2/6 and 3/6 into a single patch. (Zhijian).
- Skip local variable in remove_soft_reserved. (Jonathan)
- Drop kfree with __free(). (Jonathan)
- return 0 -> return dev_add_action_or_reset(host...) (Jonathan)
- Dropped 6/6.
- Reviewed-by tags (Dave, Jonathan)

Dan Williams (3):
  dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved
    ranges
  dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
  dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding

Smita Koralahalli (6):
  dax/bus: Use dax_region_put() in alloc_dax_region() error path
  dax/hmem: Factor HMEM registration into __hmem_register_device()
  dax: Track all dax_region allocations under a global resource tree
  cxl/region: Add helper to check Soft Reserved containment by CXL
    regions
  dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree

 drivers/cxl/core/region.c |  30 ++++++++
 drivers/dax/Kconfig       |   2 +
 drivers/dax/Makefile      |   3 +-
 drivers/dax/bus.c         |  20 +++++-
 drivers/dax/bus.h         |   7 ++
 drivers/dax/cxl.c         |  28 +++++++-
 drivers/dax/hmem/device.c |   3 +
 drivers/dax/hmem/hmem.c   | 146 +++++++++++++++++++++++++++++++++-----
 include/cxl/cxl.h         |  15 ++++
 9 files changed, 231 insertions(+), 23 deletions(-)
 create mode 100644 include/cxl/cxl.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-23 17:11   ` Dave Jiang
                     ` (2 more replies)
  2026-03-22 19:53 ` [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device() Smita Koralahalli
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

alloc_dax_region() calls kref_init() on the dax_region early in the
function, but the error path for sysfs_create_groups() failure uses
kfree() directly to free the dax_region. This bypasses the kref lifecycle.

Use dax_region_put() instead to handle kref lifecycle correctly.

Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/dax/bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index c94c09622516..299134c9b294 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -668,7 +668,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 	};
 
 	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
-		kfree(dax_region);
+		dax_region_put(dax_region);
 		return NULL;
 	}
 
-- 
2.17.1



* [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device()
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
  2026-03-22 19:53 ` [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-23 17:14   ` Dave Jiang
  2026-03-23 17:59   ` Jonathan Cameron
  2026-03-22 19:53 ` [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Separate the CXL overlap check from the HMEM registration path and keep
the platform-device setup in a dedicated __hmem_register_device().

This makes hmem_register_device() the policy entry point for deciding
whether a range should be deferred to CXL, while __hmem_register_device()
handles the HMEM registration flow.

No functional changes.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/dax/hmem/hmem.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 1cf7c2a0ee1c..a3d45032355c 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -58,21 +58,14 @@ static void release_hmem(void *pdev)
 	platform_device_unregister(pdev);
 }
 
-static int hmem_register_device(struct device *host, int target_nid,
-				const struct resource *res)
+static int __hmem_register_device(struct device *host, int target_nid,
+				  const struct resource *res)
 {
 	struct platform_device *pdev;
 	struct memregion_info info;
 	long id;
 	int rc;
 
-	if (IS_ENABLED(CONFIG_CXL_REGION) &&
-	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
-			      IORES_DESC_CXL) != REGION_DISJOINT) {
-		dev_dbg(host, "deferring range to CXL: %pr\n", res);
-		return 0;
-	}
-
 	rc = region_intersects_soft_reserve(res->start, resource_size(res));
 	if (rc != REGION_INTERSECTS)
 		return 0;
@@ -123,6 +116,19 @@ static int hmem_register_device(struct device *host, int target_nid,
 	return rc;
 }
 
+static int hmem_register_device(struct device *host, int target_nid,
+				const struct resource *res)
+{
+	if (IS_ENABLED(CONFIG_CXL_REGION) &&
+	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			      IORES_DESC_CXL) != REGION_DISJOINT) {
+		dev_dbg(host, "deferring range to CXL: %pr\n", res);
+		return 0;
+	}
+
+	return __hmem_register_device(host, target_nid, res);
+}
+
 static int dax_hmem_platform_probe(struct platform_device *pdev)
 {
 	return walk_hmem_resources(&pdev->dev, hmem_register_device);
-- 
2.17.1



* [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
  2026-03-22 19:53 ` [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path Smita Koralahalli
  2026-03-22 19:53 ` [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device() Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-23 19:54   ` Dan Williams
  2026-03-22 19:53 ` [PATCH v8 4/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

From: Dan Williams <dan.j.williams@intel.com>

Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
Reserved ranges.

Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
loading, it does not enforce that the dependency has finished init
before the current module runs. This can cause HMEM to start before
cxl_acpi has populated the resource tree, breaking detection of overlaps
between Soft Reserved and CXL Windows.

Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
that trigger further module loads. Asynchronous probe flushing
(wait_for_device_probe()) is added later in the series in a deferred
context before HMEM makes ownership decisions for Soft Reserved ranges.

Additionally, add explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
Soft Reserved ranges before the CXL drivers have had a chance to claim them.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/dax/Kconfig     |  2 ++
 drivers/dax/hmem/hmem.c | 17 ++++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d656e4c0eb84..3683bb3f2311 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -48,6 +48,8 @@ config DEV_DAX_CXL
 	tristate "CXL DAX: direct access to CXL RAM regions"
 	depends on CXL_BUS && CXL_REGION && DEV_DAX
 	default CXL_REGION && DEV_DAX
+	depends on CXL_ACPI >= DEV_DAX_HMEM
+	depends on CXL_PCI >= DEV_DAX_HMEM
 	help
 	  CXL RAM regions are either mapped by platform-firmware
 	  and published in the initial system-memory map as "System RAM", mapped
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index a3d45032355c..85e751675f65 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -145,6 +145,16 @@ static __init int dax_hmem_init(void)
 {
 	int rc;
 
+	/*
+	 * Ensure that cxl_acpi and cxl_pci have a chance to kick off
+	 * CXL topology discovery at least once before scanning the
+	 * iomem resource tree for IORES_DESC_CXL resources.
+	 */
+	if (IS_ENABLED(CONFIG_DEV_DAX_CXL)) {
+		request_module("cxl_acpi");
+		request_module("cxl_pci");
+	}
+
 	rc = platform_driver_register(&dax_hmem_platform_driver);
 	if (rc)
 		return rc;
@@ -165,13 +175,6 @@ static __exit void dax_hmem_exit(void)
 module_init(dax_hmem_init);
 module_exit(dax_hmem_exit);
 
-/* Allow for CXL to define its own dax regions */
-#if IS_ENABLED(CONFIG_CXL_REGION)
-#if IS_MODULE(CONFIG_CXL_ACPI)
-MODULE_SOFTDEP("pre: cxl_acpi");
-#endif
-#endif
-
 MODULE_ALIAS("platform:hmem*");
 MODULE_ALIAS("platform:hmem_platform*");
 MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
-- 
2.17.1



* [PATCH v8 4/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (2 preceding siblings ...)
  2026-03-22 19:53 ` [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-22 19:53 ` [PATCH v8 5/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

From: Dan Williams <dan.j.williams@intel.com>

Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL)
so that HMEM only defers Soft Reserved ranges when CXL DAX support is
enabled. This makes the coordination between HMEM and the CXL stack more
precise and prevents deferral in unrelated CXL configurations.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/dax/hmem/hmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 85e751675f65..ca752db03201 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -119,7 +119,7 @@ static int __hmem_register_device(struct device *host, int target_nid,
 static int hmem_register_device(struct device *host, int target_nid,
 				const struct resource *res)
 {
-	if (IS_ENABLED(CONFIG_CXL_REGION) &&
+	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
 	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
 			      IORES_DESC_CXL) != REGION_DISJOINT) {
 		dev_dbg(host, "deferring range to CXL: %pr\n", res);
-- 
2.17.1



* [PATCH v8 5/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (3 preceding siblings ...)
  2026-03-22 19:53 ` [PATCH v8 4/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-22 19:53 ` [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

From: Dan Williams <dan.j.williams@intel.com>

Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
dax_cxl.

In addition, defer registration of the dax_cxl driver to a workqueue
instead of using module_cxl_driver(). This ensures that dax_hmem has
an opportunity to initialize and register its deferred callback and make
ownership decisions before dax_cxl begins probing and claiming Soft
Reserved ranges.

Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
out of line from other synchronous probing, avoiding ordering
dependencies while coordinating ownership decisions with dax_hmem.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/dax/Makefile |  3 +--
 drivers/dax/cxl.c    | 27 ++++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 5ed5c39857c8..70e996bf1526 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+obj-y += hmem/
 obj-$(CONFIG_DAX) += dax.o
 obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
@@ -10,5 +11,3 @@ dax-y += bus.o
 device_dax-y := device.o
 dax_pmem-y := pmem.o
 dax_cxl-y := cxl.o
-
-obj-y += hmem/
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 13cd94d32ff7..a2136adfa186 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -38,10 +38,35 @@ static struct cxl_driver cxl_dax_region_driver = {
 	.id = CXL_DEVICE_DAX_REGION,
 	.drv = {
 		.suppress_bind_attrs = true,
+		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
 	},
 };
 
-module_cxl_driver(cxl_dax_region_driver);
+static void cxl_dax_region_driver_register(struct work_struct *work)
+{
+	cxl_driver_register(&cxl_dax_region_driver);
+}
+
+static DECLARE_WORK(cxl_dax_region_driver_work, cxl_dax_region_driver_register);
+
+static int __init cxl_dax_region_init(void)
+{
+	/*
+	 * Need to resolve a race with dax_hmem wanting to drive regions
+	 * instead of CXL
+	 */
+	queue_work(system_long_wq, &cxl_dax_region_driver_work);
+	return 0;
+}
+module_init(cxl_dax_region_init);
+
+static void __exit cxl_dax_region_exit(void)
+{
+	flush_work(&cxl_dax_region_driver_work);
+	cxl_driver_unregister(&cxl_dax_region_driver);
+}
+module_exit(cxl_dax_region_exit);
+
 MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
 MODULE_DESCRIPTION("CXL DAX: direct access to CXL regions");
 MODULE_LICENSE("GPL");
-- 
2.17.1



* [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (4 preceding siblings ...)
  2026-03-22 19:53 ` [PATCH v8 5/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-23 17:31   ` Dave Jiang
  2026-03-23 20:55   ` Dan Williams
  2026-03-22 19:53 ` [PATCH v8 7/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Introduce a global "DAX Regions" resource root and register each
dax_region->res under it via request_resource(). Release the resource on
dax_region teardown.

By enforcing a single global namespace for dax_region allocations, this
ensures only one of dax_hmem or dax_cxl can successfully register a
dax_region for a given range.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/dax/bus.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 299134c9b294..68437c05e21d 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -10,6 +10,7 @@
 #include "dax-private.h"
 #include "bus.h"
 
+static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
 static DEFINE_MUTEX(dax_bus_lock);
 
 /*
@@ -627,6 +628,7 @@ static void dax_region_unregister(void *region)
 
 	sysfs_remove_groups(&dax_region->dev->kobj,
 			dax_region_attribute_groups);
+	release_resource(&dax_region->res);
 	dax_region_put(dax_region);
 }
 
@@ -635,6 +637,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		unsigned long flags)
 {
 	struct dax_region *dax_region;
+	int rc;
 
 	/*
 	 * The DAX core assumes that it can store its private data in
@@ -667,14 +670,25 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		.flags = IORESOURCE_MEM | flags,
 	};
 
-	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
-		dax_region_put(dax_region);
-		return NULL;
+	rc = request_resource(&dax_regions, &dax_region->res);
+	if (rc) {
+		dev_dbg(parent, "dax_region resource conflict for %pR\n",
+			&dax_region->res);
+		goto err_res;
 	}
 
+	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
+		goto err_sysfs;
+
 	if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
 		return NULL;
 	return dax_region;
+
+err_sysfs:
+	release_resource(&dax_region->res);
+err_res:
+	dax_region_put(dax_region);
+	return NULL;
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);
 
-- 
2.17.1



* [PATCH v8 7/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (5 preceding siblings ...)
  2026-03-22 19:53 ` [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-22 19:53 ` [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
  2026-03-22 19:53 ` [PATCH v8 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
  8 siblings, 0 replies; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Add a helper to determine whether a given Soft Reserved memory range is
fully contained within a committed CXL region.

This helper provides a primitive for policy decisions in subsequent
patches, such as coordinating with dax_hmem to determine whether CXL has
fully claimed ownership of a Soft Reserved memory range.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++++++++
 include/cxl/cxl.h         | 15 +++++++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 include/cxl/cxl.h

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 42874948b589..f7b20f60ac5c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -12,6 +12,7 @@
 #include <linux/idr.h>
 #include <linux/memory-tiers.h>
 #include <linux/string_choices.h>
+#include <cxl/cxl.h>
 #include <cxlmem.h>
 #include <cxl.h>
 #include "core.h"
@@ -4173,6 +4174,35 @@ static int cxl_region_setup_poison(struct cxl_region *cxlr)
 	return devm_add_action_or_reset(dev, remove_debugfs, dentry);
 }
 
+static int region_contains_resource(struct device *dev, void *data)
+{
+	struct resource *res = data;
+	struct cxl_region *cxlr;
+	struct cxl_region_params *p;
+
+	if (!is_cxl_region(dev))
+		return 0;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+
+	if (p->state != CXL_CONFIG_COMMIT)
+		return 0;
+
+	if (!p->res)
+		return 0;
+
+	return resource_contains(p->res, res) ? 1 : 0;
+}
+
+bool cxl_region_contains_resource(struct resource *res)
+{
+	guard(rwsem_read)(&cxl_rwsem.region);
+	return bus_for_each_dev(&cxl_bus_type, NULL, res,
+				region_contains_resource) != 0;
+}
+EXPORT_SYMBOL_GPL(cxl_region_contains_resource);
+
 static int cxl_region_can_probe(struct cxl_region *cxlr)
 {
 	struct cxl_region_params *p = &cxlr->params;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
new file mode 100644
index 000000000000..b12d3d0f6658
--- /dev/null
+++ b/include/cxl/cxl.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2026 Advanced Micro Devices, Inc. */
+#ifndef _CXL_H_
+#define _CXL_H_
+
+#ifdef CONFIG_CXL_REGION
+bool cxl_region_contains_resource(struct resource *res);
+#else
+static inline bool cxl_region_contains_resource(struct resource *res)
+{
+	return false;
+}
+#endif
+
+#endif /* _CXL_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (6 preceding siblings ...)
  2026-03-22 19:53 ` [PATCH v8 7/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-23 18:03   ` Jonathan Cameron
                     ` (2 more replies)
  2026-03-22 19:53 ` [PATCH v8 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
  8 siblings, 3 replies; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

The current probe-time ownership check for Soft Reserved memory, based
solely on CXL window intersection, is insufficient. dax_hmem probing is
not guaranteed to run after CXL enumeration and region assembly, so
ownership decisions can be made before the CXL stack has finished
publishing windows and assembling committed regions.

Introduce deferred ownership handling for Soft Reserved ranges that
intersect CXL windows. When such a range is encountered during the
initial dax_hmem probe, schedule deferred work to wait for the CXL stack
to complete enumeration and region assembly before deciding ownership.

Once the deferred work runs, evaluate each Soft Reserved range
individually: if a CXL region fully contains the range, skip it and let
dax_cxl bind; otherwise, register it with dax_hmem. This per-range
ownership model avoids the need for CXL region teardown, and the
resource exclusion in alloc_dax_region() prevents double claiming.

Introduce a boolean flag, dax_hmem_initial_probe, in device.c so that it
survives module reload. Ensure dax_cxl defers driver registration until
dax_hmem has completed ownership resolution: dax_cxl calls
dax_hmem_flush_work() before cxl_driver_register(), which both waits for
the deferred work to complete and creates a module symbol dependency
that forces dax_hmem.ko to load before dax_cxl.ko.

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/dax/bus.h         |  7 ++++
 drivers/dax/cxl.c         |  1 +
 drivers/dax/hmem/device.c |  3 ++
 drivers/dax/hmem/hmem.c   | 74 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+)

diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index cbbf64443098..ebbfe2d6da14 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -49,6 +49,13 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
 void kill_dev_dax(struct dev_dax *dev_dax);
 bool static_dev_dax(struct dev_dax *dev_dax);
 
+#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
+extern bool dax_hmem_initial_probe;
+void dax_hmem_flush_work(void);
+#else
+static inline void dax_hmem_flush_work(void) { }
+#endif
+
 #define MODULE_ALIAS_DAX_DEVICE(type) \
 	MODULE_ALIAS("dax:t" __stringify(type) "*")
 #define DAX_DEVICE_MODALIAS_FMT "dax:t%d"
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index a2136adfa186..3ab39b77843d 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
 
 static void cxl_dax_region_driver_register(struct work_struct *work)
 {
+	dax_hmem_flush_work();
 	cxl_driver_register(&cxl_dax_region_driver);
 }
 
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 56e3cbd181b5..991a4bf7d969 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -8,6 +8,9 @@
 static bool nohmem;
 module_param_named(disable, nohmem, bool, 0444);
 
+bool dax_hmem_initial_probe;
+EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
+
 static bool platform_initialized;
 static DEFINE_MUTEX(hmem_resource_lock);
 static struct resource hmem_active = {
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index ca752db03201..9ceda6b5cadf 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -3,6 +3,7 @@
 #include <linux/memregion.h>
 #include <linux/module.h>
 #include <linux/dax.h>
+#include <cxl/cxl.h>
 #include "../bus.h"
 
 static bool region_idle;
@@ -58,6 +59,23 @@ static void release_hmem(void *pdev)
 	platform_device_unregister(pdev);
 }
 
+struct dax_defer_work {
+	struct platform_device *pdev;
+	struct work_struct work;
+};
+
+static void process_defer_work(struct work_struct *w);
+
+static struct dax_defer_work dax_hmem_work = {
+	.work = __WORK_INITIALIZER(dax_hmem_work.work, process_defer_work),
+};
+
+void dax_hmem_flush_work(void)
+{
+	flush_work(&dax_hmem_work.work);
+}
+EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
+
 static int __hmem_register_device(struct device *host, int target_nid,
 				  const struct resource *res)
 {
@@ -122,6 +140,11 @@ static int hmem_register_device(struct device *host, int target_nid,
 	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
 	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
 			      IORES_DESC_CXL) != REGION_DISJOINT) {
+		if (!dax_hmem_initial_probe) {
+			dev_dbg(host, "await CXL initial probe: %pr\n", res);
+			queue_work(system_long_wq, &dax_hmem_work.work);
+			return 0;
+		}
 		dev_dbg(host, "deferring range to CXL: %pr\n", res);
 		return 0;
 	}
@@ -129,8 +152,54 @@ static int hmem_register_device(struct device *host, int target_nid,
 	return __hmem_register_device(host, target_nid, res);
 }
 
+static int hmem_register_cxl_device(struct device *host, int target_nid,
+				    const struct resource *res)
+{
+	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			      IORES_DESC_CXL) == REGION_DISJOINT)
+		return 0;
+
+	if (cxl_region_contains_resource((struct resource *)res)) {
+		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
+		return 0;
+	}
+
+	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
+	return __hmem_register_device(host, target_nid, res);
+}
+
+static void process_defer_work(struct work_struct *w)
+{
+	struct dax_defer_work *work = container_of(w, typeof(*work), work);
+	struct platform_device *pdev;
+
+	if (!work->pdev)
+		return;
+
+	pdev = work->pdev;
+
+	/* Relies on cxl_acpi and cxl_pci having had a chance to load */
+	wait_for_device_probe();
+
+	guard(device)(&pdev->dev);
+	if (!pdev->dev.driver)
+		return;
+
+	if (!dax_hmem_initial_probe) {
+		dax_hmem_initial_probe = true;
+		walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
+	}
+}
+
 static int dax_hmem_platform_probe(struct platform_device *pdev)
 {
+	if (work_pending(&dax_hmem_work.work))
+		return -EBUSY;
+
+	if (!dax_hmem_work.pdev)
+		dax_hmem_work.pdev =
+			to_platform_device(get_device(&pdev->dev));
+
 	return walk_hmem_resources(&pdev->dev, hmem_register_device);
 }
 
@@ -168,6 +237,11 @@ static __init int dax_hmem_init(void)
 
 static __exit void dax_hmem_exit(void)
 {
+	if (dax_hmem_work.pdev) {
+		flush_work(&dax_hmem_work.work);
+		put_device(&dax_hmem_work.pdev->dev);
+	}
+
 	platform_driver_unregister(&dax_hmem_driver);
 	platform_driver_unregister(&dax_hmem_platform_driver);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v8 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
  2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
                   ` (7 preceding siblings ...)
  2026-03-22 19:53 ` [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
@ 2026-03-22 19:53 ` Smita Koralahalli
  2026-03-23 21:09   ` Dan Williams
  8 siblings, 1 reply; 26+ messages in thread
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Reworked from a patch by Alison Schofield <alison.schofield@intel.com>

Reintroduce Soft Reserved ranges into the iomem_resource tree for HMEM
to consume.

This restores visibility in /proc/iomem for ranges actively in use, while
avoiding the early-boot conflicts that occurred when Soft Reserved was
published into iomem before CXL window and region discovery.

Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/hmem/hmem.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 9ceda6b5cadf..b590e1251bb8 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -76,6 +76,33 @@ void dax_hmem_flush_work(void)
 }
 EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
 
+static void remove_soft_reserved(void *r)
+{
+	remove_resource(r);
+	kfree(r);
+}
+
+static int add_soft_reserve_into_iomem(struct device *host,
+				       const struct resource *res)
+{
+	int rc;
+
+	struct resource *soft __free(kfree) = kmalloc_obj(*soft);
+	if (!soft)
+		return -ENOMEM;
+
+	*soft = DEFINE_RES_NAMED_DESC(res->start, (res->end - res->start + 1),
+				      "Soft Reserved", IORESOURCE_MEM,
+				      IORES_DESC_SOFT_RESERVED);
+
+	rc = insert_resource(&iomem_resource, soft);
+	if (rc)
+		return rc;
+
+	return devm_add_action_or_reset(host, remove_soft_reserved,
+					no_free_ptr(soft));
+}
+
 static int __hmem_register_device(struct device *host, int target_nid,
 				  const struct resource *res)
 {
@@ -88,7 +115,9 @@ static int __hmem_register_device(struct device *host, int target_nid,
 	if (rc != REGION_INTERSECTS)
 		return 0;
 
-	/* TODO: Add Soft-Reserved memory back to iomem */
+	rc = add_soft_reserve_into_iomem(host, res);
+	if (rc)
+		return rc;
 
 	id = memregion_alloc(GFP_KERNEL);
 	if (id < 0) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path
  2026-03-22 19:53 ` [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path Smita Koralahalli
@ 2026-03-23 17:11   ` Dave Jiang
  2026-03-23 17:57   ` Jonathan Cameron
  2026-03-23 19:37   ` Dan Williams
  2 siblings, 0 replies; 26+ messages in thread
From: Dave Jiang @ 2026-03-23 17:11 UTC (permalink / raw)
  To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski



On 3/22/26 12:53 PM, Smita Koralahalli wrote:
> alloc_dax_region() calls kref_init() on the dax_region early in the
> function, but the error path for sysfs_create_groups() failure uses
> kfree() directly to free the dax_region. This bypasses the kref lifecycle.
> 
> Use dax_region_put() instead to handle kref lifecycle correctly.
> 
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/dax/bus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index c94c09622516..299134c9b294 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -668,7 +668,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  	};
>  
>  	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
> -		kfree(dax_region);
> +		dax_region_put(dax_region);
>  		return NULL;
>  	}
>  


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device()
  2026-03-22 19:53 ` [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device() Smita Koralahalli
@ 2026-03-23 17:14   ` Dave Jiang
  2026-03-23 17:59   ` Jonathan Cameron
  1 sibling, 0 replies; 26+ messages in thread
From: Dave Jiang @ 2026-03-23 17:14 UTC (permalink / raw)
  To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski



On 3/22/26 12:53 PM, Smita Koralahalli wrote:
> Separate the CXL overlap check from the HMEM registration path and keep
> the platform-device setup in a dedicated __hmem_register_device().
> 
> This makes hmem_register_device() the policy entry point for deciding
> whether a range should be deferred to CXL, while __hmem_register_device()
> handles the HMEM registration flow.
> 
> No functional changes.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/dax/hmem/hmem.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1cf7c2a0ee1c..a3d45032355c 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -58,21 +58,14 @@ static void release_hmem(void *pdev)
>  	platform_device_unregister(pdev);
>  }
>  
> -static int hmem_register_device(struct device *host, int target_nid,
> -				const struct resource *res)
> +static int __hmem_register_device(struct device *host, int target_nid,
> +				  const struct resource *res)
>  {
>  	struct platform_device *pdev;
>  	struct memregion_info info;
>  	long id;
>  	int rc;
>  
> -	if (IS_ENABLED(CONFIG_CXL_REGION) &&
> -	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> -			      IORES_DESC_CXL) != REGION_DISJOINT) {
> -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> -		return 0;
> -	}
> -
>  	rc = region_intersects_soft_reserve(res->start, resource_size(res));
>  	if (rc != REGION_INTERSECTS)
>  		return 0;
> @@ -123,6 +116,19 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	return rc;
>  }
>  
> +static int hmem_register_device(struct device *host, int target_nid,
> +				const struct resource *res)
> +{
> +	if (IS_ENABLED(CONFIG_CXL_REGION) &&
> +	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +		return 0;
> +	}
> +
> +	return __hmem_register_device(host, target_nid, res);
> +}
> +
>  static int dax_hmem_platform_probe(struct platform_device *pdev)
>  {
>  	return walk_hmem_resources(&pdev->dev, hmem_register_device);


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree
  2026-03-22 19:53 ` [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
@ 2026-03-23 17:31   ` Dave Jiang
  2026-03-23 20:55   ` Dan Williams
  1 sibling, 0 replies; 26+ messages in thread
From: Dave Jiang @ 2026-03-23 17:31 UTC (permalink / raw)
  To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski



On 3/22/26 12:53 PM, Smita Koralahalli wrote:
> Introduce a global "DAX Regions" resource root and register each
> dax_region->res under it via request_resource(). Release the resource on
> dax_region teardown.
> 
> By enforcing a single global namespace for dax_region allocations, this
> ensures only one of dax_hmem or dax_cxl can successfully register a
> dax_region for a given range.
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/dax/bus.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 299134c9b294..68437c05e21d 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -10,6 +10,7 @@
>  #include "dax-private.h"
>  #include "bus.h"
>  
> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
>  static DEFINE_MUTEX(dax_bus_lock);
>  
>  /*
> @@ -627,6 +628,7 @@ static void dax_region_unregister(void *region)
>  
>  	sysfs_remove_groups(&dax_region->dev->kobj,
>  			dax_region_attribute_groups);
> +	release_resource(&dax_region->res);
>  	dax_region_put(dax_region);
>  }
>  
> @@ -635,6 +637,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  		unsigned long flags)
>  {
>  	struct dax_region *dax_region;
> +	int rc;
>  
>  	/*
>  	 * The DAX core assumes that it can store its private data in
> @@ -667,14 +670,25 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  		.flags = IORESOURCE_MEM | flags,
>  	};
>  
> -	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
> -		dax_region_put(dax_region);
> -		return NULL;
> +	rc = request_resource(&dax_regions, &dax_region->res);
> +	if (rc) {
> +		dev_dbg(parent, "dax_region resource conflict for %pR\n",
> +			&dax_region->res);
> +		goto err_res;
>  	}
>  
> +	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
> +		goto err_sysfs;
> +
>  	if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
>  		return NULL;
>  	return dax_region;
> +
> +err_sysfs:
> +	release_resource(&dax_region->res);
> +err_res:
> +	dax_region_put(dax_region);
> +	return NULL;
>  }
>  EXPORT_SYMBOL_GPL(alloc_dax_region);
>  


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path
  2026-03-22 19:53 ` [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path Smita Koralahalli
  2026-03-23 17:11   ` Dave Jiang
@ 2026-03-23 17:57   ` Jonathan Cameron
  2026-03-23 19:37   ` Dan Williams
  2 siblings, 0 replies; 26+ messages in thread
From: Jonathan Cameron @ 2026-03-23 17:57 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On Sun, 22 Mar 2026 19:53:34 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> alloc_dax_region() calls kref_init() on the dax_region early in the
> function, but the error path for sysfs_create_groups() failure uses
> kfree() directly to free the dax_region. This bypasses the kref lifecycle.
> 
> Use dax_region_put() instead to handle kref lifecycle correctly.
> 
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  drivers/dax/bus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index c94c09622516..299134c9b294 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -668,7 +668,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  	};
>  
>  	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
> -		kfree(dax_region);
> +		dax_region_put(dax_region);
>  		return NULL;
>  	}
>  


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device()
  2026-03-22 19:53 ` [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device() Smita Koralahalli
  2026-03-23 17:14   ` Dave Jiang
@ 2026-03-23 17:59   ` Jonathan Cameron
  1 sibling, 0 replies; 26+ messages in thread
From: Jonathan Cameron @ 2026-03-23 17:59 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On Sun, 22 Mar 2026 19:53:35 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> Separate the CXL overlap check from the HMEM registration path and keep
> the platform-device setup in a dedicated __hmem_register_device().
> 
> This makes hmem_register_device() the policy entry point for deciding
> whether a range should be deferred to CXL, while __hmem_register_device()
> handles the HMEM registration flow.
> 
> No functional changes.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-22 19:53 ` [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
@ 2026-03-23 18:03   ` Jonathan Cameron
  2026-03-23 18:13   ` Jonathan Cameron
  2026-03-23 18:17   ` Dave Jiang
  2 siblings, 0 replies; 26+ messages in thread
From: Jonathan Cameron @ 2026-03-23 18:03 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On Sun, 22 Mar 2026 19:53:41 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> The current probe time ownership check for Soft Reserved memory based
> solely on CXL window intersection is insufficient. dax_hmem probing is not
> always guaranteed to run after CXL enumeration and region assembly, which
> can lead to incorrect ownership decisions before the CXL stack has
> finished publishing windows and assembling committed regions.
> 
> Introduce deferred ownership handling for Soft Reserved ranges that
> intersect CXL windows. When such a range is encountered during the
> initial dax_hmem probe, schedule deferred work to wait for the CXL stack
> to complete enumeration and region assembly before deciding ownership.
> 
> Once the deferred work runs, evaluate each Soft Reserved range
> individually: if a CXL region fully contains the range, skip it and let
> dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
> ownership model avoids the need for CXL region teardown and
> alloc_dax_region() resource exclusion prevents double claiming.
> 
> Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
> so it survives module reload. Ensure dax_cxl defers driver registration
> until dax_hmem has completed ownership resolution. dax_cxl calls
> dax_hmem_flush_work() before cxl_driver_register(), which both waits for
> the deferred work to complete and creates a module symbol dependency that
> forces dax_hmem.ko to load before dax_cxl.
> 
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-22 19:53 ` [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
  2026-03-23 18:03   ` Jonathan Cameron
@ 2026-03-23 18:13   ` Jonathan Cameron
  2026-03-24 21:50     ` Koralahalli Channabasappa, Smita
  2026-03-23 18:17   ` Dave Jiang
  2 siblings, 1 reply; 26+ messages in thread
From: Jonathan Cameron @ 2026-03-23 18:13 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

On Sun, 22 Mar 2026 19:53:41 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> The current probe time ownership check for Soft Reserved memory based
> solely on CXL window intersection is insufficient. dax_hmem probing is not
> always guaranteed to run after CXL enumeration and region assembly, which
> can lead to incorrect ownership decisions before the CXL stack has
> finished publishing windows and assembling committed regions.
> 
> Introduce deferred ownership handling for Soft Reserved ranges that
> intersect CXL windows. When such a range is encountered during the
> initial dax_hmem probe, schedule deferred work to wait for the CXL stack
> to complete enumeration and region assembly before deciding ownership.
> 
> Once the deferred work runs, evaluate each Soft Reserved range
> individually: if a CXL region fully contains the range, skip it and let
> dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
> ownership model avoids the need for CXL region teardown and
> alloc_dax_region() resource exclusion prevents double claiming.
> 
> Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
> so it survives module reload. Ensure dax_cxl defers driver registration
> until dax_hmem has completed ownership resolution. dax_cxl calls
> dax_hmem_flush_work() before cxl_driver_register(), which both waits for
> the deferred work to complete and creates a module symbol dependency that
> forces dax_hmem.ko to load before dax_cxl.
> 
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

https://sashiko.dev/#/patchset/20260322195343.206900-1-Smita.KoralahalliChannabasappa%40amd.com
Might be worth a look.  I think the last comment is potentially correct,
though it's unlikely that a platform_driver_register() actually fails.

I've not looked too closely at the others. Given this was doing something
unusual I thought I'd see what it found. Looks like some interesting
questions if nothing else.

> ---
>  drivers/dax/bus.h         |  7 ++++
>  drivers/dax/cxl.c         |  1 +
>  drivers/dax/hmem/device.c |  3 ++
>  drivers/dax/hmem/hmem.c   | 74 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 85 insertions(+)
> 
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index cbbf64443098..ebbfe2d6da14 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -49,6 +49,13 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
>  void kill_dev_dax(struct dev_dax *dev_dax);
>  bool static_dev_dax(struct dev_dax *dev_dax);
>  
> +#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
> +extern bool dax_hmem_initial_probe;
> +void dax_hmem_flush_work(void);
> +#else
> +static inline void dax_hmem_flush_work(void) { }
> +#endif
> +
>  #define MODULE_ALIAS_DAX_DEVICE(type) \
>  	MODULE_ALIAS("dax:t" __stringify(type) "*")
>  #define DAX_DEVICE_MODALIAS_FMT "dax:t%d"
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index a2136adfa186..3ab39b77843d 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>  
>  static void cxl_dax_region_driver_register(struct work_struct *work)
>  {
> +	dax_hmem_flush_work();
>  	cxl_driver_register(&cxl_dax_region_driver);
>  }
>  
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index 56e3cbd181b5..991a4bf7d969 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -8,6 +8,9 @@
>  static bool nohmem;
>  module_param_named(disable, nohmem, bool, 0444);
>  
> +bool dax_hmem_initial_probe;
> +EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
> +
>  static bool platform_initialized;
>  static DEFINE_MUTEX(hmem_resource_lock);
>  static struct resource hmem_active = {
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index ca752db03201..9ceda6b5cadf 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,6 +3,7 @@
>  #include <linux/memregion.h>
>  #include <linux/module.h>
>  #include <linux/dax.h>
> +#include <cxl/cxl.h>
>  #include "../bus.h"
>  
>  static bool region_idle;
> @@ -58,6 +59,23 @@ static void release_hmem(void *pdev)
>  	platform_device_unregister(pdev);
>  }
>  
> +struct dax_defer_work {
> +	struct platform_device *pdev;
> +	struct work_struct work;
> +};
> +
> +static void process_defer_work(struct work_struct *w);
> +
> +static struct dax_defer_work dax_hmem_work = {
> +	.work = __WORK_INITIALIZER(dax_hmem_work.work, process_defer_work),
> +};
> +
> +void dax_hmem_flush_work(void)
> +{
> +	flush_work(&dax_hmem_work.work);
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
> +
>  static int __hmem_register_device(struct device *host, int target_nid,
>  				  const struct resource *res)
>  {
> @@ -122,6 +140,11 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>  	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>  			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +		if (!dax_hmem_initial_probe) {
> +			dev_dbg(host, "await CXL initial probe: %pr\n", res);
> +			queue_work(system_long_wq, &dax_hmem_work.work);
> +			return 0;
> +		}
>  		dev_dbg(host, "deferring range to CXL: %pr\n", res);
>  		return 0;
>  	}
> @@ -129,8 +152,54 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	return __hmem_register_device(host, target_nid, res);
>  }
>  
> +static int hmem_register_cxl_device(struct device *host, int target_nid,
> +				    const struct resource *res)
> +{
> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) == REGION_DISJOINT)
> +		return 0;
> +
> +	if (cxl_region_contains_resource((struct resource *)res)) {
> +		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
> +		return 0;
> +	}
> +
> +	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
> +	return __hmem_register_device(host, target_nid, res);
> +}
> +
> +static void process_defer_work(struct work_struct *w)
> +{
> +	struct dax_defer_work *work = container_of(w, typeof(*work), work);
> +	struct platform_device *pdev;
> +
> +	if (!work->pdev)
> +		return;
> +
> +	pdev = work->pdev;
> +
> +	/* Relies on cxl_acpi and cxl_pci having had a chance to load */
> +	wait_for_device_probe();
> +
> +	guard(device)(&pdev->dev);
> +	if (!pdev->dev.driver)
> +		return;
> +
> +	if (!dax_hmem_initial_probe) {
> +		dax_hmem_initial_probe = true;
> +		walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
> +	}
> +}
> +
>  static int dax_hmem_platform_probe(struct platform_device *pdev)
>  {
> +	if (work_pending(&dax_hmem_work.work))
> +		return -EBUSY;
> +
> +	if (!dax_hmem_work.pdev)
> +		dax_hmem_work.pdev =
> +			to_platform_device(get_device(&pdev->dev));
> +
>  	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>  }
>  
> @@ -168,6 +237,11 @@ static __init int dax_hmem_init(void)
>  
>  static __exit void dax_hmem_exit(void)
>  {
> +	if (dax_hmem_work.pdev) {
> +		flush_work(&dax_hmem_work.work);
> +		put_device(&dax_hmem_work.pdev->dev);
> +	}
> +
>  	platform_driver_unregister(&dax_hmem_driver);
>  	platform_driver_unregister(&dax_hmem_platform_driver);
>  }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-22 19:53 ` [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
  2026-03-23 18:03   ` Jonathan Cameron
  2026-03-23 18:13   ` Jonathan Cameron
@ 2026-03-23 18:17   ` Dave Jiang
  2 siblings, 0 replies; 26+ messages in thread
From: Dave Jiang @ 2026-03-23 18:17 UTC (permalink / raw)
  To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski



On 3/22/26 12:53 PM, Smita Koralahalli wrote:
> The current probe time ownership check for Soft Reserved memory based
> solely on CXL window intersection is insufficient. dax_hmem probing is not
> always guaranteed to run after CXL enumeration and region assembly, which
> can lead to incorrect ownership decisions before the CXL stack has
> finished publishing windows and assembling committed regions.
> 
> Introduce deferred ownership handling for Soft Reserved ranges that
> intersect CXL windows. When such a range is encountered during the
> initial dax_hmem probe, schedule deferred work to wait for the CXL stack
> to complete enumeration and region assembly before deciding ownership.
> 
> Once the deferred work runs, evaluate each Soft Reserved range
> individually: if a CXL region fully contains the range, skip it and let
> dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
> ownership model avoids the need for CXL region teardown and
> alloc_dax_region() resource exclusion prevents double claiming.
> 
> Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
> so it survives module reload. Ensure dax_cxl defers driver registration
> until dax_hmem has completed ownership resolution. dax_cxl calls
> dax_hmem_flush_work() before cxl_driver_register(), which both waits for
> the deferred work to complete and creates a module symbol dependency that
> forces dax_hmem.ko to load before dax_cxl.
> 
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>


> ---
>  drivers/dax/bus.h         |  7 ++++
>  drivers/dax/cxl.c         |  1 +
>  drivers/dax/hmem/device.c |  3 ++
>  drivers/dax/hmem/hmem.c   | 74 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 85 insertions(+)
> 
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index cbbf64443098..ebbfe2d6da14 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -49,6 +49,13 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
>  void kill_dev_dax(struct dev_dax *dev_dax);
>  bool static_dev_dax(struct dev_dax *dev_dax);
>  
> +#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
> +extern bool dax_hmem_initial_probe;
> +void dax_hmem_flush_work(void);
> +#else
> +static inline void dax_hmem_flush_work(void) { }
> +#endif
> +
>  #define MODULE_ALIAS_DAX_DEVICE(type) \
>  	MODULE_ALIAS("dax:t" __stringify(type) "*")
>  #define DAX_DEVICE_MODALIAS_FMT "dax:t%d"
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index a2136adfa186..3ab39b77843d 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>  
>  static void cxl_dax_region_driver_register(struct work_struct *work)
>  {
> +	dax_hmem_flush_work();
>  	cxl_driver_register(&cxl_dax_region_driver);
>  }
>  
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index 56e3cbd181b5..991a4bf7d969 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -8,6 +8,9 @@
>  static bool nohmem;
>  module_param_named(disable, nohmem, bool, 0444);
>  
> +bool dax_hmem_initial_probe;
> +EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
> +
>  static bool platform_initialized;
>  static DEFINE_MUTEX(hmem_resource_lock);
>  static struct resource hmem_active = {
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index ca752db03201..9ceda6b5cadf 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,6 +3,7 @@
>  #include <linux/memregion.h>
>  #include <linux/module.h>
>  #include <linux/dax.h>
> +#include <cxl/cxl.h>
>  #include "../bus.h"
>  
>  static bool region_idle;
> @@ -58,6 +59,23 @@ static void release_hmem(void *pdev)
>  	platform_device_unregister(pdev);
>  }
>  
> +struct dax_defer_work {
> +	struct platform_device *pdev;
> +	struct work_struct work;
> +};
> +
> +static void process_defer_work(struct work_struct *w);
> +
> +static struct dax_defer_work dax_hmem_work = {
> +	.work = __WORK_INITIALIZER(dax_hmem_work.work, process_defer_work),
> +};
> +
> +void dax_hmem_flush_work(void)
> +{
> +	flush_work(&dax_hmem_work.work);
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
> +
>  static int __hmem_register_device(struct device *host, int target_nid,
>  				  const struct resource *res)
>  {
> @@ -122,6 +140,11 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>  	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>  			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +		if (!dax_hmem_initial_probe) {
> +			dev_dbg(host, "await CXL initial probe: %pr\n", res);
> +			queue_work(system_long_wq, &dax_hmem_work.work);
> +			return 0;
> +		}
>  		dev_dbg(host, "deferring range to CXL: %pr\n", res);
>  		return 0;
>  	}
> @@ -129,8 +152,54 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	return __hmem_register_device(host, target_nid, res);
>  }
>  
> +static int hmem_register_cxl_device(struct device *host, int target_nid,
> +				    const struct resource *res)
> +{
> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) == REGION_DISJOINT)
> +		return 0;
> +
> +	if (cxl_region_contains_resource((struct resource *)res)) {
> +		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
> +		return 0;
> +	}
> +
> +	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
> +	return __hmem_register_device(host, target_nid, res);
> +}
> +
> +static void process_defer_work(struct work_struct *w)
> +{
> +	struct dax_defer_work *work = container_of(w, typeof(*work), work);
> +	struct platform_device *pdev;
> +
> +	if (!work->pdev)
> +		return;
> +
> +	pdev = work->pdev;
> +
> +	/* Relies on cxl_acpi and cxl_pci having had a chance to load */
> +	wait_for_device_probe();
> +
> +	guard(device)(&pdev->dev);
> +	if (!pdev->dev.driver)
> +		return;
> +
> +	if (!dax_hmem_initial_probe) {
> +		dax_hmem_initial_probe = true;
> +		walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
> +	}
> +}
> +
>  static int dax_hmem_platform_probe(struct platform_device *pdev)
>  {
> +	if (work_pending(&dax_hmem_work.work))
> +		return -EBUSY;
> +
> +	if (!dax_hmem_work.pdev)
> +		dax_hmem_work.pdev =
> +			to_platform_device(get_device(&pdev->dev));
> +
>  	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>  }
>  
> @@ -168,6 +237,11 @@ static __init int dax_hmem_init(void)
>  
>  static __exit void dax_hmem_exit(void)
>  {
> +	if (dax_hmem_work.pdev) {
> +		flush_work(&dax_hmem_work.work);
> +		put_device(&dax_hmem_work.pdev->dev);
> +	}
> +
>  	platform_driver_unregister(&dax_hmem_driver);
>  	platform_driver_unregister(&dax_hmem_platform_driver);
>  }



* Re: [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path
  2026-03-22 19:53 ` [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path Smita Koralahalli
  2026-03-23 17:11   ` Dave Jiang
  2026-03-23 17:57   ` Jonathan Cameron
@ 2026-03-23 19:37   ` Dan Williams
  2 siblings, 0 replies; 26+ messages in thread
From: Dan Williams @ 2026-03-23 19:37 UTC (permalink / raw)
  To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Smita Koralahalli wrote:
> alloc_dax_region() calls kref_init() on the dax_region early in the
> function, but the error path for sysfs_create_groups() failure uses
> kfree() directly to free the dax_region. This bypasses the kref lifecycle.
> 
> Use dax_region_put() instead to handle kref lifecycle correctly.

There is no correctness issue here; the object was never published.

I am ok with the change, but be clear that this is for pure symmetry
reasons, not correctness.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


* Re: [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
  2026-03-22 19:53 ` [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
@ 2026-03-23 19:54   ` Dan Williams
  2026-03-24  5:46     ` Koralahalli Channabasappa, Smita
  0 siblings, 1 reply; 26+ messages in thread
From: Dan Williams @ 2026-03-23 19:54 UTC (permalink / raw)
  To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Smita Koralahalli wrote:
> From: Dan Williams <dan.j.williams@intel.com>
> 
> Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
> Reserved ranges.
> 
> Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
> request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
> loading, it does not enforce that the dependency has finished init
> before the current module runs. This can cause HMEM to start before
> cxl_acpi has populated the resource tree, breaking detection of overlaps
> between Soft Reserved and CXL Windows.
> 
> Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
> cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
> that trigger further module loads. Asynchronous probe flushing
> (wait_for_device_probe()) is added later in the series in a deferred
> context before HMEM makes ownership decisions for Soft Reserved ranges.
> 
> Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
> must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
> Soft Reserved ranges before CXL drivers have had a chance to claim them.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/dax/Kconfig     |  2 ++
>  drivers/dax/hmem/hmem.c | 17 ++++++++++-------
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
> index d656e4c0eb84..3683bb3f2311 100644
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -48,6 +48,8 @@ config DEV_DAX_CXL
>  	tristate "CXL DAX: direct access to CXL RAM regions"
>  	depends on CXL_BUS && CXL_REGION && DEV_DAX
>  	default CXL_REGION && DEV_DAX
> +	depends on CXL_ACPI >= DEV_DAX_HMEM
> +	depends on CXL_PCI >= DEV_DAX_HMEM

As I learned from Keith's recent CXL_PMEM dependency fix for CXL_ACPI
[1], this wants to be:

depends on DEV_DAX_HMEM || !DEV_DAX_HMEM
depends on CXL_ACPI || !CXL_ACPI
depends on CXL_PCI || !CXL_PCI

...to make sure that DEV_DAX_CXL can never be built-in unless all of its
dependencies are built-in.

[1]: http://lore.kernel.org/69aa341fcf526_6423c1002c@dwillia2-mobl4.notmuch

At this point I am wondering if all of the feedback I have for this
series should just be incremental fixes. I also want to have a canned
unit test that verifies the base expectations. That can also be
something I reply incrementally.


* Re: [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree
  2026-03-22 19:53 ` [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
  2026-03-23 17:31   ` Dave Jiang
@ 2026-03-23 20:55   ` Dan Williams
  1 sibling, 0 replies; 26+ messages in thread
From: Dan Williams @ 2026-03-23 20:55 UTC (permalink / raw)
  To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Smita Koralahalli wrote:
> Introduce a global "DAX Regions" resource root and register each
> dax_region->res under it via request_resource(). Release the resource on
> dax_region teardown.
> 
> By enforcing a single global namespace for dax_region allocations, this
> ensures only one of dax_hmem or dax_cxl can successfully register a
> dax_region for a given range.
> 
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
>  drivers/dax/bus.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 299134c9b294..68437c05e21d 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -10,6 +10,7 @@
>  #include "dax-private.h"
>  #include "bus.h"
>  
> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");

Just type it out and skip the DEFINE_RES* macro, like the definitions of
iomem_resource and soft_reserve_resource, since the macro's second
argument is a size, not an end address.

>  static DEFINE_MUTEX(dax_bus_lock);
>  
>  /*
> @@ -627,6 +628,7 @@ static void dax_region_unregister(void *region)
>  
>  	sysfs_remove_groups(&dax_region->dev->kobj,
>  			dax_region_attribute_groups);
> +	release_resource(&dax_region->res);
>  	dax_region_put(dax_region);
>  }
>  
> @@ -635,6 +637,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  		unsigned long flags)
>  {
>  	struct dax_region *dax_region;
> +	int rc;
>  
>  	/*
>  	 * The DAX core assumes that it can store its private data in
> @@ -667,14 +670,25 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>  		.flags = IORESOURCE_MEM | flags,
>  	};
>  
> -	if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
> -		dax_region_put(dax_region);
> -		return NULL;
> +	rc = request_resource(&dax_regions, &dax_region->res);
> +	if (rc) {
> +		dev_dbg(parent, "dax_region resource conflict for %pR\n",
> +			&dax_region->res);

I normally do not like a driver to be chatty, but resource conflicts are
significant. This one deserves to be dev_err().


* Re: [PATCH v8 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
  2026-03-22 19:53 ` [PATCH v8 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
@ 2026-03-23 21:09   ` Dan Williams
  0 siblings, 0 replies; 26+ messages in thread
From: Dan Williams @ 2026-03-23 21:09 UTC (permalink / raw)
  To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

Smita Koralahalli wrote:
> Reworked from a patch by Alison Schofield <alison.schofield@intel.com>
> 
> Reintroduce Soft Reserved range into the iomem_resource tree for HMEM
> to consume.
> 
> This restores visibility in /proc/iomem for ranges actively in use, while
> avoiding the early-boot conflicts that occurred when Soft Reserved was
> published into iomem before CXL window and region discovery.

I recommend dropping this patch. Given that the v7.0 kernel already set
a new precedent of not publishing "Soft Reserve", there is no pressing
need at this time to bring it back. We can always revive a patch like
this with a regression rationale, but otherwise a less busy /proc/iomem
is attractive.


* Re: [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
  2026-03-23 19:54   ` Dan Williams
@ 2026-03-24  5:46     ` Koralahalli Channabasappa, Smita
  2026-03-24 16:25       ` Dan Williams
  0 siblings, 1 reply; 26+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-24  5:46 UTC (permalink / raw)
  To: Dan Williams, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
	linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

Hi Dan,

On 3/23/2026 12:54 PM, Dan Williams wrote:
> Smita Koralahalli wrote:
>> From: Dan Williams <dan.j.williams@intel.com>
>>
>> Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
>> Reserved ranges.
>>
>> Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
>> request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
>> loading, it does not enforce that the dependency has finished init
>> before the current module runs. This can cause HMEM to start before
>> cxl_acpi has populated the resource tree, breaking detection of overlaps
>> between Soft Reserved and CXL Windows.
>>
>> Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
>> cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
>> that trigger further module loads. Asynchronous probe flushing
>> (wait_for_device_probe()) is added later in the series in a deferred
>> context before HMEM makes ownership decisions for Soft Reserved ranges.
>>
>> Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
>> must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
>> Soft Reserved ranges before CXL drivers have had a chance to claim them.
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>> ---
>>   drivers/dax/Kconfig     |  2 ++
>>   drivers/dax/hmem/hmem.c | 17 ++++++++++-------
>>   2 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
>> index d656e4c0eb84..3683bb3f2311 100644
>> --- a/drivers/dax/Kconfig
>> +++ b/drivers/dax/Kconfig
>> @@ -48,6 +48,8 @@ config DEV_DAX_CXL
>>   	tristate "CXL DAX: direct access to CXL RAM regions"
>>   	depends on CXL_BUS && CXL_REGION && DEV_DAX
>>   	default CXL_REGION && DEV_DAX
>> +	depends on CXL_ACPI >= DEV_DAX_HMEM
>> +	depends on CXL_PCI >= DEV_DAX_HMEM
> 
> As I learned from Keith's recent CXL_PMEM dependency fix for CXL_ACPI
> [1], this wants to be:
> 
> depends on DEV_DAX_HMEM || !DEV_DAX_HMEM
> depends on CXL_ACPI || !CXL_ACPI
> depends on CXL_PCI || !CXL_PCI
> 
> ...to make sure that DEV_DAX_CXL can never be built-in unless all of its
> dependencies are built-in.
> 
> [1]: http://lore.kernel.org/69aa341fcf526_6423c1002c@dwillia2-mobl4.notmuch
> 
> At this point I am wondering if all of the feedback I have for this
> series should just be incremental fixes. I also want to have a canned
> unit test that verifies the base expectations. That can also be
> something I reply incrementally.

Two things on the Kconfig change:

When DEV_DAX_HMEM = y and CXL_ACPI = m and CXL_PCI = m

1. Regarding switching from >= to || ! pattern:

The >= pattern disabled DEV_DAX_CXL entirely when DEV_DAX_HMEM = y and 
CXL_ACPI/CXL_PCI = m, so HMEM unconditionally owned all ranges; the CXL 
deferral path was never entered.

With the || ! pattern, DEV_DAX_CXL is enabled, which changes the 
ownership behavior depending on when the CXL_ACPI/CXL_PCI probes start.

On my system I see:

   [  7.379] dax_hmem_platform_probe began
   [  7.384] alloc_dev_dax_range: dax0.0
   [ 28.560] cxl acpi probe started     <- 21 seconds later

HMEM ends up owning in this case because the CXL windows aren't 
published yet when HMEM probes (built-in code runs before modules load, 
and request_module() may not work this early?), so region_intersects() 
returns DISJOINT for all CXL ranges.

But it could go the other way if the CXL ACPI and PCI probes start 
before the deferred work is queued in HMEM. (And I think this is the 
expected path if DEV_DAX_CXL is enabled.)

But do you think it is okay for now, given the resource exclusion handling?

2. Separate build issue with DEV_DAX_HMEM = y,  CXL_BUS/ACPI/PCI = m and
CXL_REGION = y.

I hit this build error when testing the above config (sorry, I should 
have checked this config earlier):

When DEV_DAX_HMEM = y and the CXL core is built as a module, hmem.c 
calls cxl_region_contains_resource(), which lives in cxl_core.ko, 
causing an undefined reference at link time.

This happens with both the >= and || ! Kconfig patterns.

The current #ifdef CONFIG_CXL_REGION guard evaluates to true even when 
the CXL core that provides the symbol is built as a module. Changing 
the guard in include/cxl/cxl.h to check whether the symbol is actually 
reachable fixed the error for me:

-#ifdef CONFIG_CXL_REGION
+#if IS_REACHABLE(CONFIG_CXL_BUS) && defined(CONFIG_CXL_REGION)
bool cxl_region_contains_resource(struct resource *res);
#else
...

I'm not sure whether CONFIG_CXL_BUS is the right check here, or whether 
it should more specifically check CXL_ACPI or CXL_PCI.

Thanks
Smita




* Re: [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
  2026-03-24  5:46     ` Koralahalli Channabasappa, Smita
@ 2026-03-24 16:25       ` Dan Williams
  0 siblings, 0 replies; 26+ messages in thread
From: Dan Williams @ 2026-03-24 16:25 UTC (permalink / raw)
  To: Koralahalli Channabasappa, Smita, Dan Williams, Smita Koralahalli,
	linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

Koralahalli Channabasappa, Smita wrote:
[..]
> > As I learned from Keith's recent CXL_PMEM dependency fix for CXL_ACPI
> > [1], this wants to be:
> > 
> > depends on DEV_DAX_HMEM || !DEV_DAX_HMEM
> > depends on CXL_ACPI || !CXL_ACPI
> > depends on CXL_PCI || !CXL_PCI
> > 
> > ...to make sure that DEV_DAX_CXL can never be built-in unless all of its
> > dependencies are built-in.
> > 
> > [1]: http://lore.kernel.org/69aa341fcf526_6423c1002c@dwillia2-mobl4.notmuch
> > 
> > At this point I am wondering if all of the feedback I have for this
> > series should just be incremental fixes. I also want to have a canned
> > unit test that verifies the base expectations. That can also be
> > something I reply incrementally.
> 
> Two things on the Kconfig change:
> 
> When DEV_DAX_HMEM = y and CXL_ACPI = m and CXL_PCI = m

Right, this should not be possible. The patch I am testing moves the
optional CXL dependencies to DEV_DAX_HMEM where they belong. I
mistakenly showed them against DEV_DAX_CXL in my comment.

> 1. Regarding switching from >= to || ! pattern:
> 
> The >= pattern disabled DEV_DAX_CXL entirely when DEV_DAX_HMEM = y and 
> CXL_ACPI/CXL_PCI = m. So, HMEM unconditionally owned all ranges - the 
> CXL deferral path is never entered.

That is one of the broken configurations to fix. It should never be
possible to set DEV_DAX_HMEM=y unless CXL_ACPI and CXL_PCI are both
disabled or both built-in.

> When DEV_DAX_HMEM = y and CXL core is built as a module hmem.c calls 
> cxl_region_contains_resource() which lives in cxl_core.ko causing an 
> undefined reference at link time.

Yes, I hit this as well; it requires another CXL_BUS dependency.


* Re: [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-23 18:13   ` Jonathan Cameron
@ 2026-03-24 21:50     ` Koralahalli Channabasappa, Smita
  2026-03-25 12:12       ` Jonathan Cameron
  0 siblings, 1 reply; 26+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-24 21:50 UTC (permalink / raw)
  To: Jonathan Cameron, Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
	Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
	Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
	Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
	Tomasz Wolski

Hi Jonathan,

On 3/23/2026 11:13 AM, Jonathan Cameron wrote:
> On Sun, 22 Mar 2026 19:53:41 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> 
>> The current probe time ownership check for Soft Reserved memory based
>> solely on CXL window intersection is insufficient. dax_hmem probing is not
>> always guaranteed to run after CXL enumeration and region assembly, which
>> can lead to incorrect ownership decisions before the CXL stack has
>> finished publishing windows and assembling committed regions.
>>
>> Introduce deferred ownership handling for Soft Reserved ranges that
>> intersect CXL windows. When such a range is encountered during the
>> initial dax_hmem probe, schedule deferred work to wait for the CXL stack
>> to complete enumeration and region assembly before deciding ownership.
>>
>> Once the deferred work runs, evaluate each Soft Reserved range
>> individually: if a CXL region fully contains the range, skip it and let
>> dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
>> ownership model avoids the need for CXL region teardown and
>> alloc_dax_region() resource exclusion prevents double claiming.
>>
>> Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
>> so it survives module reload. Ensure dax_cxl defers driver registration
>> until dax_hmem has completed ownership resolution. dax_cxl calls
>> dax_hmem_flush_work() before cxl_driver_register(), which both waits for
>> the deferred work to complete and creates a module symbol dependency that
>> forces dax_hmem.ko to load before dax_cxl.
>>
>> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> 
> https://sashiko.dev/#/patchset/20260322195343.206900-1-Smita.KoralahalliChannabasappa%40amd.com
> Might be worth a look.  I think the last comment is potentially correct,
> though it's unlikely that platform_driver_register() actually fails.
> 
> I've not looked too closely at the others. Given this was doing something
> unusual I thought I'd see what it found. Looks like some interesting
> questions if nothing else.

Thanks for pointing this out. I went through the findings:

The init error path one is valid, I think: if 
platform_driver_register(&dax_hmem_driver) fails after 
dax_hmem_platform_driver has already probed and queued work, the error 
path doesn't flush the work or release the pdev reference.

I was thinking something like below for v9:

@@ -258,8 +262,13 @@ static __init int dax_hmem_init(void)
		return rc;

	rc = platform_driver_register(&dax_hmem_driver);
-	if (rc)
+	if (rc) {
+		if (dax_hmem_work.pdev) {
+			flush_work(&dax_hmem_work.work);
+			put_device(&dax_hmem_work.pdev->dev);
+		}
		platform_driver_unregister(&dax_hmem_platform_driver);
+	}

	return rc;
  }


Is it worth adding, considering how unlikely the failure is?

For the others: the IS_ENABLED vs IS_REACHABLE question is something 
I'm discussing with Dan in 3/9 (there's a Kconfig dependency and a 
CXL_BUS dependency fix needed, I guess), the module reload behavior is 
intentional, and the rest are mostly false positives, I think.

Thanks,
Smita

> 
>> ---
>>   drivers/dax/bus.h         |  7 ++++
>>   drivers/dax/cxl.c         |  1 +
>>   drivers/dax/hmem/device.c |  3 ++
>>   drivers/dax/hmem/hmem.c   | 74 +++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 85 insertions(+)
>>
>> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
>> index cbbf64443098..ebbfe2d6da14 100644
>> --- a/drivers/dax/bus.h
>> +++ b/drivers/dax/bus.h
>> @@ -49,6 +49,13 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
>>   void kill_dev_dax(struct dev_dax *dev_dax);
>>   bool static_dev_dax(struct dev_dax *dev_dax);
>>   
>> +#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
>> +extern bool dax_hmem_initial_probe;
>> +void dax_hmem_flush_work(void);
>> +#else
>> +static inline void dax_hmem_flush_work(void) { }
>> +#endif
>> +
>>   #define MODULE_ALIAS_DAX_DEVICE(type) \
>>   	MODULE_ALIAS("dax:t" __stringify(type) "*")
>>   #define DAX_DEVICE_MODALIAS_FMT "dax:t%d"
>> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
>> index a2136adfa186..3ab39b77843d 100644
>> --- a/drivers/dax/cxl.c
>> +++ b/drivers/dax/cxl.c
>> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>>   
>>   static void cxl_dax_region_driver_register(struct work_struct *work)
>>   {
>> +	dax_hmem_flush_work();
>>   	cxl_driver_register(&cxl_dax_region_driver);
>>   }
>>   
>> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
>> index 56e3cbd181b5..991a4bf7d969 100644
>> --- a/drivers/dax/hmem/device.c
>> +++ b/drivers/dax/hmem/device.c
>> @@ -8,6 +8,9 @@
>>   static bool nohmem;
>>   module_param_named(disable, nohmem, bool, 0444);
>>   
>> +bool dax_hmem_initial_probe;
>> +EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
>> +
>>   static bool platform_initialized;
>>   static DEFINE_MUTEX(hmem_resource_lock);
>>   static struct resource hmem_active = {
>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>> index ca752db03201..9ceda6b5cadf 100644
>> --- a/drivers/dax/hmem/hmem.c
>> +++ b/drivers/dax/hmem/hmem.c
>> @@ -3,6 +3,7 @@
>>   #include <linux/memregion.h>
>>   #include <linux/module.h>
>>   #include <linux/dax.h>
>> +#include <cxl/cxl.h>
>>   #include "../bus.h"
>>   
>>   static bool region_idle;
>> @@ -58,6 +59,23 @@ static void release_hmem(void *pdev)
>>   	platform_device_unregister(pdev);
>>   }
>>   
>> +struct dax_defer_work {
>> +	struct platform_device *pdev;
>> +	struct work_struct work;
>> +};
>> +
>> +static void process_defer_work(struct work_struct *w);
>> +
>> +static struct dax_defer_work dax_hmem_work = {
>> +	.work = __WORK_INITIALIZER(dax_hmem_work.work, process_defer_work),
>> +};
>> +
>> +void dax_hmem_flush_work(void)
>> +{
>> +	flush_work(&dax_hmem_work.work);
>> +}
>> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
>> +
>>   static int __hmem_register_device(struct device *host, int target_nid,
>>   				  const struct resource *res)
>>   {
>> @@ -122,6 +140,11 @@ static int hmem_register_device(struct device *host, int target_nid,
>>   	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>>   	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>>   			      IORES_DESC_CXL) != REGION_DISJOINT) {
>> +		if (!dax_hmem_initial_probe) {
>> +			dev_dbg(host, "await CXL initial probe: %pr\n", res);
>> +			queue_work(system_long_wq, &dax_hmem_work.work);
>> +			return 0;
>> +		}
>>   		dev_dbg(host, "deferring range to CXL: %pr\n", res);
>>   		return 0;
>>   	}
>> @@ -129,8 +152,54 @@ static int hmem_register_device(struct device *host, int target_nid,
>>   	return __hmem_register_device(host, target_nid, res);
>>   }
>>   
>> +static int hmem_register_cxl_device(struct device *host, int target_nid,
>> +				    const struct resource *res)
>> +{
>> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> +			      IORES_DESC_CXL) == REGION_DISJOINT)
>> +		return 0;
>> +
>> +	if (cxl_region_contains_resource((struct resource *)res)) {
>> +		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
>> +		return 0;
>> +	}
>> +
>> +	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
>> +	return __hmem_register_device(host, target_nid, res);
>> +}
>> +
>> +static void process_defer_work(struct work_struct *w)
>> +{
>> +	struct dax_defer_work *work = container_of(w, typeof(*work), work);
>> +	struct platform_device *pdev;
>> +
>> +	if (!work->pdev)
>> +		return;
>> +
>> +	pdev = work->pdev;
>> +
>> +	/* Relies on cxl_acpi and cxl_pci having had a chance to load */
>> +	wait_for_device_probe();
>> +
>> +	guard(device)(&pdev->dev);
>> +	if (!pdev->dev.driver)
>> +		return;
>> +
>> +	if (!dax_hmem_initial_probe) {
>> +		dax_hmem_initial_probe = true;
>> +		walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
>> +	}
>> +}
>> +
>>   static int dax_hmem_platform_probe(struct platform_device *pdev)
>>   {
>> +	if (work_pending(&dax_hmem_work.work))
>> +		return -EBUSY;
>> +
>> +	if (!dax_hmem_work.pdev)
>> +		dax_hmem_work.pdev =
>> +			to_platform_device(get_device(&pdev->dev));
>> +
>>   	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>>   }
>>   
>> @@ -168,6 +237,11 @@ static __init int dax_hmem_init(void)
>>   
>>   static __exit void dax_hmem_exit(void)
>>   {
>> +	if (dax_hmem_work.pdev) {
>> +		flush_work(&dax_hmem_work.work);
>> +		put_device(&dax_hmem_work.pdev->dev);
>> +	}
>> +
>>   	platform_driver_unregister(&dax_hmem_driver);
>>   	platform_driver_unregister(&dax_hmem_platform_driver);
>>   }
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  2026-03-24 21:50     ` Koralahalli Channabasappa, Smita
@ 2026-03-25 12:12       ` Jonathan Cameron
  0 siblings, 0 replies; 26+ messages in thread
From: Jonathan Cameron @ 2026-03-25 12:12 UTC (permalink / raw)
  To: Koralahalli Channabasappa, Smita
  Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm, Ard Biesheuvel, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Tomasz Wolski

On Tue, 24 Mar 2026 14:50:59 -0700
"Koralahalli Channabasappa, Smita" <skoralah@amd.com> wrote:

> Hi Jonathan,
> 
> On 3/23/2026 11:13 AM, Jonathan Cameron wrote:
> > On Sun, 22 Mar 2026 19:53:41 +0000
> > Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> >   
> >> The current probe time ownership check for Soft Reserved memory based
> >> solely on CXL window intersection is insufficient. dax_hmem probing is not
> >> always guaranteed to run after CXL enumeration and region assembly, which
> >> can lead to incorrect ownership decisions before the CXL stack has
> >> finished publishing windows and assembling committed regions.
> >>
> >> Introduce deferred ownership handling for Soft Reserved ranges that
> >> intersect CXL windows. When such a range is encountered during the
> >> initial dax_hmem probe, schedule deferred work to wait for the CXL stack
> >> to complete enumeration and region assembly before deciding ownership.
> >>
> >> Once the deferred work runs, evaluate each Soft Reserved range
> >> individually: if a CXL region fully contains the range, skip it and let
> >> dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
> >> ownership model avoids the need for CXL region teardown and
> >> alloc_dax_region() resource exclusion prevents double claiming.
> >>
> >> Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
> >> so it survives module reload. Ensure dax_cxl defers driver registration
> >> until dax_hmem has completed ownership resolution. dax_cxl calls
> >> dax_hmem_flush_work() before cxl_driver_register(), which both waits for
> >> the deferred work to complete and creates a module symbol dependency that
> >> forces dax_hmem.ko to load before dax_cxl.
> >>
> >> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>  
> > 
> > https://sashiko.dev/#/patchset/20260322195343.206900-1-Smita.KoralahalliChannabasappa%40amd.com
> > Might be worth a look.  I think the last comment is potentially correct,
> > though it's unlikely that platform_driver_register() actually fails.
> > 
> > I've not looked too closely at the others. Given this was doing something
> > unusual I thought I'd see what it found. Looks like some interesting
> > questions if nothing else.  
> 
> Thanks for pointing this out. I went through the findings:
> 
> The init error path one is valid, I think: if 
> platform_driver_register(&dax_hmem_driver) fails after 
> dax_hmem_platform_driver has already probed and queued work, the error 
> path doesn't flush the work or release the pdev reference.
> 
> I was thinking something like below for v9:
> 
> @@ -258,8 +262,13 @@ static __init int dax_hmem_init(void)
> 		return rc;
> 
> 	rc = platform_driver_register(&dax_hmem_driver);
> -	if (rc)
> +	if (rc) {
> +		if (dax_hmem_work.pdev) {
> +			flush_work(&dax_hmem_work.work);
> +			put_device(&dax_hmem_work.pdev->dev);
> +		}
> 		platform_driver_unregister(&dax_hmem_platform_driver);
> +	}
> 
> 	return rc;
>   }
> 
> 
> Is it worth adding, considering how unlikely the failure is?

I think so.  The alternative would be a very obvious comment saying we've
deliberately not handled this corner.  The code seems easier to me and
lines up with what remove is doing.


> 
> For the others: the IS_ENABLED vs IS_REACHABLE question is something 
> I'm discussing with Dan in 3/9 (there's a Kconfig dependency and a 
> CXL_BUS dependency fix needed, I guess), the module reload behavior is 
> intentional, and the rest are mostly false positives, I think.

I was more suspicious of those ones, as I can never remember exactly what
the effective rules are.

Thanks,

J
> 
> Thanks,
> Smita
> 
> >   
> >> ---
> >>   drivers/dax/bus.h         |  7 ++++
> >>   drivers/dax/cxl.c         |  1 +
> >>   drivers/dax/hmem/device.c |  3 ++
> >>   drivers/dax/hmem/hmem.c   | 74 +++++++++++++++++++++++++++++++++++++++
> >>   4 files changed, 85 insertions(+)
> >>
> >> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> >> index cbbf64443098..ebbfe2d6da14 100644
> >> --- a/drivers/dax/bus.h
> >> +++ b/drivers/dax/bus.h
> >> @@ -49,6 +49,13 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv);
> >>   void kill_dev_dax(struct dev_dax *dev_dax);
> >>   bool static_dev_dax(struct dev_dax *dev_dax);
> >>   
> >> +#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
> >> +extern bool dax_hmem_initial_probe;
> >> +void dax_hmem_flush_work(void);
> >> +#else
> >> +static inline void dax_hmem_flush_work(void) { }
> >> +#endif
> >> +
> >>   #define MODULE_ALIAS_DAX_DEVICE(type) \
> >>   	MODULE_ALIAS("dax:t" __stringify(type) "*")
> >>   #define DAX_DEVICE_MODALIAS_FMT "dax:t%d"
> >> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> >> index a2136adfa186..3ab39b77843d 100644
> >> --- a/drivers/dax/cxl.c
> >> +++ b/drivers/dax/cxl.c
> >> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
> >>   
> >>   static void cxl_dax_region_driver_register(struct work_struct *work)
> >>   {
> >> +	dax_hmem_flush_work();
> >>   	cxl_driver_register(&cxl_dax_region_driver);
> >>   }
> >>   
> >> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> >> index 56e3cbd181b5..991a4bf7d969 100644
> >> --- a/drivers/dax/hmem/device.c
> >> +++ b/drivers/dax/hmem/device.c
> >> @@ -8,6 +8,9 @@
> >>   static bool nohmem;
> >>   module_param_named(disable, nohmem, bool, 0444);
> >>   
> >> +bool dax_hmem_initial_probe;
> >> +EXPORT_SYMBOL_GPL(dax_hmem_initial_probe);
> >> +
> >>   static bool platform_initialized;
> >>   static DEFINE_MUTEX(hmem_resource_lock);
> >>   static struct resource hmem_active = {
> >> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> >> index ca752db03201..9ceda6b5cadf 100644
> >> --- a/drivers/dax/hmem/hmem.c
> >> +++ b/drivers/dax/hmem/hmem.c
> >> @@ -3,6 +3,7 @@
> >>   #include <linux/memregion.h>
> >>   #include <linux/module.h>
> >>   #include <linux/dax.h>
> >> +#include <cxl/cxl.h>
> >>   #include "../bus.h"
> >>   
> >>   static bool region_idle;
> >> @@ -58,6 +59,23 @@ static void release_hmem(void *pdev)
> >>   	platform_device_unregister(pdev);
> >>   }
> >>   
> >> +struct dax_defer_work {
> >> +	struct platform_device *pdev;
> >> +	struct work_struct work;
> >> +};
> >> +
> >> +static void process_defer_work(struct work_struct *w);
> >> +
> >> +static struct dax_defer_work dax_hmem_work = {
> >> +	.work = __WORK_INITIALIZER(dax_hmem_work.work, process_defer_work),
> >> +};
> >> +
> >> +void dax_hmem_flush_work(void)
> >> +{
> >> +	flush_work(&dax_hmem_work.work);
> >> +}
> >> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
> >> +
> >>   static int __hmem_register_device(struct device *host, int target_nid,
> >>   				  const struct resource *res)
> >>   {
> >> @@ -122,6 +140,11 @@ static int hmem_register_device(struct device *host, int target_nid,
> >>   	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
> >>   	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> >>   			      IORES_DESC_CXL) != REGION_DISJOINT) {
> >> +		if (!dax_hmem_initial_probe) {
> >> +			dev_dbg(host, "await CXL initial probe: %pr\n", res);
> >> +			queue_work(system_long_wq, &dax_hmem_work.work);
> >> +			return 0;
> >> +		}
> >>   		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> >>   		return 0;
> >>   	}
> >> @@ -129,8 +152,54 @@ static int hmem_register_device(struct device *host, int target_nid,
> >>   	return __hmem_register_device(host, target_nid, res);
> >>   }
> >>   
> >> +static int hmem_register_cxl_device(struct device *host, int target_nid,
> >> +				    const struct resource *res)
> >> +{
> >> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> >> +			      IORES_DESC_CXL) == REGION_DISJOINT)
> >> +		return 0;
> >> +
> >> +	if (cxl_region_contains_resource((struct resource *)res)) {
> >> +		dev_dbg(host, "CXL claims resource, dropping: %pr\n", res);
> >> +		return 0;
> >> +	}
> >> +
> >> +	dev_dbg(host, "CXL did not claim resource, registering: %pr\n", res);
> >> +	return __hmem_register_device(host, target_nid, res);
> >> +}
> >> +
> >> +static void process_defer_work(struct work_struct *w)
> >> +{
> >> +	struct dax_defer_work *work = container_of(w, typeof(*work), work);
> >> +	struct platform_device *pdev;
> >> +
> >> +	if (!work->pdev)
> >> +		return;
> >> +
> >> +	pdev = work->pdev;
> >> +
> >> +	/* Relies on cxl_acpi and cxl_pci having had a chance to load */
> >> +	wait_for_device_probe();
> >> +
> >> +	guard(device)(&pdev->dev);
> >> +	if (!pdev->dev.driver)
> >> +		return;
> >> +
> >> +	if (!dax_hmem_initial_probe) {
> >> +		dax_hmem_initial_probe = true;
> >> +		walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
> >> +	}
> >> +}
> >> +
> >>   static int dax_hmem_platform_probe(struct platform_device *pdev)
> >>   {
> >> +	if (work_pending(&dax_hmem_work.work))
> >> +		return -EBUSY;
> >> +
> >> +	if (!dax_hmem_work.pdev)
> >> +		dax_hmem_work.pdev =
> >> +			to_platform_device(get_device(&pdev->dev));
> >> +
> >>   	return walk_hmem_resources(&pdev->dev, hmem_register_device);
> >>   }
> >>   
> >> @@ -168,6 +237,11 @@ static __init int dax_hmem_init(void)
> >>   
> >>   static __exit void dax_hmem_exit(void)
> >>   {
> >> +	if (dax_hmem_work.pdev) {
> >> +		flush_work(&dax_hmem_work.work);
> >> +		put_device(&dax_hmem_work.pdev->dev);
> >> +	}
> >> +
> >>   	platform_driver_unregister(&dax_hmem_driver);
> >>   	platform_driver_unregister(&dax_hmem_platform_driver);
> >>   }  
> >   
> 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2026-03-25 12:12 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
2026-03-22 19:53 ` [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path Smita Koralahalli
2026-03-23 17:11   ` Dave Jiang
2026-03-23 17:57   ` Jonathan Cameron
2026-03-23 19:37   ` Dan Williams
2026-03-22 19:53 ` [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device() Smita Koralahalli
2026-03-23 17:14   ` Dave Jiang
2026-03-23 17:59   ` Jonathan Cameron
2026-03-22 19:53 ` [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
2026-03-23 19:54   ` Dan Williams
2026-03-24  5:46     ` Koralahalli Channabasappa, Smita
2026-03-24 16:25       ` Dan Williams
2026-03-22 19:53 ` [PATCH v8 4/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
2026-03-22 19:53 ` [PATCH v8 5/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
2026-03-22 19:53 ` [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
2026-03-23 17:31   ` Dave Jiang
2026-03-23 20:55   ` Dan Williams
2026-03-22 19:53 ` [PATCH v8 7/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
2026-03-22 19:53 ` [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
2026-03-23 18:03   ` Jonathan Cameron
2026-03-23 18:13   ` Jonathan Cameron
2026-03-24 21:50     ` Koralahalli Channabasappa, Smita
2026-03-25 12:12       ` Jonathan Cameron
2026-03-23 18:17   ` Dave Jiang
2026-03-22 19:53 ` [PATCH v8 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
2026-03-23 21:09   ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox