linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
@ 2025-09-30  4:47 Smita Koralahalli
  2025-09-30  4:47 ` [PATCH v3 1/5] dax/hmem, e820, resource: Defer Soft Reserved registration until hmem is ready Smita Koralahalli
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Smita Koralahalli @ 2025-09-30  4:47 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Smita Koralahalli, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Ard Biesheuvel

This series aims to address long-standing conflicts between dax_hmem and
CXL when handling Soft Reserved memory ranges.

Reworked from Dan's patch:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d

Previous work:
https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/

Link to v2:
https://lore.kernel.org/all/20250930042814.213912-1-Smita.KoralahalliChannabasappa@amd.com/

To note:
I have dropped the 6/6 patch from v1 based on discussion with Zhijian. This
patch series doesn't cover the case of DEV_DAX_HMEM=y and CXL=m, which
results in DEV_DAX_CXL being disabled.

In that configuration, ownership of the soft-reserved ranges is handled by
HMEM instead of being managed by CXL.

/proc/iomem for this case looks like below:

850000000-284fffffff : CXL Window 0
  850000000-284fffffff : region3
    850000000-284fffffff : Soft Reserved
      850000000-284fffffff : dax0.0
        850000000-284fffffff : System RAM (kmem)
2850000000-484fffffff : CXL Window 1
  2850000000-484fffffff : region4
    2850000000-484fffffff : Soft Reserved
      2850000000-484fffffff : dax1.0
        2850000000-484fffffff : System RAM (kmem)
4850000000-684fffffff : CXL Window 2
  4850000000-684fffffff : region5
    4850000000-684fffffff : Soft Reserved
      4850000000-684fffffff : dax2.0
        4850000000-684fffffff : System RAM (kmem)

Link to the patch and discussions on this:
https://lore.kernel.org/all/20250822034202.26896-7-Smita.KoralahalliChannabasappa@amd.com/

I would appreciate input on how best to handle this scenario efficiently.

Applies to mainline master.

v3 updates:
 - Fixed two "From:" tags.

v2 updates:
 - Removed conditional check on CONFIG_EFI_SOFT_RESERVE as dax_hmem
   depends on CONFIG_EFI_SOFT_RESERVE. (Zhijian)
 - Added TODO note. (Zhijian)
 - Included region_intersects_soft_reserve() inside CONFIG_EFI_SOFT_RESERVE
   conditional check. (Zhijian)
 - Replaced insert_resource_late() with insert_resource_expand_to_fit()
   and __insert_resource_expand_to_fit(). (Boris)
 - Fixed Co-developed-by and Signed-off-by tags. (Dan)
 - Combined 2/6 and 3/6 into a single patch. (Zhijian).
 - Skip local variable in remove_soft_reserved. (Jonathan)
 - Drop kfree with __free(). (Jonathan)
 - return 0 -> return devm_add_action_or_reset(host...) (Jonathan)
 - Dropped 6/6.
 - Reviewed-by tags (Dave, Jonathan)

Dan Williams (4):
  dax/hmem, e820, resource: Defer Soft Reserved registration until hmem
    is ready
  dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved
    ranges
  dax/hmem: Use DEV_DAX_CXL instead of CXL_REGION for deferral
  dax/hmem: Defer Soft Reserved overlap handling until CXL region
    assembly completes

Smita Koralahalli (1):
  dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree

 arch/x86/kernel/e820.c    |   2 +-
 drivers/cxl/acpi.c        |   2 +-
 drivers/dax/Kconfig       |   2 +
 drivers/dax/hmem/device.c |   4 +-
 drivers/dax/hmem/hmem.c   | 128 ++++++++++++++++++++++++++++++++++----
 include/linux/ioport.h    |  13 +++-
 kernel/resource.c         |  92 +++++++++++++++++++++++----
 7 files changed, 213 insertions(+), 30 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/5] dax/hmem, e820, resource: Defer Soft Reserved registration until hmem is ready
  2025-09-30  4:47 [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Smita Koralahalli
@ 2025-09-30  4:47 ` Smita Koralahalli
  2025-09-30  4:47 ` [PATCH v3 2/5] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Smita Koralahalli @ 2025-09-30  4:47 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Smita Koralahalli, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Ard Biesheuvel

From: Dan Williams <dan.j.williams@intel.com>

Insert Soft Reserved memory into a dedicated soft_reserve_resource tree
instead of the iomem_resource tree at boot.

Publishing Soft Reserved ranges into iomem too early conflicts with CXL
hotplug and leads to region assembly failures, especially when Soft
Reserved overlaps CXL regions.

Re-inserting these ranges into iomem will be handled in follow-up patches,
after CXL window publication ordering has stabilized and dax_hmem is ready
to consume them.

This avoids trimming or deleting resources later and provides a cleaner
handoff between EFI-defined memory and CXL resource management.
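For readers following along, the deferral can be sketched in plain userspace
C: Soft Reserved ranges go to a side list at boot and are only merged into
the main list later. All names here (res_node, register_range, etc.) are
illustrative stand-ins, not kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct res_node {
	unsigned long long start, end;
	struct res_node *next;
};

static struct res_node *iomem_list;	/* stands in for iomem_resource */
static struct res_node *soft_list;	/* stands in for soft_reserve_resource */

static int insert_sorted(struct res_node **head, unsigned long long start,
			 unsigned long long end)
{
	struct res_node **p = head;
	struct res_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->start = start;
	n->end = end;
	while (*p && (*p)->start < start)
		p = &(*p)->next;
	n->next = *p;
	*p = n;
	return 0;
}

/* Boot time: Soft Reserved goes to the side list, everything else to iomem. */
static int register_range(unsigned long long start, unsigned long long end,
			  bool soft_reserved)
{
	return insert_sorted(soft_reserved ? &soft_list : &iomem_list,
			     start, end);
}

/* Later, once CXL ordering is settled: hand the deferred ranges to iomem. */
static void publish_soft_reserved(void)
{
	while (soft_list) {
		struct res_node *n = soft_list;

		soft_list = n->next;
		insert_sorted(&iomem_list, n->start, n->end);
		free(n);
	}
}

static size_t count(struct res_node *head)
{
	size_t c = 0;

	for (; head; head = head->next)
		c++;
	return c;
}
```

The real kernel code keeps a full resource tree with parent/sibling links and
locking; the sketch only shows the two-tree split and the late handoff.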

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Co-developed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 arch/x86/kernel/e820.c    |  2 +-
 drivers/cxl/acpi.c        |  2 +-
 drivers/dax/hmem/device.c |  4 +-
 drivers/dax/hmem/hmem.c   |  7 ++-
 include/linux/ioport.h    | 13 +++++-
 kernel/resource.c         | 92 +++++++++++++++++++++++++++++++++------
 6 files changed, 100 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c3acbd26408b..c32f144f0e4a 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1153,7 +1153,7 @@ void __init e820__reserve_resources_late(void)
 	res = e820_res;
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		if (!res->parent && res->end)
-			insert_resource_expand_to_fit(&iomem_resource, res);
+			insert_resource_expand_to_fit(res);
 		res++;
 	}
 
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 712624cba2b6..3b73adf80bb4 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -839,7 +839,7 @@ static int add_cxl_resources(struct resource *cxl_res)
 		 */
 		cxl_set_public_resource(res, new);
 
-		insert_resource_expand_to_fit(&iomem_resource, new);
+		__insert_resource_expand_to_fit(&iomem_resource, new);
 
 		next = res->sibling;
 		while (next && resource_overlaps(new, next)) {
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index f9e1a76a04a9..22732b729017 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -83,8 +83,8 @@ static __init int hmem_register_one(struct resource *res, void *data)
 
 static __init int hmem_init(void)
 {
-	walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
-			IORESOURCE_MEM, 0, -1, NULL, hmem_register_one);
+	walk_soft_reserve_res_desc(IORES_DESC_SOFT_RESERVED, IORESOURCE_MEM, 0,
+				   -1, NULL, hmem_register_one);
 	return 0;
 }
 
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index c18451a37e4f..48f4642f4bb8 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -73,11 +73,14 @@ static int hmem_register_device(struct device *host, int target_nid,
 		return 0;
 	}
 
-	rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
-			       IORES_DESC_SOFT_RESERVED);
+	rc = region_intersects_soft_reserve(res->start, resource_size(res),
+					    IORESOURCE_MEM,
+					    IORES_DESC_SOFT_RESERVED);
 	if (rc != REGION_INTERSECTS)
 		return 0;
 
+	/* TODO: Add Soft-Reserved memory back to iomem */
+
 	id = memregion_alloc(GFP_KERNEL);
 	if (id < 0) {
 		dev_err(host, "memregion allocation failure for %pr\n", res);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index e8b2d6aa4013..e20226870a81 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -232,6 +232,9 @@ struct resource_constraint {
 /* PC/ISA/whatever - the normal PC address spaces: IO and memory */
 extern struct resource ioport_resource;
 extern struct resource iomem_resource;
+#ifdef CONFIG_EFI_SOFT_RESERVE
+extern struct resource soft_reserve_resource;
+#endif
 
 extern struct resource *request_resource_conflict(struct resource *root, struct resource *new);
 extern int request_resource(struct resource *root, struct resource *new);
@@ -242,7 +245,8 @@ extern void reserve_region_with_split(struct resource *root,
 			     const char *name);
 extern struct resource *insert_resource_conflict(struct resource *parent, struct resource *new);
 extern int insert_resource(struct resource *parent, struct resource *new);
-extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
+extern void __insert_resource_expand_to_fit(struct resource *root, struct resource *new);
+extern void insert_resource_expand_to_fit(struct resource *new);
 extern int remove_resource(struct resource *old);
 extern void arch_remove_reservations(struct resource *avail);
 extern int allocate_resource(struct resource *root, struct resource *new,
@@ -409,6 +413,13 @@ walk_system_ram_res_rev(u64 start, u64 end, void *arg,
 extern int
 walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
 		    void *arg, int (*func)(struct resource *, void *));
+extern int
+walk_soft_reserve_res_desc(unsigned long desc, unsigned long flags,
+			   u64 start, u64 end, void *arg,
+			   int (*func)(struct resource *, void *));
+extern int
+region_intersects_soft_reserve(resource_size_t start, size_t size,
+			       unsigned long flags, unsigned long desc);
 
 struct resource *devm_request_free_mem_region(struct device *dev,
 		struct resource *base, unsigned long size);
diff --git a/kernel/resource.c b/kernel/resource.c
index f9bb5481501a..70e750cc0d7b 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -321,13 +321,14 @@ static bool is_type_match(struct resource *p, unsigned long flags, unsigned long
 }
 
 /**
- * find_next_iomem_res - Finds the lowest iomem resource that covers part of
- *			 [@start..@end].
+ * find_next_res - Finds the lowest resource that covers part of
+ *		   [@start..@end].
  *
  * If a resource is found, returns 0 and @*res is overwritten with the part
  * of the resource that's within [@start..@end]; if none is found, returns
  * -ENODEV.  Returns -EINVAL for invalid parameters.
  *
+ * @parent:	resource tree root to search
  * @start:	start address of the resource searched for
  * @end:	end address of same resource
  * @flags:	flags which the resource must have
@@ -337,9 +338,9 @@ static bool is_type_match(struct resource *p, unsigned long flags, unsigned long
  * The caller must specify @start, @end, @flags, and @desc
  * (which may be IORES_DESC_NONE).
  */
-static int find_next_iomem_res(resource_size_t start, resource_size_t end,
-			       unsigned long flags, unsigned long desc,
-			       struct resource *res)
+static int find_next_res(struct resource *parent, resource_size_t start,
+			 resource_size_t end, unsigned long flags,
+			 unsigned long desc, struct resource *res)
 {
 	struct resource *p;
 
@@ -351,7 +352,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
 
 	read_lock(&resource_lock);
 
-	for_each_resource(&iomem_resource, p, false) {
+	for_each_resource(parent, p, false) {
 		/* If we passed the resource we are looking for, stop */
 		if (p->start > end) {
 			p = NULL;
@@ -382,16 +383,23 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
 	return p ? 0 : -ENODEV;
 }
 
-static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
-				 unsigned long flags, unsigned long desc,
-				 void *arg,
-				 int (*func)(struct resource *, void *))
+static int find_next_iomem_res(resource_size_t start, resource_size_t end,
+			       unsigned long flags, unsigned long desc,
+			       struct resource *res)
+{
+	return find_next_res(&iomem_resource, start, end, flags, desc, res);
+}
+
+static int walk_res_desc(struct resource *parent, resource_size_t start,
+			 resource_size_t end, unsigned long flags,
+			 unsigned long desc, void *arg,
+			 int (*func)(struct resource *, void *))
 {
 	struct resource res;
 	int ret = -EINVAL;
 
 	while (start < end &&
-	       !find_next_iomem_res(start, end, flags, desc, &res)) {
+	       !find_next_res(parent, start, end, flags, desc, &res)) {
 		ret = (*func)(&res, arg);
 		if (ret)
 			break;
@@ -402,6 +410,15 @@ static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
 	return ret;
 }
 
+static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
+				 unsigned long flags, unsigned long desc,
+				 void *arg,
+				 int (*func)(struct resource *, void *))
+{
+	return walk_res_desc(&iomem_resource, start, end, flags, desc, arg, func);
+}
+
+
 /**
  * walk_iomem_res_desc - Walks through iomem resources and calls func()
  *			 with matching resource ranges.
@@ -426,6 +443,26 @@ int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start,
 }
 EXPORT_SYMBOL_GPL(walk_iomem_res_desc);
 
+#ifdef CONFIG_EFI_SOFT_RESERVE
+struct resource soft_reserve_resource = {
+	.name	= "Soft Reserved",
+	.start	= 0,
+	.end	= -1,
+	.desc	= IORES_DESC_SOFT_RESERVED,
+	.flags	= IORESOURCE_MEM,
+};
+EXPORT_SYMBOL_GPL(soft_reserve_resource);
+
+int walk_soft_reserve_res_desc(unsigned long desc, unsigned long flags,
+			       u64 start, u64 end, void *arg,
+			       int (*func)(struct resource *, void *))
+{
+	return walk_res_desc(&soft_reserve_resource, start, end, flags, desc,
+			     arg, func);
+}
+EXPORT_SYMBOL_GPL(walk_soft_reserve_res_desc);
+#endif
+
 /*
  * This function calls the @func callback against all memory ranges of type
  * System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY.
@@ -648,6 +685,22 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags,
 }
 EXPORT_SYMBOL_GPL(region_intersects);
 
+#ifdef CONFIG_EFI_SOFT_RESERVE
+int region_intersects_soft_reserve(resource_size_t start, size_t size,
+				   unsigned long flags, unsigned long desc)
+{
+	int ret;
+
+	read_lock(&resource_lock);
+	ret = __region_intersects(&soft_reserve_resource, start, size, flags,
+				  desc);
+	read_unlock(&resource_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(region_intersects_soft_reserve);
+#endif
+
 void __weak arch_remove_reservations(struct resource *avail)
 {
 }
@@ -966,7 +1019,7 @@ EXPORT_SYMBOL_GPL(insert_resource);
  * Insert a resource into the resource tree, possibly expanding it in order
  * to make it encompass any conflicting resources.
  */
-void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
+void __insert_resource_expand_to_fit(struct resource *root, struct resource *new)
 {
 	if (new->parent)
 		return;
@@ -997,7 +1050,20 @@ void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
  * to use this interface. The former are built-in and only the latter,
  * CXL, is a module.
  */
-EXPORT_SYMBOL_NS_GPL(insert_resource_expand_to_fit, "CXL");
+EXPORT_SYMBOL_NS_GPL(__insert_resource_expand_to_fit, "CXL");
+
+void insert_resource_expand_to_fit(struct resource *new)
+{
+	struct resource *root = &iomem_resource;
+
+#ifdef CONFIG_EFI_SOFT_RESERVE
+	if (new->desc == IORES_DESC_SOFT_RESERVED)
+		root = &soft_reserve_resource;
+#endif
+
+	__insert_resource_expand_to_fit(root, new);
+}
+EXPORT_SYMBOL_GPL(insert_resource_expand_to_fit);
 
 /**
  * remove_resource - Remove a resource in the resource tree
-- 
2.17.1



* [PATCH v3 2/5] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
  2025-09-30  4:47 [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Smita Koralahalli
  2025-09-30  4:47 ` [PATCH v3 1/5] dax/hmem, e820, resource: Defer Soft Reserved registration until hmem is ready Smita Koralahalli
@ 2025-09-30  4:47 ` Smita Koralahalli
  2025-09-30  4:47 ` [PATCH v3 3/5] dax/hmem: Use DEV_DAX_CXL instead of CXL_REGION for deferral Smita Koralahalli
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Smita Koralahalli @ 2025-09-30  4:47 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Smita Koralahalli, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Ard Biesheuvel

From: Dan Williams <dan.j.williams@intel.com>

Ensure that cxl_acpi has published CXL Window resources before dax_hmem
walks Soft Reserved ranges.

Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
loading; it does not ensure that the dependency has finished its init
before the current module runs. This can cause dax_hmem to start before
cxl_acpi has populated the resource tree, breaking detection of overlaps
between Soft Reserved ranges and CXL Windows.

Also, request cxl_pci before dax_hmem walks Soft Reserved ranges. Unlike
cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
that trigger further module loads. Asynchronous probe flushing
(wait_for_device_probe()) is added later in the series in a deferred
context before dax_hmem makes ownership decisions for Soft Reserved
ranges.

Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
must be initialized before DEV_DAX_HMEM. This prevents dax_hmem from
consuming Soft Reserved ranges before CXL drivers have had a chance to
claim them.
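The ordering distinction can be sketched in userspace C: a soft dependency
only queues the provider's init for "eventually", while a synchronous
request runs it to completion before returning. The tiny module table below
is illustrative, not the kernel's module loader.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct module {
	const char *name;
	void (*init)(void);
	bool loaded;
};

static bool cxl_windows_published;

static void cxl_acpi_init(void)
{
	cxl_windows_published = true;	/* "publish CXL Window resources" */
}

static struct module modules[] = {
	{ "cxl_acpi", cxl_acpi_init, false },
};

static struct module *pending;	/* at most one queued soft dep, for brevity */

static struct module *find_module(const char *name)
{
	for (size_t i = 0; i < sizeof(modules) / sizeof(modules[0]); i++)
		if (!strcmp(modules[i].name, name))
			return &modules[i];
	return NULL;
}

/* MODULE_SOFTDEP() analogue: note the dependency, load it some time later. */
static void soft_dep(const char *name)
{
	pending = find_module(name);
}

/* request_module() analogue: load and init before returning. */
static void request_module_sketch(const char *name)
{
	struct module *m = find_module(name);

	if (m && !m->loaded) {
		m->init();
		m->loaded = true;
	}
}

static void run_pending(void)	/* the "eventually" part */
{
	if (pending) {
		request_module_sketch(pending->name);
		pending = NULL;
	}
}
```

After soft_dep() returns, the consumer may run before the provider's init;
after request_module_sketch() returns, the provider is guaranteed ready.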

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dax/Kconfig     |  2 ++
 drivers/dax/hmem/hmem.c | 17 ++++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d656e4c0eb84..3683bb3f2311 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -48,6 +48,8 @@ config DEV_DAX_CXL
 	tristate "CXL DAX: direct access to CXL RAM regions"
 	depends on CXL_BUS && CXL_REGION && DEV_DAX
 	default CXL_REGION && DEV_DAX
+	depends on CXL_ACPI >= DEV_DAX_HMEM
+	depends on CXL_PCI >= DEV_DAX_HMEM
 	help
 	  CXL RAM regions are either mapped by platform-firmware
 	  and published in the initial system-memory map as "System RAM", mapped
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 48f4642f4bb8..02e79c7adf75 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -141,6 +141,16 @@ static __init int dax_hmem_init(void)
 {
 	int rc;
 
+	/*
+	 * Ensure that cxl_acpi and cxl_pci have a chance to kick off
+	 * CXL topology discovery at least once before scanning the
+	 * iomem resource tree for IORES_DESC_CXL resources.
+	 */
+	if (IS_ENABLED(CONFIG_DEV_DAX_CXL)) {
+		request_module("cxl_acpi");
+		request_module("cxl_pci");
+	}
+
 	rc = platform_driver_register(&dax_hmem_platform_driver);
 	if (rc)
 		return rc;
@@ -161,13 +171,6 @@ static __exit void dax_hmem_exit(void)
 module_init(dax_hmem_init);
 module_exit(dax_hmem_exit);
 
-/* Allow for CXL to define its own dax regions */
-#if IS_ENABLED(CONFIG_CXL_REGION)
-#if IS_MODULE(CONFIG_CXL_ACPI)
-MODULE_SOFTDEP("pre: cxl_acpi");
-#endif
-#endif
-
 MODULE_ALIAS("platform:hmem*");
 MODULE_ALIAS("platform:hmem_platform*");
 MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
-- 
2.17.1



* [PATCH v3 3/5] dax/hmem: Use DEV_DAX_CXL instead of CXL_REGION for deferral
  2025-09-30  4:47 [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Smita Koralahalli
  2025-09-30  4:47 ` [PATCH v3 1/5] dax/hmem, e820, resource: Defer Soft Reserved registration until hmem is ready Smita Koralahalli
  2025-09-30  4:47 ` [PATCH v3 2/5] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
@ 2025-09-30  4:47 ` Smita Koralahalli
  2025-09-30  4:47 ` [PATCH v3 4/5] dax/hmem: Defer Soft Reserved overlap handling until CXL region assembly completes Smita Koralahalli
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Smita Koralahalli @ 2025-09-30  4:47 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Smita Koralahalli, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Ard Biesheuvel

From: Dan Williams <dan.j.williams@intel.com>

Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL)
so that dax_hmem only defers Soft Reserved ranges when CXL DAX support
is enabled. This makes the coordination between dax_hmem and the CXL
stack more precise and prevents deferral in unrelated CXL configurations.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/dax/hmem/hmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 02e79c7adf75..c2c110b194e5 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -66,7 +66,7 @@ static int hmem_register_device(struct device *host, int target_nid,
 	long id;
 	int rc;
 
-	if (IS_ENABLED(CONFIG_CXL_REGION) &&
+	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
 	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
 			      IORES_DESC_CXL) != REGION_DISJOINT) {
 		dev_dbg(host, "deferring range to CXL: %pr\n", res);
-- 
2.17.1



* [PATCH v3 4/5] dax/hmem: Defer Soft Reserved overlap handling until CXL region assembly completes
  2025-09-30  4:47 [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Smita Koralahalli
                   ` (2 preceding siblings ...)
  2025-09-30  4:47 ` [PATCH v3 3/5] dax/hmem: Use DEV_DAX_CXL instead of CXL_REGION for deferral Smita Koralahalli
@ 2025-09-30  4:47 ` Smita Koralahalli
  2025-10-07  1:27   ` Alison Schofield
  2025-09-30  4:47 ` [PATCH v3 5/5] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
  2025-10-07  1:16 ` [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Alison Schofield
  5 siblings, 1 reply; 16+ messages in thread
From: Smita Koralahalli @ 2025-09-30  4:47 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Smita Koralahalli, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Ard Biesheuvel

From: Dan Williams <dan.j.williams@intel.com>

Previously, dax_hmem deferred to CXL only when an immediate resource
intersection with a CXL window was detected. This left a gap: if cxl_acpi
or cxl_pci probing or region assembly had not yet started, hmem could
prematurely claim ranges.

Fix this by introducing a dax_cxl_mode state machine and a deferred
work mechanism.

The new workqueue delays consideration of Soft Reserved overlaps until
the CXL subsystem has had a chance to complete its discovery and region
assembly. This avoids premature iomem claims, eliminates race conditions
with async cxl_pci probe, and provides a cleaner handoff between hmem and
CXL resource management.
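The three-way decision the series adds can be sketched in userspace C. The
enum mirrors the patch; the decide() helper and action values are
illustrative, not kernel API.

```c
#include <stdbool.h>

enum dax_cxl_mode {
	DAX_CXL_MODE_DEFER,	/* CXL assembly still in flight: wait */
	DAX_CXL_MODE_REGISTER,	/* CXL declined the range: hmem takes it */
	DAX_CXL_MODE_DROP,	/* CXL owns the range: hmem stays away */
};

enum action { ACTION_DEFER, ACTION_REGISTER, ACTION_SKIP };

/*
 * What hmem_register_device() does for a range, keyed off whether it
 * intersects a CXL window and the current mode.
 */
static enum action decide(enum dax_cxl_mode mode, bool intersects_cxl)
{
	if (!intersects_cxl)
		return ACTION_REGISTER;	/* plain Soft Reserved range */

	switch (mode) {
	case DAX_CXL_MODE_DEFER:
		return ACTION_DEFER;	/* kick the deferred work, retry later */
	case DAX_CXL_MODE_REGISTER:
		return ACTION_REGISTER;
	case DAX_CXL_MODE_DROP:
	default:
		return ACTION_SKIP;
	}
}
```

In the patch, the DEFER case also schedules the work item that calls
wait_for_device_probe() before flipping the mode to DROP.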

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/dax/hmem/hmem.c | 72 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index c2c110b194e5..0498cb234c06 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -58,9 +58,45 @@ static void release_hmem(void *pdev)
 	platform_device_unregister(pdev);
 }
 
+static enum dax_cxl_mode {
+	DAX_CXL_MODE_DEFER,
+	DAX_CXL_MODE_REGISTER,
+	DAX_CXL_MODE_DROP,
+} dax_cxl_mode;
+
+static int handle_deferred_cxl(struct device *host, int target_nid,
+				const struct resource *res)
+{
+	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+			      IORES_DESC_CXL) != REGION_DISJOINT) {
+		if (dax_cxl_mode == DAX_CXL_MODE_DROP)
+			dev_dbg(host, "dropping CXL range: %pr\n", res);
+	}
+	return 0;
+}
+
+struct dax_defer_work {
+	struct platform_device *pdev;
+	struct work_struct work;
+};
+
+static void process_defer_work(struct work_struct *_work)
+{
+	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
+	struct platform_device *pdev = work->pdev;
+
+	/* relies on cxl_acpi and cxl_pci having had a chance to load */
+	wait_for_device_probe();
+
+	dax_cxl_mode = DAX_CXL_MODE_DROP;
+
+	walk_hmem_resources(&pdev->dev, handle_deferred_cxl);
+}
+
 static int hmem_register_device(struct device *host, int target_nid,
 				const struct resource *res)
 {
+	struct dax_defer_work *work = dev_get_drvdata(host);
 	struct platform_device *pdev;
 	struct memregion_info info;
 	long id;
@@ -69,8 +105,18 @@ static int hmem_register_device(struct device *host, int target_nid,
 	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
 	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
 			      IORES_DESC_CXL) != REGION_DISJOINT) {
-		dev_dbg(host, "deferring range to CXL: %pr\n", res);
-		return 0;
+		switch (dax_cxl_mode) {
+		case DAX_CXL_MODE_DEFER:
+			dev_dbg(host, "deferring range to CXL: %pr\n", res);
+			schedule_work(&work->work);
+			return 0;
+		case DAX_CXL_MODE_REGISTER:
+			dev_dbg(host, "registering CXL range: %pr\n", res);
+			break;
+		case DAX_CXL_MODE_DROP:
+			dev_dbg(host, "dropping CXL range: %pr\n", res);
+			return 0;
+		}
 	}
 
 	rc = region_intersects_soft_reserve(res->start, resource_size(res),
@@ -125,8 +171,30 @@ static int hmem_register_device(struct device *host, int target_nid,
 	return rc;
 }
 
+static void kill_defer_work(void *_work)
+{
+	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
+
+	cancel_work_sync(&work->work);
+	kfree(work);
+}
+
 static int dax_hmem_platform_probe(struct platform_device *pdev)
 {
+	struct dax_defer_work *work = kzalloc(sizeof(*work), GFP_KERNEL);
+	int rc;
+
+	if (!work)
+		return -ENOMEM;
+
+	work->pdev = pdev;
+	INIT_WORK(&work->work, process_defer_work);
+
+	rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, work);
+	if (rc)
+		return rc;
+
+	platform_set_drvdata(pdev, work);
 	return walk_hmem_resources(&pdev->dev, hmem_register_device);
 }
 
-- 
2.17.1



* [PATCH v3 5/5] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
  2025-09-30  4:47 [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Smita Koralahalli
                   ` (3 preceding siblings ...)
  2025-09-30  4:47 ` [PATCH v3 4/5] dax/hmem: Defer Soft Reserved overlap handling until CXL region assembly completes Smita Koralahalli
@ 2025-09-30  4:47 ` Smita Koralahalli
  2025-10-07  1:16 ` [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Alison Schofield
  5 siblings, 0 replies; 16+ messages in thread
From: Smita Koralahalli @ 2025-09-30  4:47 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Smita Koralahalli, Terry Bowman, Robert Richter,
	Benjamin Cheatham, Zhijian Li, Borislav Petkov, Ard Biesheuvel

Reworked from a patch by Alison Schofield <alison.schofield@intel.com>

Reintroduce Soft Reserved ranges into the iomem_resource tree for dax_hmem
to consume.

This restores visibility in /proc/iomem for ranges actively in use, while
avoiding the early-boot conflicts that occurred when Soft Reserved was
published into iomem before CXL window and region discovery.
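The insert-plus-registered-teardown pattern the patch uses can be sketched
in userspace C, with devm-style bookkeeping simulated by a tiny action
stack. All names here are illustrative stand-ins, not kernel API.

```c
#include <stddef.h>
#include <stdlib.h>

struct action {
	void (*fn)(void *);
	void *data;
	struct action *next;
};

static struct action *actions;
static int live_resources;	/* stands in for entries in iomem_resource */

/* devm_add_action_or_reset() analogue: undo immediately on failure. */
static int add_action(void (*fn)(void *), void *data)
{
	struct action *a = malloc(sizeof(*a));

	if (!a) {
		fn(data);
		return -1;
	}
	a->fn = fn;
	a->data = data;
	a->next = actions;
	actions = a;
	return 0;
}

static void remove_soft_reserved(void *r)
{
	live_resources--;	/* stands in for remove_resource() */
	free(r);
}

static int add_soft_reserve(unsigned long long start, unsigned long long end)
{
	unsigned long long *soft = malloc(2 * sizeof(*soft));

	if (!soft)
		return -1;
	soft[0] = start;
	soft[1] = end;
	live_resources++;	/* stands in for insert_resource() */
	return add_action(remove_soft_reserved, soft);
}

static void device_teardown(void)	/* devres release on device removal */
{
	while (actions) {
		struct action *a = actions;

		actions = a->next;
		a->fn(a->data);
		free(a);
	}
}
```

This mirrors why the patch hands ownership of the allocation to the devm
action with no_free_ptr(): once registered, removal and freeing happen
together when the host device goes away.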

Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/dax/hmem/hmem.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 0498cb234c06..9dc6eb15c4d2 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -93,6 +93,34 @@ static void process_defer_work(struct work_struct *_work)
 	walk_hmem_resources(&pdev->dev, handle_deferred_cxl);
 }
 
+static void remove_soft_reserved(void *r)
+{
+	remove_resource(r);
+	kfree(r);
+}
+
+static int add_soft_reserve_into_iomem(struct device *host,
+				       const struct resource *res)
+{
+	struct resource *soft __free(kfree) =
+		kzalloc(sizeof(*soft), GFP_KERNEL);
+	int rc;
+
+	if (!soft)
+		return -ENOMEM;
+
+	*soft = DEFINE_RES_NAMED_DESC(res->start, (res->end - res->start + 1),
+				      "Soft Reserved", IORESOURCE_MEM,
+				      IORES_DESC_SOFT_RESERVED);
+
+	rc = insert_resource(&iomem_resource, soft);
+	if (rc)
+		return rc;
+
+	return devm_add_action_or_reset(host, remove_soft_reserved,
+					no_free_ptr(soft));
+}
+
 static int hmem_register_device(struct device *host, int target_nid,
 				const struct resource *res)
 {
@@ -125,7 +153,9 @@ static int hmem_register_device(struct device *host, int target_nid,
 	if (rc != REGION_INTERSECTS)
 		return 0;
 
-	/* TODO: Add Soft-Reserved memory back to iomem */
+	rc = add_soft_reserve_into_iomem(host, res);
+	if (rc)
+		return rc;
 
 	id = memregion_alloc(GFP_KERNEL);
 	if (id < 0) {
-- 
2.17.1



* Re: [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
  2025-09-30  4:47 [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Smita Koralahalli
                   ` (4 preceding siblings ...)
  2025-09-30  4:47 ` [PATCH v3 5/5] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
@ 2025-10-07  1:16 ` Alison Schofield
  2025-10-10 20:49   ` Alison Schofield
  5 siblings, 1 reply; 16+ messages in thread
From: Alison Schofield @ 2025-10-07  1:16 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Terry Bowman, Robert Richter, Benjamin Cheatham,
	Zhijian Li, Borislav Petkov, Ard Biesheuvel

On Tue, Sep 30, 2025 at 04:47:52AM +0000, Smita Koralahalli wrote:
> This series aims to address long-standing conflicts between dax_hmem and
> CXL when handling Soft Reserved memory ranges.

Hi Smita,

Thanks for the updates Smita!

About those "long-standing conflicts": in the next rev, can you resurrect
or recreate the issues list that this set is addressing? It's been a
long and winding road with several handoffs (me included) and it'll help
keep the focus.

Hotplug works :)  Auto region comes up, we tear it down and can recreate it,
in place, because the soft reserved resource is gone (no longer occupying
the CXL Window and causing recreate to fail.)

!CONFIG_CXL_REGION works :) All resources go directly to DAX.

The scenario that is failing is handoff to DAX after region assembly
failure. (Dan reminded me to check that today.) That is mostly related
to Patch4, so I'll respond there.

--Alison


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/5] dax/hmem: Defer Soft Reserved overlap handling until CXL region assembly completes
  2025-09-30  4:47 ` [PATCH v3 4/5] dax/hmem: Defer Soft Reserved overlap handling until CXL region assembly completes Smita Koralahalli
@ 2025-10-07  1:27   ` Alison Schofield
  2025-10-07  2:03     ` Alison Schofield
  0 siblings, 1 reply; 16+ messages in thread
From: Alison Schofield @ 2025-10-07  1:27 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Terry Bowman, Robert Richter, Benjamin Cheatham,
	Zhijian Li, Borislav Petkov, Ard Biesheuvel

On Tue, Sep 30, 2025 at 04:47:56AM +0000, Smita Koralahalli wrote:
> From: Dan Williams <dan.j.williams@intel.com>
> 
> Previously, dax_hmem deferred to CXL only when an immediate resource
> intersection with a CXL window was detected. This left a gap: if cxl_acpi
> or cxl_pci probing or region assembly had not yet started, hmem could
> prematurely claim ranges.
> 
> Fix this by introducing a dax_cxl_mode state machine and a deferred
> work mechanism.
> 
> The new workqueue delays consideration of Soft Reserved overlaps until
> the CXL subsystem has had a chance to complete its discovery and region
> assembly. This avoids premature iomem claims, eliminates race conditions
> with async cxl_pci probe, and provides a cleaner handoff between hmem and
> CXL resource management.

Hi Smita,

I've attached what I did to make this work for handoff to DAX after
region assembly failure. I don't know how it fits into the complete
solution. Please take a look.

Thanks,
Alison


> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  drivers/dax/hmem/hmem.c | 72 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index c2c110b194e5..0498cb234c06 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -58,9 +58,45 @@ static void release_hmem(void *pdev)
>  	platform_device_unregister(pdev);
>  }
>  
> +static enum dax_cxl_mode {
> +	DAX_CXL_MODE_DEFER,
> +	DAX_CXL_MODE_REGISTER,
> +	DAX_CXL_MODE_DROP,
> +} dax_cxl_mode;

DAX_CXL_MODE_REGISTER isn't used (yet).  I used it below.
The state machine now goes directly from DEFER -> DROP.
See suggestion in process_defer_work() below.

> +
> +static int handle_deferred_cxl(struct device *host, int target_nid,
> +				const struct resource *res)
> +{
> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +		if (dax_cxl_mode == DAX_CXL_MODE_DROP)
> +			dev_dbg(host, "dropping CXL range: %pr\n", res);


IORES_DESC_CXL doesn't tell us if a CXL region was successfully assembled.
Even if CXL region assembly fails, I think the window resources will still
be in the iomem tree. So maybe this check always returns true?

Can we check if the SR conflicts with an existing iomem resource? If the
CXL region assembled successfully, it'll conflict; otherwise there'll be
no conflict. No conflict means the range is available for DAX, so register
it.

Here's what worked for me:

	rc = add_soft_reserve_into_iomem(host, res);
	/* The above add probably means patch 5 drops */
	if (rc == -EBUSY) {
		dev_dbg(host, "range already in iomem (CXL owns it): %pr\n", res);
		return 0;
	}
	if (rc) {
		dev_err(host, "failed to add soft reserve to iomem: %d\n", rc);
		return rc;
	}

	dev_dbg(host, "registering released/unclaimed range with DAX: %pr\n", res);

	return hmem_register_device(host, target_nid, res);
}

> +	}
> +	return 0;
> +}
> +
> +struct dax_defer_work {
> +	struct platform_device *pdev;
> +	struct work_struct work;
> +};
> +
> +static void process_defer_work(struct work_struct *_work)
> +{
> +	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
> +	struct platform_device *pdev = work->pdev;
> +
> +	/* relies on cxl_acpi and cxl_pci having had a chance to load */
> +	wait_for_device_probe();

The wait_for_device_probe() didn't wait for region probe to complete.
I couldn't figure out why, so I just 'slept' here in my testing. 
How is that supposed to work? Could I have something config'd wrong?

After the long sleep that allowed region assembly to complete, and
fail, this worked for me: 

	/*
	 * At this point, CXL has had its chance. Resources that CXL
	 * successfully claimed will have resources in iomem. Resources
	 * where CXL region assembly failed will be available.
	 */
	dax_cxl_mode = DAX_CXL_MODE_REGISTER;

	/*
	 * Walk all Soft Reserved ranges and register the ones
	 * that CXL didn't claim or that CXL released after failure.
	 */
	walk_hmem_resources(&pdev->dev, handle_deferred_cxl);

	/*
	 * Future attempts should drop CXL overlaps immediately
	 * without deferring again.
	 */
> +	dax_cxl_mode = DAX_CXL_MODE_DROP;
> +
> +	walk_hmem_resources(&pdev->dev, handle_deferred_cxl);
> +}
> +
>  static int hmem_register_device(struct device *host, int target_nid,
>  				const struct resource *res)
>  {
> +	struct dax_defer_work *work = dev_get_drvdata(host);
>  	struct platform_device *pdev;
>  	struct memregion_info info;
>  	long id;
> @@ -69,8 +105,18 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>  	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>  			      IORES_DESC_CXL) != REGION_DISJOINT) {
> -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> -		return 0;
> +		switch (dax_cxl_mode) {
> +		case DAX_CXL_MODE_DEFER:
> +			dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +			schedule_work(&work->work);
> +			return 0;
> +		case DAX_CXL_MODE_REGISTER:
> +			dev_dbg(host, "registering CXL range: %pr\n", res);
> +			break;
> +		case DAX_CXL_MODE_DROP:
> +			dev_dbg(host, "dropping CXL range: %pr\n", res);
> +			return 0;
> +		}
>  	}
>  
>  	rc = region_intersects_soft_reserve(res->start, resource_size(res),
> @@ -125,8 +171,30 @@ static int hmem_register_device(struct device *host, int target_nid,
>  	return rc;
>  }
>  
> +static void kill_defer_work(void *_work)
> +{
> +	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
> +
> +	cancel_work_sync(&work->work);
> +	kfree(work);
> +}
> +
>  static int dax_hmem_platform_probe(struct platform_device *pdev)
>  {
> +	struct dax_defer_work *work = kzalloc(sizeof(*work), GFP_KERNEL);
> +	int rc;
> +
> +	if (!work)
> +		return -ENOMEM;
> +
> +	work->pdev = pdev;
> +	INIT_WORK(&work->work, process_defer_work);
> +
> +	rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, work);
> +	if (rc)
> +		return rc;
> +
> +	platform_set_drvdata(pdev, work);
>  	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>  }
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/5] dax/hmem: Defer Soft Reserved overlap handling until CXL region assembly completes
  2025-10-07  1:27   ` Alison Schofield
@ 2025-10-07  2:03     ` Alison Schofield
  0 siblings, 0 replies; 16+ messages in thread
From: Alison Schofield @ 2025-10-07  2:03 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Terry Bowman, Robert Richter, Benjamin Cheatham,
	Zhijian Li, Borislav Petkov, Ard Biesheuvel

On Mon, Oct 06, 2025 at 06:27:12PM -0700, Alison Schofield wrote:

snip

> > +
> > +	/* relies on cxl_acpi and cxl_pci having had a chance to load */
> > +	wait_for_device_probe();
> 
> The wait_for_device_probe() didn't wait for region probe to complete.
> I couldn't figure out why, so I just 'slept' here in my testing. 
> How is that supposed to work? Could I have something config'd wrong?

FWIW I tried the region driver with .probe_type = PROBE_PREFER_ASYNCHRONOUS,
but no luck.

snip


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
  2025-10-07  1:16 ` [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Alison Schofield
@ 2025-10-10 20:49   ` Alison Schofield
  2025-10-14 17:52     ` Koralahalli Channabasappa, Smita
  0 siblings, 1 reply; 16+ messages in thread
From: Alison Schofield @ 2025-10-10 20:49 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Terry Bowman, Robert Richter, Benjamin Cheatham,
	Zhijian Li, Borislav Petkov, Ard Biesheuvel

On Mon, Oct 06, 2025 at 06:16:24PM -0700, Alison Schofield wrote:
> On Tue, Sep 30, 2025 at 04:47:52AM +0000, Smita Koralahalli wrote:
> > This series aims to address long-standing conflicts between dax_hmem and
> > CXL when handling Soft Reserved memory ranges.
> 
> Hi Smita,
> 
> Thanks for the updates Smita!
> 
> About those "long-standing conflicts": In the next rev, can you resurrect,
> or recreate the issues list that this set is addressing. It's been a
> long and winding road with several handoffs (me included) and it'll help
> keep the focus.
> 
> Hotplug works :)  Auto region comes up, we tear it down and can recreate it,
> in place, because the soft reserved resource is gone (no longer occupying
> the CXL Window and causing recreate to fail.)
> 
> !CONFIG_CXL_REGION works :) All resources go directly to DAX.
> 
> The scenario that is failing is handoff to DAX after region assembly
> failure. (Dan reminded me to check that today.) That is mostly related
> to Patch4, so I'll respond there.
> 
> --Alison

Hi Smita -

(after off-list chat w Smita about what is and is not included)

This CXL failover to DAX case is not implemented. In my response in Patch 4,
I cobbled something together that made it work in one test case. But to be
clear, there was some trickery in the CXL region driver to even do that.

One path forward is to update this set restating the issues it addresses, and
remove any code and comments that are tied to failing over to DAX after a
region assembly failure.

That leaves the issue Dan raised, "shutdown CXL in favor of vanilla DAX devices
as an emergency fallback for platform configuration quirks and bugs"[1], for a
future patch.

-- Alison

[1] The failover to DAX was last described in response to v5 of the 'prior' patchset.
https://lore.kernel.org/linux-cxl/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
https://lore.kernel.org/linux-cxl/687ffcc0ee1c8_137e6b100ed@dwillia2-xfh.jf.intel.com.notmuch/
https://lore.kernel.org/linux-cxl/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/

> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
  2025-10-10 20:49   ` Alison Schofield
@ 2025-10-14 17:52     ` Koralahalli Channabasappa, Smita
  2025-10-21  0:06       ` Alison Schofield
  0 siblings, 1 reply; 16+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2025-10-14 17:52 UTC (permalink / raw)
  To: Alison Schofield, Smita Koralahalli
  Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
	Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Terry Bowman, Robert Richter, Benjamin Cheatham,
	Zhijian Li, Borislav Petkov, Ard Biesheuvel

Hi Alison,

On 10/10/2025 1:49 PM, Alison Schofield wrote:
> On Mon, Oct 06, 2025 at 06:16:24PM -0700, Alison Schofield wrote:
>> On Tue, Sep 30, 2025 at 04:47:52AM +0000, Smita Koralahalli wrote:
>>> This series aims to address long-standing conflicts between dax_hmem and
>>> CXL when handling Soft Reserved memory ranges.
>>
>> Hi Smita,
>>
>> Thanks for the updates Smita!
>>
>> About those "long-standing conflicts": In the next rev, can you resurrect,
>> or recreate the issues list that this set is addressing. It's been a
>> long and winding road with several handoffs (me included) and it'll help
>> keep the focus.
>>
>> Hotplug works :)  Auto region comes up, we tear it down and can recreate it,
>> in place, because the soft reserved resource is gone (no longer occupying
>> the CXL Window and causing recreate to fail.)
>>
>> !CONFIG_CXL_REGION works :) All resources go directly to DAX.
>>
>> The scenario that is failing is handoff to DAX after region assembly
>> failure. (Dan reminded me to check that today.) That is mostly related
>> to Patch4, so I'll respond there.
>>
>> --Alison
> 
> Hi Smita -
> 
> (after off-list chat w Smita about what is and is not included)
> 
> This CXL failover to DAX case is not implemented. In my response in Patch 4,
> I cobbled something together that made it work in one test case. But to be
> clear, there was some trickery in the CXL region driver to even do that.
> 
> One path forward is to update this set restating the issues it addresses, and
> remove any code and comments that are tied to failing over to DAX after a
> region assembly failure.
> 
> That leaves the issue Dan raised, "shutdown CXL in favor of vanilla DAX devices
> as an emergency fallback for platform configuration quirks and bugs"[1], for a
> future patch.
> 
> -- Alison
> 
> [1] The failover to DAX was last described in response to v5 of the 'prior' patchset.
> https://lore.kernel.org/linux-cxl/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
> https://lore.kernel.org/linux-cxl/687ffcc0ee1c8_137e6b100ed@dwillia2-xfh.jf.intel.com.notmuch/
> https://lore.kernel.org/linux-cxl/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/

[+cc Nathan, Terry]

From the AMD side, our primary concern in this series is CXL hotplug. 
With the patches as is, the hotplug flows are working for us: region 
comes up, we can tear it down, and recreate it in place because the soft 
reserved window is released.

On our systems I consistently see wait_for_device_probe() block until 
region assembly has completed so I don’t currently have evidence of a 
sequencing hole there on AMD platforms.

Once CXL windows are discovered, would it be acceptable for dax_hmem to 
simply ignore soft reserved ranges inside those windows, assuming CXL 
will own and manage them? That aligns with Dan’s guidance about letting 
CXL win those ranges when present.
https://lore.kernel.org/all/687fef9ec0dd9_137e6b100c8@dwillia2-xfh.jf.intel.com.notmuch/

If that approach sounds right, I can reword the commit descriptions in 
patches 4/5 and 5/5 to drop the parts about region assembly failures and 
remove the REGISTER enum.

And then leave the “shutdown CXL in favor of vanilla DAX as an emergency 
fallback for platform configuration quirks and bugs” to a future, 
dedicated patch.

Thanks
Smita

> 
>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
  2025-10-14 17:52     ` Koralahalli Channabasappa, Smita
@ 2025-10-21  0:06       ` Alison Schofield
  2025-10-24 20:08         ` Koralahalli Channabasappa, Smita
  0 siblings, 1 reply; 16+ messages in thread
From: Alison Schofield @ 2025-10-21  0:06 UTC (permalink / raw)
  To: Koralahalli Channabasappa, Smita
  Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Terry Bowman, Robert Richter, Benjamin Cheatham,
	Zhijian Li, Borislav Petkov, Ard Biesheuvel

On Tue, Oct 14, 2025 at 10:52:20AM -0700, Koralahalli Channabasappa, Smita wrote:
> Hi Alison,
> 
> On 10/10/2025 1:49 PM, Alison Schofield wrote:
> > On Mon, Oct 06, 2025 at 06:16:24PM -0700, Alison Schofield wrote:
> > > On Tue, Sep 30, 2025 at 04:47:52AM +0000, Smita Koralahalli wrote:
> > > > This series aims to address long-standing conflicts between dax_hmem and
> > > > CXL when handling Soft Reserved memory ranges.
> > > 
> > > Hi Smita,
> > > 
> > > Thanks for the updates Smita!
> > > 
> > > About those "long-standing conflicts": In the next rev, can you resurrect,
> > > or recreate the issues list that this set is addressing. It's been a
> > > long and winding road with several handoffs (me included) and it'll help
> > > keep the focus.
> > > 
> > > Hotplug works :)  Auto region comes up, we tear it down and can recreate it,
> > > in place, because the soft reserved resource is gone (no longer occupying
> > > the CXL Window and causing recreate to fail.)
> > > 
> > > !CONFIG_CXL_REGION works :) All resources go directly to DAX.
> > > 
> > > The scenario that is failing is handoff to DAX after region assembly
> > > failure. (Dan reminded me to check that today.) That is mostly related
> > > to Patch4, so I'll respond there.
> > > 
> > > --Alison
> > 
> > Hi Smita -
> > 
> > (after off-list chat w Smita about what is and is not included)
> > 
> > This CXL failover to DAX case is not implemented. In my response in Patch 4,
> > I cobbled something together that made it work in one test case. But to be
> > clear, there was some trickery in the CXL region driver to even do that.
> > 
> > One path forward is to update this set restating the issues it addresses, and
> > remove any code and comments that are tied to failing over to DAX after a
> > region assembly failure.
> > 
> > That leaves the issue Dan raised, "shutdown CXL in favor of vanilla DAX devices
> > as an emergency fallback for platform configuration quirks and bugs"[1], for a
> > future patch.
> > 
> > -- Alison
> > 
> > [1] The failover to DAX was last described in response to v5 of the 'prior' patchset.
> > https://lore.kernel.org/linux-cxl/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
> > https://lore.kernel.org/linux-cxl/687ffcc0ee1c8_137e6b100ed@dwillia2-xfh.jf.intel.com.notmuch/
> > https://lore.kernel.org/linux-cxl/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/
> 
> [+cc Nathan, Terry]
> 
> From the AMD side, our primary concern in this series is CXL hotplug. With
> the patches as is, the hotplug flows are working for us: region comes up, we
> can tear it down, and recreate it in place because the soft reserved window
> is released.
> 
> On our systems I consistently see wait_for_device_probe() block until region
> assembly has completed so I don’t currently have evidence of a sequencing
> hole there on AMD platforms.
> 
> Once CXL windows are discovered, would it be acceptable for dax_hmem to
> simply ignore soft reserved ranges inside those windows, assuming CXL will
> own and manage them? That aligns with Dan’s guidance about letting CXL win
> those ranges when present.
> https://lore.kernel.org/all/687fef9ec0dd9_137e6b100c8@dwillia2-xfh.jf.intel.com.notmuch/
> 
> If that approach sounds right, I can reword the commit descriptions in
> patches 4/5 and 5/5 to drop the parts about region assembly failures and
> remove the REGISTER enum.
> 
> And then leave the “shutdown CXL in favor of vanilla DAX as an emergency
> fallback for platform configuration quirks and bugs” to a future, dedicated
> patch.
> 
> Thanks
> Smita

Hi Smita,

I was able to discard the big sleep after picking up the patch "cxl/mem:
Arrange for always-synchronous memdev attach" from Alejandro's Type2 set.

With that patch, all CXL probing completed before the HMEM probe so the
deferred waiting mechanism of the HMEM driver seems unnecessary. Please
take a look.

That patch, is one of four in this branch Dan provided:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order

After chats with Dan and DaveJ, we thought the Soft Reserved set was the
right place to introduce these probe order patches (let Type 2 follow).
So, the SR set adds these three patches:

- **cxl/mem: Arrange for always-synchronous memdev attach**
- cxl/port: Arrange for always synchronous endpoint attach
- cxl/mem: Introduce a memdev creation ->probe() operation

**I actually grabbed this one from v19 Type2 set, not the CXL branch,
so you may need to see if Alejandro changed anything in that one.

When picking those up, there's a bit of wordsmithing to do in the
commit logs. Probably replace mentions of needing them for accelerators
with needing them to synchronize the usage of soft-reserved resources.

Note that the HMEM driver is also not picking up unused SR ranges.
That was described in review comments here:
https://lore.kernel.org/linux-cxl/aORscMprmQyGlohw@aschofie-mobl2.lan

Summarized for my benefit ;)
- pick up all the probe order patches, 
- determine whether the HMEM deferral is needed, maybe drop it,
- register the unused SR, don't drop based on intersect w 'CXL Window'

With all that, nothing would be left undone in the HMEM driver. The region
driver would still need to fail gracefully and release resources in a
follow-on patch.

Let me know what you find wrt the timing, ie is the wait_for_device_probe()
needed at all?

Thanks!
-- Alison


> 
> > 
> > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
  2025-10-21  0:06       ` Alison Schofield
@ 2025-10-24 20:08         ` Koralahalli Channabasappa, Smita
  2025-10-28  2:12           ` Alison Schofield
  0 siblings, 1 reply; 16+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2025-10-24 20:08 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Terry Bowman, Robert Richter, Benjamin Cheatham,
	Zhijian Li, Borislav Petkov, Ard Biesheuvel

Hi Alison,

Thanks for the pointers and the branch. Here’s where I landed on the 
three items. Responses inline.

On 10/20/2025 5:06 PM, Alison Schofield wrote:
> On Tue, Oct 14, 2025 at 10:52:20AM -0700, Koralahalli Channabasappa, Smita wrote:
>> Hi Alison,
>>
>> On 10/10/2025 1:49 PM, Alison Schofield wrote:
>>> On Mon, Oct 06, 2025 at 06:16:24PM -0700, Alison Schofield wrote:
>>>> On Tue, Sep 30, 2025 at 04:47:52AM +0000, Smita Koralahalli wrote:
>>>>> This series aims to address long-standing conflicts between dax_hmem and
>>>>> CXL when handling Soft Reserved memory ranges.
>>>>
>>>> Hi Smita,
>>>>
>>>> Thanks for the updates Smita!
>>>>
>>>> About those "long-standing conflicts": In the next rev, can you resurrect,
>>>> or recreate the issues list that this set is addressing. It's been a
>>>> long and winding road with several handoffs (me included) and it'll help
>>>> keep the focus.
>>>>
>>>> Hotplug works :)  Auto region comes up, we tear it down and can recreate it,
>>>> in place, because the soft reserved resource is gone (no longer occupying
>>>> the CXL Window and causing recreate to fail.)
>>>>
>>>> !CONFIG_CXL_REGION works :) All resources go directly to DAX.
>>>>
>>>> The scenario that is failing is handoff to DAX after region assembly
>>>> failure. (Dan reminded me to check that today.) That is mostly related
>>>> to Patch4, so I'll respond there.
>>>>
>>>> --Alison
>>>
>>> Hi Smita -
>>>
>>> (after off-list chat w Smita about what is and is not included)
>>>
>>> This CXL failover to DAX case is not implemented. In my response in Patch 4,
>>> I cobbled something together that made it work in one test case. But to be
>>> clear, there was some trickery in the CXL region driver to even do that.
>>>
>>> One path forward is to update this set restating the issues it addresses, and
>>> remove any code and comments that are tied to failing over to DAX after a
>>> region assembly failure.
>>>
>>> That leaves the issue Dan raised, "shutdown CXL in favor of vanilla DAX devices
>>> as an emergency fallback for platform configuration quirks and bugs"[1], for a
>>> future patch.
>>>
>>> -- Alison
>>>
>>> [1] The failover to DAX was last described in response to v5 of the 'prior' patchset.
>>> https://lore.kernel.org/linux-cxl/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
>>> https://lore.kernel.org/linux-cxl/687ffcc0ee1c8_137e6b100ed@dwillia2-xfh.jf.intel.com.notmuch/
>>> https://lore.kernel.org/linux-cxl/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/
>>
>> [+cc Nathan, Terry]
>>
>> From the AMD side, our primary concern in this series is CXL hotplug. With
>> the patches as is, the hotplug flows are working for us: region comes up, we
>> can tear it down, and recreate it in place because the soft reserved window
>> is released.
>>
>> On our systems I consistently see wait_for_device_probe() block until region
>> assembly has completed so I don’t currently have evidence of a sequencing
>> hole there on AMD platforms.
>>
>> Once CXL windows are discovered, would it be acceptable for dax_hmem to
>> simply ignore soft reserved ranges inside those windows, assuming CXL will
>> own and manage them? That aligns with Dan’s guidance about letting CXL win
>> those ranges when present.
>> https://lore.kernel.org/all/687fef9ec0dd9_137e6b100c8@dwillia2-xfh.jf.intel.com.notmuch/
>>
>> If that approach sounds right, I can reword the commit descriptions in
>> patches 4/5 and 5/5 to drop the parts about region assembly failures and
>> remove the REGISTER enum.
>>
>> And then leave the “shutdown CXL in favor of vanilla DAX as an emergency
>> fallback for platform configuration quirks and bugs” to a future, dedicated
>> patch.
>>
>> Thanks
>> Smita
> 
> Hi Smita,
> 
> I was able to discard the big sleep after picking up the patch "cxl/mem:
> Arrange for always-synchronous memdev attach" from Alejandro's Type2 set.
> 
> With that patch, all CXL probing completed before the HMEM probe so the
> deferred waiting mechanism of the HMEM driver seems unnecessary. Please
> take a look.
> 
> That patch, is one of four in this branch Dan provided:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order
> 
> After chats with Dan and DaveJ, we thought the Soft Reserved set was the
> right place to introduce these probe order patches (let Type 2 follow).
> So, the SR set adds these three patches:
> 
> - **cxl/mem: Arrange for always-synchronous memdev attach**
> - cxl/port: Arrange for always synchronous endpoint attach
> - cxl/mem: Introduce a memdev creation ->probe() operation
> 
> **I actually grabbed this one from v19 Type2 set, not the CXL branch,
> so you may need to see if Alejandro changed anything in that one.
> 
> When picking those up, there's a bit of wordsmithing to do in the
> commit logs. Probably replace mentions of needing them for accelerators
> with needing them to synchronize the usage of soft-reserved resources.
> 
> Note that the HMEM driver is also not picking up unused SR ranges.
> That was described in review comments here:
> https://lore.kernel.org/linux-cxl/aORscMprmQyGlohw@aschofie-mobl2.lan
> 
> Summarized for my benefit ;)
> - pick up all the probe order patches,
> - determine whether the HMEM deferral is needed, maybe drop it,
> - register the unused SR, don't drop based on intersect w 'CXL Window'
> 
> With all that, nothing would be left undone in the HMEM driver. The region
> driver would still need to fail gracefully and release resources in a
> follow-on patch.
> 
> Let me know what you find wrt the timing, ie is the wait_for_device_probe()
> needed at all?
> 
> Thanks!
> -- Alison
> 

1. Pick up all the probe order patches
I pulled in the three patches you listed.
They build and run fine here.

2. Determine whether HMEM deferral is needed (and maybe drop it)
On my system, even with those three patches, the HMEM probe still races 
ahead of CXL region assembly. A short dmesg timeline shows HMEM 
registering before init_hdm_decoder() and region construction:

..
[   26.597369] hmem_register_device: hmem_platform hmem_platform.0: registering released/unclaimed range with DAX: [mem 0x850000000-0x284fffffff flags 0x80000200]
[   26.602371] init_hdm_decoder: cxl_port port1: decoder1.0: range: 0x850000000-0x284fffffff iw: 1 ig: 256
[   26.628614] init_hdm_decoder: cxl_port endpoint7: decoder7.0: range: 0x850000000-0x284fffffff iw: 1 ig: 256
[   26.628711] __construct_region: cxl_pci 0000:e1:00.0: mem2:decoder7.0: __construct_region region0 res: [mem 0x850000000-0x284fffffff flags 0x200] iw: 1 ig: 256
[   26.628714] cxl_calc_interleave_pos: cxl_mem mem2: decoder:decoder7.0 parent:0000:e1:00.0 port:endpoint7 range:0x850000000-0x284fffffff pos:0
[   44.022792] __hmem_register_resource: hmem range [mem 0x850000000-0x284fffffff flags 0x80000200] already active
[   49.991221] kmem dax0.0: mapping0: 0x850000000-0x284fffffff could not reserve region
..

As region assembly still completes after HMEM on my platform, 
wait_for_device_probe() might be needed to avoid HMEM claiming ranges 
before CXL region assembly.

3. Register unused SR, don’t drop based on intersect with “CXL Window”
Agree with your review note: checking region_intersects(..., 
IORES_DESC_CXL) is not reliable for 'CXL owns this'. IORES_DESC_CXL 
marks just the 'CXL Windows' so the intersect test is true regardless of 
whether a region was actually assembled.

I tried the insert SR and rely on -EBUSY approach suggested.

https://lore.kernel.org/linux-cxl/aORscMprmQyGlohw@aschofie-mobl2.lan/#t

On my setup it never returns -EBUSY; the SR inserts cleanly even when 
the CXL region has already been assembled successfully before dax_hmem.

insert_resource() treats 'fully contains' as a valid hierarchy, not a 
conflict. The SR I insert covers exactly the same range as the CXL 
window/region. In that situation, insert_resource(&iomem_resource, SR) 
does not report a conflict; instead, it inserts the SR and reparents the 
existing CXL window/region under it. That matches what I see in the tree:

850000000-284fffffff : Soft Reserved
   850000000-284fffffff : CXL Window 0
     850000000-284fffffff : region0
       850000000-284fffffff : dax0.0
         850000000-284fffffff : System RAM (kmem)
... (same for the other windows)

So there is no overlap error to trigger -EBUSY; the tree is simply 
restructured.

insert_resource_conflict() behaves the same way, and hence the kmem 
failure:
kmem dax6.0: mapping0: 0x850000000-0x284fffffff could not reserve region
kmem dax6.0: probe with driver kmem failed with error -16

walk_iomem_res_desc() was also not a good discriminator here: it passes 
a temporary struct resource to the callback (name == NULL, no 
child/sibling links), so I couldn't reliably detect the 'region under 
window' relationship from that walker alone (only CXL windows were 
discovered properly).

The below worked for me instead; I could see both the region assembly 
success and failure cases handled properly.

Walk the real iomem_resource tree: find the enclosing CXL window for the 
SR range, then check whether there’s a region child that covers 
[sr->start, sr->end].

If yes, drop (CXL owns it).

If no, register as unused SR with DAX.


+static struct resource *cxl_window_exists(resource_size_t start,
+                                        resource_size_t end)
+{
+       struct resource *r;
+
+       for (r = iomem_resource.child; r; r = r->sibling) {
+               if (r->desc == IORES_DESC_CXL &&
+                   r->start == start && r->end == end)
+                       return r;
+       }
+
+       return NULL;
+}
+
+static bool cxl_region_exists(resource_size_t start, resource_size_t end)
+{
+       const struct resource *res, *child;
+
+       res = cxl_window_exists(start, end);
+       if (!res)
+               return false;
+
+       for (child = res->child; child; child = child->sibling) {
+               if (child->start <= start && child->end <= end)
+                       return true;
+       }
+
+       return false;
+}
+
  static int handle_deferred_cxl(struct device *host, int target_nid,
                                 const struct resource *res)
  {
-       /* TODO: Handle region assembly failures */
+       if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+                             IORES_DESC_CXL) != REGION_DISJOINT) {
+
+               if (cxl_region_exists(res->start, res->end)) {
+                       dax_cxl_mode = DAX_CXL_MODE_DROP;
+                       dev_dbg(host, "dropping CXL range: %pr\n", res);
+               } else {
+                       dax_cxl_mode = DAX_CXL_MODE_REGISTER;
+                       dev_dbg(host, "registering CXL range: %pr\n", res);
+               }
+
+               hmem_register_device(host, target_nid, res);
+       }
+
         return 0;
  }

static void process_defer_work(struct work_struct *_work)
{
         struct dax_defer_work *work = container_of(_work, typeof(*work), work);
         struct platform_device *pdev = work->pdev;

         /* relies on cxl_acpi and cxl_pci having had a chance to load */
         wait_for_device_probe();

         walk_hmem_resources(&pdev->dev, handle_deferred_cxl);
}

For region assembly failure (Thanks for the patch to test this!):

hmem_register_device: hmem_platform hmem_platform.0: deferring range to CXL: [mem 0x850000000-0x284fffffff flags 0x80000200]
handle_deferred_cxl: hmem_platform hmem_platform.0: registering CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]
hmem_register_device: hmem_platform hmem_platform.0: registering CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]

For region assembly success:

hmem_register_device: hmem_platform hmem_platform.0: deferring range to CXL: [mem 0x850000000-0x284fffffff flags 0x80000200]
handle_deferred_cxl: hmem_platform hmem_platform.0: dropping CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]
hmem_register_device: hmem_platform hmem_platform.0: dropping CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]

Happy to fold this into v4 if it looks good.

Thanks
Smita
> 
>>
>>>
>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
  2025-10-24 20:08         ` Koralahalli Channabasappa, Smita
@ 2025-10-28  2:12           ` Alison Schofield
  2025-11-03 11:18             ` Tomasz Wolski
  0 siblings, 1 reply; 16+ messages in thread
From: Alison Schofield @ 2025-10-28  2:12 UTC (permalink / raw)
  To: Koralahalli Channabasappa, Smita
  Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
	linux-pm, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams, Matthew Wilcox, Jan Kara,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
	Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra, Greg KH,
	Nathan Fontenot, Terry Bowman, Robert Richter, Benjamin Cheatham,
	Zhijian Li, Borislav Petkov, Ard Biesheuvel

On Fri, Oct 24, 2025 at 01:08:19PM -0700, Koralahalli Channabasappa, Smita wrote:
> Hi Alison,
> 
> Thanks for the pointers and the branch. Here’s where I landed on the three
> items. Responses inline.
> 
> On 10/20/2025 5:06 PM, Alison Schofield wrote:
> > On Tue, Oct 14, 2025 at 10:52:20AM -0700, Koralahalli Channabasappa, Smita wrote:
> > > Hi Alison,
> > > 
> > > On 10/10/2025 1:49 PM, Alison Schofield wrote:
> > > > On Mon, Oct 06, 2025 at 06:16:24PM -0700, Alison Schofield wrote:
> > > > > On Tue, Sep 30, 2025 at 04:47:52AM +0000, Smita Koralahalli wrote:
> > > > > > This series aims to address long-standing conflicts between dax_hmem and
> > > > > > CXL when handling Soft Reserved memory ranges.
> > > > > 
> > > > > Hi Smita,
> > > > > 
> > > > > Thanks for the updates Smita!
> > > > > 
> > > > > About those "long-standing conflicts": In the next rev, can you resurrect,
> > > > > or recreate the issues list that this set is addressing. It's been a
> > > > > long and winding road with several handoffs (me included) and it'll help
> > > > > keep the focus.
> > > > > 
> > > > > Hotplug works :)  Auto region comes up, we tear it down and can recreate it,
> > > > > in place, because the soft reserved resource is gone (no longer occupying
> > > > > the CXL Window and causing recreate to fail.)
> > > > > 
> > > > > !CONFIG_CXL_REGION works :) All resources go directly to DAX.
> > > > > 
> > > > > The scenario that is failing is handoff to DAX after region assembly
> > > > > failure. (Dan reminded me to check that today.) That is mostly related
> > > > > to Patch4, so I'll respond there.
> > > > > 
> > > > > --Alison
> > > > 
> > > > Hi Smita -
> > > > 
> > > > (after off-list chat w Smita about what is and is not included)
> > > > 
> > > > This CXL failover to DAX case is not implemented. In my response in Patch 4,
> > > > I cobbled something together that made it work in one test case. But to be
> > > > clear, there was some trickery in the CXL region driver to even do that.
> > > > 
> > > > One path forward is to update this set restating the issues it addresses, and
> > > > remove any code and comments that are tied to failing over to DAX after a
> > > > region assembly failure.
> > > > 
> > > > That leaves the issue Dan raised, "shutdown CXL in favor of vanilla DAX devices
> > > > as an emergency fallback for platform configuration quirks and bugs"[1], for a
> > > > future patch.
> > > > 
> > > > -- Alison
> > > > 
> > > > [1] The failover to DAX was last described in response to v5 of the 'prior' patchset.
> > > > https://lore.kernel.org/linux-cxl/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
> > > > https://lore.kernel.org/linux-cxl/687ffcc0ee1c8_137e6b100ed@dwillia2-xfh.jf.intel.com.notmuch/
> > > > https://lore.kernel.org/linux-cxl/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/
> > > 
> > > [+cc Nathan, Terry]
> > > 
> > >  From the AMD side, our primary concern in this series is CXL hotplug. With
> > > the patches as is, the hotplug flows are working for us: region comes up, we
> > > can tear it down, and recreate it in place because the soft reserved window
> > > is released.
> > > 
> > > On our systems I consistently see wait_for_device_probe() block until region
> > > assembly has completed so I don’t currently have evidence of a sequencing
> > > hole there on AMD platforms.
> > > 
> > > Once CXL windows are discovered, would it be acceptable for dax_hmem to
> > > simply ignore soft reserved ranges inside those windows, assuming CXL will
> > > own and manage them? That aligns with Dan’s guidance about letting CXL win
> > > those ranges when present.
> > > https://lore.kernel.org/all/687fef9ec0dd9_137e6b100c8@dwillia2-xfh.jf.intel.com.notmuch/
> > > 
> > > If that approach sounds right, I can reword the commit descriptions in
> > > patches 4/5 and 5/5 to drop the parts about region assembly failures and
> > > remove the REGISTER enum.
> > > 
> > > And then leave the “shutdown CXL in favor of vanilla DAX as an emergency
> > > fallback for platform configuration quirks and bugs” to a future, dedicated
> > > patch.
> > > 
> > > Thanks
> > > Smita
> > 
> > Hi Smita,
> > 
> > I was able to discard the big sleep after picking up the patch "cxl/mem:
> > Arrange for always-synchronous memdev attach" from Alejandro's Type2 set.
> > 
> > With that patch, all CXL probing completed before the HMEM probe so the
> > deferred waiting mechanism of the HMEM driver seems unnecessary. Please
> > take a look.
> > 
> > That patch, is one of four in this branch Dan provided:
> > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order
> > 
> > After chats with Dan and DaveJ, we thought the Soft Reserved set was the
> > right place to introduce these probe order patches (let Type 2 follow).
> > So, the SR set adds these three patches:
> > 
> > - **cxl/mem: Arrange for always-synchronous memdev attach**
> > - cxl/port: Arrange for always synchronous endpoint attach
> > - cxl/mem: Introduce a memdev creation ->probe() operation
> > 
> > **I actually grabbed this one from v19 Type2 set, not the CXL branch,
> > so you may need to see if Alejandro changed anything in that one.
> > 
> > When picking those up, there's a bit of wordsmithing to do in the
> > commit logs. Probably replace mentions of needing for accelerators
> > with needing for synchronizing the usage of soft-reserved resources.
> > 
> > Note that the HMEM driver is also not picking up unused SR ranges.
> > That was described in review comments here:
> > https://lore.kernel.org/linux-cxl/aORscMprmQyGlohw@aschofie-mobl2.lan
> > 
> > Summarized for my benefit ;)
> > - pick up all the probe order patches,
> > - determine whether the HMEM deferral is needed, maybe drop it,
> > - register the unused SR, don't drop based on intersect w 'CXL Window'
> > 
> > With all that, nothing would be left undone in the HMEM driver. The region
> > driver would still need to fail gracefully and release resources in a
> > follow-on patch.
> > 
> > Let me know what you find wrt the timing, ie is the wait_for_device_probe()
> > needed at all?
> > 
> > Thanks!
> > -- Alison
> > 
> 
> 1. Pick up all the probe order patches
> I pulled in the three patches you listed.
> They build and run fine here.
> 
> 2. Determine whether HMEM deferral is needed (and maybe drop it)
> On my system, even with those three patches, the HMEM probe still races
> ahead of CXL region assembly. A short dmesg timeline shows HMEM registering
> before init_hdm_decoder() and region construction:
> 
> ..
> [   26.597369] hmem_register_device: hmem_platform hmem_platform.0:
> registering released/unclaimed range with DAX: [mem 0x850000000-0x284fffffff
> flags 0x80000200]
> [   26.602371] init_hdm_decoder: cxl_port port1: decoder1.0: range:
> 0x850000000-0x284fffffff iw: 1 ig: 256
> [   26.628614] init_hdm_decoder: cxl_port endpoint7: decoder7.0: range:
> 0x850000000-0x284fffffff iw: 1 ig: 256
> [   26.628711] __construct_region: cxl_pci 0000:e1:00.0: mem2:decoder7.0:
> __construct_region region0 res: [mem 0x850000000-0x284fffffff flags 0x200]
> iw: 1 ig: 256
> [   26.628714] cxl_calc_interleave_pos: cxl_mem mem2: decoder:decoder7.0
> parent:0000:e1:00.0 port:endpoint7 range:0x850000000-0x284fffffff pos:0
> [   44.022792] __hmem_register_resource: hmem range [mem
> 0x850000000-0x284fffffff flags 0x80000200] already active
> [   49.991221] kmem dax0.0: mapping0: 0x850000000-0x284fffffff could not
> reserve region
> ..
> 
> As, region assembly still completes after HMEM on my platform,
> wait_for_device_probe() might be needed to avoid HMEM claiming ranges before
> CXL region assembly.
> 
> 3. Register unused SR, don’t drop based on intersect with “CXL Window”
> Agree with your review note: checking region_intersects(..., IORES_DESC_CXL)
> is not reliable for 'CXL owns this'. IORES_DESC_CXL marks just the 'CXL
> Windows' so the intersect test is true regardless of whether a region was
> actually assembled.
> 
> I tried the insert SR and rely on -EBUSY approach suggested.
> 
> https://lore.kernel.org/linux-cxl/aORscMprmQyGlohw@aschofie-mobl2.lan/#t
> 
> On my setup it never returns -EBUSY, the SR inserts cleanly even when the
> CXL region has already been assembled successfully before dax_hmem.
> 
> insert_resource() is treating 'fully contains' as a valid hierarchy, not a
> conflict. The SR I insert covers exactly the same range as the CXL
> window/region. In that situation, insert_resource(&iomem_resource, SR) does
> not report a conflict, instead, it inserts SR and reparents the existing CXL
> window/region under SR. That matches what I see in the tree:
> 
> 850000000-284fffffff : Soft Reserved
>   850000000-284fffffff : CXL Window 0
>     850000000-284fffffff : region0
>       850000000-284fffffff : dax0.0
>         850000000-284fffffff : System RAM (kmem)
> ... (same for the other windows)
> 
> So there is no overlap error to trigger -EBUSY, the tree is simply
> restructured.
> 
> insert_resource_conflict() is also behaving the same.
> 
> and hence the kmem failure
> kmem dax6.0: mapping0: 0x850000000-0x284fffffff could not reserve region
> kmem dax6.0: probe with driver kmem failed with error -16
> 
> walk_iomem_res_desc() was also not a good discriminator here: it passes a
> temporary struct resource to the callback (name == NULL, no child/sibling
> links), so I couldn't reliably detect the 'region under window' relationship
> from that walker alone. (only CXL windows were discovered properly).
> 
> Below worked for me instead. I could see the region assembly success and
> failure cases handled properly.
> 
> Walk the real iomem_resource tree: find the enclosing CXL window for the SR
> range, then check if there’s a region child that covers sr->start, sr->end.
> 
> If yes, drop (CXL owns it).
> 
> If no, register as unused SR with DAX.
> 
> 
> +static struct resource *cxl_window_exists(resource_size_t start,
> +                                        resource_size_t end)
> +{
> +       struct resource *r;
> +
> +       for (r = iomem_resource.child; r; r = r->sibling) {
> +               if (r->desc == IORES_DESC_CXL &&
> +                   r->start == start && r->end == end)
> +                       return r;
> +       }
> +
> +       return NULL;
> +}
> +
> +static bool cxl_region_exists(resource_size_t start, resource_size_t end)
> +{
> +       const struct resource *res, *child;
> +
> +       res = cxl_window_exists(start, end);
> +       if (!res)
> +               return false;
> +
> +       for (child = res->child; child; child = child->sibling) {
> +               if (child->start <= start && child->end <= end)
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +
>  static int handle_deferred_cxl(struct device *host, int target_nid,
>                                const struct resource *res)
>  {
> -       /* TODO: Handle region assembly failures */
> +       if (region_intersects(res->start, resource_size(res),
> IORESOURCE_MEM,
> +                             IORES_DESC_CXL) != REGION_DISJOINT) {
> +

Will it work to search directly for the region above by using params
IORESOURCE_MEM, IORES_DESC_NONE? This way we only get region conflicts,
no empty windows to examine. I think that might replace the
cxl_region_exists() work below.



> +               if (cxl_region_exists(res->start, res->end)) {
> +                       dax_cxl_mode = DAX_CXL_MODE_DROP;
> +                       dev_dbg(host, "dropping CXL range: %pr\n", res);
> +               }
> +               else {
> +                       dax_cxl_mode = DAX_CXL_MODE_REGISTER;
> +                       dev_dbg(host, "registering CXL range: %pr\n", res);
> +               }
> +
> +               hmem_register_device(host, target_nid, res);
> +       }
> +
>         return 0;
>  }
> 
> static void process_defer_work(struct work_struct *_work)
> {
>         struct dax_defer_work *work = container_of(_work, typeof(*work),
> work);
>         struct platform_device *pdev = work->pdev;
> 
>         /* relies on cxl_acpi and cxl_pci having had a chance to load */
>         wait_for_device_probe();
> 
>         walk_hmem_resources(&pdev->dev, handle_deferred_cxl);
> }
> 
> For region assembly failure (Thanks for the patch to test this!):
> 
> hmem_register_device: hmem_platform hmem_platform.0: deferring range to CXL:
> [mem 0x850000000-0x284fffffff flags 0x80000200]
> handle_deferred_cxl: hmem_platform hmem_platform.0: registering CXL range:
> [mem 0x850000000-0x284fffffff flags 0x80000200]
> hmem_register_device: hmem_platform hmem_platform.0: registering CXL range:
> [mem 0x850000000-0x284fffffff flags 0x80000200]
> 
> For region assembly success:
> 
> hmem_register_device: hmem_platform hmem_platform.0: deferring range to CXL:
> [mem 0x850000000-0x284fffffff flags 0x80000200]
> handle_deferred_cxl: hmem_platform hmem_platform.0: dropping CXL range: [mem
> 0x850000000-0x284fffffff flags 0x80000200]
> hmem_register_device: hmem_platform hmem_platform.0: dropping CXL range:
> [mem 0x850000000-0x284fffffff flags 0x80000200]
> 
> Happy to fold this into v4 if it looks good.
> 
> Thanks
> Smita
> > 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > 
> 


* Re: [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
  2025-10-28  2:12           ` Alison Schofield
@ 2025-11-03 11:18             ` Tomasz Wolski
  2025-11-05  2:59               ` Koralahalli Channabasappa, Smita
  0 siblings, 1 reply; 16+ messages in thread
From: Tomasz Wolski @ 2025-11-03 11:18 UTC (permalink / raw)
  To: alison.schofield
  Cc: Smita.KoralahalliChannabasappa, ardb, benjamin.cheatham, bp,
	dan.j.williams, dave.jiang, dave, gregkh, huang.ying.caritas,
	ira.weiny, jack, jeff.johnson, jonathan.cameron, len.brown,
	linux-cxl, linux-fsdevel, linux-kernel, linux-pm, lizhijian,
	ming.li, nathan.fontenot, nvdimm, pavel, peterz, rafael, rrichter,
	skoralah, terry.bowman, vishal.l.verma, willy, yaoxt.fnst

Hi Alison and Smita,

I’ve been following your patch proposal and testing it on a few QEMU setups.

> Will it work to search directly for the region above by using params
> IORESOURCE_MEM, IORES_DESC_NONE. This way we only get region conflicts,
> no empty windows to examine. I think that might replace cxl_region_exists()
> work below.

I see the expected 'dropping CXL range' message (the case when the region covers the full CXL window):

[   31.783945] hmem_platform hmem_platform.0: deferring range to CXL: [mem 0xa90000000-0xb8fffffff flags 0x80000200]
[   31.784609] deferring range to CXL: [mem 0xa90000000-0xb8fffffff flags 0x80000200]
[   31.790588] hmem_platform hmem_platform.0: dropping CXL range: [mem 0xa90000000-0xb8fffffff flags 0x80000200]
[   31.791102] dropping CXL range: [mem 0xa90000000-0xb8fffffff flags 0x80000200]

a90000000-b8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)

[   31.384899] hmem_platform hmem_platform.0: deferring range to CXL: [mem 0xa90000000-0xc8fffffff flags 0x80000200]
[   31.385586] deferring range to CXL: [mem 0xa90000000-0xc8fffffff flags 0x80000200]
[   31.391107] hmem_platform hmem_platform.0: dropping CXL range: [mem 0xa90000000-0xc8fffffff flags 0x80000200]
[   31.391676] dropping CXL range: [mem 0xa90000000-0xc8fffffff flags 0x80000200]

a90000000-c8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
  b90000000-c8fffffff : region1
    b90000000-c8fffffff : dax1.0
      b90000000-c8fffffff : System RAM (kmem)
	  
a90000000-b8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
b90000000-c8fffffff : CXL Window 1
  b90000000-c8fffffff : region1
    b90000000-c8fffffff : dax1.0
      b90000000-c8fffffff : System RAM (kmem)

However, when testing the version with cxl_region_exists(), I didn't see the expected 'registering CXL range' message
when the CXL region does not fully occupy the CXL window; please see below.
I should mention that I’m still getting familiar with CXL internals, so I might be missing some context :)

a90000000-bcfffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)

[   30.434385] hmem_platform hmem_platform.0: deferring range to CXL: [mem 0xa90000000-0xbcfffffff flags 0x80000200]
[   30.435116] deferring range to CXL: [mem 0xa90000000-0xbcfffffff flags 0x80000200]
[   30.436530] hmem_platform hmem_platform.0: dropping CXL range: [mem 0xa90000000-0xbcfffffff flags 0x80000200]
[   30.437070] hmem_platform hmem_platform.0: dropping CXL range: [mem 0xa90000000-0xbcfffffff flags 0x80000200]
[   30.437599] dropping CXL range: [mem 0xa90000000-0xbcfffffff flags 0x80000200]

Thanks,
Tomasz


* Re: [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL
  2025-11-03 11:18             ` Tomasz Wolski
@ 2025-11-05  2:59               ` Koralahalli Channabasappa, Smita
  0 siblings, 0 replies; 16+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2025-11-05  2:59 UTC (permalink / raw)
  To: Tomasz Wolski, alison.schofield, Dan Williams
  Cc: Smita.KoralahalliChannabasappa, ardb, benjamin.cheatham, bp,
	dan.j.williams, dave.jiang, dave, gregkh, huang.ying.caritas,
	ira.weiny, jack, jeff.johnson, jonathan.cameron, len.brown,
	linux-cxl, linux-fsdevel, linux-kernel, linux-pm, lizhijian,
	ming.li, nathan.fontenot, nvdimm, pavel, peterz, rafael, rrichter,
	terry.bowman, vishal.l.verma, willy, yaoxt.fnst

Hi Tomasz,

On 11/3/2025 3:18 AM, Tomasz Wolski wrote:
> Hi Alison and Smita,
> 
> I’ve been following your patch proposal and testing it on a few QEMU setups
> 
>> Will it work to search directly for the region above by using params
>> IORESOURCE_MEM, IORES_DESC_NONE. This way we only get region conflicts,
>> no empty windows to examine. I think that might replace cxl_region_exists()
>> work below.
> 
> I see expected 'dropping CXL range' message (case when region covers full CXL window)
> 
> [   31.783945] hmem_platform hmem_platform.0: deferring range to CXL: [mem 0xa90000000-0xb8fffffff flags 0x80000200]
> [   31.784609] deferring range to CXL: [mem 0xa90000000-0xb8fffffff flags 0x80000200]
> [   31.790588] hmem_platform hmem_platform.0: dropping CXL range: [mem 0xa90000000-0xb8fffffff flags 0x80000200]
> [   31.791102] dropping CXL range: [mem 0xa90000000-0xb8fffffff flags 0x80000200]
> 
> a90000000-b8fffffff : CXL Window 0
>    a90000000-b8fffffff : region0
>      a90000000-b8fffffff : dax0.0
>        a90000000-b8fffffff : System RAM (kmem)
> 
> [   31.384899] hmem_platform hmem_platform.0: deferring range to CXL: [mem 0xa90000000-0xc8fffffff flags 0x80000200]
> [   31.385586] deferring range to CXL: [mem 0xa90000000-0xc8fffffff flags 0x80000200]
> [   31.391107] hmem_platform hmem_platform.0: dropping CXL range: [mem 0xa90000000-0xc8fffffff flags 0x80000200]
> [   31.391676] dropping CXL range: [mem 0xa90000000-0xc8fffffff flags 0x80000200]
> 
> a90000000-c8fffffff : CXL Window 0
>    a90000000-b8fffffff : region0
>      a90000000-b8fffffff : dax0.0
>        a90000000-b8fffffff : System RAM (kmem)
>    b90000000-c8fffffff : region1
>      b90000000-c8fffffff : dax1.0
>        b90000000-c8fffffff : System RAM (kmem)
> 	
> a90000000-b8fffffff : CXL Window 0
>    a90000000-b8fffffff : region0
>      a90000000-b8fffffff : dax0.0
>        a90000000-b8fffffff : System RAM (kmem)
> b90000000-c8fffffff : CXL Window 1
>    b90000000-c8fffffff : region1
>      b90000000-c8fffffff : dax1.0
>        b90000000-c8fffffff : System RAM (kmem)
> 
> However, when testing version with cxl_region_exists() I didn't see expected 'registering CXL range' message
> when the CXL region does not fully occupy CXL window - please see below.
> I should mention that I’m still getting familiar with CXL internals, so maybe I might be missing some context :)
> 
> a90000000-bcfffffff : CXL Window 0
>    a90000000-b8fffffff : region0
>      a90000000-b8fffffff : dax0.0
>        a90000000-b8fffffff : System RAM (kmem)
> 
> [   30.434385] hmem_platform hmem_platform.0: deferring range to CXL: [mem 0xa90000000-0xbcfffffff flags 0x80000200]
> [   30.435116] deferring range to CXL: [mem 0xa90000000-0xbcfffffff flags 0x80000200]
> [   30.436530] hmem_platform hmem_platform.0: dropping CXL range: [mem 0xa90000000-0xbcfffffff flags 0x80000200]
> [   30.437070] hmem_platform hmem_platform.0: dropping CXL range: [mem 0xa90000000-0xbcfffffff flags 0x80000200]
> [   30.437599] dropping CXL range: [mem 0xa90000000-0xbcfffffff flags 0x80000200]

Thanks for testing and sharing the logs.

After off-list discussion with Alison and Dan (please jump in if I’m 
misrepresenting anything):

Ownership is determined by CXL regions, not window sizing. A CXL Window 
may be larger or smaller than the Soft Reserved (SR) span, and that 
should not affect the decision.

The key thing to check is: do the CXL regions fully and contiguously 
cover the entire Soft Reserved range?

Yes - CXL owns SR (“dropping CXL range”).

No - CXL must give up SR (“registering CXL range”). More on giving up SR 
below.

The previous child->start <= start && child->end <= end check needs to 
be replaced with a full coverage test:

1. Decide ownership based on region coverage: check whether all CXL 
regions together fully and contiguously cover the given SR range.
If fully covered - CXL owns it.
If not fully covered - CXL must give up and the SR is owned by HMEM.

2. If CXL must give up - Remove the CXL regions that overlap SR before 
registering the SR via hmem_register_device().

3. Ensure dax_kmem never onlines memory until after this decision. 
dax_kmem must always probe after dax_hmem decides ownership.

Some of the valid configs (CXL owns: drop CXL range):

1. 3ff0d0000000-3ff10fffffff : SR
     3ff0d0000000-3ff10fffffff : Window 1
         3ff0d0000000-3ff0dfffffff : region1
         3ff0e0000000-3ff0efffffff : region2
         3ff0f0000000-3ff0ffffffff : region3
         3ff100000000-3ff10fffffff : region4

2. 3ff0d0000000-3ff10fffffff : Window 1
      3ff0d0000000-3ff0dfffffff : SR
         3ff0d0000000-3ff0dfffffff : region1
      3ff0e0000000-3ff0efffffff : SR
         3ff0e0000000-3ff0efffffff : region2
      3ff0f0000000-3ff0ffffffff : SR
          3ff0f0000000-3ff0ffffffff : region3
      3ff100000000-3ff10fffffff : SR
          3ff100000000-3ff10fffffff : region4

3. 3ff0d0000000-3ff20fffffff : Window 1
       3ff0d0000000-3ff10fffffff : SR
         3ff0d0000000-3ff0dfffffff : region1
         3ff0e0000000-3ff0efffffff : region2
          3ff0f0000000-3ff0ffffffff : region3
          3ff100000000-3ff10fffffff : region4

4. 3ff0d0000000-3ff10fffffff : SR
     3ff0d0000000-3ff10fffffff : Window 1
         3ff0d0000000-3ff10fffffff : region1

Invalid configs (HMEM owns: registering CXL range):

1. 3ff0d0000000-3ff20fffffff : SR
     3ff0d0000000-3ff20fffffff : Window 1
         3ff0d0000000-3ff10fffffff : region1

2. 3ff0d0000000-3ff20fffffff : SR
     3ff0d0000000-3ff10fffffff : Window 1
         3ff0d0000000-3ff0dfffffff : region1
         3ff0e0000000-3ff0efffffff : region2
          3ff0f0000000-3ff0ffffffff : region3
          3ff100000000-3ff10fffffff : region4

3. region2 assembly failed or incorrect BIOS config:
   3ff0d0000000-3ff10fffffff : SR
     3ff0d0000000-3ff10fffffff : Window 1
         3ff0d0000000-3ff0dfffffff : region1
         3ff0f0000000-3ff0ffffffff : region3
         3ff100000000-3ff10fffffff : region4

I will work on incorporating the 3 steps mentioned above.

Thanks
Smita

> 
> Thanks,
> Tomasz



end of thread, other threads:[~2025-11-05  3:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2025-09-30  4:47 [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Smita Koralahalli
2025-09-30  4:47 ` [PATCH v3 1/5] dax/hmem, e820, resource: Defer Soft Reserved registration until hmem is ready Smita Koralahalli
2025-09-30  4:47 ` [PATCH v3 2/5] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
2025-09-30  4:47 ` [PATCH v3 3/5] dax/hmem: Use DEV_DAX_CXL instead of CXL_REGION for deferral Smita Koralahalli
2025-09-30  4:47 ` [PATCH v3 4/5] dax/hmem: Defer Soft Reserved overlap handling until CXL region assembly completes Smita Koralahalli
2025-10-07  1:27   ` Alison Schofield
2025-10-07  2:03     ` Alison Schofield
2025-09-30  4:47 ` [PATCH v3 5/5] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
2025-10-07  1:16 ` [PATCH v3 0/5] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL Alison Schofield
2025-10-10 20:49   ` Alison Schofield
2025-10-14 17:52     ` Koralahalli Channabasappa, Smita
2025-10-21  0:06       ` Alison Schofield
2025-10-24 20:08         ` Koralahalli Channabasappa, Smita
2025-10-28  2:12           ` Alison Schofield
2025-11-03 11:18             ` Tomasz Wolski
2025-11-05  2:59               ` Koralahalli Channabasappa, Smita

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).