* [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
@ 2026-02-10 6:44 Smita Koralahalli
2026-02-10 6:44 ` [PATCH v6 1/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
` (12 more replies)
0 siblings, 13 replies; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:44 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
This series aims to address long-standing conflicts between HMEM and
CXL when handling Soft Reserved memory ranges.
Reworked from Dan's patch:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
Previous work:
https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
Link to v5:
https://lore.kernel.org/all/20260122045543.218194-1-Smita.KoralahalliChannabasappa@amd.com
The series is based on the "for-7.0/cxl-init" branch.
base-commit: bc62f5b308cbdedf29132fe96e9d591e526527e1
[1] After offlining the memory, I can tear down the regions and recreate
them. dax_cxl creates dax devices and onlines memory.
850000000-284fffffff : CXL Window 0
850000000-284fffffff : region0
850000000-284fffffff : dax0.0
850000000-284fffffff : System RAM (kmem)
[2] With CONFIG_CXL_REGION disabled, all the resources are handled by
HMEM. The Soft Reserved range shows up in /proc/iomem, no regions come
up, and dax devices are created by HMEM.
850000000-284fffffff : CXL Window 0
850000000-284fffffff : Soft Reserved
850000000-284fffffff : dax0.0
850000000-284fffffff : System RAM (kmem)
[3] Region assembly failure behaves the same as [2].
[4] REGISTER path:
When CXL_BUS = y (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = y),
the dax_cxl driver is probed and completes initialization before dax_hmem
probes. This scenario was tested with CXL = y, DAX_CXL = m and
DAX_HMEM = m. To validate the REGISTER path, I forced REGISTER even in
cases where the SR completely overlaps the CXL region, as I did not have
access to a system where the CXL region range is smaller than the SR
range.
850000000-284fffffff : Soft Reserved
850000000-284fffffff : CXL Window 0
850000000-280fffffff : region0
850000000-284fffffff : dax0.0
850000000-284fffffff : System RAM (kmem)
"path":"\/platform\/ACPI0017:00\/root0\/decoder0.0\/region0\/dax_region0",
"id":0,
"size":"128.00 GiB (137.44 GB)",
"align":2097152
[ 35.961707] cxl-dax: cxl_dax_region_init()
[ 35.961713] cxl-dax: registering driver.
[ 35.961715] cxl-dax: dax_hmem work flushed.
[ 35.961754] alloc_dev_dax_range: dax0.0: alloc range[0]:
0x000000850000000:0x000000284fffffff
[ 35.976622] hmem: hmem_platform probe started.
[ 35.980821] cxl_bus_probe: cxl_dax_region dax_region0: probe: 0
[ 36.819566] hmem_platform hmem_platform.0: Soft Reserved not fully
contained in CXL; using HMEM
[ 36.819569] hmem_register_device: hmem_platform hmem_platform.0:
registering CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]
[ 36.934156] alloc_dax_region: hmem hmem.6: dax_region resource conflict
for [mem 0x850000000-0x284fffffff]
[ 36.989310] hmem hmem.6: probe with driver hmem failed with error -12
[5] When CXL_BUS = m (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = m),
DAX_CXL = m and DAX_HMEM = y, the results are as expected. As in [4], I
forced REGISTER even in cases where the SR completely overlaps the CXL
region.
850000000-284fffffff : Soft Reserved
850000000-284fffffff : CXL Window 0
850000000-280fffffff : region0
850000000-284fffffff : dax6.0
850000000-284fffffff : System RAM (kmem)
"path":"\/platform\/hmem.6",
"id":6,
"size":"128.00 GiB (137.44 GB)",
"align":2097152
[ 30.897665] devm_cxl_add_dax_region: cxl_region region0: region0:
register dax_region0
[ 30.921015] hmem: hmem_platform probe started.
[ 31.017946] hmem_platform hmem_platform.0: Soft Reserved not fully
contained in CXL; using HMEM
[ 31.056310] alloc_dev_dax_range: dax6.0: alloc range[0]:
0x0000000850000000:0x000000284fffffff
[ 34.781516] cxl-dax: cxl_dax_region_init()
[ 34.781522] cxl-dax: registering driver.
[ 34.781523] cxl-dax: dax_hmem work flushed.
[ 34.781549] alloc_dax_region: cxl_dax_region dax_region0: dax_region
resource conflict for [mem 0x850000000-0x284fffffff]
[ 34.781552] cxl_bus_probe: cxl_dax_region dax_region0: probe: -12
[ 34.781554] cxl_dax_region dax_region0: probe with driver cxl_dax_region
failed with error -12
v6 updates:
- Patches 1-3: no changes.
- New Patches 4-5.
- (void *)res -> res.
- cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
- New file include/cxl/cxl.h
- Introduced singleton workqueue.
- hmem to queue the work and cxl to flush.
- cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
- Included descriptions for dax_cxl_mode.
- kzalloc -> kmalloc in add_soft_reserve_into_iomem()
- dax_cxl_mode is exported to CXL.
- Introduced hmem_register_cxl_device() to walk only the CXL-intersected
SR ranges on the second pass.
v5 updates:
- Patch 1 dropped as it has been merged into for-7.0/cxl-init.
- Added Reviewed-by tags.
- Shared dax_cxl_mode between dax/cxl.c and dax/hmem.c and used
-EPROBE_DEFER to defer dax_cxl.
- CXL_REGION_F_AUTO check for resetting decoders.
- Teardown all CXL regions if any one CXL region doesn't fully contain
the Soft Reserved range.
- Added helper cxl_region_contains_sr() to determine Soft Reserved
ownership.
- bus_rescan_devices() to retry dax_cxl.
- Added guard(rwsem_read)(&cxl_rwsem.region).
v4 updates:
- No changes to patches 1-3.
- New patches 4-7.
- handle_deferred_cxl() has been enhanced to handle the case where CXL
regions do not contiguously and fully cover Soft Reserved ranges.
- Support added to defer cxl_dax registration.
- Support added to teardown cxl regions.
v3 updates:
- Fixed two "From".
v2 updates:
- Removed conditional check on CONFIG_EFI_SOFT_RESERVE as dax_hmem
depends on CONFIG_EFI_SOFT_RESERVE. (Zhijian)
- Added TODO note. (Zhijian)
- Included region_intersects_soft_reserve() inside CONFIG_EFI_SOFT_RESERVE
conditional check. (Zhijian)
- insert_resource_late() -> insert_resource_expand_to_fit() and
__insert_resource_expand_to_fit() replacement. (Boris)
- Fixed Co-developed and Signed-off by. (Dan)
- Combined 2/6 and 3/6 into a single patch. (Zhijian).
- Skip local variable in remove_soft_reserved. (Jonathan)
- Drop kfree with __free(). (Jonathan)
- return 0 -> return dev_add_action_or_reset(host...) (Jonathan)
- Dropped 6/6.
- Reviewed-by tags (Dave, Jonathan)
Dan Williams (3):
dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved
ranges
dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
Smita Koralahalli (6):
cxl/region: Skip decoder reset on detach for autodiscovered regions
dax: Track all dax_region allocations under a global resource tree
cxl/region: Add helper to check Soft Reserved containment by CXL
regions
dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory
ranges
dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
drivers/cxl/core/region.c | 34 +++++++++-
drivers/dax/Kconfig | 2 +
drivers/dax/Makefile | 3 +-
drivers/dax/bus.c | 84 ++++++++++++++++++++++++-
drivers/dax/bus.h | 26 ++++++++
drivers/dax/cxl.c | 28 ++++++++-
drivers/dax/hmem/hmem.c | 129 ++++++++++++++++++++++++++++++++++----
include/cxl/cxl.h | 15 +++++
8 files changed, 303 insertions(+), 18 deletions(-)
create mode 100644 include/cxl/cxl.h
--
2.17.1
* [PATCH v6 1/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
@ 2026-02-10 6:44 ` Smita Koralahalli
2026-02-19 3:22 ` Alison Schofield
2026-02-10 6:44 ` [PATCH v6 2/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
` (11 subsequent siblings)
12 siblings, 1 reply; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:44 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
From: Dan Williams <dan.j.williams@intel.com>
Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
Reserved ranges.
Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
loading; it does not enforce that the dependency has finished init
before the current module runs. This can cause HMEM to start before
cxl_acpi has populated the resource tree, breaking detection of overlaps
between Soft Reserved and CXL Windows.
Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
that trigger further module loads. Asynchronous probe flushing
(wait_for_device_probe()) is added later in the series in a deferred
context before HMEM makes ownership decisions for Soft Reserved ranges.
Add explicit Kconfig ordering so that CXL_ACPI and CXL_PCI must be
initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
Soft Reserved ranges before the CXL drivers have had a chance to claim
them.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
drivers/dax/Kconfig | 2 ++
drivers/dax/hmem/hmem.c | 17 ++++++++++-------
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d656e4c0eb84..3683bb3f2311 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -48,6 +48,8 @@ config DEV_DAX_CXL
tristate "CXL DAX: direct access to CXL RAM regions"
depends on CXL_BUS && CXL_REGION && DEV_DAX
default CXL_REGION && DEV_DAX
+ depends on CXL_ACPI >= DEV_DAX_HMEM
+ depends on CXL_PCI >= DEV_DAX_HMEM
help
CXL RAM regions are either mapped by platform-firmware
and published in the initial system-memory map as "System RAM", mapped
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 1cf7c2a0ee1c..008172fc3607 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -139,6 +139,16 @@ static __init int dax_hmem_init(void)
{
int rc;
+ /*
+ * Ensure that cxl_acpi and cxl_pci have a chance to kick off
+ * CXL topology discovery at least once before scanning the
+ * iomem resource tree for IORES_DESC_CXL resources.
+ */
+ if (IS_ENABLED(CONFIG_DEV_DAX_CXL)) {
+ request_module("cxl_acpi");
+ request_module("cxl_pci");
+ }
+
rc = platform_driver_register(&dax_hmem_platform_driver);
if (rc)
return rc;
@@ -159,13 +169,6 @@ static __exit void dax_hmem_exit(void)
module_init(dax_hmem_init);
module_exit(dax_hmem_exit);
-/* Allow for CXL to define its own dax regions */
-#if IS_ENABLED(CONFIG_CXL_REGION)
-#if IS_MODULE(CONFIG_CXL_ACPI)
-MODULE_SOFTDEP("pre: cxl_acpi");
-#endif
-#endif
-
MODULE_ALIAS("platform:hmem*");
MODULE_ALIAS("platform:hmem_platform*");
MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
--
2.17.1
* [PATCH v6 2/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
2026-02-10 6:44 ` [PATCH v6 1/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
@ 2026-02-10 6:44 ` Smita Koralahalli
2026-02-19 3:23 ` Alison Schofield
2026-02-10 6:44 ` [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions Smita Koralahalli
` (10 subsequent siblings)
12 siblings, 1 reply; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:44 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
From: Dan Williams <dan.j.williams@intel.com>
Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL)
so that HMEM only defers Soft Reserved ranges when CXL DAX support is
enabled. This makes the coordination between HMEM and the CXL stack more
precise and prevents deferral in unrelated CXL configurations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
drivers/dax/hmem/hmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 008172fc3607..1e3424358490 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -66,7 +66,7 @@ static int hmem_register_device(struct device *host, int target_nid,
long id;
int rc;
- if (IS_ENABLED(CONFIG_CXL_REGION) &&
+ if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
IORES_DESC_CXL) != REGION_DISJOINT) {
dev_dbg(host, "deferring range to CXL: %pr\n", res);
--
2.17.1
* [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
2026-02-10 6:44 ` [PATCH v6 1/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
2026-02-10 6:44 ` [PATCH v6 2/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
@ 2026-02-10 6:44 ` Smita Koralahalli
2026-02-19 3:44 ` Alison Schofield
2026-03-11 21:37 ` Dan Williams
2026-02-10 6:44 ` [PATCH v6 4/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
` (9 subsequent siblings)
12 siblings, 2 replies; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:44 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
__cxl_decoder_detach() currently resets decoder programming whenever a
region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
autodiscovered regions, this can incorrectly tear down decoder state
that may be relied upon by other consumers or by subsequent ownership
decisions.
Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
set.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/region.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ae899f68551f..45ee598daf95 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
cxled->part = -1;
if (p->state > CXL_CONFIG_ACTIVE) {
- cxl_region_decode_reset(cxlr, p->interleave_ways);
+ if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
+ cxl_region_decode_reset(cxlr, p->interleave_ways);
+
p->state = CXL_CONFIG_ACTIVE;
}
--
2.17.1
* [PATCH v6 4/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (2 preceding siblings ...)
2026-02-10 6:44 ` [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions Smita Koralahalli
@ 2026-02-10 6:44 ` Smita Koralahalli
2026-02-18 15:54 ` Dave Jiang
2026-03-09 14:31 ` Jonathan Cameron
2026-02-10 6:44 ` [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
` (8 subsequent siblings)
12 siblings, 2 replies; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:44 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
From: Dan Williams <dan.j.williams@intel.com>
Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
dax_cxl.
In addition, defer registration of the dax_cxl driver to a workqueue
instead of using module_cxl_driver(). This ensures that dax_hmem has
an opportunity to initialize and register its deferred callback and make
ownership decisions before dax_cxl begins probing and claiming Soft
Reserved ranges.
Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
out of line from other synchronous probing, avoiding ordering
dependencies while coordinating ownership decisions with dax_hmem.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
drivers/dax/Makefile | 3 +--
drivers/dax/cxl.c | 27 ++++++++++++++++++++++++++-
2 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 5ed5c39857c8..70e996bf1526 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -1,4 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
+obj-y += hmem/
obj-$(CONFIG_DAX) += dax.o
obj-$(CONFIG_DEV_DAX) += device_dax.o
obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
@@ -10,5 +11,3 @@ dax-y += bus.o
device_dax-y := device.o
dax_pmem-y := pmem.o
dax_cxl-y := cxl.o
-
-obj-y += hmem/
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 13cd94d32ff7..a2136adfa186 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -38,10 +38,35 @@ static struct cxl_driver cxl_dax_region_driver = {
.id = CXL_DEVICE_DAX_REGION,
.drv = {
.suppress_bind_attrs = true,
+ .probe_type = PROBE_PREFER_ASYNCHRONOUS,
},
};
-module_cxl_driver(cxl_dax_region_driver);
+static void cxl_dax_region_driver_register(struct work_struct *work)
+{
+ cxl_driver_register(&cxl_dax_region_driver);
+}
+
+static DECLARE_WORK(cxl_dax_region_driver_work, cxl_dax_region_driver_register);
+
+static int __init cxl_dax_region_init(void)
+{
+ /*
+ * Need to resolve a race with dax_hmem wanting to drive regions
+ * instead of CXL
+ */
+ queue_work(system_long_wq, &cxl_dax_region_driver_work);
+ return 0;
+}
+module_init(cxl_dax_region_init);
+
+static void __exit cxl_dax_region_exit(void)
+{
+ flush_work(&cxl_dax_region_driver_work);
+ cxl_driver_unregister(&cxl_dax_region_driver);
+}
+module_exit(cxl_dax_region_exit);
+
MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
MODULE_DESCRIPTION("CXL DAX: direct access to CXL regions");
MODULE_LICENSE("GPL");
--
2.17.1
* [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (3 preceding siblings ...)
2026-02-10 6:44 ` [PATCH v6 4/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
@ 2026-02-10 6:44 ` Smita Koralahalli
2026-02-18 16:04 ` Dave Jiang
` (2 more replies)
2026-02-10 6:44 ` [PATCH v6 6/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
` (7 subsequent siblings)
12 siblings, 3 replies; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:44 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Introduce a global "DAX Regions" resource root and register each
dax_region->res under it via request_resource(). Release the resource on
dax_region teardown.
By enforcing a single global namespace for dax_region allocations, this
ensures only one of dax_hmem or dax_cxl can successfully register a
dax_region for a given range.
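The mutual-exclusion property can be sketched with a toy userspace model
of the flat "DAX Regions" namespace: a claim fails if it overlaps any
previously claimed range, mirroring the conflict that request_resource()
reports against the dax_regions root. All names below are illustrative
stand-ins, not the kernel API.

```c
#include <stdint.h>

/* Toy model: a flat namespace of claimed [start, end] ranges. */
struct range { uint64_t start, end; };

static struct range claimed[16];
static int nr_claimed;

/* Succeeds only if [start, end] overlaps no prior claim, so a second
 * registrant (dax_hmem or dax_cxl) for the same range is rejected. */
static int claim_range(uint64_t start, uint64_t end)
{
	for (int i = 0; i < nr_claimed; i++)
		if (claimed[i].start <= end && start <= claimed[i].end)
			return -1;	/* resource conflict */
	claimed[nr_claimed].start = start;
	claimed[nr_claimed].end = end;
	nr_claimed++;
	return 0;
}
```

With this model, whichever of dax_hmem or dax_cxl claims a range first
wins; the loser sees the same kind of conflict that alloc_dax_region()
logs in the test transcripts above.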
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
drivers/dax/bus.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index fde29e0ad68b..5f387feb95f0 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -10,6 +10,7 @@
#include "dax-private.h"
#include "bus.h"
+static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
static DEFINE_MUTEX(dax_bus_lock);
/*
@@ -625,6 +626,8 @@ static void dax_region_unregister(void *region)
{
struct dax_region *dax_region = region;
+ scoped_guard(rwsem_write, &dax_region_rwsem)
+ release_resource(&dax_region->res);
sysfs_remove_groups(&dax_region->dev->kobj,
dax_region_attribute_groups);
dax_region_put(dax_region);
@@ -635,6 +638,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
unsigned long flags)
{
struct dax_region *dax_region;
+ int rc;
/*
* The DAX core assumes that it can store its private data in
@@ -667,14 +671,27 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
.flags = IORESOURCE_MEM | flags,
};
- if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
- kfree(dax_region);
- return NULL;
+ scoped_guard(rwsem_write, &dax_region_rwsem)
+ rc = request_resource(&dax_regions, &dax_region->res);
+ if (rc) {
+ dev_dbg(parent, "dax_region resource conflict for %pR\n",
+ &dax_region->res);
+ goto err_res;
}
+ if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
+ goto err_sysfs;
+
if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
return NULL;
return dax_region;
+
+err_sysfs:
+ scoped_guard(rwsem_write, &dax_region_rwsem)
+ release_resource(&dax_region->res);
+err_res:
+ kfree(dax_region);
+ return NULL;
}
EXPORT_SYMBOL_GPL(alloc_dax_region);
--
2.17.1
* [PATCH v6 6/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (4 preceding siblings ...)
2026-02-10 6:44 ` [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
@ 2026-02-10 6:44 ` Smita Koralahalli
2026-03-12 0:29 ` Dan Williams
2026-02-10 6:44 ` [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination Smita Koralahalli
` (6 subsequent siblings)
12 siblings, 1 reply; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:44 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Add a helper to determine whether a given Soft Reserved memory range is
fully contained within a committed CXL region.
This helper provides a primitive for policy decisions in subsequent
patches, such as coordination with dax_hmem to determine whether CXL has
fully claimed ownership of Soft Reserved memory ranges.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++++++++
include/cxl/cxl.h | 15 +++++++++++++++
2 files changed, 45 insertions(+)
create mode 100644 include/cxl/cxl.h
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 45ee598daf95..96ed550bfd2e 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -12,6 +12,7 @@
#include <linux/idr.h>
#include <linux/memory-tiers.h>
#include <linux/string_choices.h>
+#include <cxl/cxl.h>
#include <cxlmem.h>
#include <cxl.h>
#include "core.h"
@@ -3875,6 +3876,35 @@ static int cxl_region_debugfs_poison_clear(void *data, u64 offset)
DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
cxl_region_debugfs_poison_clear, "%llx\n");
+static int region_contains_soft_reserve(struct device *dev, void *data)
+{
+ struct resource *res = data;
+ struct cxl_region *cxlr;
+ struct cxl_region_params *p;
+
+ if (!is_cxl_region(dev))
+ return 0;
+
+ cxlr = to_cxl_region(dev);
+ p = &cxlr->params;
+
+ if (p->state != CXL_CONFIG_COMMIT)
+ return 0;
+
+ if (!p->res)
+ return 0;
+
+ return resource_contains(p->res, res) ? 1 : 0;
+}
+
+bool cxl_region_contains_soft_reserve(struct resource *res)
+{
+ guard(rwsem_read)(&cxl_rwsem.region);
+ return bus_for_each_dev(&cxl_bus_type, NULL, res,
+ region_contains_soft_reserve) != 0;
+}
+EXPORT_SYMBOL_GPL(cxl_region_contains_soft_reserve);
+
static int cxl_region_can_probe(struct cxl_region *cxlr)
{
struct cxl_region_params *p = &cxlr->params;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
new file mode 100644
index 000000000000..db1f588e106c
--- /dev/null
+++ b/include/cxl/cxl.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2026 Advanced Micro Devices, Inc. */
+#ifndef _CXL_H_
+#define _CXL_H_
+
+#ifdef CONFIG_CXL_REGION
+bool cxl_region_contains_soft_reserve(struct resource *res);
+#else
+static inline bool cxl_region_contains_soft_reserve(struct resource *res)
+{
+ return false;
+}
+#endif
+
+#endif /* _CXL_H_ */
--
2.17.1
* [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (5 preceding siblings ...)
2026-02-10 6:44 ` [PATCH v6 6/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
@ 2026-02-10 6:44 ` Smita Koralahalli
2026-02-18 17:52 ` Dave Jiang
2026-03-09 14:49 ` Jonathan Cameron
2026-02-10 6:45 ` [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges Smita Koralahalli
` (5 subsequent siblings)
12 siblings, 2 replies; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:44 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Add helpers to register, queue and flush the deferred work.
These helpers allow dax_hmem to execute ownership resolution outside the
probe context before dax_cxl binds.
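The register/unregister/flush contract can be sketched as a
single-consumer callback slot; this is a simplified userspace model of
what dax_hmem_register_work()/dax_hmem_unregister_work() manage, minus
the dax_hmem_lock and the workqueue that actually runs the callback.
Names here are illustrative only.

```c
#include <stddef.h>

typedef void (*deferred_fn)(void *data);

/* One slot: only a single consumer (dax_hmem) may register at a time. */
static deferred_fn registered_fn;
static void *registered_data;

static int register_deferred(deferred_fn fn, void *data)
{
	if (registered_fn)		/* slot already taken */
		return -1;
	registered_fn = fn;
	registered_data = data;
	return 0;
}

static int unregister_deferred(deferred_fn fn, void *data)
{
	if (registered_fn != fn || registered_data != data)
		return -1;		/* not the registered pair */
	registered_fn = NULL;
	registered_data = NULL;
	return 0;
}

/* Stand-in for flush_work(): run the callback, if any, to completion. */
static void flush_deferred(void)
{
	if (registered_fn)
		registered_fn(registered_data);
}

/* Example callback: count how many times the deferred work ran. */
static void count_calls(void *data)
{
	++*(int *)data;
}
```

Flushing before dax_cxl binds guarantees the ownership-resolution
callback has finished, which is the ordering the real helpers provide.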
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
drivers/dax/bus.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/dax/bus.h | 7 ++++++
2 files changed, 65 insertions(+)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 5f387feb95f0..92b88952ede1 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -25,6 +25,64 @@ DECLARE_RWSEM(dax_region_rwsem);
*/
DECLARE_RWSEM(dax_dev_rwsem);
+static DEFINE_MUTEX(dax_hmem_lock);
+static dax_hmem_deferred_fn hmem_deferred_fn;
+static void *dax_hmem_data;
+
+static void hmem_deferred_work(struct work_struct *work)
+{
+ dax_hmem_deferred_fn fn;
+ void *data;
+
+ scoped_guard(mutex, &dax_hmem_lock) {
+ fn = hmem_deferred_fn;
+ data = dax_hmem_data;
+ }
+
+ if (fn)
+ fn(data);
+}
+
+static DECLARE_WORK(dax_hmem_work, hmem_deferred_work);
+
+int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data)
+{
+ guard(mutex)(&dax_hmem_lock);
+
+ if (hmem_deferred_fn)
+ return -EINVAL;
+
+ hmem_deferred_fn = fn;
+ dax_hmem_data = data;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(dax_hmem_register_work);
+
+int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data)
+{
+ guard(mutex)(&dax_hmem_lock);
+
+ if (hmem_deferred_fn != fn || dax_hmem_data != data)
+ return -EINVAL;
+
+ hmem_deferred_fn = NULL;
+ dax_hmem_data = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(dax_hmem_unregister_work);
+
+void dax_hmem_queue_work(void)
+{
+ queue_work(system_long_wq, &dax_hmem_work);
+}
+EXPORT_SYMBOL_GPL(dax_hmem_queue_work);
+
+void dax_hmem_flush_work(void)
+{
+ flush_work(&dax_hmem_work);
+}
+EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
+
#define DAX_NAME_LEN 30
struct dax_id {
struct list_head list;
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index cbbf64443098..b58a88e8089c 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -41,6 +41,13 @@ struct dax_device_driver {
void (*remove)(struct dev_dax *dev);
};
+typedef void (*dax_hmem_deferred_fn)(void *data);
+
+int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
+int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data);
+void dax_hmem_queue_work(void);
+void dax_hmem_flush_work(void);
+
int __dax_driver_register(struct dax_device_driver *dax_drv,
struct module *module, const char *mod_name);
#define dax_driver_register(driver) \
--
2.17.1
* [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (6 preceding siblings ...)
2026-02-10 6:44 ` [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination Smita Koralahalli
@ 2026-02-10 6:45 ` Smita Koralahalli
2026-02-18 18:05 ` Dave Jiang
` (2 more replies)
2026-02-10 6:45 ` [PATCH v6 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
` (4 subsequent siblings)
12 siblings, 3 replies; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:45 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
The current probe-time ownership check for Soft Reserved memory, based
solely on CXL window intersection, is insufficient. dax_hmem probing is
not guaranteed to run after CXL enumeration and region assembly, which
can lead to incorrect ownership decisions before the CXL stack has
finished publishing windows and assembling committed regions.
Introduce deferred ownership handling for Soft Reserved ranges that
intersect CXL windows. When such a range is encountered during dax_hmem
probe, schedule deferred work and wait for the CXL stack to complete
enumeration and region assembly before deciding ownership.
Evaluate ownership of Soft Reserved ranges based on CXL region
containment.
- If all Soft Reserved ranges are fully contained within committed CXL
regions, DROP handling of Soft Reserved ranges in dax_hmem and allow
dax_cxl to bind.
- If any Soft Reserved range is not fully claimed by a committed CXL
region, REGISTER the Soft Reserved ranges with dax_hmem.
Use dax_cxl_mode to coordinate ownership decisions for Soft Reserved
ranges. Once ownership resolution is complete, flush the deferred work
from dax_cxl before allowing dax_cxl to bind.
This enforces strict ownership: either CXL fully claims the Soft
Reserved ranges or it relinquishes them entirely.
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
drivers/dax/bus.c | 3 ++
drivers/dax/bus.h | 19 ++++++++++
drivers/dax/cxl.c | 1 +
drivers/dax/hmem/hmem.c | 78 +++++++++++++++++++++++++++++++++++++++--
4 files changed, 99 insertions(+), 2 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 92b88952ede1..81985bcc70f9 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -25,6 +25,9 @@ DECLARE_RWSEM(dax_region_rwsem);
*/
DECLARE_RWSEM(dax_dev_rwsem);
+enum dax_cxl_mode dax_cxl_mode = DAX_CXL_MODE_DEFER;
+EXPORT_SYMBOL_NS_GPL(dax_cxl_mode, "CXL");
+
static DEFINE_MUTEX(dax_hmem_lock);
static dax_hmem_deferred_fn hmem_deferred_fn;
static void *dax_hmem_data;
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index b58a88e8089c..82616ff52fd1 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -41,6 +41,25 @@ struct dax_device_driver {
void (*remove)(struct dev_dax *dev);
};
+/*
+ * enum dax_cxl_mode - State machine to determine ownership for CXL
+ * tagged Soft Reserved memory ranges.
+ * @DAX_CXL_MODE_DEFER: Ownership resolution pending. Set while waiting
+ * for CXL enumeration and region assembly to complete.
+ * @DAX_CXL_MODE_REGISTER: CXL regions do not fully cover Soft Reserved
+ * ranges. Fall back to registering those ranges via dax_hmem.
+ * @DAX_CXL_MODE_DROP: All Soft Reserved ranges intersecting CXL windows
+ * are fully contained within committed CXL regions. Drop HMEM handling
+ * and allow dax_cxl to bind.
+ */
+enum dax_cxl_mode {
+ DAX_CXL_MODE_DEFER,
+ DAX_CXL_MODE_REGISTER,
+ DAX_CXL_MODE_DROP,
+};
+
+extern enum dax_cxl_mode dax_cxl_mode;
+
typedef void (*dax_hmem_deferred_fn)(void *data);
int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index a2136adfa186..3ab39b77843d 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
static void cxl_dax_region_driver_register(struct work_struct *work)
{
+ dax_hmem_flush_work();
cxl_driver_register(&cxl_dax_region_driver);
}
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 1e3424358490..85854e25254b 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -3,6 +3,7 @@
#include <linux/memregion.h>
#include <linux/module.h>
#include <linux/dax.h>
+#include <cxl/cxl.h>
#include "../bus.h"
static bool region_idle;
@@ -69,8 +70,18 @@ static int hmem_register_device(struct device *host, int target_nid,
if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
IORES_DESC_CXL) != REGION_DISJOINT) {
- dev_dbg(host, "deferring range to CXL: %pr\n", res);
- return 0;
+ switch (dax_cxl_mode) {
+ case DAX_CXL_MODE_DEFER:
+ dev_dbg(host, "deferring range to CXL: %pr\n", res);
+ dax_hmem_queue_work();
+ return 0;
+ case DAX_CXL_MODE_REGISTER:
+ dev_dbg(host, "registering CXL range: %pr\n", res);
+ break;
+ case DAX_CXL_MODE_DROP:
+ dev_dbg(host, "dropping CXL range: %pr\n", res);
+ return 0;
+ }
}
rc = region_intersects_soft_reserve(res->start, resource_size(res));
@@ -123,8 +134,70 @@ static int hmem_register_device(struct device *host, int target_nid,
return rc;
}
+static int hmem_register_cxl_device(struct device *host, int target_nid,
+ const struct resource *res)
+{
+ if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+ IORES_DESC_CXL) != REGION_DISJOINT)
+ return hmem_register_device(host, target_nid, res);
+
+ return 0;
+}
+
+static int soft_reserve_has_cxl_match(struct device *host, int target_nid,
+ const struct resource *res)
+{
+ if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
+ IORES_DESC_CXL) != REGION_DISJOINT) {
+ if (!cxl_region_contains_soft_reserve((struct resource *)res))
+ return 1;
+ }
+
+ return 0;
+}
+
+static void process_defer_work(void *data)
+{
+ struct platform_device *pdev = data;
+ int rc;
+
+ /* relies on cxl_acpi and cxl_pci having had a chance to load */
+ wait_for_device_probe();
+
+ rc = walk_hmem_resources(&pdev->dev, soft_reserve_has_cxl_match);
+
+ if (!rc) {
+ dax_cxl_mode = DAX_CXL_MODE_DROP;
+ dev_dbg(&pdev->dev, "All Soft Reserved ranges claimed by CXL\n");
+ } else {
+ dax_cxl_mode = DAX_CXL_MODE_REGISTER;
+ dev_warn(&pdev->dev,
+ "Soft Reserved not fully contained in CXL; using HMEM\n");
+ }
+
+ walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
+}
+
+static void kill_defer_work(void *data)
+{
+ struct platform_device *pdev = data;
+
+ dax_hmem_flush_work();
+ dax_hmem_unregister_work(process_defer_work, pdev);
+}
+
static int dax_hmem_platform_probe(struct platform_device *pdev)
{
+ int rc;
+
+ rc = dax_hmem_register_work(process_defer_work, pdev);
+ if (rc)
+ return rc;
+
+ rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, pdev);
+ if (rc)
+ return rc;
+
return walk_hmem_resources(&pdev->dev, hmem_register_device);
}
@@ -174,3 +247,4 @@ MODULE_ALIAS("platform:hmem_platform*");
MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Intel Corporation");
+MODULE_IMPORT_NS("CXL");
--
2.17.1
* [PATCH v6 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (7 preceding siblings ...)
2026-02-10 6:45 ` [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges Smita Koralahalli
@ 2026-02-10 6:45 ` Smita Koralahalli
2026-02-10 19:16 ` [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Alison Schofield
` (3 subsequent siblings)
12 siblings, 0 replies; 61+ messages in thread
From: Smita Koralahalli @ 2026-02-10 6:45 UTC (permalink / raw)
To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Reworked from a patch by Alison Schofield <alison.schofield@intel.com>
Reintroduce Soft Reserved ranges into the iomem_resource tree for HMEM
to consume.
This restores visibility in /proc/iomem for ranges actively in use, while
avoiding the early-boot conflicts that occurred when Soft Reserved was
published into iomem before CXL window and region discovery.
Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/hmem/hmem.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 85854e25254b..c07bf5fe833d 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -59,6 +59,34 @@ static void release_hmem(void *pdev)
platform_device_unregister(pdev);
}
+static void remove_soft_reserved(void *r)
+{
+ remove_resource(r);
+ kfree(r);
+}
+
+static int add_soft_reserve_into_iomem(struct device *host,
+ const struct resource *res)
+{
+ int rc;
+
+ struct resource *soft __free(kfree) =
+ kmalloc(sizeof(*res), GFP_KERNEL);
+ if (!soft)
+ return -ENOMEM;
+
+ *soft = DEFINE_RES_NAMED_DESC(res->start, (res->end - res->start + 1),
+ "Soft Reserved", IORESOURCE_MEM,
+ IORES_DESC_SOFT_RESERVED);
+
+ rc = insert_resource(&iomem_resource, soft);
+ if (rc)
+ return rc;
+
+ return devm_add_action_or_reset(host, remove_soft_reserved,
+ no_free_ptr(soft));
+}
+
static int hmem_register_device(struct device *host, int target_nid,
const struct resource *res)
{
@@ -88,7 +116,9 @@ static int hmem_register_device(struct device *host, int target_nid,
if (rc != REGION_INTERSECTS)
return 0;
- /* TODO: Add Soft-Reserved memory back to iomem */
+ rc = add_soft_reserve_into_iomem(host, res);
+ if (rc)
+ return rc;
id = memregion_alloc(GFP_KERNEL);
if (id < 0) {
--
2.17.1
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (8 preceding siblings ...)
2026-02-10 6:45 ` [PATCH v6 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
@ 2026-02-10 19:16 ` Alison Schofield
2026-02-10 19:49 ` Koralahalli Channabasappa, Smita
2026-02-12 14:44 ` Tomasz Wolski
2026-02-12 20:02 ` [sos-linux-dev] " Koralahalli Channabasappa, Smita
` (2 subsequent siblings)
12 siblings, 2 replies; 61+ messages in thread
From: Alison Schofield @ 2026-02-10 19:16 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, Feb 10, 2026 at 06:44:52AM +0000, Smita Koralahalli wrote:
> This series aims to address long-standing conflicts between HMEM and
> CXL when handling Soft Reserved memory ranges.
>
> Reworked from Dan's patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
>
> Previous work:
> https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
>
> Link to v5:
> https://lore.kernel.org/all/20260122045543.218194-1-Smita.KoralahalliChannabasappa@amd.com
>
> The series is based on branch "for-7.0/cxl-init" and base-commit is
> base-commit: bc62f5b308cbdedf29132fe96e9d591e526527e1
>
> [1] After offlining the memory I can tear down the regions and recreate
> them back. dax_cxl creates dax devices and onlines memory.
> 850000000-284fffffff : CXL Window 0
> 850000000-284fffffff : region0
> 850000000-284fffffff : dax0.0
> 850000000-284fffffff : System RAM (kmem)
>
> [2] With CONFIG_CXL_REGION disabled, all the resources are handled by
> HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
> and dax devices are created from HMEM.
> 850000000-284fffffff : CXL Window 0
> 850000000-284fffffff : Soft Reserved
> 850000000-284fffffff : dax0.0
> 850000000-284fffffff : System RAM (kmem)
>
> [3] Region assembly failure works same as [2].
>
> [4] REGISTER path:
> When CXL_BUS = y (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = y),
> the dax_cxl driver is probed and completes initialization before dax_hmem
> probes. This scenario was tested with CXL = y, DAX_CXL = m and
> DAX_HMEM = m. To validate the REGISTER path, I forced REGISTER even in
> cases where SR completely overlaps the CXL region as I did not have access
> to a system where the CXL region range is smaller than the SR range.
>
> 850000000-284fffffff : Soft Reserved
> 850000000-284fffffff : CXL Window 0
> 850000000-280fffffff : region0
> 850000000-284fffffff : dax0.0
> 850000000-284fffffff : System RAM (kmem)
>
> "path":"\/platform\/ACPI0017:00\/root0\/decoder0.0\/region0\/dax_region0",
> "id":0,
> "size":"128.00 GiB (137.44 GB)",
> "align":2097152
>
> [ 35.961707] cxl-dax: cxl_dax_region_init()
> [ 35.961713] cxl-dax: registering driver.
> [ 35.961715] cxl-dax: dax_hmem work flushed.
> [ 35.961754] alloc_dev_dax_range: dax0.0: alloc range[0]:
> 0x000000850000000:0x000000284fffffff
> [ 35.976622] hmem: hmem_platform probe started.
> [ 35.980821] cxl_bus_probe: cxl_dax_region dax_region0: probe: 0
> [ 36.819566] hmem_platform hmem_platform.0: Soft Reserved not fully
> contained in CXL; using HMEM
> [ 36.819569] hmem_register_device: hmem_platform hmem_platform.0:
> registering CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]
> [ 36.934156] alloc_dax_region: hmem hmem.6: dax_region resource conflict
> for [mem 0x850000000-0x284fffffff]
> [ 36.989310] hmem hmem.6: probe with driver hmem failed with error -12
>
> [5] When CXL_BUS = m (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = m),
> DAX_CXL = m and DAX_HMEM = y the results are as expected. To validate the
> REGISTER path, I forced REGISTER even in cases where SR completely
> overlaps the CXL region as I did not have access to a system where the
> CXL region range is smaller than the SR range.
>
> 850000000-284fffffff : Soft Reserved
> 850000000-284fffffff : CXL Window 0
> 850000000-280fffffff : region0
> 850000000-284fffffff : dax6.0
> 850000000-284fffffff : System RAM (kmem)
>
> "path":"\/platform\/hmem.6",
> "id":6,
> "size":"128.00 GiB (137.44 GB)",
> "align":2097152
>
> [ 30.897665] devm_cxl_add_dax_region: cxl_region region0: region0:
> register dax_region0
> [ 30.921015] hmem: hmem_platform probe started.
> [ 31.017946] hmem_platform hmem_platform.0: Soft Reserved not fully
> contained in CXL; using HMEM
> [ 31.056310] alloc_dev_dax_range: dax6.0: alloc range[0]:
> 0x0000000850000000:0x000000284fffffff
> [ 34.781516] cxl-dax: cxl_dax_region_init()
> [ 34.781522] cxl-dax: registering driver.
> [ 34.781523] cxl-dax: dax_hmem work flushed.
> [ 34.781549] alloc_dax_region: cxl_dax_region dax_region0: dax_region
> resource conflict for [mem 0x850000000-0x284fffffff]
> [ 34.781552] cxl_bus_probe: cxl_dax_region dax_region0: probe: -12
> [ 34.781554] cxl_dax_region dax_region0: probe with driver cxl_dax_region
> failed with error -12
>
> v6 updates:
> - Patch 1-3 no changes.
> - New Patches 4-5.
> - (void *)res -> res.
> - cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
> - New file include/cxl/cxl.h
> - Introduced singleton workqueue.
> - hmem to queue the work and cxl to flush.
> - cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
> - Included descriptions for dax_cxl_mode.
> - kzalloc -> kmalloc in add_soft_reserve_into_iomem()
> - dax_cxl_mode is exported to CXL.
> - Introduced hmem_register_cxl_device() for walking only CXL
> intersected SR ranges the second time.
During v5 review of this patch:
[PATCH v5 6/7] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
there was discussion around handling region teardown. It's not mentioned
in the changelog, and the teardown is completely removed from the patch.
The discussion seemed to be leaning towards not tearing down 'all', but
it's not clear to me that we decided not to tear down anything - which
this update now does.
And, as you may be guessing, I'm seeing disabled regions with DAX children
and figuring out what can be done with them.
Can you explain the new approach so I can test against that intention?
FYI - I am able to confirm the dax regions are back for no-soft-reserved
case, and my basic hotplug flow works with v6.
-- Alison
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-10 19:16 ` [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Alison Schofield
@ 2026-02-10 19:49 ` Koralahalli Channabasappa, Smita
2026-02-12 6:38 ` Alison Schofield
2026-02-12 14:44 ` Tomasz Wolski
1 sibling, 1 reply; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-02-10 19:49 UTC (permalink / raw)
To: Alison Schofield, Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Hi Alison,
On 2/10/2026 11:16 AM, Alison Schofield wrote:
> On Tue, Feb 10, 2026 at 06:44:52AM +0000, Smita Koralahalli wrote:
>> This series aims to address long-standing conflicts between HMEM and
>> CXL when handling Soft Reserved memory ranges.
>>
>> Reworked from Dan's patch:
>> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
>>
>> Previous work:
>> https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
>>
>> Link to v5:
>> https://lore.kernel.org/all/20260122045543.218194-1-Smita.KoralahalliChannabasappa@amd.com
>>
>> The series is based on branch "for-7.0/cxl-init" and base-commit is
>> base-commit: bc62f5b308cbdedf29132fe96e9d591e526527e1
>>
>> [1] After offlining the memory I can tear down the regions and recreate
>> them back. dax_cxl creates dax devices and onlines memory.
>> 850000000-284fffffff : CXL Window 0
>> 850000000-284fffffff : region0
>> 850000000-284fffffff : dax0.0
>> 850000000-284fffffff : System RAM (kmem)
>>
>> [2] With CONFIG_CXL_REGION disabled, all the resources are handled by
>> HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
>> and dax devices are created from HMEM.
>> 850000000-284fffffff : CXL Window 0
>> 850000000-284fffffff : Soft Reserved
>> 850000000-284fffffff : dax0.0
>> 850000000-284fffffff : System RAM (kmem)
>>
>> [3] Region assembly failure works same as [2].
>>
>> [4] REGISTER path:
>> When CXL_BUS = y (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = y),
>> the dax_cxl driver is probed and completes initialization before dax_hmem
>> probes. This scenario was tested with CXL = y, DAX_CXL = m and
>> DAX_HMEM = m. To validate the REGISTER path, I forced REGISTER even in
>> cases where SR completely overlaps the CXL region as I did not have access
>> to a system where the CXL region range is smaller than the SR range.
>>
>> 850000000-284fffffff : Soft Reserved
>> 850000000-284fffffff : CXL Window 0
>> 850000000-280fffffff : region0
>> 850000000-284fffffff : dax0.0
>> 850000000-284fffffff : System RAM (kmem)
>>
>> "path":"\/platform\/ACPI0017:00\/root0\/decoder0.0\/region0\/dax_region0",
>> "id":0,
>> "size":"128.00 GiB (137.44 GB)",
>> "align":2097152
>>
>> [ 35.961707] cxl-dax: cxl_dax_region_init()
>> [ 35.961713] cxl-dax: registering driver.
>> [ 35.961715] cxl-dax: dax_hmem work flushed.
>> [ 35.961754] alloc_dev_dax_range: dax0.0: alloc range[0]:
>> 0x000000850000000:0x000000284fffffff
>> [ 35.976622] hmem: hmem_platform probe started.
>> [ 35.980821] cxl_bus_probe: cxl_dax_region dax_region0: probe: 0
>> [ 36.819566] hmem_platform hmem_platform.0: Soft Reserved not fully
>> contained in CXL; using HMEM
>> [ 36.819569] hmem_register_device: hmem_platform hmem_platform.0:
>> registering CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]
>> [ 36.934156] alloc_dax_region: hmem hmem.6: dax_region resource conflict
>> for [mem 0x850000000-0x284fffffff]
>> [ 36.989310] hmem hmem.6: probe with driver hmem failed with error -12
>>
>> [5] When CXL_BUS = m (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = m),
>> DAX_CXL = m and DAX_HMEM = y the results are as expected. To validate the
>> REGISTER path, I forced REGISTER even in cases where SR completely
>> overlaps the CXL region as I did not have access to a system where the
>> CXL region range is smaller than the SR range.
>>
>> 850000000-284fffffff : Soft Reserved
>> 850000000-284fffffff : CXL Window 0
>> 850000000-280fffffff : region0
>> 850000000-284fffffff : dax6.0
>> 850000000-284fffffff : System RAM (kmem)
>>
>> "path":"\/platform\/hmem.6",
>> "id":6,
>> "size":"128.00 GiB (137.44 GB)",
>> "align":2097152
>>
>> [ 30.897665] devm_cxl_add_dax_region: cxl_region region0: region0:
>> register dax_region0
>> [ 30.921015] hmem: hmem_platform probe started.
>> [ 31.017946] hmem_platform hmem_platform.0: Soft Reserved not fully
>> contained in CXL; using HMEM
>> [ 31.056310] alloc_dev_dax_range: dax6.0: alloc range[0]:
>> 0x0000000850000000:0x000000284fffffff
>> [ 34.781516] cxl-dax: cxl_dax_region_init()
>> [ 34.781522] cxl-dax: registering driver.
>> [ 34.781523] cxl-dax: dax_hmem work flushed.
>> [ 34.781549] alloc_dax_region: cxl_dax_region dax_region0: dax_region
>> resource conflict for [mem 0x850000000-0x284fffffff]
>> [ 34.781552] cxl_bus_probe: cxl_dax_region dax_region0: probe: -12
>> [ 34.781554] cxl_dax_region dax_region0: probe with driver cxl_dax_region
>> failed with error -12
>>
>> v6 updates:
>> - Patch 1-3 no changes.
>> - New Patches 4-5.
>> - (void *)res -> res.
>> - cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
>> - New file include/cxl/cxl.h
>> - Introduced singleton workqueue.
>> - hmem to queue the work and cxl to flush.
>> - cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
>> - Included descriptions for dax_cxl_mode.
>> - kzalloc -> kmalloc in add_soft_reserve_into_iomem()
>> - dax_cxl_mode is exported to CXL.
>> - Introduced hmem_register_cxl_device() for walking only CXL
>> intersected SR ranges the second time.
>
> During v5 review of this patch:
>
> [PATCH v5 6/7] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
>
> there was discussion around handling region teardown. It's not mentioned
> in the changelog, and the teardown is completely removed from the patch.
>
> The discussion seemed to be leaning towards not tearing down 'all', but
> it's not clear to me that we decided not to tear down anything - which
> this update now does.
>
> And, as you may be guessing, I'm seeing disabled regions with DAX children
> and figuring out what can be done with them.
>
> Can you explain the new approach so I can test against that intention?
>
> FYI - I am able to confirm the dax regions are back for no-soft-reserved
> case, and my basic hotplug flow works with v6.
>
> -- Alison
Hi Alison,
Thanks for the test and confirming the no-soft-reserved and hotplug
cases work.
You're right that cxl_region_teardown_all() was removed in v6. I should
have called this out more clearly in the changelog. Here's what I learnt
from v5 review. Correct me if I misunderstood.
During v5 review, regarding dropping teardown (comments from Dan):
"If we go with the alloc_dax_region() observation in my other mail it
means that the HPA space will already be claimed and
cxl_dax_region_probe() will fail. If we can get to that point of "all
HMEM registered, and all CXL regions failing to attach their
cxl_dax_region devices" that is a good stopping point. Then can decide
if a follow-on patch is needed to cleanup that state
(cxl_region_teardown_all()) , or if it can just idle that way in the
messy state and wait for userspace to cleanup if it wants."
https://lore.kernel.org/all/697aad9546542_30951007c@dwillia2-mobl4.notmuch/
Also:
"In other words, I thought total teardown would be simpler, but as the
feedback keeps coming in, I think that brings a different set of
complexity. So just inject failures for dax_cxl to trip over and then we
can go further later to effect total teardown if that proves to not be
enough."
https://lore.kernel.org/all/697a9d46b147e_309510027@dwillia2-mobl4.notmuch/
The v6 approach replaces teardown with the alloc_dax_region() resource
exclusion in patch 5. When HMEM wins the ownership decision (REGISTER
path), it successfully claims the dax_region resource range first. When
dax_cxl later tries to probe, its alloc_dax_region() call hits a
resource conflict and fails, leaving the cxl_dax_region device in a
disabled state.
(There is a separate ordering issue when CXL is built-in and HMEM is a
module, where dax_cxl may claim the dax_region first as observed in
experiments [4] and [5], but that is an independent topic and might not
be relevant here.)
So the disabled regions with DAX children you are seeing on the CXL side
are likely expected as Dan mentioned - they show that CXL tried to claim
the range but HMEM got there first. Though the cxl region remains
committed, no dax_region gets created for it because the HPA space is
already taken.
Thanks
Smita
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-10 19:49 ` Koralahalli Channabasappa, Smita
@ 2026-02-12 6:38 ` Alison Schofield
2026-02-20 21:00 ` Koralahalli Channabasappa, Smita
0 siblings, 1 reply; 61+ messages in thread
From: Alison Schofield @ 2026-02-12 6:38 UTC (permalink / raw)
To: Koralahalli Channabasappa, Smita
Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, Feb 10, 2026 at 11:49:04AM -0800, Koralahalli Channabasappa, Smita wrote:
> Hi Alison,
>
> On 2/10/2026 11:16 AM, Alison Schofield wrote:
> > On Tue, Feb 10, 2026 at 06:44:52AM +0000, Smita Koralahalli wrote:
> > > This series aims to address long-standing conflicts between HMEM and
> > > CXL when handling Soft Reserved memory ranges.
> > >
> > > Reworked from Dan's patch:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
> > >
> > > Previous work:
> > > https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
> > >
> > > Link to v5:
> > > https://lore.kernel.org/all/20260122045543.218194-1-Smita.KoralahalliChannabasappa@amd.com
> > >
> > > The series is based on branch "for-7.0/cxl-init" and base-commit is
> > > base-commit: bc62f5b308cbdedf29132fe96e9d591e526527e1
> > >
> > > [1] After offlining the memory I can tear down the regions and recreate
> > > them back. dax_cxl creates dax devices and onlines memory.
> > > 850000000-284fffffff : CXL Window 0
> > > 850000000-284fffffff : region0
> > > 850000000-284fffffff : dax0.0
> > > 850000000-284fffffff : System RAM (kmem)
> > >
> > > [2] With CONFIG_CXL_REGION disabled, all the resources are handled by
> > > HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
> > > and dax devices are created from HMEM.
> > > 850000000-284fffffff : CXL Window 0
> > > 850000000-284fffffff : Soft Reserved
> > > 850000000-284fffffff : dax0.0
> > > 850000000-284fffffff : System RAM (kmem)
> > >
> > > [3] Region assembly failure works same as [2].
> > >
> > > [4] REGISTER path:
> > > When CXL_BUS = y (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = y),
> > > the dax_cxl driver is probed and completes initialization before dax_hmem
> > > probes. This scenario was tested with CXL = y, DAX_CXL = m and
> > > DAX_HMEM = m. To validate the REGISTER path, I forced REGISTER even in
> > > cases where SR completely overlaps the CXL region as I did not have access
> > > to a system where the CXL region range is smaller than the SR range.
> > >
> > > 850000000-284fffffff : Soft Reserved
> > > 850000000-284fffffff : CXL Window 0
> > > 850000000-280fffffff : region0
> > > 850000000-284fffffff : dax0.0
> > > 850000000-284fffffff : System RAM (kmem)
> > >
> > > "path":"\/platform\/ACPI0017:00\/root0\/decoder0.0\/region0\/dax_region0",
> > > "id":0,
> > > "size":"128.00 GiB (137.44 GB)",
> > > "align":2097152
> > >
> > > [ 35.961707] cxl-dax: cxl_dax_region_init()
> > > [ 35.961713] cxl-dax: registering driver.
> > > [ 35.961715] cxl-dax: dax_hmem work flushed.
> > > [ 35.961754] alloc_dev_dax_range: dax0.0: alloc range[0]:
> > > 0x000000850000000:0x000000284fffffff
> > > [ 35.976622] hmem: hmem_platform probe started.
> > > [ 35.980821] cxl_bus_probe: cxl_dax_region dax_region0: probe: 0
> > > [ 36.819566] hmem_platform hmem_platform.0: Soft Reserved not fully
> > > contained in CXL; using HMEM
> > > [ 36.819569] hmem_register_device: hmem_platform hmem_platform.0:
> > > registering CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]
> > > [ 36.934156] alloc_dax_region: hmem hmem.6: dax_region resource conflict
> > > for [mem 0x850000000-0x284fffffff]
> > > [ 36.989310] hmem hmem.6: probe with driver hmem failed with error -12
> > >
> > > [5] When CXL_BUS = m (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = m),
> > > DAX_CXL = m and DAX_HMEM = y the results are as expected. To validate the
> > > REGISTER path, I forced REGISTER even in cases where SR completely
> > > overlaps the CXL region as I did not have access to a system where the
> > > CXL region range is smaller than the SR range.
> > >
> > > 850000000-284fffffff : Soft Reserved
> > > 850000000-284fffffff : CXL Window 0
> > > 850000000-280fffffff : region0
> > > 850000000-284fffffff : dax6.0
> > > 850000000-284fffffff : System RAM (kmem)
> > >
> > > "path":"\/platform\/hmem.6",
> > > "id":6,
> > > "size":"128.00 GiB (137.44 GB)",
> > > "align":2097152
> > >
> > > [ 30.897665] devm_cxl_add_dax_region: cxl_region region0: region0:
> > > register dax_region0
> > > [ 30.921015] hmem: hmem_platform probe started.
> > > [ 31.017946] hmem_platform hmem_platform.0: Soft Reserved not fully
> > > contained in CXL; using HMEM
> > > [ 31.056310] alloc_dev_dax_range: dax6.0: alloc range[0]:
> > > 0x0000000850000000:0x000000284fffffff
> > > [ 34.781516] cxl-dax: cxl_dax_region_init()
> > > [ 34.781522] cxl-dax: registering driver.
> > > [ 34.781523] cxl-dax: dax_hmem work flushed.
> > > [ 34.781549] alloc_dax_region: cxl_dax_region dax_region0: dax_region
> > > resource conflict for [mem 0x850000000-0x284fffffff]
> > > [ 34.781552] cxl_bus_probe: cxl_dax_region dax_region0: probe: -12
> > > [ 34.781554] cxl_dax_region dax_region0: probe with driver cxl_dax_region
> > > failed with error -12
> > >
> > > v6 updates:
> > > - Patch 1-3 no changes.
> > > - New Patches 4-5.
> > > - (void *)res -> res.
> > > - cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
> > > - New file include/cxl/cxl.h
> > > - Introduced singleton workqueue.
> > > - hmem to queue the work and cxl to flush.
> > > - cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
> > > - Included descriptions for dax_cxl_mode.
> > > - kzalloc -> kmalloc in add_soft_reserve_into_iomem()
> > > - dax_cxl_mode is exported to CXL.
> > > - Introduced hmem_register_cxl_device() for walking only CXL
> > > intersected SR ranges the second time.
> >
> > During v5 review of this patch:
> >
> > [PATCH v5 6/7] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
> >
> > there was discussion around handling region teardown. It's not mentioned
> > in the changelog, and the teardown is completely removed from the patch.
> >
> > The discussion seemed to be leaning towards not tearing down 'all', but
> > it's not clear to me that we decided not to tear down anything - which
> > this update now does.
> >
> > And, as you may be guessing, I'm seeing disabled regions with DAX children
> > and figuring out what can be done with them.
> >
> > Can you explain the new approach so I can test against that intention?
> >
> > FYI - I am able to confirm the dax regions are back for no-soft-reserved
> > case, and my basic hotplug flow works with v6.
> >
> > -- Alison
>
> Hi Alison,
>
> Thanks for the test and confirming the no-soft-reserved and hotplug cases
> work.
>
> You're right that cxl_region_teardown_all() was removed in v6. I should have
> called this out more clearly in the changelog. Here's what I learnt from v5
> review. Correct me if I misunderstood.
>
> During v5 review, regarding dropping teardown (comments from Dan):
>
> "If we go with the alloc_dax_region() observation in my other mail it means
> that the HPA space will already be claimed and cxl_dax_region_probe() will
> fail. If we can get to that point of "all HMEM registered, and all CXL
> regions failing to attach their
> cxl_dax_region devices" that is a good stopping point. Then can decide if a
> follow-on patch is needed to cleanup that state (cxl_region_teardown_all())
> , or if it can just idle that way in the messy state and wait for userspace
> to cleanup if it wants."
>
> https://lore.kernel.org/all/697aad9546542_30951007c@dwillia2-mobl4.notmuch/
>
> Also:
>
> "In other words, I thought total teardown would be simpler, but as the
> feedback keeps coming in, I think that brings a different set of complexity.
> So just inject failures for dax_cxl to trip over and then we can go further
> later to effect total teardown if that proves to not be enough."
>
> https://lore.kernel.org/all/697a9d46b147e_309510027@dwillia2-mobl4.notmuch/
>
> The v6 approach replaces teardown with the alloc_dax_region() resource
> exclusion in patch 5. When HMEM wins the ownership decision (REGISTER path),
> it successfully claims the dax_region resource range first. When dax_cxl
> later tries to probe, its alloc_dax_region() call hits a resource conflict
> and fails, leaving the cxl_dax_region device in a disabled state.
>
> (There is a separate ordering issue when CXL is built-in and HMEM is a
> module, where dax_cxl may claim the dax_region first as observed in
> experiments [4] and [5], but that is an independent topic and might not be
> relevant here.)
>
> So the disabled regions with DAX children you are seeing on the CXL side are
> likely expected as Dan mentioned - they show that CXL tried to claim the
> range but HMEM got there first. Though the cxl region remains committed, no
> dax_region gets created for it because the HPA space is already taken.
Hi Smita,
The disabled regions I'm seeing are the remnants of failed region assemblies
where HMEM rightfully took over. So the takeover is good, but the expected
view shown way above and repasted below is not what I'm seeing. Case [3]
is not the same as Case [2]; it has a region between the SR and DAX.
> > > [2] With CONFIG_CXL_REGION disabled, all the resources are handled by
> > > HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
> > > and dax devices are created from HMEM.
> > > 850000000-284fffffff : CXL Window 0
> > > 850000000-284fffffff : Soft Reserved
> > > 850000000-284fffffff : dax0.0
> > > 850000000-284fffffff : System RAM (kmem)
> > >
> > > [3] Region assembly failure works same as [2].
> > >
I posted a patch[1] that I think gets us to what is expected.
FWIW I do agree with abandoning the teardown-all approach. In this
patch I still don't suggest tearing down the region. It can stay for
'forensics', but I do think we should make /proc/iomem accurately
reflect the memory topology.
[1] https://lore.kernel.org/linux-cxl/20260212062250.1219043-1-alison.schofield@intel.com/
-- Alison
>
> Thanks
> Smita
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-10 19:16 ` [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Alison Schofield
2026-02-10 19:49 ` Koralahalli Channabasappa, Smita
@ 2026-02-12 14:44 ` Tomasz Wolski
2026-02-12 21:18 ` Alison Schofield
1 sibling, 1 reply; 61+ messages in thread
From: Tomasz Wolski @ 2026-02-12 14:44 UTC (permalink / raw)
To: alison.schofield
Cc: Smita.KoralahalliChannabasappa, ardb, benjamin.cheatham, bp,
dan.j.williams, dave.jiang, dave, gregkh, huang.ying.caritas,
ira.weiny, jack, jeff.johnson, jonathan.cameron, len.brown,
linux-cxl, linux-fsdevel, linux-kernel, linux-pm, lizhijian,
ming.li, nathan.fontenot, nvdimm, pavel, peterz, rafael, rrichter,
terry.bowman, tomasz.wolski, vishal.l.verma, willy, yaoxt.fnst,
yazen.ghannam
>
>FYI - I am able to confirm the dax regions are back for no-soft-reserved
>case, and my basic hotplug flow works with v6.
>
>-- Alison
Hello Alison,
I wanted to ask about this scenario.
Is my understanding correct that this fix is needed for cases without Soft Reserve and:
1) CXL memory is installed in the server (no hotplug) and OS is started
2) CXL memory is hot-plugged after the OS starts
3) Tests with cxl-test driver
In such cases, either the admin fails to manually create a region via the cxl
CLI (if there were no auto-regions), or regions fail to be created automatically
during driver probe.
Is this correct?
Best regards,
Tomasz
* Re: [sos-linux-dev] [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (9 preceding siblings ...)
2026-02-10 19:16 ` [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Alison Schofield
@ 2026-02-12 20:02 ` Koralahalli Channabasappa, Smita
2026-02-13 14:04 ` Gregory Price
2026-02-20 9:45 ` Tomasz Wolski
12 siblings, 0 replies; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-02-12 20:02 UTC (permalink / raw)
To: sos-linux-dev, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Tomasz Wolski
On 2/9/2026 10:44 PM, Smita Koralahalli wrote:
> This series aims to address long-standing conflicts between HMEM and
> CXL when handling Soft Reserved memory ranges.
>
> Reworked from Dan's patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
>
> Previous work:
> https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
>
> Link to v5:
> https://lore.kernel.org/all/20260122045543.218194-1-Smita.KoralahalliChannabasappa@amd.com
>
> The series is based on branch "for-7.0/cxl-init" and base-commit is
> base-commit: bc62f5b308cbdedf29132fe96e9d591e526527e1
>
[snip]..
> [5] When CXL_BUS = m (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = m),
> DAX_CXL = m and DAX_HMEM = y the results are as expected. To validate the
Typo here: this should be DAX_HMEM = m. Everything else looks good.
Thanks
Smita.
> REGISTER path, I forced REGISTER even in cases where SR completely
> overlaps the CXL region as I did not have access to a system where the
> CXL region range is smaller than the SR range.
>
> 850000000-284fffffff : Soft Reserved
> 850000000-284fffffff : CXL Window 0
> 850000000-280fffffff : region0
> 850000000-284fffffff : dax6.0
> 850000000-284fffffff : System RAM (kmem)
>
> "path":"\/platform\/hmem.6",
> "id":6,
> "size":"128.00 GiB (137.44 GB)",
> "align":2097152
>
> [ 30.897665] devm_cxl_add_dax_region: cxl_region region0: region0:
> register dax_region0
> [ 30.921015] hmem: hmem_platform probe started.
> [ 31.017946] hmem_platform hmem_platform.0: Soft Reserved not fully
> contained in CXL; using HMEM
> [ 31.056310] alloc_dev_dax_range: dax6.0: alloc range[0]:
> 0x0000000850000000:0x000000284fffffff
> [ 34.781516] cxl-dax: cxl_dax_region_init()
> [ 34.781522] cxl-dax: registering driver.
> [ 34.781523] cxl-dax: dax_hmem work flushed.
> [ 34.781549] alloc_dax_region: cxl_dax_region dax_region0: dax_region
> resource conflict for [mem 0x850000000-0x284fffffff]
> [ 34.781552] cxl_bus_probe: cxl_dax_region dax_region0: probe: -12
> [ 34.781554] cxl_dax_region dax_region0: probe with driver cxl_dax_region
> failed with error -12
>
[snip]
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-12 14:44 ` Tomasz Wolski
@ 2026-02-12 21:18 ` Alison Schofield
2026-02-13 7:47 ` Yasunori Goto (Fujitsu)
0 siblings, 1 reply; 61+ messages in thread
From: Alison Schofield @ 2026-02-12 21:18 UTC (permalink / raw)
To: Tomasz Wolski
Cc: Smita.KoralahalliChannabasappa, ardb, benjamin.cheatham, bp,
dan.j.williams, dave.jiang, dave, gregkh, huang.ying.caritas,
ira.weiny, jack, jeff.johnson, jonathan.cameron, len.brown,
linux-cxl, linux-fsdevel, linux-kernel, linux-pm, lizhijian,
ming.li, nathan.fontenot, nvdimm, pavel, peterz, rafael, rrichter,
terry.bowman, vishal.l.verma, willy, yaoxt.fnst, yazen.ghannam
On Thu, Feb 12, 2026 at 03:44:15PM +0100, Tomasz Wolski wrote:
> >
> >FYI - I am able to confirm the dax regions are back for no-soft-reserved
> >case, and my basic hotplug flow works with v6.
> >
> >-- Alison
>
> Hello Alison,
>
> I wanted to ask about this scenario.
> Is my understanding correct that this fix is needed for cases without Soft Reserve and:
> 1) CXL memory is installed in the server (no hotplug) and OS is started
> 2) CXL memory is hot-plugged after the OS starts
> 3) Tests with cxl-test driver
or QEMU
>
> In such case either the admin fails to manually create region via cxl cli (if there
> was no auto-regions) or regions fails to be created automatically during driver probe
The CXL region creates 'OK'. It is the DAX region that is not created.
>
> Is this correct?
>
> Best regards,
> Tomasz
* RE: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-12 21:18 ` Alison Schofield
@ 2026-02-13 7:47 ` Yasunori Goto (Fujitsu)
2026-02-13 17:31 ` Alison Schofield
0 siblings, 1 reply; 61+ messages in thread
From: Yasunori Goto (Fujitsu) @ 2026-02-13 7:47 UTC (permalink / raw)
To: 'Alison Schofield', Tomasz Wolski (Fujitsu)
Cc: Smita.KoralahalliChannabasappa@amd.com, ardb@kernel.org,
benjamin.cheatham@amd.com, bp@alien8.de, dan.j.williams@intel.com,
dave.jiang@intel.com, dave@stgolabs.net,
gregkh@linuxfoundation.org, huang.ying.caritas@gmail.com,
ira.weiny@intel.com, jack@suse.cz, jeff.johnson@oss.qualcomm.com,
jonathan.cameron@huawei.com, len.brown@intel.com,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
Zhijian Li (Fujitsu), ming.li@zohomail.com,
nathan.fontenot@amd.com, nvdimm@lists.linux.dev, pavel@kernel.org,
peterz@infradead.org, rafael@kernel.org, rrichter@amd.com,
terry.bowman@amd.com, vishal.l.verma@intel.com,
willy@infradead.org, Xingtao Yao (Fujitsu), yazen.ghannam@amd.com
Hello, Alison-san,
I would like to clarify your answer a bit more.
> On Thu, Feb 12, 2026 at 03:44:15PM +0100, Tomasz Wolski wrote:
> > >
> > >FYI - I am able to confirm the dax regions are back for
> > >no-soft-reserved case, and my basic hotplug flow works with v6.
> > >
> > >-- Alison
> >
> > Hello Alison,
> >
> > I wanted to ask about this scenario.
> > Is my understanding correct that this fix is needed for cases without Soft
> Reserve and:
> > 1) CXL memory is installed in the server (no hotplug) and OS is
> > started
> > 2) CXL memory is hot-plugged after the OS starts
> > 3) Tests with cxl-test driver
> or QEMU
Though I can understand that cases 2) and 3) include QEMU, I'm not sure why Linux drivers must handle case 1).
In such a case, I feel that the platform vendor should modify the firmware to define EFI_MEMORY_SP.
In the past, I actually encountered another issue between our platform firmware and a Linux driver:
https://lore.kernel.org/linux-cxl/OS9PR01MB12421AEA8B27BF942CD0F18B19057A@OS9PR01MB12421.jpnprd01.prod.outlook.com/
In that case, I asked our firmware team to modify the firmware, and the issue was resolved.
Therefore, I would like to confirm why case 1) must be handled.
Have any actual machines already been released with such firmware?
Otherwise, is this just to prepare for a platform whose firmware cannot be fixed on the firmware side?
Thanks,
---
Yasunori Goto
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (10 preceding siblings ...)
2026-02-12 20:02 ` [sos-linux-dev] " Koralahalli Channabasappa, Smita
@ 2026-02-13 14:04 ` Gregory Price
2026-02-20 20:47 ` Koralahalli Channabasappa, Smita
2026-02-20 9:45 ` Tomasz Wolski
12 siblings, 1 reply; 61+ messages in thread
From: Gregory Price @ 2026-02-13 14:04 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Tomasz Wolski
On Tue, Feb 10, 2026 at 06:44:52AM +0000, Smita Koralahalli wrote:
> This series aims to address long-standing conflicts between HMEM and
> CXL when handling Soft Reserved memory ranges.
>
> Reworked from Dan's patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
>
Link is broken: bad commit reference
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-13 7:47 ` Yasunori Goto (Fujitsu)
@ 2026-02-13 17:31 ` Alison Schofield
2026-02-16 5:15 ` Yasunori Goto (Fujitsu)
0 siblings, 1 reply; 61+ messages in thread
From: Alison Schofield @ 2026-02-13 17:31 UTC (permalink / raw)
To: Yasunori Goto (Fujitsu)
Cc: Tomasz Wolski (Fujitsu), Smita.KoralahalliChannabasappa@amd.com,
ardb@kernel.org, benjamin.cheatham@amd.com, bp@alien8.de,
dan.j.williams@intel.com, dave.jiang@intel.com, dave@stgolabs.net,
gregkh@linuxfoundation.org, huang.ying.caritas@gmail.com,
ira.weiny@intel.com, jack@suse.cz, jeff.johnson@oss.qualcomm.com,
jonathan.cameron@huawei.com, len.brown@intel.com,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
Zhijian Li (Fujitsu), ming.li@zohomail.com,
nathan.fontenot@amd.com, nvdimm@lists.linux.dev, pavel@kernel.org,
peterz@infradead.org, rafael@kernel.org, rrichter@amd.com,
terry.bowman@amd.com, vishal.l.verma@intel.com,
willy@infradead.org, Xingtao Yao (Fujitsu), yazen.ghannam@amd.com
On Fri, Feb 13, 2026 at 07:47:08AM +0000, Yasunori Goto (Fujitsu) wrote:
> Hello, Alison-san,
>
> I would like to clarify your answer a bit more.
>
> > On Thu, Feb 12, 2026 at 03:44:15PM +0100, Tomasz Wolski wrote:
> > > >
> > > >FYI - I am able to confirm the dax regions are back for
> > > >no-soft-reserved case, and my basic hotplug flow works with v6.
> > > >
> > > >-- Alison
> > >
> > > Hello Alison,
> > >
> > > I wanted to ask about this scenario.
> > > Is my understanding correct that this fix is needed for cases without Soft
> > Reserve and:
> > > 1) CXL memory is installed in the server (no hotplug) and OS is
> > > started
> > > 2) CXL memory is hot-plugged after the OS starts
> > > 3) Tests with cxl-test driver
> > or QEMU
>
> Though I can understand that cases 2) and 3) include QEMU, I'm not sure why Linux drivers must handle case 1).
> In such a case, I feel that the platform vendor should modify the firmware to define EFI_MEMORY_SP.
>
> In the past, I actually encountered another issue between our platform firmware and a Linux driver:
> https://lore.kernel.org/linux-cxl/OS9PR01MB12421AEA8B27BF942CD0F18B19057A@OS9PR01MB12421.jpnprd01.prod.outlook.com/
> In that case, I asked our firmware team to modify the firmware, and the issue was resolved.
>
> Therefore, I would like to confirm why case 1) must be handled.
> Have any actual machines already been released with such firmware?
> Otherwise, is this just to prepare for a platform whose firmware cannot be fixed on the firmware side?
Maybe I'm misunderstanding Tomasz's Case 1), because this is not
a work-around for a firmware issue.
The CXL driver always tries to create DAX regions out of RAM regions.
That happens if the CXL region is a BIOS-defined 'auto' region or a
region requested via userspace. That is regardless of Soft Reserved
existence. Soft-Reserved is not a requirement for CXL or DAX region
creation.
That piece broke in an earlier rev of this patchset [1] where the calls
to devm_cxl_add_dax_region(cxlr) started returning EPROBE_DEFER.
I intended to point out to Smita that the behavior is restored in v6.
--Alison
[1] https://lore.kernel.org/linux-cxl/aXMWzC8zf3bqIHJ0@aschofie-mobl2.lan/
>
> Thanks,
> ---
> Yasunori Goto
>
>
* RE: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-13 17:31 ` Alison Schofield
@ 2026-02-16 5:15 ` Yasunori Goto (Fujitsu)
0 siblings, 0 replies; 61+ messages in thread
From: Yasunori Goto (Fujitsu) @ 2026-02-16 5:15 UTC (permalink / raw)
To: 'Alison Schofield'
Cc: Tomasz Wolski (Fujitsu), Smita.KoralahalliChannabasappa@amd.com,
ardb@kernel.org, benjamin.cheatham@amd.com, bp@alien8.de,
dan.j.williams@intel.com, dave.jiang@intel.com, dave@stgolabs.net,
gregkh@linuxfoundation.org, huang.ying.caritas@gmail.com,
ira.weiny@intel.com, jack@suse.cz, jeff.johnson@oss.qualcomm.com,
jonathan.cameron@huawei.com, len.brown@intel.com,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
Zhijian Li (Fujitsu), ming.li@zohomail.com,
nathan.fontenot@amd.com, nvdimm@lists.linux.dev, pavel@kernel.org,
peterz@infradead.org, rafael@kernel.org, rrichter@amd.com,
terry.bowman@amd.com, vishal.l.verma@intel.com,
willy@infradead.org, Xingtao Yao (Fujitsu), yazen.ghannam@amd.com
> On Fri, Feb 13, 2026 at 07:47:08AM +0000, Yasunori Goto (Fujitsu) wrote:
> > Hello, Alison-san,
> >
> > I would like to clarify your answer a bit more.
> >
> > > On Thu, Feb 12, 2026 at 03:44:15PM +0100, Tomasz Wolski wrote:
> > > > >
> > > > >FYI - I am able to confirm the dax regions are back for
> > > > >no-soft-reserved case, and my basic hotplug flow works with v6.
> > > > >
> > > > >-- Alison
> > > >
> > > > Hello Alison,
> > > >
> > > > I wanted to ask about this scenario.
> > > > Is my understanding correct that this fix is needed for cases
> > > > without Soft
> > > Reserve and:
> > > > 1) CXL memory is installed in the server (no hotplug) and OS is
> > > > started
> > > > 2) CXL memory is hot-plugged after the OS starts
> > > > 3) Tests with cxl-test driver
> > > or QEMU
> >
> > Though I can understand that cases 2) and 3) include QEMU, I'm not sure
> why Linux drivers must handle case 1).
> > In such a case, I feel that the platform vendor should modify the firmware to
> define EFI_MEMORY_SP.
> >
> > In the past, I actually encountered another issue between our platform
> firmware and a Linux driver:
> >
> https://lore.kernel.org/linux-cxl/OS9PR01MB12421AEA8B27BF942CD0F18B1
> 90
> > 57A@OS9PR01MB12421.jpnprd01.prod.outlook.com/
> > In that case, I asked our firmware team to modify the firmware, and the issue
> was resolved.
> >
> > Therefore, I would like to confirm why case 1) must be handled.
> > Have any actual machines already been released with such firmware?
> > Otherwise, is this just to prepare for a platform whose firmware cannot be
> fixed on the firmware side?
>
> Maybe I'm misunderstanding Tomasz's Case 1), because this is not a
> work-around for a firmware issue.
>
> The CXL driver always tries to create DAX regions out of RAM regions.
> That happens if the CXL region is a BIOS-defined 'auto' region or a region
> requested via userspace. That is regardless of Soft Reserved existence.
> Soft-Reserved is not a requirement for CXL or DAX region creation.
I misunderstood it.
I'll re-check the specifications.
Sorry for the noise.
>
> That piece broke in an earlier rev of this patchset [1] where the calls to
> devm_cxl_add_dax_region(cxlr) started returning EPROBE_DEFER.
>
> I intended to point out to Smita that the behavior is restored in v6.
Thank you very much.
-----
Yasunori Goto
* Re: [PATCH v6 4/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
2026-02-10 6:44 ` [PATCH v6 4/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
@ 2026-02-18 15:54 ` Dave Jiang
2026-03-09 14:31 ` Jonathan Cameron
1 sibling, 0 replies; 61+ messages in thread
From: Dave Jiang @ 2026-02-18 15:54 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 2/9/26 11:44 PM, Smita Koralahalli wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
> dax_cxl.
>
> In addition, defer registration of the dax_cxl driver to a workqueue
> instead of using module_cxl_driver(). This ensures that dax_hmem has
> an opportunity to initialize and register its deferred callback and make
> ownership decisions before dax_cxl begins probing and claiming Soft
> Reserved ranges.
>
> Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
> out of line from other synchronous probing avoiding ordering
> dependencies while coordinating ownership decisions with dax_hmem.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/dax/Makefile | 3 +--
> drivers/dax/cxl.c | 27 ++++++++++++++++++++++++++-
> 2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
> index 5ed5c39857c8..70e996bf1526 100644
> --- a/drivers/dax/Makefile
> +++ b/drivers/dax/Makefile
> @@ -1,4 +1,5 @@
> # SPDX-License-Identifier: GPL-2.0
> +obj-y += hmem/
> obj-$(CONFIG_DAX) += dax.o
> obj-$(CONFIG_DEV_DAX) += device_dax.o
> obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
> @@ -10,5 +11,3 @@ dax-y += bus.o
> device_dax-y := device.o
> dax_pmem-y := pmem.o
> dax_cxl-y := cxl.o
> -
> -obj-y += hmem/
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 13cd94d32ff7..a2136adfa186 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -38,10 +38,35 @@ static struct cxl_driver cxl_dax_region_driver = {
> .id = CXL_DEVICE_DAX_REGION,
> .drv = {
> .suppress_bind_attrs = true,
> + .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> },
> };
>
> -module_cxl_driver(cxl_dax_region_driver);
> +static void cxl_dax_region_driver_register(struct work_struct *work)
> +{
> + cxl_driver_register(&cxl_dax_region_driver);
> +}
> +
> +static DECLARE_WORK(cxl_dax_region_driver_work, cxl_dax_region_driver_register);
> +
> +static int __init cxl_dax_region_init(void)
> +{
> + /*
> + * Need to resolve a race with dax_hmem wanting to drive regions
> + * instead of CXL
> + */
> + queue_work(system_long_wq, &cxl_dax_region_driver_work);
> + return 0;
> +}
> +module_init(cxl_dax_region_init);
> +
> +static void __exit cxl_dax_region_exit(void)
> +{
> + flush_work(&cxl_dax_region_driver_work);
> + cxl_driver_unregister(&cxl_dax_region_driver);
> +}
> +module_exit(cxl_dax_region_exit);
> +
> MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
> MODULE_DESCRIPTION("CXL DAX: direct access to CXL regions");
> MODULE_LICENSE("GPL");
* Re: [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree
2026-02-10 6:44 ` [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
@ 2026-02-18 16:04 ` Dave Jiang
2026-03-09 14:37 ` Jonathan Cameron
2026-03-12 0:27 ` Dan Williams
2 siblings, 0 replies; 61+ messages in thread
From: Dave Jiang @ 2026-02-18 16:04 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 2/9/26 11:44 PM, Smita Koralahalli wrote:
> Introduce a global "DAX Regions" resource root and register each
> dax_region->res under it via request_resource(). Release the resource on
> dax_region teardown.
>
> By enforcing a single global namespace for dax_region allocations, this
> ensures only one of dax_hmem or dax_cxl can successfully register a
> dax_region for a given range.
>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/dax/bus.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index fde29e0ad68b..5f387feb95f0 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -10,6 +10,7 @@
> #include "dax-private.h"
> #include "bus.h"
>
> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
> static DEFINE_MUTEX(dax_bus_lock);
>
> /*
> @@ -625,6 +626,8 @@ static void dax_region_unregister(void *region)
> {
> struct dax_region *dax_region = region;
>
> + scoped_guard(rwsem_write, &dax_region_rwsem)
> + release_resource(&dax_region->res);
> sysfs_remove_groups(&dax_region->dev->kobj,
> dax_region_attribute_groups);
> dax_region_put(dax_region);
> @@ -635,6 +638,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
> unsigned long flags)
> {
> struct dax_region *dax_region;
> + int rc;
>
> /*
> * The DAX core assumes that it can store its private data in
> @@ -667,14 +671,27 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
> .flags = IORESOURCE_MEM | flags,
> };
>
> - if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
> - kfree(dax_region);
> - return NULL;
> + scoped_guard(rwsem_write, &dax_region_rwsem)
> + rc = request_resource(&dax_regions, &dax_region->res);
> + if (rc) {
> + dev_dbg(parent, "dax_region resource conflict for %pR\n",
> + &dax_region->res);
> + goto err_res;
> }
>
> + if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
> + goto err_sysfs;
> +
> if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
> return NULL;
> return dax_region;
> +
> +err_sysfs:
> + scoped_guard(rwsem_write, &dax_region_rwsem)
> + release_resource(&dax_region->res);
> +err_res:
> + kfree(dax_region);
> + return NULL;
> }
> EXPORT_SYMBOL_GPL(alloc_dax_region);
>
* Re: [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
2026-02-10 6:44 ` [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination Smita Koralahalli
@ 2026-02-18 17:52 ` Dave Jiang
2026-02-20 0:02 ` Koralahalli Channabasappa, Smita
2026-03-09 14:49 ` Jonathan Cameron
1 sibling, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2026-02-18 17:52 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 2/9/26 11:44 PM, Smita Koralahalli wrote:
> Add helpers to register, queue and flush the deferred work.
>
> These helpers allow dax_hmem to execute ownership resolution outside the
> probe context before dax_cxl binds.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/dax/bus.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
> drivers/dax/bus.h | 7 ++++++
> 2 files changed, 65 insertions(+)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 5f387feb95f0..92b88952ede1 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -25,6 +25,64 @@ DECLARE_RWSEM(dax_region_rwsem);
> */
> DECLARE_RWSEM(dax_dev_rwsem);
>
> +static DEFINE_MUTEX(dax_hmem_lock);
> +static dax_hmem_deferred_fn hmem_deferred_fn;
> +static void *dax_hmem_data;
> +
> +static void hmem_deferred_work(struct work_struct *work)
> +{
> + dax_hmem_deferred_fn fn;
> + void *data;
> +
> + scoped_guard(mutex, &dax_hmem_lock) {
> + fn = hmem_deferred_fn;
> + data = dax_hmem_data;
> + }
> +
> + if (fn)
> + fn(data);
> +}
Instead of having a global lock and dealing with all the global variables, why not just do this with the typical work_struct usage pattern and allocate a work item when queuing work?
DJ
> +
> +static DECLARE_WORK(dax_hmem_work, hmem_deferred_work);
> +
> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data)
> +{
> + guard(mutex)(&dax_hmem_lock);
> +
> + if (hmem_deferred_fn)
> + return -EINVAL;
> +
> + hmem_deferred_fn = fn;
> + dax_hmem_data = data;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_register_work);
> +
> +int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data)
> +{
> + guard(mutex)(&dax_hmem_lock);
> +
> + if (hmem_deferred_fn != fn || dax_hmem_data != data)
> + return -EINVAL;
> +
> + hmem_deferred_fn = NULL;
> + dax_hmem_data = NULL;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_unregister_work);
> +
> +void dax_hmem_queue_work(void)
> +{
> + queue_work(system_long_wq, &dax_hmem_work);
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_queue_work);
> +
> +void dax_hmem_flush_work(void)
> +{
> + flush_work(&dax_hmem_work);
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
> +
> #define DAX_NAME_LEN 30
> struct dax_id {
> struct list_head list;
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index cbbf64443098..b58a88e8089c 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -41,6 +41,13 @@ struct dax_device_driver {
> void (*remove)(struct dev_dax *dev);
> };
>
> +typedef void (*dax_hmem_deferred_fn)(void *data);
> +
> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
> +int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data);
> +void dax_hmem_queue_work(void);
> +void dax_hmem_flush_work(void);
> +
> int __dax_driver_register(struct dax_device_driver *dax_drv,
> struct module *module, const char *mod_name);
> #define dax_driver_register(driver) \
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-02-10 6:45 ` [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges Smita Koralahalli
@ 2026-02-18 18:05 ` Dave Jiang
2026-02-20 19:54 ` Koralahalli Channabasappa, Smita
2026-02-20 10:14 ` Alejandro Lucero Palau
2026-03-12 2:28 ` Dan Williams
2 siblings, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2026-02-18 18:05 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 2/9/26 11:45 PM, Smita Koralahalli wrote:
> The current probe time ownership check for Soft Reserved memory based
> solely on CXL window intersection is insufficient. dax_hmem probing is not
> always guaranteed to run after CXL enumeration and region assembly, which
> can lead to incorrect ownership decisions before the CXL stack has
> finished publishing windows and assembling committed regions.
>
> Introduce deferred ownership handling for Soft Reserved ranges that
> intersect CXL windows. When such a range is encountered during dax_hmem
> probe, schedule deferred work and wait for the CXL stack to complete
> enumeration and region assembly before deciding ownership.
>
> Evaluate ownership of Soft Reserved ranges based on CXL region
> containment.
>
> - If all Soft Reserved ranges are fully contained within committed CXL
> regions, DROP handling Soft Reserved ranges from dax_hmem and allow
> dax_cxl to bind.
>
> - If any Soft Reserved range is not fully claimed by committed CXL
> region, REGISTER the Soft Reserved ranges with dax_hmem.
>
> Use dax_cxl_mode to coordinate ownership decisions for Soft Reserved
> ranges. Once ownership resolution is complete, flush the deferred work
> from dax_cxl before allowing dax_cxl to bind.
>
> This enforces strict ownership: either CXL fully claims the Soft
> Reserved ranges or it relinquishes them entirely.
>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/dax/bus.c | 3 ++
> drivers/dax/bus.h | 19 ++++++++++
> drivers/dax/cxl.c | 1 +
> drivers/dax/hmem/hmem.c | 78 +++++++++++++++++++++++++++++++++++++++--
> 4 files changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 92b88952ede1..81985bcc70f9 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -25,6 +25,9 @@ DECLARE_RWSEM(dax_region_rwsem);
> */
> DECLARE_RWSEM(dax_dev_rwsem);
>
> +enum dax_cxl_mode dax_cxl_mode = DAX_CXL_MODE_DEFER;
> +EXPORT_SYMBOL_NS_GPL(dax_cxl_mode, "CXL");
> +
> static DEFINE_MUTEX(dax_hmem_lock);
> static dax_hmem_deferred_fn hmem_deferred_fn;
> static void *dax_hmem_data;
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index b58a88e8089c..82616ff52fd1 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -41,6 +41,25 @@ struct dax_device_driver {
> void (*remove)(struct dev_dax *dev);
> };
>
> +/*
> + * enum dax_cxl_mode - State machine to determine ownership for CXL
> + * tagged Soft Reserved memory ranges.
> + * @DAX_CXL_MODE_DEFER: Ownership resolution pending. Set while waiting
> + * for CXL enumeration and region assembly to complete.
> + * @DAX_CXL_MODE_REGISTER: CXL regions do not fully cover Soft Reserved
> + * ranges. Fall back to registering those ranges via dax_hmem.
> + * @DAX_CXL_MODE_DROP: All Soft Reserved ranges intersecting CXL windows
> + * are fully contained within committed CXL regions. Drop HMEM handling
> + * and allow dax_cxl to bind.
> + */
> +enum dax_cxl_mode {
> + DAX_CXL_MODE_DEFER,
> + DAX_CXL_MODE_REGISTER,
> + DAX_CXL_MODE_DROP,
> +};
> +
> +extern enum dax_cxl_mode dax_cxl_mode;
> +
> typedef void (*dax_hmem_deferred_fn)(void *data);
>
> int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index a2136adfa186..3ab39b77843d 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>
> static void cxl_dax_region_driver_register(struct work_struct *work)
> {
> + dax_hmem_flush_work();
> cxl_driver_register(&cxl_dax_region_driver);
> }
>
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1e3424358490..85854e25254b 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,6 +3,7 @@
> #include <linux/memregion.h>
> #include <linux/module.h>
> #include <linux/dax.h>
> +#include <cxl/cxl.h>
> #include "../bus.h"
>
> static bool region_idle;
> @@ -69,8 +70,18 @@ static int hmem_register_device(struct device *host, int target_nid,
> if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> IORES_DESC_CXL) != REGION_DISJOINT) {
> - dev_dbg(host, "deferring range to CXL: %pr\n", res);
> - return 0;
> + switch (dax_cxl_mode) {
> + case DAX_CXL_MODE_DEFER:
> + dev_dbg(host, "deferring range to CXL: %pr\n", res);
> + dax_hmem_queue_work();
> + return 0;
> + case DAX_CXL_MODE_REGISTER:
> + dev_dbg(host, "registering CXL range: %pr\n", res);
> + break;
> + case DAX_CXL_MODE_DROP:
> + dev_dbg(host, "dropping CXL range: %pr\n", res);
> + return 0;
> + }
> }
>
> rc = region_intersects_soft_reserve(res->start, resource_size(res));
> @@ -123,8 +134,70 @@ static int hmem_register_device(struct device *host, int target_nid,
> return rc;
> }
>
> +static int hmem_register_cxl_device(struct device *host, int target_nid,
> + const struct resource *res)
> +{
> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> + IORES_DESC_CXL) != REGION_DISJOINT)
> + return hmem_register_device(host, target_nid, res);
> +
> + return 0;
> +}
> +
> +static int soft_reserve_has_cxl_match(struct device *host, int target_nid,
> + const struct resource *res)
> +{
> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> + IORES_DESC_CXL) != REGION_DISJOINT) {
> + if (!cxl_region_contains_soft_reserve((struct resource *)res))
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> +static void process_defer_work(void *data)
> +{
> + struct platform_device *pdev = data;
> + int rc;
> +
> + /* relies on cxl_acpi and cxl_pci having had a chance to load */
> + wait_for_device_probe();
> +
> + rc = walk_hmem_resources(&pdev->dev, soft_reserve_has_cxl_match);
> +
> + if (!rc) {
> + dax_cxl_mode = DAX_CXL_MODE_DROP;
> + dev_dbg(&pdev->dev, "All Soft Reserved ranges claimed by CXL\n");
> + } else {
> + dax_cxl_mode = DAX_CXL_MODE_REGISTER;
> + dev_warn(&pdev->dev,
> + "Soft Reserved not fully contained in CXL; using HMEM\n");
> + }
> +
> + walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
> +}
> +
> +static void kill_defer_work(void *data)
> +{
> + struct platform_device *pdev = data;
> +
> + dax_hmem_flush_work();
> + dax_hmem_unregister_work(process_defer_work, pdev);
> +}
> +
> static int dax_hmem_platform_probe(struct platform_device *pdev)
> {
> + int rc;
> +
> + rc = dax_hmem_register_work(process_defer_work, pdev);
Do we need to take a reference on pdev when we queue the work?
DJ
> + if (rc)
> + return rc;
> +
> + rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, pdev);
> + if (rc)
> + return rc;
> +
> return walk_hmem_resources(&pdev->dev, hmem_register_device);
> }
>
> @@ -174,3 +247,4 @@ MODULE_ALIAS("platform:hmem_platform*");
> MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
> MODULE_LICENSE("GPL v2");
> MODULE_AUTHOR("Intel Corporation");
> +MODULE_IMPORT_NS("CXL");
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 1/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
2026-02-10 6:44 ` [PATCH v6 1/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
@ 2026-02-19 3:22 ` Alison Schofield
0 siblings, 0 replies; 61+ messages in thread
From: Alison Schofield @ 2026-02-19 3:22 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, Feb 10, 2026 at 06:44:53AM +0000, Smita Koralahalli wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
> Reserved ranges.
>
> Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
> request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
> loading, it does not enforce that the dependency has finished init
> before the current module runs. This can cause HMEM to start before
> cxl_acpi has populated the resource tree, breaking detection of overlaps
> between Soft Reserved and CXL Windows.
>
> Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
> cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
> that trigger further module loads. Asynchronous probe flushing
> (wait_for_device_probe()) is added later in the series in a deferred
> context before HMEM makes ownership decisions for Soft Reserved ranges.
>
> Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
> must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
> Soft Reserved ranges before CXL drivers have had a chance to claim them.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 2/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
2026-02-10 6:44 ` [PATCH v6 2/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
@ 2026-02-19 3:23 ` Alison Schofield
0 siblings, 0 replies; 61+ messages in thread
From: Alison Schofield @ 2026-02-19 3:23 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, Feb 10, 2026 at 06:44:54AM +0000, Smita Koralahalli wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL)
> so that HMEM only defers Soft Reserved ranges when CXL DAX support is
> enabled. This makes the coordination between HMEM and the CXL stack more
> precise and prevents deferral in unrelated CXL configurations.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-02-10 6:44 ` [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions Smita Koralahalli
@ 2026-02-19 3:44 ` Alison Schofield
2026-02-20 20:35 ` Koralahalli Channabasappa, Smita
2026-03-11 21:37 ` Dan Williams
1 sibling, 1 reply; 61+ messages in thread
From: Alison Schofield @ 2026-02-19 3:44 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, Feb 10, 2026 at 06:44:55AM +0000, Smita Koralahalli wrote:
> __cxl_decoder_detach() currently resets decoder programming whenever a
> region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
Not sure 'detached' is the right word. Unregistered maybe?
> autodiscovered regions, this can incorrectly tear down decoder state
> that may be relied upon by other consumers or by subsequent ownership
> decisions.
>
> Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
> set.
I get how this is needed in the failover to DAX case, yet I'm not clear
how it fits in with folks that just want to destroy that auto region
and reuse the pieces.
Your other recent patch, cxl/hdm: Avoid DVSEC fallback after region teardown [1],
showed me that the memdevs, when left with their endpoint decoders not reset,
will keep trying to create another region when reprobed.
[1] https://lore.kernel.org/linux-cxl/aY6pTk63ivjkanlR@aschofie-mobl2.lan/
I think the patch does what it says it does. Perhaps expand on why that
is always the right thing to do.
--Alison
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/region.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ae899f68551f..45ee598daf95 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
> cxled->part = -1;
>
> if (p->state > CXL_CONFIG_ACTIVE) {
> - cxl_region_decode_reset(cxlr, p->interleave_ways);
> + if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> + cxl_region_decode_reset(cxlr, p->interleave_ways);
> +
> p->state = CXL_CONFIG_ACTIVE;
> }
>
> --
> 2.17.1
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
2026-02-18 17:52 ` Dave Jiang
@ 2026-02-20 0:02 ` Koralahalli Channabasappa, Smita
2026-02-20 15:55 ` Dave Jiang
0 siblings, 1 reply; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-02-20 0:02 UTC (permalink / raw)
To: Dave Jiang, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Hi Dave,
On 2/18/2026 9:52 AM, Dave Jiang wrote:
>
>
> On 2/9/26 11:44 PM, Smita Koralahalli wrote:
>> Add helpers to register, queue and flush the deferred work.
>>
>> These helpers allow dax_hmem to execute ownership resolution outside the
>> probe context before dax_cxl binds.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>> drivers/dax/bus.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
>> drivers/dax/bus.h | 7 ++++++
>> 2 files changed, 65 insertions(+)
>>
>> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>> index 5f387feb95f0..92b88952ede1 100644
>> --- a/drivers/dax/bus.c
>> +++ b/drivers/dax/bus.c
>> @@ -25,6 +25,64 @@ DECLARE_RWSEM(dax_region_rwsem);
>> */
>> DECLARE_RWSEM(dax_dev_rwsem);
>>
>> +static DEFINE_MUTEX(dax_hmem_lock);
>> +static dax_hmem_deferred_fn hmem_deferred_fn;
>> +static void *dax_hmem_data;
>> +
>> +static void hmem_deferred_work(struct work_struct *work)
>> +{
>> + dax_hmem_deferred_fn fn;
>> + void *data;
>> +
>> + scoped_guard(mutex, &dax_hmem_lock) {
>> + fn = hmem_deferred_fn;
>> + data = dax_hmem_data;
>> + }
>> +
>> + if (fn)
>> + fn(data);
>> +}
>
> Instead of having a global lock and dealing with all the global variables, why not just do this with the typical work_struct usage pattern and allocate a work item when queuing work?
>
> DJ
Thanks for the feedback.
Just to clarify, are you hinting at a statically allocated struct
with an embedded work_struct, something like below, rather than the
typical kmalloc + container_of pattern?
+struct dax_hmem_deferred_ctx {
+ struct work_struct work;
+ dax_hmem_deferred_fn fn;
+ void *data;
+};
+static struct dax_hmem_deferred_ctx dax_hmem_ctx;
+int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data)
+{
+ if (dax_hmem_ctx.fn)
+ return -EINVAL;
+ INIT_WORK(&dax_hmem_ctx.work, hmem_deferred_work);
..
My understanding is that Dan wanted this to remain a singleton deferred
work item, queued once and flushed from dax_cxl. I think with the
kmalloc + container_of approach, every call would allocate and queue a
new independent work item.
Regarding the mutex: looking at it again, I think it may not be
necessary. If we can rely on the call ordering (register_work() before
queue_work()), and if flush_work() in kill_defer_work() ensures the work
has fully completed before unregister_work() NULLs the pointers, then
the static struct above should be sufficient without additional locking.
If I'm missing a scenario or race here, please correct me.
Thanks,
Smita
>
>> +
>> +static DECLARE_WORK(dax_hmem_work, hmem_deferred_work);
>> +
>> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data)
>> +{
>> + guard(mutex)(&dax_hmem_lock);
>> +
>> + if (hmem_deferred_fn)
>> + return -EINVAL;
>> +
>> + hmem_deferred_fn = fn;
>> + dax_hmem_data = data;
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(dax_hmem_register_work);
>> +
>> +int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data)
>> +{
>> + guard(mutex)(&dax_hmem_lock);
>> +
>> + if (hmem_deferred_fn != fn || dax_hmem_data != data)
>> + return -EINVAL;
>> +
>> + hmem_deferred_fn = NULL;
>> + dax_hmem_data = NULL;
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(dax_hmem_unregister_work);
>> +
>> +void dax_hmem_queue_work(void)
>> +{
>> + queue_work(system_long_wq, &dax_hmem_work);
>> +}
>> +EXPORT_SYMBOL_GPL(dax_hmem_queue_work);
>> +
>> +void dax_hmem_flush_work(void)
>> +{
>> + flush_work(&dax_hmem_work);
>> +}
>> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
>> +
>> #define DAX_NAME_LEN 30
>> struct dax_id {
>> struct list_head list;
>> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
>> index cbbf64443098..b58a88e8089c 100644
>> --- a/drivers/dax/bus.h
>> +++ b/drivers/dax/bus.h
>> @@ -41,6 +41,13 @@ struct dax_device_driver {
>> void (*remove)(struct dev_dax *dev);
>> };
>>
>> +typedef void (*dax_hmem_deferred_fn)(void *data);
>> +
>> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
>> +int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data);
>> +void dax_hmem_queue_work(void);
>> +void dax_hmem_flush_work(void);
>> +
>> int __dax_driver_register(struct dax_device_driver *dax_drv,
>> struct module *module, const char *mod_name);
>> #define dax_driver_register(driver) \
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
` (11 preceding siblings ...)
2026-02-13 14:04 ` Gregory Price
@ 2026-02-20 9:45 ` Tomasz Wolski
2026-02-20 21:19 ` Koralahalli Channabasappa, Smita
12 siblings, 1 reply; 61+ messages in thread
From: Tomasz Wolski @ 2026-02-20 9:45 UTC (permalink / raw)
To: smita.koralahallichannabasappa
Cc: alison.schofield, ardb, benjamin.cheatham, bp, dan.j.williams,
dave.jiang, dave, gregkh, huang.ying.caritas, ira.weiny, jack,
jeff.johnson, jonathan.cameron, len.brown, linux-cxl,
linux-fsdevel, linux-kernel, linux-pm, lizhijian, ming.li,
nathan.fontenot, nvdimm, pavel, peterz, rafael, rrichter,
terry.bowman, tomasz.wolski, vishal.l.verma, willy, yaoxt.fnst,
yazen.ghannam
Tested on QEMU and physical setups.
I have one question about "Soft Reserved" parent entries in iomem.
On QEMU I see parent "Soft Reserved":
a90000000-b4fffffff : Soft Reserved
a90000000-b4fffffff : CXL Window 0
a90000000-b4fffffff : dax1.0
a90000000-b4fffffff : System RAM (kmem)
While on my physical setup this is missing - not sure if this is okay?
BIOS-e820: [mem 0x0000002070000000-0x000000a06fffffff] soft reserved
2070000000-606fffffff : CXL Window 0
2070000000-606fffffff : region0
2070000000-606fffffff : dax0.0
2070000000-606fffffff : System RAM (kmem)
6070000000-a06fffffff : CXL Window 1
6070000000-a06fffffff : region1
6070000000-a06fffffff : dax1.0
6070000000-a06fffffff : System RAM (kmem)
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-02-10 6:45 ` [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges Smita Koralahalli
2026-02-18 18:05 ` Dave Jiang
@ 2026-02-20 10:14 ` Alejandro Lucero Palau
2026-03-12 2:28 ` Dan Williams
2 siblings, 0 replies; 61+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-20 10:14 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Tomasz Wolski
On 2/10/26 06:45, Smita Koralahalli wrote:
> The current probe time ownership check for Soft Reserved memory based
> solely on CXL window intersection is insufficient. dax_hmem probing is not
> always guaranteed to run after CXL enumeration and region assembly, which
> can lead to incorrect ownership decisions before the CXL stack has
> finished publishing windows and assembling committed regions.
>
> Introduce deferred ownership handling for Soft Reserved ranges that
> intersect CXL windows. When such a range is encountered during dax_hmem
> probe, schedule deferred work and wait for the CXL stack to complete
> enumeration and region assembly before deciding ownership.
>
> Evaluate ownership of Soft Reserved ranges based on CXL region
> containment.
>
> - If all Soft Reserved ranges are fully contained within committed CXL
> regions, DROP handling Soft Reserved ranges from dax_hmem and allow
> dax_cxl to bind.
>
> - If any Soft Reserved range is not fully claimed by committed CXL
> region, REGISTER the Soft Reserved ranges with dax_hmem.
>
> Use dax_cxl_mode to coordinate ownership decisions for Soft Reserved
> ranges. Once ownership resolution is complete, flush the deferred work
> from dax_cxl before allowing dax_cxl to bind.
>
> This enforces strict ownership: either CXL fully claims the Soft
> Reserved ranges or it relinquishes them entirely.
As I said before, I do not understand why this is an all-or-none decision.
If I understood this right, we do not trust how the platform handled the
CXL configuration when some soft reserved ranges end up without a cxl
region. If we do not trust it, why give such memory to the kernel
through hmem?
IMO, it is important to state the reason for this decision here. If I
understood this wrongly, I guess it is even more important to explain
the reason behind the decision in the commit message, and maybe as a
comment in the code as well. I could not understand it, but at least
there would be an explanation.
Moreover, as I also commented previously, with Type2 devices it is
almost certain (or at least a real possibility) that the modules
containing the related drivers will not have been probed at this point.
That implies not all the soft reserved regions could have linked cxl
regions ... leading to giving all those soft reserved ranges to hmem. I
know the "approved" solution is that Type2 should go without soft
reserved memory, but some Type2 devices/drivers could be happy enough
with dax. If we do not want to deal with this problem now, there should
at least be some indication of this problem.
>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/dax/bus.c | 3 ++
> drivers/dax/bus.h | 19 ++++++++++
> drivers/dax/cxl.c | 1 +
> drivers/dax/hmem/hmem.c | 78 +++++++++++++++++++++++++++++++++++++++--
> 4 files changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 92b88952ede1..81985bcc70f9 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -25,6 +25,9 @@ DECLARE_RWSEM(dax_region_rwsem);
> */
> DECLARE_RWSEM(dax_dev_rwsem);
>
> +enum dax_cxl_mode dax_cxl_mode = DAX_CXL_MODE_DEFER;
> +EXPORT_SYMBOL_NS_GPL(dax_cxl_mode, "CXL");
> +
> static DEFINE_MUTEX(dax_hmem_lock);
> static dax_hmem_deferred_fn hmem_deferred_fn;
> static void *dax_hmem_data;
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index b58a88e8089c..82616ff52fd1 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -41,6 +41,25 @@ struct dax_device_driver {
> void (*remove)(struct dev_dax *dev);
> };
>
> +/*
> + * enum dax_cxl_mode - State machine to determine ownership for CXL
> + * tagged Soft Reserved memory ranges.
> + * @DAX_CXL_MODE_DEFER: Ownership resolution pending. Set while waiting
> + * for CXL enumeration and region assembly to complete.
> + * @DAX_CXL_MODE_REGISTER: CXL regions do not fully cover Soft Reserved
> + * ranges. Fall back to registering those ranges via dax_hmem.
> + * @DAX_CXL_MODE_DROP: All Soft Reserved ranges intersecting CXL windows
> + * are fully contained within committed CXL regions. Drop HMEM handling
> + * and allow dax_cxl to bind.
> + */
> +enum dax_cxl_mode {
> + DAX_CXL_MODE_DEFER,
> + DAX_CXL_MODE_REGISTER,
> + DAX_CXL_MODE_DROP,
> +};
> +
> +extern enum dax_cxl_mode dax_cxl_mode;
> +
> typedef void (*dax_hmem_deferred_fn)(void *data);
>
> int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index a2136adfa186..3ab39b77843d 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>
> static void cxl_dax_region_driver_register(struct work_struct *work)
> {
> + dax_hmem_flush_work();
> cxl_driver_register(&cxl_dax_region_driver);
> }
>
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1e3424358490..85854e25254b 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,6 +3,7 @@
> #include <linux/memregion.h>
> #include <linux/module.h>
> #include <linux/dax.h>
> +#include <cxl/cxl.h>
> #include "../bus.h"
>
> static bool region_idle;
> @@ -69,8 +70,18 @@ static int hmem_register_device(struct device *host, int target_nid,
> if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> IORES_DESC_CXL) != REGION_DISJOINT) {
> - dev_dbg(host, "deferring range to CXL: %pr\n", res);
> - return 0;
> + switch (dax_cxl_mode) {
> + case DAX_CXL_MODE_DEFER:
> + dev_dbg(host, "deferring range to CXL: %pr\n", res);
> + dax_hmem_queue_work();
> + return 0;
> + case DAX_CXL_MODE_REGISTER:
> + dev_dbg(host, "registering CXL range: %pr\n", res);
> + break;
> + case DAX_CXL_MODE_DROP:
> + dev_dbg(host, "dropping CXL range: %pr\n", res);
> + return 0;
> + }
> }
>
> rc = region_intersects_soft_reserve(res->start, resource_size(res));
> @@ -123,8 +134,70 @@ static int hmem_register_device(struct device *host, int target_nid,
> return rc;
> }
>
> +static int hmem_register_cxl_device(struct device *host, int target_nid,
> + const struct resource *res)
> +{
> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> + IORES_DESC_CXL) != REGION_DISJOINT)
> + return hmem_register_device(host, target_nid, res);
> +
> + return 0;
> +}
> +
> +static int soft_reserve_has_cxl_match(struct device *host, int target_nid,
> + const struct resource *res)
> +{
> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> + IORES_DESC_CXL) != REGION_DISJOINT) {
> + if (!cxl_region_contains_soft_reserve((struct resource *)res))
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> +static void process_defer_work(void *data)
> +{
> + struct platform_device *pdev = data;
> + int rc;
> +
> + /* relies on cxl_acpi and cxl_pci having had a chance to load */
> + wait_for_device_probe();
> +
> + rc = walk_hmem_resources(&pdev->dev, soft_reserve_has_cxl_match);
> +
> + if (!rc) {
> + dax_cxl_mode = DAX_CXL_MODE_DROP;
> + dev_dbg(&pdev->dev, "All Soft Reserved ranges claimed by CXL\n");
> + } else {
> + dax_cxl_mode = DAX_CXL_MODE_REGISTER;
> + dev_warn(&pdev->dev,
> + "Soft Reserved not fully contained in CXL; using HMEM\n");
> + }
> +
> + walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
> +}
> +
> +static void kill_defer_work(void *data)
> +{
> + struct platform_device *pdev = data;
> +
> + dax_hmem_flush_work();
> + dax_hmem_unregister_work(process_defer_work, pdev);
> +}
> +
> static int dax_hmem_platform_probe(struct platform_device *pdev)
> {
> + int rc;
> +
> + rc = dax_hmem_register_work(process_defer_work, pdev);
> + if (rc)
> + return rc;
> +
> + rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, pdev);
> + if (rc)
> + return rc;
> +
> return walk_hmem_resources(&pdev->dev, hmem_register_device);
> }
>
> @@ -174,3 +247,4 @@ MODULE_ALIAS("platform:hmem_platform*");
> MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
> MODULE_LICENSE("GPL v2");
> MODULE_AUTHOR("Intel Corporation");
> +MODULE_IMPORT_NS("CXL");
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
2026-02-20 0:02 ` Koralahalli Channabasappa, Smita
@ 2026-02-20 15:55 ` Dave Jiang
0 siblings, 0 replies; 61+ messages in thread
From: Dave Jiang @ 2026-02-20 15:55 UTC (permalink / raw)
To: Koralahalli Channabasappa, Smita, Smita Koralahalli, linux-cxl,
linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 2/19/26 5:02 PM, Koralahalli Channabasappa, Smita wrote:
> Hi Dave,
>
> On 2/18/2026 9:52 AM, Dave Jiang wrote:
>>
>>
>> On 2/9/26 11:44 PM, Smita Koralahalli wrote:
>>> Add helpers to register, queue and flush the deferred work.
>>>
>>> These helpers allow dax_hmem to execute ownership resolution outside the
>>> probe context before dax_cxl binds.
>>>
>>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>>> ---
>>> drivers/dax/bus.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
>>> drivers/dax/bus.h | 7 ++++++
>>> 2 files changed, 65 insertions(+)
>>>
>>> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>>> index 5f387feb95f0..92b88952ede1 100644
>>> --- a/drivers/dax/bus.c
>>> +++ b/drivers/dax/bus.c
>>> @@ -25,6 +25,64 @@ DECLARE_RWSEM(dax_region_rwsem);
>>> */
>>> DECLARE_RWSEM(dax_dev_rwsem);
>>> +static DEFINE_MUTEX(dax_hmem_lock);
>>> +static dax_hmem_deferred_fn hmem_deferred_fn;
>>> +static void *dax_hmem_data;
>>> +
>>> +static void hmem_deferred_work(struct work_struct *work)
>>> +{
>>> + dax_hmem_deferred_fn fn;
>>> + void *data;
>>> +
>>> + scoped_guard(mutex, &dax_hmem_lock) {
>>> + fn = hmem_deferred_fn;
>>> + data = dax_hmem_data;
>>> + }
>>> +
>>> + if (fn)
>>> + fn(data);
>>> +}
>>
>> Instead of having a global lock and dealing with all the global variables, why not just do this with the typical work_struct usage pattern and allocate a work item when queuing work?
>>
>> DJ
>
> Thanks for the feedback.
>
> Just to clarify, are you hinting towards a statically allocated struct
> with an embedded work_struct, something like below, rather than the typical kmalloc + container_of pattern?
>
> +struct dax_hmem_deferred_ctx {
> + struct work_struct work;
> + dax_hmem_deferred_fn fn;
> + void *data;
> +};
>
> +static struct dax_hmem_deferred_ctx dax_hmem_ctx;
>
> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data)
> +{
> + if (dax_hmem_ctx.fn)
> + return -EINVAL;
>
> + INIT_WORK(&dax_hmem_ctx.work, hmem_deferred_work);
> ..
>
> My understanding is that Dan wanted this to remain a singleton deferred work item queued once and flushed from dax_cxl. I think with the kmalloc + container_of approach, every call would allocate and queue a new independent work item.
>
> Regarding the mutex: looking at it again, it may not be necessary. If we can rely on the call ordering (register_work() before queue_work()), and if flush_work() in kill_defer_work() ensures the work has fully completed before unregister_work() NULLs the pointers, then the static struct above should be sufficient without additional locking. If I'm missing a scenario or race here, please correct me.
Ok, I missed the history on the single-issue work item. Yes, what you proposed above should work if it's single issue, and if we are only sending one item, a statically declared work context should be sufficient.
DJ
>
> Thanks,
> Smita
>
>>
>>> +
>>> +static DECLARE_WORK(dax_hmem_work, hmem_deferred_work);
>>> +
>>> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data)
>>> +{
>>> + guard(mutex)(&dax_hmem_lock);
>>> +
>>> + if (hmem_deferred_fn)
>>> + return -EINVAL;
>>> +
>>> + hmem_deferred_fn = fn;
>>> + dax_hmem_data = data;
>>> + return 0;
>>> +}
>>> +EXPORT_SYMBOL_GPL(dax_hmem_register_work);
>>> +
>>> +int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data)
>>> +{
>>> + guard(mutex)(&dax_hmem_lock);
>>> +
>>> + if (hmem_deferred_fn != fn || dax_hmem_data != data)
>>> + return -EINVAL;
>>> +
>>> + hmem_deferred_fn = NULL;
>>> + dax_hmem_data = NULL;
>>> + return 0;
>>> +}
>>> +EXPORT_SYMBOL_GPL(dax_hmem_unregister_work);
>>> +
>>> +void dax_hmem_queue_work(void)
>>> +{
>>> + queue_work(system_long_wq, &dax_hmem_work);
>>> +}
>>> +EXPORT_SYMBOL_GPL(dax_hmem_queue_work);
>>> +
>>> +void dax_hmem_flush_work(void)
>>> +{
>>> + flush_work(&dax_hmem_work);
>>> +}
>>> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
>>> +
>>> #define DAX_NAME_LEN 30
>>> struct dax_id {
>>> struct list_head list;
>>> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
>>> index cbbf64443098..b58a88e8089c 100644
>>> --- a/drivers/dax/bus.h
>>> +++ b/drivers/dax/bus.h
>>> @@ -41,6 +41,13 @@ struct dax_device_driver {
>>> void (*remove)(struct dev_dax *dev);
>>> };
>>> +typedef void (*dax_hmem_deferred_fn)(void *data);
>>> +
>>> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
>>> +int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data);
>>> +void dax_hmem_queue_work(void);
>>> +void dax_hmem_flush_work(void);
>>> +
>>> int __dax_driver_register(struct dax_device_driver *dax_drv,
>>> struct module *module, const char *mod_name);
>>> #define dax_driver_register(driver) \
>>
>
* Re: [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-02-18 18:05 ` Dave Jiang
@ 2026-02-20 19:54 ` Koralahalli Channabasappa, Smita
0 siblings, 0 replies; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-02-20 19:54 UTC (permalink / raw)
To: Dave Jiang, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 2/18/2026 10:05 AM, Dave Jiang wrote:
>
>
> On 2/9/26 11:45 PM, Smita Koralahalli wrote:
>> The current probe time ownership check for Soft Reserved memory based
>> solely on CXL window intersection is insufficient. dax_hmem probing is not
>> always guaranteed to run after CXL enumeration and region assembly, which
>> can lead to incorrect ownership decisions before the CXL stack has
>> finished publishing windows and assembling committed regions.
>>
>> Introduce deferred ownership handling for Soft Reserved ranges that
>> intersect CXL windows. When such a range is encountered during dax_hmem
>> probe, schedule deferred work and wait for the CXL stack to complete
>> enumeration and region assembly before deciding ownership.
>>
>> Evaluate ownership of Soft Reserved ranges based on CXL region
>> containment.
>>
>> - If all Soft Reserved ranges are fully contained within committed CXL
>> regions, DROP handling Soft Reserved ranges from dax_hmem and allow
>> dax_cxl to bind.
>>
>> - If any Soft Reserved range is not fully claimed by committed CXL
>> region, REGISTER the Soft Reserved ranges with dax_hmem.
>>
>> Use dax_cxl_mode to coordinate ownership decisions for Soft Reserved
>> ranges. Once ownership resolution is complete, flush the deferred work
>> from dax_cxl before allowing dax_cxl to bind.
>>
>> This enforces strict ownership: either CXL fully claims the Soft
>> Reserved ranges or relinquishes them entirely.
>>
>> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>> drivers/dax/bus.c | 3 ++
>> drivers/dax/bus.h | 19 ++++++++++
>> drivers/dax/cxl.c | 1 +
>> drivers/dax/hmem/hmem.c | 78 +++++++++++++++++++++++++++++++++++++++--
>> 4 files changed, 99 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>> index 92b88952ede1..81985bcc70f9 100644
>> --- a/drivers/dax/bus.c
>> +++ b/drivers/dax/bus.c
>> @@ -25,6 +25,9 @@ DECLARE_RWSEM(dax_region_rwsem);
>> */
>> DECLARE_RWSEM(dax_dev_rwsem);
>>
>> +enum dax_cxl_mode dax_cxl_mode = DAX_CXL_MODE_DEFER;
>> +EXPORT_SYMBOL_NS_GPL(dax_cxl_mode, "CXL");
>> +
>> static DEFINE_MUTEX(dax_hmem_lock);
>> static dax_hmem_deferred_fn hmem_deferred_fn;
>> static void *dax_hmem_data;
>> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
>> index b58a88e8089c..82616ff52fd1 100644
>> --- a/drivers/dax/bus.h
>> +++ b/drivers/dax/bus.h
>> @@ -41,6 +41,25 @@ struct dax_device_driver {
>> void (*remove)(struct dev_dax *dev);
>> };
>>
>> +/*
>> + * enum dax_cxl_mode - State machine to determine ownership for CXL
>> + * tagged Soft Reserved memory ranges.
>> + * @DAX_CXL_MODE_DEFER: Ownership resolution pending. Set while waiting
>> + * for CXL enumeration and region assembly to complete.
>> + * @DAX_CXL_MODE_REGISTER: CXL regions do not fully cover Soft Reserved
>> + * ranges. Fall back to registering those ranges via dax_hmem.
>> + * @DAX_CXL_MODE_DROP: All Soft Reserved ranges intersecting CXL windows
>> + * are fully contained within committed CXL regions. Drop HMEM handling
>> + * and allow dax_cxl to bind.
>> + */
>> +enum dax_cxl_mode {
>> + DAX_CXL_MODE_DEFER,
>> + DAX_CXL_MODE_REGISTER,
>> + DAX_CXL_MODE_DROP,
>> +};
>> +
>> +extern enum dax_cxl_mode dax_cxl_mode;
>> +
>> typedef void (*dax_hmem_deferred_fn)(void *data);
>>
>> int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
>> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
>> index a2136adfa186..3ab39b77843d 100644
>> --- a/drivers/dax/cxl.c
>> +++ b/drivers/dax/cxl.c
>> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>>
>> static void cxl_dax_region_driver_register(struct work_struct *work)
>> {
>> + dax_hmem_flush_work();
>> cxl_driver_register(&cxl_dax_region_driver);
>> }
>>
>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>> index 1e3424358490..85854e25254b 100644
>> --- a/drivers/dax/hmem/hmem.c
>> +++ b/drivers/dax/hmem/hmem.c
>> @@ -3,6 +3,7 @@
>> #include <linux/memregion.h>
>> #include <linux/module.h>
>> #include <linux/dax.h>
>> +#include <cxl/cxl.h>
>> #include "../bus.h"
>>
>> static bool region_idle;
>> @@ -69,8 +70,18 @@ static int hmem_register_device(struct device *host, int target_nid,
>> if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> IORES_DESC_CXL) != REGION_DISJOINT) {
>> - dev_dbg(host, "deferring range to CXL: %pr\n", res);
>> - return 0;
>> + switch (dax_cxl_mode) {
>> + case DAX_CXL_MODE_DEFER:
>> + dev_dbg(host, "deferring range to CXL: %pr\n", res);
>> + dax_hmem_queue_work();
>> + return 0;
>> + case DAX_CXL_MODE_REGISTER:
>> + dev_dbg(host, "registering CXL range: %pr\n", res);
>> + break;
>> + case DAX_CXL_MODE_DROP:
>> + dev_dbg(host, "dropping CXL range: %pr\n", res);
>> + return 0;
>> + }
>> }
>>
>> rc = region_intersects_soft_reserve(res->start, resource_size(res));
>> @@ -123,8 +134,70 @@ static int hmem_register_device(struct device *host, int target_nid,
>> return rc;
>> }
>>
>> +static int hmem_register_cxl_device(struct device *host, int target_nid,
>> + const struct resource *res)
>> +{
>> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> + IORES_DESC_CXL) != REGION_DISJOINT)
>> + return hmem_register_device(host, target_nid, res);
>> +
>> + return 0;
>> +}
>> +
>> +static int soft_reserve_has_cxl_match(struct device *host, int target_nid,
>> + const struct resource *res)
>> +{
>> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> + IORES_DESC_CXL) != REGION_DISJOINT) {
>> + if (!cxl_region_contains_soft_reserve((struct resource *)res))
>> + return 1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void process_defer_work(void *data)
>> +{
>> + struct platform_device *pdev = data;
>> + int rc;
>> +
>> + /* relies on cxl_acpi and cxl_pci having had a chance to load */
>> + wait_for_device_probe();
>> +
>> + rc = walk_hmem_resources(&pdev->dev, soft_reserve_has_cxl_match);
>> +
>> + if (!rc) {
>> + dax_cxl_mode = DAX_CXL_MODE_DROP;
>> + dev_dbg(&pdev->dev, "All Soft Reserved ranges claimed by CXL\n");
>> + } else {
>> + dax_cxl_mode = DAX_CXL_MODE_REGISTER;
>> + dev_warn(&pdev->dev,
>> + "Soft Reserved not fully contained in CXL; using HMEM\n");
>> + }
>> +
>> + walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
>> +}
>> +
>> +static void kill_defer_work(void *data)
>> +{
>> + struct platform_device *pdev = data;
>> +
>> + dax_hmem_flush_work();
>> + dax_hmem_unregister_work(process_defer_work, pdev);
>> +}
>> +
>> static int dax_hmem_platform_probe(struct platform_device *pdev)
>> {
>> + int rc;
>> +
>> + rc = dax_hmem_register_work(process_defer_work, pdev);
>
> Do we need to take a reference on pdev when we queue the work?
>
> DJ
I thought it might not be required. But correct me if I'm wrong.
There is only one hmem_platform device. Also devm_add_action_or_reset()
registers kill_defer_work(), which calls flush_work() before the device
is torn down. So pdev cannot be freed while the deferred work is still
in progress. flush_work() blocks until process_defer_work() has fully
returned, and only then does device removal proceed.
But this needs a deadlock fix which Gregory pointed out. If probe fails
after work is already queued, the devres cleanup calls flush_work(),
which blocks on wait_for_device_probe() while still inside probe
context. I will fix this in v7.
Thanks
Smita
>
>> + if (rc)
>> + return rc;
>> +
>> + rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, pdev);
>> + if (rc)
>> + return rc;
>> +
>> return walk_hmem_resources(&pdev->dev, hmem_register_device);
>> }
>>
>> @@ -174,3 +247,4 @@ MODULE_ALIAS("platform:hmem_platform*");
>> MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
>> MODULE_LICENSE("GPL v2");
>> MODULE_AUTHOR("Intel Corporation");
>> +MODULE_IMPORT_NS("CXL");
>
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-02-19 3:44 ` Alison Schofield
@ 2026-02-20 20:35 ` Koralahalli Channabasappa, Smita
0 siblings, 0 replies; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-02-20 20:35 UTC (permalink / raw)
To: Alison Schofield, Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Hi Alison,
On 2/18/2026 7:44 PM, Alison Schofield wrote:
> On Tue, Feb 10, 2026 at 06:44:55AM +0000, Smita Koralahalli wrote:
>> __cxl_decoder_detach() currently resets decoder programming whenever a
>> region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
>
> Not sure 'detached' is the right word. Unregistered maybe?
>
>> autodiscovered regions, this can incorrectly tear down decoder state
>> that may be relied upon by other consumers or by subsequent ownership
>> decisions.
>>
>> Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
>> set.
>
> I get how this is needed in the failover to DAX case, yet I'm not clear
> how it fits in with folks that just want to destroy that auto region
> and resuse the pieces.
>
> Your other recent patch cxl/hdm: Avoid DVSEC fallback after region teardown[1],
> showed me that the memdevs, when left with the endpoint decoders not reset,
> will keep trying to create another region when reprobed.
>
> [1] https://lore.kernel.org/linux-cxl/aY6pTk63ivjkanlR@aschofie-mobl2.lan/
>
> I think the patch does what it says it does. Perhaps expand on why that
> is always the right thing to do.
>
> --Alison
>
>
>>
Thanks for the review. I will change "detached" to "unregistered" in the
commit message for v7.
I think there are two paths here; correct me if I'm wrong.
The F_AUTO guard only applies to __cxl_decoder_detach(), which is called
from the unregister_region() path
(unregister_region()->detach_target()->cxl_decoder_detach()..) and
via store_targetN()
(store_targetN()->detach_target()->cxl_decoder_detach()..). In both
cases, this patch preserves decoder state for auto-discovered regions.
When a user explicitly destroys an auto-discovered region via cxl
destroy-region, or decommits via commit_store, those paths call
cxl_region_decode_reset()
(commit_store()->cxl_region_decode_reset()->cxld->reset)
unconditionally; they are not gated by F_AUTO. So users who want to
destroy the auto region and reuse the pieces can still do so. The
decoder state is fully reset in that path.
On the DVSEC fallback fix: The endpoint decoders were being reset (by
cxl_region_decode_reset() unconditionally), which zeroed the registers.
On reprobe, should_emulate_decoders() checked the per-decoder COMMITTED
bits, found them cleared, and incorrectly fell back to DVSEC range
emulation, treating the decoder as AUTO and creating a spurious region.
Also, since nothing in the dax_hmem path calls unregister_region today,
this F_AUTO guard in __cxl_decoder_detach() is only for preserving
firmware decoder state during auto-region teardown. I need to rephrase
that properly in the commit message. I will also expand the commit
message for v7 to document how the explicit decommit path remains available.
Thanks
Smita
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/cxl/core/region.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index ae899f68551f..45ee598daf95 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
>> cxled->part = -1;
>>
>> if (p->state > CXL_CONFIG_ACTIVE) {
>> - cxl_region_decode_reset(cxlr, p->interleave_ways);
>> + if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
>> + cxl_region_decode_reset(cxlr, p->interleave_ways);
>> +
>> p->state = CXL_CONFIG_ACTIVE;
>> }
>>
>> --
>> 2.17.1
>>
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-13 14:04 ` Gregory Price
@ 2026-02-20 20:47 ` Koralahalli Channabasappa, Smita
0 siblings, 0 replies; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-02-20 20:47 UTC (permalink / raw)
To: Gregory Price, Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Tomasz Wolski
On 2/13/2026 6:04 AM, Gregory Price wrote:
> On Tue, Feb 10, 2026 at 06:44:52AM +0000, Smita Koralahalli wrote:
>> This series aims to address long-standing conflicts between HMEM and
>> CXL when handling Soft Reserved memory ranges.
>>
>> Reworked from Dan's patch:
>> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
>>
>
> Link is broken: bad commit reference
https://lore.kernel.org/all/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/
Will fix.
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-12 6:38 ` Alison Schofield
@ 2026-02-20 21:00 ` Koralahalli Channabasappa, Smita
0 siblings, 0 replies; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-02-20 21:00 UTC (permalink / raw)
To: Alison Schofield
Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 2/11/2026 10:38 PM, Alison Schofield wrote:
> On Tue, Feb 10, 2026 at 11:49:04AM -0800, Koralahalli Channabasappa, Smita wrote:
>> Hi Alison,
>>
>> On 2/10/2026 11:16 AM, Alison Schofield wrote:
>>> On Tue, Feb 10, 2026 at 06:44:52AM +0000, Smita Koralahalli wrote:
>>>> This series aims to address long-standing conflicts between HMEM and
>>>> CXL when handling Soft Reserved memory ranges.
>>>>
>>>> Reworked from Dan's patch:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
>>>>
>>>> Previous work:
>>>> https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
>>>>
>>>> Link to v5:
>>>> https://lore.kernel.org/all/20260122045543.218194-1-Smita.KoralahalliChannabasappa@amd.com
>>>>
>>>> The series is based on branch "for-7.0/cxl-init" and base-commit is
>>>> base-commit: bc62f5b308cbdedf29132fe96e9d591e526527e1
>>>>
>>>> [1] After offlining the memory I can tear down the regions and recreate
>>>> them back. dax_cxl creates dax devices and onlines memory.
>>>> 850000000-284fffffff : CXL Window 0
>>>> 850000000-284fffffff : region0
>>>> 850000000-284fffffff : dax0.0
>>>> 850000000-284fffffff : System RAM (kmem)
>>>>
>>>> [2] With CONFIG_CXL_REGION disabled, all the resources are handled by
>>>> HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
>>>> and dax devices are created from HMEM.
>>>> 850000000-284fffffff : CXL Window 0
>>>> 850000000-284fffffff : Soft Reserved
>>>> 850000000-284fffffff : dax0.0
>>>> 850000000-284fffffff : System RAM (kmem)
>>>>
>>>> [3] Region assembly failure works same as [2].
>>>>
>>>> [4] REGISTER path:
>>>> When CXL_BUS = y (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = y),
>>>> the dax_cxl driver is probed and completes initialization before dax_hmem
>>>> probes. This scenario was tested with CXL = y, DAX_CXL = m and
>>>> DAX_HMEM = m. To validate the REGISTER path, I forced REGISTER even in
>>>> cases where SR completely overlaps the CXL region as I did not have access
>>>> to a system where the CXL region range is smaller than the SR range.
>>>>
>>>> 850000000-284fffffff : Soft Reserved
>>>> 850000000-284fffffff : CXL Window 0
>>>> 850000000-280fffffff : region0
>>>> 850000000-284fffffff : dax0.0
>>>> 850000000-284fffffff : System RAM (kmem)
>>>>
>>>> "path":"\/platform\/ACPI0017:00\/root0\/decoder0.0\/region0\/dax_region0",
>>>> "id":0,
>>>> "size":"128.00 GiB (137.44 GB)",
>>>> "align":2097152
>>>>
>>>> [ 35.961707] cxl-dax: cxl_dax_region_init()
>>>> [ 35.961713] cxl-dax: registering driver.
>>>> [ 35.961715] cxl-dax: dax_hmem work flushed.
>>>> [ 35.961754] alloc_dev_dax_range: dax0.0: alloc range[0]:
>>>> 0x000000850000000:0x000000284fffffff
>>>> [ 35.976622] hmem: hmem_platform probe started.
>>>> [ 35.980821] cxl_bus_probe: cxl_dax_region dax_region0: probe: 0
>>>> [ 36.819566] hmem_platform hmem_platform.0: Soft Reserved not fully
>>>> contained in CXL; using HMEM
>>>> [ 36.819569] hmem_register_device: hmem_platform hmem_platform.0:
>>>> registering CXL range: [mem 0x850000000-0x284fffffff flags 0x80000200]
>>>> [ 36.934156] alloc_dax_region: hmem hmem.6: dax_region resource conflict
>>>> for [mem 0x850000000-0x284fffffff]
>>>> [ 36.989310] hmem hmem.6: probe with driver hmem failed with error -12
>>>>
>>>> [5] When CXL_BUS = m (with CXL_ACPI, CXL_PCI, CXL_PORT, CXL_MEM = m),
>>>> DAX_CXL = m and DAX_HMEM = y the results are as expected. To validate the
>>>> REGISTER path, I forced REGISTER even in cases where SR completely
>>>> overlaps the CXL region as I did not have access to a system where the
>>>> CXL region range is smaller than the SR range.
>>>>
>>>> 850000000-284fffffff : Soft Reserved
>>>> 850000000-284fffffff : CXL Window 0
>>>> 850000000-280fffffff : region0
>>>> 850000000-284fffffff : dax6.0
>>>> 850000000-284fffffff : System RAM (kmem)
>>>>
>>>> "path":"\/platform\/hmem.6",
>>>> "id":6,
>>>> "size":"128.00 GiB (137.44 GB)",
>>>> "align":2097152
>>>>
>>>> [ 30.897665] devm_cxl_add_dax_region: cxl_region region0: region0:
>>>> register dax_region0
>>>> [ 30.921015] hmem: hmem_platform probe started.
>>>> [ 31.017946] hmem_platform hmem_platform.0: Soft Reserved not fully
>>>> contained in CXL; using HMEM
>>>> [ 31.056310] alloc_dev_dax_range: dax6.0: alloc range[0]:
>>>> 0x0000000850000000:0x000000284fffffff
>>>> [ 34.781516] cxl-dax: cxl_dax_region_init()
>>>> [ 34.781522] cxl-dax: registering driver.
>>>> [ 34.781523] cxl-dax: dax_hmem work flushed.
>>>> [ 34.781549] alloc_dax_region: cxl_dax_region dax_region0: dax_region
>>>> resource conflict for [mem 0x850000000-0x284fffffff]
>>>> [ 34.781552] cxl_bus_probe: cxl_dax_region dax_region0: probe: -12
>>>> [ 34.781554] cxl_dax_region dax_region0: probe with driver cxl_dax_region
>>>> failed with error -12
>>>>
>>>> v6 updates:
>>>> - Patch 1-3 no changes.
>>>> - New Patches 4-5.
>>>> - (void *)res -> res.
>>>> - cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
>>>> - New file include/cxl/cxl.h
>>>> - Introduced singleton workqueue.
>>>> - hmem to queue the work and cxl to flush.
>>>> - cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
>>>> - Included descriptions for dax_cxl_mode.
>>>> - kzalloc -> kmalloc in add_soft_reserve_into_iomem()
>>>> - dax_cxl_mode is exported to CXL.
>>>> - Introduced hmem_register_cxl_device() for walking only CXL
>>>> intersected SR ranges the second time.
>>>
>>> During v5 review of this patch:
>>>
>>> [PATCH v5 6/7] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
>>>
>>> there was discussion around handling region teardown. It's not mentioned
>>> in the changelog, and the teardown is completely removed from the patch.
>>>
>>> The discussion seemed to be leaning towards not tearing down 'all', but
>>> it's not clear to me that we decided not to tear down anything - which
>>> this update now does.
>>>
>>> And, as you may be guessing, I'm seeing disabled regions with DAX children
>>> and figuring out what can be done with them.
>>>
>>> Can you explain the new approach so I can test against that intention?
>>>
>>> FYI - I am able to confirm the dax regions are back for no-soft-reserved
>>> case, and my basic hotplug flow works with v6.
>>>
>>> -- Alison
>>
>> Hi Alison,
>>
>> Thanks for the test and confirming the no-soft-reserved and hotplug cases
>> work.
>>
>> You're right that cxl_region_teardown_all() was removed in v6. I should have
>> called this out more clearly in the changelog. Here's what I learnt from v5
>> review. Correct me if I misunderstood.
>>
>> During v5 review, regarding dropping teardown (comments from Dan):
>>
>> "If we go with the alloc_dax_region() observation in my other mail it means
>> that the HPA space will already be claimed and cxl_dax_region_probe() will
>> fail. If we can get to that point of "all HMEM registered, and all CXL
>> regions failing to attach their
>> cxl_dax_region devices" that is a good stopping point. Then can decide if a
>> follow-on patch is needed to cleanup that state (cxl_region_teardown_all())
>> , or if it can just idle that way in the messy state and wait for userspace
>> to cleanup if it wants."
>>
>> https://lore.kernel.org/all/697aad9546542_30951007c@dwillia2-mobl4.notmuch/
>>
>> Also:
>>
>> "In other words, I thought total teardown would be simpler, but as the
>> feedback keeps coming in, I think that brings a different set of complexity.
>> So just inject failures for dax_cxl to trip over and then we can go further
>> later to effect total teardown if that proves to not be enough."
>>
>> https://lore.kernel.org/all/697a9d46b147e_309510027@dwillia2-mobl4.notmuch/
>>
>> The v6 approach replaces teardown with the alloc_dax_region() resource
>> exclusion in patch 5. When HMEM wins the ownership decision (REGISTER path),
>> it successfully claims the dax_region resource range first. When dax_cxl
>> later tries to probe, its alloc_dax_region() call hits a resource conflict
>> and fails, leaving the cxl_dax_region device in a disabled state.
>>
>> (There is a separate ordering issue when CXL is built-in and HMEM is a
>> module, where dax_cxl may claim the dax_region first as observed in
>> experiments [4] and [5], but that is an independent topic and might not be
>> relevant here.)
>>
>> So the disabled regions with DAX children you are seeing on the CXL side are
>> likely expected as Dan mentioned - they show that CXL tried to claim the
>> range but HMEM got there first. Though the cxl region remains committed, no
>> dax_region gets created for it because the HPA space is already taken.
>
> Hi Smita,
>
> The disable regions I'm seeing are the remnants of failed region assemblies
> where HMEM rightfully took over. So the take over is good, but the expected
> view shown way above and repasted below is not what I'm seeing. Case [3]
> is not the same as Case [2], but have a region btw the SR and DAX.
>
>
>>>> [2] With CONFIG_CXL_REGION disabled, all the resources are handled by
>>>> HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
>>>> and dax devices are created from HMEM.
>>>> 850000000-284fffffff : CXL Window 0
>>>> 850000000-284fffffff : Soft Reserved
>>>> 850000000-284fffffff : dax0.0
>>>> 850000000-284fffffff : System RAM (kmem)
>>>>
>>>> [3] Region assembly failure works same as [2].
>>>>
>
> I posted a patch[1] that I think gets us to what is expected.
> FWIW I do agree with abandoning the teardown all approach. In this
> patch I still don't suggest tearing down the region. It can stay for
> 'forensics', but I do think we should make /proc/iomem accurately
> reflect the memory topology.
>
> [1] https://lore.kernel.org/linux-cxl/20260212062250.1219043-1-alison.schofield@intel.com/
>
> -- Alison
Sorry I missed this message. I will go through it.
I think the reason I wasn't seeing regions in /proc/iomem during my
testing is that I was using both of your test patches together, the fake
failure in cxl_region_sort_targets() and the cleanup patch that calls
devm_release_action()->unregister_region() on attach_target() failure in
cxl_add_to_region(). The second patch removes the region on assembly
failure, which is why the iomem tree had no region in my case.
You are right; with just the faked failure in cxl_region_sort_targets(),
the region will still exist in the iomem tree.
Thanks
Smita
>
>>
>> Thanks
>> Smita
>>
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-20 9:45 ` Tomasz Wolski
@ 2026-02-20 21:19 ` Koralahalli Channabasappa, Smita
2026-02-22 23:17 ` Tomasz Wolski
0 siblings, 1 reply; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-02-20 21:19 UTC (permalink / raw)
To: Tomasz Wolski, smita.koralahallichannabasappa
Cc: alison.schofield, ardb, benjamin.cheatham, bp, dan.j.williams,
dave.jiang, dave, gregkh, huang.ying.caritas, ira.weiny, jack,
jeff.johnson, jonathan.cameron, len.brown, linux-cxl,
linux-fsdevel, linux-kernel, linux-pm, lizhijian, ming.li,
nathan.fontenot, nvdimm, pavel, peterz, rafael, rrichter,
terry.bowman, vishal.l.verma, willy, yaoxt.fnst, yazen.ghannam
Hi Tomasz,
On 2/20/2026 1:45 AM, Tomasz Wolski wrote:
> Tested on QEMU and physical setups.
>
> I have one question about "Soft Reserve" parent entries in iomem.
> On QEMU I see parent "Soft Reserved":
>
> a90000000-b4fffffff : Soft Reserved
> a90000000-b4fffffff : CXL Window 0
> a90000000-b4fffffff : dax1.0
> a90000000-b4fffffff : System RAM (kmem)
>
> While on my physical setup this is missing - not sure if this is okay?
>
> BIOS-e820: [mem 0x0000002070000000-0x000000a06fffffff] soft reserved
>
> 2070000000-606fffffff : CXL Window 0
> 2070000000-606fffffff : region0
> 2070000000-606fffffff : dax0.0
> 2070000000-606fffffff : System RAM (kmem)
> 6070000000-a06fffffff : CXL Window 1
> 6070000000-a06fffffff : region1
> 6070000000-a06fffffff : dax1.0
> 6070000000-a06fffffff : System RAM (kmem)
Thanks for testing on both setups!
On QEMU: there is no region, so HMEM took ownership of the Soft Reserved
range (REGISTER path). Patch 9 then reintroduced the Soft Reserved entry
back into the iomem tree to reflect HMEM ownership.
On physical setup: CXL fully claimed both ranges, region0 and region1
assembled successfully (DROP path). Since CXL owns the memory, there's
no Soft Reserved parent to reintroduce.
Soft Reserved appears in /proc/iomem only when CXL does not fully claim
the range and HMEM takes over. Your physical setup is showing it
correctly. Maybe on QEMU, CXL_REGION is disabled or region assembly failed
and was cleaned up, so there aren't any regions?
Thanks,
Smita
>
> Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
* Re: [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
2026-02-20 21:19 ` Koralahalli Channabasappa, Smita
@ 2026-02-22 23:17 ` Tomasz Wolski
0 siblings, 0 replies; 61+ messages in thread
From: Tomasz Wolski @ 2026-02-22 23:17 UTC (permalink / raw)
To: skoralah
Cc: alison.schofield, ardb, benjamin.cheatham, bp, dan.j.williams,
dave.jiang, dave, gregkh, huang.ying.caritas, ira.weiny, jack,
jeff.johnson, jonathan.cameron, len.brown, linux-cxl,
linux-fsdevel, linux-kernel, linux-pm, lizhijian, ming.li,
nathan.fontenot, nvdimm, pavel, peterz, rafael, rrichter,
smita.koralahallichannabasappa, terry.bowman, tomasz.wolski,
vishal.l.verma, willy, yaoxt.fnst, yazen.ghannam
Hi Smita,
>Hi Tomasz,
>
>On 2/20/2026 1:45 AM, Tomasz Wolski wrote:
>> Tested on QEMU and physical setups.
>>
>> I have one question about "Soft Reserve" parent entries in iomem.
>> On QEMU I see parent "Soft Reserved":
>>
>> a90000000-b4fffffff : Soft Reserved
>> a90000000-b4fffffff : CXL Window 0
>> a90000000-b4fffffff : dax1.0
>> a90000000-b4fffffff : System RAM (kmem)
>>
>> While on my physical setup this is missing - not sure if this is okay?
>>
>> BIOS-e820: [mem 0x0000002070000000-0x000000a06fffffff] soft reserved
>>
>> 2070000000-606fffffff : CXL Window 0
>> 2070000000-606fffffff : region0
>> 2070000000-606fffffff : dax0.0
>> 2070000000-606fffffff : System RAM (kmem)
>> 6070000000-a06fffffff : CXL Window 1
>> 6070000000-a06fffffff : region1
>> 6070000000-a06fffffff : dax1.0
>> 6070000000-a06fffffff : System RAM (kmem)
>
>Thanks for testing on both setups!
>
>On QEMU: there is no region, so HMEM took ownership of the Soft Reserved
>range (REGISTER path). Patch 9 then reintroduced the Soft Reserved entry
>back into the iomem tree to reflect HMEM ownership.
>
>On physical setup: CXL fully claimed both ranges, region0 and region1
>assembled successfully (DROP path). Since CXL owns the memory, there's
>no Soft Reserved parent to reintroduce.
>
>Soft Reserved appears in /proc/iomem only when CXL does not fully claim
>the range and HMEM takes over. Your physical setup is showing it
>correctly. Maybe on QEMU, CXL_REGION is disabled or region assembly failed
>and was cleaned up, so there aren't any regions?
Thanks a lot for clarifying the behavior.
I checked QEMU again and it works as you described (sorry, I must've
mixed something up in my notes during the previous test).
>Thanks,
>Smita
>
> Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
* Re: [PATCH v6 4/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
2026-02-10 6:44 ` [PATCH v6 4/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
2026-02-18 15:54 ` Dave Jiang
@ 2026-03-09 14:31 ` Jonathan Cameron
1 sibling, 0 replies; 61+ messages in thread
From: Jonathan Cameron @ 2026-03-09 14:31 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, 10 Feb 2026 06:44:56 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
> dax_cxl.
>
> In addition, defer registration of the dax_cxl driver to a workqueue
> instead of using module_cxl_driver(). This ensures that dax_hmem has
> an opportunity to initialize and register its deferred callback and make
> ownership decisions before dax_cxl begins probing and claiming Soft
> Reserved ranges.
>
> Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
> out of line from other synchronous probing avoiding ordering
> dependencies while coordinating ownership decisions with dax_hmem.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Bit of a hack but I don't have a better idea.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
* Re: [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree
2026-02-10 6:44 ` [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
2026-02-18 16:04 ` Dave Jiang
@ 2026-03-09 14:37 ` Jonathan Cameron
2026-03-12 21:30 ` Koralahalli Channabasappa, Smita
2026-03-12 0:27 ` Dan Williams
2 siblings, 1 reply; 61+ messages in thread
From: Jonathan Cameron @ 2026-03-09 14:37 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, 10 Feb 2026 06:44:57 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> Introduce a global "DAX Regions" resource root and register each
> dax_region->res under it via request_resource(). Release the resource on
> dax_region teardown.
>
> By enforcing a single global namespace for dax_region allocations, this
> ensures only one of dax_hmem or dax_cxl can successfully register a
> dax_region for a given range.
>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
One question inline about the locking.
Is the intent to serialize beyond this new resource tree? If it's just
the resource tree, the write_lock(&resource_lock) taken inside
request_resource() and release_resource() should be sufficient.
> ---
> drivers/dax/bus.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index fde29e0ad68b..5f387feb95f0 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -10,6 +10,7 @@
> #include "dax-private.h"
> #include "bus.h"
>
> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
> static DEFINE_MUTEX(dax_bus_lock);
>
> /*
> @@ -625,6 +626,8 @@ static void dax_region_unregister(void *region)
> {
> struct dax_region *dax_region = region;
>
> + scoped_guard(rwsem_write, &dax_region_rwsem)
> + release_resource(&dax_region->res);
Do we need the locking? The resource code all runs under the global
resource_lock, so if the aim is just to serialize adds and removes that
should be enough. Maybe there is a justification in that being an internal
implementation detail.
> sysfs_remove_groups(&dax_region->dev->kobj,
> dax_region_attribute_groups);
> dax_region_put(dax_region);
> @@ -635,6 +638,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
> unsigned long flags)
> {
> struct dax_region *dax_region;
> + int rc;
>
> /*
> * The DAX core assumes that it can store its private data in
> @@ -667,14 +671,27 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
> .flags = IORESOURCE_MEM | flags,
> };
>
> - if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
> - kfree(dax_region);
> - return NULL;
> + scoped_guard(rwsem_write, &dax_region_rwsem)
> + rc = request_resource(&dax_regions, &dax_region->res);
> + if (rc) {
> + dev_dbg(parent, "dax_region resource conflict for %pR\n",
> + &dax_region->res);
> + goto err_res;
> }
>
> + if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
> + goto err_sysfs;
> +
> if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
> return NULL;
> return dax_region;
> +
> +err_sysfs:
> + scoped_guard(rwsem_write, &dax_region_rwsem)
> + release_resource(&dax_region->res);
> +err_res:
> + kfree(dax_region);
> + return NULL;
> }
> EXPORT_SYMBOL_GPL(alloc_dax_region);
>
* Re: [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
2026-02-10 6:44 ` [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination Smita Koralahalli
2026-02-18 17:52 ` Dave Jiang
@ 2026-03-09 14:49 ` Jonathan Cameron
1 sibling, 0 replies; 61+ messages in thread
From: Jonathan Cameron @ 2026-03-09 14:49 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, 10 Feb 2026 06:44:59 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> Add helpers to register, queue and flush the deferred work.
>
> These helpers allow dax_hmem to execute ownership resolution outside the
> probe context before dax_cxl binds.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
The sanity checks on valid inputs seem excessive to me for something
that is intended to have a very narrow use case. I'm also not sure it's
harmful to just not bother with the parameter checking.
Otherwise seems fine to me.
> ---
> drivers/dax/bus.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
> drivers/dax/bus.h | 7 ++++++
> 2 files changed, 65 insertions(+)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 5f387feb95f0..92b88952ede1 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -25,6 +25,64 @@ DECLARE_RWSEM(dax_region_rwsem);
> */
> DECLARE_RWSEM(dax_dev_rwsem);
>
> +static DEFINE_MUTEX(dax_hmem_lock);
> +static dax_hmem_deferred_fn hmem_deferred_fn;
> +static void *dax_hmem_data;
> +
> +static void hmem_deferred_work(struct work_struct *work)
> +{
> + dax_hmem_deferred_fn fn;
> + void *data;
> +
> + scoped_guard(mutex, &dax_hmem_lock) {
> + fn = hmem_deferred_fn;
> + data = dax_hmem_data;
> + }
> +
> + if (fn)
> + fn(data);
> +}
> +
> +static DECLARE_WORK(dax_hmem_work, hmem_deferred_work);
> +
> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data)
> +{
> + guard(mutex)(&dax_hmem_lock);
> +
> + if (hmem_deferred_fn)
> + return -EINVAL;
What happens if we drop the check, and therefore no longer need to return
an int from these helpers or handle errors?
The worst that happens is hmem_deferred_fn == NULL and we set the
data (might also be NULL, we don't care).
To me that looks harmless.
> +
> + hmem_deferred_fn = fn;
> + dax_hmem_data = data;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_register_work);
> +
> +int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data)
> +{
> + guard(mutex)(&dax_hmem_lock);
> +
> + if (hmem_deferred_fn != fn || dax_hmem_data != data)
> + return -EINVAL;
Do we need the sanity check? I'd just unconditionally clear them
both.
> +
> + hmem_deferred_fn = NULL;
> + dax_hmem_data = NULL;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_unregister_work);
> +
> +void dax_hmem_queue_work(void)
> +{
> + queue_work(system_long_wq, &dax_hmem_work);
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_queue_work);
> +
> +void dax_hmem_flush_work(void)
> +{
> + flush_work(&dax_hmem_work);
> +}
> +EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
> +
> #define DAX_NAME_LEN 30
> struct dax_id {
> struct list_head list;
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index cbbf64443098..b58a88e8089c 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -41,6 +41,13 @@ struct dax_device_driver {
> void (*remove)(struct dev_dax *dev);
> };
>
> +typedef void (*dax_hmem_deferred_fn)(void *data);
> +
> +int dax_hmem_register_work(dax_hmem_deferred_fn fn, void *data);
> +int dax_hmem_unregister_work(dax_hmem_deferred_fn fn, void *data);
> +void dax_hmem_queue_work(void);
> +void dax_hmem_flush_work(void);
> +
> int __dax_driver_register(struct dax_device_driver *dax_drv,
> struct module *module, const char *mod_name);
> #define dax_driver_register(driver) \
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-02-10 6:44 ` [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions Smita Koralahalli
2026-02-19 3:44 ` Alison Schofield
@ 2026-03-11 21:37 ` Dan Williams
2026-03-12 19:53 ` Dan Williams
2026-03-18 21:27 ` Alison Schofield
1 sibling, 2 replies; 61+ messages in thread
From: Dan Williams @ 2026-03-11 21:37 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Smita Koralahalli wrote:
> __cxl_decoder_detach() currently resets decoder programming whenever a
> region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
> autodiscovered regions, this can incorrectly tear down decoder state
> that may be relied upon by other consumers or by subsequent ownership
> decisions.
>
> Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
> set.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/region.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ae899f68551f..45ee598daf95 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
> cxled->part = -1;
>
> if (p->state > CXL_CONFIG_ACTIVE) {
> - cxl_region_decode_reset(cxlr, p->interleave_ways);
> + if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> + cxl_region_decode_reset(cxlr, p->interleave_ways);
> +
> p->state = CXL_CONFIG_ACTIVE;
tl;dr: I do not think we need this, but I do think we need to clarify to
users which enable/disable and/or hot-remove violence is and is not
handled by the CXL core.
So this looks deceptively simple, but I think it is incomplete or at
least adds to the current confusion. A couple points to consider:
1/ There is no corresponding clear_bit(CXL_REGION_F_AUTO, ...) anywhere in
the driver. Yes, admin can still force cxl_region_decode_reset() via
commit_store() path, but admin can not force
cxl_region_teardown_targets() in the __cxl_decoder_detach() path. I do
not like that this causes us to end up with 2 separate considerations
for when __cxl_decoder_detach() skips cleanup actions
(cxl_region_teardown_targets() and cxl_region_decode_reset()). See
below, I think the cxl_region_teardown_targets() check is probably
bogus.
At a minimum I think commit_store() should clear CXL_REGION_F_AUTO on
decommit such that cleaning up decoders and targets later proceeds as
expected.
2/ The hard part about CXL region cleanup is that it needs to be prepared
for:
a/ user manually removes the region via sysfs
b/ user manually disables cxl_port, cxl_mem, or cxl_acpi causing the
endpoint port to be removed
c/ user physically removes the memdev causing the endpoint port to be
removed (CXL core can not tell the difference with 2b/ it just sees
cxl_mem_driver::remove() operation invocation)
d/ setup action fails and region setup is unwound
The cxl_region_decode_reset() is in __cxl_decoder_detach() because of
2b/ and 2c/. No other chance to cleanup the decode topology once the
endpoint decoders are on their way out of the system.
In this case though, the patch was generated back when we were committed
to cleaning up failed-to-assemble regions, a new 2d/ case, right?
However, in that case the decoder is not leaving the system. The
questions that arrive from that analysis are:
* Is this patch still needed now that there is no auto-cleanup?
* If this patch is still needed is it better to skip
cxl_region_decode_reset() based on the 'enum cxl_detach_mode' rather
than the CXL_REGION_F_AUTO flag? I.e. skip reset in the 2d/ case, or
some other new general flag that says "please preserve hardware
configuration".
* Should cxl_region_teardown_targets() also be caring about the
cxl_detach_mode rather than the auto flag? I actually think the
CXL_REGION_F_AUTO check in cxl_region_teardown_targets() is misplaced
and it was confusing "teardown targets" with "decode reset".
All of this wants some documentation to tell users that the rule is
"Hey, after any endpoint decoder has been seen by the CXL core, if you
remove that endpoint decoder by removing or disabling any of cxl_acpi,
cxl_mem, or cxl_port the CXL core *will* violently destroy the decode
configuration". Then think about whether this needs a way to specify
"skip decoder teardown" to disambiguate "the decoder is disappearing
logically, but not physically, keep its configuration". That allows
turning any manual configuration into an auto-configuration and has an
explicit rule for all regions rather than the current "auto regions are
special" policy.
It is helpful that violence has been the default so far, as that allows
introducing a decoder shutdown policy toggle where CXL_REGION_F_AUTO flags
decoders as "preserve" by default. Region decommit clears that flag,
and/or userspace can toggle that per endpoint decoder flag to determine
what happens when decoders leave the system. That probably also wants
some lockdown interaction such that root can not force unplug memory by
unbinding a driver.
* Re: [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree
2026-02-10 6:44 ` [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
2026-02-18 16:04 ` Dave Jiang
2026-03-09 14:37 ` Jonathan Cameron
@ 2026-03-12 0:27 ` Dan Williams
2026-03-12 21:31 ` Koralahalli Channabasappa, Smita
2 siblings, 1 reply; 61+ messages in thread
From: Dan Williams @ 2026-03-12 0:27 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Smita Koralahalli wrote:
> Introduce a global "DAX Regions" resource root and register each
> dax_region->res under it via request_resource(). Release the resource on
> dax_region teardown.
>
> By enforcing a single global namespace for dax_region allocations, this
> ensures only one of dax_hmem or dax_cxl can successfully register a
> dax_region for a given range.
>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Did I send any code for this? If I suggested the locking below,
apologies, otherwise Suggested-by is expected unless code is adopted
from another patch.
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/dax/bus.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index fde29e0ad68b..5f387feb95f0 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -10,6 +10,7 @@
> #include "dax-private.h"
> #include "bus.h"
>
> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
> static DEFINE_MUTEX(dax_bus_lock);
>
> /*
> @@ -625,6 +626,8 @@ static void dax_region_unregister(void *region)
> {
> struct dax_region *dax_region = region;
>
> + scoped_guard(rwsem_write, &dax_region_rwsem)
> + release_resource(&dax_region->res);
I continue to dislike what scoped_guard() does to indentation. Often
scoped_guard() usage can just be replaced by a helper that uses guard().
However, dax_region_rwsem protects subdivision of a dax_region, not
coordination across regions.
Also, release_resource() and request_resource() are already protected by
the resource_lock, why is a new lock needed?
* Re: [PATCH v6 6/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions
2026-02-10 6:44 ` [PATCH v6 6/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
@ 2026-03-12 0:29 ` Dan Williams
0 siblings, 0 replies; 61+ messages in thread
From: Dan Williams @ 2026-03-12 0:29 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Smita Koralahalli wrote:
> Add a helper to determine whether a given Soft Reserved memory range is
> fully contained within the committed CXL region.
>
> This helper provides a primitive for policy decisions in subsequent
> patches such as co-ordination with dax_hmem to determine whether CXL has
> fully claimed ownership of Soft Reserved memory ranges.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++++++++
> include/cxl/cxl.h | 15 +++++++++++++++
> 2 files changed, 45 insertions(+)
> create mode 100644 include/cxl/cxl.h
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 45ee598daf95..96ed550bfd2e 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -12,6 +12,7 @@
> #include <linux/idr.h>
> #include <linux/memory-tiers.h>
> #include <linux/string_choices.h>
> +#include <cxl/cxl.h>
> #include <cxlmem.h>
> #include <cxl.h>
> #include "core.h"
> @@ -3875,6 +3876,35 @@ static int cxl_region_debugfs_poison_clear(void *data, u64 offset)
> DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
> cxl_region_debugfs_poison_clear, "%llx\n");
>
> +static int region_contains_soft_reserve(struct device *dev, void *data)
> +{
> + struct resource *res = data;
> + struct cxl_region *cxlr;
> + struct cxl_region_params *p;
> +
> + if (!is_cxl_region(dev))
> + return 0;
> +
> + cxlr = to_cxl_region(dev);
> + p = &cxlr->params;
> +
> + if (p->state != CXL_CONFIG_COMMIT)
> + return 0;
> +
> + if (!p->res)
> + return 0;
> +
> + return resource_contains(p->res, res) ? 1 : 0;
> +}
> +
> +bool cxl_region_contains_soft_reserve(struct resource *res)
> +{
> + guard(rwsem_read)(&cxl_rwsem.region);
> + return bus_for_each_dev(&cxl_bus_type, NULL, res,
> + region_contains_soft_reserve) != 0;
> +}
> +EXPORT_SYMBOL_GPL(cxl_region_contains_soft_reserve);
To be specific, this function is simply
"cxl_region_contains_resource()"; there is nothing "soft reserve"
specific about this implementation.
With that rename you can add:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-02-10 6:45 ` [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges Smita Koralahalli
2026-02-18 18:05 ` Dave Jiang
2026-02-20 10:14 ` Alejandro Lucero Palau
@ 2026-03-12 2:28 ` Dan Williams
2026-03-13 18:41 ` Koralahalli Channabasappa, Smita
2026-03-16 22:26 ` Koralahalli Channabasappa, Smita
2 siblings, 2 replies; 61+ messages in thread
From: Dan Williams @ 2026-03-12 2:28 UTC (permalink / raw)
To: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Smita Koralahalli wrote:
> The current probe time ownership check for Soft Reserved memory based
> solely on CXL window intersection is insufficient. dax_hmem probing is not
> always guaranteed to run after CXL enumeration and region assembly, which
> can lead to incorrect ownership decisions before the CXL stack has
> finished publishing windows and assembling committed regions.
>
> Introduce deferred ownership handling for Soft Reserved ranges that
> intersect CXL windows. When such a range is encountered during dax_hmem
> probe, schedule deferred work and wait for the CXL stack to complete
> enumeration and region assembly before deciding ownership.
>
> Evaluate ownership of Soft Reserved ranges based on CXL region
> containment.
>
> - If all Soft Reserved ranges are fully contained within committed CXL
> regions, DROP handling Soft Reserved ranges from dax_hmem and allow
> dax_cxl to bind.
>
> - If any Soft Reserved range is not fully claimed by committed CXL
> region, REGISTER the Soft Reserved ranges with dax_hmem.
>
> Use dax_cxl_mode to coordinate ownership decisions for Soft Reserved
> ranges. Once, ownership resolution is complete, flush the deferred work
> from dax_cxl before allowing dax_cxl to bind.
>
> This enforces a strict ownership. Either CXL fully claims the Soft
> reserved ranges or it relinquishes it entirely.
We have had multiple suggestions during the course of developing this
state machine, but reading this changelog and the implementation I cannot
see that the full / final state machine is laid out with all the old
ideas cleaned out of the implementation.
For example, I think this has my "untested!" suggestion from:
http://lore.kernel.org/697acf78acf70_3095100c@dwillia2-mobl4.notmuch
...but it does not have the explanation of why it turned out to be
suitable and fits the end goal state machine.
It also has the original definition of "enum dax_cxl_mode". However,
with the recent simplification proposal to stop doing total CXL unwind I
think it allows for a more straightforward state machine. For example,
the "drop" state is now automatic simply by losing the race with
dax_hmem, right?
I think we are close, just some final complexity shaving.
So, with the decision to stop tearing down CXL this state machine only
has 3 requirements.
1/ CXL enumeration needs to start before dax_hmem invokes
wait_for_device_probe().
2/ dax_cxl driver registration needs to be postponed until after
dax_hmem has dispositioned all its regions.
3/ No probe path can flush the work because of the wait_for_device_probe().
Requirement 1/ is met by patch1. Requirement 2/ partially met, has a
proposal here around flushing the work from a separate workqueue
invocation, but I think you want the dependency directly on the dax_hmem
module (if enabled). Requirement 3/ not achieved.
For 3/ I think we can borrow what cxl_mem_probe() does and do:
if (work_pending(&dax_hmem_work))
return -EBUSY;
...if for some reason someone really wants to rebind the dax_hmem driver
they need to flush the queue, and that can be achieved by a flush_work()
in the module_exit() path.
This does mean that patch 7 in this series disappears because bus.c has
no role to play in this mess. It is just dax_hmem and dax_cxl getting
their ordering straight.
Some notes on the implications follow:
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/dax/bus.c | 3 ++
> drivers/dax/bus.h | 19 ++++++++++
> drivers/dax/cxl.c | 1 +
> drivers/dax/hmem/hmem.c | 78 +++++++++++++++++++++++++++++++++++++++--
> 4 files changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 92b88952ede1..81985bcc70f9 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -25,6 +25,9 @@ DECLARE_RWSEM(dax_region_rwsem);
> */
> DECLARE_RWSEM(dax_dev_rwsem);
>
> +enum dax_cxl_mode dax_cxl_mode = DAX_CXL_MODE_DEFER;
> +EXPORT_SYMBOL_NS_GPL(dax_cxl_mode, "CXL");
> +
> static DEFINE_MUTEX(dax_hmem_lock);
> static dax_hmem_deferred_fn hmem_deferred_fn;
> static void *dax_hmem_data;
> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
> index b58a88e8089c..82616ff52fd1 100644
> --- a/drivers/dax/bus.h
> +++ b/drivers/dax/bus.h
> @@ -41,6 +41,25 @@ struct dax_device_driver {
> void (*remove)(struct dev_dax *dev);
> };
>
> +/*
> + * enum dax_cxl_mode - State machine to determine ownership for CXL
> + * tagged Soft Reserved memory ranges.
> + * @DAX_CXL_MODE_DEFER: Ownership resolution pending. Set while waiting
> + * for CXL enumeration and region assembly to complete.
> + * @DAX_CXL_MODE_REGISTER: CXL regions do not fully cover Soft Reserved
> + * ranges. Fall back to registering those ranges via dax_hmem.
> + * @DAX_CXL_MODE_DROP: All Soft Reserved ranges intersecting CXL windows
> + * are fully contained within committed CXL regions. Drop HMEM handling
> + * and allow dax_cxl to bind.
With the above, dax_cxl_mode disappears.
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index a2136adfa186..3ab39b77843d 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>
> static void cxl_dax_region_driver_register(struct work_struct *work)
> {
> + dax_hmem_flush_work();
Looks ok, as long as that symbol is from dax_hmem.ko which gets you the
load dependency (requirement 2/).
Might also want to make sure that all this deferral mess disappears when
CONFIG_DEV_DAX_HMEM=n.
> cxl_driver_register(&cxl_dax_region_driver);
> }
>
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1e3424358490..85854e25254b 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,6 +3,7 @@
> #include <linux/memregion.h>
> #include <linux/module.h>
> #include <linux/dax.h>
> +#include <cxl/cxl.h>
> #include "../bus.h"
>
> static bool region_idle;
> @@ -69,8 +70,18 @@ static int hmem_register_device(struct device *host, int target_nid,
> if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> IORES_DESC_CXL) != REGION_DISJOINT) {
> - dev_dbg(host, "deferring range to CXL: %pr\n", res);
> - return 0;
> + switch (dax_cxl_mode) {
> + case DAX_CXL_MODE_DEFER:
> + dev_dbg(host, "deferring range to CXL: %pr\n", res);
> + dax_hmem_queue_work();
This case is just a flag that determines whether the work queue has
completed its one run. So I expect this to be something like:
if (!dax_hmem_initial_probe) {
queue_work()
return;
}
Otherwise just go ahead and register because dax_cxl by this time has
had a chance to have a say and the system falls back to "first come /
first served" mode. In other words the simplification of not cleaning up
goes both ways. dax_hmem naturally fails if dax_cxl already claimed the
address range.
> + return 0;
> + case DAX_CXL_MODE_REGISTER:
> + dev_dbg(host, "registering CXL range: %pr\n", res);
> + break;
> + case DAX_CXL_MODE_DROP:
> + dev_dbg(host, "dropping CXL range: %pr\n", res);
> + return 0;
> + }
> }
>
> rc = region_intersects_soft_reserve(res->start, resource_size(res));
> @@ -123,8 +134,70 @@ static int hmem_register_device(struct device *host, int target_nid,
> return rc;
> }
>
> +static int hmem_register_cxl_device(struct device *host, int target_nid,
> + const struct resource *res)
> +{
> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> + IORES_DESC_CXL) != REGION_DISJOINT)
> + return hmem_register_device(host, target_nid, res);
> +
> + return 0;
> +}
> +
> +static int soft_reserve_has_cxl_match(struct device *host, int target_nid,
> + const struct resource *res)
> +{
> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> + IORES_DESC_CXL) != REGION_DISJOINT) {
> + if (!cxl_region_contains_soft_reserve((struct resource *)res))
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> +static void process_defer_work(void *data)
> +{
> + struct platform_device *pdev = data;
> + int rc;
> +
> + /* relies on cxl_acpi and cxl_pci having had a chance to load */
> + wait_for_device_probe();
> +
> + rc = walk_hmem_resources(&pdev->dev, soft_reserve_has_cxl_match);
> +
> + if (!rc) {
> + dax_cxl_mode = DAX_CXL_MODE_DROP;
> + dev_dbg(&pdev->dev, "All Soft Reserved ranges claimed by CXL\n");
> + } else {
> + dax_cxl_mode = DAX_CXL_MODE_REGISTER;
> + dev_warn(&pdev->dev,
> + "Soft Reserved not fully contained in CXL; using HMEM\n");
> + }
> +
> + walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
I do not think we need to do 2 passes. Just do one
hmem_register_cxl_device() pass that skips a range when
cxl_region_contains_resource() has it covered, otherwise register an
hmem device.
> +}
> +
> +static void kill_defer_work(void *data)
> +{
> + struct platform_device *pdev = data;
> +
> + dax_hmem_flush_work();
> + dax_hmem_unregister_work(process_defer_work, pdev);
> +}
> +
> static int dax_hmem_platform_probe(struct platform_device *pdev)
> {
> + int rc;
This wants a work_pending() return -EBUSY per above.
> + rc = dax_hmem_register_work(process_defer_work, pdev);
> + if (rc)
> + return rc;
The work does not need to be registered every time. Remember this is
only a one-shot problem at first kernel boot, not every time this
platform device is probed. After the workqueue has run at least once it
never needs to be invoked again if dax_hmem is reloaded.
A flag for "dax_hmem flushed initial device probe at least once" needs
to live in drivers/dax/hmem/device.c and be cleared by
process_defer_work().
> +
> + rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, pdev);
> + if (rc)
> + return rc;
> +
> return walk_hmem_resources(&pdev->dev, hmem_register_device);
> }
The hunk that is missing is that dax_hmem_exit() should flush the work,
and process_defer_work() should give up if the device has been unbound
before it runs. Hopefully that last suggestion does not make lockdep
unhappy about running process_defer_work under the hmem_platform
device_lock(). I *think* it should be ok and solves the TOCTOU race in
hmem_register_device() around whether we are in the pre or post initial
probe world.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-11 21:37 ` Dan Williams
@ 2026-03-12 19:53 ` Dan Williams
2026-03-12 21:28 ` Koralahalli Channabasappa, Smita
2026-03-13 12:54 ` Alejandro Lucero Palau
2026-03-18 21:27 ` Alison Schofield
1 sibling, 2 replies; 61+ messages in thread
From: Dan Williams @ 2026-03-12 19:53 UTC (permalink / raw)
To: Dan Williams, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
Borislav Petkov, Smita Koralahalli, Tomasz Wolski
Dan Williams wrote:
[..]
> All of this wants some documentation to tell users that the rule is
> "Hey, after any endpoint decoder has been seen by the CXL core, if you
> remove that endpoint decoder by removing or disabling any of cxl_acpi,
> cxl_mem, or cxl_port the CXL core *will* violently destroy the decode
> configuration". Then think about whether this needs a way to specify
> "skip decoder teardown" to disambiguate "the decoder is disappearing
> logically, but not physically, keep its configuration". That allows
> turning any manual configuration into an auto-configuration and has an
> explicit rule for all regions rather than the current "auto regions are
> special" policy.
Do not worry about this paragraph of feedback. I will start a new patch
set to address this issue. It is the same problem impacting the
accelerator series where driver reload resets the decode configuration
by default. Both accelerator drivers and userspace should be able to
opt-out / opt-in to that behavior.
This will want some indication that the root decoder space is designated
such that it does not get reassigned while the driver is detached.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-12 19:53 ` Dan Williams
@ 2026-03-12 21:28 ` Koralahalli Channabasappa, Smita
2026-03-13 12:54 ` Alejandro Lucero Palau
1 sibling, 0 replies; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-12 21:28 UTC (permalink / raw)
To: Dan Williams, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 3/12/2026 12:53 PM, Dan Williams wrote:
> Dan Williams wrote:
> [..]
>> All of this wants some documentation to tell users that the rule is
>> "Hey, after any endpoint decoder has been seen by the CXL core, if you
>> remove that endpoint decoder by removing or disabling any of cxl_acpi,
>> cxl_mem, or cxl_port the CXL core *will* violently destroy the decode
>> configuration". Then think about whether this needs a way to specify
>> "skip decoder teardown" to disambiguate "the decoder is disappearing
>> logically, but not physically, keep its configuration". That allows
>> turning any manual configuration into an auto-configuration and has an
>> explicit rule for all regions rather than the current "auto regions are
>> special" policy.
>
> Do not worry about this paragraph of feedback. I will start a new patch
> set to address this issue. It is the same problem impacting the
> accelerator series where driver reload resets the decode configuration
> by default. Both accelerator drivers and userspace should be able to
> opt-out / opt-in to that behavior.
>
> This will want some indication that the root decoder space is designated
> such that it does not get reassigned while the driver is detached.
Sure, this patch is not needed for the SR series as region teardown has
been dropped. I will exclude it when sending v7.
Thanks
Smita
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree
2026-03-09 14:37 ` Jonathan Cameron
@ 2026-03-12 21:30 ` Koralahalli Channabasappa, Smita
0 siblings, 0 replies; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-12 21:30 UTC (permalink / raw)
To: Jonathan Cameron, Smita Koralahalli
Cc: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm,
Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Hi Jonathan,
On 3/9/2026 7:37 AM, Jonathan Cameron wrote:
> On Tue, 10 Feb 2026 06:44:57 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
>
>> Introduce a global "DAX Regions" resource root and register each
>> dax_region->res under it via request_resource(). Release the resource on
>> dax_region teardown.
>>
>> By enforcing a single global namespace for dax_region allocations, this
>> ensures only one of dax_hmem or dax_cxl can successfully register a
>> dax_region for a given range.
>>
>> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>
> One question inline about the locking.
>
> Is intent to serialize beyond this new resource tree? If it's just
> the resource tree the write_lock(&resource_lock); in the request
> and release_resource() should be sufficient.
>
>> ---
>> drivers/dax/bus.c | 23 ++++++++++++++++++++---
>> 1 file changed, 20 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>> index fde29e0ad68b..5f387feb95f0 100644
>> --- a/drivers/dax/bus.c
>> +++ b/drivers/dax/bus.c
>> @@ -10,6 +10,7 @@
>> #include "dax-private.h"
>> #include "bus.h"
>>
>> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
>> static DEFINE_MUTEX(dax_bus_lock);
>>
>> /*
>> @@ -625,6 +626,8 @@ static void dax_region_unregister(void *region)
>> {
>> struct dax_region *dax_region = region;
>>
>> + scoped_guard(rwsem_write, &dax_region_rwsem)
>> + release_resource(&dax_region->res);
>
> Do we need the locking? resource stuff all runs under the global
> resource_lock so if aim is just to serialize adds and removes that should
> be enough. Maybe there is a justification in that being an internal
> implementation detail.
Yeah, the wrapping is unnecessary. I will drop it.
Thanks
Smita
>
>
>
>> sysfs_remove_groups(&dax_region->dev->kobj,
>> dax_region_attribute_groups);
>> dax_region_put(dax_region);
>> @@ -635,6 +638,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>> unsigned long flags)
>> {
>> struct dax_region *dax_region;
>> + int rc;
>>
>> /*
>> * The DAX core assumes that it can store its private data in
>> @@ -667,14 +671,27 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>> .flags = IORESOURCE_MEM | flags,
>> };
>>
>> - if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
>> - kfree(dax_region);
>> - return NULL;
>> + scoped_guard(rwsem_write, &dax_region_rwsem)
>> + rc = request_resource(&dax_regions, &dax_region->res);
>> + if (rc) {
>> + dev_dbg(parent, "dax_region resource conflict for %pR\n",
>> + &dax_region->res);
>> + goto err_res;
>> }
>>
>> + if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups))
>> + goto err_sysfs;
>> +
>> if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
>> return NULL;
>> return dax_region;
>> +
>> +err_sysfs:
>> + scoped_guard(rwsem_write, &dax_region_rwsem)
>> + release_resource(&dax_region->res);
>> +err_res:
>> + kfree(dax_region);
>> + return NULL;
>> }
>> EXPORT_SYMBOL_GPL(alloc_dax_region);
>>
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree
2026-03-12 0:27 ` Dan Williams
@ 2026-03-12 21:31 ` Koralahalli Channabasappa, Smita
0 siblings, 0 replies; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-12 21:31 UTC (permalink / raw)
To: Dan Williams, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 3/11/2026 5:27 PM, Dan Williams wrote:
> Smita Koralahalli wrote:
>> Introduce a global "DAX Regions" resource root and register each
>> dax_region->res under it via request_resource(). Release the resource on
>> dax_region teardown.
>>
>> By enforcing a single global namespace for dax_region allocations, this
>> ensures only one of dax_hmem or dax_cxl can successfully register a
>> dax_region for a given range.
>>
>> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
>
> Did I send any code for this? If I suggested the locking below,
> apologies, otherwise Suggested-by is expected unless code is adopted
> from another patch.
No, sorry, the locking was added by me. I will make the changes and drop
the locking.
>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>> drivers/dax/bus.c | 23 ++++++++++++++++++++---
>> 1 file changed, 20 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>> index fde29e0ad68b..5f387feb95f0 100644
>> --- a/drivers/dax/bus.c
>> +++ b/drivers/dax/bus.c
>> @@ -10,6 +10,7 @@
>> #include "dax-private.h"
>> #include "bus.h"
>>
>> +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
>> static DEFINE_MUTEX(dax_bus_lock);
>>
>> /*
>> @@ -625,6 +626,8 @@ static void dax_region_unregister(void *region)
>> {
>> struct dax_region *dax_region = region;
>>
>> + scoped_guard(rwsem_write, &dax_region_rwsem)
>> + release_resource(&dax_region->res);
>
> I continue to dislike what scoped_guard() does to indentation. Often
> scoped_guard() usage can just be replaced by "helper that uses guard()"
>
> However, dax_region_rwsem protects subdivision of a dax_region, not
> coordination across regions.
>
> Also, release_resource() and request_resource() are already protected by
> the resource_lock, why is a new lock needed?
You are right. I will remove it.
Thanks
Smita
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-12 19:53 ` Dan Williams
2026-03-12 21:28 ` Koralahalli Channabasappa, Smita
@ 2026-03-13 12:54 ` Alejandro Lucero Palau
2026-03-17 2:14 ` Dan Williams
1 sibling, 1 reply; 61+ messages in thread
From: Alejandro Lucero Palau @ 2026-03-13 12:54 UTC (permalink / raw)
To: Dan Williams, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 3/12/26 19:53, Dan Williams wrote:
> Dan Williams wrote:
> [..]
>> All of this wants some documentation to tell users that the rule is
>> "Hey, after any endpoint decoder has been seen by the CXL core, if you
>> remove that endpoint decoder by removing or disabling any of cxl_acpi,
>> cxl_mem, or cxl_port the CXL core *will* violently destroy the decode
>> configuration". Then think about whether this needs a way to specify
>> "skip decoder teardown" to disambiguate "the decoder is disappearing
>> logically, but not physically, keep its configuration". That allows
>> turning any manual configuration into an auto-configuration and has an
>> explicit rule for all regions rather than the current "auto regions are
>> special" policy.
> Do not worry about this paragraph of feedback. I will start a new patch
> set to address this issue. It is the same problem impacting the
> accelerator series where driver reload resets the decode configuration
> by default. Both accelerator drivers and userspace should be able to
> opt-out / opt-in to that behavior.
No, that is not the latest accelerator series behavior, and I'm getting
more than frustrated with all this.
FWIW, Type2 v22 had that behavior, but v23 kept the decoders as the BIOS
configured them ... because you explicitly said that is what should be
done.
Now you are saying that should be up to the driver to decide, if I do
not misunderstand your comment above. And of course, you are going to
fix that for accelerators! Your patchset will be high priority, straight
to the front of the queue. Understandable, you are the maintainer.
Let me tell you what I think.
You are mentioning the accelerators problem because it suits you. When
it does not, you usually ignore any type2 support ... until it suits
you. Of course, I can not tell you when to look at type2 patches, but
you can not tell me either to not speak up about something seemingly
arbitrary. I'm defending the kernel upstream process (in general, not
just inside CXL) internally and it is hard to back that up when I have
to report what is happening ...
This is my advice (unlikely to be followed): do not write that patchset.
It is not needed ... now. What is needed is "basic" type2 support in the
kernel, something as good as possible but not pursuing perfection by
trying to anticipate all the potential use cases. In other words:
usability. There will be bugs (hopefully only in corner cases), there
will be unsupported scenarios, there will likely be misunderstandings of
what was really needed, but no matter how clever you are, you can not
solve everything now. Guess why: you need people using it with real
devices and real use cases. If we want CXL to succeed, what is in our
hands is to support it as well and as fast as possible, so that the
vendors betting on CXL have a chance. Usability, and usability in the
current scenario, where systems with accelerators using CXL will mostly
be alone with respect to other CXL devices. Yes, the future will
hopefully be more complex, and all of us working on CXL will be happy
then.
Dan, I know you are not only working on CXL, and to avoid any
misunderstanding, my rant here is not about your expertise or knowledge.
It is obvious to anyone reading the mailing list that your vision and
mastery of CXL and of the kernel's CXL support are superb, but I am not
happy with how you are managing certain aspects of the CXL subsystem and
CXL community.
This is not the first time I have shared my frustration, and as when I
did so in the past, I want to finish with a positive last sentence: I
will keep trying to get type2 support in, and hopefully further CXL
work, and I am happy to discuss the best way to do so with the CXL
kernel community.
Thank you
>
> This will want some indication that the root decoder space is designated
> such that it does not get reassigned while the driver is detached.
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-03-12 2:28 ` Dan Williams
@ 2026-03-13 18:41 ` Koralahalli Channabasappa, Smita
2026-03-17 2:36 ` Dan Williams
2026-03-16 22:26 ` Koralahalli Channabasappa, Smita
1 sibling, 1 reply; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-13 18:41 UTC (permalink / raw)
To: Dan Williams, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Hi Dan,
Thanks for the detailed feedback. I have put together a pseudocode
sketch incorporating all your points. Could you confirm whether I have
understood the direction correctly, or point out anything I'm missing?
On 3/11/2026 7:28 PM, Dan Williams wrote:
> Smita Koralahalli wrote:
>> The current probe time ownership check for Soft Reserved memory based
>> solely on CXL window intersection is insufficient. dax_hmem probing is not
>> always guaranteed to run after CXL enumeration and region assembly, which
>> can lead to incorrect ownership decisions before the CXL stack has
>> finished publishing windows and assembling committed regions.
>>
>> Introduce deferred ownership handling for Soft Reserved ranges that
>> intersect CXL windows. When such a range is encountered during dax_hmem
>> probe, schedule deferred work and wait for the CXL stack to complete
>> enumeration and region assembly before deciding ownership.
>>
>> Evaluate ownership of Soft Reserved ranges based on CXL region
>> containment.
>>
>> - If all Soft Reserved ranges are fully contained within committed CXL
>> regions, DROP handling Soft Reserved ranges from dax_hmem and allow
>> dax_cxl to bind.
>>
>> - If any Soft Reserved range is not fully claimed by committed CXL
>> region, REGISTER the Soft Reserved ranges with dax_hmem.
>>
>> Use dax_cxl_mode to coordinate ownership decisions for Soft Reserved
>> ranges. Once, ownership resolution is complete, flush the deferred work
>> from dax_cxl before allowing dax_cxl to bind.
>>
>> This enforces a strict ownership. Either CXL fully claims the Soft
>> reserved ranges or it relinquishes it entirely.
>
> We have had multiple suggestions during the course of developing this
> state machine, but I can not see reading this changelog or the
> implementation that the full / final state machine is laid out with all
> the old ideas cleaned out of the implementation.
>
> For example, I think this has my "untested!" suggestion from:
>
> http://lore.kernel.org/697acf78acf70_3095100c@dwillia2-mobl4.notmuch
>
> ...but it does not have the explanation of why it turned out to be
> suitable and fits the end goal state machine.
>
> It also has the original definition of "enum dax_cxl_mode". However,
> with the recent simplification proposal to stop doing total CXL unwind I
> think it allows for a more straightforward state machine. For example,
> the "drop" state is now automatic simply by losing the race with
> dax_hmem, right?
>
> I think we are close, just some final complexity shaving.
>
> So, with the decision to stop tearing down CXL this state machine only
> has 3 requirements.
>
> 1/ CXL enumeration needs to start before dax_hmem invokes
> wait_for_device_probe().
>
> 2/ dax_cxl driver registration needs to be postponed until after
> dax_hmem has dispositioned all its regions.
>
> 3/ No probe path can flush the work because of the wait_for_device_probe().
>
> Requirement 1/ is met by patch1. Requirement 2/ partially met, has a
> proposal here around flushing the work from a separate workqueue
> invocation, but I think you want the dependency directly on the dax_hmem
> module (if enabled). Requirement 3/ not achieved.
>
> For 3/ I think we can borrow what cxl_mem_probe() does and do:
>
> if (work_pending(&dax_hmem_work))
> return -EBUSY;
>
> ...if for some reason someone really wants to rebind the dax_hmem driver
> they need to flush the queue, and that can be achived by a flush_work()
> in the module_exit() path.
>
> This does mean that patch 7 in this series disappears because bus.c has
> no role to play in this mess. It is just dax_hmem and dax_cxl getting
> their ordering straight.
>
> Some notes on the implications follow:
>
>> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>> drivers/dax/bus.c | 3 ++
>> drivers/dax/bus.h | 19 ++++++++++
>> drivers/dax/cxl.c | 1 +
>> drivers/dax/hmem/hmem.c | 78 +++++++++++++++++++++++++++++++++++++++--
>> 4 files changed, 99 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>> index 92b88952ede1..81985bcc70f9 100644
>> --- a/drivers/dax/bus.c
>> +++ b/drivers/dax/bus.c
>> @@ -25,6 +25,9 @@ DECLARE_RWSEM(dax_region_rwsem);
>> */
>> DECLARE_RWSEM(dax_dev_rwsem);
>>
>> +enum dax_cxl_mode dax_cxl_mode = DAX_CXL_MODE_DEFER;
>> +EXPORT_SYMBOL_NS_GPL(dax_cxl_mode, "CXL");
>> +
>> static DEFINE_MUTEX(dax_hmem_lock);
>> static dax_hmem_deferred_fn hmem_deferred_fn;
>> static void *dax_hmem_data;
>> diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
>> index b58a88e8089c..82616ff52fd1 100644
>> --- a/drivers/dax/bus.h
>> +++ b/drivers/dax/bus.h
>> @@ -41,6 +41,25 @@ struct dax_device_driver {
>> void (*remove)(struct dev_dax *dev);
>> };
>>
>> +/*
>> + * enum dax_cxl_mode - State machine to determine ownership for CXL
>> + * tagged Soft Reserved memory ranges.
>> + * @DAX_CXL_MODE_DEFER: Ownership resolution pending. Set while waiting
>> + * for CXL enumeration and region assembly to complete.
>> + * @DAX_CXL_MODE_REGISTER: CXL regions do not fully cover Soft Reserved
>> + * ranges. Fall back to registering those ranges via dax_hmem.
>> + * @DAX_CXL_MODE_DROP: All Soft Reserved ranges intersecting CXL windows
>> + * are fully contained within committed CXL regions. Drop HMEM handling
>> + * and allow dax_cxl to bind.
>
> With the above, dax_cxl_mode disappears.
>
>> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
>> index a2136adfa186..3ab39b77843d 100644
>> --- a/drivers/dax/cxl.c
>> +++ b/drivers/dax/cxl.c
>> @@ -44,6 +44,7 @@ static struct cxl_driver cxl_dax_region_driver = {
>>
>> static void cxl_dax_region_driver_register(struct work_struct *work)
>> {
>> + dax_hmem_flush_work();
>
> Looks ok, as long as that symbol is from dax_hmem.ko which gets you the
> load dependency (requirement 2/).
>
> Might also want to make sure that all this deferral mess disappears when
> CONFIG_DEV_DAX_HMEM=n.
>
>> cxl_driver_register(&cxl_dax_region_driver);
>> }
>>
>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>> index 1e3424358490..85854e25254b 100644
>> --- a/drivers/dax/hmem/hmem.c
>> +++ b/drivers/dax/hmem/hmem.c
>> @@ -3,6 +3,7 @@
>> #include <linux/memregion.h>
>> #include <linux/module.h>
>> #include <linux/dax.h>
>> +#include <cxl/cxl.h>
>> #include "../bus.h"
>>
>> static bool region_idle;
>> @@ -69,8 +70,18 @@ static int hmem_register_device(struct device *host, int target_nid,
>> if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> IORES_DESC_CXL) != REGION_DISJOINT) {
>> - dev_dbg(host, "deferring range to CXL: %pr\n", res);
>> - return 0;
>> + switch (dax_cxl_mode) {
>> + case DAX_CXL_MODE_DEFER:
>> + dev_dbg(host, "deferring range to CXL: %pr\n", res);
>> + dax_hmem_queue_work();
>
> This case is just a flag that determines if the work queue has completed
> its one run. So I expect this something like:
>
> if (!dax_hmem_initial_probe) {
> queue_work()
> return;
> }
>
> Otherwise just go ahead and register because dax_cxl by this time has
> had a chance to have a say and the system falls back to "first come /
> first served" mode. In other words the simplification of not cleaning up
> goes both ways. dax_hmem naturally fails if dax_cxl already claimed the
> address range.
>
>> + return 0;
>> + case DAX_CXL_MODE_REGISTER:
>> + dev_dbg(host, "registering CXL range: %pr\n", res);
>> + break;
>> + case DAX_CXL_MODE_DROP:
>> + dev_dbg(host, "dropping CXL range: %pr\n", res);
>> + return 0;
>> + }
>> }
>>
>> rc = region_intersects_soft_reserve(res->start, resource_size(res));
>> @@ -123,8 +134,70 @@ static int hmem_register_device(struct device *host, int target_nid,
>> return rc;
>> }
>>
>> +static int hmem_register_cxl_device(struct device *host, int target_nid,
>> + const struct resource *res)
>> +{
>> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> + IORES_DESC_CXL) != REGION_DISJOINT)
>> + return hmem_register_device(host, target_nid, res);
>> +
>> + return 0;
>> +}
>> +
>> +static int soft_reserve_has_cxl_match(struct device *host, int target_nid,
>> + const struct resource *res)
>> +{
>> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> + IORES_DESC_CXL) != REGION_DISJOINT) {
>> + if (!cxl_region_contains_soft_reserve((struct resource *)res))
>> + return 1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void process_defer_work(void *data)
>> +{
>> + struct platform_device *pdev = data;
>> + int rc;
>> +
>> + /* relies on cxl_acpi and cxl_pci having had a chance to load */
>> + wait_for_device_probe();
>> +
>> + rc = walk_hmem_resources(&pdev->dev, soft_reserve_has_cxl_match);
>> +
>> + if (!rc) {
>> + dax_cxl_mode = DAX_CXL_MODE_DROP;
>> + dev_dbg(&pdev->dev, "All Soft Reserved ranges claimed by CXL\n");
>> + } else {
>> + dax_cxl_mode = DAX_CXL_MODE_REGISTER;
>> + dev_warn(&pdev->dev,
>> + "Soft Reserved not fully contained in CXL; using HMEM\n");
>> + }
>> +
>> + walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
>
> I do not think we need to do 2 passes. Just do one
> hmem_register_cxl_device() pass that skips a range when
> cxl_region_contains_resource() has it covered, otherwise register an
> hmem device.
>
>> +}
>> +
>> +static void kill_defer_work(void *data)
>> +{
>> + struct platform_device *pdev = data;
>> +
>> + dax_hmem_flush_work();
>> + dax_hmem_unregister_work(process_defer_work, pdev);
>> +}
>> +
>> static int dax_hmem_platform_probe(struct platform_device *pdev)
>> {
>> + int rc;
>
> This wants a work_pending() return -EBUSY per above.
>
>> + rc = dax_hmem_register_work(process_defer_work, pdev);
>> + if (rc)
>> + return rc;
>
> The work does not need to be registered every time. Remember this is
> only a one-shot problem at first kernel boot, not every time this
> platform device is probed. After the workqueue has run at least once it
> never needs to be invoked again if dax_hmem is reloaded.
>
> A flag for "dax_hmem flushed initial device probe at least once" needs
> to live in drivers/dax/hmem/device.c and be cleared by
> process_defer_work().
>
>> +
>> + rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, pdev);
>> + if (rc)
>> + return rc;
>> +
>> return walk_hmem_resources(&pdev->dev, hmem_register_device);
>> }
>
> The hunk that is missing is that dax_hmem_exit() should flush the work,
> and process_defer_work() should give up if the device has been unbound
> before it runs. Hopefully that last suggestion does not make lockdep
> unhappy about running process_defer_work under the hmem_platform
> device_lock(). I *think* it should be ok and solves the TOCTOU race in
> hmem_register_device() around whether we are in the pre or post initial
> probe world.
/* hmem.c */
+static struct platform_device *hmem_pdev;
+static void process_defer_work(struct work_struct *work);
+static DECLARE_WORK(dax_hmem_work, process_defer_work);
+void dax_hmem_flush_work(void)
+{
+ flush_work(&dax_hmem_work);
+}
+EXPORT_SYMBOL_GPL(dax_hmem_flush_work);
static int hmem_register_device(..)
{
if (IS_ENABLED(CONFIG_DEV_DAX_CXL) && .. {
+ if (!dax_hmem_initial_probe_done) {
+ queue_work(system_long_wq, &dax_hmem_work);
+ return 0;
+ }
...
}
+static int hmem_register_cxl_device(..)
+{
+ if (region_intersects(..IORES_CXL..) == REGION_DISJOINT)
+ return 0;
+
+ if (cxl_region_contains_soft_reserve(res))
+ return 0;
+
+ return hmem_register_device(host, target_nid, res);
+}
+static void process_defer_work(struct work_struct *work)
+{
+ wait_for_device_probe();
+ /* Flag lives in device.c */
+ dax_hmem_initial_probe_done = true;
+ walk_hmem_resources(&hmem_pdev->dev, hmem_register_cxl_device);
+}
static int dax_hmem_platform_probe(struct platform_device *pdev)
{
+ if (work_pending(&dax_hmem_work))
+ return -EBUSY;
+ hmem_pdev = pdev;
return walk_hmem_resources(&dev->dev, hmem_register_device);
}
static void __exit dax_hmem_exit(void)
{
+ flush_work(&dax_hmem_work);
..
/* cxl.c */
static void cxl_dax_region_driver_register(struct work_struct *work)
{
+ dax_hmem_flush_work();
cxl_driver_register(&cxl_dax_region_driver);
}
/* bus.h */
+#if IS_ENABLED(CONFIG_DEV_DAX_HMEM)
+void dax_hmem_flush_work(void);
+#else
+static inline void dax_hmem_flush_work(void) { }
+#endif
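The one-shot deferral sketched above can be modeled in plain userspace C. All names here are illustrative stand-ins for the proposed kernel pieces (the `device.c` flag, `queue_work()`, `process_defer_work()`), not the actual API:

```c
#include <stdbool.h>

/* Illustrative model only; these globals stand in for the flag in
 * device.c and the pending state of dax_hmem_work. */
bool initial_probe_done;
bool work_queued;

/* Stand-in for hmem_register_device(): a CXL-intersecting range is
 * deferred until the one-shot work has run; everything else registers
 * immediately. Returns true when a device is registered now. */
bool register_range(bool intersects_cxl)
{
	if (intersects_cxl && !initial_probe_done) {
		work_queued = true; /* queue_work(system_long_wq, ...) */
		return false;
	}
	return true;
}

/* Stand-in for process_defer_work(): runs once, after cxl_acpi and
 * cxl_pci have had a chance to probe; flips the one-shot flag. */
void run_defer_work(void)
{
	initial_probe_done = true;
	work_queued = false;
}
```

After `run_defer_work()` has fired once, re-probing dax_hmem registers directly and never queues the work again, matching the "one-shot problem at first kernel boot" observation.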
A few things I want to confirm:
1. Patch 7 (bus.c helpers) drops entirely — no register/unregister API,
no mutex, no typedef. Everything lives in hmem.c.
2. enum dax_cxl_mode drops — replaced by the single bool
dax_hmem_initial_probe_done in device.c.
3. dax_hmem_flush_work() exported from dax_hmem.ko so cxl.c gets the
module dependency for requirement 2.
Thanks
Smita
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-03-12 2:28 ` Dan Williams
2026-03-13 18:41 ` Koralahalli Channabasappa, Smita
@ 2026-03-16 22:26 ` Koralahalli Channabasappa, Smita
2026-03-17 2:42 ` Dan Williams
1 sibling, 1 reply; 61+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2026-03-16 22:26 UTC (permalink / raw)
To: Dan Williams, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Hi Dan,
[snip]
>> +static int hmem_register_cxl_device(struct device *host, int target_nid,
>> + const struct resource *res)
>> +{
>> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> + IORES_DESC_CXL) != REGION_DISJOINT)
>> + return hmem_register_device(host, target_nid, res);
>> +
>> + return 0;
>> +}
>> +
>> +static int soft_reserve_has_cxl_match(struct device *host, int target_nid,
>> + const struct resource *res)
>> +{
>> + if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> + IORES_DESC_CXL) != REGION_DISJOINT) {
>> + if (!cxl_region_contains_soft_reserve((struct resource *)res))
>> + return 1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void process_defer_work(void *data)
>> +{
>> + struct platform_device *pdev = data;
>> + int rc;
>> +
>> + /* relies on cxl_acpi and cxl_pci having had a chance to load */
>> + wait_for_device_probe();
>> +
>> + rc = walk_hmem_resources(&pdev->dev, soft_reserve_has_cxl_match);
>> +
>> + if (!rc) {
>> + dax_cxl_mode = DAX_CXL_MODE_DROP;
>> + dev_dbg(&pdev->dev, "All Soft Reserved ranges claimed by CXL\n");
>> + } else {
>> + dax_cxl_mode = DAX_CXL_MODE_REGISTER;
>> + dev_warn(&pdev->dev,
>> + "Soft Reserved not fully contained in CXL; using HMEM\n");
>> + }
>> +
>> + walk_hmem_resources(&pdev->dev, hmem_register_cxl_device);
>
> I do not think we need to do 2 passes. Just do one
> hmem_register_cxl_device() pass that skips a range when
> cxl_region_contains_resource() has it covered, otherwise register an
> hmem device.
>
Just want to make sure I'm not misreading this — are we dropping the
all-or-nothing ownership approach? In v6, if any Soft Reserved range
wasn't fully covered by CXL, all CXL-intersecting ranges fell back to
HMEM. With the single-pass hmem_register_cxl_device() that skips
individually covered ranges, we would be making per-range decisions.
[snip]
Thanks
Smita
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-13 12:54 ` Alejandro Lucero Palau
@ 2026-03-17 2:14 ` Dan Williams
2026-03-18 7:33 ` Alejandro Lucero Palau
0 siblings, 1 reply; 61+ messages in thread
From: Dan Williams @ 2026-03-17 2:14 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, Smita Koralahalli,
linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Alejandro Lucero Palau wrote:
[..]
> This is not the first time I share my frustration, and as when I did so
> in the past, I want to finish with a positive last sentence: I will keep
> trying to get type2 support and hopefully further CXL stuff, and happy
> to discuss the best way to do so with the CXL kernel community.
I did not mean to imply that the type-2 set was stuck behind a new
dependency. Apologies.
It is next in the queue; it needs to go in next cycle with high
priority.
In my view this confirmation that Smita's proposed patch addresses PJ's
test failure cleared one of the last hurdles for this set for me.
* Re: [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-03-13 18:41 ` Koralahalli Channabasappa, Smita
@ 2026-03-17 2:36 ` Dan Williams
0 siblings, 0 replies; 61+ messages in thread
From: Dan Williams @ 2026-03-17 2:36 UTC (permalink / raw)
To: Koralahalli Channabasappa, Smita, Dan Williams, Smita Koralahalli,
linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Koralahalli Channabasappa, Smita wrote:
[..]
> +static int hmem_register_cxl_device(..)
> +{
> + if (region_intersects(..IORES_CXL..) == REGION_DISJOINT)
> + return 0;
> +
> + if (cxl_region_contains_soft_reserve(res))
> + return 0;
> +
> + return hmem_register_device(host, target_nid, res);
> +}
>
> +static void process_defer_work(struct work_struct *work)
> +{
I think this also needs:
guard(device)(&hmem_pdev->dev);
if (!hmem_pdev->dev.driver)
return;
...because you can remove the driver while the work is pending.
> + wait_for_device_probe();
> + /* Flag lives in device.c */
> + dax_hmem_initial_probe_done = true;
> + walk_hmem_resources(&hmem_pdev->dev, hmem_register_cxl_device);
Even though nothing deletes the hmem_pdev device today, I would still
keep its refcount elevated while work is pending.
> +}
>
> static int dax_hmem_platform_probe(struct platform_device *pdev)
> {
> + if (work_pending(&dax_hmem_work))
> + return -EBUSY;
>
> + hmem_pdev = pdev;
This wants to be initialized when @dax_hmem_work is initialized.
Otherwise this makes it look like @hmem_pdev can change dynamically. It
is a singleton.
[..]
> A few things I want to confirm:
>
> 1. Patch 7 (bus.c helpers) drops entirely — no register/unregister API,
> no mutex, no typedef. Everything lives in hmem.c.
>
> 2. enum dax_cxl_mode drops — replaced by the single bool
> dax_hmem_initial_probe_done in device.c.
>
> 3. dax_hmem_flush_work() exported from dax_hmem.ko so cxl.c gets the
> module dependency for requirement 2.
Looks good to me.
* Re: [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2026-03-16 22:26 ` Koralahalli Channabasappa, Smita
@ 2026-03-17 2:42 ` Dan Williams
0 siblings, 0 replies; 61+ messages in thread
From: Dan Williams @ 2026-03-17 2:42 UTC (permalink / raw)
To: Koralahalli Channabasappa, Smita, Dan Williams, Smita Koralahalli,
linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Koralahalli Channabasappa, Smita wrote:
[..]
> Just want to make sure I'm not misreading this — are we dropping the
> all-or-nothing ownership approach? In v6, if any Soft Reserved range
> wasn't fully covered by CXL, all CXL-intersecting ranges fell back to
> HMEM. With the single-pass hmem_register_cxl_device() that skips
> individually covered ranges, we would be making per-range decisions.
Right, I was thinking that the simplification of not destroying regions
also includes the simplification of not having the kernel enforce a
policy of unassembled regions.
Just give a chance for CXL to grab the regions, and if that fails let
the system come up and not try to take any other pre-emptive action. We
can always go more complicated later with the evidence that the simple
approach ended up being *too* simple.
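The simple per-range policy can be sketched in userspace C. The range type and helper names below are hypothetical, chosen only to mirror the shape of hmem_register_cxl_device() and cxl_region_contains_soft_reserve():

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the per-range decision; not the kernel API. */
struct srange { uint64_t start, end; };

enum claim { CLAIM_NONE, CLAIM_PARTIAL, CLAIM_FULL };

/* How much of Soft Reserved range @sr the range @c covers. */
enum claim cxl_claim(struct srange sr, struct srange c)
{
	if (sr.end < c.start || sr.start > c.end)
		return CLAIM_NONE;
	if (sr.start >= c.start && sr.end <= c.end)
		return CLAIM_FULL;
	return CLAIM_PARTIAL;
}

/* Single-pass policy: skip ranges with no CXL intersection (those were
 * handled at initial probe) and ranges an assembled region fully owns;
 * fall back to hmem for anything else, decided per range. */
bool register_hmem_fallback(struct srange sr, struct srange cxl_window,
			    struct srange assembled)
{
	if (cxl_claim(sr, cxl_window) == CLAIM_NONE)
		return false;
	if (cxl_claim(sr, assembled) == CLAIM_FULL)
		return false;
	return true;
}
```

A partially assembled region thus only pulls its own Soft Reserved range back to HMEM, leaving fully claimed ranges with CXL.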
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-17 2:14 ` Dan Williams
@ 2026-03-18 7:33 ` Alejandro Lucero Palau
2026-03-18 21:49 ` Dave Jiang
0 siblings, 1 reply; 61+ messages in thread
From: Alejandro Lucero Palau @ 2026-03-18 7:33 UTC (permalink / raw)
To: Dan Williams, Smita Koralahalli, linux-cxl, linux-kernel, nvdimm,
linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 3/17/26 02:14, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
> [..]
>> This is not the first time I share my frustration, and as when I did so
>> in the past, I want to finish with a positive last sentence: I will keep
>> trying to get type2 support and hopefully further CXL stuff, and happy
>> to discuss the best way to do so with the CXL kernel community.
> I did not mean to imply that the type-2 set was stuck behind a new
> dependency. Apologies.
No worries.
>
> It is next in the queue, it needs to go in next cycle with a high
> priority.
>
> In my view this confirmation that Smita's proposed patch addresses PJ's
> test failure cleared one of the last hurdles for this set for me.
Did PJ tell you so? From his report that seemed a likely reason, but he
did not comment further after I told him about it.
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-11 21:37 ` Dan Williams
2026-03-12 19:53 ` Dan Williams
@ 2026-03-18 21:27 ` Alison Schofield
2026-03-24 14:06 ` Alejandro Lucero Palau
2026-03-24 19:46 ` Dan Williams
1 sibling, 2 replies; 61+ messages in thread
From: Alison Schofield @ 2026-03-18 21:27 UTC (permalink / raw)
To: Dan Williams
Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Wed, Mar 11, 2026 at 02:37:46PM -0700, Dan Williams wrote:
> Smita Koralahalli wrote:
> > __cxl_decoder_detach() currently resets decoder programming whenever a
> > region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
> > autodiscovered regions, this can incorrectly tear down decoder state
> > that may be relied upon by other consumers or by subsequent ownership
> > decisions.
> >
> > Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
> > set.
> >
> > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> > ---
> > drivers/cxl/core/region.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index ae899f68551f..45ee598daf95 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
> > cxled->part = -1;
> >
> > if (p->state > CXL_CONFIG_ACTIVE) {
> > - cxl_region_decode_reset(cxlr, p->interleave_ways);
> > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> > + cxl_region_decode_reset(cxlr, p->interleave_ways);
> > +
> > p->state = CXL_CONFIG_ACTIVE;
>
Hi Dan,
> tl;dr: I do not think we need this, I do think we need to clarify to
> users what enable/disable and/or hot remove violence is handled and not
> handled by the CXL core.
I'm chiming in here because although this patch is no longer needed for
this series, it has become a dependency for the Type 2 series. So this
follow-up focuses on the hot-remove, endpoint-detach case where
preserving decoders across detach is still needed for later recovery.
Some inline responses to you, and then a diff is appended for a
direction check.
> So this looks deceptively simple, but I think it is incomplete or at
> least adds to the current confusion. A couple points to consider:
>
> 1/ There is no corresponding clear_bit(CXL_REGION_F_AUTO, ...) anywhere in
> the driver. Yes, admin can still force cxl_region_decode_reset() via
> commit_store() path, but admin can not force
> cxl_region_teardown_targets() in the __cxl_decoder_detach() path. I do
> not like that this causes us to end up with 2 separate considerations
> for when __cxl_decoder_detach() skips cleanup actions
> (cxl_region_teardown_targets() and cxl_region_decode_reset()). See
> below, I think the cxl_region_teardown_targets() check is probably
> bogus.
Rather than repurposing CXL_REGION_F_AUTO, this splits decode-reset policy
from AUTO. A new region-scoped CXL_REGION_F_PRESERVE_DECODE flag is introduced
and cleared on explicit decommit in commit_store(). AUTO remains origin/assembly
state.
This does still leave two cleanup decisions:
1) decode reset (now keyed off PRESERVE_DECODE)
2) target teardown (still using existing AUTO behavior)
No change to cxl_region_teardown_targets() in this step.
>
> At a minimum I think commit_store() should clear CXL_REGION_F_AUTO on
> decommit such that cleaning up decoders and targets later proceeds as
> expected.
This point is addressed by clearing CXL_REGION_F_PRESERVE_DECODE instead.
Explicit decommit is treated as destructive and disables decode preservation
before unbind/reset.
>
> 2/ The hard part about CXL region cleanup is that it needs to be prepared
> for:
>
> a/ user manually removes the region via sysfs
>
> b/ user manually disables cxl_port, cxl_mem, or cxl_acpi causing the
> endpoint port to be removed
>
> c/ user physically removes the memdev causing the endpoint port to be
> removed (CXL core can not tell the difference with 2b/ it just sees
> cxl_mem_driver::remove() operation invocation)
>
> d/ setup action fails and region setup is unwound
Agreed. This change targets 2b, 2c.
>
> The cxl_region_decode_reset() is in __cxl_decoder_detach() because of
> 2b/ and 2c/. No other chance to cleanup the decode topology once the
> endpoint decoders are on their way out of the system.
Agreed. The reset remains. Proposed change only makes it conditional on
explicit region policy rather than AUTO.
>
> In this case though the patch was generated back when we were committed
> to cleaning up failed to assemble regions, a new 2d/ case, right?
> However, in that case the decoder is not leaving the system. The
> questions that arrive from that analysis are:
>
> * Is this patch still needed now that there is no auto-cleanup?
Not for this Soft Reserved series, but yes for Type2 hotplug.
>
> * If this patch is still needed is it better to skip
> cxl_region_decode_reset() based on the 'enum cxl_detach_mode' rather
> than the CXL_REGION_F_AUTO flag? I.e. skip reset in the 2d/ case, or
> some other new general flag that says "please preserve hardware
> configuration".
I looked at using and expanding the cxl_detach_mode enum and rejected
it as the wrong scope. The current detach mode is attached to an
individual detach operation, whereas the preserve-vs-reset decision
applies to the region decode topology as a whole. Expanding detach mode
into a region-wide policy risks inconsistent handling across endpoints
of the same region; it just seemed like the wrong place. I could be
missing another reason why you looked at it.
>
> * Should cxl_region_teardown_targets() also be caring about the
> cxl_detach_mode rather than the auto flag? I actually think the
> CXL_REGION_F_AUTO check in cxl_region_teardown_targets() is misplaced
> and it was confusing "teardown targets" with "decode reset".
Agreed that this is a separate question and didn't touch that here.
>
> All of this wants some documentation to tell users that the rule is
> "Hey, after any endpoint decoder has been seen by the CXL core, if you
> remove that endpoint decoder by removing or disabling any of cxl_acpi,
> cxl_mem, or cxl_port the CXL core *will* violently destroy the decode
> configuration". Then think about whether this needs a way to specify
> "skip decoder teardown" to disambiguate "the decoder is disappearing
> logically, but not physically, keep its configuration". That allows
> turning any manual configuration into an auto-configuration and has an
> explicit rule for all regions rather than the current "auto regions are
> special" policy.
>
> It is helpful that violence has been the default so far. So it allows to
> introduce a decoder shutdown policy toggle where CXL_REGION_F_AUTO flags
> decoders as "preserve" by default. Region decommit clears that flag,
> and/or userspace can toggle that per endpoint decoder flag to determine
> what happens when decoders leave the system. That probably also wants
> some lockdown interaction such that root can not force unplug memory by
> unbinding a driver.
As a step in the direction you suggest, AND aiming to address Type2
need, here is what I'd like a direction check on:
Start separating decode-reset policy from CXL_REGION_F_AUTO:
- keep CXL_REGION_F_AUTO as origin / assembly semantics
- introduce CXL_REGION_F_PRESERVE_DECODE as a region-scoped policy
- initialize that policy from auto-assembly
- clear it on explicit decommit in commit_store()
- use it to gate cxl_region_decode_reset() in __cxl_decoder_detach()
The decode-reset decision is factored through a small helper,
cxl_region_preserve_decode(), so the policy can be extended independent
of the detach mechanics. Maybe overkill in this simple case, but I
wanted to acknowledge the 'policy' direction.
Compiled but not yet tested, pending a direction check:
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 42874948b589..f99e4aca72f0 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -432,6 +432,12 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
if (rc)
return rc;
+ /*
+ * Explicit decommit is destructive. Clear preserve bit before
+ * unbinding so detach paths do not skip decoder reset.
+ */
+ clear_bit(CXL_REGION_F_PRESERVE_DECODE, &cxlr->flags);
+
/*
* Unmap the region and depend the reset-pending state to ensure
* it does not go active again until post reset
@@ -2153,6 +2159,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return 0;
}
+/* Region-scoped policy for preserving decoder programming across detach */
+static bool cxl_region_preserve_decode(struct cxl_region *cxlr)
+{
+ return test_bit(CXL_REGION_F_PRESERVE_DECODE, &cxlr->flags);
+}
+
static struct cxl_region *
__cxl_decoder_detach(struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled, int pos,
@@ -2185,7 +2197,8 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
cxled->part = -1;
if (p->state > CXL_CONFIG_ACTIVE) {
- cxl_region_decode_reset(cxlr, p->interleave_ways);
+ if (!cxl_region_preserve_decode(cxlr))
+ cxl_region_decode_reset(cxlr, p->interleave_ways);
p->state = CXL_CONFIG_ACTIVE;
}
@@ -3833,6 +3846,7 @@ static int __construct_region(struct cxl_region *cxlr,
}
set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
+ set_bit(CXL_REGION_F_PRESERVE_DECODE, &cxlr->flags);
cxlr->hpa_range = *hpa_range;
res = kmalloc_obj(*res);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9b947286eb9b..e6fbbee37252 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -532,6 +532,16 @@ enum cxl_partition_mode {
*/
#define CXL_REGION_F_NORMALIZED_ADDRESSING 3
+/*
+ * Indicate that decoder programming should be preserved when endpoint
+ * decoders detach from this region. This allows region decode state to
+ * survive endpoint removal and be recovered by subsequent enumeration.
+ * Automatic assembly may set this flag, and future userspace control
+ * may allow it to be set explicitly. Explicit region decommit should
+ * clear this flag before destructive cleanup.
+ */
+#define CXL_REGION_F_PRESERVE_DECODE 4
+
/**
* struct cxl_region - CXL region
* @dev: This region's device
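The flag lifecycle in the diff above can be modeled as a small userspace toy, to make the preserve-vs-reset transitions explicit. The struct and function names are stand-ins for __construct_region(), commit_store(), and __cxl_decoder_detach(), not kernel code:

```c
#include <stdbool.h>

/* Toy lifecycle of the proposed flag; userspace illustration only. */
#define F_AUTO            (1UL << 0)
#define F_PRESERVE_DECODE (1UL << 1)

struct toy_region { unsigned long flags; bool decode_programmed; };

/* __construct_region(): auto-assembled regions preserve by default. */
void construct_auto(struct toy_region *r)
{
	r->flags |= F_AUTO | F_PRESERVE_DECODE;
	r->decode_programmed = true;
}

/* commit_store() decommit: explicit teardown is destructive, so drop
 * the preserve bit before detach paths run. */
void decommit(struct toy_region *r)
{
	r->flags &= ~F_PRESERVE_DECODE;
}

/* __cxl_decoder_detach(): reset decode only when preservation is off. */
void detach(struct toy_region *r)
{
	if (!(r->flags & F_PRESERVE_DECODE))
		r->decode_programmed = false; /* cxl_region_decode_reset() */
}
```

Detach after auto-assembly leaves the decoders programmed for later recovery; detach after an explicit decommit resets them, and F_AUTO is never consulted for the reset decision.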
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-18 7:33 ` Alejandro Lucero Palau
@ 2026-03-18 21:49 ` Dave Jiang
0 siblings, 0 replies; 61+ messages in thread
From: Dave Jiang @ 2026-03-18 21:49 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, Smita Koralahalli,
linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Davidlohr Bueso, Matthew Wilcox,
Jan Kara, Rafael J . Wysocki, Len Brown, Pavel Machek, Li Ming,
Jeff Johnson, Ying Huang, Yao Xingtao, Peter Zijlstra,
Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman, Robert Richter,
Benjamin Cheatham, Zhijian Li, Borislav Petkov, Tomasz Wolski
On 3/18/26 12:33 AM, Alejandro Lucero Palau wrote:
>
> On 3/17/26 02:14, Dan Williams wrote:
>> Alejandro Lucero Palau wrote:
>> [..]
>>> This is not the first time I share my frustration, and as when I did so
>>> in the past, I want to finish with a positive last sentence: I will keep
>>> trying to get type2 support and hopefully further CXL stuff, and happy
>>> to discuss the best way to do so with the CXL kernel community.
>> I did not mean to imply that the type-2 set was stuck behind a new
>> dependency. Apologies.
>
>
> No worries.
>
>
>>
>> It is next in the queue, it needs to go in next cycle with a high
>> priority.
>>
>> In my view this confirmation that Smita's proposed patch addresses PJ's
>> test failure cleared one of the last hurdles for this set for me.
>
>
> Did PJ tell you so? From his report that seemed a likely reason, but he did not comment further after I told him about it.
>
PJ told me when I was talking to him WRT testing on the latest type2 series.
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-18 21:27 ` Alison Schofield
@ 2026-03-24 14:06 ` Alejandro Lucero Palau
2026-03-24 19:46 ` Dan Williams
1 sibling, 0 replies; 61+ messages in thread
From: Alejandro Lucero Palau @ 2026-03-24 14:06 UTC (permalink / raw)
To: Alison Schofield, Dan Williams
Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 3/18/26 21:27, Alison Schofield wrote:
<snip>
> As a step in the direction you suggest, AND aiming to address Type2
> need, here is what I'd like a direction check on:
>
> Start separating decode-reset policy from CXL_REGION_F_AUTO:
> - keep CXL_REGION_F_AUTO as origin / assembly semantics
> - introduce CXL_REGION_F_PRESERVE_DECODE as a region-scoped policy
> - initialize that policy from auto-assembly
> - clear it on explicit decommit in commit_store()
> - use it to gate cxl_region_decode_reset() in __cxl_decoder_detach()
>
> The decode-reset decision is factored through a small helper,
> cxl_region_preserve_decode(), so the policy can be extended independent
> of the detach mechanics. Maybe overkill in this simple case, but I
> wanted to acknowledge the 'policy' direction.
I like this approach which separates AUTO flag from this need.
>
> Compiled but not yet tested, pending a direction check:
I have tested it using the Type2 v24 series, adding some debug lines to
verify the flag check works as expected when decoders detach.
Maybe there are other aspects of this approach I cannot envision, but
I'm happy with this change for current Type2 needs. Hopefully this
plus v24 can go through before the next kernel window closes.
Thank you,
Alejandro
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 42874948b589..f99e4aca72f0 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -432,6 +432,12 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
> if (rc)
> return rc;
>
> + /*
> + * Explicit decommit is destructive. Clear preserve bit before
> + * unbinding so detach paths do not skip decoder reset.
> + */
> + clear_bit(CXL_REGION_F_PRESERVE_DECODE, &cxlr->flags);
> +
> /*
> * Unmap the region and depend the reset-pending state to ensure
> * it does not go active again until post reset
> @@ -2153,6 +2159,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> return 0;
> }
>
> +/* Region-scoped policy for preserving decoder programming across detach */
> +static bool cxl_region_preserve_decode(struct cxl_region *cxlr)
> +{
> + return test_bit(CXL_REGION_F_PRESERVE_DECODE, &cxlr->flags);
> +}
> +
> static struct cxl_region *
> __cxl_decoder_detach(struct cxl_region *cxlr,
> struct cxl_endpoint_decoder *cxled, int pos,
> @@ -2185,7 +2197,8 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
> cxled->part = -1;
>
> if (p->state > CXL_CONFIG_ACTIVE) {
> - cxl_region_decode_reset(cxlr, p->interleave_ways);
> + if (!cxl_region_preserve_decode(cxlr))
> + cxl_region_decode_reset(cxlr, p->interleave_ways);
> p->state = CXL_CONFIG_ACTIVE;
> }
>
> @@ -3833,6 +3846,7 @@ static int __construct_region(struct cxl_region *cxlr,
> }
>
> set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
> + set_bit(CXL_REGION_F_PRESERVE_DECODE, &cxlr->flags);
> cxlr->hpa_range = *hpa_range;
>
> res = kmalloc_obj(*res);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 9b947286eb9b..e6fbbee37252 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -532,6 +532,16 @@ enum cxl_partition_mode {
> */
> #define CXL_REGION_F_NORMALIZED_ADDRESSING 3
>
> +/*
> + * Indicate that decoder programming should be preserved when endpoint
> + * decoders detach from this region. This allows region decode state to
> + * survive endpoint removal and be recovered by subsequent enumeration.
> + * Automatic assembly may set this flag, and future userspace control
> + * may allow it to be set explicitly. Explicit region decommit should
> + * clear this flag before destructive cleanup.
> + */
> +#define CXL_REGION_F_PRESERVE_DECODE 4
> +
> /**
> * struct cxl_region - CXL region
> * @dev: This region's device
>
>
>
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-18 21:27 ` Alison Schofield
2026-03-24 14:06 ` Alejandro Lucero Palau
@ 2026-03-24 19:46 ` Dan Williams
2026-03-24 22:23 ` Alejandro Lucero Palau
2026-03-25 1:51 ` Alison Schofield
1 sibling, 2 replies; 61+ messages in thread
From: Dan Williams @ 2026-03-24 19:46 UTC (permalink / raw)
To: Alison Schofield, Dan Williams
Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
Alison Schofield wrote:
> On Wed, Mar 11, 2026 at 02:37:46PM -0700, Dan Williams wrote:
> > Smita Koralahalli wrote:
> > > __cxl_decoder_detach() currently resets decoder programming whenever a
> > > region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
> > > autodiscovered regions, this can incorrectly tear down decoder state
> > > that may be relied upon by other consumers or by subsequent ownership
> > > decisions.
> > >
> > > Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
> > > set.
> > >
> > > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> > > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > > Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> > > ---
> > > drivers/cxl/core/region.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index ae899f68551f..45ee598daf95 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
> > > cxled->part = -1;
> > >
> > > if (p->state > CXL_CONFIG_ACTIVE) {
> > > - cxl_region_decode_reset(cxlr, p->interleave_ways);
> > > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> > > + cxl_region_decode_reset(cxlr, p->interleave_ways);
> > > +
> > > p->state = CXL_CONFIG_ACTIVE;
> >
>
> Hi Dan,
>
> > tl;dr: I do not think we need this, I do think we need to clarify to
> > users what enable/disable and/or hot remove violence is handled and not
> > handled by the CXL core.
>
> I'm chiming in here because although this patch is no longer needed for
> this series, it has become a dependency for the Type 2 series.
Like I replied to Alejandro it is not a dependency for the type-2 series
[1]. It *is* a fix for the issue reported by PJ, but it can go in
independent of the base type-2 work as a standalone capability.
[1]: http://lore.kernel.org/69b8b9181bafd_452b100cb@dwillia2-mobl4.notmuch
> So this
> follow-up focuses on the hot-remove, endpoint-detach case where
> preserving decoders across detach is still needed for later recovery.
>
> Some inline responses to you, and then a diff is appended for a
> direction check.
>
> > So this looks deceptively simple, but I think it is incomplete or at
> > least adds to the current confusion. A couple points to consider:
> >
> > 1/ There is no corresponding clear_bit(CXL_REGION_F_AUTO, ...) anywhere in
> > the driver. Yes, admin can still force cxl_region_decode_reset() via
> > commit_store() path, but admin can not force
> > cxl_region_teardown_targets() in the __cxl_decoder_detach() path. I do
> > not like that this causes us to end up with 2 separate considerations
> > for when __cxl_decoder_detach() skips cleanup actions
> > (cxl_region_teardown_targets() and cxl_region_decode_reset()). See
> > below, I think the cxl_region_teardown_targets() check is probably
> > bogus.
>
> Rather than repurposing CXL_REGION_F_AUTO, this splits decode-reset policy
> from AUTO. A new region-scoped CXL_REGION_F_PRESERVE_DECODE flag is introduced
> and cleared on explicit decommit in commit_store(). AUTO remains origin/assembly
> state.
Just like the decoder LOCK bit the preservation setting is a decoder
property, not a region property. Region auto-assembly is then just an
automatic way to set that decoder policy.
So, no, I would not expect a new region flag for this policy.
> This does still leave two cleanup decisions:
> 1) decode reset (now keyed off PRESERVE_DECODE)
> 2) target teardown (still using existing AUTO behavior)
>
> No change to cxl_region_teardown_targets() in this step.
Turns out that cxl_region_teardown_targets() never needed to consider
the CXL_REGION_F_AUTO flag.
> > At a minimum I think commit_store() should clear CXL_REGION_F_AUTO on
> > decommit such that cleaning up decoders and targets later proceeds as
> > expected.
>
> This point is addressed by clearing CXL_REGION_F_PRESERVE_DECODE instead.
> Explicit decommit is treated as destructive and disables decode preservation
> before unbind/reset.
>
> >
> > 2/ The hard part about CXL region cleanup is that it needs to be prepared
> > for:
> >
> > a/ user manually removes the region via sysfs
> >
> > b/ user manually disables cxl_port, cxl_mem, or cxl_acpi causing the
> > endpoint port to be removed
> >
> > c/ user physically removes the memdev causing the endpoint port to be
> > removed (CXL core can not tell the difference with 2b/ it just sees
> > cxl_mem_driver::remove() operation invocation)
> >
> > d/ setup action fails and region setup is unwound
>
> Agreed. This change targets 2b, 2c.
>
> >
> > The cxl_region_decode_reset() is in __cxl_decoder_detach() because of
> > 2b/ and 2c/. No other chance to cleanup the decode topology once the
> > endpoint decoders are on their way out of the system.
>
> Agreed. The reset remains. Proposed change only makes it conditional on
> explicit region policy rather than AUTO.
>
> >
> > In this case though the patch was generated back when we were committed
> > to cleaning up failed to assemble regions, a new 2d/ case, right?
> > However, in that case the decoder is not leaving the system. The
> > questions that arrive from that analysis are:
> >
> > * Is this patch still needed now that there is no auto-cleanup?
>
> Not for this Soft Reserved series, but yes for Type2 hotplug.
Type-2 hotplug is not the issue; the issue is boot-time configuration
preservation over device reset, which is a different challenge.
> > * If this patch is still needed is it better to skip
> > cxl_region_decode_reset() based on the 'enum cxl_detach_mode' rather
> > than the CXL_REGION_F_AUTO flag? I.e. skip reset in the 2d/ case, or
> > some other new general flag that says "please preserve hardware
> > configuration".
>
> I looked at using and expanding the cxl_detach_mode enum and rejected it
> as not the right scope. The current detach mode is attached to an
> individual detach operation, whereas the preserve-vs-reset decision
> applies to the region decode topology as a whole. If we expand detach
> mode for this region-wide policy, we may risk inconsistent handling
> across endpoints of the same region. It just seemed like the wrong
> place. I could be missing another reason why you looked at it.
Regions are an emergent property from decoder settings. Decoder settings
come from firmware, user actions, and with the type-2 series driver
actions. Firmware, user and driver actions are per-decoder especially
because the behavior needed here is similar to the decoder LOCK bit.
Region assembly can set a default decoder policy, but the management of
that decoder policy need not go through the region.
Either way, settling this question can happen after the type-2 base
series lands; it is not a lead-in dependency.
[..]
> > It is helpful that violence has been the default so far. It allows us to
> > introduce a decoder shutdown policy toggle where CXL_REGION_F_AUTO flags
> > decoders as "preserve" by default. Region decommit clears that flag,
> > and/or userspace can toggle that per endpoint decoder flag to determine
> > what happens when decoders leave the system. That probably also wants
> > some lockdown interaction such that root can not force unplug memory by
> > unbinding a driver.
>
> As a step in the direction you suggest, AND aiming to address Type2
> need, here is what I'd like a direction check on:
>
> Start separating decode-reset policy from CXL_REGION_F_AUTO:
> - keep CXL_REGION_F_AUTO as origin / assembly semantics
> - introduce CXL_REGION_F_PRESERVE_DECODE as a region-scoped policy
Not yet convinced about this.
> - initialize that policy from auto-assembly
> - clear it on explicit decommit in commit_store()
My expectation is still clear it on decoder configuration change, add an
attribute to toggle it independent of changing the decoder
configuration.
> - use it to gate cxl_region_decode_reset() in __cxl_decoder_detach()
cxl_region_decode_reset() just automates asking each decoder to carry
out reset if the decoder policy allows.
> The decode-reset decision is factored through a small helper,
> cxl_region_preserve_decode(), so the policy can be extended independent
> of the detach mechanics. Maybe overkill in this simple case, but I
> wanted to acknowledge the 'policy' direction.
Appreciate you pulling this together. I want to land type-2 with the
existing expectation that unload is always destructive, then circle back
to address this additional detail because it is more than just decoder
policy that needs to be managed. The type-2 driver may need help finding
its platform firmware configured address range if a device reset
destroyed the decoder settings.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-24 19:46 ` Dan Williams
@ 2026-03-24 22:23 ` Alejandro Lucero Palau
2026-03-25 1:51 ` Alison Schofield
1 sibling, 0 replies; 61+ messages in thread
From: Alejandro Lucero Palau @ 2026-03-24 22:23 UTC (permalink / raw)
To: Dan Williams, Alison Schofield
Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On 3/24/26 19:46, Dan Williams wrote:
> Alison Schofield wrote:
<snip>
> Like I replied to Alejandro it is not a dependency for the type-2 series
> [1]. It *is* a fix for the issue reported by PJ, but it can go in
> independent of the base type-2 work as a standalone capability.
>
> [1]: http://lore.kernel.org/69b8b9181bafd_452b100cb@dwillia2-mobl4.notmuch
I'm afraid I do not understand what you mean here.
<snip>
> Just like the decoder LOCK bit the preservation setting is a decoder
> property, not a region property. Region auto-assembly is then just an
> automatic way to set that decoder policy.
>
> So, no, I would not expect a new region flag for this policy.
Would it be acceptable for an accelerator to have the option of locking
its HDM if the BIOS has not already done so?
<snip>
> Appreciate you pulling this together. I want to land type-2 with the
> existing expectation that unload is always destructive then circle back
As I said, v22 had that destructive behavior, but v23 kept the HDM
committed, as that was what you asked for.
The latest v24 only supports dealing with committed decoders, which is
the expectation with current BIOSes (Intel and AMD) when a Type2 device
is found at boot time. I now have some BIOS versions which lock the HDM
decoder for a Type2 device, making any destructive action impossible.
The reason for only supporting this case is to have a chance of landing
the basic (but good enough) support in time for 7.1, as there are issues
with the changes for creating a region that will need agreement from
everyone, and that is unlikely before the 7.1 window closes.
> to address this additional detail because it is more than just decoder
> policy that needs to be managed. The type-2 driver may need help finding
> its platform firmware configured address range if a device reset
> destroyed the decoder settings.
>
And again, this problem should be addressed, IMO, as a follow-up.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions
2026-03-24 19:46 ` Dan Williams
2026-03-24 22:23 ` Alejandro Lucero Palau
@ 2026-03-25 1:51 ` Alison Schofield
1 sibling, 0 replies; 61+ messages in thread
From: Alison Schofield @ 2026-03-25 1:51 UTC (permalink / raw)
To: Dan Williams
Cc: Smita Koralahalli, linux-cxl, linux-kernel, nvdimm, linux-fsdevel,
linux-pm, Ard Biesheuvel, Vishal Verma, Ira Weiny,
Jonathan Cameron, Yazen Ghannam, Dave Jiang, Davidlohr Bueso,
Matthew Wilcox, Jan Kara, Rafael J . Wysocki, Len Brown,
Pavel Machek, Li Ming, Jeff Johnson, Ying Huang, Yao Xingtao,
Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot, Terry Bowman,
Robert Richter, Benjamin Cheatham, Zhijian Li, Borislav Petkov,
Tomasz Wolski
On Tue, Mar 24, 2026 at 12:46:38PM -0700, Dan Williams wrote:
> Alison Schofield wrote:
I didn't snip, but lucky for someone, I only appended. Goto EOF.
> > On Wed, Mar 11, 2026 at 02:37:46PM -0700, Dan Williams wrote:
> > > Smita Koralahalli wrote:
> > > > __cxl_decoder_detach() currently resets decoder programming whenever a
> > > > region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
> > > > autodiscovered regions, this can incorrectly tear down decoder state
> > > > that may be relied upon by other consumers or by subsequent ownership
> > > > decisions.
> > > >
> > > > Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
> > > > set.
> > > >
> > > > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> > > > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > > > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > > > Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> > > > ---
> > > > drivers/cxl/core/region.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > > index ae899f68551f..45ee598daf95 100644
> > > > --- a/drivers/cxl/core/region.c
> > > > +++ b/drivers/cxl/core/region.c
> > > > @@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
> > > > cxled->part = -1;
> > > >
> > > > if (p->state > CXL_CONFIG_ACTIVE) {
> > > > - cxl_region_decode_reset(cxlr, p->interleave_ways);
> > > > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> > > > + cxl_region_decode_reset(cxlr, p->interleave_ways);
> > > > +
> > > > p->state = CXL_CONFIG_ACTIVE;
> > >
> >
> > Hi Dan,
> >
> > > tl;dr: I do not think we need this, I do think we need to clarify to
> > > users what enable/disable and/or hot remove violence is handled and not
> > > handled by the CXL core.
> >
> > I'm chiming in here because although this patch is no longer needed for
> > this series, it has become a dependency for the Type 2 series.
>
> Like I replied to Alejandro it is not a dependency for the type-2 series
> [1]. It *is* a fix for the issue reported by PJ, but it can go in
> independent of the base type-2 work as a standalone capability.
>
> [1]: http://lore.kernel.org/69b8b9181bafd_452b100cb@dwillia2-mobl4.notmuch
>
> > So this
> > follow-up focuses on the hot-remove, endpoint-detach case where
> > preserving decoders across detach is still needed for later recovery.
> >
> > Some inline responses to you, and then a diff is appended for a
> > direction check.
> >
> > > So this looks deceptively simple, but I think it is incomplete or at
> > > least adds to the current confusion. A couple points to consider:
> > >
> > > 1/ There is no corresponding clear_bit(CXL_REGION_F_AUTO, ...) anywhere in
> > > the driver. Yes, admin can still force cxl_region_decode_reset() via
> > > commit_store() path, but admin can not force
> > > cxl_region_teardown_targets() in the __cxl_decoder_detach() path. I do
> > > not like that this causes us to end up with 2 separate considerations
> > > for when __cxl_decoder_detach() skips cleanup actions
> > > (cxl_region_teardown_targets() and cxl_region_decode_reset()). See
> > > below, I think the cxl_region_teardown_targets() check is probably
> > > bogus.
> >
> > Rather than repurposing CXL_REGION_F_AUTO, this splits decode-reset policy
> > from AUTO. A new region-scoped CXL_REGION_F_PRESERVE_DECODE flag is introduced
> > and cleared on explicit decommit in commit_store(). AUTO remains origin/assembly
> > state.
>
> Just like the decoder LOCK bit the preservation setting is a decoder
> property, not a region property. Region auto-assembly is then just an
> automatic way to set that decoder policy.
>
> So, no, I would not expect a new region flag for this policy.
>
> > This does still leave two cleanup decisions:
> > 1) decode reset (now keyed off PRESERVE_DECODE)
> > 2) target teardown (still using existing AUTO behavior)
> >
> > No change to cxl_region_teardown_targets() in this step.
>
> Turns out that cxl_region_teardown_targets() never needed to consider
> the CXL_REGION_F_AUTO flag.
>
> > > At a minimum I think commit_store() should clear CXL_REGION_F_AUTO on
> > > decommit such that cleaning up decoders and targets later proceeds as
> > > expected.
> >
> > This point is addressed by clearing CXL_REGION_F_PRESERVE_DECODE instead.
> > Explicit decommit is treated as destructive and disables decode preservation
> > before unbind/reset.
> >
> > >
> > > 2/ The hard part about CXL region cleanup is that it needs to be prepared
> > > for:
> > >
> > > a/ user manually removes the region via sysfs
> > >
> > > b/ user manually disables cxl_port, cxl_mem, or cxl_acpi causing the
> > > endpoint port to be removed
> > >
> > > c/ user physically removes the memdev causing the endpoint port to be
> > > removed (CXL core can not tell the difference with 2b/ it just sees
> > > cxl_mem_driver::remove() operation invocation)
> > >
> > > d/ setup action fails and region setup is unwound
> >
> > Agreed. This change targets 2b, 2c.
> >
> > >
> > > The cxl_region_decode_reset() is in __cxl_decoder_detach() because of
> > > 2b/ and 2c/. No other chance to cleanup the decode topology once the
> > > endpoint decoders are on their way out of the system.
> >
> > Agreed. The reset remains. Proposed change only makes it conditional on
> > explicit region policy rather than AUTO.
> >
> > >
> > > In this case though the patch was generated back when we were committed
> > > to cleaning up failed to assemble regions, a new 2d/ case, right?
> > > However, in that case the decoder is not leaving the system. The
> > > questions that arrive from that analysis are:
> > >
> > > * Is this patch still needed now that there is no auto-cleanup?
> >
> > Not for this Soft Reserved series, but yes for Type2 hotplug.
>
> Type-2 hotplug is not the issue; the issue is boot-time configuration
> preservation over device reset, which is a different challenge.
>
> > > * If this patch is still needed is it better to skip
> > > cxl_region_decode_reset() based on the 'enum cxl_detach_mode' rather
> > > than the CXL_REGION_F_AUTO flag? I.e. skip reset in the 2d/ case, or
> > > some other new general flag that says "please preserve hardware
> > > configuration".
> >
> > I looked at using and expanding the cxl_detach_mode enum and rejected it
> > as not the right scope. The current detach mode is attached to an
> > individual detach operation, whereas the preserve-vs-reset decision
> > applies to the region decode topology as a whole. If we expand detach
> > mode for this region-wide policy, we may risk inconsistent handling
> > across endpoints of the same region. It just seemed like the wrong
> > place. I could be missing another reason why you looked at it.
>
> Regions are an emergent property from decoder settings. Decoder settings
> come from firmware, user actions, and with the type-2 series driver
> actions. Firmware, user and driver actions are per-decoder especially
> because the behavior needed here is similar to the decoder LOCK bit.
>
> Region assembly can set a default decoder policy, but the management of
> that decoder policy need not go through the region.
>
> Either way, settling this question can happen after the type-2 base
> series lands; it is not a lead-in dependency.
>
> [..]
> > > It is helpful that violence has been the default so far. It allows us to
> > > introduce a decoder shutdown policy toggle where CXL_REGION_F_AUTO flags
> > > decoders as "preserve" by default. Region decommit clears that flag,
> > > and/or userspace can toggle that per endpoint decoder flag to determine
> > > what happens when decoders leave the system. That probably also wants
> > > some lockdown interaction such that root can not force unplug memory by
> > > unbinding a driver.
> >
> > As a step in the direction you suggest, AND aiming to address Type2
> > need, here is what I'd like a direction check on:
> >
> > Start separating decode-reset policy from CXL_REGION_F_AUTO:
> > - keep CXL_REGION_F_AUTO as origin / assembly semantics
> > - introduce CXL_REGION_F_PRESERVE_DECODE as a region-scoped policy
>
> Not yet convinced about this.
>
> > - initialize that policy from auto-assembly
> > - clear it on explicit decommit in commit_store()
>
> My expectation is still clear it on decoder configuration change, add an
> attribute to toggle it independent of changing the decoder
> configuration.
>
> > - use it to gate cxl_region_decode_reset() in __cxl_decoder_detach()
>
> cxl_region_decode_reset() just automates asking each decoder to carry
> out reset if the decoder policy allows.
>
> > The decode-reset decision is factored through a small helper,
> > cxl_region_preserve_decode(), so the policy can be extended independent
> > of the detach mechanics. Maybe overkill in this simple case, but I
> > wanted to acknowledge the 'policy' direction.
>
> Appreciate you pulling this together. I want to land type-2 with the
> existing expectation that unload is always destructive, then circle back
> to address this additional detail because it is more than just decoder
> policy that needs to be managed. The type-2 driver may need help finding
> its platform firmware configured address range if a device reset
> destroyed the decoder settings.
I did go for decoder policy first, but switched to region scope,
thinking this has to be managed at the region level: the region decode
topology is programmed as a unit, so a mixed preserve/reset policy
across decoders participating in a single active region seems scary.
I can switch to a decoder policy where region auto-assembly still
initializes a uniform preserve policy across the participating
programmed decoders, and teardown paths would clear the policy.
The other angle was future userspace access. It seems user intent would
likely be at region scope, with the kernel applying the policy across
the decoders that make up the region. So, sort of a control surface
that is region-oriented, but an implementation that lives on decoders.
I'll let the dust settle and see if there is anything I can pick up.
^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2026-03-25 1:52 UTC | newest]
Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-10 6:44 [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
2026-02-10 6:44 ` [PATCH v6 1/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
2026-02-19 3:22 ` Alison Schofield
2026-02-10 6:44 ` [PATCH v6 2/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
2026-02-19 3:23 ` Alison Schofield
2026-02-10 6:44 ` [PATCH v6 3/9] cxl/region: Skip decoder reset on detach for autodiscovered regions Smita Koralahalli
2026-02-19 3:44 ` Alison Schofield
2026-02-20 20:35 ` Koralahalli Channabasappa, Smita
2026-03-11 21:37 ` Dan Williams
2026-03-12 19:53 ` Dan Williams
2026-03-12 21:28 ` Koralahalli Channabasappa, Smita
2026-03-13 12:54 ` Alejandro Lucero Palau
2026-03-17 2:14 ` Dan Williams
2026-03-18 7:33 ` Alejandro Lucero Palau
2026-03-18 21:49 ` Dave Jiang
2026-03-18 21:27 ` Alison Schofield
2026-03-24 14:06 ` Alejandro Lucero Palau
2026-03-24 19:46 ` Dan Williams
2026-03-24 22:23 ` Alejandro Lucero Palau
2026-03-25 1:51 ` Alison Schofield
2026-02-10 6:44 ` [PATCH v6 4/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
2026-02-18 15:54 ` Dave Jiang
2026-03-09 14:31 ` Jonathan Cameron
2026-02-10 6:44 ` [PATCH v6 5/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
2026-02-18 16:04 ` Dave Jiang
2026-03-09 14:37 ` Jonathan Cameron
2026-03-12 21:30 ` Koralahalli Channabasappa, Smita
2026-03-12 0:27 ` Dan Williams
2026-03-12 21:31 ` Koralahalli Channabasappa, Smita
2026-02-10 6:44 ` [PATCH v6 6/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
2026-03-12 0:29 ` Dan Williams
2026-02-10 6:44 ` [PATCH v6 7/9] dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination Smita Koralahalli
2026-02-18 17:52 ` Dave Jiang
2026-02-20 0:02 ` Koralahalli Channabasappa, Smita
2026-02-20 15:55 ` Dave Jiang
2026-03-09 14:49 ` Jonathan Cameron
2026-02-10 6:45 ` [PATCH v6 8/9] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges Smita Koralahalli
2026-02-18 18:05 ` Dave Jiang
2026-02-20 19:54 ` Koralahalli Channabasappa, Smita
2026-02-20 10:14 ` Alejandro Lucero Palau
2026-03-12 2:28 ` Dan Williams
2026-03-13 18:41 ` Koralahalli Channabasappa, Smita
2026-03-17 2:36 ` Dan Williams
2026-03-16 22:26 ` Koralahalli Channabasappa, Smita
2026-03-17 2:42 ` Dan Williams
2026-02-10 6:45 ` [PATCH v6 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
2026-02-10 19:16 ` [PATCH v6 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Alison Schofield
2026-02-10 19:49 ` Koralahalli Channabasappa, Smita
2026-02-12 6:38 ` Alison Schofield
2026-02-20 21:00 ` Koralahalli Channabasappa, Smita
2026-02-12 14:44 ` Tomasz Wolski
2026-02-12 21:18 ` Alison Schofield
2026-02-13 7:47 ` Yasunori Goto (Fujitsu)
2026-02-13 17:31 ` Alison Schofield
2026-02-16 5:15 ` Yasunori Goto (Fujitsu)
2026-02-12 20:02 ` [sos-linux-dev] " Koralahalli Channabasappa, Smita
2026-02-13 14:04 ` Gregory Price
2026-02-20 20:47 ` Koralahalli Channabasappa, Smita
2026-02-20 9:45 ` Tomasz Wolski
2026-02-20 21:19 ` Koralahalli Channabasappa, Smita
2026-02-22 23:17 ` Tomasz Wolski