public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
@ 2026-03-22 19:53 Smita Koralahalli
From: Smita Koralahalli @ 2026-03-22 19:53 UTC (permalink / raw)
  To: linux-cxl, linux-kernel, nvdimm, linux-fsdevel, linux-pm
  Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Dave Jiang,
	Davidlohr Bueso, Matthew Wilcox, Jan Kara, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Li Ming, Jeff Johnson, Ying Huang,
	Yao Xingtao, Peter Zijlstra, Greg Kroah-Hartman, Nathan Fontenot,
	Terry Bowman, Robert Richter, Benjamin Cheatham, Zhijian Li,
	Borislav Petkov, Smita Koralahalli, Tomasz Wolski

This series aims to address long-standing conflicts between HMEM and
CXL when handling Soft Reserved memory ranges.

Reworked from Dan's patch:
https://lore.kernel.org/all/68808fb4e4cbf_137e6b100cc@dwillia2-xfh.jf.intel.com.notmuch/

Previous work:
https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/

Link to v7:
https://lore.kernel.org/all/20260319011500.241426-1-Smita.KoralahalliChannabasappa@amd.com

The series is based on Linux 7.0-rc4.
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c

[1] After offlining the memory I can tear down the regions and recreate
them. dax_cxl creates dax devices and onlines memory.
850000000-284fffffff : CXL Window 0
  850000000-284fffffff : region0
    850000000-284fffffff : dax0.0
      850000000-284fffffff : System RAM (kmem)

[2] With CONFIG_CXL_REGION disabled, all the resources are handled by
HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
and dax devices are created from HMEM.
850000000-284fffffff : CXL Window 0
  850000000-284fffffff : Soft Reserved
    850000000-284fffffff : dax0.0
      850000000-284fffffff : System RAM (kmem)

[3] Region assembly failure: Soft Reserved range shows up in /proc/iomem
and dax devices are handled by HMEM.
850000000-284fffffff : Soft Reserved
  850000000-284fffffff : CXL Window 0
    850000000-284fffffff : region0
      850000000-284fffffff : dax6.0
        850000000-284fffffff : System RAM (kmem)

[4] REGISTER path:
The results are as expected with both CXL_BUS = y and CXL_BUS = m.
To validate the REGISTER path, I forced REGISTER even in cases where SR
completely overlaps the CXL region as I did not have access to a system
where the CXL region range is smaller than the SR range.

850000000-284fffffff : Soft Reserved
  850000000-284fffffff : CXL Window 0
    850000000-280fffffff : region0
      850000000-284fffffff : dax6.0
        850000000-284fffffff : System RAM (kmem)

kreview flagged a possible deadlock from taking pdev->dev.mutex before
wait_for_device_probe(). Hence, I moved wait_for_device_probe() ahead of
guard(device).

From kreview:
The guard(device) takes pdev->dev.mutex and holds it across
wait_for_device_probe(). If any probe function in the system tries to
access this device (directly or indirectly), it would need the same
mutex:

process_defer_work()
  guard(device)(&pdev->dev)     <- Takes pdev->dev.mutex
  wait_for_device_probe()       <- Waits for all probes globally
    wait_event(probe_count == 0)

Meanwhile, another driver's probe may do:

  some_driver_probe()
    device_lock(&pdev->dev)     <- Blocks waiting for mutex

The probe can't complete while waiting for the mutex, and
wait_for_device_probe() won't return while the probe is pending.
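
To make the ordering argument above concrete, here is a minimal userspace
sketch of the fixed sequence (wait for probes first, take the device lock
second). It is only a model under stated assumptions: dev_lock stands in for
pdev->dev.mutex, wait_for_probes() stands in for wait_for_device_probe(), and
process_defer_work_model() and every other name below is hypothetical, not
code from this series.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-ins; none of these names come from the series. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;   /* models pdev->dev.mutex */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER; /* protects probe_count */
static pthread_cond_t probe_done = PTHREAD_COND_INITIALIZER;
static int probe_count;
static int walked;

/* Models another driver's probe that needs the device lock to finish. */
static void *some_driver_probe(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&dev_lock);
	pthread_mutex_unlock(&dev_lock);

	pthread_mutex_lock(&probe_lock);
	probe_count--;
	pthread_cond_broadcast(&probe_done);
	pthread_mutex_unlock(&probe_lock);
	return NULL;
}

/* Models wait_for_device_probe(): block until no probe is in flight. */
static void wait_for_probes(void)
{
	pthread_mutex_lock(&probe_lock);
	while (probe_count > 0)
		pthread_cond_wait(&probe_done, &probe_lock);
	assert(probe_count == 0);
	pthread_mutex_unlock(&probe_lock);
}

/* Wait first, lock second: the probe can always take dev_lock and finish. */
static int process_defer_work_model(void)
{
	pthread_t probe;

	pthread_mutex_lock(&probe_lock);
	probe_count++;
	pthread_mutex_unlock(&probe_lock);

	if (pthread_create(&probe, NULL, some_driver_probe, NULL) != 0) {
		/* No probe was started, so undo the accounting. */
		pthread_mutex_lock(&probe_lock);
		probe_count--;
		pthread_mutex_unlock(&probe_lock);
		return -1;
	}

	wait_for_probes();              /* no lock held while waiting */
	pthread_mutex_lock(&dev_lock);  /* safe: all probes have completed */
	walked = 1;
	pthread_mutex_unlock(&dev_lock);

	pthread_join(probe, NULL);
	return walked ? 0 : -1;
}
```

Swapping the two steps in process_defer_work_model(), so that dev_lock is
held across wait_for_probes(), reproduces the hang described above in this
model. Build with -pthread where the toolchain needs it.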

v8 updates:
- New patch to handle kref lifecycle correctly.
- New patch to factor hmem registration.
- Reversed teardown order in dax_region_unregister().
- Replaced INIT_WORK() with __WORK_INITIALIZER.
- Added forward declaration for process_defer_work().
- Added an early return on !work->pdev in process_defer_work() as a
  defensive check.
- One-line pdev assignment using to_platform_device(get_device()).
- Module reload handling: return 0 if dax_hmem_initial_probe is set.
- Enforced CXL to always win irrespective of whether SR covers cxl
  regions. If userspace wants HMEM to own, unload cxl_acpi.
- hmem_register_cxl_device() calls __hmem_register_device() instead of
  hmem_register_device() to properly register resources through HMEM
  during the deferred walk, bypassing the CXL check done at boot.
- Gated flush_work() and put_device() on dax_hmem_work.pdev being set in
  dax_hmem_exit().
- kmalloc -> kmalloc_obj.
- Added if (!dax_hmem_initial_probe) guard in process_defer_work() to
  skip the walk entirely. Without this guard I saw the following during
  region assembly failure testing at boot:

  hmem_register_device: hmem_platform hmem_platform.0: await CXL initial probe: ..
  hmem_register_cxl_device: hmem_platform hmem_platform.0: CXL did not claim resource ..
  alloc_dev_dax_range:  dax6.0: alloc range[0]: ..
  hmem_register_cxl_device: hmem_platform hmem_platform.0: CXL did not claim resource ..
  alloc_dax_region: hmem hmem.9: dax_region resource conflict for ..
  hmem hmem.9: probe with driver hmem failed with error -12 .. 

v7 updates:
- Added Reviewed-by tags.
- co-developed-by -> Suggested-by for Patch 4.
- Dropped "cxl/region: Skip decoder reset for auto-discovered regions"
  patch.
- cxl_region_contains_soft_reserve() -> cxl_region_contains_resource()
- Dropped scoped_guard around request_resource() and release_resource().
- Dropped patch 7. All deferred work infrastructure moved from bus.c into
  hmem.c
- Dropped enum dax_cxl_mode (DEFER/REGISTER/DROP) and replaced with bool
  dax_hmem_initial_probe in device.c (built-in, survives module reload).
- Changed from all-or-nothing to per-range ownership decisions. Each range
  is decided individually: CXL keeps what it covers, HMEM gets the rest.
- Replaced the two-pass walk with a single pass to exercise per-range
  ownership.
- Moved wait_for_device_probe() before guard(device) to avoid lockdep
  warning (kreview, Gregory).
- Added guard(device) + driver bound check.
- Added get_device()/put_device() for pdev refcount.
- Added flush_work() in dax_hmem_exit() to ensure work completes before
  module unload.
- dax_hmem_flush_work() exported from dax_hmem.ko — symbol dependency
  forces dax_hmem to load before dax_cxl (Dan requirement 2).
- Added static inline no-op stub in bus.h for CONFIG_DEV_DAX_HMEM = n.
- Added work_pending() check (Dan requirement 3).
- pdev and work_struct initialized together on first probe, making
  singleton nature explicit. static struct and INIT_WORK once.
- Reverted back to container_of() in work function instead of global
  variables.
- No kill_defer_work() with the struct being static.
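
As an illustration of the per-range ownership rule introduced in v7 (CXL
keeps what it covers, HMEM gets the rest), here is a small userspace sketch.
It assumes that "covers" means full containment of the Soft Reserved range by
a single region, which is my reading of cxl_region_contains_resource(); the
struct, enum and function names below are hypothetical and do not appear in
the series. It does not model the v8 change where CXL always wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, similar in spirit to struct resource start/end. */
struct range_incl {
	uint64_t start;
	uint64_t end;
};

enum sr_owner { SR_OWNER_CXL, SR_OWNER_HMEM };

/* True when 'outer' fully contains 'inner' (both ends inclusive). */
static bool range_contains(const struct range_incl *outer,
			   const struct range_incl *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/*
 * Hypothetical per-range decision: CXL owns a Soft Reserved range only if
 * some assembled region fully covers it; otherwise HMEM registers it.
 */
static enum sr_owner sr_pick_owner(const struct range_incl *sr,
				   const struct range_incl *regions,
				   size_t nr_regions)
{
	assert(sr->start <= sr->end);

	for (size_t i = 0; i < nr_regions; i++)
		if (range_contains(&regions[i], sr))
			return SR_OWNER_CXL;

	return SR_OWNER_HMEM;
}
```

With the ranges from the test output above, a region spanning
850000000-284fffffff yields SR_OWNER_CXL for the matching Soft Reserved
range, while a shorter region ending at 280fffffff yields SR_OWNER_HMEM.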

v6 updates:
- Patch 1-3 no changes.
- New Patches 4-5.
- (void *)res -> res.
- cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
- New file include/cxl/cxl.h
- Introduced singleton workqueue.
- hmem to queue the work and cxl to flush.
- cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
- Included descriptions for dax_cxl_mode.
- kzalloc -> kmalloc in add_soft_reserve_into_iomem()
- dax_cxl_mode is exported to CXL.
- Introduced hmem_register_cxl_device() for walking only CXL
  intersected SR ranges the second time.

v5 updates:
- Patch 1 dropped as it has been merged into for-7.0/cxl-init.
- Added Reviewed-by tags.
- Shared dax_cxl_mode between dax/cxl.c and dax/hmem.c and used
  -EPROBE_DEFER to defer dax_cxl.
- CXL_REGION_F_AUTO check for resetting decoders.
- Teardown all CXL regions if any one CXL region doesn't fully contain
  the Soft Reserved range.
- Added helper cxl_region_contains_sr() to determine Soft Reserved
  ownership.
- bus_rescan_devices() to retry dax_cxl.
- Added guard(rwsem_read)(&cxl_rwsem.region).

v4 updates:
- No changes patches 1-3.
- New patches 4-7.
- handle_deferred_cxl() has been enhanced to handle case where CXL
  regions do not contiguously and fully cover Soft Reserved ranges.
- Support added to defer cxl_dax registration.
- Support added to teardown cxl regions.

v3 updates:
- Fixed two "From".

v2 updates:
- Removed conditional check on CONFIG_EFI_SOFT_RESERVE as dax_hmem
  depends on CONFIG_EFI_SOFT_RESERVE. (Zhijian)
- Added TODO note. (Zhijian)
- Included region_intersects_soft_reserve() inside CONFIG_EFI_SOFT_RESERVE
  conditional check. (Zhijian)
- insert_resource_late() -> insert_resource_expand_to_fit() and
  __insert_resource_expand_to_fit() replacement. (Boris)
- Fixed Co-developed and Signed-off by. (Dan)
- Combined 2/6 and 3/6 into a single patch. (Zhijian)
- Skip local variable in remove_soft_reserved. (Jonathan)
- Drop kfree with __free(). (Jonathan)
- return 0 -> return dev_add_action_or_reset(host...) (Jonathan)
- Dropped 6/6.
- Reviewed-by tags (Dave, Jonathan)

Dan Williams (3):
  dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved
    ranges
  dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
  dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding

Smita Koralahalli (6):
  dax/bus: Use dax_region_put() in alloc_dax_region() error path
  dax/hmem: Factor HMEM registration into __hmem_register_device()
  dax: Track all dax_region allocations under a global resource tree
  cxl/region: Add helper to check Soft Reserved containment by CXL
    regions
  dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree

 drivers/cxl/core/region.c |  30 ++++++++
 drivers/dax/Kconfig       |   2 +
 drivers/dax/Makefile      |   3 +-
 drivers/dax/bus.c         |  20 +++++-
 drivers/dax/bus.h         |   7 ++
 drivers/dax/cxl.c         |  28 +++++++-
 drivers/dax/hmem/device.c |   3 +
 drivers/dax/hmem/hmem.c   | 146 +++++++++++++++++++++++++++++++++-----
 include/cxl/cxl.h         |  15 ++++
 9 files changed, 231 insertions(+), 23 deletions(-)
 create mode 100644 include/cxl/cxl.h

-- 
2.17.1



Thread overview: 26+ messages
2026-03-22 19:53 [PATCH v8 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM Smita Koralahalli
2026-03-22 19:53 ` [PATCH v8 1/9] dax/bus: Use dax_region_put() in alloc_dax_region() error path Smita Koralahalli
2026-03-23 17:11   ` Dave Jiang
2026-03-23 17:57   ` Jonathan Cameron
2026-03-23 19:37   ` Dan Williams
2026-03-22 19:53 ` [PATCH v8 2/9] dax/hmem: Factor HMEM registration into __hmem_register_device() Smita Koralahalli
2026-03-23 17:14   ` Dave Jiang
2026-03-23 17:59   ` Jonathan Cameron
2026-03-22 19:53 ` [PATCH v8 3/9] dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges Smita Koralahalli
2026-03-23 19:54   ` Dan Williams
2026-03-24  5:46     ` Koralahalli Channabasappa, Smita
2026-03-24 16:25       ` Dan Williams
2026-03-22 19:53 ` [PATCH v8 4/9] dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL Smita Koralahalli
2026-03-22 19:53 ` [PATCH v8 5/9] dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding Smita Koralahalli
2026-03-22 19:53 ` [PATCH v8 6/9] dax: Track all dax_region allocations under a global resource tree Smita Koralahalli
2026-03-23 17:31   ` Dave Jiang
2026-03-23 20:55   ` Dan Williams
2026-03-22 19:53 ` [PATCH v8 7/9] cxl/region: Add helper to check Soft Reserved containment by CXL regions Smita Koralahalli
2026-03-22 19:53 ` [PATCH v8 8/9] dax/hmem, cxl: Defer and resolve Soft Reserved ownership Smita Koralahalli
2026-03-23 18:03   ` Jonathan Cameron
2026-03-23 18:13   ` Jonathan Cameron
2026-03-24 21:50     ` Koralahalli Channabasappa, Smita
2026-03-25 12:12       ` Jonathan Cameron
2026-03-23 18:17   ` Dave Jiang
2026-03-22 19:53 ` [PATCH v8 9/9] dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree Smita Koralahalli
2026-03-23 21:09   ` Dan Williams
