[PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory

public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed

From: Dan Williams <dan.j.williams@intel.com>
To: dave.jiang@intel.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	Smita.KoralahalliChannabasappa@amd.com,
	alison.schofield@intel.com, terry.bowman@amd.com,
	alejandro.lucero-palau@amd.com, linux-pci@vger.kernel.org,
	Jonathan.Cameron@huawei.com
Subject: [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory
Date: Mon, 15 Dec 2025 16:56:10 -0800	[thread overview]
Message-ID: <20251216005616.3090129-1-dan.j.williams@intel.com> (raw)

Changes since v1 [1]:
- Introduce cxl_memdev_autoremove() one patch earlier in the series to
  fix no_free_ptr() induced NULL pointer de-reference bisect bug. (Ben)
- Drop returning a custom error code via @cxlmd->endpoint that can be
  added back by the accelerator enabling series if needed. (Ben)
- Document how providing @attach changes the semantics of
  devm_cxl_add_memdev(). (Ben)
- Rename @ops to @attach
- Add support to release an accelerator driver when the logical CXL link
  drops.
- Pick up test and review tags from Alison, Ben, Jonathan, Dave, and
  Alejandro.

[1]: http://lore.kernel.org/20251204022136.2573521-1-dan.j.williams@intel.com

Original Cover:
===============

The CXL subsystem is modular. That modularity is a benefit for
separation of concerns and testing. It is generally appropriate for this
class of devices that support hotplug and can dynamically add a CXL
personality alongside their PCI personality. However, a cost of modules
is ambiguity about when devices (cxl_memdevs, cxl_ports, cxl_regions)
have had a chance to attach to their corresponding drivers on
@cxl_bus_type.

This problem of not being able to reliably determine when a device has
had a chance to attach to its driver vs still waiting for the module to
load, is a common problem for the "Soft Reserve Recovery" [2], and
"Accelerator Memory" [3] enabling efforts.

For "Soft Reserve Recovery" it wants to use wait_for_device_probe() as a
sync point for when CXL devices present at boot have had a chance to
attach to the cxl_pci driver (generic CXL memory expansion class
driver). That breaks down if wait_for_device_probe() only flushes PCI
device probe, but not the cxl_mem_probe() of the cxl_memdev that
cxl_pci_probe() creates.

For "Accelerator Memory", the driver is not cxl_pci, but any potential
PCI driver that wants to use the devm_cxl_add_memdev() ABI to attach to
the CXL memory domain. Those drivers want to know if the CXL link is
live end-to-end (from endpoint, through switches, to the host bridge)
and CXL memory operations are enabled. If not, a CXL accelerator may be
able to fall back to PCI-only operation. Similar to the "Soft Reserve
Memory" it needs to know that the CXL subsystem had a chance to probe
the ancestor topology of the device and let that driver make a
synchronous decision about CXL operation.

In support of those efforts:

* Clean up some resource lifetime issues in the current code
* Move some object creation symbols (devm_cxl_add_memdev() and
  devm_cxl_add_endpoint()) into the cxl_mem.ko and cxl_port.ko objects.
  Implicitly guarantee that cxl_mem_driver and cxl_port_driver have been
  registered prior to any device objects being registered. This is
  preferred over explicit open-coded request_module().
* Use scoped-based-cleanup before adding more resource management in
  devm_cxl_add_memdev()
* Give an accelerator the opportunity to run setup operations in
  cxl_mem_probe() so it can further probe if the CXL configuration matches
  its needs.

Some of these previously appeared on a branch as an RFC [4] and left
"Soft Reserve Recovery" and "Accelerator Memory" to jockey for ordering.
Instead, create a shared topic branch for both of those efforts to
import. The main changes since that RFC are fixing a bug and reducing
the amount of refactoring (which contributed to hiding the bug).

[2]: http://lore.kernel.org/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com
[3]: http://lore.kernel.org/20251119192236.2527305-1-alejandro.lucero-palau@amd.com
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order

Dan Williams (6):
  cxl/mem: Fix devm_cxl_memdev_edac_release() confusion
  cxl/mem: Arrange for always-synchronous memdev attach
  cxl/port: Arrange for always synchronous endpoint attach
  cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
  cxl/mem: Drop @host argument to devm_cxl_add_memdev()
  cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation

 drivers/cxl/Kconfig          |   2 +-
 drivers/cxl/cxl.h            |   2 +
 drivers/cxl/cxlmem.h         |  17 ++++--
 drivers/cxl/core/edac.c      |  64 +++++++++++---------
 drivers/cxl/core/memdev.c    | 109 +++++++++++++++++++++++++----------
 drivers/cxl/mem.c            |  73 ++++++++++-------------
 drivers/cxl/pci.c            |   2 +-
 drivers/cxl/port.c           |  40 +++++++++++++
 tools/testing/cxl/test/mem.c |   2 +-
 9 files changed, 200 insertions(+), 111 deletions(-)

base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
-- 
2.51.1

next             reply	other threads:[~2025-12-16  0:55 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-16  0:56 Dan Williams [this message]
2025-12-16  0:56 ` [PATCH v2 1/6] cxl/mem: Fix devm_cxl_memdev_edac_release() confusion Dan Williams
2025-12-16  0:56 ` [PATCH v2 2/6] cxl/mem: Arrange for always-synchronous memdev attach Dan Williams
2025-12-16  0:56 ` [PATCH v2 3/6] cxl/port: Arrange for always synchronous endpoint attach Dan Williams
2025-12-16  0:56 ` [PATCH v2 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup Dan Williams
2025-12-17 15:48   ` Jonathan Cameron
2025-12-16  0:56 ` [PATCH v2 5/6] cxl/mem: Drop @host argument to devm_cxl_add_memdev() Dan Williams
2025-12-16  0:56 ` [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation Dan Williams
2025-12-17 15:57   ` Jonathan Cameron
2025-12-17 16:27     ` dan.j.williams
2025-12-18 14:46       ` Jonathan Cameron
2025-12-18 19:50         ` dan.j.williams
2025-12-19 10:26           ` Jonathan Cameron
2025-12-20  7:31   ` Alejandro Lucero Palau
2026-02-10  5:19   ` Gregory Price
2026-02-10 15:05     ` Dave Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251216005616.3090129-1-dan.j.williams@intel.com \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=alejandro.lucero-palau@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=terry.bowman@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox