Re: [PATCH 0/6] cxl: Initialization reworks in support Soft Reserve Recovery and Accelerator Memory

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Alejandro Lucero Palau <alucerop@amd.com>
To: Dan Williams <dan.j.williams@intel.com>, dave.jiang@intel.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	Smita.KoralahalliChannabasappa@amd.com,
	alison.schofield@intel.com, terry.bowman@amd.com,
	alejandro.lucero-palau@amd.com, linux-pci@vger.kernel.org,
	Jonathan.Cameron@huawei.com, Shiju Jose <shiju.jose@huawei.com>
Subject: Re: [PATCH 0/6] cxl: Initialization reworks in support Soft Reserve Recovery and Accelerator Memory
Date: Fri, 5 Dec 2025 15:15:16 +0000	[thread overview]
Message-ID: <143deecb-aa53-42e6-b7eb-91fb392e7502@amd.com> (raw)
In-Reply-To: <20251204022136.2573521-1-dan.j.williams@intel.com>


On 12/4/25 02:21, Dan Williams wrote:
> The CXL subsystem is modular. That modularity is a benefit for
> separation of concerns and testing. It is generally appropriate for this
> class of devices that support hotplug and can dynamically add a CXL
> personality alongside their PCI personality. However, a cost of modules
> is ambiguity about when devices (cxl_memdevs, cxl_ports, cxl_regions)
> have had a chance to attach to their corresponding drivers on
> @cxl_bus_type.
>
> This problem of not being able to reliably determine when a device has
> had a chance to attach to its driver vs still waiting for the module to
> load, is a common problem for the "Soft Reserve Recovery" [1], and
> "Accelerator Memory" [2] enabling efforts.
>
> For "Soft Reserve Recovery" it wants to use wait_for_device_probe() as a
> sync point for when CXL devices present at boot have had a chance to
> attach to the cxl_pci driver (generic CXL memory expansion class
> driver). That breaks down if wait_for_device_probe() only flushes PCI
> device probe, but not the cxl_mem_probe() of the cxl_memdev that
> cxl_pci_probe() creates.
>
> For "Accelerator Memory", the driver is not cxl_pci, but any potential
> PCI driver that wants to use the devm_cxl_add_memdev() ABI to attach to
> the CXL memory domain. Those drivers want to know if the CXL link is
> live end-to-end (from endpoint, through switches, to the host bridge)
> and CXL memory operations are enabled. If not, a CXL accelerator may be
> able to fall back to PCI-only operation. Similar to the "Soft Reserve
> Memory" it needs to know that the CXL subsystem had a chance to probe
> the ancestor topology of the device and let that driver make a
> synchronous decision about CXL operation.


IMO, this is not the problem with accelerators, because this can not be 
dynamically done, or not easily. The HW will support CXL or PCI, and if 
CXL mem is not enabled by the firmware, likely due to a 
negotiation/linking problem, the driver can keep going with CXL.io. Of 
course, this is from my experience with sfc driver/hardware. Note sfc 
driver added the check for CXL availability based on Terry's v13.


But this is useful for solving the problem of module removal which can 
leave the type2 driver without the base for doing any unwinding. Once a 
type2 uses code from those other cxl modules explicitly, the problem is 
avoided. You seem to have forgotten about this problem, what I think it 
is worth to describe.


>
> In support of those efforts:
>
> * Clean up some resource lifetime issues in the current code
> * Move some object creation symbols (devm_cxl_add_memdev() and
>    devm_cxl_add_endpoint()) into the cxl_mem.ko and cxl_port.ko objects.
>    Implicitly guarantee that cxl_mem_driver and cxl_port_driver have been
>    registered prior to any device objects being registered. This is
>    preferred over explicit open-coded request_module().
> * Use scoped-based-cleanup before adding more resource management in
>    devm_cxl_add_memdev()
> * Give an accelerator the opportunity to run setup operations in
>    cxl_mem_probe() so it can further probe if the CXL configuration matches
>    its needs.
>
> Some of these previously appeared on a branch as an RFC [3] and left
> "Soft Reserve Recovery" and "Accelerator Memory" to jockey for ordering.
> Instead, create a shared topic branch for both of those efforts to
> import. The main changes since that RFC are fixing a bug and reducing
> the amount of refactoring (which contributed to hiding the bug).
>
> [1]: http://lore.kernel.org/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com
> [2]: http://lore.kernel.org/20251119192236.2527305-1-alejandro.lucero-palau@amd.com
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order
>
> Dan Williams (6):
>    cxl/mem: Fix devm_cxl_memdev_edac_release() confusion
>    cxl/mem: Arrange for always-synchronous memdev attach
>    cxl/port: Arrange for always synchronous endpoint attach
>    cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
>    cxl/mem: Drop @host argument to devm_cxl_add_memdev()
>    cxl/mem: Introduce a memdev creation ->probe() operation
>
>   drivers/cxl/Kconfig          |   2 +-
>   drivers/cxl/cxl.h            |   2 +
>   drivers/cxl/cxlmem.h         |  17 ++++--
>   drivers/cxl/core/edac.c      |  64 ++++++++++++---------
>   drivers/cxl/core/memdev.c    | 104 ++++++++++++++++++++++++-----------
>   drivers/cxl/mem.c            |  69 +++++++++--------------
>   drivers/cxl/pci.c            |   2 +-
>   drivers/cxl/port.c           |  40 ++++++++++++++
>   tools/testing/cxl/test/mem.c |   2 +-
>   9 files changed, 192 insertions(+), 110 deletions(-)
>
>
> base-commit: ea5514e300568cbe8f19431c3e424d4791db8291

next prev parent reply	other threads:[~2025-12-05 15:15 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-04  2:21 [PATCH 0/6] cxl: Initialization reworks in support Soft Reserve Recovery and Accelerator Memory Dan Williams
2025-12-04  2:21 ` [PATCH 1/6] cxl/mem: Fix devm_cxl_memdev_edac_release() confusion Dan Williams
2025-12-04 16:48   ` Dave Jiang
2025-12-04 20:15     ` dan.j.williams
2025-12-04 19:09   ` Cheatham, Benjamin
2025-12-05  2:46   ` Alison Schofield
2025-12-08 14:19   ` Alejandro Lucero Palau
2025-12-15 21:11     ` dan.j.williams
2025-12-08 19:20   ` Shiju Jose
2025-12-15 12:00   ` Jonathan Cameron
2025-12-04  2:21 ` [PATCH 2/6] cxl/mem: Arrange for always-synchronous memdev attach Dan Williams
2025-12-04 16:58   ` Dave Jiang
2025-12-04 19:09   ` Cheatham, Benjamin
2025-12-05  2:49   ` Alison Schofield
2025-12-15 12:08   ` Jonathan Cameron
2025-12-04  2:21 ` [PATCH 3/6] cxl/port: Arrange for always synchronous endpoint attach Dan Williams
2025-12-04 18:36   ` Dave Jiang
2025-12-04 19:09   ` Cheatham, Benjamin
2025-12-05  3:36   ` Alison Schofield
2025-12-15 12:09   ` Jonathan Cameron
2025-12-04  2:21 ` [PATCH 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup Dan Williams
2025-12-04 18:58   ` Dave Jiang
2025-12-04 19:09   ` Cheatham, Benjamin
2025-12-04 20:50     ` dan.j.williams
2025-12-05  3:37   ` Alison Schofield
2025-12-04  2:21 ` [PATCH 5/6] cxl/mem: Drop @host argument to devm_cxl_add_memdev() Dan Williams
2025-12-04 19:09   ` Cheatham, Benjamin
2025-12-04 20:02   ` Dave Jiang
2025-12-05  3:38   ` Alison Schofield
2025-12-15 12:15   ` Jonathan Cameron
2025-12-04  2:21 ` [PATCH 6/6] cxl/mem: Introduce a memdev creation ->probe() operation Dan Williams
2025-12-04 19:10   ` Cheatham, Benjamin
2025-12-04 21:11     ` dan.j.williams
2025-12-04 22:02       ` dan.j.williams
2025-12-04 22:15         ` Cheatham, Benjamin
2025-12-04 20:03   ` Dave Jiang
2025-12-05 15:15 ` Alejandro Lucero Palau [this message]
2025-12-05 21:17   ` [PATCH 0/6] cxl: Initialization reworks in support Soft Reserve Recovery and Accelerator Memory dan.j.williams
2025-12-08 14:04     ` Alejandro Lucero Palau
2025-12-09  7:53       ` dan.j.williams
2025-12-08 17:04 ` Alejandro Lucero Palau
2025-12-15 23:29   ` dan.j.williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=143deecb-aa53-42e6-b7eb-91fb392e7502@amd.com \
    --to=alucerop@amd.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=alejandro.lucero-palau@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=shiju.jose@huawei.com \
    --cc=terry.bowman@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox