Subject: [LSF/MM/BPF TOPIC] What's brewing in CXL?
From: Dan Williams @ 2026-04-08  2:35 UTC
  To: lsf-pc; +Cc: linux-cxl, linux-mm, linux-fsdevel

For the summit I offered to organize a discussion on what has been
happening in CXL since Plumbers, summarizing the monthly CXL calls held
in the interim. The session goal is to present the top challenges,
address concerns / questions from the room, and perhaps introduce the
CXL topics that have their own sessions later in the day.

Please do comment, question, ack/nak, or add to the themes below to
highlight interest and help focus on the few topics we can cover in the
limited time.

* CXL vs MM:

  * Much of the angst around "CXL vs MM" has been harnessed by Gregory
    in his Private Memory Nodes proposal [1]. The meta discussion for
    this session would be questions like "how many former device-dax use
    cases can be subsumed by a mechanism like this?". In general, the
    game here is how to properly isolate memory that does not behave
    like locally attached DRAM.

  * The granularity mismatch between CXL hotplug, memory_blocks, and the
    various distributions' memory hotplug policies stimulated a proposal
    to support full "region" hotplug [2].
 
* CXL vs Platform Firmware (ACPI/EFI/BIOS):

  * Attempts to use software interleaving to amortize the surprise of
    "memory that does not behave like locally attached DRAM" introduce
    firmware dependencies. The firmware descriptions of performance need
    to be complete and to match a shared understanding of the
    requirements. Surprise: sometimes this does not line up [3]. (The
    sketch after this list shows where the kernel republishes those
    firmware-provided numbers.)

  * Firmware, in trying to be helpful to pre-CXL-aware OSes, pre-maps CXL
    memory (whether public generic expansion or private memory belonging
    to an accelerator) into the system address map. This causes problems
    for a subsystem that wants to support hot remove and re-assignment
    of host-bridge resources. The lack of a specified protocol, and the
    resulting problems for accelerator reset and driver reload, needs
    more thought.
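
  For reference, a minimal userspace sketch (assuming a hypothetical
  CXL-backed node 2) of where the kernel republishes the
  firmware-provided access characteristics that interleave decisions end
  up trusting:

      /* Dump the reported access characteristics for a (hypothetical)
       * CXL-backed node 2, as exported under
       * /sys/devices/system/node/node2/access0/initiators/ */
      #include <stdio.h>

      static void show(const char *attr)
      {
              char path[128], buf[64];
              FILE *f;

              snprintf(path, sizeof(path),
                       "/sys/devices/system/node/node2/access0/initiators/%s",
                       attr);
              f = fopen(path, "r");
              if (f && fgets(buf, sizeof(buf), f))
                      printf("%s: %s", attr, buf);
              if (f)
                      fclose(f);
      }

      int main(void)
      {
              show("read_bandwidth");   /* bandwidth */
              show("write_bandwidth");  /* bandwidth */
              show("read_latency");     /* latency, ns */
              show("write_latency");    /* latency, ns */
              return 0;
      }

  When these values are missing or wrong, any interleave weighting built
  on top of them inherits the error.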
 
* CXL vs PCI:

  * While CXL capabilities are enumerated over PCI, to software CXL is an
    additional, optional protocol, and the subsystem supports being
    built as a module. This has led to an error-handling design where
    the PCI core is only minimally involved, forwarding events over a
    kfifo to the CXL side. This arrangement is raising ongoing questions
    for uAPI like PCI reset and vfio_pci that expect to be able to
    manage a device with PCI core services alone [4].
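
  A schematic kernel-style sketch of that arrangement (names invented
  for illustration, not the in-tree API):

      /* Schematic only: struct cxl_ras_event, cxl_forward_error(), and
       * cxl_ras_work() are made up for illustration. */
      #include <linux/kfifo.h>
      #include <linux/printk.h>
      #include <linux/types.h>
      #include <linux/workqueue.h>

      struct cxl_ras_event {
              u32 severity;
              u64 serial;
      };

      /* fixed-size fifo shared between the PCI-core side and the CXL module */
      static DEFINE_KFIFO(cxl_ras_fifo, struct cxl_ras_event, 32);

      /* CXL-module side: drain and handle events in process context */
      static void cxl_ras_work(struct work_struct *work)
      {
              struct cxl_ras_event ev;

              while (kfifo_get(&cxl_ras_fifo, &ev))
                      pr_info("cxl: RAS event severity=%u serial=%llu\n",
                              ev.severity, ev.serial);
      }
      static DECLARE_WORK(cxl_ras_worker, cxl_ras_work);

      /* PCI-core side: stay minimally involved, queue the event and kick
       * the worker without knowing anything else about CXL */
      void cxl_forward_error(u32 severity, u64 serial)
      {
              struct cxl_ras_event ev = { .severity = severity, .serial = serial };

              if (!kfifo_put(&cxl_ras_fifo, ev))
                      pr_warn_ratelimited("cxl: dropped RAS event\n");
              schedule_work(&cxl_ras_worker);
      }

  The design choice keeps the PCI core buildable without the CXL module
  while still getting events to it; the open question is what that means
  for callers that only know about PCI core services.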

* CXL vs Tooling / RAS:

  * Error injection, tests, and usability have continued to improve. For
    folks looking to deploy CXL in production, what is cxl-cli missing?

  * While CXL error isolation has a hard time justifying its utility for
    general-purpose memory expansion, the accelerator use case at least
    has a reasonable chance of constructing a recoverable scenario.

[1]: https://lore.kernel.org/linux-cxl/abwRu1FNqI3dVyqL@gourry-fedora-PF4VCD3F/
[2]: https://lore.kernel.org/linux-cxl/20260321150404.3288786-6-gourry@gourry.net/
[3]: http://lore.kernel.org/20260316051258.246-1-rakie.kim@sk.com
[4]: http://lore.kernel.org/20260401143917.108413-1-mhonap@nvidia.com

"CXL, making MM problems worse since 2021..." -- CXL subsystem tagline

