Linux CXL
* RFC: CXL Isolation Support
@ 2026-01-30 19:47 Cheatham, Benjamin
  2026-01-30 21:30 ` Gregory Price
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Cheatham, Benjamin @ 2026-01-30 19:47 UTC (permalink / raw)
  To: linux-cxl; +Cc: benjamin.cheatham

Quick Background:
CXL.mem isolation and timeout is a mechanism that allows the host to
continue operating when a CXL.mem link goes down or a CXL.mem
transaction times out (semi-analogous to PCIe DPC, but for CXL) [1]. Once
CXL.mem isolation is triggered, all CXL memory below the affected root port
is inaccessible: writes to that memory are dropped and reads return
synchronous exceptions (platform specific, but probably poisoned data). The
alternative, which is the current behavior, is that the host system resets
when a CXL.mem link goes down or a CXL.mem transaction times out.

Why I'm Sending This:
I sent a patch series implementing CXL.mem error isolation to this
list a few months back [2]. It didn't gain much traction, largely because
no customer was asking for it at the time. We (AMD) have since heard from
some customers that they are interested in this support, but they aren't
willing to help out upstream. The main motivation for isolation we've heard
is that customers would like to deploy CXL but are worried about system
reliability since it's still a new technology.

My main goal here is to gauge whether we're wasting our time trying to push
this upstream. With that said, here's some info on the technical hurdles
in implementing this feature:

Technical Details
=================

1. CXL memory may be used for kernel allocations

Kernel allocations in CXL memory aren't a problem at the moment because if the
CXL.mem link goes down the hardware resets. With isolation enabled, that's no
longer the case: the kernel can keep chugging along until it eventually errors
out trying to access the now-inaccessible memory (possibly causing data
corruption until then).

In my v1 submission, I opted to just panic the system when isolation occurred
and no CXL driver could handle the event. The handler for type 3 devices
(the cxl_pci driver) did some RAS bookkeeping and then panicked the system. I
think an isolation handler for CXL device drivers will probably be part of a
final solution, but the handler in that series was hamstrung by allowing CXL
memory into the system RAM pool.

Possible Solutions
------------------
Keeping CXL memory out of sysram is doable today; it just requires a combination
of udev rules, tooling, and kernel configuration options. The flow (afaik) is to:
	1) Configure the kernel to not automatically online hotplugged memory
	2) Add a udev rule to remap CXL-backed DAX devices from sysram mode to devdax mode
	when added to the system
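A rough sketch of that flow, assuming daxctl from the ndctl project; the rule
file name and match fields are illustrative, not a tested distro recipe:

```shell
# 1) Keep hotplugged memory offline by default. Either build without
#    CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, or boot with:
#      memhp_default_state=offline

# 2) Illustrative udev rule (e.g. /etc/udev/rules.d/90-cxl-devdax.rules)
#    to flip CXL-backed dax devices out of system-ram mode as they appear:
#      ACTION=="add", SUBSYSTEM=="dax", \
#        RUN+="/usr/bin/daxctl reconfigure-device --mode=devdax %k"

# Or done by hand for a single device:
daxctl reconfigure-device --mode=devdax dax0.0
```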

Gregory Price has submitted a set that changes the second part of this flow to instead
use sysfs [3]. With that, the need for udev rules is removed and users (or their tooling)
can set their CXL memory to devdax mode before it's added to the system. At that point,
all that needs to be done is restrict enabling isolation to CXL devices in devdax-mapped
regions and make sure the memory mode doesn't change (i.e. devdax -> sysram).

2. PCIe Portdrv Dependency

CXL isolation interrupts are delivered via MSI/MSI-X, with the specific
vector indicated in the MMIO-space isolation capability register (CXL 3.2
8.2.4.24.1). This is a problem because the PCIe portdrv is in charge of
setting up MSI/MSI-X interrupts, but mapping the isolation vector requires the
CXL register mapping code. Since the portdrv is only available built into the
kernel, using isolation would require at least the cxl_core module to also be
built-in.

Possible Solutions
------------------
There are a couple of things we could do here. The first is to restrict
isolation to configurations where the CXL core is built-in (CXL_BUS=y &&
depends on PCIEPORTBUS). I'm not particularly happy with this approach since
it removes the modularity of the CXL driver(s), but I won't gripe if that's
what's settled on.
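For illustration, such a guard might look like the following Kconfig sketch
(the CXL_MEM_ISOLATION option name is hypothetical; CXL_BUS and PCIEPORTBUS
are the existing symbols):

```
config CXL_MEM_ISOLATION
	bool "CXL.mem timeout and isolation support"
	depends on CXL_BUS=y
	depends on PCIEPORTBUS
```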

Another approach would be to move the CXL register mapping code in
cxl/core/regs.c into a library, or to always build that file in when CXL_BUS
is selected. This is more palatable (imo) but splits the CXL code up in a
potentially weird way.

The last option is to rework the PCIe port bus driver to allow re-allocating
MSI/MSI-X interrupts. Jonathan Cameron sent out a series where there was some
discussion of this. That support would be limited to MSI-X interrupts only,
since the PCI maintainers don't want to add more support for MSI [4]. This
wouldn't work for AMD platforms because we use MSI interrupts for this
feature. There is still a way to make it work, however: AMD server platforms
use the same MSI vector for all PCIe interrupts, so we could introduce a
quirk to reuse that vector for CXL isolation. That would require no register
mapping code in the PCIe portdrv, but would introduce a platform quirk
instead. I doubt anyone would be happy about a quirk, but I thought I'd throw
it out as an option.

Thanks for reading,
Ben

Footnotes
=========
[1]: CXL 3.2 spec, section 12.3 "Isolation on CXL.cache and CXL.mem"
[2]: https://lore.kernel.org/linux-cxl/20250730214718.10679-1-Benjamin.Cheatham@amd.com/
[3]: https://lore.kernel.org/linux-cxl/20260129210442.3951412-1-gourry@gourry.net/
[4]: https://lore.kernel.org/linux-pci/87plpsbbe5.ffs@tglx/
