From: "Cheatham, Benjamin" <benjamin.cheatham@amd.com>
To: <dan.j.williams@intel.com>, <linux-cxl@vger.kernel.org>
Subject: Re: RFC: CXL Isolation Support
Date: Thu, 5 Feb 2026 14:49:12 -0600
Message-ID: <75eb28c0-e696-470f-8cee-c47bae6ee15d@amd.com>
In-Reply-To: <69810708cf7df_55fa10055@dwillia2-mobl4.notmuch>

On 2/2/2026 2:20 PM, dan.j.williams@intel.com wrote:
> Cheatham, Benjamin wrote:
>> Quick Background:
>> CXL.mem isolation and timeout is a mechanism that allows the host to
>> continue operation in the event a CXL.mem link goes down or a CXL.mem
>> transaction times out (semi-analogous to PCIe DPC for CXL)[1]. After CXL.mem
>> isolation is triggered all CXL memory below the root port is inaccessible.
>
> ...and this is unrecoverable in the generic memory expansion case as
> detailed previously [1].
>
> [1]: http://lore.kernel.org/65cea1bc6ac0c_5e9bf294ed@dwillia2-xfh.jf.intel.com.notmuch
>
>> At this point writes to the memory are dropped and reads return synchronous
>> exceptions (platform specific, but probably poisoned data). The alternative
>> to this support (which is the case now) is the host system resets when a
>> CXL.mem link goes down or a CXL.mem transaction times out.
>>
>> Why I'm Sending This:
>> I sent out a patch series a few months back that implemented CXL.mem
>> error isolation to this list [2]. It didn't really gain traction due
>> to not having a customer requesting it. We (AMD) have heard from some
>> customers that they are interested in this support, but aren't willing to
>> help out upstream.
>
> Then they get the status quo until that "interest" matures into shared
> requirements definition, clarification of assumptions, and consensus of
> tradeoffs.

Understood.

>
>> The main motivation behind using isolation we've heard
>> is that customers would like to use CXL but are worried about system
>> reliability since it's still a new technology.
>
> That does not appear prohibitive given CXL uptake to date. Isolation
> does not improve reliability on its own. It replaces hangs with poison
> that is fatal outside of constrained use cases.
>
> Now, all of the push back to date has been with respect to the general
> purpose memory expansion use case. The way forward from there is new
> evidence that the expected mitigations to make isolation useful still
> result in a usable feature. The evidence of *that* is the new use case
> that Vikram proposed several months back in the CXL collaboration call,
> CXL Accelerator error recovery.
>
> In that case there is a chance that the accelerator error model meets the
> requirements to make isolation useful. Guarantees like 1:1 host bridge
> to endpoint direct-attach, non-interleaved CXL.mem, and limited risk of
> core kernel dependencies on that CXL.mem.

That's reasonable.

>
> I am interested in the isolation for CXL accelerator discussion. I am
> not interested in muddying through isolation for the general memory
> expander use case without engagement from deployment use cases.

I can't remember any internal discussions about using isolation for only
accelerators so I'll need to check and see if that's something we're
interested in.

As for the memory expander case: would something like the N_PRIVATE node set
Gregory sent out [1] be enough to change your mind on this? It doesn't provide
the same guarantees as a type 2 set would, but it does limit the usage of CXL
memory to be more like type 2 memory.

Regardless, I (really) won't bring it up again until we have someone who wants
to deploy this thing.

Thanks,
Ben

[1]: https://lore.kernel.org/linux-mm/20260108203755.1163107-1-gourry@gourry.net/
Thread overview: 13+ messages
2026-01-30 19:47 RFC: CXL Isolation Support Cheatham, Benjamin
2026-01-30 21:30 ` Gregory Price
2026-02-02 15:59 ` Jonathan Cameron
2026-02-02 16:50 ` Gregory Price
2026-02-02 17:31 ` Cheatham, Benjamin
2026-02-02 17:30 ` Cheatham, Benjamin
2026-02-02 19:52 ` Gregory Price
2026-02-02 15:52 ` Jonathan Cameron
2026-02-02 19:28 ` Vikram Sethi
2026-02-02 20:20 ` dan.j.williams
2026-02-05 20:49 ` Cheatham, Benjamin [this message]
2026-02-05 21:52 ` dan.j.williams
2026-02-05 22:54 ` Gregory Price