From: "Cheatham, Benjamin" <benjamin.cheatham@amd.com>
To: <dan.j.williams@intel.com>, <linux-cxl@vger.kernel.org>
Subject: Re: RFC: CXL Isolation Support
Date: Thu, 5 Feb 2026 14:49:12 -0600
Message-ID: <75eb28c0-e696-470f-8cee-c47bae6ee15d@amd.com>
In-Reply-To: <69810708cf7df_55fa10055@dwillia2-mobl4.notmuch>
On 2/2/2026 2:20 PM, dan.j.williams@intel.com wrote:
> Cheatham, Benjamin wrote:
>> Quick Background:
>> CXL.mem isolation and timeout is a mechanism that allows the host to
>> continue operation in the event a CXL.mem link goes down or a CXL.mem
>> transaction times out (semi-analogous to PCIe DPC for CXL)[1]. After CXL.mem
>> isolation is triggered all CXL memory below the root port is inaccessible.
>
> ...and this is unrecoverable in the generic memory expansion case as
> detailed previously [1].
>
> [1]: http://lore.kernel.org/65cea1bc6ac0c_5e9bf294ed@dwillia2-xfh.jf.intel.com.notmuch
>
>> At this point writes to the memory are dropped and reads return synchronous
>> exceptions (platform specific, but probably poisoned data). The alternative
>> to this support (which is the case now) is the host system resets when a
>> CXL.mem link goes down or a CXL.mem transaction times out.
>>
>> Why I'm Sending This:
>> I sent out a patch series a few months back that implemented CXL.mem
>> error isolation to this list [2]. It didn't really gain traction due
>> to not having a customer requesting it. We (AMD) have heard from some
>> customers that they are interested in this support, but aren't willing to
>> help out upstream.
>
> Then they get the status quo until that "interest" matures into shared
> requirements definition, clarification of assumptions, and consensus of
> tradeoffs.
Understood.
>
>> The main motivation behind using isolation we've heard
>> is that customers would like to use CXL but are worried about system
>> reliability since it's still a new technology.
>
> That does not appear prohibitive given CXL uptake to date. Isolation
> does not improve reliability on its own. It replaces hangs with poison
> that is fatal outside of constrained use cases.
>
> Now, all of the push back to date has been with respect to the general
> purpose memory expansion use case. The way forward from there is new
> evidence that the expected mitigations to make isolation useful still
> result in a usable feature. The evidence of *that* is the new use case
> that Vikram proposed several months back in the CXL collaboration call,
> CXL Accelerator error recovery.
>
> In that case there is a chance that the accelerator error model meets the
> requirements to make isolation useful. Guarantees like 1:1 host bridge
> to endpoint direct-attach, non-interleaved CXL.mem, and limited risk of
> core kernel dependencies on that CXL.mem.
That's reasonable.
>
> I am interested in the isolation for CXL accelerator discussion. I am
> not interested in muddying through isolation for the general memory
> expander use case without engagement from deployment use cases.
I can't recall any internal discussions about using isolation only for accelerators, so
I'll need to check whether that's something we're interested in.
As for the memory expander case: would something like the N_PRIVATE node set Gregory sent out [1]
be enough to change your mind on this? It doesn't provide the same guarantees as a type 2 set
would, but it does limit the usage of CXL memory to be more like type 2 memory.
Regardless, I (really) won't bring it up again until we have someone who wants to deploy this thing.
Thanks,
Ben
[1]: https://lore.kernel.org/linux-mm/20260108203755.1163107-1-gourry@gourry.net/