Linux CXL
From: "Cheatham, Benjamin" <benjamin.cheatham@amd.com>
To: <dan.j.williams@intel.com>, <linux-cxl@vger.kernel.org>
Subject: Re: RFC: CXL Isolation Support
Date: Thu, 5 Feb 2026 14:49:12 -0600
Message-ID: <75eb28c0-e696-470f-8cee-c47bae6ee15d@amd.com>
In-Reply-To: <69810708cf7df_55fa10055@dwillia2-mobl4.notmuch>

On 2/2/2026 2:20 PM, dan.j.williams@intel.com wrote:
> Cheatham, Benjamin wrote:
>> Quick Background:
>> CXL.mem isolation and timeout is a mechanism that allows the host to
>> continue operation in the event a CXL.mem link goes down or a CXL.mem
>> transaction times out (semi-analogous to PCIe DPC for CXL)[1]. After CXL.mem
>> isolation is triggered all CXL memory below the root port is inaccessible.
> 
> ...and this is unrecoverable in the generic memory expansion case as
> detailed previously [1].
> 
> [1]: http://lore.kernel.org/65cea1bc6ac0c_5e9bf294ed@dwillia2-xfh.jf.intel.com.notmuch
> 
>> At this point writes to the memory are dropped and reads return synchronous
>> exceptions (platform specific, but probably poisoned data). The alternative
>> to this support (which is the case now) is the host system resets when a
>> CXL.mem link goes down or a CXL.mem transaction times out.
>>
>> Why I'm Sending This:
>> I sent out a patch series a few months back that implemented CXL.mem
>> error isolation to this list [2]. It didn't really gain traction due
>> to not having a customer requesting it. We (AMD) have heard from some
>> customers that they are interested in this support, but aren't willing to
>> help out upstream.
> 
> Then they get the status quo until that "interest" matures into shared
> requirements definition, clarification of assumptions, and consensus of
> tradeoffs.

Understood.

> 
>> The main motivation behind using isolation we've heard
>> is that customers would like to use CXL but are worried about system
>> reliability since it's still a new technology.
> 
> That does not appear prohibitive given CXL uptake to date. Isolation
> does not improve reliability on its own. It replaces hangs with poison
> that is fatal outside of constrained use cases.
> 
> Now, all of the push back to date has been with respect to the general
> purpose memory expansion use case. The way forward from there is new
> evidence that the expected mitigations to make isolation useful still
> result in a usable feature. The evidence of *that* is the new use case
> that Vikram proposed several months back in the CXL collaboration call,
> CXL Accelerator error recovery.
> 
> In that case there is a chance that the accelerator error model meets the
> requirements to make isolation useful. Guarantees like 1:1 host bridge
> to endpoint direct-attach, non-interleaved CXL.mem, and limited risk of
> core kernel dependencies on that CXL.mem.

That's reasonable.

> 
> I am interested in the isolation for CXL accelerator discussion. I am
> not interested in muddling through isolation for the general memory
> expander use case without engagement from deployment use cases.

I can't remember any internal discussions about using isolation only for accelerators, so
I'll need to check whether that's something we're interested in.

As for the memory expander case: would something like the N_PRIVATE node set Gregory sent out [1]
be enough to change your mind on this? It doesn't provide the same guarantees as a type 2 set
would, but it does limit the usage of CXL memory to be more like type 2 memory.

Regardless, I (really) won't bring it up again until we have someone who wants to deploy this thing.

Thanks,
Ben

[1]: https://lore.kernel.org/linux-mm/20260108203755.1163107-1-gourry@gourry.net/

Thread overview: 13+ messages
2026-01-30 19:47 RFC: CXL Isolation Support Cheatham, Benjamin
2026-01-30 21:30 ` Gregory Price
2026-02-02 15:59   ` Jonathan Cameron
2026-02-02 16:50     ` Gregory Price
2026-02-02 17:31       ` Cheatham, Benjamin
2026-02-02 17:30   ` Cheatham, Benjamin
2026-02-02 19:52     ` Gregory Price
2026-02-02 15:52 ` Jonathan Cameron
2026-02-02 19:28 ` Vikram Sethi
2026-02-02 20:20 ` dan.j.williams
2026-02-05 20:49   ` Cheatham, Benjamin [this message]
2026-02-05 21:52     ` dan.j.williams
2026-02-05 22:54       ` Gregory Price
