From: <dan.j.williams@intel.com>
To: Alison Schofield <alison.schofield@intel.com>,
<dan.j.williams@intel.com>
Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
<linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<nvdimm@lists.linux.dev>, <linux-fsdevel@vger.kernel.org>,
<linux-pm@vger.kernel.org>, Davidlohr Bueso <dave@stgolabs.net>,
"Jonathan Cameron" <jonathan.cameron@huawei.com>,
Dave Jiang <dave.jiang@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Len Brown <len.brown@intel.com>, Pavel Machek <pavel@kernel.org>,
Li Ming <ming.li@zohomail.com>,
Jeff Johnson <jeff.johnson@oss.qualcomm.com>,
Ying Huang <huang.ying.caritas@gmail.com>,
Yao Xingtao <yaoxt.fnst@fujitsu.com>,
Peter Zijlstra <peterz@infradead.org>,
Greg KH <gregkh@linuxfoundation.org>,
Nathan Fontenot <nathan.fontenot@amd.com>,
Terry Bowman <terry.bowman@amd.com>,
"Robert Richter" <rrichter@amd.com>,
Benjamin Cheatham <benjamin.cheatham@amd.com>,
PradeepVineshReddy Kodamati <PradeepVineshReddy.Kodamati@amd.com>,
Zhijian Li <lizhijian@fujitsu.com>
Subject: Re: [PATCH v5 1/7] cxl/acpi: Refactor cxl_acpi_probe() to always schedule fallback DAX registration
Date: Wed, 23 Jul 2025 00:34:02 -0700 [thread overview]
Message-ID: <6880906ac4e68_14c35610091@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <aIAwnACBeWindJ-s@aschofie-mobl2.lan>
Alison Schofield wrote:
> On Tue, Jul 22, 2025 at 02:04:00PM -0700, Dan Williams wrote:
> > Smita Koralahalli wrote:
> > > Refactor cxl_acpi_probe() to use a single exit path so that the fallback
> > > DAX registration can be scheduled regardless of probe success or failure.
> >
> > I do not understand why cxl_acpi needs to be responsible for this,
> > especially in the cxl_acpi_probe() failure path. Certainly if
> > cxl_acpi_probe() fails, that is a strong signal to give up on the CXL
> > subsystem altogether and fallback to DAX vanilla discovery exclusively.
> >
> > Now, maybe the need for this becomes clearer in follow-on patches.
> > However, I would have expected that DAX, which currently arranges for
> > CXL to load first would just flush CXL discovery, make a decision about
> > whether proceed with Soft Reserved, or not.
> >
> > Something like:
> >
> > DAX CXL
> > Scan CXL Windows. Fail on any window
> > parsing failures
> >
> > Launch a work item to flush PCI
> > discovery and give a reaonable amount of
> > time for cxl_pci and cxl_mem to quiesce
> >
> > <assumes CXL Windows are discovered
> > by virtue of initcall order or
> > MODULE_SOFTDEP("pre: cxl_acpi")>
> >
> > Calls a CXL flush routine to await probe
> > completion (will always be racy)
> >
> > Evaluates if all Soft Reserve has
> > cxl_region coverage
> >
> > if yes: skip publishing CXL intersecting
> > Soft Reserve range in iomem, let dax_cxl
> > attach to the cxl_region devices
> >
> > if no: decline the already published
> > cxl_dax_regions, notify cxl_acpi to
> > shutdown. Install Soft Reserved in iomem
> > and create dax_hmem devices for the
> > ranges per usual.
>
> This is super course. If CXL region driver sets up 99 regions with
> exact matching SR ranges and there are no CXL Windows with unused SR,
> then we have a YES!
>
> But if after those 99 successful assemblies, we get one errant window
> with a Soft Reserved for which a region never assembles, it's a hard NO.
> DAX declines, ie teardowns the 99 dax_regions and cxl_regions.
Exactly.
> Obviously, this is different from the current approach that aimed to
> pick up completely unused SRs and the trimmings from SRs that exceeded
> region size and offered them to DAX too.
>
> I'm cringing a bit at the fact that one bad apple (like a cxl device
> that doesn't show up for it's region) means no CXL devices get managed.
Think of the linear extended cache case where if you just look at the
endpoint CXL decoders you get half of the capacity and none of the
warning that the DDR side of the cache can not be managed. Consider the
other address translation error cases where endpoint decoders do not
tell the full story.
Now think of the contract of what the CXL driver offers. It says, "hey
if you change this configuration I have high confidence that I have
managed everything associated with this region throughout the whole
platform config", or "if an error happens anywhere within a CXL window
you can rely on my identification of the system components to suspect".
> Probably asking the obvious question here. This is what 'we' want,
> right?
So, in the end it is not a fair fight. It is a handful of Linux
developers vs multiple platform BIOS teams per vendor and per hardware
generation. The amount of Linux tripping, specification-stretching,
innovation is eye watering.
The downside of being drastic and coarse is worth the benefit.
The status quo is end users get nothing (stranded memory), or low
confidence partial configs with latent RAS problems.
The coarse policy is a spec compliant, safe fallback that gets end users
access to the memory the BIOS/firmware promised. It buys time for the
platform vendor to get Linux fixed, or even better, these Linux
coniptions catch problems earlier in platform qualification cycle.
The alternative fine grained policy hopes that the memory the CXL driver
gives up on was not critical or indicative of a deeper misunderstanding
of the platform. I do not want to debug hopeful guesses on top of
undefined breakage.
next prev parent reply other threads:[~2025-07-23 7:34 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-15 18:04 [PATCH v5 0/7] Add managed SOFT RESERVE resource handling Smita Koralahalli
2025-07-15 18:04 ` [PATCH v5 1/7] cxl/acpi: Refactor cxl_acpi_probe() to always schedule fallback DAX registration Smita Koralahalli
2025-07-22 21:04 ` dan.j.williams
2025-07-23 0:45 ` Alison Schofield
2025-07-23 7:34 ` dan.j.williams [this message]
2025-07-15 18:04 ` [PATCH v5 2/7] cxl/core: Rename suspend.c to probe_state.c and remove CONFIG_CXL_SUSPEND Smita Koralahalli
2025-07-22 21:44 ` dan.j.williams
2025-07-15 18:04 ` [PATCH v5 3/7] cxl/acpi: Add background worker to coordinate with cxl_mem probe completion Smita Koralahalli
2025-07-17 0:24 ` Dave Jiang
2025-07-23 7:31 ` dan.j.williams
2025-07-23 16:13 ` dan.j.williams
2025-08-05 3:58 ` Zhijian Li (Fujitsu)
2025-08-20 23:14 ` Alison Schofield
2025-08-21 2:30 ` Zhijian Li (Fujitsu)
2025-08-22 3:56 ` Koralahalli Channabasappa, Smita
2025-08-25 7:50 ` Zhijian Li (Fujitsu)
2025-08-27 6:30 ` Zhijian Li (Fujitsu)
2025-08-28 23:21 ` Koralahalli Channabasappa, Smita
2025-09-01 2:46 ` Zhijian Li (Fujitsu)
2025-07-29 15:48 ` Koralahalli Channabasappa, Smita
2025-07-30 16:09 ` dan.j.williams
2025-07-15 18:04 ` [PATCH v5 4/7] cxl/region: Introduce SOFT RESERVED resource removal on region teardown Smita Koralahalli
2025-07-17 0:42 ` Dave Jiang
2025-07-15 18:04 ` [PATCH v5 5/7] dax/hmem: Save the DAX HMEM platform device pointer Smita Koralahalli
2025-07-15 18:04 ` [PATCH v5 6/7] dax/hmem, cxl: Defer DAX consumption of SOFT RESERVED resources until after CXL region creation Smita Koralahalli
2025-07-15 18:04 ` [PATCH v5 7/7] dax/hmem: Preserve fallback SOFT RESERVED regions if DAX HMEM loads late Smita Koralahalli
2025-07-15 21:07 ` [PATCH v5 0/7] Add managed SOFT RESERVE resource handling Alison Schofield
2025-07-16 6:01 ` Koralahalli Channabasappa, Smita
2025-07-16 20:20 ` Alison Schofield
2025-07-16 21:29 ` Koralahalli Channabasappa, Smita
2025-07-16 23:48 ` Alison Schofield
2025-07-17 17:58 ` Koralahalli Channabasappa, Smita
2025-07-17 19:06 ` Dave Jiang
2025-07-17 23:20 ` Koralahalli Channabasappa, Smita
2025-07-17 23:30 ` Dave Jiang
2025-07-23 15:24 ` dan.j.williams
2025-07-21 7:38 ` Zhijian Li (Fujitsu)
2025-07-22 20:07 ` dan.j.williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6880906ac4e68_14c35610091@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=benjamin.cheatham@amd.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=gregkh@linuxfoundation.org \
--cc=huang.ying.caritas@gmail.com \
--cc=ira.weiny@intel.com \
--cc=jack@suse.cz \
--cc=jeff.johnson@oss.qualcomm.com \
--cc=jonathan.cameron@huawei.com \
--cc=len.brown@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=lizhijian@fujitsu.com \
--cc=ming.li@zohomail.com \
--cc=nathan.fontenot@amd.com \
--cc=nvdimm@lists.linux.dev \
--cc=pavel@kernel.org \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rrichter@amd.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
--cc=willy@infradead.org \
--cc=yaoxt.fnst@fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).